Measuring a Quantum System’s Classical Information

In the governing thought, I find an equivalence between the classical information in a quantum system and the integral of that system’s energy and time, specifically $I = \frac{2}{\hbar}\int E\,\mathrm{d}t$, in natural units. I derive this relationship in four ways: the first starts with the Schrödinger equation and applies the Minkowski transformation; the second uses the canonical commutation relation; the third uses Gabor’s analysis of the time-frequency plane and Heisenberg’s uncertainty principle; and the last quantizes Brownian motion within the Bernoulli process and applies the Gaussian channel capacity. In support, I give two examples of quantum systems that follow the governing thought, namely the Gaussian wave packet and the electron spin. I conclude with comments on the discretization of space and the information content of a degree of freedom.


Introduction
Anyone watching the media these days, especially the business news, knows about Big Data. The rate of data generation is growing exponentially, and storage is projected to grow 50-fold between 2010 and 2020 [1]. The importance of the digital world is pronounced in almost every industry and every field of science [2]. It is not surprising that information is also important as a physical quantity in physics [3,4].
On that front, physics is challenged with many open questions including "five great problems" that would continue the march toward more knowledge [5]. I intend to contribute new insight vis-a-vis the governing thought that information equals energy times time and focus its application on two well-studied systems, the Gaussian wave packet and the electron.
While a base knowledge of information theory and physics is assumed, the arguments and derivations are intended to flow naturally from current understanding toward new theory. I find that the simplicity of the mathematics gives reason for special consideration. To show this elegance and resilience, the paper derives the governing thought in four different proofs, then goes through two examples where the governing thought applies, and lastly ends with a couple of notes in the appendix.
While the proper description of the word "information" is closer to self-information or entropy as developed by Shannon [6,7], I have chosen to use the word "information" to emphasize that a complete statistical description of a measurement of nature can be described with classical bits of information.
Shannon showed that self-information is formally defined as the negative expected log probability, $I = -\mathrm{E}[\log p]$, and has meaning with both discrete and continuous probability distributions. It can even be derived from the definition of statistical entropy, $S = \log(\Omega)$, as given by Boltzmann [8]. The natural logarithm is used in this analysis and thus the units of information are the natural unit, or nats [9].
Energy times time, or more loosely "action", has been a valuable concept in physics for over two centuries. It not only survived the migration from classical mechanics to quantum mechanics, it thrived with the realization that it was quantized [10]. We will start our investigation into how information and energy times time are equivalent here, with quantum mechanics, and argue that information is one and the same as energy times time.
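As a small numerical illustration of the definitions above, the following sketch computes the entropy of a discrete distribution in nats and compares it with the Boltzmann form for equally likely states. The function names are hypothetical, and Boltzmann's constant is assumed to be set to one so that both quantities share the same units.

```python
import math


def entropy_nats(probabilities):
    """Negative expected log probability, -sum(p * ln p), in nats.

    Outcomes with zero probability contribute nothing to the sum.
    """
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)


def boltzmann_entropy(omega):
    """ln(Omega) for Omega equally likely states (assumes k_B = 1)."""
    return math.log(omega)


uniform = [0.25] * 4
skewed = [0.7, 0.1, 0.1, 0.1]

# For a uniform distribution the two definitions coincide.
print(entropy_nats(uniform), boltzmann_entropy(4))
# A non-uniform distribution over the same outcomes carries less entropy.
print(entropy_nats(skewed))
```

For the uniform case the two numbers agree, which is the sense in which the statistical entropy reduces to the negative log probability when all states are equally likely.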

Schrödinger Equation
The Schrödinger equation, found during the advent of quantum mechanics, dictates how a wave function and its phase evolve through time. The Hamiltonian or energy operator, $H$, of a system is equal to hbar times the imaginary derivative with respect to time, $H\psi = i\hbar\,\frac{\partial\psi}{\partial t} = E\psi$, with the operator's eigenvalue, $E$, the energy of the system [10].
The solution to this equation is the complex exponential, $\psi(t) = \psi_0\,e^{-iEt/\hbar}$. One can calculate the probability distribution associated with this wave function via its magnitude squared [10], $p(t) = \psi^*(t)\,\psi(t) = |\psi_0|^2$.
Note the phase information is lost. Calculating the information without considering the phase, one would conclude that the information is constant and a function only of the initial state, $I = -\log p(t) = -\log|\psi_0|^2$. However, if we dig a little deeper an insight appears that, when looked at in a few different ways, proves resilient.
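The claim that the phase drops out of the probability can be checked numerically. The sketch below assumes an energy eigenstate of the form psi0 * exp(-i E t / hbar) with hbar set to one; the function names and the sample values of psi0 and the energy are purely illustrative.

```python
import cmath
import math

HBAR = 1.0  # assumption: natural units


def psi(t, energy, psi0):
    """Energy-eigenstate wave function psi0 * exp(-i * E * t / hbar)."""
    return psi0 * cmath.exp(-1j * energy * t / HBAR)


def probability(t, energy, psi0):
    """Magnitude squared of the wave function."""
    return abs(psi(t, energy, psi0)) ** 2


def information_nats(t, energy, psi0):
    """Negative log probability, in nats."""
    return -math.log(probability(t, energy, psi0))


psi0 = 0.6 + 0.3j   # illustrative initial amplitude
energy = 2.5        # illustrative energy eigenvalue
times = (0.0, 0.4, 1.7, 10.0)
values = [information_nats(t, energy, psi0) for t in times]

# Every entry equals -ln|psi0|^2: the phase never enters.
print(values)
```

The printed values are identical up to rounding, which is the "constant information" conclusion that the rest of the section goes on to question.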
If the probability is constant, then the size of the space is equal to one over the probability, $\Omega = 1/p$. In this case S is the thermodynamic entropy [8], $S = \log(\Omega) = -\log(p)$. By being a little more formal we can relax the condition that p is constant. Consider a large number N of independent steps in the particle, where $t = N\,\mathrm{d}t$. In this case, the probability $p(t)$ is equal to $p(\mathrm{d}t)$ raised to the Nth power, $p(t) = p(\mathrm{d}t)^N$. I next use the weak law of large numbers through the asymptotic equipartition property (AEP) to focus on the most likely states [9]. (This is important and is exemplified by the Gaussian distribution, which has infinite range but whose most likely states lie within roughly its standard deviation. If a large number of possible outcomes can occur, each with the same probability, within the range $\Delta E\,\Delta t$ of a system, then one can prove that the information encoded into that system is equal to the negative log of the probability on average.) The AEP and the weak law of large numbers [9] can then be used to show that the negative log probability approaches the incremental entropy. Calling this the differential information, $\mathrm{d}I$, I have $\mathrm{d}I = -\log p(\mathrm{d}t)$.
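The AEP argument can be illustrated with a simple simulation. The sketch below is not the paper's derivation; it assumes independent binary steps with a hypothetical probability parameter q and shows that the per-step negative log probability of a long sampled sequence approaches the entropy of a single step.

```python
import math
import random


def binary_entropy_nats(q):
    """Entropy of one binary step with probability q, in nats."""
    return -(q * math.log(q) + (1.0 - q) * math.log(1.0 - q))


def negative_log_prob_rate(q, n, rng):
    """Sample n independent steps and return -(1/n) * ln p(sequence)."""
    total = 0.0
    for _ in range(n):
        outcome_is_one = rng.random() < q
        total -= math.log(q if outcome_is_one else 1.0 - q)
    return total / n


rng = random.Random(0)  # fixed seed so the run is repeatable
q = 0.3                 # illustrative step probability
rate = negative_log_prob_rate(q, 100_000, rng)

# The sampled rate and the single-step entropy agree closely for large n.
print(rate, binary_entropy_nats(q))
```

Because the steps are independent, the log probability of the whole sequence is a sum of per-step terms, and the weak law of large numbers pulls their average toward the expected value.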

Dueling Information Rates
The insight comes by breaking up $-\log p(t) = -\log\psi^*(t) - \log\psi(t)$. There are two information rates that cancel each other out, one equal to the imaginary energy divided by hbar, $iE/\hbar$, and the other equal to the negative imaginary energy divided by hbar, $-iE/\hbar$.
If a Minkowski transformation had been performed prior to calculating the probability distribution, a different answer would result. The Minkowski transformation takes imaginary time and makes it real. We see this transformation appear in relativity and analytic continuation [11,12]. After applying the Minkowski transformation, the two rates no longer cancel but add, giving $\mathrm{d}I = \frac{2}{\hbar}E\,\mathrm{d}t$, or $I = \frac{2}{\hbar}\int E\,\mathrm{d}t$. This last equation is the governing thought of the paper. Assuming that the mass energy is not a function of time (which is not always the case), the following simple expression results, $I = \frac{2}{\hbar}E\,t$.

Non-Commuting Operators
It is possible to remove the dependency on the Minkowski transformation and arrive at the same result by replacing the energy eigenvalue, $E$, with the energy operator, $H$. When this is the case, the complex conjugate operation also requires the transpose of the operators, since they do not commute. Using the power rule to expand the exponent and the logarithm, the negative log probability again approaches the information.

Signals
The initial conclusion (before I introduced imaginary time or the commutator) was that there is no information contained in the phase of a wave function; however we know from our analysis of signals that sine and cosine waves are capable of transmitting information in their phases. There are differences between the phase of the wave function and the phase of a signal [4] but it is worthwhile to pursue this approach as well.
Work by Nyquist and Hartley after the turn of the 20th century [13,14] tells us that the bandwidth of the signals is in direct proportion to the width of the signal in the frequency domain.
Gabor was even closer to the mark in the middle of the 20th century when he tiled the time-frequency plane with quantized "logons" of information [15] (see Figure 1). Each logon was one degree of freedom and is represented by a shifted and modulated Gaussian wave packet. These wave packets were used as a basis to represent a signal with bandwidth f and duration t.
A more rigorous analysis of the number of degrees of freedom of a signal limited in bandwidth and time can be found by Slepian, Pollak and Landau [16][17][18].
They concluded that the rate of information that can be encoded into a signal is linear in the bandwidth (or frequency). With Planck's work on black body radiation and Einstein's equation for the photoelectric effect, where $E = hf$ [10], this proportion reduces to $\frac{\mathrm{d}I}{\mathrm{d}t} \propto f = \frac{E}{h}$, where $\mathrm{d}I/\mathrm{d}t$ is the information rate of the signal, f is the bandwidth of the signal, E is Planck's constant times f and t is the duration of the signal.
From here I can quickly return to the governing thought a third time by solving for the constant of proportionality in the equation above, dividing by the minimum width of the signal (the Heisenberg uncertainty relation).
In Section 5.1, I show that the Gaussian wave packet, which attains the minimum uncertainty, contains one natural unit of information, thus completing this proof.
An important insight to interpreting this equation is seen by again returning to Figure 1 and looking at the time-frequency plane as a Venn diagram of entropy.

Brownian Motion
Before the two examples, I will re-derive the governing thought a fourth way by discretizing space and motion.
Building on the analysis by Kubo on the fluctuation dissipation theorem [19], I formalize the two time constants for a diffusing free particle: the collision time, $\delta t$, and the relaxation time, $\tau$. The relaxation time is taken equal to the thermal time, $\tau = \hbar/(2 k_B T)$ [19][20][21][22], which fixes the spatial variance.

Bernoulli Process
Introducing the Bernoulli process as reviewed by Reif and Chandrasekhar [8,23], one can solve for the step size, $\delta t$ (or the collision time). The contribution to the spatial variance is balanced between drift and diffusion; the variance follows when the probability parameter is 1/2. Here $\delta x$ is the spatial step size, $K$ is the number of steps, $t = K\cdot\delta t$ is the duration of the process and $\sigma_v^2$ is the variance in velocity. From Dirac, we know that $\delta x = c\cdot\delta t$ [24], which allows us to calculate $\delta t$. When K is large, the variance of the average of K samples of a distribution is equal to the variance of an individual sample divided by K. Thus when the relaxation time is equal to one over twice the temperature, $\tau = \hbar/(2 k_B T)$, the collision time is one over twice the energy, $\delta t = \hbar/(2E)$, and vice versa.

Information Content
With the details of the Bernoulli process defined, we can move on to the Gaussian channel. Combined with the Shannon-Nyquist sampling theorem, one has the channel capacity per second, $C'$ [9], $C' = W\log\left(1 + \frac{P}{N_0 W}\right)$, where $P$ is the signal power, $N_0/2$ is the noise spectral density and $W$ is the bandwidth of the channel. (In this case, the channel is the vacuum, which either has infinite bandwidth or some very large value.) Using the assumption (aided by the insight of appendix A1) that the signal spectral density is equal to the noise spectral density, the signal power, $P$, is the noise spectral density times twice the bandwidth of the signal, $P = \frac{N_0}{2}\cdot 2\Delta w = N_0\,\Delta w$. Since the bandwidth of the signal is much smaller than the bandwidth of the channel, $\Delta w \ll W$, we can re-write the equation above as $C' = W\log\left(1 + \frac{\Delta w}{W}\right) \approx \Delta w$. The signal is the location of the particle performing the Bernoulli process with a step size of $\delta t$, thus the Shannon-Nyquist sampling theorem [25] tells us that the maximum frequency that can be represented by the discrete Bernoulli process is $1/(2\delta t)$. To finish the derivation I will take one more finding from Dirac, who showed that there is both a positive and a negative solution to the energy eigenvalue [24]. Because there are two independent particles diffusing and information is generated by each particle, a factor of 2 must be included, returning us to the governing thought.
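The wide-channel limit used in this step can be checked numerically. The sketch below assumes the textbook band-limited Gaussian channel capacity, W * ln(1 + P / (N0 * W)) in nats per second, together with the stated assumption that the signal spectral density equals the noise spectral density N0/2. The function names and the choice N0 = 1 are illustrative only.

```python
import math


def capacity_nats_per_second(signal_power, n0, channel_bandwidth):
    """Band-limited Gaussian channel capacity, W * ln(1 + P / (N0 * W))."""
    return channel_bandwidth * math.log1p(signal_power / (n0 * channel_bandwidth))


def capacity_matched_density(signal_bandwidth, channel_bandwidth, n0=1.0):
    """Capacity when the signal spectral density equals the noise density N0/2.

    The signal power is then (N0 / 2) * 2 * signal_bandwidth.
    """
    signal_power = (n0 / 2.0) * 2.0 * signal_bandwidth
    return capacity_nats_per_second(signal_power, n0, channel_bandwidth)


signal_bandwidth = 1.0
for channel_bandwidth in (10.0, 1e3, 1e6):
    # As the channel widens, the capacity approaches the signal bandwidth.
    print(channel_bandwidth, capacity_matched_density(signal_bandwidth, channel_bandwidth))
```

Under these assumptions the noise level cancels out of the result, so the capacity depends only on the ratio of the two bandwidths and tends to the signal bandwidth itself.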

Gaussian Wave Packet
The Gaussian wave packet has many special properties, including 1) its Fourier transform is also a Gaussian [25], 2) the Gaussian attains the minimum uncertainty relation [12], and 3) the Gaussian maximizes the differential entropy for a given variance [9]. As introduced above, Gabor [15] used the first two properties to tile the time-frequency plane with shifted and modulated Gaussian wave packets. I will expand on property 3) and show that the information that can be decoded from one Gaussian wave packet is one nat. I start with a result from Hirschman [26], who proposed that to properly measure the information contained in a pair of distributions linked through the Fourier transform (FT) one must add the differential entropy of the probability distribution in the time domain to the differential entropy of the probability distribution in the standard frequency domain. Given the scale property of the FT and the differential entropy, the sum of the two differential entropies is constant regardless of scale factor. Thus the information you can encode into a Gaussian wave packet is the same regardless of the relative width of the wave packet in the two domains.
Hirschman found that any FT pair contained at least $\log(e/2)$ of information and that the Gaussian has exactly $\log(e/2)$. I believe Hirschman missed an extra $\log(2)$ which nature requires. Looking at the governing thought, and applying the Heisenberg uncertainty principle when the energy is not a function of the time, shows that information in nature is greater than or equal to 1 natural unit.
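The entropy sum for the Gaussian can be verified directly. The sketch below assumes a minimum-uncertainty Gaussian pair in time and standard (ordinary) frequency, for which the standard deviations of the two probability densities satisfy sigma_t * sigma_f = 1 / (4 * pi), and uses the closed-form differential entropy of a Gaussian. The function names are illustrative.

```python
import math


def gaussian_differential_entropy(variance):
    """Differential entropy of a Gaussian density, 0.5 * ln(2 * pi * e * variance), in nats."""
    return 0.5 * math.log(2.0 * math.pi * math.e * variance)


def entropy_sum(sigma_t):
    """Time-domain plus frequency-domain entropy for a Gaussian wave packet.

    Assumes the minimum-uncertainty relation sigma_t * sigma_f = 1 / (4 * pi).
    """
    sigma_f = 1.0 / (4.0 * math.pi * sigma_t)
    return (gaussian_differential_entropy(sigma_t ** 2)
            + gaussian_differential_entropy(sigma_f ** 2))


hirschman_value = math.log(math.e / 2.0)
for sigma_t in (0.01, 1.0, 50.0):
    # The sum is the same for every width: ln(e / 2) nats.
    print(sigma_t, entropy_sum(sigma_t), hirschman_value)

# Adding the extra ln(2) proposed in the text gives one nat.
print(entropy_sum(1.0) + math.log(2.0))
```

The width cancels because narrowing the packet in one domain widens it in the other by the same factor, so the two logarithms shift by equal and opposite amounts.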
To show how this applies to the Gaussian, I again use the example of a massive particle. Dirac tells us from his work on the relativistic wave equation that there is both a positive and a negative eigenstate [24]. Looking at the positive eigenvalue, where the mass-energy divided by Planck's constant equals the average frequency, the two functions don't overlap and don't interfere, and thus according to Feynman [10] it is their probability distributions that add, not the probability amplitudes (or wave functions). See Figure 2. Taking the inverse FT of the resulting frequency-domain probability distribution and coupling this back to Gabor's original analysis, I would conclude that a measurement of the Gaussian wave packet requires one natural unit of classical information to describe. This finite amount is due to the inherent noise associated with the Heisenberg uncertainty principle.

Electron Spin State
One might jump to the conclusion that a spin 1/2 particle has one natural unit of information by taking our governing thought and applying it to the spin angular momentum. However, this is not correct, as spin is quantized to $\hbar/2$ along each of the three spatial dimensions. Associating one natural unit of information to each spatial dimension for a total of 3 natural units is also not correct, since the measurements are not independent. The way to tackle this problem is by looking at the three spatial dimensions and the time dimension, then using our governing thought on the magnitude of all four spin operators.
Let's start with the mathematical formalism to deal with the spin operator for the time dimension. This operator maps the wave function at the instantaneous moment in time to a discrete time value. Sticking with the spin operator formalism, we need a $2\times 2$ matrix that has unity eigenvalues and returns the wave function untouched (since we are simply mapping to a discrete time value but not touching any of the spatial dimensions). I have identified this Pauli matrix as the identity matrix, $\sigma_t = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$. The spin operator $S_t$ associated with this identity Pauli matrix is $S_t = \frac{\hbar}{2}\sigma_t$. $S_t$ is now a fourth spin operator similar to $S_x$, $S_y$, and $S_z$. One implication of adding the identity matrix to the formalism is that the magnitude of the spin angular momentum now takes on a simpler form, with s the quantum spin number. Adding to the standard way of calculating $S$ [12] we have $S^2 = S_x^2 + S_y^2 + S_z^2 + S_t^2 = \hbar^2\left[s(s+1) + \tfrac{1}{4}\right] = \hbar^2\left(s + \tfrac{1}{2}\right)^2$. Now applying the governing thought to angular momentum, I have the information in the spin of a particle equal to twice the magnitude divided by hbar. Let's see how that plays out using our current understanding of quantum information theory.
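The effect of adding the identity as a fourth Pauli matrix can be checked with a few lines of matrix arithmetic. The sketch below assumes spin 1/2, hbar set to one, and a time operator defined by analogy with the spatial ones as (hbar / 2) times the identity; the helper names are hypothetical.

```python
HBAR = 1.0  # assumption: natural units


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def scale(c, a):
    """Multiply every entry of a 2x2 matrix by the scalar c."""
    return [[c * a[i][j] for j in range(2)] for i in range(2)]


def add(a, b):
    """Entry-wise sum of two 2x2 matrices."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


SIGMA_X = [[0, 1], [1, 0]]
SIGMA_Y = [[0, -1j], [1j, 0]]
SIGMA_Z = [[1, 0], [0, -1]]
SIGMA_T = [[1, 0], [0, 1]]  # identity, used as the fourth Pauli matrix

spin_operators = [scale(HBAR / 2.0, s) for s in (SIGMA_X, SIGMA_Y, SIGMA_Z, SIGMA_T)]

# Sum of the squares of all four spin operators.
total = [[0, 0], [0, 0]]
for operator in spin_operators:
    total = add(total, matmul(operator, operator))

magnitude = abs(total[0][0]) ** 0.5      # total is a multiple of the identity
information_nats = 2.0 * magnitude / HBAR

print(total)
print(magnitude, information_nats)
```

Each squared operator contributes hbar squared over four times the identity, so the four together give hbar squared, and twice the magnitude divided by hbar comes out as two natural units under these assumptions.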
We know from Schumacher and Westmoreland [4] that the probability of error in inferring a message from a quantum measurement is at least one minus the dimension of the Hilbert space divided by the number of distinct messages, Y: $P_{\mathrm{Error}} \geq 1 - \frac{\dim(\mathcal{H})}{Y}$. We find this channel capacity is equal to $\log(2)$ when the Hilbert space is a qubit and we choose to send spins in only the $|0\rangle$ or $|1\rangle$ state. In this case $Y = 2$ and $P_{\mathrm{Error}}$ can be zero. However, nature does not just produce electrons in only the $|0\rangle$ or $|1\rangle$ state. Sending an electron in one of only two states is a human choice, and filtering or initialization is required. For nature to maintain symmetry and balance, the state must have a uniform distribution around the Bloch sphere. Thus the arbitrary state $|\xi\rangle = \cos(\theta/2)\,|0\rangle + e^{i\varphi}\sin(\theta/2)\,|1\rangle$ is created. We also know that if we classically measure in an arbitrary direction, the wave function collapses to the state defined by that outcome [4]. This means that if we make a measurement in one direction with zero variance, any other non-commuting observable will have maximum variance.
However, there is nothing stopping us from calculating the entropy we would expect if a measurement were to happen (even though we don't make the measurement, as that would collapse the state). Here I use the word entropy instead of information since a measurement of a state with $P_{\mathrm{Error}} > 0$ will produce a partially random Boolean output that is not completely deterministic from knowledge of the initial state. Yet the term information is still relevant since it takes that much information to describe the outcome.
In this way we can add the entropy in each non-commuting observable in the same way Hirschman showed us.
For ξ to be uniformly distributed across the Bloch sphere we need to look at the Jacobian between spatial and spherical coordinates to see how θ and φ are distributed. We find the determinant of the Jacobian is equal to $\sin(\theta)$, which means that φ is uniformly distributed but θ has the distribution $p(\theta) = \sin(\theta)/2$. For $S_x$, $S_y$, and $S_z$, one gets the same answer. The entropy of the $S_t$ operator takes different reasoning to calculate, but the answer is the same. To start we need to review Section 4.1 on the Bernoulli process. If $S_t$ acts to confirm a particle is occupying the time $t_K = K\cdot\delta t$, where $K$ is the step index and $\delta t$ is the step size, the probability of a positive confirmation is equal to the relative distance the instantaneous time, $t$, is away from $t_K$. A negative confirmation would mean that the particle is found in a neighboring state. Figure 3 is a picture of a particle at time $t$ uniformly distributed between two discrete times. Finding the particle in the $t_K$ state has probability $P_K$, and finding the particle in the

It is interesting to note that ξ is described by 2 degrees of freedom: one real number from θ and one real number from φ. Thus we further support the idea that one degree of freedom is associated with one natural unit.
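The expected entropy of a single spin measurement on a state drawn uniformly from the Bloch sphere can be estimated numerically. The sketch below is an illustration under stated assumptions rather than the paper's own calculation: it assumes the polar angle has density sin(theta) / 2 on [0, pi] and that a measurement along the z axis returns "up" with the Born-rule probability cos^2(theta / 2). The function names and the midpoint-rule integration are illustrative choices.

```python
import math


def binary_entropy_nats(p):
    """Entropy of a two-outcome measurement with probability p, in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def average_measurement_entropy(steps=200_000):
    """Average binary entropy over the Bloch sphere (midpoint rule in theta).

    Assumes theta has density sin(theta) / 2 and P(up) = cos^2(theta / 2).
    """
    d_theta = math.pi / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * d_theta
        p_up = math.cos(theta / 2.0) ** 2
        total += 0.5 * math.sin(theta) * binary_entropy_nats(p_up) * d_theta
    return total


average_entropy = average_measurement_entropy()
print(average_entropy)
```

Under these assumptions the probability of "up" is itself uniformly distributed between zero and one, so the same average would apply to any outcome probability that is uniform on that interval.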

Discussion and Conclusion
The questions that I addressed are, "how much information is in an evolving system, how does one quantify it, and how much classical information is needed to describe it?" You might have noticed that in some cases above I used the term rate, while in other cases I used capacity. We know from Shannon's noisy-channel coding theorem that the rate must be less than the capacity for the probability of error of decoding a message to go to zero; and conversely that if the rate is greater than the capacity, an arbitrarily small probability of error is not achievable [6]. However, from my analysis it appears that in nature the rate of the underlying particles equals the capacity of the channel that transmits those particles through space-time, which in turn equals the energy of the particle. This insight is another reason why I am using the word information to describe this quantity. I propose that the quantum state and the associated degree of freedom keep an extremely high (or possibly infinitely high) precision of its value. Yet when the wavefunction collapses and a classical measurement occurs, the capacity of that channel is only one natural unit of information per degree of freedom, and the entropy rate of that measurement is such that one natural unit of information is needed to describe any measurement process you could ask of it per degree of freedom.
Thus, I find that precision to the infinite decimal point in the classical measurement is neither required nor possible since the classical information is finite. It thus makes sense that space is quantized. Planck showed energy is quantized [10]; Quantum Mechanics showed "action" is quantized; and from the analysis here a finite system is described by a finite amount of information; there is just not enough information in nature (or too much noise) to localize a particle to a continuous value that is not part of a finite set of values.
A broader debate is necessary to understand the physicality of classical information describing a quantum system, since its implications can be seen in both tangible qualities like the entanglement of quantum states and intangible qualities like the collapse of the wavefunction. Yet one advantage of knowing the governing thought is that you are able to use information theoretic tools to solve physics problems and vice versa. For example, the principle of least action (which is fundamental to mechanics) can now be looked at as a principle of least information.
There is much more work to do in these areas, including application to the unsolved problems of physics [5], or for that matter other unknown unsolved problems. Still, having this insight into information is a good start.

The signal power is the time derivative of the energy, $P = \dot{E}$. The signal is wiped out for frequencies higher than one over twice the relaxation time, $1/(2\tau)$. This is purely reactive power (as one would expect from an undamped harmonic oscillator). But for the purposes here, one will recognize the apparent power, or magnitude of the complex power, (apart from the resistive factor) as the Johnson-Nyquist noise spectral density when spread over both positive and negative frequencies [27]. Thus we support the assumption in Section 4.2.

A2. Thermodynamic Derivation
Having introduced, in appendix A1, the harmonic oscillator with quantum ground state energy equal to the temperature, it is straightforward to show that, for this example, the governing thought also applies within thermodynamics. Reviewing the power, $P$: since the power is reactive (imaginary), there is no work done, and from the first law of thermodynamics [8] $\Delta E$ will equal the heat, $Q$. Our expression for the thermodynamic entropy is now given in terms of $\Delta S$ [11,12].
Haller derives the entropy rate of thermal diffusion for one particle in [21]; here I will use a quicker derivation for both the positive and negative energy states and use the Gaussian channel capacity. The signal in this case is the width of the diffusing particle undergoing the Bernoulli process for one step, $\delta t$,