
After a historical reconstruction of Boltzmann’s main ideas on statistical mechanics, a discrete version of Boltzmann’s H-theorem is proved using basic concepts of information theory. Namely, the H-theorem follows from the central limit theorem, acting inside a closed physical system, and from the maximum entropy law for normal probability distributions, which is a consequence of the positivity of the Kullback-Leibler entropic divergence. Finally, the relevance of discreteness and probability for a deep comprehension of the relationship between physical and informational entropy is analyzed and discussed in the light of new perspectives emerging in computational genomics.

Entropy first appeared in physics in the context of thermodynamics. After Boltzmann, however, it became a crucial concept for statistical mechanics and for physics as a whole. Since Shannon’s paper of 1948 [

In this paper, we prove a discrete version of the H-Theorem by using basic physical and informational concepts. Namely, Boltzmann’s difficulties in proving the H-Theorem are due to the information-theoretic nature of this theorem, which he tackled more than fifty years before the foundation of Information Theory.

The present paper is mainly methodological in nature. In this perspective, any discipline based on information-theoretic concepts may benefit from the analysis developed in the paper, as it concerns the link between the physical and the informational perspectives on entropy.

In 1824, Sadi Carnot wrote a book about the power of fire in producing energy [

$$\frac{Q_1}{T_1} = \frac{Q_2}{T_2} \tag{1}$$

therefore, if we denote by $S$ the heat quantity $Q_1$ when $T_1$ is the unitary temperature, and we set $Q_2 = Q$ and $T_2 = T$, we obtain:

$$S = \frac{Q}{T}. \tag{2}$$

S corresponds to a thermodynamical quantity, later called by Clausius [

The famous formulation of the second law of thermodynamics, by means of entropy S , asserts that in a closed system (that does not exchange energy with the external environment) the entropy cannot decrease in time.

In this scenario, in the 1870s, Ludwig Boltzmann started a line of research aimed at explaining the second law of thermodynamics in terms of Newtonian mechanics [

The first step of Boltzmann’s project was a mechanical formulation of entropy. This formulation can be found by starting from the fundamental law of ideal gases, where P is the pressure, V the volume, T the absolute (Kelvin) temperature, N the number of gas moles, and R is the gas constant:

$$PV = NRT. \tag{3}$$

If we pass from the number of gas moles $N$ to the number of molecules $n$ in the gas (by the relation $Na = n$, where $a$ is the Avogadro constant), we get an equivalent formulation, where $k = R/a$ is now called the Boltzmann constant:

$$PV = nkT. \tag{4}$$
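As a quick numerical illustration of the relation between the gas constant and the Boltzmann constant, the following sketch divides the former by the Avogadro constant. The SI values assigned to `R` and `AVOGADRO` are not quoted in the text; they are supplied here as assumptions of the example.

```python
# Numerical check of k = R / a.
# The two constants below are SI reference values, assumed for this example.
R = 8.314462618            # gas constant, J / (mol K)
AVOGADRO = 6.02214076e23   # Avogadro constant a, 1 / mol


def boltzmann_constant(gas_constant: float = R, avogadro: float = AVOGADRO) -> float:
    """Return k = R / a, that is, the gas constant referred to one molecule."""
    return gas_constant / avogadro


k = boltzmann_constant()
print(f"k = {k:.6e} J/K")
```

The passage from moles to molecules only rescales the constant, which is why Equations (3) and (4) describe the same law.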

Now, let us assume that the gas absorbs some heat while expanding from a volume $V_1$ to a volume $V_2$. Then, the quantity $Q$ of this heat is given by:

$$Q = \int_{V_1}^{V_2} P \, dv \tag{5}$$

and by expressing $P$ according to Equation (4), we get:

$$Q = \int_{V_1}^{V_2} \frac{nkT}{v} \, dv = nkT \int_{V_1}^{V_2} \frac{1}{v} \, dv = nkT \, (\ln V_2 - \ln V_1). \tag{6}$$
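The closed form of this integral can be checked numerically. The sketch below approximates the integral of $nkT/v$ with a midpoint rule and compares it with $nkT(\ln V_2 - \ln V_1)$. The function name `heat_isothermal` and all numerical values are hypothetical, chosen only for illustration.

```python
import math


def heat_isothermal(n_molecules, k, temperature, v1, v2, steps=100_000):
    """Midpoint-rule approximation of the integral of n k T / v between v1 and v2."""
    dv = (v2 - v1) / steps
    total = 0.0
    for i in range(steps):
        v = v1 + (i + 0.5) * dv          # midpoint of the i-th slice
        total += n_molecules * k * temperature / v * dv
    return total


# Hypothetical values: one mole of gas at 300 K doubling its volume.
n, k, T = 6.02214076e23, 1.380649e-23, 300.0
V1, V2 = 1.0, 2.0

numeric = heat_isothermal(n, k, T, V1, V2)
closed_form = n * k * T * (math.log(V2) - math.log(V1))
print(numeric, closed_form)
```

The two printed values should agree to many digits, since the integrand is smooth on the whole interval.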

Let us assume that we start from a unitary volume $V_0 = 1$. If in Equation (6) we set $V_1 = V_0$ and $V_2 = V$, and $T$ is moved to the left-hand side, then we obtain:

$$\frac{Q}{T} = nk \ln V \tag{7}$$

that, according to Carnot’s equation (2), gives:

$$S = nk \ln V \tag{8}$$

that is:

$$S = k \ln V^n \tag{9}$$

where $V^n$ expresses the number of possible ways of allocating the $n$ molecules in $V$ volume cells. The passage from the constant $R$ to the constant $k$, and accordingly from $N$ moles to $n$ molecules, is crucial to the microscopic reading of the formula (very often this is not adequately stressed when Boltzmann’s argument is analyzed).

We can assume that the gas is spatially homogeneous, that is, the same number of molecules is in any volume cell, so that the spatial positions of cells do not matter. Therefore, an allocation of $n$ molecules in $V$ volume cells is completely determined by the (molecule) velocities allocated in each volume cell, apart from multiplicative constants (factorials expressing the indiscernibility of molecules and cells), which in logarithmic terms are additive constants that are omitted. In conclusion, velocities partition the $n$ molecules of gas into a number of different velocity classes: the intervals $v_1 \pm \Delta, v_2 \pm \Delta, \ldots, v_m \pm \Delta$ (with $\Delta$ a fixed increment value, and $n_1, n_2, \ldots, n_m$ the numbers of molecules having velocities in the intervals centered at $v_1, v_2, \ldots, v_m$, respectively). Hence, the number of allocations of $n$ molecules in the volume $V$ corresponds to the number $W$ of different ways in which $n$ molecules can be distributed into $m$ different velocity classes (the number of different micro-states associated with a given thermodynamic macro-state). Whence, the famous final equation is obtained:

$$S = k \ln W. \tag{10}$$

Equation (10) is engraved on Boltzmann’s tomb in Vienna. In this form, the equation was later given by Max Planck, who followed Boltzmann’s approach in his famous lecture of December 14, 1900, from which Quantum Theory emerged [

The computation of $W$ was obtained by Boltzmann by means of the so-called Maxwell-Boltzmann statistics, related to the “Wahrscheinlichkeit” principle, and can be described by the following deduction. Let us consider all the molecules within the same velocity class as indistinguishable; then the number of different distributions of molecules into the $m$ velocity classes is given by the following multinomial expression:

$$W = \frac{n!}{n_1! \, n_2! \cdots n_m!} \tag{11}$$

therefore, by Equation (10):

$$S = k \ln \frac{n!}{n_1! \, n_2! \cdots n_m!} \tag{12}$$

that is, by using the Stirling approximation $n! \approx \sqrt{2 \pi n} \, n^n / e^n$ [

$$S = kn \ln n - k \, (n_1 \ln n_1 + n_2 \ln n_2 + \cdots + n_m \ln n_m)$$

if, for $i = 1, \ldots, m$, we express $n_i$ by means of $p_i = n_i / n$, that is, $n_i = n p_i$, then we get (remember that $p_1 + p_2 + \cdots + p_m = 1$):

$$\begin{aligned}
S &= kn \ln n - k \, (n p_1 \ln n p_1 + n p_2 \ln n p_2 + \cdots + n p_m \ln n p_m) \\
  &= kn \ln n - kn \, [\, p_1 (\ln n + \ln p_1) + p_2 (\ln n + \ln p_2) + \cdots + p_m (\ln n + \ln p_m) \,] \\
  &= kn \ln n - kn \ln n \, [\, p_1 + p_2 + \cdots + p_m \,] - kn \, (p_1 \ln p_1 + p_2 \ln p_2 + \cdots + p_m \ln p_m) \\
  &= -kn \sum_{i=1}^{m} p_i \ln p_i .
\end{aligned}$$
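The quality of the Stirling step can be observed numerically. The following sketch computes $\ln W$ exactly (through the log-gamma function) and compares it with $-n \sum_i p_i \ln p_i$ for growing $n$. The function names and the class frequencies in `proportions` are hypothetical, introduced only for this example.

```python
import math


def ln_multinomial(counts):
    """Exact ln W for W = n! / (n_1! ... n_m!), computed with log-gamma."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)


def entropy_term(counts):
    """The value -n * sum p_i ln p_i, with p_i = n_i / n."""
    n = sum(counts)
    return -n * sum((c / n) * math.log(c / n) for c in counts if c > 0)


proportions = (0.5, 0.3, 0.2)   # hypothetical frequencies of three velocity classes
for n in (100, 10_000, 1_000_000):
    counts = [round(n * p) for p in proportions]
    exact, approx = ln_multinomial(counts), entropy_term(counts)
    print(n, exact, approx, exact / approx)
```

The ratio in the last column moves toward 1 as $n$ grows, because the terms neglected by the approximation grow only logarithmically in $n$.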

A discrete version of function H can be defined as:

$$H = \sum_{i=1}^{m} n_i \ln n_i \tag{13}$$

hence the equations above show that, apart from additive and multiplicative constants, $S = -H$; therefore the second law of thermodynamics, asserting the non-decrease of entropy (for closed systems), is equivalent to the law of the non-increasing $H$ function.

This was the first time that probability became an essential conceptual framework for physical representations. Surely, Maxwell’s anticipations on gas velocity distribution and on the microscopic interpretation of temperature in terms of average velocity [

If we consider the following informational entropy $H_S$ introduced by Shannon [

$$H_S = -\sum_{i=1}^{m} p_i \lg p_i \tag{14}$$

then, it results from the equations above that $H_S$ coincides with the thermodynamical entropy apart from a multiplicative constant.

Boltzmann’s next result was the “H-Theorem”, which he tried to prove from 1872 onward. However, none of his attempts at finding a satisfactory proof of this theorem was completely successful. The reason for these failures is the information-theoretic character of the H-Theorem. Therefore, we need to enter this theory, which started in 1948 with Shannon’s famous paper [

Shannon’s idea was entirely probabilistic. In fact, we can measure the information of an event $e$ by means of a function $F(p_e)$ of its probability $p_e$, because the rarer an event is, the more information we gain when it occurs. Moreover, this function has to be additive for events that are independent, that is:

$$I(e_1, e_2) = I(e_1) + I(e_2) \tag{15}$$

if $p_{e_1}, p_{e_2}$ are the respective probabilities of $e_1, e_2$, this means that the function $F(p_e) = I(e)$ has to satisfy:

$$F(p_{e_1} \times p_{e_2}) = F(p_{e_1}) + F(p_{e_2})$$

and the simplest functions satisfying this requirement are the logarithmic functions. Therefore, if we use the logarithm in base 2, denoted by $\lg$, as is customary in Information Theory, then we can define the information of an event $e$ with probability $p_e$ as:

$$I(e) = -\lg(p_e). \tag{16}$$

With this probabilistic notion of information, given a discrete probability distribution $p = \{p_i\}_{i=1,\ldots,k}$, also called by Shannon an information source (a set whose elements have probabilities associated with them), information entropy expresses the probabilistic mean of the information quantities associated with the events $\{e_i\}_{i=1,\ldots,k}$, with respect to the probability distribution $p$.
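The two notions just introduced, the information of an event and the entropy of a source as mean information, can be sketched in a few lines. The function names and the probabilities used below are hypothetical and serve only as an illustration.

```python
import math


def information(p):
    """I(e) = -lg(p_e): information, in bits, of an event of probability p."""
    return -math.log2(p)


def shannon_entropy(dist):
    """H_S = sum of p_i * I(e_i): the mean information of the source, in bits."""
    return sum(p * information(p) for p in dist if p > 0)


# Additivity for independent events: I(e1, e2) = I(e1) + I(e2).
p1, p2 = 0.5, 0.125                       # hypothetical probabilities
joint = information(p1 * p2)
separate = information(p1) + information(p2)

source = [0.5, 0.25, 0.125, 0.125]        # hypothetical information source
entropy_bits = shannon_entropy(source)
print(joint, separate, entropy_bits)
```

With these values the joint event carries 4 bits, the sum of 1 bit and 3 bits, and the source has an entropy of 1.75 bits.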

An anecdote reports that when Shannon asked John von Neumann to suggest a name for the quantity S , von Neumann promptly answered: “Entropy. This is just entropy”, adding that with this name the success of information theory was assured, because only a few people knew exactly what entropy was.

Another remarkable fact is the paradoxical nature of entropy, due to its intrinsic probabilistic nature. This paradox is apparent at the beginning of Shannon’s paper [

A concept related to entropy is the entropic divergence between two probability distributions p , q :

$$D(p \,\|\, q) = \sum_{x \in X} p(x) \lg \frac{p(x)}{q(x)}. \tag{17}$$

The following proposition (see [

Proposition 1. $D(p \,\|\, q) \geq 0$ for any two probability distributions $p$, $q$.
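A small numerical instance of the divergence and of its positivity is sketched below. The function name `kl_divergence` and the two distributions are hypothetical; the sketch assumes that $q(x) > 0$ wherever $p(x) > 0$.

```python
import math


def kl_divergence(p, q):
    """D(p || q) = sum over x of p(x) lg(p(x) / q(x)), in bits.

    Terms with p(x) = 0 contribute 0; q(x) is assumed positive wherever p(x) > 0.
    """
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)


p = [0.5, 0.25, 0.25]          # hypothetical distributions over a 3-element set X
q = [1 / 3, 1 / 3, 1 / 3]

d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)
d_pp = kl_divergence(p, p)
print(d_pq, d_qp, d_pp)
```

Both `d_pq` and `d_qp` are positive but different from each other, which shows that the divergence is not symmetric, while `d_pp` is zero.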

In an ideal gas the collisions between molecules are elastic, that is, the sum of the kinetic energies of two colliding molecules does not change after collision.

Proposition 2. In an isolated ideal gas the variance of velocity distributions is constant in time.

Proof. If the gas is isolated then its temperature is constant, and as Maxwell proved [

Let $N(x)$ denote the normal probability distribution $N(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} \, e^{-\frac{x^2}{2 \sigma^2}}$,

and $S(f)$ be the Shannon entropy of a probability distribution $f$ (here $D(f \,\|\, N)$ is the continuous version of the Kullback-Leibler divergence). The following propositions are proven in [

Proposition 3.

$$S(N) = \frac{1}{2} \ln (2 \pi e \sigma^2) \tag{18}$$

Proof. ( [

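$$\begin{aligned}
S(N) &= -\int_{-\infty}^{+\infty} N(x) \ln N(x) \, dx \\
     &= \int_{-\infty}^{+\infty} -N(x) \ln \frac{e^{-\frac{x^2}{2 \sigma^2}}}{\sqrt{2 \pi \sigma^2}} \, dx \\
     &= \int_{-\infty}^{+\infty} -N(x) \left[ -\frac{x^2}{2 \sigma^2} - \ln \sqrt{2 \pi \sigma^2} \right] dx \\
     &= \int_{-\infty}^{+\infty} N(x) \frac{x^2}{2 \sigma^2} \, dx + \ln \sqrt{2 \pi \sigma^2} \int_{-\infty}^{+\infty} N(x) \, dx \\
     &= \frac{E(x^2)}{2 \sigma^2} + \ln \sqrt{2 \pi \sigma^2} \times 1 \\
     &= \frac{1}{2} + \frac{1}{2} \ln (2 \pi \sigma^2) \\
     &= \frac{1}{2} \ln (2 \pi e \sigma^2).
\end{aligned}$$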
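The value of Equation (18) can also be checked by direct numerical integration. The sketch below approximates $-\int N(x) \ln N(x) \, dx$ with a midpoint rule on a wide finite interval. The function names, the truncation at 12 standard deviations, and the value of `sigma` are assumptions of the example.

```python
import math


def normal_pdf(x, sigma):
    """Zero-mean normal density N(x) with standard deviation sigma."""
    return math.exp(-x * x / (2 * sigma * sigma)) / math.sqrt(2 * math.pi * sigma * sigma)


def entropy_numeric(sigma, half_width=12.0, steps=200_000):
    """Midpoint-rule estimate of -integral of N ln N over [-12 sigma, +12 sigma]."""
    a = -half_width * sigma
    dx = 2 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        p = normal_pdf(a + (i + 0.5) * dx, sigma)
        if p > 0.0:                      # guard against log(0) far in the tails
            total -= p * math.log(p) * dx
    return total


sigma = 1.7                               # hypothetical standard deviation
numeric = entropy_numeric(sigma)
closed = 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)
print(numeric, closed)
```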

Proposition 4. Normal distributions are those for which entropy reaches the maximum value in the class of probability distributions having the same variance.

Proof. ( [

$$\begin{aligned}
D(f \,\|\, N) &= \int_{-\infty}^{+\infty} f(x) \ln \frac{f(x)}{N(x)} \, dx \\
  &= \int_{-\infty}^{+\infty} f(x) \ln f(x) \, dx - \int_{-\infty}^{+\infty} f(x) \ln N(x) \, dx \\
  &= \int_{-\infty}^{+\infty} f(x) \ln f(x) \, dx - \int_{-\infty}^{+\infty} f(x) \ln \frac{e^{-\frac{x^2}{2 \sigma^2}}}{\sqrt{2 \pi \sigma^2}} \, dx \\
  &= -S(f) - \int_{-\infty}^{+\infty} f(x) \ln e^{-\frac{x^2}{2 \sigma^2}} \, dx + \ln \sqrt{2 \pi \sigma^2} \int_{-\infty}^{+\infty} f(x) \, dx \\
  &= -S(f) + \frac{1}{2 \sigma^2} \int_{-\infty}^{+\infty} f(x) \, x^2 \, dx + \frac{1}{2} \ln (2 \pi \sigma^2) \times 1 \\
  &= -S(f) + \frac{1}{2 \sigma^2} \mathrm{Var}(f) + \frac{1}{2} \ln (2 \pi \sigma^2) && (\mathrm{Var}(f) \leq \sigma^2) \\
  &\leq -S(f) + \frac{1}{2} + \frac{1}{2} \ln (2 \pi \sigma^2) \\
  &= -S(f) + \frac{1}{2} \left( \ln e + \ln (2 \pi \sigma^2) \right) \\
  &= -S(f) + \frac{1}{2} \ln (2 \pi e \sigma^2) && (D(f \,\|\, N) \geq 0 \text{ by Proposition 1})
\end{aligned}$$

therefore, by Equation (18), $S(f) \leq S(N)$. □
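The inequality can be illustrated by comparing the normal distribution with two other familiar distributions of the same variance. The closed-form entropies of the uniform and Laplace distributions used below are standard results that are not derived in the text; they are assumed here, together with the function names, for the purpose of the example.

```python
import math


def entropy_normal(sigma):
    """Equation (18): entropy of a normal distribution of variance sigma^2."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)


def entropy_uniform(sigma):
    """Uniform on [-a, a] has variance a^2 / 3, so a = sqrt(3) sigma; entropy ln(2a)."""
    return math.log(2 * math.sqrt(3) * sigma)


def entropy_laplace(sigma):
    """Laplace of scale b has variance 2 b^2, so b = sigma / sqrt(2); entropy 1 + ln(2b)."""
    return 1 + math.log(2 * sigma / math.sqrt(2))


for sigma in (0.5, 1.0, 3.0):             # hypothetical standard deviations
    print(sigma, entropy_uniform(sigma), entropy_laplace(sigma), entropy_normal(sigma))
```

For every tested variance the last column is the largest one, in agreement with Proposition 4.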

Proposition 5. In an isolated ideal gas, for each Cartesian component, the velocity distribution, when normalized as a probability distribution, tends in time to the normal distribution (of a given variance).

Proof. The proposition is a consequence of the central limit theorem [

From the previous propositions Boltzmann’s H-theorem follows.

Proposition 6. (H-Theorem) In an isolated ideal gas the H function cannot increase in time.

Proof. By Proposition 5, velocities tend to distribute according to a normal law, and they keep the variance constant, according to Proposition 2. But, according to Proposition 4, the normal distribution is the one having the maximum entropy in the class of all probability distributions with a given variance. Hence the entropy cannot decrease in time and, since $S = -H$ apart from additive and multiplicative constants, $H$ cannot increase. □

A simple experimental validation of the above proposition can be obtained with a simple game, which we call Pythagorean Recombination Game (PRG), based on the following rules [

1) Randomly choose two numbers $a, b$ in $G$;

2) Randomly split $a$ into $a_1$ and $a_2$, in such a way that $a = \sqrt{a_1^2 + a_2^2}$;

3) Randomly split $b$ into $b_1$ and $b_2$, in such a way that $b = \sqrt{b_1^2 + b_2^2}$;

4) Replace in $G$ the pair $a, b$ with the pair $a' = \sqrt{a_1^2 + b_2^2}$, $b' = \sqrt{b_1^2 + a_2^2}$.

The idea of PRG is that of representing a two-dimensional gas where any number is a velocity, and rules 2) and 3) define a collision axis and the exchange of the velocity components with respect to this axis. If we play this game for a sufficiently large number of steps and compute the H-function at each step, we can easily check that $H$ approaches a minimum value and, at the same time, the distribution of numbers (velocities within some intervals) clearly approximates a $\chi$ distribution with 2 degrees of freedom. This distribution corresponds to that of a random variable $\sqrt{X^2 + Y^2}$, where both $X$ and $Y$ follow a normal distribution.

Proposition 7. In the Pythagorean Recombination Game, the H-function does not increase and the distribution of velocity components tends to a normal distribution as the game goes on.

Proof. All the hypotheses of Proposition 6 are verified in PRG. In fact, collisions are elastic, because after each “collision” the kinetic energy remains unchanged. Moreover, the system is isolated because no energy is exchanged with the external world, and the numbers of colliding velocities and steps can be assumed to have values for which “large number laws” are acting in the process of velocity recombinations during the game. □

The proposition above was experimentally verified by a suitable Matlab program implementing PRG (for suitable numerical parameters).
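The original program is not reproduced here. What follows is only a minimal Python sketch of the game, written under several assumptions that are not stated in the text: the rules are read in the Pythagorean sense ($a = \sqrt{a_1^2 + a_2^2}$, $a' = \sqrt{a_1^2 + b_2^2}$, and likewise for $b$), each random split is obtained from a uniformly random angle, the initial population $G$ consists of equal values, and the velocity classes are intervals of a fixed width `delta`. All names and parameters are hypothetical.

```python
import math
import random
from collections import Counter


def h_function(values, delta):
    """Discrete H = sum of n_i ln n_i over velocity classes of width delta."""
    counts = Counter(int(v / delta) for v in values)
    return sum(c * math.log(c) for c in counts.values())


def prg_step(g, rng):
    """One step of the game: choose two numbers, split each one, recombine."""
    i, j = rng.sample(range(len(g)), 2)
    a, b = g[i], g[j]
    # Assumed splitting rule: a random angle gives a = sqrt(a1^2 + a2^2).
    ta, tb = rng.uniform(0, math.pi / 2), rng.uniform(0, math.pi / 2)
    a1, a2 = a * math.cos(ta), a * math.sin(ta)
    b1, b2 = b * math.cos(tb), b * math.sin(tb)
    g[i] = math.hypot(a1, b2)            # a' = sqrt(a1^2 + b2^2)
    g[j] = math.hypot(b1, a2)            # b' = sqrt(b1^2 + a2^2)


def play(n=2000, steps=40_000, delta=0.1, seed=0):
    """Run the game and return (H at start, H at end, energy at start, energy at end)."""
    rng = random.Random(seed)
    g = [1.0] * n                        # assumed initial population: all values equal
    energy_start = sum(v * v for v in g)
    h_start = h_function(g, delta)
    for _ in range(steps):
        prg_step(g, rng)
    return h_start, h_function(g, delta), energy_start, sum(v * v for v in g)


h_start, h_end, e_start, e_end = play()
print(h_start, h_end, e_start, e_end)
```

In this sketch the sum of the squares, which plays the role of the kinetic energy, is preserved by every step up to rounding, while the value of $H$ at the end of the run is lower than at the start. The step-by-step behaviour of $H$ depends on the chosen class width and on the initial population.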

The schema of the H-function definition from Boltzmann’s formula $S = k \ln W$ is well known, and is studied and presented in many papers on physical and informational entropy. But there is a hidden aspect that has never been properly investigated. The $H$ function, which is proportional to the Shannon entropy $H_S$, coincides, up to the Stirling approximation, with:

$$H_S = \lg \frac{n!}{\prod_{1 \leq i \leq m} n_i!}. \tag{19}$$

Here we briefly suggest a possible generalization of entropy that could be useful in the analysis of entropy for finite structures, which is very relevant in the context of Machine Learning approaches and computational genomics investigations [

Proposition 8 (Digital Information Bounds). The total number $\mathrm{digit}_k(n)$ of digits necessary to encode $n$ objects as strings, over an alphabet of $k$ digits, satisfies the following condition:

$$n \, (\lfloor \lg_k(n) \rfloor + 1) \;\geq\; \mathrm{digit}_k(n) \;\geq\; n \, (\lfloor \lg_k(n) \rfloor - 1). \tag{20}$$

Proof. We present a simple geometric proof of this proposition. Assume that we place the $n$ objects we want to encode over $k$ digits along a top line at level 0 (representing $n$ by a segment of $n$ unit lengths). Under it, we place shorter lines such that the line of level $i$ has the length of the line above it minus $k^i$. We can depict this by means of the following line arrangement (here lengths are not proportional to the values):

------------------------------------------------ Level 0 ($n$ objects)

----------------------------------------- Level 1 ($n - k$ objects)

-------------------------------- Level 2 ($n - k - k^2$ objects)

⋮

----------------------- Level $\lfloor \lg_k(n) \rfloor - 1$

----------- Level $\lfloor \lg_k(n) \rfloor$

Now, the minimum number of digits that we need for encoding $n$ objects is given by the sum of the lengths of all the lines of the arrangement. In fact, the first line counts the initial number of digits that we need in order to assign at least one digit to each of the $n$ objects. The second line assigns $n - k$ digits to the objects that need at least two digits to be encoded. Then, iteratively, at step $\lfloor \lg_k(n) \rfloor + 1$ we add the last digits for the objects encoded with strings of maximum length. More formally, we have that:

$$\mathrm{digit}_k(n) = \sum_{i=0}^{\lfloor \lg_k(n) \rfloor} F_n(i) \tag{21}$$

where, for $0 \leq i \leq \lfloor \lg_k(n) \rfloor$, $F_n(0) = n$ and $F_n(i+1) = F_n(i) - k^{i+1}$.

Therefore, the exact number of digits is given by Equation (21), while the condition stated by the proposition follows from the fact that the number of digits used from level 0 to level $\lfloor \lg_k(n) \rfloor$ is surely less than $n \, (\lfloor \lg_k(n) \rfloor + 1)$, while the number of digits used from level 0 to level $\lfloor \lg_k(n) \rfloor - 1$ is surely greater than $n \lfloor \lg_k(n) \rfloor$ minus the missing parts (the gaps with respect to $n$) of the levels from 1 to $\lfloor \lg_k(n) \rfloor - 1$, and these missing parts provide a sum less than $n$ (as easily results by evaluating $k + (k + k^2) + \cdots + (k + k^2 + \cdots + k^{\lfloor \lg_k(n) \rfloor - 1})$). Therefore, the number of digits used from level 0 to level $\lfloor \lg_k(n) \rfloor - 1$ is greater than $n \lfloor \lg_k(n) \rfloor - n$, so the condition is proved. □
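The level construction can be turned into a short computation. The sketch below evaluates the sum of the level lengths and cross-checks it against a direct count in which the $n$ objects receive the $n$ shortest non-empty strings over $k$ digits. It also returns the two outer terms of condition (20), so that they can be inspected for chosen values of $n$ and $k$. The function names are hypothetical, and the sketch assumes that a level whose length would be negative is counted as empty.

```python
def floor_log(n, k):
    """Largest integer e such that k**e <= n (integer arithmetic, n >= 1, k >= 2)."""
    e = 0
    while k ** (e + 1) <= n:
        e += 1
    return e


def digits_by_levels(n, k):
    """Sum of the level lengths F_n(i), for i from 0 to floor(lg_k(n)).

    Assumption of this sketch: a level of negative length counts as empty.
    """
    total, length = 0, n
    for i in range(floor_log(n, k) + 1):
        total += max(length, 0)
        length -= k ** (i + 1)           # F_n(i + 1) = F_n(i) - k^(i + 1)
    return total


def digits_by_enumeration(n, k):
    """Total length when the n objects get the n shortest non-empty strings."""
    total, remaining, size = 0, n, 1
    while remaining > 0:
        used = min(remaining, k ** size)  # there are k**size strings of length size
        total += used * size
        remaining -= used
        size += 1
    return total


def bounds(n, k):
    """The outer terms of condition (20), returned as (upper, lower)."""
    e = floor_log(n, k)
    return n * (e + 1), n * (e - 1)


print(digits_by_levels(10, 2), digits_by_enumeration(10, 2), bounds(10, 2))
```

For $n = 10$ and $k = 2$ both counts give 22 digits, and the outer terms of condition (20) are 40 and 20.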

The proof of H-Theorem given in the present paper substantially differs from classical proofs of the same type. In particular, Jaynes [

In many applications, where entropy is the empirical entropy computed from observed frequencies, the probability estimation is not the main aspect of the problem [

In [

From Proposition 8 it follows that the average length of strings encoding $n$ objects is between $\lfloor \lg_k(n) \rfloor - 1$ and $\lfloor \lg_k(n) \rfloor + 1$; therefore $\lg_k(n)$ can be assumed as (a good approximation to) the average amount of digital information that is intrinsically related to a set of $n$ objects. Now, if we go back to Equation (19), in the light of the proposition above, we discover that Shannon entropy is proportional to the minimal digital information of the set of permutations of $n$ objects into $m$ classes that leave unchanged the cardinalities of these (disjoint) classes. This phenomenon can be easily generalized in many ways, providing new simple methods for computing other kinds of entropies over finite structures.

Let us briefly outline a general notion of Structural Entropy directly related to Equation (19). Let us consider a finite structure $Z$ (of some type) defined over a set $A$, called the support of $Z$. Let us consider the class $F_A$ of 1-to-1 functions from $A$ to $A$. The subset $F_Z$ of $F_A$ is the set of iso-configurations of $Z$, sending $Z$ into itself (or into an equivalent structure with respect to a suitable notion of structural equivalence). Therefore, we define the Structural Entropy of $Z$ (of base $k$) as the average digital information of the iso-configurations $F_Z$ (see Proposition 8; $|\cdot|$ denotes set cardinality):

$$H_F(Z) = \frac{\mathrm{digit}_k(|F_Z|)}{|F_Z|} \approx \lg_k(|F_Z|). \tag{22}$$

The above definition tells us that Boltzmann’s H-function corresponds to the Structural Entropy of a partition of $n$ indistinguishable objects into $m$ distinct sets, when $F$ is the class of all $n$-permutations. Topological entropy [

The arrow of time is a consequence of the central limit theorem plus the informational law of maximum entropy of normal distributions (deduced from the positivity of the Kullback-Leibler entropic divergence). For this reason, time irreversibility emerges when physical systems are complex enough and “large numbers” phenomena can act on them. This complexity, which is typical of living systems, is the origin of the intrinsic direction of biological time. Here a discrete version of the H-theorem is proved, which shows the informational mechanisms on which it is based.

The proof of H-Theorem by means of concepts from probability and information theory has an interest going beyond the technical aspects of the proof. Namely, the focus is not on the result in itself (today many demonstrative approaches are known [

The author wants to express his gratitude to Andreas Holzinger, who encouraged him to write the present paper.

Manca, V. (2017) An Informational Proof of H-Theorem. Open Access Library Journal, 4: e3396. https://doi.org/10.4236/oalib.1103396