Entropy—A Universal Concept in Sciences

Entropy represents a universal concept in science suitable for quantifying the uncertainty of a series of random events. We define and describe this notion in a manner appropriate for physicists. We start with a brief recapitulation of the basic concepts of the theory of probability that are useful for determining the concept of entropy. The history of how this concept came into its present exact form is sketched. We show that the Shannon entropy represents the most adequate measure of the probabilistic uncertainty of a random object. Though the notion of entropy was introduced in classical thermodynamics as a thermodynamic state variable, it relies on concepts studied in the theory of probability and mathematical statistics. We point out that the whole formalism of statistical mechanics can be rewritten in terms of Shannon entropy. The notion "entropy" is understood differently in various scientific disciplines: in classical physics it represents a thermodynamic state variable; in communication theory, the efficiency of the transmission of communication; in the theory of general systems, the magnitude of the configurational order; in ecology, the measure of biodiversity; in statistics, the degree of disorder, etc. All these notions can be mapped onto the general mathematical concept of entropy. By means of entropy, the configurational order of complex systems can be exactly quantified. Besides the Shannon entropy, there exists a class of Shannon-like entropies which converge, under certain circumstances, toward the Shannon entropy. The Shannon-like entropies are sometimes easier to handle mathematically than the Shannon entropy. One of the most important Shannon-like entropies is the well-known Tsallis entropy. The applications of the Shannon and Shannon-like entropies in science are very versatile. Besides statistical physics, they play a fundamental role in quantum information, communication theory, the description of disorder, etc.


Introduction
At the most fundamental level, all our further considerations rely on the concept of probability. Although there is a well-defined mathematical theory of probability, there is no universal agreement about the meaning of probability. For example, there is the view that probability is an objective property of a system, and another view that it describes a subjective state of belief of a person. Then there is the frequentist view that the probability of an event is the relative frequency of its occurrence in a long or infinite sequence of trials. This latter interpretation is often employed in mathematical statistics and statistical physics. In everyday life, probability expresses the degree of ignorance about the outcome of a random trial. This is why probability is commonly interpreted as the degree of subjective expectation of an outcome of a random trial. Both subjective and statistical probability are "normed": the degree of expectation that an outcome of a random trial occurs and the degree of the "complementary" expectation that it does not always add up to one [1].¹
Although the concept of probability is here couched in sophisticated mathematical language, it expresses only the familiar properties of probability used in everyday life. For example, each number of spots in the throw of a single die represents an elementary random event, to which a positive real number, called its probability, is assigned (relation (i)). The probability that one of two (or more) given numbers of spots appears in the throw of a single die is equal to the sum of their probabilities (relation (iii)). The sum of the probabilities of all possible numbers of spots is normed to one (relation (iv)).
The word "entropy"² was first used by Clausius in 1865 (see his Abhandlungen über die mechanische Wärmetheorie) to describe a quantity accompanying a change from thermal to mechanical energy, and it has kept this meaning in thermodynamics. Boltzmann [2], in his Vorlesungen über Gastheorie, presented the statistical interpretation of the thermodynamic entropy. He linked the thermodynamic entropy with molecular disorder. The general concept of entropy as a measure of uncertainty was first introduced by Shannon and Wiener. Shannon is also credited with the development of a quantitative measure of the amount of information [3]. Shannon entropy may be considered a generalization of the entropy defined by Hartley for the case in which all events are equally probable. Nyquist [4] was the first author to introduce a measure of information, but his paper has largely remained unnoticed. After the publication of Shannon's seminal paper in 1948 [3], the use of entropy as a measure of uncertainty grew rapidly and was applied with varying success in most areas of human endeavor.
Mathematicians were attracted by the possibility of providing an axiomatic structure for entropy and by its ramifications. The axiomatic approach to the concept of entropy attempts to find a system of postulates which provides a unique mathematical characterization of entropy³ and which adequately reflects the properties required of a probabilistic uncertainty measure in diverse real situations. This has been a very interesting and thought-provoking area for scientists. Khinchin [5] was the first to give a clear and rigorous presentation of the mathematical foundation of entropy. A large number of works have been devoted to the properties of entropy; an extensive list can be found in the book by Aczél and Daróczy [6].
The fundamental concept for the description of random processes is the notion of the random trial. A random trial is characterized by the set of its outcomes (values) and the corresponding probability distribution. A typical random trial is the throw of a single die, characterized by the following scheme:

$$\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ \tfrac{1}{6} & \tfrac{1}{6} & \tfrac{1}{6} & \tfrac{1}{6} & \tfrac{1}{6} & \tfrac{1}{6} \end{pmatrix},$$

where the upper row lists the numbers of spots and the lower row their probabilities.

¹ The concept of probability was mathematically clarified and rigorously determined about sixty years ago. Probability is interpreted as a complete measure on the σ-algebra γ of the subsets $S_1, S_2, \ldots, S_n$ of the set of elementary random events B. The probability measure P fulfils the following relations: (i) $P(S) \ge 0$ for every $S \in \gamma$; (ii) $P(\emptyset) = 0$; (iii) if $S_1, S_2, \ldots$ are elements of the σ-algebra γ for which $S_i \cap S_j = \emptyset$, $i \neq j$, then $P\left(\bigcup_i S_i\right) = \sum_i P(S_i)$; (iv) $P(B) = 1$. The σ-algebra on which the set function P is defined is called the Kolmogorov probability algebra. The triplet $[B, \gamma, P]$ denotes the probability space. By a random variable $\tilde{x}$ we understand any real-valued measurable function defined on the set of elementary random events B [1].

² The word "entropy" stems from the Greek word "τροπή", which means "transformation".

³ Entropy is sometimes called "missing information".
To any random trial a random variable $\tilde{x}$ is assigned, which represents a mathematical quantity assuming a set of values with the corresponding probabilities (see, e.g., [1]).
There are two kinds of measures which express the uncertainty of a random trial: (i) The moment measures, which contain in their definition both the values assigned to the trial outcomes and the set of the corresponding probabilities. The moment uncertainty measures are, as a rule, given by the higher statistical moments of a random variable $\tilde{x}$. As is well known, the k-th moment about zero (raw moment) $\langle \tilde{x}^k \rangle$ and the central moment of the k-th order $\langle \tilde{x}^k \rangle_c$ of a discrete random variable $\tilde{x}$ with values $x_1, x_2, \ldots, x_n$ and probability distribution $P = (P_1, P_2, \ldots, P_n)$ are defined as

$$\langle \tilde{x}^k \rangle = \sum_{i=1}^{n} x_i^k P_i, \qquad \langle \tilde{x}^k \rangle_c = \sum_{i=1}^{n} \left(x_i - \langle \tilde{x} \rangle\right)^k P_i .$$

The statistical moments of a random variable are often used as uncertainty measures of a random trial, especially in experimental physics, where, e.g., the standard deviation of measured quantities characterizes the accuracy of a physical measurement. The moment uncertainty measures of a random variable are also used in formulating the uncertainty relations of quantum mechanics [7].
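As a minimal sketch of the moment measures just defined, the following Python snippet (our own illustration, not part of the original text) computes raw and central moments of the fair-die variable, assuming values 1 to 6 with equal probabilities 1/6. The helper names `raw_moment` and `central_moment` are hypothetical; exact rational arithmetic via `fractions.Fraction` is used so the results can be checked by hand.

```python
from fractions import Fraction

def raw_moment(values, probs, k):
    """k-th moment about zero: <x^k> = sum_i x_i**k * P_i."""
    return sum(x**k * p for x, p in zip(values, probs))

def central_moment(values, probs, k):
    """k-th central moment: sum_i (x_i - <x>)**k * P_i."""
    mean = raw_moment(values, probs, 1)
    return sum((x - mean)**k * p for x, p in zip(values, probs))

# Fair die: numbers of spots 1..6, each with probability 1/6 (exact rationals).
values = list(range(1, 7))
probs = [Fraction(1, 6)] * 6

mean = raw_moment(values, probs, 1)
variance = central_moment(values, probs, 2)
std = float(variance) ** 0.5   # standard deviation, the usual accuracy measure
print(mean, variance, round(std, 3))   # 7/2 35/12 1.708
```

For the fair die the mean is 7/2, the variance is 35/12, and the standard deviation is about 1.71 spots; the third central moment vanishes because the distribution is symmetric.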
(ii) The probabilistic or entropic measures of uncertainty of a random trial contain in their expressions only the components of the probability distribution of a random trial.
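To determine the notion of entropy, we consider quantities called partial uncertainties, which are assigned to the individual probabilities $P_i$, $i = 1, 2, \ldots, n$. We denote a partial uncertainty by the symbol $H_i$. In any probabilistic uncertainty measure, a partial uncertainty is a function only of the corresponding probability, $H_i = H(P_i)$. For the Shannon measure this function is required to satisfy: (i) $H(P_i)$ is continuous in $P_i$; (ii) $H(P_i) > H(P_j)$ whenever $P_i < P_j$, where $P_i$ and $P_j$ are the probabilities of the i-th and j-th outcome, respectively; (iii) $H(P_i P_j) = H(P_i) + H(P_j)$ for independent outcomes. It was shown that the only function which satisfies these requirements has the form [8]

$$H_i = -\log P_i ,$$

up to a positive factor fixed by the choice of the base of the logarithm. The mean value of the partial uncertainties $H_1, H_2, \ldots, H_n$ is

$$H = \sum_{i=1}^{n} P_i H_i = -\sum_{i=1}^{n} P_i \log P_i .$$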
The quantity H is called the information-theoretical or Shannon entropy. We denote it by the symbol S. Shannon entropy is a real, non-negative number. It is a function only of the components of the probability distribution $P \equiv (P_1, P_2, \ldots, P_n)$ assigned to the set of outcomes of a random trial. Shannon entropy satisfies the following demands (see Appendix): (i) If the probability distribution contains only one nonzero component, i.e., $P_k = 1$ for some $k \in \{1, 2, \ldots, n\}$ and the remaining components are equal to zero, then $S(P) = 0$. In this case, there is no uncertainty in the random trial because one outcome is realized with certainty.
(ii) The more spread out the probability distribution P, the larger the entropy S.
(iii) For a uniform probability distribution $P_u = (1/n, \ldots, 1/n)$, the entropy $S(P_u) = \log n$ is maximal. In this case, the probabilities of all outcomes are equal, and therefore the mean uncertainty of such a random trial is maximal.
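The following sketch (an illustration we add, assuming natural logarithms so that entropies are in nats) computes the Shannon entropy as the mean of the partial uncertainties $-\log P_i$ and checks demands (i) to (iii) on small example distributions; the helper names are hypothetical. It also shows that for a uniform distribution the Shannon entropy reduces to the Hartley value $\log n$.

```python
import math

def partial_uncertainty(p):
    """Partial uncertainty H_i = -log P_i, written as log(1/P_i) (natural log, nats)."""
    return math.log(1.0 / p)

def shannon_entropy(probs):
    """Shannon entropy S = sum_i P_i * H_i = -sum_i P_i log P_i.
    Outcomes with P_i = 0 are skipped (convention 0 * log 0 = 0)."""
    return sum(p * partial_uncertainty(p) for p in probs if p > 0)

n = 4
certain = [1.0, 0.0, 0.0, 0.0]   # demand (i): one outcome occurs with certainty
peaked = [0.7, 0.1, 0.1, 0.1]    # concentrated, less spread-out distribution
uniform = [1.0 / n] * n          # demand (iii): all outcomes equally probable

print(shannon_entropy(certain))               # 0.0
print(shannon_entropy(peaked))                # about 0.940 nats
print(shannon_entropy(uniform), math.log(n))  # both about 1.386 nats (Hartley value log 4)
```

Changing the logarithm base only rescales all values by a constant factor (for example, base 2 gives bits), which is the freedom left open by the requirements on the partial uncertainty.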
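For the characterization of a random trial, one uses a random scheme. If $\tilde{x}$ is a discrete random variable assigned to a random trial, then its random scheme has the form

$$\begin{pmatrix} S_1 & S_2 & \cdots & S_n \\ x_1 & x_2 & \cdots & x_n \\ P_1 & P_2 & \cdots & P_n \end{pmatrix},$$

where $S_1, S_2, \ldots, S_n$ are the outcomes of the random trial (in quantum physics, e.g., the quantum states), $x_1, x_2, \ldots, x_n$ are the values that $\tilde{x}$ assigns to them, and $P_1, P_2, \ldots, P_n$ are the corresponding probabilities. We note that there is a set of probabilistic uncertainty measures defined by means of functions other than $-\log P_i$. They are called nonstandard or Shannon-like entropies. We shall deal with them in the next sections.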

Entropy as a Quantifier of the Configurational Order
Since, as a simple rule, the smaller the order in a system, the larger its entropy, entropy is an appropriate quantity for expressing the measure of configurational order (organization). The orderliness and the entropy of a physical system are inversely related, so that any increase in the degree of configurational order must necessarily result in a decrease of its entropy. The measure of configurational order constructed by means of entropies is called the Watanabe measure and is defined as follows [9]: configurational order of a system = (sum of the entropies of the parts of the system) − (entropy of the whole system). For a system composed of parts $\tilde{x}_1, \ldots, \tilde{x}_N$, this reads $O = \sum_{k=1}^{N} S(\tilde{x}_k) - S(\tilde{x}_1, \ldots, \tilde{x}_N)$.
The Watanabe measure of configurational order is related to another measure of configurational organization well known in information theory, called redundancy. Both measures quantify the property of configurationally organized systems that their elements are mutually ordered, which causes the system as a whole to behave in a more deterministic way than its individual parts. If a system consists only of statistically independent elements, the Watanabe measure of configurational organization becomes zero. If the elements of a system are deterministically dependent, its configurational organization attains its maximum value. A general system has a configurational organization between these extreme values. Prominent examples of systems that can be organized configurationally are physical statistical systems (above all, Ising spin systems) [10]. High configurational organization is exhibited especially by systems with spatial, temporal, or spatio-temporal structures that have arisen in processes taking place far from thermal equilibrium (e.g., lasers, fluid instabilities) [11]. These systems can be sustained only by a steady flow of energy and matter; therefore, they are called open systems [12]. A large class of systems which are generally organized both configurationally and functionally comprises the so-called string systems, i.e., sequences of elements drawn from finite alphabets. These systems include, e.g., language, music, genetic DNA, and various biopolymers. Since many such systems are goal-directed and also have a functional organization, they are especially appropriate for studying the interrelation between configurational and functional organization [10].
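To make the Watanabe measure concrete, here is a minimal sketch for a hypothetical system of two binary elements described by their joint probability table (an assumption for illustration; natural logarithms). It evaluates (sum of entropies of the parts) − (entropy of the whole) for statistically independent, partially coupled, and deterministically dependent elements, reproducing the two limiting cases described above. For two parts this quantity coincides with what information theory calls the mutual information.

```python
import math

def entropy(probs):
    """Shannon entropy in nats, skipping zero probabilities."""
    return sum(p * math.log(1.0 / p) for p in probs if p > 0)

def watanabe_order(joint):
    """(sum of entropies of the parts) - (entropy of the whole) for two elements.
    joint[a][b] is the probability that element 1 is in state a and element 2 in state b."""
    part1 = [sum(row) for row in joint]           # marginal distribution of element 1
    part2 = [sum(col) for col in zip(*joint)]     # marginal distribution of element 2
    whole = [p for row in joint for p in row]     # joint distribution of the whole system
    return entropy(part1) + entropy(part2) - entropy(whole)

independent = [[0.25, 0.25], [0.25, 0.25]]   # statistically independent elements
coupled = [[0.40, 0.10], [0.10, 0.40]]       # partially correlated elements
deterministic = [[0.5, 0.0], [0.0, 0.5]]     # element 2 always copies element 1

for name, joint in [("independent", independent), ("coupled", coupled),
                    ("deterministic", deterministic)]:
    print(name, watanabe_order(joint))
```

The independent case gives zero (up to rounding), the deterministic case gives $\log 2 \approx 0.693$, the maximum for two binary elements, and the coupled case lies in between (about 0.193).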

The Concept of Entropy in Thermodynamics and Statistical Physics
A remarkable event in the history of physics was the interpretation of phenomenological thermodynamics in terms of motion and randomness. In this interpretation, temperature is related to motion, while randomness is linked with the Clausius entropy. The homomorphic mapping of phenomenological thermodynamics onto the formalism of mathematical statistics gave rise to two entropy concepts: the Clausius thermodynamic entropy, a state variable of a thermodynamic system, and the Boltzmann statistical entropy, the logarithm of the probability of a state of a physical ensemble. The fact that the thermodynamic entropy is a state variable means that it is completely determined once the state (pressure, volume, temperature, etc.) of a thermodynamic system is given. Mathematically, this means that the change of entropy of a thermodynamic system is determined only by its initial and final states, independently of the path connecting them. The larger the entropy of a particular state of a thermodynamic system, the less of the energy of this system is available to do work.
The statistical concept of entropy was introduced into physics in the search for a statistical quantity homomorphic with the thermodynamic entropy. As is well known, the Clausius entropy $S_t$ of a thermodynamic system is linked with the ensemble probability W by the celebrated Boltzmann law, $S_t = k_B \log W$, where W is the so-called "thermodynamic" probability determined by the configurational properties of a statistical system and $k_B$ is the Boltzmann constant⁴. The Boltzmann law represents the solution of the functional equation between $S_t$ and W. Let us consider a set of isolated thermodynamic systems $\Sigma_1, \Sigma_2, \ldots, \Sigma_n$. According to Clausius, the total entropy of this composite system is an additive function of the entropies of its parts, i.e.,

$$S_t = \sum_{j=1}^{n} S_{t,j} . \qquad (3)$$

On the other hand, the joint "thermodynamic" probability of the composite system is

$$W = \prod_{j=1}^{n} W_j . \qquad (4)$$

To obtain the homomorphism between Equations (3) and (4), it is sufficient that

$$S_{t,j} = k_B \log W_j, \quad \text{since then} \quad \sum_{j=1}^{n} S_{t,j} = k_B \log \prod_{j=1}^{n} W_j = k_B \log W ,$$

which is just the Boltzmann law [2].
We give some remarks regarding the relationship between the Clausius, Boltzmann, and Shannon entropies: (i) The thermodynamic probability W in the Boltzmann law is given by the number of ways in which N particles can be distributed among n cells having different energies $E_1, E_2, \ldots, E_n$, with $N_i$ particles in the i-th cell:

$$W = \frac{N!}{N_1!\,N_2!\cdots N_n!}, \qquad \sum_{i=1}^{n} N_i = N .$$

We show that, for large N, the physical entropy given by Boltzmann's law is equal to the sum of the Shannon entropies of the energies taken as random variables defined on the individual particles, i.e.,

$$k_B \log W \simeq N k_B S(P) = -N k_B \sum_{i=1}^{n} P_i \log P_i .$$

The probability $P_i$ that a particle of the statistical ensemble has the i-th value of energy is given by the ratio $N_i/N$. Inserting $N_i = N P_i$, $i = 1, 2, \ldots, n$, into Boltzmann's entropy formula, we have

$$S_t = k_B \log \frac{N!}{(NP_1)!\,(NP_2)!\cdots(NP_n)!} .$$

Supposing that the number of particles in the statistical ensemble is very large, we can use the asymptotic (Stirling) formula $\log N! \simeq N \log N - N + \tfrac{1}{2}\log(2\pi N)$ and obtain

$$S_t \simeq -N k_B \sum_{i=1}^{n} P_i \log P_i - \frac{k_B}{2}\left[(n-1)\log(2\pi N) + \sum_{i=1}^{n} \log P_i\right] . \qquad (9)$$

For very large N, the second term in Equation (9), which grows only logarithmically with N, can be neglected, and we find

$$S_t \simeq -N k_B \sum_{i=1}^{n} P_i \log P_i = N k_B S(P) .$$

We see that the Boltzmann entropy of an ensemble with large N is equal to the sum of the Shannon entropies of the individual particles. The asymptotic equality between the Boltzmann and Shannon entropies for large N makes it possible to use the Shannon entropy also for describing a statistical ensemble. The pioneer in this field was E. T. Jaynes, who, already in the 1950s, published works in which only Shannon entropy was used to formulate statistical physics [13]. However, many authors advocating the use of Shannon entropy in statistical physics have not fully realized the difference between Boltzmann's and Shannon's entropy. The use of Shannon entropy can only be justified if one considers the physical ensemble as a system of random objects on which energy (or another physical quantity) is taken as a random variable. Then the total entropy of the whole ensemble is given as the sum of the Shannon entropies of the individual statistical elements (e.g., particles). While Boltzmann's entropy loses its meaning for an ensemble containing only a few particles, Shannon entropy is defined even for an "ensemble" with a single particle. Boltzmann's entropy is a typical ensemble concept, whereas Shannon entropy is a probabilistic concept. This is not only a change of methodology in treating statistical ensembles; it also has far-reaching conceptual and even pedagogical consequences.

⁴ The probability as well as the Shannon entropy are dimensionless quantities. On the other hand, the thermodynamic entropy has the physical dimension of energy divided by temperature (J/K). Therefore, to obtain the correct physical dimension for the thermodynamic entropy, we must multiply the Shannon entropy by the Boltzmann constant, which has the value $k_B = 1.380649 \times 10^{-23}$ J/K.
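The asymptotic equality between Boltzmann's $\log W$ and N times the Shannon entropy can be checked numerically. The sketch below assumes hypothetical occupation fractions for three energy levels and natural logarithms (so $k_B$ is set to 1); it evaluates $\log W = \log[N!/(N_1!\cdots N_n!)]$ exactly with `math.lgamma` (using $\log m! = \log\Gamma(m+1)$) and compares it with $N S(P)$ for growing N. The helper names are our own.

```python
import math

def log_multiplicity(counts):
    """Exact log W = log( N! / (N_1! N_2! ... N_n!) ), using log Gamma(m + 1) = log m!."""
    n_total = sum(counts)
    return math.lgamma(n_total + 1) - sum(math.lgamma(c + 1) for c in counts)

def shannon_entropy(probs):
    """Shannon entropy in nats (natural logarithm)."""
    return sum(p * math.log(1.0 / p) for p in probs if p > 0)

occupation = [0.5, 0.3, 0.2]   # hypothetical occupation fractions P_i = N_i / N
ratios = []
for n_particles in (10, 100, 10_000, 1_000_000):
    counts = [round(f * n_particles) for f in occupation]   # N_i = N * P_i
    boltzmann = log_multiplicity(counts)                    # log W (k_B = 1)
    shannon_sum = n_particles * shannon_entropy(occupation) # N * S(P)
    ratios.append(boltzmann / shannon_sum)
    print(n_particles, boltzmann, shannon_sum, ratios[-1])
```

The ratio approaches 1 from below as N grows, while the absolute difference grows only logarithmically in N, consistent with neglecting the subleading Stirling correction for large ensembles.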
According to Jaynes [14], the equilibrium probability distribution of the particle energy in a statistical ensemble should maximize the Shannon entropy

$$S = -\sum_{i=1}^{n} P(E_i) \log P(E_i)$$

subject to given constraints. For example, taking the normalization $\sum_i P(E_i) = 1$ and the mean energy per particle $\sum_i E_i P(E_i) = \langle E \rangle$ as the constraints in the extremizing procedure, we obtain the following probability distribution for the particle energy:

$$P(E_i) = \exp(-\lambda - \mu E_i) ,$$

where the constants λ and µ are to be determined by substituting $P(E_i)$ into the constraint equations. This is the Boltzmann distribution, with µ playing the role of $1/(k_B T)$. We see how easily and quickly we obtain results forming the essence of classical statistical mechanics. The use of Shannon entropy in statistical physics makes it possible to rewrite it in terms of the modern theory of probability, in which a statistical ensemble is treated as a collection of mutually interacting random objects [13].
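A minimal numerical sketch of Jaynes' maximum-entropy procedure: for hypothetical energy levels 0, 1, 2, 3 (arbitrary units) and a prescribed mean energy of 1, the multiplier µ is found by bisection, and λ follows from normalization as $\lambda = \log Z$ with $Z = \sum_i \exp(-\mu E_i)$. The helper names and numbers are our own assumptions; the final check confirms that another distribution with the same mean energy has lower Shannon entropy.

```python
import math

def boltzmann(energies, mu):
    """P(E_i) = exp(-lambda - mu*E_i), with lambda = log Z fixed by normalization."""
    weights = [math.exp(-mu * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights], math.log(z)

def mean_energy(energies, probs):
    return sum(e * p for e, p in zip(energies, probs))

def solve_mu(energies, target, lo=-50.0, hi=50.0, iters=200):
    """Bisection for mu: the mean energy decreases monotonically as mu grows."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        probs, _ = boltzmann(energies, mid)
        if mean_energy(energies, probs) > target:
            lo = mid   # mean energy too high -> a larger mu is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)

def shannon_entropy(probs):
    return sum(p * math.log(1.0 / p) for p in probs if p > 0)

energies = [0.0, 1.0, 2.0, 3.0]   # hypothetical energy levels
target = 1.0                      # prescribed mean energy per particle
mu = solve_mu(energies, target)
probs, lam = boltzmann(energies, mu)
print(mu, lam, probs)

# Another normalized distribution with the same mean energy (0.3 + 0.4 + 0.3 = 1.0).
other = [0.4, 0.3, 0.2, 0.1]
print(shannon_entropy(probs) > shannon_entropy(other))   # True
```

Bisection is used only because it is simple and robust here; any one-dimensional root finder for the mean-energy constraint would do.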

The Shannon-Like Entropies
Recently, there has been an endeavour in the applied sciences (see, e.g., [15]) to employ entropic measures of uncertainty that have properties similar to those of information entropy but are simpler to handle mathematically. The classical measure of probabilistic uncertainty, which has dominated the literature since it was proposed by Shannon, is the information or Shannon entropy, defined for a discrete random variable by the formula

$$S(P) = -\sum_{i=1}^{n} P_i \log P_i .$$
Since Shannon introduced his entropy, several other classes of probabilistic uncertainty measures (entropies) have been described in the literature (see, e.g., [16]). We can broadly divide them into three classes: (i) the Shannon-like uncertainty measures, which for a certain value of the corresponding parameters converge towards the Shannon entropy, e.g., Rényi's entropy; (ii) the Maassen and Uffink uncertainty measures, which also converge, under certain conditions, to the Shannon entropy; (iii) the uncertainty measures having no direct connection to Shannon entropy, e.g., the information "energy", defined in information theory as [16]

$$E(P) = \sum_{i=1}^{n} P_i^2 ,$$

and called the Hilbert-Schmidt norm in quantum physics.

The most important uncertainty measures of the first class are:
(i) The Rényi entropy, defined as follows [17]:

$$H_\alpha^{R}(P) = \frac{1}{1-\alpha} \log \sum_{i=1}^{n} P_i^{\alpha}, \qquad \alpha > 0,\ \alpha \neq 1 .$$

(ii) The Havrda-Charvat entropy (or α-entropy)⁵, defined as [18]

$$H_\alpha^{HC}(P) = \frac{1}{1-\alpha} \left( \sum_{i=1}^{n} P_i^{\alpha} - 1 \right), \qquad \alpha > 0,\ \alpha \neq 1 .$$

For the sake of completeness, we list some other entropy-like uncertainty measures presented in the literature [19]: (i) the trigonometric entropy [10]; (ii) the R-norm entropy. The Shannon entropy is recovered from these parametric measures by taking appropriate limits of their parameters (e.g., α → 1 for the Rényi and Havrda-Charvat entropies). A quick inspection shows that all the Shannon-like entropies listed above are mutually functionally related. For example, each Havrda-Charvat entropy can be expressed as a function of the Rényi entropy of the same order, and vice versa (natural logarithms):

$$H_\alpha^{HC} = \frac{e^{(1-\alpha) H_\alpha^{R}} - 1}{1-\alpha}, \qquad H_\alpha^{R} = \frac{1}{1-\alpha} \log\left[1 + (1-\alpha) H_\alpha^{HC}\right] .$$

There are six properties which are usually considered desirable for an uncertainty measure of a random trial: (i) symmetry, (ii) expansibility, (iii) subadditivity, (iv) additivity, (v) normalization, and (vi) continuity. The only uncertainty measure which satisfies all these requirements is the Shannon entropy. Each of the other entropies violates at least one of them; e.g., Rényi's entropy violates only the subadditivity property, the Havrda-Charvat entropy violates the additivity property, and the R-norm entropies violate both subadditivity and additivity. More details about the properties of each of these entropies can be found elsewhere (e.g., [15]). The Shannon entropy satisfies all the above requirements on an uncertainty measure and exactly matches the properties of the physical entropy⁶. All these classes of entropies represent probabilistic uncertainty measures with mathematical properties similar to those of the Shannon entropy.
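The following sketch (our illustration, using natural logarithms and the Tsallis-form normalization $(\sum_i P_i^\alpha - 1)/(1-\alpha)$ for the Havrda-Charvat entropy, which is an assumption since other normalizations appear in the literature) implements the Rényi and Havrda-Charvat entropies, shows numerically that both approach the Shannon entropy as α → 1, and verifies the functional relation between them for one order α. The example distribution is hypothetical.

```python
import math

def shannon(probs):
    """Shannon entropy in nats."""
    return sum(p * math.log(1.0 / p) for p in probs if p > 0)

def renyi(probs, alpha):
    """Renyi entropy log(sum_i P_i**alpha) / (1 - alpha), alpha > 0, alpha != 1."""
    return math.log(sum(p**alpha for p in probs if p > 0)) / (1.0 - alpha)

def havrda_charvat(probs, alpha):
    """Havrda-Charvat (Tsallis) entropy (sum_i P_i**alpha - 1) / (1 - alpha).
    This normalization is an assumption; other normalizations exist in the literature."""
    return (sum(p**alpha for p in probs if p > 0) - 1.0) / (1.0 - alpha)

P = [0.5, 0.25, 0.125, 0.125]
print("Shannon:", shannon(P))                 # 1.75 * log 2, about 1.213
for alpha in (0.9, 0.99, 0.999):              # both approach Shannon as alpha -> 1
    print(alpha, renyi(P, alpha), havrda_charvat(P, alpha))

# Functional relation between the two families for the same order alpha.
alpha = 2.0
h_r = renyi(P, alpha)
h_hc_from_renyi = (math.exp((1.0 - alpha) * h_r) - 1.0) / (1.0 - alpha)
print(math.isclose(havrda_charvat(P, alpha), h_hc_from_renyi))   # True
```

Because both families are built from the same sum $\sum_i P_i^\alpha$, one can always be computed from the other, which is the functional relation stated above.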
The best-known Shannon-like probabilistic uncertainty measure is the Havrda-Charvat entropy [18], which is more general than the Shannon measure and much simpler than Rényi's measure. It depends on a parameter α > 0 (α ≠ 1) and, as such, represents a family of uncertainty measures which includes the information entropy as a limiting case when α → 1. We note that in physics the Havrda-Charvat entropy is known as the Tsallis entropy [20]. All the mentioned entropic measures of uncertainty are functions of the components of the probability distribution of a random variable $\tilde{x}$, and they have three important properties: (i) they assume their maximal values for the uniform probability distribution of $\tilde{x}$; (ii) they become zero for probability distributions having only one nonzero component; (iii) they express a measure of the spread of a probability distribution: the larger this spread, the larger the values they assume. These properties qualify them as measures of uncertainty (inaccuracy) in the physical theory of measurement.
Within the theory of probability, the entropic uncertainty measures for a discrete random variable are exactly defined. The transition from discrete to continuous entropic uncertainty measures is, however, not always unique and still poses many open problems. A continuous random variable $\tilde{x}_c$ is characterized by its probability density function p(x). Moment and probabilistic uncertainty measures exist also for continuous random variables. The typical moment measure is the k-th central moment of $\tilde{x}_c$. The classical probabilistic uncertainty measure of a continuous random variable $\tilde{x}_c$ is the corresponding Shannon entropy. It is a function of the probability density p(x) and, for a discretization of the x-axis into cells of width Δx, consists of two terms⁷:

$$H_{\Delta x} \simeq -\int p(x) \log p(x)\, dx - \log \Delta x .$$

Since the second term diverges as Δx → 0, one uses only the first term, the differential entropy (the Shannon entropy functional)

$$H(p) = -\int p(x) \log p(x)\, dx ,$$

as the entropic uncertainty measure of a continuous random variable. This functional is well known to play an important role in probability and statistics. We refer to [15] for applications of the Shannon entropy functional in the theory of probability and statistics.
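The two-term structure can be seen numerically. Assuming a standard Gaussian density (chosen only because its differential entropy $H(p) = -\int p \log p\,dx$ is known in closed form, $\tfrac12\log(2\pi e\sigma^2)$), the sketch below divides the x-axis into equidistant cells of width Δx, computes the Shannon entropy of the resulting discrete distribution with $P_i \simeq p(x_i)\Delta x$, and compares it with $H(p) - \log \Delta x$. The integration range and helper names are hypothetical choices.

```python
import math

def gaussian_pdf(x, sigma=1.0):
    """Normal density with zero mean and standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma**2)) / math.sqrt(2.0 * math.pi * sigma**2)

def discretized_entropy(pdf, dx, lo=-12.0, hi=12.0):
    """Shannon entropy of the discretized variable with P_i ~ p(x_i) * dx,
    using equidistant cells of width dx and midpoints x_i (range [lo, hi] assumed)."""
    n_cells = int(round((hi - lo) / dx))
    probs = [pdf(lo + (i + 0.5) * dx) * dx for i in range(n_cells)]
    return sum(p * math.log(1.0 / p) for p in probs if p > 0)

sigma = 1.0
h_exact = 0.5 * math.log(2.0 * math.pi * math.e * sigma**2)   # differential entropy H(p)
results = {}
for dx in (0.1, 0.01):
    # Discrete Shannon entropy versus H(p) - log(dx); the second term diverges as dx -> 0.
    results[dx] = (discretized_entropy(gaussian_pdf, dx), h_exact - math.log(dx))
    print(dx, *results[dx])
```

The discrete entropy grows without bound as the cells shrink, while the difference between it and $-\log \Delta x$ stays at the finite differential entropy, which is why only the first term is retained.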
As is well known, the Shannon entropy functionals of some continuous variables are complicated integrals which are often difficult to compute analytically or even numerically. Anyone who has tried to calculate the differential entropies of continuous variables analytically has become aware of how difficult this may be. From a purely mathematical point of view, the differential entropy can be taken as a formula expressing the spread of any standard single-valued function (the probability density belongs to this class of functions). Generally, the Shannon entropy functional assigns to a probability density function (belonging to the class of functions $L(\mathbb{R})$) a real number H through a mapping $H: L(\mathbb{R}) \to \mathbb{R}$. H is a monotonically increasing function of the degree of "spreading" of p(x), i.e., the more spread out p(x) is, the larger H becomes. The Shannon entropy functional was studied already at the beginning of information theory [17]. Since then, besides the Shannon entropy functional, several other entropy functionals have been introduced and studied in probability theory. The majority of them depend on certain parameters. As such, they form a whole family of different functionals (including the Shannon entropy functional as a special case). In a sense, they are generalizations of the Shannon entropy functional. Some of them can express the spread of probability density functions equally well as the differential entropy and are considerably easier to handle mathematically. These include: (i) the Rényi entropic functional [17]

$$H_\alpha^{R}(p) = \frac{1}{1-\alpha} \log \int p(x)^{\alpha}\, dx .$$

⁷ In order to apply the formula for the Shannon entropy to the continuous random variable $\tilde{x}_c$ with the probability density function p(x), we divide the x-axis into n equidistant intervals of width Δx. The probability that $\tilde{x}_c$ assumes a value from the interval $[x_i, x_i + \Delta x]$ is approximately $P_i \simeq p(x_i)\,\Delta x$. Inserting these probabilities into the Shannon formula gives

$$-\sum_i p(x_i)\Delta x \log\left[p(x_i)\Delta x\right] = -\sum_i p(x_i)\Delta x \log p(x_i) - \log \Delta x \sum_i p(x_i)\Delta x \simeq H(p) - \log \Delta x \quad (\Delta x \to 0),$$

where $H(p)$ is the Shannon entropy functional.
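As a check of the Rényi entropic functional, the sketch below (our assumption: a Gaussian density with σ = 2 and natural logarithms) evaluates it by simple midpoint quadrature and compares it with the closed form obtained from $\int p(x)^\alpha dx = (2\pi\sigma^2)^{(1-\alpha)/2}\alpha^{-1/2}$, namely $\tfrac12\log(2\pi\sigma^2) + \frac{\log\alpha}{2(\alpha-1)}$. It also evaluates the Shannon entropy functional, which this expression approaches as α → 1. The quadrature range, step count, and helper names are hypothetical choices.

```python
import functools
import math

def gaussian_pdf(x, sigma):
    """Normal density with zero mean and standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma**2)) / math.sqrt(2.0 * math.pi * sigma**2)

def integrate(f, lo, hi, n=20_000):
    """Midpoint-rule quadrature; adequate for smooth, rapidly decaying integrands."""
    dx = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * dx) for i in range(n)) * dx

def renyi_functional(pdf, alpha, lo, hi):
    """Renyi entropic functional log( integral p(x)**alpha dx ) / (1 - alpha)."""
    return math.log(integrate(lambda x: pdf(x) ** alpha, lo, hi)) / (1.0 - alpha)

def shannon_functional(pdf, lo, hi):
    """Shannon entropy functional (differential entropy) -integral p(x) log p(x) dx."""
    return integrate(lambda x: -pdf(x) * math.log(pdf(x)), lo, hi)

def renyi_gaussian(alpha, sigma):
    """Closed form for the Gaussian: 0.5*log(2*pi*sigma^2) + log(alpha) / (2*(alpha - 1))."""
    return 0.5 * math.log(2.0 * math.pi * sigma**2) + math.log(alpha) / (2.0 * (alpha - 1.0))

sigma = 2.0
lo, hi = -10.0 * sigma, 10.0 * sigma                # tails beyond 10 sigma are negligible
pdf = functools.partial(gaussian_pdf, sigma=sigma)
for alpha in (0.5, 2.0):
    print(alpha, renyi_functional(pdf, alpha, lo, hi), renyi_gaussian(alpha, sigma))
print("Shannon:", shannon_functional(pdf, lo, hi),
      0.5 * math.log(2.0 * math.pi * math.e * sigma**2))
```

For the Gaussian, the Rényi functional needs only the integral of a power of the density, which is again Gaussian, whereas the Shannon functional requires integrating $p \log p$; this illustrates why some parametric functionals are easier to handle.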

Conclusions
From what has been said so far, it follows that: (i) The concept of entropy is inherently connected with the probability distribution of the outcomes of a random trial. Entropy quantifies the probabilistic uncertainty of a general random trial.
(ii) There are two ways of expressing the uncertainty of a random trial: the moment and the probabilistic measures. The former include in their definition both the values assigned to the trial outcomes and their probabilities. The latter contain in their definition only the corresponding probabilities. The moment uncertainty measures are, as a rule, given by the higher statistical moments of a random variable, whereas the probabilistic measures are expressed by means of entropy. The most important probabilistic uncertainty measure is the Shannon entropy, defined by the formula $S = -\sum_{i=1}^{n} P_i \log P_i$, where $P_i$ is the probability of the i-th outcome of a random trial. (iii) By means of Shannon entropy, it is possible to quantify the configurational order in the set of elements of a general system. The corresponding quantity is called the Watanabe measure of configurational order and is defined as follows: configurational order of a system = (sum of the entropies of the parts of the system) − (entropy of the whole system). This measure quantifies the property of configurationally organized systems that their elements are mutually ordered, which causes the system as a whole to behave in a more deterministic way than its individual parts.
(iv) The asymptotic equality between the Boltzmann and Shannon entropies for statistical systems with a large number of particles makes it possible to use the Shannon entropy for describing statistical ensembles.
(v) Besides the Shannon entropy, there exists a class of so-called Shannon-like entropies.

The interpretation of the above properties agrees with common sense, intuition, and the reasonable requirements that can be asked of a measure of uncertainty. Indeed, a random experiment which has only one possible outcome (that is, a strictly deterministic trial) contains no uncertainty at all; we know what will happen before performing it.

… the last case, summing the resulting n inequalities.
dimension of the physical probability density function. In contrast with the probability, which is a dimensionless number, the probability density function has a physical dimension, so that its appearance behind the logarithm and the cosine in the entropy functionals [21] … for a physical random variable (see, e.g., [21]).⁸ ⁹