Entropy is a universal concept in science, suitable for quantifying the uncertainty of a series of random events. We define and describe this notion in a manner appropriate for physicists. We start with a brief recapitulation of the basic concepts of probability theory that are needed to define entropy. The history of how this concept came into its present exact form is sketched. We show that the Shannon entropy represents the most adequate measure of the probabilistic uncertainty of a random object. Though the notion of entropy was introduced in classical thermodynamics as a thermodynamic state variable, it relies on concepts studied in the theory of probability and mathematical statistics. We point out that the whole formalism of statistical mechanics can be rewritten in terms of Shannon entropy. The notion of “entropy” is understood differently in various scientific disciplines: in classical physics it represents a thermodynamic state variable; in communication theory, the efficiency of transmission of communication; in the theory of general systems, the magnitude of the configurational order; in ecology, a measure of biodiversity; in statistics, the degree of disorder, etc. All these notions can be mapped onto the general mathematical concept of entropy. By means of entropy, the configurational order of complex systems can be exactly quantified. Besides the Shannon entropy, there exists a class of Shannon-like entropies which converge, under certain circumstances, toward Shannon entropy. A Shannon-like entropy is sometimes easier to handle mathematically than Shannon entropy. One important Shannon-like entropy is the well-known Tsallis entropy. The applications of the Shannon and Shannon-like entropies in science are versatile. Besides the mentioned statistical physics, they play a fundamental role in quantum information, communication theory, the description of disorder, etc.

At the most fundamental level, all our further considerations rely on the concept of probability. Although there is a well-defined mathematical theory of probability, there is no universal agreement about the meaning of probability. Thus, for example, there is the view that probability is an objective property of a system and another view that it describes a subjective state of belief of a person. Then there is the frequentist view that the probability of an event is the relative frequency of its occurrence in a long or infinite sequence of trials. This latter interpretation is often employed in mathematical statistics and statistical physics. In everyday life, probability means the degree of ignorance about the outcome of a random trial. This is why probability is commonly interpreted as the degree of subjective expectation of an outcome of a random trial. Both subjective and statistical probability are “normed”: the degree of expectation that an outcome of a random trial occurs and the degree of the “complementary” expectation that it does not occur always add up to one.

Although the concept of probability is expressed here in sophisticated mathematical language, it captures only the familiar properties of probability used in everyday life. For example, each number of spots in the throw of a single die represents an elementary random event with which a positive real number, called its probability, is associated (relation (i)). The probability of obtaining one of two (or more) numbers of spots in the throw of a single die is equal to the sum of their probabilities (relation (iii)). The sum of the probabilities of all possible numbers of spots is normed to one (relation (iv)).
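The die example can be checked with exact arithmetic. The following is a minimal sketch, assuming a fair die; the helper name `prob` is illustrative and not part of the original text.

```python
from fractions import Fraction

# Elementary events of a fair die: each number of spots gets probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event, dist):
    """Probability of an event given as a set of mutually exclusive outcomes."""
    return sum(dist[outcome] for outcome in event)

p_two_or_five = prob({2, 5}, die)   # additivity: 1/6 + 1/6
p_any_face = prob(set(die), die)    # normalization over all six faces
print(p_two_or_five, p_any_face)
```

Using `Fraction` rather than floating point keeps the normalization check exact.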

The word “entropy” was first used in 1865 by Clausius in his book Abhandlungen über Wärmetheorie to describe a quantity accompanying a change from thermal to mechanical energy, and it has continued to have this meaning in thermodynamics. Boltzmann [

Mathematicians were attracted to the possibility of providing an axiomatic structure for entropy and to the ramifications thereof. The axiomatic approach to the concept of entropy attempts to find a system of postulates which provides a unique mathematical characterization of entropy and which adequately reflects the properties required of a probabilistic uncertainty measure in diverse real situations. This has been a very interesting and thought-provoking area for scientists. Khinchin [

The fundamental concept for the description of random processes is the notion of the random trial. A random trial is characterized by a set of its outcomes (values) and the corresponding probability distribution. A typical random trial is the throw of a single die, characterized by the following scheme:

$$
\begin{array}{c|cccccc}
S & S_1 & S_2 & S_3 & S_4 & S_5 & S_6 \\
P & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \\
x & 1 & 2 & 3 & 4 & 5 & 6
\end{array}
$$

To any random trial it is assigned a random variable

There are two kinds of measures which express the uncertainty of a random trial:

(i) The moment measures, which contain in their definition both the values assigned to the trial outcomes and the set of corresponding probabilities. The moment uncertainty measures are given, as a rule, by the higher statistical moments of a random variable, e.g., the $k$-th moment about zero and the $k$-th central moment,

$$m_k = \sum_i x_i^k P_i$$

and

$$\mu_k = \sum_i \left(x_i - m_1\right)^k P_i,$$

respectively.

The statistical moments of a random variable are often used as uncertainty measures of a random trial, especially in experimental physics, where, e.g., the standard deviation of measured quantities characterizes the accuracy of a physical measurement. The moment uncertainty measures of a random variable are also used in formulating the uncertainty relations in quantum mechanics [

(ii) The probabilistic or entropic measures of uncertainty of a random trial contain in their expressions only the components of the probability distribution of a random trial.
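The difference between the two kinds of measure can be made concrete: relabeling the outcome values changes a moment measure but leaves an entropic measure untouched. This is a minimal sketch, assuming the variance as the moment measure and the base-2 Shannon entropy as the entropic measure; both function names are illustrative.

```python
import math

def variance(values, probs):
    """Second central moment: depends on the values and on the probabilities."""
    mean = sum(x * p for x, p in zip(values, probs))
    return sum((x - mean) ** 2 * p for x, p in zip(values, probs))

def shannon_entropy(probs):
    """-sum p log2 p: depends on the probabilities only."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

probs = [1 / 6] * 6
faces = [1, 2, 3, 4, 5, 6]
relabeled = [10 * x for x in faces]  # same trial, outcome values rescaled

print(variance(faces, probs), variance(relabeled, probs))  # differ by a factor 100
print(shannon_entropy(probs))                              # unaffected by relabeling
```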

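To determine the notion of entropy we consider quantities, called partial uncertainties, which are assigned to the individual probabilities $P_i$ of the outcomes of a random trial. We denote the partial uncertainty by $H(P_i)$ and require that:

(i) It is a monotonically decreasing, continuous and single-valued function of the corresponding probability;

(ii) The joint uncertainty of a certain outcome of two statistically independent trials is additive,

$$H(P_1 P_2) = H(P_1) + H(P_2),$$

where $P_1$ and $P_2$ are the probabilities of the outcomes of the first and the second trial, respectively;

(iii) It is normalized, e.g., $H(1/2) = 1$.

It was shown that the only function which satisfies these requirements has the form

$$H(P_i) = -\log_2 P_i.$$

The mean value of the partial uncertainties,

$$H = \sum_i P_i H(P_i) = -\sum_i P_i \log_2 P_i,$$

is taken as the uncertainty measure of the whole random trial.

The quantity $H$ is called the Shannon (information-theoretic) entropy.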

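Shannon entropy satisfies the following demands (see Appendix): (i) If the probability distribution contains only one component, e.g., $P_k = 1$ with all other components equal to zero, then $H = 0$;

(ii) The more spread out the probability distribution, the larger the entropy;

(iii) For a uniform probability distribution, $P_i = 1/n$, the entropy takes its maximum value, $H = \log n$.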
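These three demands can be verified numerically. The sketch below assumes the base-2 form $H = -\sum_i P_i \log_2 P_i$ with the usual convention that terms with $P_i = 0$ contribute nothing; the function name and the example distributions are illustrative.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits; terms with p = 0 are skipped (0 log 0 = 0)."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to one")
    return sum(-p * math.log2(p) for p in probs if p > 0)

certain = [1.0, 0.0, 0.0, 0.0]   # only one component: no uncertainty
peaked = [0.7, 0.1, 0.1, 0.1]    # partly spread out
uniform = [0.25] * 4             # maximally spread out

for name, dist in (("certain", certain), ("peaked", peaked), ("uniform", uniform)):
    print(name, shannon_entropy(dist))
```

For four outcomes the uniform distribution gives $\log_2 4 = 2$ bits, the largest value any four-component distribution can reach.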

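A random scheme is used for the characterization of a random trial. If the trial has $n$ outcomes $S_1, \dots, S_n$ with probabilities $P_1, \dots, P_n$ and assigned values $x_1, \dots, x_n$, its random scheme is

$$
\begin{array}{c|cccc}
S & S_1 & S_2 & \cdots & S_n \\
P & P_1 & P_2 & \cdots & P_n \\
X & x_1 & x_2 & \cdots & x_n
\end{array}
$$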

We note that there is a set of probabilistic uncertainty measures defined by means of other functions than

Since the simple rule holds that the smaller the order in a system, the larger its entropy, entropy appears to be the appropriate quantity for expressing the measure of configurational order (organization). The orderliness and entropy of a physical system are inversely related, so that any increase in the degree of configurational order must necessarily result in a decrease of its entropy. The measure of configurational order constructed by using entropies is called the Watanabe measure and is defined as follows [

configurational order of a system = (sum of entropies of the parts of the system) − (entropy of the whole system).

The Watanabe measure of configurational order is related to another measure of configurational organization well known in information theory, called redundancy. Both measures express quantitatively the property of configurationally organized systems to have order among their elements, which causes the system as a whole to behave in a more deterministic way than its individual parts. If a system consists only of elements which are statistically independent, the Watanabe measure of configurational organization becomes zero. If the elements of a system are deterministically dependent, its configurational organization attains its maximum value. A general system has a configurational organization between these extreme values. Prominent systems which can be organized configurationally include physical statistical systems (above all, Ising systems of spins) [
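The two limiting cases of the Watanabe measure can be illustrated on a system of two binary elements. This is a minimal sketch, assuming the joint distribution is given as a dictionary from state tuples to probabilities and that entropies are measured in bits; `watanabe_order` is an illustrative name.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def watanabe_order(joint):
    """(Sum of entropies of the parts) - (entropy of the whole system).

    `joint` maps state tuples, one entry per element, to probabilities.
    """
    n_parts = len(next(iter(joint)))
    parts_total = 0.0
    for k in range(n_parts):
        marginal = {}
        for state, p in joint.items():
            marginal[state[k]] = marginal.get(state[k], 0.0) + p
        parts_total += entropy(marginal.values())
    return parts_total - entropy(joint.values())

# Two statistically independent binary elements.
independent = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
# Two deterministically dependent elements: the second copies the first.
locked = {(0, 0): 0.5, (1, 1): 0.5}

print(watanabe_order(independent), watanabe_order(locked))
```

For the locked pair the measure equals one bit, the largest value two binary elements can attain.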

A remarkable event in the history of physics was the interpretation of phenomenological thermodynamics in terms of motion and randomness. In this interpretation, the temperature is related to motion while the randomness is linked with the Clausius entropy. The homomorphic mapping of phenomenological thermodynamics onto the formalism of mathematical statistics gave rise to two entropy concepts: the Clausius thermodynamic entropy as a state variable of a thermodynamic system and the Boltzmann statistical entropy as the logarithm of the probability of the state of a physical ensemble. The fact that the thermodynamic entropy is a state variable means that it is completely defined when the state (pressure, volume, temperature, etc.) of a thermodynamic system is defined. This follows from the mathematics, which shows that only the initial and final states of a thermodynamic system determine the change of its entropy. The larger the value of the entropy of a particular state of a thermodynamic system, the less available is the energy of this system to do work.

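The statistical concept of entropy was introduced in physics when seeking a statistical quantity homomorphic with the thermodynamic entropy. As is well known, the Clausius entropy $S$ of a thermodynamic system is linked with the “thermodynamic” probability $W$ of its state by the Boltzmann law. The Boltzmann law represents the solution to the functional equation between $S$ and $W$. For a system composed of two independent subsystems with entropies $S_1$ and $S_2$, the Clausius entropy is additive,

$$S = S_1 + S_2. \qquad (3)$$

On the other hand, the joint “thermodynamic” probability of system (3) is

$$W = W_1 W_2. \qquad (4)$$

To obtain the homomorphism between Equations (3) and (4), it is sufficient that

$$S = k \ln W,$$

which is just the Boltzmann law, $k$ being the Boltzmann constant.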

We give some remarks regarding the relationship between the Clausius, Boltzmann and Shannon entropies:

(i) The thermodynamic probability

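We show that the physical entropy given by Boltzmann’s law is equal to the sum of the Shannon entropies of the energies, taken as random variables defined on the individual particles, i.e.,

$$S = k N H, \qquad H = -\sum_i P_i \ln P_i.$$

The probability $W$ is given by the number of ways in which $N$ particles can be distributed over the energy levels so that $n_i$ particles occupy the $i$-th level. Inserting

$$W = \frac{N!}{\prod_i n_i!}$$

into Boltzmann’s entropy formula we have

$$S = k\left[\ln N! - \sum_i \ln n_i!\right].$$

Supposing that the number of particles in a statistical ensemble is very large, we can use the asymptotic formula

$$\ln n! \approx n \ln n - n + \tfrac{1}{2}\ln(2\pi n),$$

which inserted in Boltzmann’s entropy yields

$$S \approx -kN \sum_i P_i \ln P_i + \frac{k}{2}\left[\ln(2\pi N) - \sum_i \ln(2\pi n_i)\right], \qquad P_i = \frac{n_i}{N}. \qquad (9)$$

For very large N, the second term in Equation (9) can be neglected and we find

$$S \approx -kN \sum_i P_i \ln P_i = kNH.$$

We see that the Boltzmann entropy of an ensemble with large $N$ is proportional to the Shannon entropy of the energy of a single particle.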
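The large-$N$ argument can be checked numerically. The sketch below assumes the standard multinomial count $W = N!/\prod_i n_i!$ for the thermodynamic probability and compares $(\ln W)/N$, the Boltzmann entropy per particle in units of $k$, with the Shannon entropy of the occupation fractions in natural units. The function names and the occupation fractions are illustrative.

```python
import math

def log_multiplicity(occupations):
    """ln W for W = N! / prod(n_i!), via lgamma to avoid huge factorials."""
    n_total = sum(occupations)
    return math.lgamma(n_total + 1) - sum(math.lgamma(n + 1) for n in occupations)

def shannon_entropy_nats(probs):
    """H = -sum p ln p (natural logarithm)."""
    return sum(-p * math.log(p) for p in probs if p > 0)

def entropy_per_particle(occupations):
    """(ln W) / N: Boltzmann entropy per particle in units of k."""
    return log_multiplicity(occupations) / sum(occupations)

# Same occupation fractions (0.5, 0.3, 0.2) at increasing ensemble size.
for scale in (1, 100, 10000):
    occupations = [5 * scale, 3 * scale, 2 * scale]
    n_total = sum(occupations)
    probs = [n / n_total for n in occupations]
    print(n_total, entropy_per_particle(occupations), shannon_entropy_nats(probs))
```

The gap between the two columns shrinks as the ensemble grows, because the neglected correction grows only logarithmically with $N$.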

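According to Jaynes, the equilibrium probability distribution of a statistical ensemble can be found by maximizing the Shannon entropy

$$H = -\sum_i P_i \ln P_i$$

subject to given constraints. For example, by taking the mean energy per particle as the constraint in the extremizing procedure, we obtain the following probability distribution for the particle energy

$$P_i = \alpha\, e^{-\beta E_i},$$

where the constants $\alpha$ and $\beta$ are fixed by the normalization $\sum_i P_i = 1$ and by the prescribed mean energy $\sum_i P_i E_i$.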
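The extremizing procedure with a mean-energy constraint can be sketched numerically. The example assumes the exponential (Boltzmann) form $P_i \propto e^{-\beta E_i}$ that this constraint is known to produce, and finds $\beta$ by bisection. The energy levels, the target mean energy and the bracket $[-50, 50]$ are illustrative assumptions, and the bracket would need adjusting for other energy scales.

```python
import math

def boltzmann_distribution(energies, beta):
    """Normalized weights exp(-beta * E_i)."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def mean_energy(energies, beta):
    probs = boltzmann_distribution(energies, beta)
    return sum(p * e for p, e in zip(probs, energies))

def solve_beta(energies, target, lo=-50.0, hi=50.0, tol=1e-12):
    """Bisection on beta; the mean energy decreases monotonically with beta."""
    if not min(energies) < target < max(energies):
        raise ValueError("target mean energy must lie strictly inside the spectrum")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_energy(energies, mid) > target:
            lo = mid   # mean too high: beta must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)

energies = [0.0, 1.0, 2.0, 3.0]          # hypothetical energy levels
beta = solve_beta(energies, target=1.0)  # prescribed mean energy per particle
probs = boltzmann_distribution(energies, beta)
print(beta, probs)
```

Any other distribution on the same levels with the same mean energy, for example (0.4, 0.3, 0.2, 0.1), has a Shannon entropy no larger than that of `probs`.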

Recently, there is an endeavour in the applied sciences (see, e.g. [

Since Shannon introduced his entropy, several other classes of probabilistic uncertainty measures (entropies) have been described in the literature (see, e.g., [

(i) The Shannon-like uncertainty measures which for a certain value of the corresponding parameters converge towards the Shannon entropy, e.g., Rényi’s entropy

(ii) The Maassen and Uffink uncertainty measures, which also converge, under certain conditions, to the Shannon entropy,

(iii) The uncertainty measures having no direct connection to Shannon entropy, e.g., the information “energy”, defined in information theory as

$$E = \sum_i P_i^2$$

and called the Hilbert-Schmidt norm in quantum physics. The most important uncertainty measures of the first class are:

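(i) The Rényi entropy, defined as follows:

$$H_\alpha^{(R)} = \frac{1}{1-\alpha}\,\log \sum_i P_i^\alpha, \qquad \alpha > 0,\ \alpha \neq 1.$$

(ii) The Havrda-Charvat entropy (or $\alpha$-entropy), defined as

$$H_\alpha^{(HC)} = \frac{1}{1-\alpha}\left(\sum_i P_i^\alpha - 1\right), \qquad \alpha > 0,\ \alpha \neq 1.$$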

For the sake of completeness, we list some other entropy-like uncertainty measures presented in the literature [

(i) The trigonometric entropy is defined as [

(ii) The R-norm entropy

All the above-listed Shannon-like entropies converge towards Shannon entropy if

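A quick inspection shows that all five Shannon-like entropies listed above are mutually functionally related. For example, each of the Havrda-Charvat entropies can be expressed as a function of the Rényi entropy of the same order $\alpha$, and vice versa, because both depend on the probabilities only through the sum $\sum_i P_i^\alpha$. Writing $H_\alpha^{(R)}$ and $H_\alpha^{(HC)}$ for the two entropies, and taking the natural logarithm in the Rényi entropy,

$$H_\alpha^{(HC)} = \frac{1}{1-\alpha}\left[e^{(1-\alpha) H_\alpha^{(R)}} - 1\right].$$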
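Both the convergence towards Shannon entropy and the functional relation between the Rényi and Havrda-Charvat entropies can be checked numerically. The sketch assumes the common forms $\frac{1}{1-\alpha}\ln\sum_i P_i^\alpha$ (Rényi) and $\frac{1}{1-\alpha}\left(\sum_i P_i^\alpha - 1\right)$ (Havrda-Charvat in its Tsallis normalization; other normalizations appear in the literature), with natural logarithms throughout. Function names are illustrative.

```python
import math

def shannon(probs):
    return sum(-p * math.log(p) for p in probs if p > 0)

def renyi(probs, alpha):
    """Renyi entropy of order alpha (alpha > 0, alpha != 1), natural log."""
    return math.log(sum(p ** alpha for p in probs if p > 0)) / (1.0 - alpha)

def havrda_charvat(probs, alpha):
    """Havrda-Charvat entropy in the Tsallis normalization (assumed here)."""
    return (sum(p ** alpha for p in probs if p > 0) - 1.0) / (1.0 - alpha)

def hc_from_renyi(h_renyi, alpha):
    """Express the Havrda-Charvat entropy through the Renyi entropy."""
    return (math.exp((1.0 - alpha) * h_renyi) - 1.0) / (1.0 - alpha)

probs = [0.5, 0.3, 0.2]
for alpha in (2.0, 1.1, 1.001):
    print(alpha, renyi(probs, alpha), havrda_charvat(probs, alpha), shannon(probs))
```

As $\alpha$ approaches 1, both parametric entropies approach the Shannon value printed in the last column.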

There are six properties which are usually considered desirable for a measure of the uncertainty of a random trial: (i) symmetry, (ii) expansibility, (iii) subadditivity, (iv) additivity, (v) normalization, and (vi) continuity. The only uncertainty measure which satisfies all these requirements is Shannon entropy. Each of the other entropies violates at least one of them; e.g., Rényi’s entropy violates only the subadditivity property, Havrda-Charvat’s entropy violates the additivity property, and the R-norm entropies violate both subadditivity and additivity. More details about the properties of each entropy can be found elsewhere. All these classes of entropies represent probabilistic uncertainty measures which have mathematical properties similar to those of Shannon entropy.

The best known Shannon-like probabilistic uncertainty measure is the Havrda and Charvat entropy [

The entropic uncertainty measures for a discrete random variable are exactly defined within probability theory. The transition from the discrete to the continuous entropic uncertainty measures is, however, not always unique and still has many open problems. A continuous random variable ^{7}

As is well known, the Shannon entropy functionals of some continuous variables are complicated integrals which are often difficult to compute analytically or even numerically. Anyone who has tried to calculate the differential entropies of continuous variables analytically knows how difficult it may be. From the purely mathematical point of view, the differential entropy can be taken as a formula expressing the spread of any standard single-valued function (the probability density belongs to this class of functions). Generally, the Shannon entropy functional assigns to a probability density function (belonging to the class of functions

The Shannon entropy functional was studied just at the beginning of information theory [

(i) The Rényi entropic functional [

(ii) The Havrda-Charvat entropic functional [

(iii) The trigonometric entropic functional [

Note that

As it is known long ago, the entropy functionals ^{8,9}.
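As an illustration of the differential entropy discussed in this section, the integral can be evaluated numerically for a density whose entropy is known in closed form. The sketch assumes the standard definition $-\int p(x)\ln p(x)\,dx$ and a Gaussian density, for which the exact value is $\tfrac{1}{2}\ln(2\pi e \sigma^2)$. The integration range, step count and function names are illustrative choices.

```python
import math

def gaussian_pdf(x, sigma):
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def differential_entropy(pdf, lower, upper, steps=20000):
    """Trapezoidal estimate of -integral of p ln p over [lower, upper]."""
    h = (upper - lower) / steps
    total = 0.0
    for i in range(steps + 1):
        p = pdf(lower + i * h)
        term = -p * math.log(p) if p > 0 else 0.0
        total += term if 0 < i < steps else 0.5 * term
    return total * h

def gaussian_entropy_numeric(sigma):
    """Integrate over +-12 sigma, where the neglected tails are negligible."""
    return differential_entropy(lambda x: gaussian_pdf(x, sigma), -12 * sigma, 12 * sigma)

def gaussian_entropy_exact(sigma):
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)

for sigma in (2.0, 0.1):
    print(sigma, gaussian_entropy_numeric(sigma), gaussian_entropy_exact(sigma))
```

For the narrow density the result is negative, which a discrete Shannon entropy can never be. This is one concrete sign that the passage from discrete to continuous measures is not straightforward.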

From what has been said so far it follows:

(i) The concept of entropy is inherently connected with the probability distribution of the outcomes of a random trial. The entropy quantifies the probabilistic uncertainty of a general random trial.

(ii) There are two ways to express the uncertainty of a random trial: the moment and the probabilistic measures. The former include in their definition both the values assigned to the trial outcomes and their probabilities. The latter contain in their definition only the corresponding probabilities. The moment uncertainty measures are given, as a rule, by the higher statistical moments of a random variable, whereas the probabilistic measure is expressed by means of entropy. The most important probabilistic uncertainty measure is the Shannon entropy, defined by the formula

$$H = -\sum_i P_i \log P_i,$$

where $P_i$ are the probabilities of the individual outcomes of the random trial.

(iii) By means of Shannon entropy it is possible to quantify the configurational order in the set of elements of a general system. The corresponding quantity is called the Watanabe measure of configurational order and is defined as follows: configurational order of a system = (sum of entropies of the parts of the system) − (entropy of the whole system).

This measure expresses quantitatively the property of a configurationally organized system to have order among its elements, which causes the system as a whole to behave in a more deterministic way than its individual parts.

(iv) The asymptotic equality between the Boltzmann and Shannon entropies for statistical systems with a large number of particles makes it possible to use the Shannon entropy for describing statistical ensembles.

(v) Besides the Shannon entropy there exists a class of so-called Shannon-like entropies. The most important Shannon-like entropies are (a) the Rényi entropy, Equation (14); (b) the Havrda-Charvat entropy, Equation (13). The well-known Tsallis entropy is mathematically identical with the Havrda-Charvat entropy.

In conclusion, we can state that the concept of entropy is inherently connected with probability theory. The applications of Shannon entropy in science are versatile. Besides the mentioned statistical physics, Shannon entropy is used in metrology, in biological physics, in quantum physics and even in cosmology. Entropy expresses the extent of the randomness of a probabilistic (statistical) system and therefore belongs to the important quantities for describing natural phenomena. This is why entropy is a fundamental quantity in physics, next to energy.

We ask from the Shannon entropy the following desirable properties [

(i)

(ii) If probability distribution of

(iii) For

(iv)

with equality if and only if

(v) If

where the conditional entropy, defined as

(vi) Using notation given above it holds

The equality in (A4) is valid if and only if

in which case (A3) becomes

All these properties can be proved in an elementary manner. Without entering into the technical details, we note that properties (i)-(iii) are obvious while property (v) can be obtained by a straightforward computation taking into account only the definition of entropy. Finally, from Jensen’s inequality

applied to the concave function

The interpretation of the above properties agrees with common sense, intuition, and the reasonable requirements that can be asked of a measure of uncertainty. Indeed, a random experiment which has only one possible outcome (that is, a strictly deterministic trial) contains no uncertainty at all; we know what will happen before performing the experiment. This is just property (ii). If to the possible outcomes of a random trial we add another outcome having the probability zero, the amount of uncertainty with respect to what will happen in the trial remains unchanged (property (iii)). Property (iv) tells us that in the class of all probabilistic trials having

where

and

Here

where

and

Here

is the conditional entropy of

which is the so-called “uncertainty balance”, the only conservation law for entropy.
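The uncertainty balance can be verified on a small joint distribution. The sketch assumes the standard form $H(X) + H(Y|X) = H(Y) + H(X|Y) = H(X,Y)$, with the conditional entropy computed as the probability-weighted mean of the entropies of the conditional distributions. The joint distribution and the function names are illustrative.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def marginal(joint, axis):
    """Marginal distribution of component `axis` of a joint distribution."""
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0.0) + p
    return out

def conditional_entropy(joint, given_axis):
    """Entropy of the other component, averaged over the given component."""
    total = 0.0
    for k, p_k in marginal(joint, given_axis).items():
        if p_k == 0:
            continue
        row = [p / p_k for pair, p in joint.items() if pair[given_axis] == k]
        total += p_k * entropy(row)
    return total

joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
h_xy = entropy(joint.values())
h_x = entropy(marginal(joint, 0).values())
h_y = entropy(marginal(joint, 1).values())
h_y_given_x = conditional_entropy(joint, 0)
h_x_given_y = conditional_entropy(joint, 1)

print(h_x + h_y_given_x, h_y + h_x_given_y, h_xy)
```

In this example knowing the first variable lowers the uncertainty about the second, so `h_y_given_x` is smaller than `h_y`.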

Finally, property (vi) shows that some data on

with equality if and only if

with equality if and only if

Fortunately this inequality holds for any number of components. More generally, for

with equality if and only if

measures the global dependence between the random variables

Note that the difference between the amount of uncertainty contained by the pair

or, equivalently,

is the distance between the random variables

Khinchin [