An Algorithm to Generate Probabilities with Specified Entropy

The present communication offers a method to determine an unknown discrete probability distribution, close to the uniform distribution, with a specified Tsallis entropy. Through its relative error, the distribution obtained has been compared with the distribution obtained with the help of Mathematica software. Applications of the proposed algorithm to Tsallis source coding, Huffman coding and cross-entropy optimization principles are provided.


Introduction
With the publication of his paper "A mathematical theory of communication", Shannon [1] laid the foundations of entropy theory, which immediately caught the interest of engineers, mathematicians and scientists from various disciplines. Naturally, the nature of information had been speculated on before Shannon, but only at a qualitative level; it was Shannon who first introduced the following quantitative measure of information in a statistical framework:

$$H(P) = -\sum_{i=1}^{n} p_i \log p_i, \tag{1}$$

with the convention $0 \log 0 := 0$. Shannon's main focus was on the type of communication problems that arise in the engineering sciences, but as the field of information theory progressed, it became clear that Shannon's entropy was not the only feasible information measure. Indeed, many modern communication processes, including signals, images and coding systems, often operate in complex environments dominated by conditions that do not match the basic tenets of Shannon's communication theory. For instance, coding can have non-trivial cost functions, codes might have variable lengths, and sources and channels may exhibit memory or losses. Following the post-Shannon developments of non-parametric entropy, it was realized that generalized parametric measures of entropy can play a significant role in dealing with such situations, since these measures introduce flexibility into the system and are also helpful in maximization problems.
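Definition (1) can be evaluated directly. The following is a minimal Python sketch; the function name `shannon_entropy` and the default base 2 (bits) are our own choices, and zero probabilities are skipped, which implements the convention $0 \log 0 := 0$.

```python
import math

def shannon_entropy(p, base=2.0):
    """Shannon entropy H(P) = -sum_i p_i log p_i, with the convention 0 log 0 = 0."""
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with p_i = 0 are skipped, which implements 0 log 0 := 0.
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0)

print(shannon_entropy([0.5, 0.25, 0.25, 0.0]))  # about 1.5 bits
```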
An extension of the Shannon entropy, proposed by Rényi [2], is given by

$$R_\alpha(P) = \frac{1}{1-\alpha}\log\sum_{i=1}^{n} p_i^{\alpha}, \qquad \alpha > 0,\ \alpha \neq 1. \tag{2}$$

The Rényi entropy offers a parametric family of measures, from which the Shannon entropy is recovered as a special case when $\alpha \to 1$. Another information theorist, Tsallis [3], introduced his measure of entropy, given by

$$S_q(P) = \frac{1}{1-q}\left(\sum_{i=1}^{n} p_i^{q} - 1\right), \qquad q > 0,\ q \neq 1. \tag{3}$$

When $q \to 1$, the Tsallis entropy recovers the Shannon entropy for any probability distribution. The Tsallis [4] entropy has been postulated to form the basis of a nonextensive generalization of statistical mechanics. Tsallis's pioneering work has stimulated the exploration of the properties of other generalized or alternative information measures [5] [6]. Oikonomou and Bagci [7] maximized the Tsallis entropy based on complete deformed functions to show that the escort distributions are redundant. Bercher [8] showed that Tsallis distributions can be derived from the standard (Shannon) maximum entropy setting by incorporating a constraint on the divergence between the distribution and another distribution imagined as its tail.
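The limits $\alpha \to 1$ and $q \to 1$ can be observed numerically. The sketch below assumes natural logarithms throughout; the function names are illustrative. It evaluates (2) and (3) for parameters approaching 1 and compares them with the Shannon entropy (1) in nats.

```python
import math

def renyi_entropy(p, alpha):
    """Renyi entropy (2) in nats: log(sum_i p_i^alpha) / (1 - alpha)."""
    return math.log(sum(pi ** alpha for pi in p if pi > 0)) / (1.0 - alpha)

def tsallis_entropy(p, q):
    """Tsallis entropy (3): (sum_i p_i^q - 1) / (1 - q)."""
    return (sum(pi ** q for pi in p if pi > 0) - 1.0) / (1.0 - q)

def shannon_nats(p):
    """Shannon entropy (1) with natural logarithms."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

p = [0.5, 0.3, 0.2]
for eps in (1e-2, 1e-4, 1e-6):
    # Both generalized entropies approach the Shannon value as the parameter tends to 1.
    print(eps, renyi_entropy(p, 1 + eps), tsallis_entropy(p, 1 + eps), shannon_nats(p))
```

For $q$ near 1 the Tsallis value differs from the Shannon value by a term of order $q - 1$, so the printed columns converge as `eps` shrinks.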
Information theory provides fundamental performance limits for certain information-processing tasks, such as data compression, error-correction coding, encryption, data hiding, prediction, and estimation of signals or parameters from noisy observations. Shannon [1] gave an operational meaning to his entropy through a source coding theorem that establishes the limits of possible data compression. Bercher [9] discussed the relevance of escort distributions and Rényi entropy in the context of source coding, whereas Parkash and Kakkar [10] developed new mean codeword lengths and proved source coding theorems. Huffman [11] introduced a procedure for designing a variable-length source code that achieves performance close to Shannon's entropy bound. Baer [12] provided new lower and upper bounds for the compression rate of binary prefix codes optimized over memoryless sources. Mohajer et al. [13] studied the redundancy of Huffman codes, whereas Walder et al. [14] provided algorithms for fast decoding of variable-length codes.
In Section 2, we provide an algorithm to find a discrete distribution close to the uniform distribution with a specified Tsallis [3] entropy. We demonstrate the acceptability of the algorithm through an example, by comparing the probability distribution generated by the algorithm with the probability distribution generated by Mathematica software in terms of relative error. Section 3 gives a brief introduction to source coding and studies source coding with the Tsallis entropy. We also extend the applications of the algorithm to Tsallis source coding, Huffman coding and cross-entropy optimization principles.

Generating Probability Distribution Closer to Uniform Distribution with Known Entropy
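Tsallis introduced the generalized q-logarithm function, defined as

$$\log_q x = \frac{x^{1-q} - 1}{1-q}, \qquad x > 0, \tag{4}$$

which for $q = 1$ becomes the natural logarithm. Its inverse is the generalized q-exponential function, given by

$$e_q(x) = \left[1 + (1-q)\,x\right]^{\frac{1}{1-q}}, \qquad 1 + (1-q)\,x > 0, \tag{5}$$

which becomes the exponential function for $q = 1$. The q-logarithm satisfies the following pseudo-additive law:

$$\log_q(xy) = \log_q x + \log_q y + (1-q)\log_q x\,\log_q y. \tag{6}$$

Note that the classical power and additive laws of the logarithm and the exponential no longer hold for the q-logarithm and the q-exponential. Except for $q = 1$, in general $\log_q x^{\alpha} \neq \alpha \log_q x$. The Tsallis entropy (3) can be written as an expectation of the generalized q-logarithm:

$$S_q(P) = \sum_{i=1}^{n} p_i \log_q \frac{1}{p_i}.$$

Let us suppose that there are $n \ge 3$ probabilities to be found, and write $S_q^n(P)$ for the Tsallis entropy of the n-component distribution $P$. Separating the $n$th probability and renaming it $q_n$, that is, $p_n = q_n$, we have

$$S_q^n(P) = \sum_{i=1}^{n-1} p_i \log_q \frac{1}{p_i} + q_n \log_q \frac{1}{q_n}. \tag{7}$$

Multiplying and dividing each $p_i$, $i = 1, 2, \ldots, n-1$, by $(1-q_n)$ and setting $r_i = p_i/(1-q_n)$, the pseudo-additive law (6) gives

$$\log_q \frac{1}{p_i} = \log_q \frac{1}{1-q_n} + (1-q_n)^{q-1}\log_q \frac{1}{r_i}.$$

Since $\sum_{i=1}^{n-1} r_i = 1$, the vector $r = (r_1, r_2, \ldots, r_{n-1})$ is a probability vector, and expression (7) becomes

$$S_q^n(P) = h_q(q_n) + (1-q_n)^{q}\, S_q^{n-1}(r), \tag{8}$$

where $S_q^{n-1}(r) = \sum_{i=1}^{n-1} r_i \log_q (1/r_i)$ is the Tsallis entropy of the $(n-1)$ normalized variables and $h_q$ is the binary entropy function

$$h_q(x) = x\log_q\frac{1}{x} + (1-x)\log_q\frac{1}{1-x} = \frac{x^{q} + (1-x)^{q} - 1}{1-q}. \tag{9}$$

Thus,

$$S_q^{n-1}(r) = \frac{S_q^n(P) - h_q(q_n)}{(1-q_n)^{q}}. \tag{10}$$

The maximum value of the Tsallis entropy subject to the natural constraint $\sum_{i=1}^{n} p_i = 1$ is attained at the uniform distribution and equals $\log_q n$. So we have $0 \le S_q^n(P) \le \log_q n$. In a similar way, $S_q^{n-1}(r)$, being the entropy of $(n-1)$ variables with probability vector $r$, must satisfy $0 \le S_q^{n-1}(r) \le \log_q(n-1)$. Hence, $q_n$ must satisfy the requirements

$$0 < q_n < 1, \tag{11}$$

$$0 \le \frac{S_q^n(P) - h_q(q_n)}{(1-q_n)^{q}} \le \log_q(n-1). \tag{12}$$

The objective of the present paper is to find $q_n$ subject to conditions (11) and (12) so as to obtain the next-stage entropy $S_q^{n-1}(r)$ from (10). It is to be noted that the next iteration's entropy $S_q^{n-1}(r)$ may be larger or smaller than $S_q^n(P)$. The procedure may be iterated until only two variables, $r_1$ and $r_2$, remain. The remaining entropy $S_q^2$ of these normalized variables satisfies the usual binary entropy equation

$$h_q(r_1) = S_q^2. \tag{13}$$

To finish the selection of the $q_j$, take $q_1 = r_1$ and $q_2 = r_2 = 1 - q_1$ as one of the two solutions of (13). The set of scaled values $q_1$ through $q_n$ is obtained at the end. Swaszek and Wali [15] used Shannon's entropy and, to find the corresponding probability distribution, provided the following relation between the $p_k$'s and the $q_k$'s after recursing through the sequential definitions of the $r$ vectors:

$$p_k = q_k \prod_{j=k+1}^{n}(1-q_j), \quad k = 2, 3, \ldots, n, \qquad p_1 = q_1 \prod_{j=3}^{n}(1-q_j), \tag{14}$$

where the empty product is taken as 1, so that $p_n = q_n$. The product for $p_1$ starts at $j = 3$ because $q_1$ and $q_2 = 1 - q_1$ share the same remaining mass.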
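The q-logarithm identities and the splitting of the entropy used above can be checked numerically. The sketch below uses helper names of our own; it tests the pseudo-additive law (6), the inverse relation between the q-logarithm (4) and the q-exponential, and the split of $S_q(P)$ into the binary Tsallis entropy of $q_n$ plus $(1-q_n)^q$ times the entropy of the renormalized vector $r$.

```python
import math

def log_q(x, q):
    """Generalized q-logarithm (4); the natural logarithm when q = 1."""
    return math.log(x) if q == 1 else (x ** (1.0 - q) - 1.0) / (1.0 - q)

def exp_q(x, q):
    """Generalized q-exponential, the inverse of log_q (needs 1 + (1-q)x > 0)."""
    return math.exp(x) if q == 1 else (1.0 + (1.0 - q) * x) ** (1.0 / (1.0 - q))

def tsallis(p, q):
    """Tsallis entropy as the expectation sum_i p_i log_q(1/p_i)."""
    return sum(pi * log_q(1.0 / pi, q) for pi in p if pi > 0)

def h_q(x, q):
    """Binary Tsallis entropy of (x, 1 - x)."""
    return tsallis([x, 1.0 - x], q)

q = 0.7
x, y = 2.5, 4.0
# Pseudo-additive law (6): the gap should vanish up to rounding.
additivity_gap = log_q(x * y, q) - (log_q(x, q) + log_q(y, q)
                                    + (1 - q) * log_q(x, q) * log_q(y, q))

# Split off q_n = p_n and renormalize the remaining probabilities to r.
p = [0.1, 0.2, 0.3, 0.4]
qn = p[-1]
r = [pi / (1 - qn) for pi in p[:-1]]
decomposition_gap = tsallis(p, q) - (h_q(qn, q) + (1 - qn) ** q * tsallis(r, q))

print(additivity_gap, decomposition_gap, exp_q(log_q(3.0, q), q))
```

Both printed gaps should be at the level of floating-point rounding, and the last value should recover 3.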
We also use relation (14) to find a probability distribution $P = (p_1, p_2, \ldots, p_n)$ with a specified Tsallis entropy that is close to the uniform distribution.
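Relation (14) and the recursion that produces the scaled values can be written as a pair of inverse maps. This is a minimal sketch under our own conventions: 0-based Python lists (index 0 holds $q_1$), with the last two scaled values treated as the two-variable stage $r_1$ and $r_2 = 1 - r_1$. The function names are hypothetical.

```python
def probabilities_from_scaled(qs):
    """Relation (14): p_k = q_k * prod_{j>k} (1 - q_j) for k >= 2; p_1 shares q_2's mass."""
    n = len(qs)
    p = [0.0] * n
    tail = 1.0                         # product of (1 - q_j) over the stages already undone
    for k in range(n - 1, 1, -1):      # 0-based k = n-1, ..., 2, i.e. q_n, ..., q_3
        p[k] = qs[k] * tail
        tail *= 1.0 - qs[k]
    # Two-variable stage: q_1 = r_1 and q_2 = r_2 = 1 - r_1 split the remaining mass.
    p[1] = qs[1] * tail
    p[0] = qs[0] * tail
    return p

def scaled_from_probabilities(p):
    """Inverse recursion: q_n = p_n, then renormalize what is left (the r vectors)."""
    n = len(p)
    qs = [0.0] * n
    remaining = 1.0                    # mass not yet assigned, equal to prod (1 - q_j)
    for k in range(n - 1, 1, -1):
        qs[k] = p[k] / remaining
        remaining -= p[k]
    qs[1] = p[1] / remaining
    qs[0] = p[0] / remaining
    return qs

qs = scaled_from_probabilities([0.1, 0.2, 0.3, 0.4])
print(qs, probabilities_from_scaled(qs))
```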

Method A
1) For given $q$, $k$ and $S_q^k$, find the solutions of the following equation: …
2) …
3) Generate a random number $q_k$ in the interval [ … ].
4) Repeat the above three steps for $k = n, n-1, \ldots, 3$.
5) For $k = 2$, solve Equation (13) to obtain $r_1$ and $r_2$.
6) Take $q_1 = r_1$ and $q_2 = r_2$.
7) Use Equation (14) to obtain the probability distribution $P = (p_1, p_2, \ldots, p_n)$ close to the uniform distribution.
Note: Before specifying the values of the parameter $q$ and the entropy $S_q^n$, make sure to choose a value of $q$ for which the given entropy is less than or equal to its maximum value, which is attained at the uniform distribution, that is, $S_q^n \le \log_q n$.
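The sketch below is one possible reading of Method A, not the authors' implementation. It assumes that the interval in step 3 is the set of values of $q_k$ around $1/k$ that satisfy conditions (11) and (12), locates its end points by bisection, draws $q_k$ uniformly from it, updates the stage entropy, solves the binary equation (13) with $r_1 \le 1/2$, and rebuilds $P$ with relation (14). It also requires the stricter condition $S_q^n < \log_q n$ so that the interval is non-degenerate. All function names are hypothetical.

```python
import math
import random

def log_q(x, q):
    """q-logarithm (4); natural log when q = 1."""
    return math.log(x) if q == 1 else (x ** (1.0 - q) - 1.0) / (1.0 - q)

def tsallis(p, q):
    """Tsallis entropy (3) written as sum_i p_i log_q(1/p_i)."""
    return sum(pi * log_q(1.0 / pi, q) for pi in p if pi > 0)

def h_q(x, q):
    """Binary Tsallis entropy of (x, 1 - x)."""
    return tsallis([x, 1.0 - x], q)

def bisect(f, lo, hi, iters=200):
    """Root of f in [lo, hi]; assumes f(lo) and f(hi) differ in sign."""
    lo_nonneg = f(lo) >= 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (f(mid) >= 0) == lo_nonneg:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def method_a_sketch(n, q, S, rng=random):
    """Hypothetical reading of Method A: returns (p_1, ..., p_n) with S_q(P) = S."""
    if n < 3 or not 0 < S < log_q(n, q):
        raise ValueError("need n >= 3 and 0 < S < log_q(n)")
    qs = [0.0] * (n + 1)                      # 1-based: qs[k] holds q_k
    for k in range(n, 2, -1):                 # k = n, n-1, ..., 3
        # upper(x) >= 0 iff the next-stage entropy stays <= log_q(k-1), cf. (12).
        def upper(x, k=k, S=S):
            return h_q(x, q) + (1.0 - x) ** q * log_q(k - 1, q) - S
        a = 0.0 if upper(0.0) >= 0 else bisect(upper, 0.0, 1.0 / k)
        b = bisect(upper, 1.0 / k, 1.0)
        if h_q(0.5, q) > S:                   # keep the next-stage entropy >= 0, cf. (12)
            b = min(b, bisect(lambda x, S=S: h_q(x, q) - S, 0.0, 0.5))
        if a > b:
            raise ValueError("no admissible q_k at stage %d" % k)
        qs[k] = rng.uniform(a, b)             # random q_k in the admissible interval
        S = (S - h_q(qs[k], q)) / (1.0 - qs[k]) ** q   # next-stage entropy
    r1 = bisect(lambda x: h_q(x, q) - S, 0.0, 0.5)     # binary equation (13)
    qs[1], qs[2] = r1, 1.0 - r1
    p, tail = [0.0] * (n + 1), 1.0            # relation (14), from p_n downwards
    for k in range(n, 2, -1):
        p[k] = qs[k] * tail
        tail *= 1.0 - qs[k]
    p[1], p[2] = qs[1] * tail, qs[2] * tail
    return p[1:]

P = method_a_sketch(8, 0.5, 3.6, random.Random(1))
print([round(x, 4) for x in P], tsallis(P, 0.5))
```

Because each stage splits the entropy exactly, the Tsallis entropy of the returned vector matches the requested value up to rounding. How close $P$ is to uniform depends on how far $S$ lies below $\log_q n$: the nearer $S$ is to the maximum, the narrower each admissible interval around $1/k$.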

Numerical Example
Proceeding along similar lines, we obtain the remaining probabilities, as shown in Table 1.
Note that the values $q_1$ and $q_2$ do not lie in the neighbourhood of $1/n$ because they are exact solutions of Equation (13), and the same is true of $p_1$ and $p_2$. With this exception, the other values, that is, $p_3, p_4, p_5, p_6, p_7, p_8$, are very close to the uniform distribution, and the associated entropy is 4.5 bits.

Use of Mathematica Software
The above-mentioned problem is also solved with Mathematica software using the same input. The NMinimize command, which offers several built-in optimization methods, is used for this purpose. Since the problem is to find the discrete distribution $P = (p_1, p_2, \ldots, p_n)$ that has a specified Tsallis entropy and is closest to the uniform distribution, the optimization problem becomes: Minimize … subject to $\sum_{i=1}^{n} p_i = 1$, $p_i \ge 0$ and $S_q(P) = S_q^n$. The solution obtained for $n = 8$ is

P = (0.141287, 0.141287, 0.141287, 0.141287, 0.141287, 0.141287, 0.141287, 0.0109883).

Relative Errors
The relative error is calculated using the formula … . It is found that the relative error, measured against the probability distribution found by Mathematica software, is 1.87081 in the case of method A. This implies that method A provides an acceptable discrete distribution, and hence the method itself is acceptable.

Source Coding
In source coding, one considers a set of symbols $X = \{x_1, x_2, \ldots, x_n\}$ and a source that produces symbols $x_i$ from $X$ with probabilities $p_i$, where $\sum_{i=1}^{n} p_i = 1$. The aim of source coding is to encode the source using an alphabet of size $D$, that is, to map each symbol $x_i$ to a codeword $c_i$ of length $l_i$ expressed using the $D$ letters of the alphabet. It is known that if the set of lengths $l_i$ satisfies Kraft's [16] inequality

$$\sum_{i=1}^{n} D^{-l_i} \le 1, \tag{15}$$

then there exists a uniquely decodable code with these lengths, which means that any sequence of codewords $c_{i_1} c_{i_2} \cdots c_{i_m}$ can be decoded unambiguously into the corresponding sequence of symbols $x_{i_1} x_{i_2} \cdots x_{i_m}$. Furthermore, any uniquely decodable code satisfies Kraft's inequality (15).
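Kraft's inequality (15) is easy to test exactly, and for binary codes a canonical construction exhibits a prefix code whenever it holds. This is a minimal sketch; exact fractions are used to avoid rounding in the sum, and the function names are our own.

```python
from fractions import Fraction

def kraft_sum(lengths, D=2):
    """Exact left-hand side of Kraft's inequality (15): sum_i D^(-l_i)."""
    return sum(Fraction(1, D ** l) for l in lengths)

def canonical_prefix_code(lengths):
    """Binary canonical prefix code; such a code exists whenever kraft_sum(lengths) <= 1.

    Codewords are returned for the lengths sorted in ascending order.
    """
    if kraft_sum(lengths) > 1:
        raise ValueError("lengths violate Kraft's inequality")
    codewords, value, prev = [], 0, 0
    for l in sorted(lengths):
        value <<= l - prev                 # extend the current codeword with zeros
        codewords.append(format(value, "0%db" % l))
        value += 1                         # next codeword of this length
        prev = l
    return codewords

print(kraft_sum([1, 2, 3, 3]), canonical_prefix_code([1, 2, 3, 3]))
print(kraft_sum([1, 1, 2]) <= 1)  # False: no uniquely decodable code has these lengths
```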
The Shannon [1] source coding theorem indicates that the mean codeword length

$$L = \sum_{i=1}^{n} p_i l_i \tag{16}$$

is bounded below by the entropy of the source, that is, by Shannon's entropy $H(P)$, and that the best uniquely decodable code satisfies

$$H(P) \le L < H(P) + 1,$$

where the logarithm in the definition of the Shannon entropy is taken to base $D$. This result indicates that the Shannon entropy $H(P)$ is the fundamental limit on the minimum average length of any code constructed for the source. The lengths of the individual codewords are given by $l_i = -\log_D p_i$, rounded up to $\lceil -\log_D p_i \rceil$ when integer lengths are required. The characteristic of these optimum codes is that they assign shorter codewords to the most likely symbols and longer codewords to unlikely symbols.
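The bound on the mean length can be checked with the integer Shannon lengths $\lceil -\log_D p_i \rceil$. In this sketch the distribution is purely illustrative and the function names are our own; a small tolerance guards against rounding noise when $p_i$ is an exact power of $D$.

```python
import math

def shannon_code_lengths(p, D=2):
    """Integer lengths l_i = ceil(-log_D p_i); these always satisfy Kraft's inequality."""
    # The tiny tolerance stops rounding noise (e.g. -log_D of an exact power of D)
    # from adding an extra digit.
    return [math.ceil(-math.log(pi, D) - 1e-12) for pi in p]

def entropy(p, D=2):
    """Shannon entropy (1) with logarithms to base D."""
    return -sum(pi * math.log(pi, D) for pi in p if pi > 0)

p = [0.4, 0.3, 0.2, 0.1]
lengths = shannon_code_lengths(p)
H = entropy(p)
L = sum(pi * li for pi, li in zip(p, lengths))   # mean codeword length (16)
print(lengths, H, L)   # the bound H(P) <= L < H(P) + 1 holds
```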

Source Coding with Campbell Measure of Length
Implicit in the use of the average codeword length (16) as a performance criterion is the assumption that cost varies linearly with code length. But this is not always the case. Campbell [17] introduced a mean codeword length which implies that cost is an exponential function of code length. The cost of encoding the source is expressed by the exponential average

$$C_t = \sum_{i=1}^{n} p_i D^{t l_i}, \tag{17}$$

where $t > 0$ is some parameter related to cost. Minimizing the cost is equivalent to minimizing the monotonic increasing function of $C_t$ defined as

$$L_t = \frac{1}{t}\log_D C_t = \frac{1}{t}\log_D \sum_{i=1}^{n} p_i D^{t l_i}. \tag{18}$$

$L_t$ is the exponentiated mean codeword length given by Campbell [17], which approaches $L$ as $t \to 0$.
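The exponential cost and Campbell's exponentiated length can be computed directly, which also shows the limit $L_t \to L$ as $t \to 0$. The code lengths below come from an illustrative binary prefix code, and `campbell_length` is our own name.

```python
import math

def campbell_length(p, lengths, t, D=2):
    """Exponentiated mean codeword length L_t = (1/t) log_D sum_i p_i D^(t l_i)."""
    C_t = sum(pi * D ** (t * li) for pi, li in zip(p, lengths))  # exponential cost
    return math.log(C_t, D) / t

p = [0.4, 0.3, 0.2, 0.1]
lengths = [1, 2, 3, 3]
L = sum(pi * li for pi, li in zip(p, lengths))  # ordinary mean length (16), here 1.9
for t in (1.0, 0.1, 0.001):
    print(t, campbell_length(p, lengths, t), L)
```

For $t > 0$, Jensen's inequality gives $L_t \ge L$, so the printed values decrease towards 1.9 as $t$ shrinks.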
Campbell proved that the Rényi [2] entropy forms a lower bound to the exponentiated codeword length $L_t$:

$$R_\alpha(P) \le L_t, \tag{19}$$

where

$$\alpha = \frac{1}{1+t}, \quad \text{or equivalently} \quad t = \frac{1-\alpha}{\alpha}, \tag{20}$$

subject to Kraft's inequality (15), with optimal lengths given by

$$l_i = -\log_D\left(\frac{p_i^{\alpha}}{\sum_{j=1}^{n} p_j^{\alpha}}\right). \tag{21}$$

By choosing a smaller value of $\alpha$, the individual lengths can be made smaller than the Shannon lengths $-\log_D p_i$, especially for small $p_i$. A similar approach is applied to provide an operational significance to the Tsallis [3] entropy through a source coding problem.
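Campbell's bound and optimal lengths can be checked numerically. With the real-valued optimal lengths, $L_t$ equals the Rényi entropy exactly; rounding them up gives integer lengths that still satisfy Kraft's inequality and lie within one unit of the bound. The distribution and $t = 1$ are illustrative choices, and the function names are ours.

```python
import math

def renyi(p, alpha, D=2):
    """Renyi entropy of order alpha with logarithms to base D."""
    return math.log(sum(pi ** alpha for pi in p), D) / (1 - alpha)

def campbell_length(p, lengths, t, D=2):
    """Exponentiated mean codeword length L_t."""
    return math.log(sum(pi * D ** (t * li) for pi, li in zip(p, lengths)), D) / t

def campbell_optimal_lengths(p, t, D=2):
    """Real-valued optimal lengths l_i = -log_D(p_i^alpha / sum_j p_j^alpha), alpha = 1/(1+t)."""
    alpha = 1.0 / (1.0 + t)
    z = sum(pj ** alpha for pj in p)
    return [-math.log(pi ** alpha / z, D) for pi in p]

p = [0.4, 0.3, 0.2, 0.1]
t = 1.0
alpha = 1.0 / (1.0 + t)
ideal = campbell_optimal_lengths(p, t)
integer = [math.ceil(l) for l in ideal]          # rounded-up lengths still satisfy Kraft
print(renyi(p, alpha), campbell_length(p, ideal, t), campbell_length(p, integer, t))
print(ideal[-1], -math.log2(p[-1]))              # shorter than the Shannon length for small p_i
```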
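From Rényi's entropy of order $\alpha$, where the logarithm is taken to base $D$, we have

$$\sum_{i=1}^{n} p_i^{\alpha} = D^{(1-\alpha)R_\alpha(P)}. \tag{22}$$

Substituting (22) in (3), with the parameter $q$ replaced by $\alpha$, gives

$$S_\alpha(P) = \frac{D^{(1-\alpha)R_\alpha(P)} - 1}{1-\alpha}. \tag{23}$$

Case I: when $0 < \alpha < 1$ (that is, $t > 0$), multiplying (19) by $1-\alpha > 0$, exponentiating to base $D$ and using (23) gives

$$S_\alpha(P) \le \frac{D^{(1-\alpha)L_t} - 1}{1-\alpha} = G_\alpha. \tag{24}$$

Case II: when $\alpha > 1$ (that is, $-1 < t < 0$), multiplying (19) by $1-\alpha < 0$ reverses the inequality, and dividing by $1-\alpha < 0$ in (23) reverses it again, so that

$$S_\alpha(P) \le G_\alpha. \tag{25}$$

From (24) and (25), it is observed that the Tsallis entropy $S_\alpha$ forms a lower bound to $G_\alpha$, which is the new generalized length and is a monotonic increasing function of $L_t$ (and hence, for $t > 0$, of $C_t$). It reduces to the mean codeword length $L$, measured in natural units (that is, up to the factor $\ln D$), when $\alpha \to 1$. Since $G_\alpha$ increases with $L_t$, its optimal codeword lengths are the same as those for Campbell's exponentiated mean codeword length, given by (21). $G_\alpha$ is not an average of the type $\varphi^{-1}\left(\sum_{i=1}^{n} p_i\,\varphi(l_i)\right)$ introduced by Kolmogorov [18] and Nagumo [19], but a simple expression of the q-deformed logarithm, namely $G_\alpha = \log_\alpha D^{L_t}$.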
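The inequalities (24) and (25) can be checked for a concrete code. This sketch assumes the form $G_\alpha = (D^{(1-\alpha)L_t} - 1)/(1-\alpha)$ with $t = 1/\alpha - 1$, which is the form implied by (22) and the substitution into (3). It evaluates one value of $\alpha$ on each side of 1, and also the limit $\alpha \to 1$, where $G_\alpha$ tends to $L \ln D$. The code lengths are illustrative.

```python
import math

def tsallis(p, alpha):
    """Tsallis entropy (3) with parameter alpha."""
    return (sum(pi ** alpha for pi in p) - 1.0) / (1.0 - alpha)

def campbell_length(p, lengths, t, D=2):
    """Exponentiated mean codeword length L_t for t > -1, t != 0."""
    return math.log(sum(pi * D ** (t * li) for pi, li in zip(p, lengths)), D) / t

def generalized_length(p, lengths, alpha, D=2):
    """Assumed form G_alpha = (D^((1-alpha) L_t) - 1) / (1 - alpha), t = 1/alpha - 1."""
    t = 1.0 / alpha - 1.0
    return (D ** ((1.0 - alpha) * campbell_length(p, lengths, t, D)) - 1.0) / (1.0 - alpha)

p = [0.4, 0.3, 0.2, 0.1]
lengths = [1, 2, 3, 3]            # a binary prefix code (Kraft sum = 1)
for alpha in (0.5, 2.0):          # Case I and Case II
    print(alpha, tsallis(p, alpha), generalized_length(p, lengths, alpha))
```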

Application of Method A
1) Huffman [11] introduced a method for designing variable-length source codes and showed that the average length of a Huffman code is always within one unit of the source entropy, that is, $H(P) \le L < H(P) + 1$, where $L$ is defined by (16) and $H(P)$ by (1). By using method A of Section 2, different probability distributions can be generated that are close to the uniform distribution and have the same Tsallis entropy. The distributions thus generated can be used to construct Huffman codes and to check whether the Huffman codeword lengths satisfy relation (24) or (25), where the Tsallis entropy forms a lower bound to the generalized length $G_\alpha$. Thus, the performance of the Huffman algorithm can be judged in the case of source coding with the Tsallis entropy.
In the following example, a Huffman code is constructed using the probability distribution obtained in Table 1. Let $x_1, x_2, \ldots, x_8$ be an array of symbols with probabilities in decreasing order, as shown below, where the Huffman method is employed to construct an optimal code. Using the probability distribution generated in Table 1, the known value of $\alpha$ and the above-mentioned codeword lengths, the value of $G_\alpha$ is obtained as 6.2327, which is greater than the Tsallis entropy (= 4.5 bits) and thus satisfies relation (24).
2) The problem of determining an unknown discrete distribution close to the uniform distribution with known Tsallis entropy, as discussed in Section 2, can be viewed in terms of the minimum cross-entropy principle, which states that, given any a priori distribution, we should choose the distribution that satisfies the given constraints and is closest to the a priori distribution. Cross-entropy optimization principles therefore offer a relevant context for the application of method A.
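The check described in item 1 can be sketched as follows. Binary Huffman lengths are computed with a heap, and $G_\alpha$ is compared with the Tsallis entropy. Because the entries of Table 1 are not reproduced here, the eight probabilities below are illustrative only (not the paper's data). The form of $G_\alpha$ is the one assumed in (24), and the function names are ours.

```python
import heapq
import itertools
import math

def huffman_lengths(p):
    """Binary Huffman codeword lengths, returned in the same order as p."""
    if len(p) == 1:
        return [1]
    tie = itertools.count()                   # tie-breaker: the heap never compares lists
    heap = [(pi, next(tie), [i]) for i, pi in enumerate(p)]
    heapq.heapify(heap)
    lengths = [0] * len(p)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)       # two least probable nodes
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:                     # each symbol below the merge gains one digit
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), s1 + s2))
    return lengths

def tsallis(p, alpha):
    """Tsallis entropy (3) with parameter alpha."""
    return (sum(pi ** alpha for pi in p) - 1.0) / (1.0 - alpha)

def generalized_length(p, lengths, alpha, D=2):
    """Assumed form G_alpha = (D^((1-alpha) L_t) - 1) / (1 - alpha), t = 1/alpha - 1."""
    t = 1.0 / alpha - 1.0
    L_t = math.log(sum(pi * D ** (t * li) for pi, li in zip(p, lengths)), D) / t
    return (D ** ((1.0 - alpha) * L_t) - 1.0) / (1.0 - alpha)

p = [0.16, 0.15, 0.14, 0.13, 0.12, 0.11, 0.10, 0.09]   # illustrative, not Table 1
alpha = 0.5
lengths = huffman_lengths(p)
print(lengths, generalized_length(p, lengths, alpha), tsallis(p, alpha))
```

For a near-uniform source of eight symbols, every Huffman codeword has length 3, and $G_\alpha$ stays slightly above the Tsallis entropy, consistent with (24).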

Table 1. Probabilities obtained through method A.