Optimum Probability Distribution for Minimum Redundancy of Source Coding ()
Keywords:Mean Codeword Length; Uniquely Decipherable Code; Kraft’s Inequality; Entropy; Optimum Probability Distribution; Escort Distribution; Source Coding
1. Introduction
Any message that brings a specification in a problem which involves a certain degree of uncertainty is called information and it was Shannon [1] who named this measure of information as entropy. In coding theory, the operational role of entropy comes from the source coding theorem which states that if
is the entropy of the source letters for a discrete memoryless source, then the sequence of source outputs cannot be represented by a binary sequence using fewer than
binary digits per source digit on the average, but it can be represented by a binary sequence using as close to
binary digits per source digit on the average as desired. To be clearer, let us consider the discrete source
that emits symbols
with probability distribution
where
. The aim of source coding is to encode the source using an alphabet of size
, that is, to map each symbol
to a codeword
of length
expressed using the
letters of the alphabet. It is known that if the set of lengths
satisfies Kraft’s [2] inequality
(1.1)
then there exists a uniquely decodable code with these lengths, which means that any sequence
can be decoded unambiguously into a sequence of symbols
. In this respect, Shannon [1] proved the first noiseless coding theorem for the uniquely decipherable code in the form of following inequality
(1.2)
where
is a Shannon’s entropy and
is the mean codeword length.
Later, Campbell [3] and Kapur [4] proved the source coding theorems for their own exponentiated mean codeword length in the form of following inequalities
(1.3)
and
(1.4)
respectively, where
![](https://www.scirp.org/html/htmlimages\11-7401895x\89532090-9669-4609-b886-f0670be81327.png)
is Campbell’s [3] mean codeword length,
![](https://www.scirp.org/html/htmlimages\11-7401895x\3b84ac34-aced-4f29-b25e-a92ed47d0eb5.png)
is Kapur’s [4] mean codeword length and
![](https://www.scirp.org/html/htmlimages\11-7401895x\412ecf27-80cf-4b07-8286-189ab3bf7e59.png)
is Renyi’s [5] measure of entropy.
Recently, Parkash and Kakkar [6] introduced two mean codeword lengths given by
(1.5)
and
(1.6)
Further, the authors provided two source coding theorems which show that for all uniquely decipherable codes, the mean codeword lengths
and
satisfy the relation:
(1.7)
and
(1.8)
respectively where
![](https://www.scirp.org/html/htmlimages\11-7401895x\79a5db07-0c6f-4337-b426-67a9df093550.png)
is a Kapur’s [4] two parameter additive measure of entropy and
![](https://www.scirp.org/html/htmlimages\11-7401895x\a606fa06-e9fd-439a-aab2-82bc54191125.png)
is measure of entropy developed by Parkash and Kakkar [6].
This is to emphasize that in the entire literature of source coding theorems, one can observe that the mean codeword length is lower bounded by the entropy of the source and it can never be less than the entropy of the source but can be made closer to it. This phenomenon provides the idea of absolute redundancy which is the number of bits used to transmit a message minus the number of bits of actual information in the message, that is, the mean codeword length minus the entropy of the source. The objective of the present communication is to minimize this redundancy in order to increase the efficiency of the source encoding. For this purpose we have made use of the concept of escort distribution as follows:
If
is the original distribution, then its escort distribution is given by ![](https://www.scirp.org/html/htmlimages\11-7401895x\a38fb5bc-b462-44b4-a5fe-2bef6fa91a7a.png)
where
for some parameter
. Many researchers including Harte [7], Bercher [8,9], Beck and Schloegl [10] etc. used this distribution in their respective findings.
The aim of the present paper is to obtain the optimum probability distribution with which the source should deliver messages in order to minimize the absolute redundancy. To obtain our goal, we have taken into consideration the above mentioned generalized mean codeword lengths. Moreover, the upper bound to these codeword lengths has been found for Huffman [4] encoding.
2. Optimum Probability Distribution to Minimize Absolute Redundancy
Let us assume that for discrete source
that emits symbols
with probability distribution
, the codewords
having lengths
, have been obtained using some encoding procedure on noiseless channel. Further, we assume that entropy of the source is
and average codeword length is
. Since from (1.7), we have
, therefore, the average redundancy of the source code is given by
(2.1)
where
and
.
In order to minimize the average redundancy, we resort to the following theorem:
Theorem 1: The optimum probability distribution that minimizes the absolute redundancy
of the source with entropy
and the mean codeword length
is the escort distribution, given by
(2.2)
Proof: To minimize the redundancy, we need to minimize
(2.3)
subject to the constraint
(2.4)
To prove this, we first of all, find the extremum of
which is equivalent to extremizing
and then use the fact that
is minimum or maximum will depend upon the value of parameter
.
So, in order to extremize
, we consider the Lagrangian given by
![](https://www.scirp.org/html/htmlimages\11-7401895x\ef10a84d-d471-4f6e-8066-dade5666c968.png)
where
is Lagrange’s multiplier.
Now
(2.5)
Letting
, we get
(2.6)
Substituting (2.6) in (2.4), we get
(2.7)
Substituting (2.7) in (2.6), we get the result (2.2).
Now,
(2.8)
We see that
for
and
for
.
Also,
![](https://www.scirp.org/html/htmlimages\11-7401895x\254d9785-405d-4b6a-b596-3fa34231db39.png)
So,
has minimum value for
and maximum for
.
Thus,
has minimum value for
and maximum for
and consequently, observing the function
, we see that it has minimum value for
,
.
Thus, the minimum value is given by
. (2.9)
Again, the necessary condition for the construction of uniquely decipherable codes is given by
(2.10)
Therefore, from (2.9), we have
.
NOTE: It is to be noted that
if the source is Huffman [11] encoded since for the Huffman encoding, we have
. (2.11)
Therefore, for this case, (2.2) becomes
(2.12)
Similarly, if we consider the codeword length
which satisfies the relation
, then the absolute redundancy of the source code in this case is given by
![](https://www.scirp.org/html/htmlimages\11-7401895x\ce12dcd8-fa66-44ea-8b72-dd6aa5223096.png)
where
and
.
Theorem 2. The optimum probability distribution that minimizes the absolute redundancy
of the source with entropy
and mean codeword length
is the escort distribution, given by
(2.13)
Proof: We will find the extremum of
which is equivalent to extremizing
subject to constraint
(2.14)
Let us consider the Lagrangian given by
(2.15)
where
is a Lagrange’s multiplier.
For an extremum, let
, that is,
(2.16)
Using (2.14), we get
(2.17)
Substituting (2.17) in (2.16), we get (2.13).
Also,
![](https://www.scirp.org/html/htmlimages\11-7401895x\d3efae45-82d7-4131-aa96-aa92adf4729c.png)
and
![](https://www.scirp.org/html/htmlimages\11-7401895x\b9a01411-c917-4ed2-8388-bade091c95e4.png)
So,
reaches its minimum value when
and is given by
![](https://www.scirp.org/html/htmlimages\11-7401895x\5532a6c9-d9bb-4c73-86d8-20cca5a1aba2.png)
that is,
![](https://www.scirp.org/html/htmlimages\11-7401895x\a7fabc7a-84aa-407c-8c77-196ab3c8ffeb.png)
Note: Again in this case also, if the source is Huffman [11] encoded, then the probabilities are given by ![](https://www.scirp.org/html/htmlimages\11-7401895x\59c5b678-7c4c-42bd-a699-82680a53943d.png)
Next, we will find the upper bound on the codeword lengths
and
when the source is Huffman encoded.
Theorem 3. The exponentiated codeword length
satisfies the following inequality
(2.18)
if the source is encoded using Huffman procedure.
Proof: The exponentiated codeword length
can be written in the following form
(2.19)
where
.
Considering (2.12), (2.19) becomes
(2.20)
where
.
We need to find the extremum of
subject to constraint
(as the source is encoded using Huffman Procedure).
For this purpose, we first of all, find the extremum of
which is equivalent to extremizing
and then use the fact that
is minimum or maximum depending upon the value of parameter
.
So, we consider the Lagrangian given by
(2.21)
where
is a Lagrange’s multiplier Put
, (2.21) becomes
![](https://www.scirp.org/html/htmlimages\11-7401895x\0b7c4162-6fb7-4f06-8c33-17c33ebcd19c.png)
Letting
, we get
(2.22)
Now,
gives
(2.23)
Using (2.23) in (2.22), we get
![](https://www.scirp.org/html/htmlimages\11-7401895x\e29b8614-62a5-41e9-a5bb-893852deb1f8.png)
that is, ![](https://www.scirp.org/html/htmlimages\11-7401895x\acb695cf-5aa9-46d4-b68e-ab1a913c81e9.png)
Now,
![](https://www.scirp.org/html/htmlimages\11-7401895x\7748b580-afb0-44cf-afbc-05e8d60682b1.png)
We see that
for
and
for
.
Also,
![](https://www.scirp.org/html/htmlimages\11-7401895x\ee442e45-e2d0-45d4-aa30-c6120c588b82.png)
So,
has minimum value for
and maximum for
.
Therefore,
has minimum value for
and maximum for
and consequently, observing the exponentiated mean codeword length
, we see that it has maximum value for
,
.
Thus, the maximum value is given by
.
Theorem 4. The mean codeword length
is upper bounded by
, that is,
(2.24)
if the source is encoded using Huffman procedure.
Proof: The exponentiated codeword length
can be written in the following form
(2.25)
We need to find the extremum of
subject to constraint
(as the source is encoded using Huffman Procedure).
So, we consider the Lagrangian given by
(2.26)
where
is a Lagrange’s multiplier .
Letting
, we get
(2.27)
Since
, we have
(2.28)
Substitute (2.28) in (2.27), we get
![](https://www.scirp.org/html/htmlimages\11-7401895x\ee3a05a1-f1fa-4dbf-ab4f-920252ef2a4e.png)
Now,
![](https://www.scirp.org/html/htmlimages\11-7401895x\42407d34-b329-4492-ae7c-3e6229748756.png)
Also,
![](https://www.scirp.org/html/htmlimages\11-7401895x\24ae5fac-7611-453b-8a87-fca58eadeed1.png)
So, the mean codeword length
has maximum value when
, and is given by
.
Note-I: For the case of Campbell’s codeword length
, we have from (1.3),
. So, the average redundancy of the source code in this case is given by
![](https://www.scirp.org/html/htmlimages\11-7401895x\09f80663-3662-4f11-a8dc-8e160e743bc0.png)
where
and ![](https://www.scirp.org/html/htmlimages\11-7401895x\dd6c71be-989e-4fc3-9470-a37be3857df1.png)
The absolute redundancy in the case of Campbell’s [3] mean codeword length is the same as in case of exponentiated mean codeword length
developed by Parkash and Kakkar [6] as given in (2.1). Thus, we see that similar results as proved in theorem (2.1) and theorem (2.3) hold for Campbell’s case also.
Note-II: Absolute redundancy when we use Kapur’s[4] mean codeword length is given by
![](https://www.scirp.org/html/htmlimages\11-7401895x\3b31de71-bdf6-4a81-9c69-54aefd108dca.png)
where ![](https://www.scirp.org/html/htmlimages\11-7401895x\bfff6254-99de-4130-b0e0-009da4f66c82.png)
Theorem 5: The optimum probability distribution that minimizes the absolute redundancy of the source with entropy
and mean codeword length
is given by
. (2.29)
Proof: To minimize the redundancy, we need to minimize
(2.30)
subject to the constraint
(2.31)
To prove this, we first of all find the extremum of
which is equivalent to extremizing
and then using the fact that
is minimum or maximum depending upon the value of parameter
.
So, in order to extremize
, we consider the Lagrangian given by
![](https://www.scirp.org/html/htmlimages\11-7401895x\7c1a3639-a4e4-4678-b621-98ff8b6f168c.png)
where
is Lagrange’s multiplier.
(2.32)
Letting
, we get
(2.33)
Substituting (2.33) in (2.31), we get
(2.34)
Substituting (2.34) in (2.33), we get the result (2.29).
Now, ![](https://www.scirp.org/html/htmlimages\11-7401895x\0b83f6c0-a3dd-42c3-b88b-9d0071483171.png)
We see that
for
and
for
.
Also, ![](https://www.scirp.org/html/htmlimages\11-7401895x\305048b7-56eb-41ac-8fb4-5040ee46d4c3.png)
So,
has maximum value for
and minimum value for
.
Therefore,
has maximum value for
and minimum value for
and consequently observing the function
, we see that it has minimum value for
,
.
The minimum value is given by
.
Theorem 6. The Kapur’s [8] mean codeword length
satisfies the following inequality
(2.35)
if the source is encoded using Huffman procedure.
Proof: Proceeding as in Theorem 2.3, we can prove the Theorem 6.
Acknowledgements
The authors are thankful to Council of Scientific and Industrial Research, New Delhi, for providing the financial assistance for the preparation of the manuscript.