1. Introduction
Zipf’s law and Benford’s law are long-tail rank distributions appearing in many large statistical ensembles [1]. Both laws are considered empirical laws. In 1881, Newcomb [2] found that the probability distribution $p(n)$ of the decimal digits $n$ in the first digits of the logarithmic table obeys $p(n) = \log_{10}(1 + 1/n)$, where $n = 1, 2, \ldots, 9$. Benford [3] found in 1938 that Newcomb’s distribution applies to many more ensembles and not only to the logarithmic table. Later [4] [5] the law was generalized for N ranks to be,
$$p(n) = \frac{\ln(1 + 1/n)}{\ln(N + 1)}. \tag{1}$$
Eleven years later, in 1949, Zipf [6] discovered that in long texts, in several languages, the most frequent word appears twice as often as the second most frequent word, the second most frequent word appears twice as often as the fourth most frequent word, and so on. The Zipfian distribution, similarly to Benford’s law, appears in many ensembles, such as populations of cities, bestseller lists, etc. Zipf’s law for rank $n$ can be written [7] [8] as,
$$p(n) = \frac{1}{n H_N}, \tag{2}$$
where $H_N = \sum_{k=1}^{N} \frac{1}{k}$ is the Nth harmonic number.
Both Zipf’s law and Benford’s law are obtained from the maximum entropy distribution of indistinguishable balls in N distinguishable boxes, where the boxes are the ranks and the number of the balls is much larger than the number of boxes [8]. In Figure 1, the Benford distribution and Zipf distribution for 10 ranks are plotted. It is seen that Benford’s law and Zipf’s law are similar but not identical.
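The comparison for 10 ranks can be reproduced numerically. The following is a minimal sketch, assuming the generalized Benford form $\ln(1 + 1/n)/\ln(N + 1)$ and the Zipf form $1/(n H_N)$; the function names `benford` and `zipf` are illustrative and not part of the original work.

```python
import math


def benford(N):
    """Generalized Benford probabilities for ranks 1..N: ln(1 + 1/n) / ln(N + 1)."""
    return [math.log(1 + 1 / n) / math.log(N + 1) for n in range(1, N + 1)]


def zipf(N):
    """Zipf probabilities for ranks 1..N: 1 / (n * H_N), H_N being the Nth harmonic number."""
    H_N = sum(1 / k for k in range(1, N + 1))
    return [1 / (n * H_N) for n in range(1, N + 1)]


if __name__ == "__main__":
    N = 10
    # Print the two distributions side by side, rank by rank.
    for n, (b, z) in enumerate(zip(benford(N), zipf(N)), start=1):
        print(f"rank {n:2d}: Benford {b:.4f}   Zipf {z:.4f}")
```

For N = 10 the first rank carries about 0.34 under Zipf’s law and about 0.29 under Benford’s law, so the Zipf distribution is the steeper of the two.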
Hereafter, we derive both laws using basic probabilistic tools and explain the differences between them. In addition, we derive the Pareto 20 - 80 rule of thumb for Benford’s law and discuss their origin and limitations.
2. Zipf’s Law
Suppose that there are N identical biscuits and a mouse in a closed space. The mouse eats one biscuit every day. What is the probability of a biscuit being eaten on day d?
The maximum number of survival days n that a biscuit has on day d is,
$$n = N - d + 1,$$
where $d = 1, 2, \ldots, N$.
Figure 1. The red bars are the Zipfian distribution and the blue bars are Benford’s law distribution for 10 ranks.
On the first day, $d = 1$, the biscuit has a maximum of $n = N$ days to survive. When $d = N$, the biscuit has only $n = 1$ day. The probability p of the biscuit being eaten is inversely proportional to n, namely, $p(n) \propto 1/n$; therefore, the normalized probability distribution is,
$$p(n) = \frac{1/n}{\sum_{k=1}^{N} 1/k} = \frac{1}{n H_N},$$
which is Zipf’s law.
We see that the probability of a biscuit being eaten, as a function of n, obeys Zipf’s law. This model, which is similar to the coupon collector problem, is identical to the word distribution of long texts. Suppose that one wants to write a text of N words. The first word has a probability of $1/N$, the second word $1/(N - 1)$, etc. In the discussion, we explain why the Zipf distribution is so general.
3. Benford’s Law
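Benford’s law is obtained by applying the Riemann sum to Zipf’s law [8] [9]. If we assume that n is continuous, then,
$$\frac{1}{n} \approx \int_{n}^{n+1} \frac{dx}{x} = \ln\left(1 + \frac{1}{n}\right)$$
and
$$H_N = \sum_{n=1}^{N} \frac{1}{n} \approx \int_{1}^{N+1} \frac{dx}{x} = \ln(N + 1).$$
Substituting these integrals in Zipf’s law (Equation (2)), we obtain Benford’s law (Equation (1)).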
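The quality of the continuous approximation can be inspected numerically. The sketch below assumes that each Zipf term $1/n$ is replaced by the integral of $1/x$ over $[n, n+1]$, which equals $\ln(1 + 1/n)$; the helper names are illustrative.

```python
import math


def harmonic(N):
    """Nth harmonic number H_N = 1 + 1/2 + ... + 1/N."""
    return sum(1 / k for k in range(1, N + 1))


def integral_term(n):
    """Integral of 1/x over [n, n + 1], which equals ln(1 + 1/n)."""
    return math.log(1 + 1 / n)


if __name__ == "__main__":
    # Term-by-term comparison: the gap shrinks quickly with the rank n.
    for n in (1, 2, 5, 50):
        print(f"n = {n:3d}: 1/n = {1 / n:.5f}, ln(1 + 1/n) = {integral_term(n):.5f}")
    # Normalization comparison: H_N against ln(N + 1).
    for N in (10, 100, 1000):
        print(f"N = {N:5d}: H_N = {harmonic(N):.5f}, ln(N + 1) = {math.log(N + 1):.5f}")
```

The integral terms telescope exactly to $\ln(N + 1)$, whereas $H_N$ exceeds $\ln(N + 1)$ by an amount that approaches the Euler-Mascheroni constant for large N. The two laws therefore differ mostly at the lowest ranks, where $1/n$ and $\ln(1 + 1/n)$ are furthest apart.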
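Benford’s law seems to approximate the more accurate Zipf’s law. However, under certain conditions, Benford’s law is more accurate than Zipf’s law. For example, suppose that a pig that eats M biscuits per day replaces the mouse in the example above. In this case, a day becomes a rank that contains M biscuits. Since there are M biscuits in a day, the probability of biscuit m being eaten on day n is,
$$p(n, m) \propto \frac{1}{nM + m}, \quad m = 1, 2, \ldots, M. \tag{3}$$
The probability of being eaten during the whole nth day is
$$p(n) \propto \sum_{m=1}^{M} \frac{1}{nM + m}.$$
Since
$$\sum_{m=1}^{M} \frac{1}{nM + m} = H_{(n+1)M} - H_{nM},$$
therefore,
$$p(n) \propto H_{(n+1)M} - H_{nM}.$$
For $M \gg 1$, we can use the approximation
$$H_k \approx \ln k + \gamma,$$
where $\gamma$ is the Euler-Mascheroni constant. Therefore,
$$p(n) \propto \ln\frac{(n+1)M}{nM} = \ln\left(1 + \frac{1}{n}\right). \tag{4}$$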
Equation (4), when renormalized, yields Benford’s law. It is seen that Benford’s law is obtained when there are sub-distributions inside Zipf’s ranks.
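The convergence of the sub-distributed ranks to Benford’s law can be checked numerically. The sketch below is written under an explicit assumption: biscuit m of day n carries a weight $1/(nM + m)$, an indexing chosen here so that the daily sum equals $H_{(n+1)M} - H_{nM}$. The function names are illustrative.

```python
import math


def pig_rank_weights(N, M):
    """Unnormalized weight of day n (n = 1..N): sum over m = 1..M of 1 / (n*M + m).

    The indexing n*M + m is an assumption made for this illustration.
    """
    return [sum(1 / (n * M + m) for m in range(1, M + 1)) for n in range(1, N + 1)]


def normalize(weights):
    """Scale the weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def benford(N):
    """Generalized Benford probabilities ln(1 + 1/n) / ln(N + 1) for ranks 1..N."""
    return [math.log(1 + 1 / n) / math.log(N + 1) for n in range(1, N + 1)]


def max_deviation(N, M):
    """Largest absolute difference between the pig model and Benford's law."""
    p = normalize(pig_rank_weights(N, M))
    return max(abs(a - b) for a, b in zip(p, benford(N)))


if __name__ == "__main__":
    N = 10
    for M in (10, 100, 1000):
        print(f"M = {M:5d}: max |p(n) - Benford(n)| = {max_deviation(N, M):.2e}")
```

Under this assumption the deviation from Benford’s law shrinks as M grows, in line with the requirement $M \gg 1$ for the harmonic-number approximation.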
4. Pareto 20 - 80 Rule of Thumb
In 1906, the Italian economist Vilfredo Pareto [10] observed that 20% of the people in his country owned 80% of the nation’s wealth. That rule was found to apply with uncanny accuracy to many situations and to be useful in many disciplines, including the study of business productivity. Hereafter we show that the Pareto principle can be easily calculated from Benford’s law. To do so we have to find the rank $n_0$ for which the sum of the probabilities up to $n_0$ is equal to the sum above it. In Benford’s law, the rank $n_0$ obeys,
$$\sum_{n=1}^{n_0} \ln\left(1 + \frac{1}{n}\right) = \sum_{n=n_0+1}^{N} \ln\left(1 + \frac{1}{n}\right),$$
which, since both sums telescope, yields
$$\ln(n_0 + 1) = \ln(N + 1) - \ln(n_0 + 1),$$
or
$$n_0 = \sqrt{N + 1} - 1. \tag{5}$$
The Pareto ratio is simply,
$$\frac{n_0}{N} = \frac{\sqrt{N + 1} - 1}{N}. \tag{6}$$
Therefore $n_0/N$ is the fraction of the ranks that have equal probability to the rest of the ranks and according to the Pareto rule is 0.2.
Zipf’s law does not fit the Pareto ratio calculation, as there is no distribution within the ranks and therefore a non-integer $n_0$ has no meaning. Benford’s law is used for fraud detection in financial reports [11] [12]. However, Benford’s distributions appear in many other statistics, of which a notable one is wealth distribution [13]. The Pareto 20 - 80 distribution and the Gini inequality index in free economies are in agreement with Benford’s law [14]. However, as was shown, Zipf’s law, Benford’s law and Pareto’s rule are sensitive to the number of ranks N. Namely, the same distribution of probabilities yields different ratios between the rank probabilities when N is changed. In Figure 2, we see that around $N \approx 10$, the ratio 20 - 80 is a pretty good approximation of Benford’s law distribution, which fits better for an economy in which the incomes within the ranks vary.
Figure 2. $n_0/N$ is the fraction of the ranks that has the same probability as the rest of the ranks for the Benford distribution. The Pareto 20 - 80 rule, $n_0/N = 0.2$, is valid in the vicinity of N = 10 ranks.
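The sensitivity of the ratio to the number of ranks can be tabulated directly. The sketch below assumes the closed form $n_0 = \sqrt{N + 1} - 1$, obtained by equating the cumulative Benford probability $\ln(n_0 + 1)/\ln(N + 1)$ to one half; the function names are illustrative.

```python
import math


def split_rank(N):
    """Rank n0 at which the cumulative Benford probability equals 1/2: sqrt(N + 1) - 1."""
    return math.sqrt(N + 1) - 1


def pareto_fraction(N):
    """Fraction of ranks n0 / N holding half of the total probability."""
    return split_rank(N) / N


def cumulative_benford(x, N):
    """Cumulative Benford probability up to a (possibly non-integer) rank x."""
    return math.log(x + 1) / math.log(N + 1)


if __name__ == "__main__":
    for N in (3, 5, 10, 15, 20, 50, 100):
        n0 = split_rank(N)
        print(f"N = {N:3d}: n0 = {n0:.3f}, n0/N = {pareto_fraction(N):.3f}, "
              f"cumulative = {cumulative_benford(n0, N):.3f}")
```

Under this closed form the fraction is about 0.23 at N = 10 and 0.2 at N = 15, and it keeps falling as N grows, which illustrates why the 20 - 80 ratio holds only for a limited range of rank counts.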
5. Discussion
The unequal probability distribution of the power laws is counterintuitive. If all the ranks have an equal probability of having an object, why do they not have an equal number of objects? The explanation comes from statistical mechanics. An ensemble of ranks and their probabilities of having indistinguishable objects is analogous to a microcanonical ensemble of N boxes and $\bar{n}N$ balls, where $\bar{n}$ is the average number of balls in a rank. The thermodynamic microcanonical ensemble conserves material, volume, and energy. In the boxes and balls ensemble, the material is the boxes and their number N represents the conservation of volume. The number of balls represents the conservation of energy. According to the second law, in equilibrium, the probabilities of the boxes having a ball are equal and all the microstates’ probabilities are equal. A microstate (a state of the ensemble) is a distinguishable configuration of all the balls in all the boxes [7]. These requirements are an outcome of the second law, one of whose formulations states that in equilibrium the entropy is maximum. Planck calculated the distribution of the balls in the boxes in 1901 [15] [16]. He maximized the entropy of a set of distinguishable oscillators having an average energy $k_B T$, where each ball (photon) has an energy $h\nu$. Here $k_B$ is the Boltzmann constant, T is the temperature, h is the Planck constant, and $\nu$ is the photon’s frequency. The famous Planck result is,
$$n = \frac{1}{e^{h\nu / k_B T} - 1}.$$
In the Planck equation, n is the occupation number of an oscillator in an ensemble in which the average energy is $k_B T$ and each photon has energy $h\nu$; therefore $k_B T / h\nu$ is the average number of photons in an oscillator. If we designate $\bar{n} = k_B T / h\nu$, we can write the Planck equation as,
$$n = \frac{1}{e^{1/\bar{n}} - 1}. \tag{7}$$
In equilibrium, for a given temperature and frequency, all the oscillators should have the same average number of photons $\bar{n}$. Since $\nu$ and T can have any value, $\bar{n}$ is not necessarily an integer; however, quantum mechanics enables, according to Equation (7), only an integer number of photons $n$ to exist. Therefore we can calculate the average number of balls $\bar{n}$ as a function of the integer number of balls. In the case that $\bar{n} \gg 1$, $e^{1/\bar{n}} \approx 1 + 1/\bar{n}$, we obtain that $n \approx \bar{n}$. This is the classical result in which the occupation number and the number of balls are equal. Thus the probability is given by,
$$p(n) \propto \frac{1}{n},$$
which, when normalized to N boxes, yields Zipf’s law as in Equation (2). In the general case Equation (7) yields
$$\frac{1}{\bar{n}} = \ln\left(1 + \frac{1}{n}\right). \tag{8}$$
When Equation (8) is normalized to N boxes it becomes Benford’s law of Equation (1).
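The inversion of the Planck relation and its classical limit can be verified numerically. The sketch below assumes the dimensionless form $n = 1/(e^{1/\bar{n}} - 1)$ with $\bar{n} = k_B T / h\nu$; the symbol `nbar` and the function names are introduced here for illustration.

```python
import math


def occupation(nbar):
    """Planck occupation number n = 1 / (exp(1/nbar) - 1), with nbar = k_B*T / (h*nu)."""
    return 1 / math.expm1(1 / nbar)


def inverse_nbar(n):
    """Invert the relation above for 1/nbar, giving ln(1 + 1/n)."""
    return math.log1p(1 / n)


if __name__ == "__main__":
    for n in (1, 2, 5, 100):
        inv = inverse_nbar(n)
        # Round trip: feeding nbar = 1/inv back into the Planck form recovers n.
        print(f"n = {n:3d}: 1/nbar = {inv:.5f}, 1/n = {1 / n:.5f}, "
              f"round trip n = {occupation(1 / inv):.5f}")
```

For large n the quantity $\ln(1 + 1/n)$ approaches $1/n$, which is the classical limit that leads to Zipf’s law, while for small n the logarithmic form is retained and leads to Benford’s law after normalization.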
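In the case when $\bar{n} \ll 1$, $e^{1/\bar{n}} \gg 1$, or $n \ll 1$, the probability of finding n balls is, namely,
$$n \approx e^{-1/\bar{n}}. \tag{9}$$
When normalized, Equation (9) yields the canonical distribution, namely,
$$p = \frac{e^{-h\nu / k_B T}}{Z}.$$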
The normalization factor $Z$ is the canonical partition function, which yields the central limit theorem in the limit of very small $n$ [9].
6. Summary
Zipf’s law, Benford’s law, and Pareto’s 20 - 80 rule are considered empirical laws. We argue that Zipf’s law is the rank distribution of indistinguishable objects, while Benford’s law is the rank distribution in which the objects within the rank are distinguishable. Pareto’s 20 - 80 ratio was found to be in good agreement with Benford’s law in the vicinity of 10 ranks. It has also been argued that all these distributions, including the central limit theorem, can be derived from Planck’s law and are the result of the quantization of energy. This argumentation may be considered a physical origin of probability.
Acknowledgements
I thank H. Kafri and E. Fishof for reading the manuscript and for their useful comments.