An Algorithm for Generating Random Numbers with Normal Distribution
1. Introduction
Random numbers are extremely important in all areas of computational science. As a result, virtually every programming language comes with some sort of pseudo-random number generator, which normally generates random numbers that are uniformly distributed in the interval [0, 1). However, in many situations, one requires random numbers that have a nonuniform distribution.
Let us suppose that we want to generate random numbers that are distributed according to the probability density function p(x). This can be systematically achieved by first finding the cumulative probability distribution function F(x) according to [1] [2],

F(x) = ∫_{−∞}^{x} p(x′) dx′, (1)

and then solving the equation

F(x) = r (2)

for x in terms of r, where r is a uniform random number in the unit interval [0, 1). The random numbers x that are generated by this so-called inverse transform method are distributed according to the probability density function p(x).
For example, consider the exponential probability density function,

p(x) = λe^{−λx}, x ≥ 0, (3)

where λ > 0 is a constant. We have

F(x) = ∫_{0}^{x} λe^{−λx′} dx′ = 1 − e^{−λx}. (4)

Then solving

1 − e^{−λx} = r, (5)

we get

x = −(1/λ) ln(1 − r). (6)

Thus, if r is a uniformly distributed random number in the unit interval [0, 1), then x is a random number distributed according to the exponential density function (3).
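As a minimal sketch in Python (the function name and the value of λ are our own illustrative choices), the inverse transform of Equation (6) can be implemented as follows:

```python
import math
import random

random.seed(0)

def exponential_random(lam):
    """Inverse transform sampling for the exponential density (3),
    using Equation (6): x = -ln(1 - r) / lam."""
    r = random.random()  # uniform random number in [0, 1)
    return -math.log(1.0 - r) / lam

# The sample mean of the exponential distribution should approach 1/lam.
lam = 2.0
samples = [exponential_random(lam) for _ in range(100_000)]
mean = sum(samples) / len(samples)
print(mean)  # close to 1/lam = 0.5
```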
In principle, the inverse transform method generates random numbers with any probability density function. However, the method relies on two conditions: first, the integral in Equation (1) must be evaluated analytically and, second, one must be able to solve Equation (2) explicitly for x in terms of r. These are not always possible. For example, the Gaussian or normal probability density function,

p(x) = [1/(σ√(2π))] e^{−(x−μ)²/(2σ²)}, (7)

is a function for which the integral in Equation (1) cannot be evaluated analytically. However, based on the Gaussian density function, one can generate a two-dimensional probability density, and then transform it into plane polar coordinates. After some simple algebraic manipulations, one obtains the random numbers

x₁ = √(2s) cos θ, x₂ = √(2s) sin θ, (8)

each of which is distributed according to the Gaussian probability density (7) with μ = 0 and σ = 1 [1]. Here s is a random number generated according to the exponential distribution (3) with λ = 1, and θ is uniformly distributed in the interval [0, 2π]. This algorithm is known as the Box-Muller method.
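A short Python sketch of the Box-Muller method as summarized in Equation (8) (sample size and seed are our own choices):

```python
import math
import random

random.seed(0)

def box_muller():
    """One application of the Box-Muller method, Equation (8):
    returns a pair of independent standard normal random numbers."""
    s = -math.log(1.0 - random.random())     # exponential, lambda = 1
    theta = 2.0 * math.pi * random.random()  # uniform in [0, 2*pi]
    rho = math.sqrt(2.0 * s)
    return rho * math.cos(theta), rho * math.sin(theta)

# Collect both members of each generated pair.
values = [v for _ in range(50_000) for v in box_muller()]
mean = sum(values) / len(values)
var = sum(v * v for v in values) / len(values) - mean ** 2
print(mean, var)  # mean close to 0, variance close to 1
```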
A second method for generating random numbers with a non-uniform probability density function, if the inverse transform of F(x) fails, is the von Neumann rejection method (also called the acceptance-rejection method). This algorithm uses a so-called comparison function f(x) such that f(x) ≥ p(x) over the entire interval of the definition of p(x) [3] [4] [5]. Clearly, the simplest comparison function in most cases is a constant, as shown in Figure 1. Then, a pair of random numbers (x, y) is generated such that x lies in the interval of definition of p(x) and 0 ≤ y ≤ f(x). This defines a point within the rectangle shown in Figure 1. If y < p(x), this point is below the curve p(x) and the random number x is accepted; otherwise, it is rejected. The process is then repeated. The random numbers x thus generated have the probability density function p(x). Clearly, the rejection method works for any arbitrary probability density function.

Figure 1. Probability density function p(x) and a constant comparison function f(x) for generating random numbers by the acceptance-rejection method.
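The rejection loop can be sketched in Python with a constant comparison function; the example density p(x) = 2x on [0, 1] is our own illustrative choice:

```python
import random

random.seed(0)

def rejection_sample(p, x_lo, x_hi, f_max):
    """Von Neumann rejection with a constant comparison function
    f(x) = f_max, assumed to satisfy f_max >= p(x) on [x_lo, x_hi]."""
    while True:
        x = x_lo + (x_hi - x_lo) * random.random()
        y = f_max * random.random()
        if y < p(x):  # the point (x, y) lies below the curve: accept
            return x

# Illustrative density p(x) = 2x on [0, 1], whose mean is 2/3.
samples = [rejection_sample(lambda x: 2.0 * x, 0.0, 1.0, 2.0)
           for _ in range(100_000)]
mean = sum(samples) / len(samples)
print(mean)  # close to 2/3
```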
Because the normal or Gaussian probability distribution is the most encountered distribution in nature, in this work we suggest yet another algorithm for generating random numbers with normal probability density function that is based on the central limit theorem.
This article is organized as follows: In Section 2, we develop the theoretical foundation for the investigation. In Section 3, we discuss random numbers with normal distribution and suggest a new algorithm for their generation, followed by computer simulation results. In Section 4, we compare the suggested algorithm with other algorithms in terms of accuracy and speed. Finally, in Section 5, we present a summary of the current investigation followed by concluding remarks.
2. Theory
Consider a random number x that is distributed according to the normalized probability density function p(x). The expected or mean values of x and x² are given by [6]

⟨x⟩ = ∫_D x p(x) dx (9)

and

⟨x²⟩ = ∫_D x² p(x) dx, (10)

where D is the domain of p(x). Then, the variance of the distribution is given by

σ_x² = ⟨x²⟩ − ⟨x⟩². (11)
Now let us pick n of these random numbers and find their average

y = (x₁ + x₂ + ⋯ + x_n)/n. (12)

Then, repeat this process a large number of times with the same number n to generate the random numbers y₁, y₂, y₃, ⋯. The question is: are the new random numbers y as random as the numbers x? In other words, is the probability density function for y the same as p(x)? The answer is no. The new random numbers y are not as random as the numbers x. For example, consider a random number x that is uniformly distributed in the interval [0, 1]. Figure 2(a) shows the graph of 1000 of these random numbers. We then generate 1000 pairs of these numbers and plot their averages, as shown in Figure 2(b). As can be seen from these figures, although both systems have a mean of 0.5, the random numbers generated by averaging pairs of the uniformly distributed numbers are generally closer to their mean than the former ones, and hence they have a different probability distribution function.
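The effect seen in Figure 2 can be reproduced with a short Python sketch (sample sizes and seed are our own choices): averaging pairs leaves the mean at 1/2 but visibly shrinks the spread.

```python
import random
import statistics

random.seed(1)

# 1000 uniform random numbers in [0, 1), as in Figure 2(a) ...
xs = [random.random() for _ in range(1000)]
# ... and 1000 averages of pairs of such numbers, as in Figure 2(b).
ys = [(random.random() + random.random()) / 2.0 for _ in range(1000)]

mx, my = statistics.fmean(xs), statistics.fmean(ys)
sx = statistics.pstdev(xs)  # expected near 1/sqrt(12) ≈ 0.289
sy = statistics.pstdev(ys)  # expected near 1/sqrt(24) ≈ 0.204
print(mx, my, sx, sy)
```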
Let x₁, x₂, ⋯, x_n be several uncorrelated random variables with corresponding probability density functions p₁(x₁), p₂(x₂), ⋯, p_n(x_n). Let the variances of these random variables be σ₁², σ₂², ⋯, σ_n², respectively. Then, according to the Bienaymé formula [7],

σ²(x₁ + x₂ + ⋯ + x_n) = σ₁² + σ₂² + ⋯ + σ_n², (13)

where σ²(x₁ + x₂ + ⋯ + x_n) is the variance of the random numbers generated by the sum of the random variables x₁, x₂, ⋯, x_n. From this equation, we find the variance of the average y = (x₁ + x₂ + ⋯ + x_n)/n,

σ_y² = (σ₁² + σ₂² + ⋯ + σ_n²)/n², (14)

where n is the number of random variables to be averaged. But according to Equation (11), we have

σ₁² = ⟨x₁²⟩ − ⟨x₁⟩², (15)

with similar results for σ₂², σ₃², ⋯. Therefore, Equation (14) reduces to

σ_y² = (1/n²) Σ_{i=1}^{n} (⟨x_i²⟩ − ⟨x_i⟩²). (16)
Figure 2. (a) Uniformly distributed random numbers in the interval [0, 1]. (b) Averages of two uniformly distributed random numbers in the same interval.
Finally, if all the random variables have the same distribution, we get

σ_y² = (⟨x²⟩ − ⟨x⟩²)/n. (17)

And, if for the sake of simplicity we denote the common variance ⟨x²⟩ − ⟨x⟩² by σ_x², Equation (14) reduces to

σ_y = σ_x/√n. (18)

Furthermore, it can also be shown that the mean of these random numbers is the same as the mean of the original random numbers, i.e.,

⟨y⟩ = ⟨x⟩. (19)
As an example, the normalized probability density function for a random number that is uniformly distributed over the interval [0, 1] is p(x) = 1. Then,

⟨x⟩ = ∫₀¹ x dx = 1/2 (20)

and

⟨x²⟩ = ∫₀¹ x² dx = 1/3. (21)

Therefore,

σ_x² = 1/3 − (1/2)² = 1/12, or σ_x = 1/(2√3). (22)

Therefore, for the average of two of these random numbers, Equation (18) gives

σ_y = σ_x/√2 = 1/(2√6). (23)

This explains why the random numbers in Figure 2(b) are less random than those in Figure 2(a). Likewise, if we average n of the uniformly distributed random numbers in the interval [0, 1] to obtain new random numbers, we obtain

⟨y⟩ = 1/2, σ_y = 1/(2√(3n)). (24)
According to the central limit theorem, the random numbers thus obtained tend toward a normal or Gaussian distribution as n → ∞ [8] [9] [10] [11]. In fact, according to this theorem, the shape of the initial distribution of the random numbers is immaterial, and the distribution of the mean approximates a normal distribution if the sample size is sufficiently large, with a mean and standard deviation given by Equations (18) and (19) [12]. It is also worth noting that as n → ∞, σ_y → 0, so the resulting normal distribution approaches the Dirac delta function [13] [14] [15].
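As a quick numerical check of this statement (the starting distribution, n, and sample size below are our own choices), averages drawn from a decidedly non-Gaussian exponential distribution already land on the values predicted by Equations (18) and (19):

```python
import math
import random
import statistics

random.seed(2)

# Start from an exponential distribution with lambda = 1
# (mean 1, standard deviation 1), which is far from Gaussian.
n = 30
ys = [statistics.fmean(-math.log(1.0 - random.random()) for _ in range(n))
      for _ in range(20_000)]

m = statistics.fmean(ys)   # Equation (19): same mean as the original, 1
s = statistics.pstdev(ys)  # Equation (18): 1/sqrt(30) ≈ 0.183
print(m, s)
```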
3. Random Numbers with Specific Normal Distribution
Most common computer programming languages come with some sort of built-in pseudo-random number generator that generates numbers in some interval [16] [17]. However, most generators use the linear congruential or power residue method and generate pseudo-random numbers in the interval [0, 1), which is perhaps the most convenient interval in many applications [18] [19] [20].
Figures 3-5 show the results of computer simulations of the progression toward a normal or Gaussian distribution for n = 2, n = 3, and n = 20, starting with a uniform initial distribution in the interval [0, 1). In each case 10⁶ random numbers are generated. The corresponding probability density function for a normal distribution with mean and standard deviation given by Equation (24) is also shown for comparison in each case. As can be seen from Figure 5, the agreement between the distribution of the sample means of 20 random numbers and the Gaussian distribution is nearly perfect. Therefore, excellent Gaussian approximations can be achieved with fairly small samples.
In Figure 5, the mean and the standard deviation of both the sample means and the Gaussian distribution are 1/2 and 1/(2√60) ≈ 0.065, respectively. This is due to the fact that the mean and the standard deviation in this case are given by Equation (24) with n = 20. Furthermore, because n is an integer, the standard deviation is quantized, with values given by

σ_n = 1/(2√(3n)), n = 1, 2, 3, ⋯. (25)

But this is certainly a limitation, because we may need to generate random numbers that have a Gaussian distribution with an arbitrary mean μ and standard deviation σ that are not necessarily among these quantities.
Let us suppose that we need to generate random numbers with the Gaussian distribution given by Equation (7), with arbitrary values of the mean μ and standard deviation σ. We start by generating uniform random numbers in the interval [0, 1), and take the average of n of these numbers, say 20 of them, to generate new random numbers with a Gaussian distribution as in Figure 5. The random numbers thus generated have a mean 1/2 and a standard deviation 1/(2√(3n)).

To adjust the generated random numbers to match the specified Gaussian distribution, we have to change two things: the mean and the standard deviation. The standard deviation needs to be adjusted first. To do so, we divide each random number by 1/(2√(3n)) and multiply by the required σ. Thus, we multiply each of the random numbers by a factor a, where

a = 2σ√(3n). (26)

This changes the standard deviation to the required value. But it also changes the mean from 1/2 to

a/2 = σ√(3n). (27)

We now have to shift the mean to the required value μ. This is done by shifting each of the new random numbers by an amount b, given by

b = μ − σ√(3n). (28)
Figure 3. Probability density generated by averaging two random numbers from a uniform distribution in the interval [0, 1) (the bullets). The solid curve is the Gaussian function with the same mean and standard deviation calculated from Equation (24).
Figure 4. Same legends as in Figure 3 but with averaging three random numbers.
Figure 5. Same legends as in Figure 3 but with averaging twenty random numbers.
The random numbers thus generated have the required normal or Gaussian distribution.
Combining the above transformations, if x is the average of n uniformly distributed random numbers in the interval [0, 1), then the random number y obtained through the transformation

y = 2σ√(3n) x + μ − σ√(3n) (29)

is distributed according to the normalized Gaussian distribution

p(y) = [1/(σ√(2π))] e^{−(y−μ)²/(2σ²)}. (30)
As an example, suppose we want to generate random numbers having a normal probability distribution with specified values of the mean μ and standard deviation σ, i.e.,

p(y) = [1/(σ√(2π))] e^{−(y−μ)²/(2σ²)}. (31)

To do so, we generate random numbers x so that each is the average of 20 uniformly distributed random numbers in the interval [0, 1). We then transform these into new random numbers y according to Equation (29), which for n = 20 reduces to

y = 2√60 σx + μ − √60 σ. (32)

These calculations can be performed using any computer programming language, such as Python. The random numbers y thus generated have the required probability distribution, as shown in Figure 6.
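A minimal Python sketch of the complete algorithm, Equation (29); the values μ = 5 and σ = 1.5 below are our own illustrative choices, not the ones used in Figure 6:

```python
import math
import random
import statistics

random.seed(3)

def normal_random(mu, sigma, n=20):
    """Suggested algorithm: average n uniform random numbers in [0, 1),
    then rescale and shift according to Equation (29)."""
    x = sum(random.random() for _ in range(n)) / n
    a = 2.0 * sigma * math.sqrt(3.0 * n)  # Equation (26)
    b = mu - sigma * math.sqrt(3.0 * n)   # Equation (28)
    return a * x + b

samples = [normal_random(5.0, 1.5) for _ in range(100_000)]
m = statistics.fmean(samples)
s = statistics.pstdev(samples)
print(m, s)  # close to mu = 5.0 and sigma = 1.5
```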
4. Comparison with Other Algorithms
As stated earlier, throughout the computer simulations 10⁶ random numbers were generated. These numbers were then placed in bins of equal width. The bin contents were then normalized and plotted in the figures described above.
In order to compare the suggested algorithm with the Box-Muller and the von Neumann rejection methods, we investigated two measures: accuracy and efficiency. For accuracy, we considered the root-mean-square deviation between the bin contents and the values of the corresponding normal distribution for each algorithm. In doing so, we compared the probability density functions over six nested intervals about the mean of the distribution. We have considered several intervals because the fluctuation of the bin contents is normally higher near the peak of the distribution. The results are shown in Table 1, which clearly shows that the accuracy of the three algorithms is nearly identical in all cases.
Figure 6. Probability density function for a Gaussian distribution with the specified mean μ and standard deviation σ, generated by the algorithm described in this article. The solid curve is the corresponding Gaussian function given by Equation (31).
Table 1. Root-mean-square deviation between the bin contents and the values of the corresponding normal distribution for the suggested algorithm, the Box-Muller, and the von Neumann rejection algorithms.
As a measure of efficiency, we compared the central processing unit (CPU) times during the execution of the program using each algorithm. The results consistently showed that our algorithm is slower than the Box-Muller method by about a factor of two, but faster than the von Neumann rejection method by about a factor of four.
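Timings of this kind can be reproduced with a sketch along the following lines (standard-normal versions of the two generators; absolute and relative times depend on hardware and language implementation, so the factors quoted above need not be matched exactly):

```python
import math
import random
import timeit

def clt_normal(n=20):
    # Suggested algorithm, Equation (29), for mu = 0 and sigma = 1.
    x = sum(random.random() for _ in range(n)) / n
    return 2.0 * math.sqrt(3.0 * n) * (x - 0.5)

def box_muller_one():
    # One Box-Muller deviate (the second member of the pair is
    # discarded here for simplicity).
    return math.sqrt(-2.0 * math.log(1.0 - random.random())) \
        * math.cos(2.0 * math.pi * random.random())

# Time 100,000 calls of each generator.
t_clt = timeit.timeit(clt_normal, number=100_000)
t_bm = timeit.timeit(box_muller_one, number=100_000)
print(t_clt, t_bm)
```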
5. Summary and Conclusions
Since random numbers are extensively used in various fields, such as computational physics and other sciences, we have reviewed them in this work. More specifically, we have focused on machine-generated pseudo-random numbers, which are in most cases uniformly distributed over the interval [0, 1).
However, in many situations non-uniform random numbers, often with normal or Gaussian distribution, are needed. These random numbers are generated by two common methods, namely, the Box-Muller and the von Neumann rejection methods, which we have briefly reviewed in this article.
We have then suggested a third, yet very simple and efficient, algorithm for generating random numbers with a specified normal or Gaussian distribution. The suggested algorithm relies on a modified central limit theorem, and uses the average of uniformly distributed random numbers in the interval [0, 1). The algorithm generates random numbers with the required normal probability distribution function very accurately using a small number of uniform random numbers in the averaging process.
In comparison with other methods, the suggested algorithm is as accurate as the Box-Muller and the von Neumann rejection algorithms. In terms of the efficiency or the speed with which the random numbers are generated, however, our algorithm is slower than the Box-Muller method by about a factor of two, but faster than the von Neumann rejection method by about a factor of four.
Considering the accuracy, simplicity, and the efficiency of the suggested algorithm, it can justifiably compete with the existing methods.
At this time, the authors intend no future work based on the current study.
Acknowledgements
This work was supported by a URAP grant from the University of Wisconsin-Parkside.