1. Introduction
This article demonstrates how the distribution of any sample statistic meeting the conditions of the Central Limit Theorem (CLT) [1] and having the first four finite moments can be approximated with high accuracy by considering two more terms of the corresponding Edgeworth expansion [2]; this then allows us to use the correspondingly improved approximation even with relatively small sample sizes. The key component essential for the new technique to achieve this goal lies in the proper utilization of cumulants of a distribution; the article’s first section provides a summary of relevant formulas.
The second section then explains how cumulants help to find higher moments of a sample mean, which then enables us to extend the usual Normal approximation of its distribution by two extra terms (as done in Section 3), to match the distribution’s skewness and kurtosis. This yields a substantial improvement in the resulting accuracy, as illustrated by several examples.
We then proceed (in Section 4) to show how the same approach applies to any function of a sample mean, and how to utilize this method to find a specific function to remove the resulting skewness (in Section 5), automatically achieving a further, often dramatic, improvement. Section 6 demonstrates how to extend the new technique to a function of several sample means.
2. Distribution’s Cumulants
It is well known that expanding the moment generating function (MGF), defined as
, of a random variable (RV)
in powers of
provides an easy way of computing simple moments, defined by
, of the corresponding distribution, thus
(1)
This implies that, when similarly expanding
(2)
where
is the expected value of
, the coefficients of this expansion are
, i.e. central moments of the same distribution.
Finally, expanding the following cumulant generating function (CGF)
(3)
defines the distribution’s cumulants [3], denoted
. Since MGF of a sum of independent RVs is a product of the individual MGFs, the corresponding CGF becomes the sum of the individual
functions; this implies that, to get the
cumulant of an independent sum, all we have to do is to add the individual
’s.
The following formula
(4)
implies that
(5)
which shows that all cumulants (with the exception of
) are functions of central moments. For the first several such cumulants we get:
(the variance),
,
, etc.
When similarly expanding
in powers of
, the coefficient of
yields
, resulting in
(6)
Note that the coefficients of individual terms in each such
expansion equal to the number of different partitionings of a group of
distinct objects into subsets whose sizes are given by the corresponding
indices, assuming that interchanging subsets of the same size is inconsequential (e.g. dividing 6 people into three groups of 2 people each can be done in
ways, giving us the coefficient of
in the expansion of
). Also note that subsets of size one are not allowed.
Often, we need cumulants of a function of
say
; in many such cases it becomes impossible to find the MGF of
in an analytic form. Nevertheless, it is usually still possible to compute the first few moments of
(by numerical integration, if necessary) and convert them to the same number of cumulants.
EXAMPLE 1: To get the first four cumulants of
, where
has Cauchy distribution with the median of 0 and semi-inter-quartile range of 1, we first find the first four simple moments of
by
(in this case, it can be done analytically), getting
and
when
and 4 respectively. Expanding
then yields the first four cumulants of
(note that, coincidentally,
).
EXAMPLE 2: Similarly, when sampling Gamma distribution whose PDF is given by
and aiming to get the first four cumulants of
, we first evaluate the corresponding first four simple moments of
and then convert them, in the same manner, to cumulants; this results in
,
,
,
for
to 4, where
is the first derivative of
.
3. Moments of a Sample Mean
Suppose that, instead of considering a single
, we take a random independent sample (RIS) of size
from the
distribution and seek the moments of the corresponding sample mean
(7)
To find the
central moment of
(a non-trivial task, unless cumulants are used) which we denote
, we first notice that finding the corresponding cumulants (up to and including the
) is substantially easier, since the
cumulant of
is simply
. This result must be further divided by
to account for the denominator in the definition of
, thus leading to the following simple formula
(8)
We can now find
since we know how to express moments in terms of cumulants (what was true for
is still true for
). Thus, for example
where we note that individual terms of this expansion decrease with
in a different manner (the first term does it a lot faster than the last one). Also note that, when the distribution of
is symmetric (its PDF being an even function of
), both
and
(and their
counterparts) are equal to zero for every positive
In summary,
can be found as the coefficient of
in the expansion of
(9)
where
is the MGF of a single
.
4. Central Limit Theorem (CLT)
Consider RIS of size
from a distribution with a mean of
, standard deviation of
(both must be finite), and its MGF denoted
. CLT states that the distribution of
(10)
tends, as
, to that of the standardized (0 mean, variance of 1) Normal distribution.
Proof: The MGF of
is found by taking the MGF of
, namely
(11)
and raising it to the power of
while (at the same time) replacing
by
; this results in
(12)
where the RHS clearly tends, as
, to
of the
distribution.
5. Edgeworth Expansion
To achieve a better approximation (called Edgeworth), we expand the second factor of (12) in powers of
up to and including
-proportional terms, thus getting
(13)
where
and
are the distribution’s skewness and excess kurtosis, respectively.
To extend CLT by these terms, we need to convert the resulting MGF to the corresponding probability density function (PDF); in our case, it is sufficient to find out how to convert
, needed with
and 6 only. To do that, we start with the well known result (used to compute the MGF of standardized Normal distribution) of
(14)
and realize that, in general
(15)
(using by-part integration) when
is a polynomial. Starting with
, this yields
, thus proving that
converts to
. Using (15) repeatedly, we get
(16)
letting us convert
to
for all the
values needed in our approximation (
is usually referred to as Hermite polynomial of degree
).
The new, improved (even though still approximate) PDF of
is thus
(17)
The formula’s error decreases with
, unlike that of the basic Normal approximation, whose error decreases with only
. To be able to construct the new approximation, the sampled distribution’s first four moments need to be finite.
The corresponding cumulative distribution function (CDF) of
is given by
(18)
where
is the CDF of the standardized Normal distribution. This is clear from
where the last step is a result of a by-part integration of
. Using (18) to find a specific quantile of the
distribution (not a goal of this article) would lead to a modified (due to different standardization of
) version of Cornish-Fisher expansion [4].
It is possible to derive more accurate formulas by including further terms of the Edgeworth expansion; this would lead to results of high complexity and diminishing returns (we do not pursue it any further).
EXAMPLE 3: Suppose
has an exponential distribution with
. This time, rather uniquely and deliberately (to test the accuracy of the new approximation), we know the exact PDF of
to be
(19)
Based on
, we can easily find
,
and
; the corresponding Edgeworth approximation to the PDF of
then reads
(20)
which can be readily transformed into the PDF of
, if desired.
In the following table, we compare the maximum errors of Edgeworth and Normal approximations for three different values of
.
n |
Edgeworth |
Normal |
10 |
0.42% |
6.7% |
20 |
0.14% |
4.6% |
30 |
0.045% |
3.2% |
One can see that the improvement achieved by the new technique is truly substantial; the table also gives a good demonstration of the new error decreasing with
, instead of
of the last column. Note that a graph of the difference between exact and approximate PDFs always alternates between positive and negative areas; the largest of these (in absolute value) defines the maximum error.
Also note that (20) can be confirmed by directly expanding (19) in powers of
.
EXAMPLE 4: Sampling Uniform distribution (with values from 0 to 1) whose MGF is
(note the removable singularity at
) and whose first four cumulants are
,
, 0 (due to its symmetry with respect to the mean value) and
, the approximate PDF of
is thus
Since the exact PDF of
is also known (it is called Irwin-Hall distribution) and easily transformable to
, we can still compare the maximum error of the Normal approximation, which is 0.56% when
(tolerable but large) to that of the last formula, which equals 0.016% for the same value of
. Note that this large improvement has been achieved by only a single extra
-proportional term; not surprisingly, symmetric distributions result in simpler and more accurate approximations.
6. Transforming
In this section, we discuss how to find a similar approximation for the distribution of a function of
(still asymptotically Normal under most circumstances, specified shortly).
The first step is to expand the function (denoted
) with respect to its argument
(now taken to be a strictly Mathematical variable) at
(the corresponding expected value), getting
(21)
To be able to do this, we require the function to be sufficiently smooth (the three derivatives must exist and be finite); for our purpose, the value of
must also not equal to zero. Note that to achieve
accuracy of the resulting PDF (i.e. having an error proportional to
), we do not need higher than cubic terms in this expansion.
We already know that
(22)
where
stands for terms whose contribution goes beyond the desired
accuracy, and which can be thus discarded. Also note that only the first four cumulants of the sampled distribution are needed. The last set of formulas is then used to find the first four cumulants of
, each to sufficient accuracy in terms of the corresponding
expansion. Denoting them
to
we get, after some algebra (all
derivatives are evaluated at
)
(23)
For efficient execution of the algorithm, the following is of utmost importance: to achieve the overall
accuracy of the resulting approximation,
needs to be only
accurate (thus ignoring terms beyond quadratic in the
expansion), while both
and
require first expanding the corresponding powers of
up to and including quartic terms in
(to make each result
accurate). Finally, to reach the
accuracy of
, the 4th power of
needs to be expanded up to hexic terms, before taking its expected value.
The approximate PDF of
(24)
is then given by
(25)
where
(26)
is sufficiently accurate for this part of the result (but not when computing
).
Note that the basic Normal approximation in this case states that
(27)
has, approximately, the standardized Normal distribution.
We should point out that, to find Edgeworth approximation to the PDF of
, where
is a function of
requires the
cumulants in (23) to be the cumulants of
, not of
; the rest of the procedure remains the same.
EXAMPLE 5: To get an approximate PDF of
(implying that
) when sampling Exponential distribution with
, we start with
,
and
. Since we already know the first four cumulants of the sampled distribution, it is now routine to compute
,
,
and
, resulting in the following approximate PDF of
The maximum error of this approximation (when
) is 0.11%; a significant improvement over the error of basic Normal approximation, which equals 4.3%.
EXAMPLE 6: Similarly, when sampling Gamma distribution of Example 2, the maximum-likelihood (ML) estimator of
is the (numerical) solution to
(28)
where
(a function of
) denotes this estimator. Since this function is defined only implicitly (
does not have an analytic inverse), getting the corresponding
values (coefficients of the (21) expansion, where
stands for the
function,
is the order of its derivative and the evaluation is done at
which, in this context, are all implicit operations) is now more difficult. Expanding the RHS of
where
, results in
with the
derivatives deemed evaluated at the true value of
. Matching this to the RHS of (28) yields
,
,
and
. Based on (23), we can now build the corresponding Edgeworth approximation for the PDF of
, given values of
and
(we stop here, as the point of this example was only to show how to expand an implicit function of a sample mean; the rest is routine). We should also mention that there is a more direct way of building an approximate PDF of ML estimators [5], not to be discussed in this article. ◼
7. Removing Skewness
The technique of the previous section can be utilized in the following way: instead of being given a specific function of
, we may seek a function
which eliminates (to the level of our approximation) the third central moment (and thus skewness) of
. The reason for doing this is that the resulting PDF is simpler and more accurate than the one built directly for
, thus providing the most accurate approximation yet for the distribution of
itself (after the corresponding back transformation).
When the
distribution is parameter-free and positive, this goal can always be achieved by
being a simple power of
.
Proof: This can be seen from the fact that the
cumulant of
is proportional to
and can be made equal to zero by solving
or, equivalently,
(29)
The simplest solution is
(30)
EXAMPLE 7: For Exponential distribution with
,
and
, this would be achieved by
which leads to
,
and
, resulting in
,
,
(by design) and
. Therefore
(31)
has the following (still approximate, but now highly accurate) PDF
(32)
This result is easily transformed back to a PDF of
(substitute the RHS of (31) for
in (32) and multiply the result by ). The maximum error of this approximation when
is 0.036%; a very substantial improvement yet over the approximation of Example 3.
To use results of this example when sampling Exponential distribution with a mean of
, we just replace
in (31) by
, the rest stays the same. ◼
Another possibility of removing skewness of a parameter-free distribution (no longer necessarily positive) is to realize that the
cumulant of
is proportional to
and can thus be made equal to zero by
(33)
EXAMPLE 8: For Gumbel distribution whose PDF is
and whose first four cumulants are
(Euler gamma),
,
and
respectively, this implies that
where
results in
,
and
while eliminating skewness. This means that the PDF of
is, to a good accuracy, given by
(34)
easily transformed into an approximate PDF of
.
Unfortunately, this time we cannot compare our solution with an exact answer which is impossible to derive analytically; the only way to establish its maximum error is to generate an empirical distribution of
, using as many RIS’s as possible (in this case, we have used four million samples of size 10). Luckily, the resulting error of this distribution is smaller than the error of our Edgeworth approximation; this enables us only to estimate the maximum error of the latter not to exceed 0.25% (when
), while that of the basic Normal approximation is almost 3.5% (beyond acceptable).
We should mention that the above simulation took less than one second to complete, while converting the resulting grand sample of four million values into the corresponding empirical PDF required about 35 seconds; both computations were done by Mathematica. ◼
Note that the same skewness removal (and resulting accuracy improvement) can be achieved by applying a similarly designed
function to
(easily extended to sample mean of functions of three or more random variables).
EXAMPLE 9: Consider a RIS of size
from a bivariate Normal distribution with both means equal to 0, both variances equal to 1, and a correlation coefficientequal to
. To find a highly accurate approximation to the PDF of
, we first compute the first four cumulants of
, getting
,
,
and
; this implies, based on (30), that
is the corresponding skewness-removing function, leading to
,
and
. The final result is that
has a PDF of (34); its maximum error (established from the corresponding empirical distribution) when
is about 0.2%.
8. Function of Two Sample Means
The Edgeworth approximation is somehow more difficult to construct for a function of two sample means; the main complication arises from the fact that this time, instead of a single
, there are several (
, to be specific)
-order central moments, defined by
(35)
(where
), and the same applies to cumulants.
The central moments can be found from a joint MGF of
and
, namely from
(36)
by differentiating it
times with respect to
and
times with respect to
, and evaluating at
. When analytic form of this MGF is not available, we may first expand
to the required order and then take the expected value (by numerical integration, if necessary) of the resulting expansion.
Cumulants are similarly found by either expanding or differentiating
; for the second and third-order cumulants, we still get
, the forth-order formulas become more involved, namely
(37)
We can then get
, from an obvious generalization of (9), namely
(38)
as the coefficient of
in the corresponding bivariate expansion of this function.
This results in the following joint central moments of
and
(quoting only terms needed for our
accurate approximation)
(39)
in analogy with (22). Note that the coefficients in each numerator equal the number of ways to divide
males and
females into subgroups of size specified by the indices (e.g. the coefficient of
is
since we need to select one male out of 3 and one female out of 2 to form the smaller subgroup; the coefficient of
is
since we need to select 2 males out of 4 for the first subgroup, followed by creating two mixed subgroups from the remaining 2 males and 2 females).
At this point it should be obvious that the complexity of these formulas is forcing us to abandon any further by-hand computation and start using computers (equipped with Mathematica, in our case).
EXAMPLE 10: To find Edgeworth approximation to PDF of
when sampling Exponential distribution with
, we start by computing the joint MGF of
and
(these are the
and
of the general theory), by expanding (up to cubic terms in
and
) the integrand of
before completing the integration. The answer is then converted into a joint MGF of central moments of
and
(
of our notation) by expanding (38), this time in powers of
(up to cubic) while using
for the mean of
and 2 for the mean of
(note that the last operation will create some quartic, quintic and hexic terms in
and
).
Next, we need a bivariate Taylor expansion our
function in
at
and in
at 2 (up to cubic terms), resulting in
Using
values found in the previous step, we are then well positioned to compute the
to
cumulants of
, getting
The approximate PDF of
is given by (25). When
, its maximum error is about 0.25%, again determined from an empirical distribution of 4 million random values of
; this is again a major improvement over the basic Normal approximation whose maximum error is 4.4%.
9. Final Extension
The Edgeworth approximation becomes increasingly more difficult to construct (due to the number of terms needed) for a function of several sample means, yet the layout of the procedure is a simple generalization of what has been done in the previous sections. Let us summarize it:
Find a joint MGF of all RVs whose sample means are in the studied function, and expand it up to and including quartic terms (as shown already: we may need to expand before integrating).
Based on this expansion, denoted
, build
where
is a vector of the variables’ expected values, and expand it in terms of
up to and including cubic terms. Note that getting a joint central moment of the sample means, say
, is (to a sufficient accuracy) given by the coefficient of
in the last expansion, further divided by
.
The given function is then expanded in the sample means at their respective expected values, up to and including cubic terms (let us denote this expansion
). We then proceed to compute
as the expected value of
(ignoring the cubic term of this expansion),
as the expected value of
, after expanding it up to quartic terms,
as the expected value of
, after expanding it up to quartic terms, and finally
as the expected value of
, after expanding it up to hexic terms; this time we must also subtract
to convert the result from central moment to cumulant. The resulting approximation is then given by (24), (25) and (26), where
now changes to the given function of several sample means).
EXAMPLE 11: Assume sampling from a general bivariate Normal distribution; the ML estimator of its correlation coefficient
is the following sample statistic
(40)
To find its approximate PDF, we first realize (skipping a routine proof) that the distribution of
remains the same when sampling bivariate Normal distribution with both means equal to 0 and both variances equal to 1, as assumed from now on [6].
Associating
and
with
and
respectively,
of these five RVs is given by
(41)
which follows from the corresponding bivariate integration. This then provides a way of computing any joint central moment of the five sample means found in (40) by expanding the 5-variable version of (38).
We also need to expand
in the same sample means, treating them as Mathematical variables, at their expected values, up to cubic terms; in principle, this may already result in a total of
terms but, in this case, some of them turn out to be zero (we get 27 nonzero terms; still too large to quote). As before, we compute the first simple moment of this expansion, followed by computing the next three central moments; the last of these is then easily converted into the corresponding cumulant. Using the previous notation, these turn out to be
The approximate PDF of
is then given by (25). Note that to complete the computation, 452 values of
were needed, while the expansion of
had 4134 terms!
This time we can explore and quote exact errors of this approximation, as an analytic formula for the PDF of
exists (but it is rarely used, due to its complexity). The maximum error, when
and
, is 2.5%; too large to make it acceptable.
Nevertheless, when we run the same program using
instead of
, the resulting
is proportional to
It is easy to verify that
makes this expression (and therefore
itself) equal to zero [7] [8], while the remaining three cumulants are found to be
making the approximate PDF of
given by (34) [9]; this result is not only much simpler but also substantially more accurate. Its maximum error, when
and
, is now a tolerable 0.5%.
10. Conclusions
We have delineated a procedure for extending the CLT beyond its usual
accuracy by adding cubic and quartic terms to the MGF of
and consequently computing two extra moments of the sampled distribution: this then substantially improves the resulting approximation to the corresponding PDF (making its error proportional to
). The improvement is due to matching the sampling distribution’s skewness and kurtosis, in addition to the usual mean and variance. The same procedure can be applied to a function of
and eventually extended to a function of several sample means, while sampling any univariate or multivariate distribution. This eventually leads to having to deal with a large number of terms of various intermediate expansions, but this should not be seen as a major obstacle in our age of powerful and fast computers.
When the sampled distribution is either parameter-free or has only a single parameter, the technique enables us to find a transformation of any sample statistic of the previous paragraph, to eliminate its distribution’s skewness; this automatically results in an additional and often quite dramatic further improvement in the accuracy of the resulting approximation (making it also simpler, as an extra benefit).
We have demonstrated the new procedure using the classic example of Fisher
transformation, but its main applicability is to sample statistics having no analytic formula for their distribution, especially when the basic Normal approximation of the CLT proves woefully inadequate.