Improving Accuracy of Normal Approximation

Abstract

The Central Limit Theorem occasionally results in extremely slow convergence to the Normal distribution, making the corresponding approximation virtually useless (a typical example being the sample correlation coefficient). In most of these cases, there is a relatively easy way of substantially improving the approximation by including a few extra terms of the corresponding Edgeworth expansion to match not only the distribution’s mean and variance, but also its skewness and kurtosis. A further improvement is possible by transforming the studied sample statistic to fully remove the skewness; this simplifies the resulting approximation, making it also more accurate. All this is amply demonstrated by many practical examples.

Share and Cite:

Vrbik, J. (2025) Improving Accuracy of Normal Approximation. Open Journal of Statistics, 15, 473-491. doi: 10.4236/ojs.2025.156025.

1. Introduction

This article demonstrates how the distribution of any sample statistic meeting the conditions of the Central Limit Theorem (CLT) [1] and having the first four finite moments can be approximated with high accuracy by considering two more terms of the corresponding Edgeworth expansion [2]; this then allows us to use the correspondingly improved approximation even with relatively small sample sizes. The key component essential for the new technique to achieve this goal lies in the proper utilization of cumulants of a distribution; the article’s first section provides a summary of relevant formulas.

The second section then explains how cumulants help to find higher moments of a sample mean, which then enables us to extend the usual Normal approximation of its distribution by two extra terms (as done in Section 3), to match the distribution’s skewness and kurtosis. This yields a substantial improvement in the resulting accuracy, as illustrated by several examples.

We then proceed (in Section 4) to show how the same approach applies to any function of a sample mean, and how to utilize this method to find a specific function to remove the resulting skewness (in Section 5), automatically achieving a further, often dramatic, improvement. Section 6 demonstrates how to extend the new technique to a function of several sample means.

2. Distribution’s Cumulants

It is well known that expanding the moment generating function (MGF), defined as M X ( t ):=E( e tX ) , of a random variable (RV) X in powers of t provides an easy way of computing simple moments, defined by μ ˙ j :=E( X j ) , of the corresponding distribution, thus

M X ( t )=1+ j=1 μ ˙ j t j j! (1)

This implies that, when similarly expanding

M Xμ ( t )= M X ( t ) e μt =1+ j=2 μ j t j j! (2)

where μ:= μ ˙ 1 is the expected value of X , the coefficients of this expansion are μ j :=E[ ( Xμ ) j ] , i.e. central moments of the same distribution.

Finally, expanding the following cumulant generating function (CGF)

C( t ):=ln M X ( t )= j=1 κ j t j j! (3)

defines the distribution’s cumulants [3], denoted κ j . Since MGF of a sum of independent RVs is a product of the individual MGFs, the corresponding CGF becomes the sum of the individual C( t ) functions; this implies that, to get the j th cumulant of an independent sum, all we have to do is to add the individual κ j ’s.

The following formula

ln M Xμ ( t )=ln M X ( t )μt (4)

implies that

ln( 1+ j=2 μ j t j j! )= j=2 κ j t j j! (5)

which shows that all cumulants (with the exception of κ 1 =μ ) are functions of central moments. For the first several such cumulants we get: κ 2 = μ 2 = σ 2 (the variance), κ 3 = μ 3 , κ 4 = μ 4 3 μ 2 2 , etc.

When similarly expanding exp( j=2 κ j t j j! ) in powers of t , the coefficient of t j yields μ j j! , resulting in

μ 2 = κ 2 (6)

μ 3 = κ 3

μ 4 = κ 4 +3 κ 2 2

μ 5 = κ 5 +10 κ 3 κ 2

μ 6 = κ 6 +15 κ 4 κ 2 +10 κ 3 2 +15 κ 2 3

Note that the coefficients of individual terms in each such μ j expansion equal to the number of different partitionings of a group of j distinct objects into subsets whose sizes are given by the corresponding κ indices, assuming that interchanging subsets of the same size is inconsequential (e.g. dividing 6 people into three groups of 2 people each can be done in ( 6 2,2,2 )/ 3! =15 ways, giving us the coefficient of κ 2 3 in the expansion of μ 6 ). Also note that subsets of size one are not allowed.

Often, we need cumulants of a function of X, say Y=h( X ) ; in many such cases it becomes impossible to find the MGF of Y in an analytic form. Nevertheless, it is usually still possible to compute the first few moments of Y (by numerical integration, if necessary) and convert them to the same number of cumulants.

EXAMPLE 1: To get the first four cumulants of Y:= X 2 1+ X 2 , where X has Cauchy distribution with the median of 0 and semi-inter-quartile range of 1, we first find the first four simple moments of Y by

μ ˙ j = 1 π ( x 2 1+ x 2 ) j dx 1+ x 2

(in this case, it can be done analytically), getting μ ˙ j = 1 2 , 3 8 , 5 16 and 35 128 when j=1,2,3 and 4 respectively. Expanding

C y ( t )=ln( 1+ j=1 4 μ ˙ j t j j! )= 1 2 t+ 3 4 t 2 2 + 128 3 t 4 4!

then yields the first four cumulants of Y (note that, coincidentally, κ 3 =0 ).

EXAMPLE 2: Similarly, when sampling Gamma distribution whose PDF is given by

x α1 exp( x ) Γ( α ) whenx>0

and aiming to get the first four cumulants of Y:=lnX , we first evaluate the corresponding first four simple moments of lnX and then convert them, in the same manner, to cumulants; this results in κ j =ψ( α ) , ψ ( α ) , ψ ( α ) , ψ ( α ) for j=1 to 4, where ψ( α ) is the first derivative of lnΓ( α ) .

3. Moments of a Sample Mean

Suppose that, instead of considering a single X , we take a random independent sample (RIS) of size n from the X distribution and seek the moments of the corresponding sample mean

X ¯ := i=1 n X i n (7)

To find the k th central moment of X ¯ (a non-trivial task, unless cumulants are used) which we denote μ ¯ k , we first notice that finding the corresponding cumulants (up to and including the k th ) is substantially easier, since the j th cumulant of i=1 n X i is simply n κ j . This result must be further divided by n j to account for the denominator in the definition of X ¯ , thus leading to the following simple formula

κ ¯ j = κ j n j1 (8)

We can now find μ ¯ k since we know how to express moments in terms of cumulants (what was true for X is still true for X ¯ ). Thus, for example

μ ¯ 6 = κ ¯ 6 +15 κ ¯ 4 κ ¯ 2 +10 κ ¯ 3 2 +15 κ ¯ 2 3 = κ 6 n 5 +15 κ 4 κ 2 n 4 +10 κ 3 2 n 4 +15 κ 2 3 n 3

where we note that individual terms of this expansion decrease with n in a different manner (the first term does it a lot faster than the last one). Also note that, when the distribution of X is symmetric (its PDF being an even function of xμ ), both μ 2k+1 and κ 2k+1 (and their X ¯ counterparts) are equal to zero for every positive k.

In summary, μ ¯ j :=E[ ( X ¯ μ ) j ] can be found as the coefficient of t j j! in the expansion of

exp( nlnM( t n )μt ) (9)

where M( t ) is the MGF of a single X .

4. Central Limit Theorem (CLT)

Consider RIS of size n from a distribution with a mean of μ , standard deviation of σ (both must be finite), and its MGF denoted M( t ) . CLT states that the distribution of

Z n := i=1 n X i nμ σ n (10)

tends, as n , to that of the standardized (0 mean, variance of 1) Normal distribution.

Proof: The MGF of Z n is found by taking the MGF of Xμ , namely

exp( j=2 κ j t j j! ) (11)

and raising it to the power of n while (at the same time) replacing t by t σ n ; this results in

exp( j=2 κ j t j n j/2 1 σ j j! )=exp( t 2 2 )exp( j=3 κ j t j n j/2 1 σ j j! ) (12)

where the RHS clearly tends, as n , to exp( t 2 2 ) of the N( 0,1 ) distribution.

5. Edgeworth Expansion

To achieve a better approximation (called Edgeworth), we expand the second factor of (12) in powers of 1 n up to and including 1 n -proportional terms, thus getting

1+ κ 3 t 3 3! σ 3 n + κ 3 2 t 6 23 ! 2 σ 6 n + κ 4 t 4 4! σ 4 n =1+ γ 3 t 3 6 n + 3 γ 4 t 4 + γ 3 2 t 6 72n (13)

where γ 3 := κ 3 σ 3 and γ 4 := κ 4 σ 4 are the distribution’s skewness and excess kurtosis, respectively.

To extend CLT by these terms, we need to convert the resulting MGF to the corresponding probability density function (PDF); in our case, it is sufficient to find out how to convert exp( t 2 2 ) t j , needed with j=3,4 and 6 only. To do that, we start with the well known result (used to compute the MGF of standardized Normal distribution) of

exp( z 2 2 ) 2π exp( zt )dz =exp( t 2 2 ) (14)

and realize that, in general

H( z ) exp( z 2 2 ) 2π exp( zt )tdz = H( z ) exp( z 2 2 ) 2π dexp( zt ) dz dz = ( zH( z ) H ( z ) ) exp( z 2 2 ) 2π exp( zt )dz (15)

(using by-part integration) when H( z ) is a polynomial. Starting with H 0 ( z )=1 , this yields H 1 ( z )=z H 0 ( z ) H 0 ( z )=z , thus proving that exp( t 2 2 )t converts to exp( z 2 2 ) 2π z . Using (15) repeatedly, we get

H 2 ( z )=z H 1 ( z ) H 1 ( z )= z 2 1 H 3 ( z )=z H 2 ( z ) H 2 ( z )=z( z 2 3 ) H 4 ( z )= z 4 6 z 2 +3 H 5 ( z )=z( z 4 10 z 2 +15 ) H 6 ( z )= z 6 15 z 4 +45 z 2 15 (16)

letting us convert exp( t 2 2 ) t j to exp( z 2 2 ) 2π H j ( z ) for all the j values needed in our approximation ( H j ( z ) is usually referred to as Hermite polynomial of degree j ).

The new, improved (even though still approximate) PDF of Z n is thus

exp( z 2 2 ) 2π ( 1+ κ 3 H 3 ( z ) 6 σ 3 n + κ 3 2 H 6 ( z ) 72 σ 6 n + κ 4 H 4 ( z ) 24 σ 4 n ) (17)

The formula’s error decreases with 1 n 3/2 , unlike that of the basic Normal approximation, whose error decreases with only 1 n . To be able to construct the new approximation, the sampled distribution’s first four moments need to be finite.

The corresponding cumulative distribution function (CDF) of Z n is given by

Φ( z ) exp( z 2 2 ) 2π ( κ 3 H 2 ( z ) 6 σ 3 n + κ 3 2 H 5 ( z ) 72 σ 6 n + κ 4 H 3 ( z ) 24 σ 4 n ) (18)

where Φ is the CDF of the standardized Normal distribution. This is clear from

H k+1 ( z )exp( z 2 2 )dz = ( z H k ( z ) H k ( z ) )exp( z 2 2 )dz = H k ( z )exp( z 2 2 )

where the last step is a result of a by-part integration of H k ( z )exp( z 2 2 )dz . Using (18) to find a specific quantile of the X ¯ distribution (not a goal of this article) would lead to a modified (due to different standardization of X ¯ ) version of Cornish-Fisher expansion [4].

It is possible to derive more accurate formulas by including further terms of the Edgeworth expansion; this would lead to results of high complexity and diminishing returns (we do not pursue it any further).

EXAMPLE 3: Suppose X has an exponential distribution with μ=1 . This time, rather uniquely and deliberately (to test the accuracy of the new approximation), we know the exact PDF of

Z n := i=1 n X i n n

to be

( z+ n ) n1 exp( ( z+ n ) n ) ( n1 )! n n/2 forz> n (19)

Based on lnM( t )=ln( 1t ) , we can easily find κ 2 =1 , κ 3 =2 and κ 4 =6 ; the corresponding Edgeworth approximation to the PDF of Z n then reads

exp( z 2 2 ) 2π ( 1+ z( z 2 3 ) 3 n + 2 z 6 21 z 4 +36 z 2 3 36n ) (20)

which can be readily transformed into the PDF of X ¯ , if desired.

In the following table, we compare the maximum errors of Edgeworth and Normal approximations for three different values of n .

n

Edgeworth

Normal

10

0.42%

6.7%

20

0.14%

4.6%

30

0.045%

3.2%

One can see that the improvement achieved by the new technique is truly substantial; the table also gives a good demonstration of the new error decreasing with n 3/2 , instead of n 1/2 of the last column. Note that a graph of the difference between exact and approximate PDFs always alternates between positive and negative areas; the largest of these (in absolute value) defines the maximum error.

Also note that (20) can be confirmed by directly expanding (19) in powers of 1 n .

EXAMPLE 4: Sampling Uniform distribution (with values from 0 to 1) whose MGF is

M( t )= exp( t )1 t

(note the removable singularity at t=0 ) and whose first four cumulants are 1 2 , 1 12 , 0 (due to its symmetry with respect to the mean value) and 1 120 , the approximate PDF of

Z n := X ¯ κ 1 κ 2 n = X ¯ 1 2 1 12n

is thus

exp( z 2 2 ) 2π ( 1+ κ 4 H 4 ( z ) 24 σ 4 n )= exp( z 2 2 ) 2π ( 1 H 4 ( z ) 20n )

Since the exact PDF of i=1 n X i is also known (it is called Irwin-Hall distribution) and easily transformable to Z n , we can still compare the maximum error of the Normal approximation, which is 0.56% when n=10 (tolerable but large) to that of the last formula, which equals 0.016% for the same value of n . Note that this large improvement has been achieved by only a single extra 1 n -proportional term; not surprisingly, symmetric distributions result in simpler and more accurate approximations.

6. Transforming X ¯

In this section, we discuss how to find a similar approximation for the distribution of a function of X ¯ (still asymptotically Normal under most circumstances, specified shortly).

The first step is to expand the function (denoted g ) with respect to its argument X ¯ (now taken to be a strictly Mathematical variable) at X ¯ =μ (the corresponding expected value), getting

g( X ¯ )g( μ )+ g ( μ )( X ¯ μ )+ g ( μ ) 2 ( X ¯ μ ) 2 + g ( μ ) 3! ( X ¯ μ ) 3 + (21)

To be able to do this, we require the function to be sufficiently smooth (the three derivatives must exist and be finite); for our purpose, the value of g ( μ ) must also not equal to zero. Note that to achieve 1 n accuracy of the resulting PDF (i.e. having an error proportional to n 3/2 ), we do not need higher than cubic terms in this expansion.

We already know that

μ ¯ 2 = κ 2 n (22)

μ ¯ 3 = κ 3 n 2

μ ¯ 4 =3 κ 2 2 n 2 + κ 4 n 3

μ ¯ 5 =10 κ 3 κ 2 n 3 +

μ ¯ 6 =15 κ 2 3 n 3 +

where + stands for terms whose contribution goes beyond the desired 1 n accuracy, and which can be thus discarded. Also note that only the first four cumulants of the sampled distribution are needed. The last set of formulas is then used to find the first four cumulants of g( X ¯ ) , each to sufficient accuracy in terms of the corresponding 1/n expansion. Denoting them K 1 to K 4 we get, after some algebra (all g derivatives are evaluated at μ )

K 1 =:E[ g( X ¯ ) ]=g( μ )+ 1 2 g κ 2 n + (23)

K 2 =:E[ ( g( X ¯ ) K 1 ) 2 ]= ( g ) 2 κ 2 n + g g κ 3 + g g κ 2 2 + 1 2 ( g ) 2 κ 2 2 n 2 +

K 3 =:E[ ( g( X ¯ ) K 1 ) 3 ]= ( g ) 2 ( g κ 3 +3 g κ 2 2 ) n 2 +

K 4 =:E[ ( g( X ¯ ) K 1 ) 4 ]3 K 2 2 = ( g ) 4 κ 4 +12 ( g ) 3 g κ 3 κ 2 +4 ( g ) 3 g κ 2 3 +12 ( g ) 2 ( g ) 2 κ 2 3 n 3 +

For efficient execution of the algorithm, the following is of utmost importance: to achieve the overall 1/n accuracy of the resulting approximation, K 1 needs to be only 1/n accurate (thus ignoring terms beyond quadratic in the g( X ¯ ) expansion), while both K 2 and K 3 require first expanding the corresponding powers of g( X ) K 1 up to and including quartic terms in X ¯ μ (to make each result 1/ n 2 accurate). Finally, to reach the 1/ n 3 accuracy of K 4 , the 4th power of ( g( X ¯ ) K 1 ) 4 needs to be expanded up to hexic terms, before taking its expected value.

The approximate PDF of

Z n := g( X ¯ ) K 1 K 2 (24)

is then given by

exp( z 2 2 ) 2π ( 1+ K 3 H 3 ( z ) 6 K ^ 2 3/2 + K 3 2 H 6 ( z ) 72 K ^ 2 3 + K 4 H 4 ( z ) 24 K ^ 2 2 ) (25)

where

K ^ 2 := ( g ) 2 κ 2 n (26)

is sufficiently accurate for this part of the result (but not when computing Z n ).

Note that the basic Normal approximation in this case states that

Z n := g( X ¯ )g( μ ) ( g ) 2 κ 2 n (27)

has, approximately, the standardized Normal distribution.

We should point out that, to find Edgeworth approximation to the PDF of g( h( X ) ¯ ) , where h( X ) is a function of X, requires the κ j cumulants in (23) to be the cumulants of h( X ) , not of X ; the rest of the procedure remains the same.

EXAMPLE 5: To get an approximate PDF of

Z n := ln( X ¯ ) K 1 K 2

(implying that g=ln ) when sampling Exponential distribution with μ=1 , we start with g =1 , g =1 and g =2 . Since we already know the first four cumulants of the sampled distribution, it is now routine to compute K 1 = 1 2n , K 2 = 1 n + 1 2 n 2 , K 3 = 1 n 2 and K 4 = 2 n 3 , resulting in the following approximate PDF of Z n

exp( z 2 2 ) 2π ( 1 H 3 ( z ) 6 n + 6 H 4 ( z )+ H 6 ( z ) 72n )

The maximum error of this approximation (when n=10 ) is 0.11%; a significant improvement over the error of basic Normal approximation, which equals 4.3%.

EXAMPLE 6: Similarly, when sampling Gamma distribution of Example 2, the maximum-likelihood (ML) estimator of α is the (numerical) solution to

ψ( α ^ )= lnX ¯ =ψ( α )+( lnX ¯ ψ( α ) ) (28)

where α ^ (a function of lnX ¯ ) denotes this estimator. Since this function is defined only implicitly ( ψ does not have an analytic inverse), getting the corresponding g i values (coefficients of the (21) expansion, where α ^ stands for the g function, i is the order of its derivative and the evaluation is done at μ=ψ( α ) which, in this context, are all implicit operations) is now more difficult. Expanding the RHS of

ψ( α ^ )ψ( g 0 + g 1 x+ g 2 x 2 2 + g 3 x 3 6 )

where x:= lnX ¯ ψ( α ) , results in

ψ( g 0 )+ g 1 ψ x+( g 1 2 ψ + g 2 ψ ) x 2 2 +( g 1 3 ψ +3 g 1 g 2 ψ + g 3 ψ ) x 3 6

with the ψ derivatives deemed evaluated at the true value of α . Matching this to the RHS of (28) yields g 0 =α , g 1 = 1 ψ , g 2 = ψ ( ψ ) 3 and g 3 = ψ ( ψ ) 4 + 3 ( ψ ) 2 ( ψ ) 5 . Based on (23), we can now build the corresponding Edgeworth approximation for the PDF of α ^ , given values of α and n (we stop here, as the point of this example was only to show how to expand an implicit function of a sample mean; the rest is routine). We should also mention that there is a more direct way of building an approximate PDF of ML estimators [5], not to be discussed in this article. ◼

7. Removing Skewness

The technique of the previous section can be utilized in the following way: instead of being given a specific function of X ¯ , we may seek a function g which eliminates (to the level of our approximation) the third central moment (and thus skewness) of g( X ¯ ) . The reason for doing this is that the resulting PDF is simpler and more accurate than the one built directly for X ¯ , thus providing the most accurate approximation yet for the distribution of X ¯ itself (after the corresponding back transformation).

When the X distribution is parameter-free and positive, this goal can always be achieved by g being a simple power of X ¯ .

Proof: This can be seen from the fact that the K 3 cumulant of g( β X ¯ ) is proportional to g β 3 κ 3 +3 g ( β 2 κ 2 ) 2 and can be made equal to zero by solving

g ( βμ ) κ 3 +3 g ( βμ )β κ 2 2 =0

or, equivalently,

g ( x )μ κ 3 +3x g ( x ) κ 2 2 =0 (29)

The simplest solution is

g( x )= x 1 μ κ 3 / ( 3 κ 2 2 ) (30)

EXAMPLE 7: For Exponential distribution with μ=1 , κ 2 =1 and κ 3 =2 , this would be achieved by

g( X ¯ )= X ¯ 1/3

which leads to g 1 = 1 3 , g 2 = 2 9 and g 3 = 10 27 , resulting in K 1 =1 1 9n , K 2 = 1 9n + 0 n 2 , K 3 =0 (by design) and K 4 = 2 ( 9n ) 3 . Therefore

Z n :=( X ¯ 1/3 1+ 1 9n ) 9n (31)

has the following (still approximate, but now highly accurate) PDF

exp( z 2 2 ) 2π ( 1 H 4 ( z ) 108n ) (32)

This result is easily transformed back to a PDF of X ¯ (substitute the RHS of (31) for z in (32) and multiply the result by dz d x ¯ ). The maximum error of this approximation when n=10 is 0.036%; a very substantial improvement yet over the approximation of Example 3.

To use results of this example when sampling Exponential distribution with a mean of β , we just replace X ¯ 1/3 in (31) by X ¯ 1/3 / β 1/3 , the rest stays the same. ◼

Another possibility of removing skewness of a parameter-free distribution (no longer necessarily positive) is to realize that the K 3 cumulant of g( X+a ) is proportional to

g ( μ+a ) κ 3 +3 g ( μ+a ) κ 2 2

and can thus be made equal to zero by

g( x )=exp( κ 3 x 3 κ 2 2 ) (33)

EXAMPLE 8: For Gumbel distribution whose PDF is

exp( x e x )

and whose first four cumulants are γ (Euler gamma), π 2 6 , ψ ( 1 ) and π 4 15 respectively, this implies that

exp( A X ¯ )

where

A= 12 ψ ( 1 ) π 4 0.29613

results in K 1 =exp( Aγ )+ 0.07876 n , K 2 = 0.17195 n 0.0372 n 2 and K 4 = 0.01436 n 3 while eliminating skewness. This means that the PDF of

Z n := exp( A X ¯ ) K 1 K 2

is, to a good accuracy, given by

exp( z 2 2 ) 2π ( 1+ K 4 H 4 ( z ) 24 K ^ 2 2 ) (34)

easily transformed into an approximate PDF of X ¯ .

Unfortunately, this time we cannot compare our solution with an exact answer which is impossible to derive analytically; the only way to establish its maximum error is to generate an empirical distribution of X ¯ , using as many RIS’s as possible (in this case, we have used four million samples of size 10). Luckily, the resulting error of this distribution is smaller than the error of our Edgeworth approximation; this enables us only to estimate the maximum error of the latter not to exceed 0.25% (when n=10 ), while that of the basic Normal approximation is almost 3.5% (beyond acceptable).

We should mention that the above simulation took less than one second to complete, while converting the resulting grand sample of four million exp( A X ¯ ) values into the corresponding empirical PDF required about 35 seconds; both computations were done by Mathematica. ◼

Note that the same skewness removal (and resulting accuracy improvement) can be achieved by applying a similarly designed g function to h( X,Y ) ¯ (easily extended to sample mean of functions of three or more random variables).

EXAMPLE 9: Consider a RIS of size n from a bivariate Normal distribution with both means equal to 0, both variances equal to 1, and a correlation coefficientequal to 1 2 . To find a highly accurate approximation to the PDF of X 2 X 2 +2 Y 2 ¯ , we first compute the first four cumulants of X 2 X 2 +2 Y 2 , getting κ 1 =μ=0.40825 , κ 2 =0.11237 , κ 3 =0.01547 and κ 4 =0.01563 ; this implies, based on (30), that g( x )= x 0.83333 is the corresponding skewness-removing function, leading to K 1 =0.47399 0.02219 n , K 2 = 0.10519 n + 0.00887 n 2 and K 4 = 0.01287 n 3 . The final result is that

Z n = ( X 2 X 2 +2 Y 2 ¯ ) 0.83333 K 1 K 2

has a PDF of (34); its maximum error (established from the corresponding empirical distribution) when n=10 is about 0.2%.

8. Function of Two Sample Means

The Edgeworth approximation is somehow more difficult to construct for a function of two sample means; the main complication arises from the fact that this time, instead of a single μ k , there are several ( k+1 , to be specific) k -order central moments, defined by

μ ij :=E[ ( X μ x ) i ( Y μ y ) j ] (35)

(where i+j=k ), and the same applies to cumulants.

The central moments can be found from a joint MGF of X μ x and Y μ y , namely from

M c ( t,u ):=exp( μ x t μ y u )E[ exp( tX+uY ) ]=1+ i+j2 μ ij t i u j i!j! (36)

by differentiating it i times with respect to t and j times with respect to u , and evaluating at t=u=0 . When analytic form of this MGF is not available, we may first expand exp( tX+uY ) to the required order and then take the expected value (by numerical integration, if necessary) of the resulting expansion.

Cumulants are similarly found by either expanding or differentiating ln M c ( t,u ) ; for the second and third-order cumulants, we still get κ ij = μ ij , the forth-order formulas become more involved, namely

κ 40 = μ 40 3 μ 20 2 (37)

κ 31 = μ 31 3 μ 20 μ 11

κ 22 = μ 22 μ 20 μ 02 2 μ 11 2

We can then get μ ¯ ij , from an obvious generalization of (9), namely

exp( nlnM( t n , u n ) μ x t μ y u ) (38)

as the coefficient of t i u j i!j! in the corresponding bivariate expansion of this function.

This results in the following joint central moments of X ¯ and Y ¯ (quoting only terms needed for our 1/n accurate approximation)

μ ¯ 20 = κ 20 n (39)

μ ¯ 11 = κ 11 n

μ ¯ 31 = 3 κ 20 κ 11 n 2 + κ 31 n 3

μ ¯ 22 = κ 20 κ 02 +2 κ 11 2 n 2 + κ 22 n 3

μ ¯ 41 = 10 κ 21 κ 20 n 3 +

μ ¯ 32 = κ 30 κ 02 +6 κ 21 κ 11 +3 κ 20 κ 12 n 3 +

μ ¯ 51 = 15 κ 20 2 κ 02 n 3 +

μ ¯ 42 = 3 κ 20 2 κ 02 +12 κ 20 κ 11 2 n 3 +

μ ¯ 33 = 9 κ 20 2 κ 02 +6 κ 20 κ 11 2 n 3 +

in analogy with (22). Note that the coefficients in each numerator equal the number of ways to divide i males and j females into subgroups of size specified by the indices (e.g. the coefficient of κ 21 κ 11 is 3×2 since we need to select one male out of 3 and one female out of 2 to form the smaller subgroup; the coefficient of κ 20 κ 11 2 is 6×2 since we need to select 2 males out of 4 for the first subgroup, followed by creating two mixed subgroups from the remaining 2 males and 2 females).

At this point it should be obvious that the complexity of these formulas is forcing us to abandon any further by-hand computation and start using computers (equipped with Mathematica, in our case).

EXAMPLE 10: To find Edgeworth approximation to PDF of

g:= lnX ¯ π 2 + X 2 ¯

when sampling Exponential distribution with μ=1 , we start by computing the joint MGF of lnX and X 2 (these are the X and Y of the general theory), by expanding (up to cubic terms in t and u ) the integrand of

M( t,u ):= 0 exp( x+tlnx+u x 2 )dx

before completing the integration. The answer is then converted into a joint MGF of central moments of lnX ¯ and X 2 ¯ ( μ ¯ ij of our notation) by expanding (38), this time in powers of 1/n (up to cubic) while using γ for the mean of lnX and 2 for the mean of X 2 (note that the last operation will create some quartic, quintic and hexic terms in t and u ).

Next, we need a bivariate Taylor expansion our g function in lnX ¯ at γ and in X 2 ¯ at 2 (up to cubic terms), resulting in

g γ 2+ π 2 +( lnX ¯ +γ )+ γ( X 2 ¯ 2 ) ( 2+ π 2 ) 2 γ ( X 2 ¯ 2 ) 2 ( 2+ π 2 ) 3 + γ ( X 2 ¯ 2 ) 3 ( 2+ π 2 ) 4

Using μ ¯ ij values found in the previous step, we are then well positioned to compute the K 1 to K 4 cumulants of g , getting

K 1 =0.048630 0.028197 n

K 2 = 0.014082 n + 0.001486 n 2

K 3 = 0.003444 n 2

K 4 = 0.001050 n 3

The approximate PDF of

Z n := lnX ¯ π 2 + X 2 ¯ K 1 K 2

is given by (25). When n=10 , its maximum error is about 0.25%, again determined from an empirical distribution of 4 million random values of Z n ; this is again a major improvement over the basic Normal approximation whose maximum error is 4.4%.

9. Final Extension

The Edgeworth approximation becomes increasingly more difficult to construct (due to the number of terms needed) for a function of several sample means, yet the layout of the procedure is a simple generalization of what has been done in the previous sections. Let us summarize it:

  • Find a joint MGF of all RVs whose sample means are in the studied function, and expand it up to and including quartic terms (as shown already: we may need to expand before integrating).

  • Based on this expansion, denoted M e ( t ) , build

exp( nln M e ( t n )μt )

where μ is a vector of the variables’ expected values, and expand it in terms of 1/n up to and including cubic terms. Note that getting a joint central moment of the sample means, say μ ¯ ijk , is (to a sufficient accuracy) given by the coefficient of t 1 i t 2 j t 3 k in the last expansion, further divided by i!j!k! .

  • The given function is then expanded in the sample means at their respective expected values, up to and including cubic terms (let us denote this expansion g e ). We then proceed to compute K 1 as the expected value of g e (ignoring the cubic term of this expansion), K 2 as the expected value of ( g e K 1 ) 2 , after expanding it up to quartic terms, K 3 as the expected value of ( g e K 1 ) 3 , after expanding it up to quartic terms, and finally K 4 as the expected value of ( g e K 1 ) 4 , after expanding it up to hexic terms; this time we must also subtract 3 K 2 2 to convert the result from central moment to cumulant. The resulting approximation is then given by (24), (25) and (26), where g( X ¯ ) now changes to the given function of several sample means).

EXAMPLE 11: Assume sampling from a general bivariate Normal distribution; the ML estimator of its correlation coefficient ρ is the following sample statistic

r:= XY ¯ X ¯ Y ¯ ( X 2 ¯ X ¯ 2 )( Y 2 ¯ Y ¯ 2 ) (40)

To find its approximate PDF, we first realize (skipping a routine proof) that the distribution of r remains the same when sampling bivariate Normal distribution with both means equal to 0 and both variances equal to 1, as assumed from now on [6].

Associating X, X 2 ,Y, Y 2 and XY with t 1 , t 2 , u 1 , u 2 and v respectively, lnM( t 1 , t 2 , u 1 , u 2 ,v ) of these five RVs is given by

1 2 t 1 2 + u 1 2 ( 1 ρ 2 )( t 1 2 u 2 + u 1 2 t 2 )+2ρ t 1 u 1 2( 1 ρ 2 ) t 1 u 1 v 12( t 2 + u 2 )+4( 1 ρ 2 ) t 2 u 2 2ρv( 1 ρ 2 ) v 2 1 2 ln( 12( t 2 + u 2 )+4( 1 ρ 2 ) t 2 u 2 2ρv( 1 ρ 2 ) v 2 ) (41)

which follows from the corresponding bivariate integration. This then provides a way of computing any joint central moment of the five sample means found in (40) by expanding the 5-variable version of (38).

We also need to expand r in the same sample means, treating them as Mathematical variables, at their expected values, up to cubic terms; in principle, this may already result in a total of 1+5+15+35=56 terms but, in this case, some of them turn out to be zero (we get 27 nonzero terms; still too large to quote). As before, we compute the first simple moment of this expansion, followed by computing the next three central moments; the last of these is then easily converted into the corresponding cumulant. Using the previous notation, these turn out to be

K 1 =ρ( 1 1 ρ 2 2n )

K 2 = ( 1 ρ 2 ) 2 n ( 1+ 2+11 ρ 2 n )

K 3 = 6ρ ( 1 ρ 2 ) 3 n 2

K 4 = ( 72 ρ 2 6 ) ( 1 ρ 2 ) 4 n 3

The approximate PDF of

Z n := r K 1 K 2

is then given by (25). Note that to complete the computation, 452 values of μ ijkm were needed, while the expansion of ( r K 1 ) 4 had 4134 terms!

This time we can explore and quote exact errors of this approximation, as an analytic formula for the PDF of r exists (but it is rarely used, due to its complexity). The maximum error, when n=20 and ρ= 1 2 , is 2.5%; too large to make it acceptable.

Nevertheless, when we run the same program using g( r ) instead of r , the resulting K 3 is proportional to

2ρ g ( ρ )( 1 ρ 2 ) g ( ρ )

It is easy to verify that g( ρ ):= 1 2 ln 1+ρ 1ρ makes this expression (and therefore K 3 itself) equal to zero [7] [8], while the remaining three cumulants are found to be

K 1 = 1 2 ln 1+ρ 1ρ + ρ 2n

K 2 = 1 n + 6 ρ 2 2 n 2

K 4 = 2 n 3

making the approximate PDF of

Z n := 1 2 ln 1+r 1r K 1 K 2

given by (34) [9]; this result is not only much simpler but also substantially more accurate. Its maximum error, when n=20 and ρ= 1 2 , is now a tolerable 0.5%.

10. Conclusions

We have delineated a procedure for extending the CLT beyond its usual n 1/2 accuracy by adding cubic and quartic terms to the MGF of X ¯ and consequently computing two extra moments of the sampled distribution: this then substantially improves the resulting approximation to the corresponding PDF (making its error proportional to n 3/2 ). The improvement is due to matching the sampling distribution’s skewness and kurtosis, in addition to the usual mean and variance. The same procedure can be applied to a function of X ¯ , and eventually extended to a function of several sample means, while sampling any univariate or multivariate distribution. This eventually leads to having to deal with a large number of terms of various intermediate expansions, but this should not be seen as a major obstacle in our age of powerful and fast computers.

When the sampled distribution is either parameter-free or has only a single parameter, the technique enables us to find a transformation of any sample statistic of the previous paragraph, to eliminate its distribution’s skewness; this automatically results in an additional and often quite dramatic further improvement in the accuracy of the resulting approximation (making it also simpler, as an extra benefit).

We have demonstrated the new procedure using the classic example of Fisher z transformation, but its main applicability is to sample statistics having no analytic formula for their distribution, especially when the basic Normal approximation of the CLT proves woefully inadequate.

Conflicts of Interest

The author declares no conflicts of interest regarding the publication of this paper.

References

[1] (2001) Central Limit Theorem. In: Encyclopedia of Mathematics, EMS Press.
http://encyclopediaofmath.org/index.php?title=Central_limit_theorem&oldid=18508
[2] Edgeworth, F.Y. (1905) The Law of Error I. Proceedings of the Cambridge Philosophical Society, 20, 36-65.
[3] Kendall, M.G. and Stuart, A. (1969) The Advanced Theory of Statistics. Volume 1, 3rd Edition, Griffin.
[4] Cornish, E.A. and Fisher, R.A. (1938) Moments and Cumulants in the Specification of Distributions. Revue de lInstitut International de Statistique / Review of the International Statistical Institute, 5, 307-320.[CrossRef]
[5] Vrbik, J. (2014) Asymptotic Expansion of Sampling Distributions of an MLE to 1/n Accuracy. Advances and Applications in Statistics, 38, 37-50.
[6] Fisher, R.A. (1915) Frequency Distribution of the Values of the Correlation Coefficient in Samples from an Indefinitely Large Popuation. Biometrika, 10, 507-521.[CrossRef]
[7] Hotelling, H. (1953) New Light on the Correlation Coefficient and Its Transforms. Journal of the Royal Statistical Society Series B: Statistical Methodology, 15, 193-225.[CrossRef]
[8] Winterbottom, A. (1979) A Note on the Derivation of Fisher’s Transformation of the Correlation Coefficient. The American Statistician, 33, 142-143.[CrossRef]
[9] Vrbik, J. (2022) Fisher Transformation via Edgeworth Expansion.
https://arxiv.org/abs/2208.05070v1

Copyright © 2026 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.