
We present a family of formal expansions for the density function of a general one-dimensional asymptotically normal sequence $X_n$. Members of the family are indexed by a parameter τ with an interval domain, which we refer to as the spectrum of the family. The spectrum provides a unified view of known expansions for the density of $X_n$. It also provides a means to search for new expansions. We illustrate these applications of the spectrum through the cases of a sample mean and a standardized mean. We also discuss a related expansion for the cumulative distribution function of $X_n$.

Historically, formal expansions (i.e., non-rigorous expansions) for distributions of random variables have played an important role in the development of asymptotic theories in statistics. The most well-known example is the Edgeworth expansion for the density of a standardized mean which was first derived in 1905 as a formal expansion for the density [

Nevertheless, in spite of the usefulness of formal expansions, there does not seem to exist a systematic approach for deriving such expansions in the literature. To obtain an Edgeworth type of expansion for an asymptotically normal sequence, the common approach is to use the moments of $X_n$ (or, in the absence of the exact moments, the approximate moments obtained through the delta method) and substitute them into the Edgeworth expansion formula for the standardized mean. To obtain a saddlepoint type of expansion, one often follows Daniels’s derivation [

The main purpose of this paper is to introduce a family of formal expansions for a general asymptotically normal sequence. Members of the family are indexed by a parameter τ with an interval domain which we call the spectrum of the expansions. The spectrum has the following applications. 1) It provides a means to study the whole family of formal expansions and search for good and valid expansions. 2) It provides a way to view known expansions from a unified standpoint, thereby linking seemingly unrelated expansions under a unified framework. For the case of a standardized mean, for example, the Edgeworth expansion and the saddlepoints expansion [

The rest of this paper is organized as follows. In Section 2, we derive the family of formal expansions. In Section 3, we discuss the validity of the family for the cases of the sample mean and the standardized mean. For the latter case, the family leads to a set of valid new expansions for the density function. In Section 4, we consider a related formal expansion for the distribution function. Concluding remarks in Section 5 include further notes on previous work which has motivated this paper.

Suppose has a density function and a moment generating function. Assume that as approaches infinity the interval in which exists approaches a non-vanishing open interval where, are constants and. We will derive a formal expansion for at each point and thus we call the spectrum of the formal expansions for. To derive the formal expansion at a point, we need the following inversion formula

where, and. A key result that will be used in the derivation is the following lemma which establishes a new defining relation for Hermite polynomials.

Lemma 1: Let be the density function of the standard normal distribution and be the Hermite polynomial of degree. Then

for

Proof of Lemma 1: See Appendix.

The derivation of the formal expansion at involves the following two steps. Step 1: obtaining a formal series representation of the density function, and Step 2: rearranging terms in the formal series representation according to their asymptotic orders. Step 1 is achieved by first replacing the exponent of the integrand in (1) with its Taylor series expansion, then isolating the quadratic term of the Taylor series and performing a term-by-term integration. Note that in this step no attempt will be made to isolate the asymptotic factor from the exponent since we do not presume that the expansion of interest is based on a power sequence of. Also, with the aid of Lemma 1, Step 1 is independent of and gives a unified series representation for all values in the spectrum as we will see in the proof of Theorem 1 below.
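The fact that the inversion formula (1) may be evaluated along a vertical line through any point of the spectrum can be checked numerically. The following is a minimal sketch under assumptions that are not part of the derivation above: the sequence is taken to be the mean of `N` iid Exp(1) variables (a hypothetical choice with a closed-form cumulant generating function), the integral is truncated at `upper`, and a plain trapezoidal rule is used. The names `cgf`, `exact_density` and `inverted_density` are illustrative only.

```python
import cmath
import math

N = 5  # hypothetical sample size, chosen only for illustration


def cgf(t):
    """CGF of the mean of N iid Exp(1) variables; defined for Re(t) < N."""
    return -N * cmath.log(1 - t / N)


def exact_density(x):
    """Exact density of that mean: a gamma density with shape N and rate N."""
    log_f = N * math.log(N) + (N - 1) * math.log(x) - N * x - math.lgamma(N)
    return math.exp(log_f)


def inverted_density(x, tau, upper=400.0, step=0.02):
    """Trapezoidal evaluation of the inversion integral along Re(t) = tau."""
    def integrand(y):
        t = complex(tau, y)
        return cmath.exp(cgf(t) - t * x).real

    m = int(round(upper / step))
    total = 0.5 * (integrand(0.0) + integrand(upper))
    for j in range(1, m):
        total += integrand(j * step)
    # The integrand at -y is the complex conjugate of the integrand at y,
    # so the integral over the whole line is twice the real part over [0, upper].
    return total * step / math.pi


if __name__ == "__main__":
    for tau in (-2.0, 0.0, 2.0):
        print(tau, inverted_density(1.2, tau), exact_density(1.2))
```

For each value of `tau` tried here the numerical integral agrees with the exact density up to truncation and quadrature error, which is the property that allows a formal expansion to be built at every point of the spectrum.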

Theorem 1: For any where, has the following formal series representation (3)

where is the density function of the normal distribution with mean and variance

, and.

Proof of Theorem 1: For convenience of presentation, we first consider the special case of. By setting in (1) to, on the path of integration near the origin we have

Since and, where and are the mean and variance of, (4) may be written as

Letting, Equation (1) may be formally rewritten as

Letting and for, we may write (6) as

where is the density of, and for brevity we have written as. Expanding the function in the integrand, we get

(8)

We now perform the term-by-term integration for the right-hand side of (8). This is easily carried out using Lemma 1 by noting that is an entire function. Thus the contour of integration in (8) may be deformed from to. This and Lemma 1 lead to

for. It follows that

or equivalently

which is the formal series representation (3) at.

For a general, may not be zero and (4) becomes

Let be the density function of the normal distribution with mean and variance and write. By replacing (4) with (12) and then following the same steps for the case of shown above, we obtain (3).

The series representation (3) takes on a simpler form (11) for the special case of because the

term in (3) is not in (11). Another special case where (3) has a simpler form is the case where is the saddlepoint satisfying. Define, the generalized saddlepoint approximation for, as

Setting to in (3) and noting that , we obtain the series representation of at the saddlepoint:

Note that here depends on and thus is not a constant in the spectrum as changes.

We now discuss Step 2. Series (3), (11) or (14) are not particularly useful from an asymptotic expansion point of view in that they, like Charlier differential series, do not use information concerning the asymptotic properties of the distribution of. They are not asymptotic expansions for. When is asymptotically normal, the sequence may be an asymptotic sequence and may thus be used to transform these series into formal asymptotic expansions. To do so, we need to rearrange terms in the curly brackets in (3), (11) and (14) in ascending order according to the rates at which the’s approach zero. Corollary 1 below gives the rearranged series at the saddlepoint (14).

Corollary 1: Suppose as approaches infinity for and in particular. Then we have formally

We refer to (15) as the generalized saddlepoint expansion for based on the asymptotic sequence

. Note that conditions in Corollary 1 are satisfied by a large class of statistics, including the sample mean.

To transform the general series (3) into a formal expansion, we also need to consider the Hermite polynomials that appear in (3). If the absolute value of their common argument, , goes to infinity when

goes to infinity, then since is a polynomial of order. The reciprocals of these polynomials will form an asymptotic sequence with respect to. Thus (3) contains ratios of terms in two asymptotic sequences and and its asymptotic properties become complicated. To avoid this complication, we assume that is bounded. With this assumption, the relative rate at which terms in the curly bracket of (3), such as and, approach zero is determined by that of the’s. Rearranging terms in (3), we have

Corollary 2: Suppose for and is bounded as approaches infinity. Then we have formally (16) where and.

In particular, at formal expansion (16) becomes (17).

We refer to (16) as the general expansion for and (17) as the generalized Edgeworth expansion because the latter is the expansion at the origin but unlike the Edgeworth expansion which is based on the power sequence, (17) is based on a general asymptotic sequence.

Note that conditions on the relative order of the

’s and the boundedness of in the corollaries are easily verified once is given. When some of these conditions are not met, terms in the series representations need to be arranged accordingly. The resulting formal expansions may be different from those obtained above but the leading term should still be

.

To demonstrate the use of the spectrum, we now examine the spectrum for the important cases of the sample mean and the standardized mean. We show that known expansions, such as the saddlepoint, Edgeworth and saddlepoints expansions, can all be located through the spectrum. Moreover, we examine the validity of other expansions in the spectrum.

Let be the average of independent copies of a random variable. How does the generalized saddlepoint expansion relate to the saddlepoint expansion for given by Daniels [

Thus the generalized saddlepoint approximation (13) is the same as Daniels’s saddlepoint approximation,

.

To examine the asymptotic property of other terms of the generalized saddlepoint expansion (15), we first note that for any. Hence for. Denote by. It is not difficult to show that

which is in the saddlepoint expansion in [

terms in expansion (15) may be constructed for this particular case and it can be shown that they are equal to the corresponding terms in Daniels’s saddlepoint expansion. Thus the generalized saddlepoint expansion (15) is Daniels’s saddlepoint expansion.
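As a concrete illustration of the leading term, Daniels’s saddlepoint approximation can be evaluated in closed form for a simple case. The sketch below assumes iid Exp(1) observations, an example chosen here and not taken from the discussion above, for which the cumulant generating function of one observation is K(t) = -log(1 - t) and the saddlepoint equation K'(t) = x has the explicit solution t = 1 - 1/x. The function names are illustrative.

```python
import math


def saddlepoint_density(x, n):
    """Saddlepoint approximation to the density of the mean of n iid Exp(1)."""
    t_hat = 1.0 - 1.0 / x            # solves K'(t) = 1 / (1 - t) = x
    k = -math.log(1.0 - t_hat)       # K(t_hat)
    k2 = 1.0 / (1.0 - t_hat) ** 2    # K''(t_hat)
    return math.sqrt(n / (2.0 * math.pi * k2)) * math.exp(n * (k - t_hat * x))


def exact_density(x, n):
    """Exact density of the mean: gamma with shape n and rate n."""
    log_f = n * math.log(n) + (n - 1) * math.log(x) - n * x - math.lgamma(n)
    return math.exp(log_f)


if __name__ == "__main__":
    n = 5
    for x in (0.5, 1.0, 2.0):
        print(x, saddlepoint_density(x, n) / exact_density(x, n))
```

In this example the ratio of the approximation to the exact density does not depend on x and, by Stirling's formula, is close to 1 + 1/(12n). The relative error of the leading term is therefore uniform in x and of order 1/n, which is the behaviour the saddlepoint expansion is known for.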

It may also be easily verified using the same arguments demonstrated above that the general expansion (16) coincides with the expansion Daniels derived through the Edgeworth expansion at in [

. Thus when

, expansions given by D(4.3) are not valid.

More specifically, the coefficient in D(4.3), for example, is in general. Thus the second term in D(4.3), , is in general. Hence D(4.3) cannot even be an asymptotic expansion in a formal sense. This illustrates the necessity of the condition that be bounded, which we have used in arriving at (16).

It is not difficult to verify that for this case the generalized Edgeworth expansion (17) coincides with the Edgeworth expansion. Furthermore, the saddlepoints approximation for the density of a standardized mean given by Routledge and Tsao [

Although in this case the term in (18) and the term in (16) may be easily further expanded, verification of the validity of expansions with more terms than that in (18) is more involved and will not be considered here. We only consider (18) for which the validity of the family can be established. The following equation will be used implicitly for showing the validity:

where and.

Let be the cumulant generating function of the standardized mean. Then its derivatives have the following expansions: (i), (ii), (iii), and (iv) for. Denote the leading term of the expansion in (18) by. We have

Equations (i), (ii), (iii) and (19) imply that

(20)

Also, (iii) and (iv) imply that. Thus (20) may be written as

By the Edgeworth expansion,

. Thus

This proves the validity of (18). We have compared the numerical accuracy of to the normal approximation for small and moderately large sample sizes through a number of examples. Not surprisingly, is substantially more accurate than when is close to the saddlepoint. They are about the same when is near zero.
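A numerical comparison of this general kind is easy to set up. The sketch below does not reproduce the comparison reported above, since it does not implement the leading term of (18). It compares the normal density and the classical one-term Edgeworth density, with which (17) coincides in this case, against the exact density of a standardized mean. It assumes iid Exp(1) observations (standardized third cumulant equal to 2) and a hypothetical sample size of 10; all function names are illustrative.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def edgeworth_pdf(z, n, skew):
    """Normal density times the first Edgeworth correction term."""
    he3 = z ** 3 - 3.0 * z  # Hermite polynomial of degree 3
    return normal_pdf(z) * (1.0 + skew * he3 / (6.0 * math.sqrt(n)))


def exact_pdf(z, n):
    """Exact density of the standardized mean of n iid Exp(1) variables."""
    s = n + math.sqrt(n) * z  # corresponding value of the sum, Gamma(n, 1)
    if s <= 0.0:
        return 0.0
    log_f = (n - 1) * math.log(s) - s - math.lgamma(n)
    return math.sqrt(n) * math.exp(log_f)


def max_errors(n, grid):
    """Largest absolute error of each approximation over the grid."""
    err_normal = max(abs(normal_pdf(z) - exact_pdf(z, n)) for z in grid)
    err_edgeworth = max(
        abs(edgeworth_pdf(z, n, 2.0) - exact_pdf(z, n)) for z in grid
    )
    return err_normal, err_edgeworth


if __name__ == "__main__":
    grid = [-2.5 + 0.1 * j for j in range(51)]
    print(max_errors(10, grid))
```

The same harness could be reused for other members of the spectrum by replacing `edgeworth_pdf` with the approximation of interest.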

To summarize, all known expansions for the above two special cases have been located in their spectra. For the sample mean, the generalized saddlepoint expansion is the only member which is a valid asymptotic expansion. For the standardized mean, new valid expansions have been found.

The formal expansions for density functions may be integrated to obtain expansions for the corresponding distribution function,. Consider the case where and. By formally integrating (17), we obtain the generalized Edgeworth expansion

It may be easily verified that (23) is the same as the Edgeworth expansion for the distribution function when is the standardized mean. We now consider another example where (23) is valid. Let be a U-statistic of degree 2,

where the’s are iid and is a symmetric function of two variables with and . Let be the standard deviation of and be the distribution function of, then under certain conditions [

where is an approximation with error to the third cumulant of,. With , (25) then implies that (23) is indeed valid. Furthermore, it can be shown that the fourth cumulant of, , satisfies. The right-hand side of (25) and thus that of (23) can be further expanded. The expansion in (25) is simpler than that in (23) in that it is defined in terms of a simpler asymptotic sequence

while (23) is defined in terms of

) which may be difficult to compute. From the present point of view, however, is a more natural asymptotic sequence upon which to base asymptotic expansions. Presently, it is not clear which one is more accurate for small and moderate sample sizes.
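For the standardized-mean case mentioned earlier in this section, where (23) reduces to the classical Edgeworth expansion for the distribution function, the effect of the first correction term can be seen in a small example. The sketch assumes iid Exp(1) observations and an integer sample size, so that the exact distribution function is available as a finite sum; these choices and the function names are illustrative assumptions and not part of the discussion above.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def edgeworth_cdf(z, n, skew):
    """Normal CDF plus the first Edgeworth correction term."""
    he2 = z * z - 1.0  # Hermite polynomial of degree 2
    return normal_cdf(z) - normal_pdf(z) * skew * he2 / (6.0 * math.sqrt(n))


def exact_cdf(z, n):
    """Exact CDF of the standardized mean of n iid Exp(1), n an integer."""
    s = n + math.sqrt(n) * z  # corresponding value of the sum, Gamma(n, 1)
    if s <= 0.0:
        return 0.0
    # Gamma(n, 1) CDF for integer n: 1 - exp(-s) * sum_{k < n} s^k / k!
    term, partial = 1.0, 1.0
    for k in range(1, n):
        term *= s / k
        partial += term
    return 1.0 - math.exp(-s) * partial


if __name__ == "__main__":
    for z in (-1.5, 0.0, 1.5):
        print(z, normal_cdf(z), edgeworth_cdf(z, 10, 2.0), exact_cdf(z, 10))
```

At the origin, where the normal approximation is off by roughly 0.04 for a sample size of 10, the one-term correction removes almost all of the error.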

Steps similar to Steps 1 and 2 in Section 2 may be devised to derive formal expansions for the distribution function directly using the inversion formula,

where is the tail probability and. This process, however, is more complicated due to the extra term in the integrand and it leads to different expansions depending on whether or not is expanded. We will not discuss such expansions here.

For the cases of the sample mean and standardized mean, the spectrum has provided a new perspective on asymptotic expansions for density functions. It revealed that the saddlepoint expansion is the only valid expansion in the spectrum for the sample mean. It led to new expansions and provided a unified standpoint for viewing known expansions for the standardized mean. It also led to valid expansions outside the iid setting. These findings suggest that the spectrum is a valuable tool in finding expansions for density functions.

The derivation in Section 2 does not explicitly use the condition that is asymptotically normal. Without this condition, however, the sequence may not be an asymptotic sequence and this condition has been used implicitly in the corollaries. Our derivation also shows that to obtain a saddlepoint type of expansion it is not necessary to isolate the asymptotic factor n by expressing the cumulant generating function of as. Instead, one can use directly to obtain an expansion. Although the former approach will lead to the same saddlepoint approximation as the latter, it will obscure the underlying asymptotic sequence of the expansion and consequently that of the asymptotic order of the saddlepoint approximation. Indeed, the fact that the cumulant generating function can be written as times a function not dependent on is only a coincidence in the iid case. It has made it possible to establish the validity of the saddlepoint expansion through the method of steepest descent for this case. But it is not essential for deriving a formal expansion in general.

We now turn to some historical notes and remarks on previous work which has motivated this work. The Charlier difference series and the Gram-Charlier series of type A are mathematically elegant formal techniques which have contributed to the discovery of the Edgeworth expansion. However, they were not specifically aimed at approximating distributions from an asymptotic point of view and were unable to make use of the information that is asymptotically normal beyond choosing the normal density function as the developing function. When the focus is on obtaining accurate approximations for the distributions of rather than obtaining the speed at which the sequence approaches normality, other developing functions may be more suitable. In the present paper, we have found the leading term of the general expansion (16) to be very useful for this purpose.

Although in the extended version of Poincaré’s definition of an asymptotic expansion,

the asymptotic sequence need not be a power sequence, important developments in the theory of asymptotic analysis are mostly concerned with power series expansions. The developments in asymptotic expansions in statistics reflect those of the theory of asymptotic analysis. Our use of the sequence was inspired by [15,16], which have used non-power sequences to characterize the Edgeworth expansion and the saddlepoint expansion. Indeed, with an appropriate standardization the cumulant generating function of an asymptotically normal sequence approaches a second-order polynomial. If the limiting normal distribution is not a degenerate distribution, then the sequence may be an asymptotic sequence which can be used to construct asymptotic expansions.

I would like to thank a referee for helpful comments which have led to improvements in this paper.

We need the following identities which may be found in [

where is the Hermite polynomial of degree and by convention.

By setting to zero in (27) we obtain

The left-hand side of (29) is the moment generating function of the standard normal distribution evaluated at. It follows that

for, where is the th moment of the standard normal distribution. Since when is odd, (30) may be written as

To prove (2), we show that if satisfies

for, then. We first note that

Again, since when is odd, (33) may be written as

Furthermore, by differentiating (32) with respect to we obtain

for. Thus

for. It follows that and are the solutions of the same differential Equation (28) or (35). Furthermore, and , by induction for.
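The relation between the values of the Hermite polynomials at zero and the moments of the standard normal distribution, on which the argument above relies, can be checked numerically. The displayed identities are not reproduced in this text, so the sketch rests on two assumptions: the polynomials are the probabilists' Hermite polynomials generated by exp(tx - t²/2), and the relation in question is He_k(0) = i^k μ_k, where μ_k is the k-th moment of the standard normal distribution. The function names are illustrative.

```python
import math


def hermite(k, x):
    """Probabilists' Hermite polynomial He_k(x) via the three-term recurrence
    He_{j+1}(x) = x * He_j(x) - j * He_{j-1}(x)."""
    prev, curr = 1.0, x  # He_0 and He_1
    if k == 0:
        return prev
    for j in range(1, k):
        prev, curr = curr, x * curr - j * prev
    return curr


def normal_moment(k):
    """k-th moment of the standard normal: 0 for odd k, (k - 1)!! for even k."""
    if k % 2 == 1:
        return 0.0
    return float(math.prod(range(1, k, 2)))


if __name__ == "__main__":
    for k in range(9):
        # (1j) ** k is real whenever the moment is non-zero (even k).
        print(k, hermite(k, 0.0), ((1j) ** k * normal_moment(k)).real)
```

The two printed columns agree for every degree tried, which is consistent with reading the generating function at zero as the moment generating function of the standard normal distribution evaluated at an imaginary argument.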