
A Note on “Limit Distributions of Self-Normalized Sums” Using Cauchy-Generated Samples

pp. 863-875
DOI: 10.4236/am.2019.1011062

ABSTRACT

In this case study, we illustrate the utility of characteristic functions, using the example of a sample statistic defined for samples from the Cauchy distribution. The derivation of the corresponding asymptotic probability density function is based on [1]: we elaborate and expand the individual steps of their presentation and include a small extension. Our reason for following their derivation so closely is to make the technique, its mathematical tools and its ingenious arguments available to the widest possible audience.

1. Introduction

The problem of finding the distribution of the self-normalized sum of a Cauchy-generated sample has been considered since at least 1969 (see, for example, [2]), but it was solved (by proposing a concrete numerical algorithm) only in 1973 by [1], our key reference. Since then, research has focused mainly on proving general statements about self-normalized sums, without dealing with specific distributions such as the Cauchy; see, for example, [3].

In this article, we demonstrate how characteristic functions (CHF) are used in Statistics to find the distribution of a specific sample statistic (a function of the individual observations). We do this through a single comprehensive example, namely finding the $n\to\infty$ limit of the probability density function (PDF) of

$$\frac{\sum_{i=1}^{n}X_i}{\sqrt{\sum_{i=1}^{n}X_i^2}} \qquad (1)$$

where $X_1, X_2, \ldots, X_n$ is a random independent sample (RIS) of size $n$ from a Cauchy distribution with zero median. This goal has already been achieved (in a more general setting) by [1]; our main (rather pedagogical) reason for extending their presentation is to make it accessible to graduate and advanced undergraduate students.

First we recall that the CHF of a random variable X is defined by

$$\varphi_x(s)=E\left(e^{isX}\right)=\int_{-\infty}^{\infty}\cos(sx)\,f(x)\,dx+i\int_{-\infty}^{\infty}\sin(sx)\,f(x)\,dx \qquad (2)$$

where $f(x)$ is the PDF of $X$, $E$ denotes the expected value, and $i$ is the imaginary unit. Note that the real part of $\varphi_x(s)$ is always an even (symmetric) function of $s$, while the imaginary part is odd. Also, when $f(x)$ is even, the corresponding CHF is real (its imaginary part vanishes because the integrand $\sin(sx)f(x)$ is odd).

Similarly

$$\varphi_{xy}(s,t)=E\left(e^{isX+itY}\right)=\iint_{x\text{-}y\text{ plane}}e^{isx+ity}f(x,y)\,dx\,dy \qquad (3)$$

(where $f(x,y)$ is the joint PDF of $X$ and $Y$) defines the joint CHF of two (not necessarily independent) random variables $X$ and $Y$; when they are independent, we have

$$\varphi_{xy}(s,t)=\varphi_x(s)\,\varphi_y(t) \qquad (4)$$

(the product of the individual CHFs). This implies that, for independent $X$ and $Y$, the CHF of $X+Y$ is $\varphi_x(s)\,\varphi_y(s)$; consequently, the CHF of the mean of $n$ independent, identically distributed $X_i$ is

$$\left[\varphi_x\!\left(\frac{s}{n}\right)\right]^n \qquad (5)$$
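As a quick illustration of (5) on the distribution used throughout this article (our own example, not part of the original derivation): the standard Cauchy PDF (8) has the CHF $e^{-|s|}$, so

$$\varphi_x(s)=e^{-|s|}\quad\Longrightarrow\quad\left[\varphi_x\!\left(\frac{s}{n}\right)\right]^n=\left(e^{-|s|/n}\right)^n=e^{-|s|}$$

that is, the mean of a Cauchy RIS has exactly the same distribution as a single observation. This is why the sum of the $X_i$ is divided by $n$ (rather than by $\sqrt{n}$) in (10) below.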

When two random variables are combined into one (e.g. by defining $U=XY$), the CHF of $U$ is given by

$$\varphi_U(s)=E\left(e^{isXY}\right)=\iint_{x\text{-}y\text{ plane}}e^{isxy}f(x,y)\,dx\,dy \qquad (6)$$

Under very general conditions, knowing the characteristic function enables us to reverse the process and recover the corresponding PDF:

$$f(x)=\frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-isx}\,\varphi_x(s)\,ds \qquad (7)$$
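To see (7) at work on a familiar case (an illustration we add here): inverting $\varphi(s)=e^{-|s|}$ recovers the Cauchy PDF (8). The imaginary part of the integrand is odd in $s$ and drops out, leaving

$$f(x)=\frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-isx}\,e^{-|s|}\,ds=\frac{1}{\pi}\int_{0}^{\infty}\cos(sx)\,e^{-s}\,ds=\frac{1}{\pi}\cdot\frac{1}{1+x^2}$$

where the last step uses the elementary integral $\int_0^\infty\cos(sx)\,e^{-s}\,ds=1/(1+x^2)$.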

To learn more about the mathematical details of transforming a PDF into a CHF and back, the reader may consult [4].

2. Joint CHF and Its Limit

Assume a RIS of size n from a Cauchy distribution with the following PDF

$$g(x)=\frac{1}{\pi\left(1+x^2\right)} \qquad (8)$$

The joint CHF of $X$ and $X^2$ (treated as distinct random variables) is then given by

$$\int_{-\infty}^{\infty}\exp\left(ixs+ix^2t\right)g(x)\,dx \qquad (9)$$

This implies that the joint CHF of

$$U_n\overset{\text{def}}{=}\frac{\sum_{i=1}^{n}X_i}{n} \qquad (10)$$

and of

$$V_n^2\overset{\text{def}}{=}\frac{\sum_{i=1}^{n}X_i^2}{n^2} \qquad (11)$$

is (replace $s$ by $s/n$ and $t$ by $t/n^2$ in (9), and raise the result to the power $n$)

$$\varphi_{U_n,V_n^2}(s,t)=\left\{1+\int_{-\infty}^{\infty}\left[\exp\left(\frac{ixs}{n}+\frac{ix^2t}{n^2}\right)-1-\frac{ixs}{n}\right]g(x)\,dx\right\}^n \qquad (12)$$

where adding and subtracting 1 facilitates taking the $n\to\infty$ limit; similarly, subtracting $ixs/n$ (which does not change the value of the integral, since $x\,g(x)$ integrates, in the principal-value sense, to 0) helps make subsequent integrals converge. Note that the principal value of an integral means replacing $\int_{-\infty}^{\infty}dx$ by $\lim_{R\to\infty}\int_{-R}^{R}dx$; this is tacitly assumed from now on.

Substituting $x=ny$ turns the previous expression into

$$\left\{1+\int_{-\infty}^{\infty}\left[\exp\left(iys+iy^2t\right)-1-iys\right]g(ny)\,n\,dy\right\}^n \qquad (13)$$

whose $n\to\infty$ limit is

$$\exp\left\{\int_{-\infty}^{\infty}\left[\exp\left(iys+iy^2t\right)-1-iys\right]\frac{dy}{\pi y^2}\right\} \qquad (14)$$

since $g(ny)\approx 1/(\pi n^2y^2)$ for large $n$, and $(1+A/n)^n\to\exp(A)$. The last displayed expression is thus the joint characteristic function of $U$ and $V^2$ (our notation for the limits of $U_n$ and $V_n^2$).

Note that only the tail behaviour of (8) was relevant in the end.

3. Finding the CHF of (1)

Replacing $t$ by $is^2w$ (with $w$ positive; we take $s>0$ throughout) results in

$$\varphi_{U,V^2}\left(s,is^2w\right)=\exp\left\{\int_{-\infty}^{\infty}\left[\exp\left(iys-s^2wy^2\right)-1-iys\right]\frac{dy}{\pi y^2}\right\}=\exp\left[s\,\psi(w)\right] \qquad (15)$$

where (after the substitution $ys=z$)

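$$\psi(w)\overset{\text{def}}{=}\int_{-\infty}^{\infty}\left[\exp\left(iz-wz^2\right)-1-iz\right]\frac{dz}{\pi z^2}=-\sqrt{\frac{2}{\pi}}\,Q\!\left(\frac{1}{\sqrt{2w}}\right)-2\sqrt{\frac{w}{\pi}}\,\exp\left(-\frac{1}{4w}\right) \qquad (16)$$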

with

$$Q(\tau)\overset{\text{def}}{=}\int_{0}^{\tau}\exp\left(-\frac{z^2}{2}\right)dz \qquad (17)$$

Note that $\psi(w)$ is thus always negative (this remains true for the real part of $\psi(w)$ after we make $w$ complex; this becomes consequential later on).

Proof.

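$$\psi'(w)=-\frac{1}{\pi}\int_{-\infty}^{\infty}\exp\left(iz-wz^2\right)dz=-\frac{1}{\pi}\exp\left(-\frac{1}{4w}\right)\int_{-\infty}^{\infty}\exp\left(-w\left(z-\frac{i}{2w}\right)^2\right)dz=-\frac{1}{\pi}\exp\left(-\frac{1}{4w}\right)\int_{-\infty-i/(2w)}^{\infty-i/(2w)}\exp\left(-wt^2\right)dt=-\frac{1}{\pi}\exp\left(-\frac{1}{4w}\right)\int_{-\infty}^{\infty}\exp\left(-wt^2\right)dt=-\frac{\exp\left(-\frac{1}{4w}\right)}{\sqrt{\pi w}}$$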

since moving the path of the $t$ integration does not change the integral's value (there are no singularities between the two paths, and the integrand tends to 0 sufficiently fast within the strip between them as $\mathrm{Re}\,t\to\pm\infty$).

Therefore, $\psi(w)$ is given by

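$$-\int\frac{\exp\left(-\frac{1}{4w}\right)}{\sqrt{\pi w}}\,dw+C=\sqrt{\frac{2}{\pi}}\int\frac{\exp\left(-\frac{z^2}{2}\right)}{z^2}\,dz+C \qquad (18)$$

$$=-\sqrt{\frac{2}{\pi}}\,\frac{\exp\left(-\frac{z^2}{2}\right)}{z}-\sqrt{\frac{2}{\pi}}\int\exp\left(-\frac{z^2}{2}\right)dz+C \qquad (19)$$

$$=-2\sqrt{\frac{w}{\pi}}\,\exp\left(-\frac{1}{4w}\right)-\sqrt{\frac{2}{\pi}}\,Q\!\left(\frac{1}{\sqrt{2w}}\right)+C \qquad (20)$$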

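using the substitution $w=1/(2z^2)$ followed by integration by parts. Note that $C=0$: as $w\to0^+$, (20) tends to $-\sqrt{2/\pi}\,Q(\infty)+C=-1+C$ (since $Q(\infty)=\sqrt{\pi/2}$), whereas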

$$\psi(0)=\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\exp(iz)-1-iz}{z^2}\,dz=-1 \qquad (21)$$

This is proved by replacing the path of integration (the real axis from $-R$ to $R$) by a clockwise half circle of radius $R$ in the upper half-plane, centered at the origin and denoted $\Gamma_R$ (legitimate, since the integrand has no singularities between the two paths; its singularity at $z=0$ is removable), and then evaluating

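$$\lim_{R\to\infty}\int_{\Gamma_R}\frac{\exp(iz)}{z^2}\,dz-\lim_{R\to\infty}\int_{\Gamma_R}\frac{1+iz}{z^2}\,dz=0+\lim_{R\to\infty}\int_{0}^{\pi}\frac{1+iRe^{it}}{R^2e^{2it}}\,iRe^{it}\,dt=-\pi \qquad (22)$$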

The first integral equals 0 by Jordan's lemma. In the second integral, we have traded the clockwise half circle for a counter-clockwise one (by changing the sign in front of the integral). Dividing $-\pi$ by $\pi$ gives $\psi(0)=-1$, as stated in (21).
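The closed form (16), the derivative found in its proof, and the limit $\psi(0)=-1$ can be cross-checked numerically. Below is a minimal sketch in standard-library Python. It rests on our own assumptions: $Q$ is evaluated through `math.erf` using $Q(\tau)=\sqrt{\pi/2}\,\mathrm{erf}(\tau/\sqrt2)$, and a central finite difference of (16) is compared with the analytic derivative. The function names are ours.

```python
import math

SQRT_PI_2 = math.sqrt(math.pi / 2)


def Q(tau):
    """Q(tau) = int_0^tau exp(-z^2/2) dz for real tau, via the error function."""
    return SQRT_PI_2 * math.erf(tau / math.sqrt(2))


def psi(w):
    """Closed form (16) of psi(w) for real w > 0."""
    return (-math.sqrt(2 / math.pi) * Q(1 / math.sqrt(2 * w))
            - 2 * math.sqrt(w / math.pi) * math.exp(-1 / (4 * w)))


def psi_prime(w):
    """Derivative found in the proof of (16): -exp(-1/(4w)) / sqrt(pi w)."""
    return -math.exp(-1 / (4 * w)) / math.sqrt(math.pi * w)


def central_difference(func, w, rel_step=1e-5):
    """Symmetric finite-difference approximation of func'(w)."""
    h = rel_step * w
    return (func(w + h) - func(w - h)) / (2 * h)


if __name__ == "__main__":
    print("psi(1e-6) =", psi(1e-6))          # should be close to psi(0) = -1
    for w in (0.1, 1.0, 10.0):
        print(w, psi(w), central_difference(psi, w), psi_prime(w))
```

The check at a tiny positive $w$ is what justifies $C=0$. The finite-difference comparison confirms that (16) and the derivative in its proof are consistent, and that $\psi$ stays negative.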

Recall that the left-hand side of (15) equals $E\left[\exp\left(isU-s^2wV^2\right)\right]$. Evaluating (15) at two distinct values of $w$ (denote them $w$ and $w_0$), dividing the difference by $s$, and integrating over $s$ from 0 to infinity yields

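$$\int_0^{\infty}\!\!\int_0^{\infty}\!\!\int_{-\infty}^{\infty}\exp(isu)\,\frac{\exp\left(-s^2v^2w\right)-\exp\left(-s^2v^2w_0\right)}{s}\,f(u,v)\,du\,dv\,ds=\int_0^{\infty}\frac{\exp\left[s\,\psi(w)\right]-\exp\left[s\,\psi(w_0)\right]}{s}\,ds \qquad (23)$$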

where $f(u,v)$ is the joint PDF of $U$ and $V$; note that its support is the whole upper half of the $u$-$v$ plane.

Replacing $s$ by $t/v$ (note that $v$ cancels out, since $ds/s=dt/t$) changes the left-hand side of (23) to

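$$\int_0^{\infty}\!\!\int_0^{\infty}\!\!\int_{-\infty}^{\infty}\exp\left(\frac{itu}{v}\right)\frac{\exp\left(-t^2w\right)-\exp\left(-t^2w_0\right)}{t}\,f(u,v)\,du\,dv\,dt \qquad (24)$$

$$=\int_0^{\infty}\varphi(t)\,\frac{\exp\left(-t^2w\right)-\exp\left(-t^2w_0\right)}{t}\,dt \qquad (25)$$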

where $\varphi(t)$ is the characteristic function of $U/V$ (the limit of $U_n/V_n$, which is exactly (1)), the variable whose distribution we seek; the last expression is then equal to the right-hand side of (23).

4. Converting to PDF

Differentiating the resulting equation, i.e. (25) = (23), with respect to w then yields (after a sign reversal)

$$\int_0^{\infty}t\,\varphi(t)\exp\left(-t^2w\right)dt=-\psi'(w)\int_0^{\infty}\exp\left[s\,\psi(w)\right]ds=\frac{\psi'(w)}{\psi(w)} \qquad (26)$$

(in the last integral, $\psi(w)$ is just a negative constant, so that $\int_0^{\infty}\exp[s\,\psi(w)]\,ds=-1/\psi(w)$).
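The right-hand side of (26) simplifies considerably. The following is our own expansion of this step; it is what turns the first form of (34) into the second. Factoring (16) and dividing the derivative $\psi'(w)=-\exp(-1/(4w))/\sqrt{\pi w}$ from its proof by it gives

$$\psi(w)=-2\sqrt{\frac{w}{\pi}}\exp\left(-\frac{1}{4w}\right)\left[1+\frac{1}{\sqrt{2w}}\exp\left(\frac{1}{4w}\right)Q\!\left(\frac{1}{\sqrt{2w}}\right)\right]\quad\Longrightarrow\quad\frac{\psi'(w)}{\psi(w)}=\frac{1}{2w\left[1+\frac{1}{\sqrt{2w}}\exp\left(\frac{1}{4w}\right)Q\!\left(\frac{1}{\sqrt{2w}}\right)\right]}$$

Combined with the factor $1/\sqrt{w}$ in (28), this produces $1/(2w^{3/2})=\sqrt{2}/(2w)^{3/2}$, which accounts for the constant $\sqrt{2}/\pi^{3/2}$ in (34).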

Finally, multiplying both sides of the previous equation by

$$\frac{\exp\left(\frac{y^2}{4w}\right)}{\pi\sqrt{\pi w}} \qquad (27)$$

and integrating over $w$ from 0 to $i\infty$ (a notation indicating that we follow the positive imaginary axis) results in

$$\frac{1}{\pi}\int_0^{\infty}\varphi(t)\exp(-ity)\,dt=\frac{1}{\pi^{3/2}}\int_0^{i\infty}\frac{\exp\left(\frac{y^2}{4w}\right)}{\sqrt{w}}\,\frac{\psi'(w)}{\psi(w)}\,dw \qquad (28)$$

where $\sqrt{w}$ denotes the principal value of the square root, and (16) is now extended to complex arguments.

Proof. Following [5] , we write

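$$\int_0^{i\infty}\frac{\exp\left(\frac{y^2}{4w}-t^2w\right)}{\sqrt{w}}\,dw=\sqrt{\frac{2yi}{t}}\int_0^{\infty}\exp\left(-\frac{ity}{2}\left(z^2+\frac{1}{z^2}\right)\right)dz=e^{-ity}\sqrt{\frac{2yi}{t}}\int_0^{\infty}\exp\left(-\frac{ity}{2}\left(z-\frac{1}{z}\right)^2\right)dz \qquad (29)$$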

after introducing

$$w=\frac{iz^2y}{2t} \qquad (30)$$

A further substitution $z\to 1/z$ turns it into

$$e^{-ity}\sqrt{\frac{2yi}{t}}\int_0^{\infty}\exp\left(-\frac{ity}{2}\left(z-\frac{1}{z}\right)^2\right)\frac{dz}{z^2} \qquad (31)$$

Adding (29) and (31), which have the same value, and dividing by 2 yields

$$e^{-ity}\sqrt{\frac{yi}{2t}}\int_0^{\infty}\exp\left(-\frac{ity}{2}\left(z-\frac{1}{z}\right)^2\right)\left(1+\frac{1}{z^2}\right)dz \qquad (32)$$

Finally, introducing $q=z-1/z$ results in

$$e^{-ity}\sqrt{\frac{yi}{2t}}\int_{-\infty}^{\infty}\exp\left(-\frac{ity}{2}\,q^2\right)dq=e^{-ity}\,\frac{\sqrt{\pi}}{t} \qquad (33)$$

where, to evaluate the last integral, we first replace $t$ by $t-i\varepsilon$ (to make it converge) and then take the $\varepsilon\to0$ limit of the answer (Cauchy-type integration).

It is well known that the real part of the left-hand side (and, therefore, of the right-hand side) of (28) yields the desired (clearly symmetric) PDF of $U/V$, say $f(y)$: since $\varphi(t)$ is real (and therefore even), $f(y)=\frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-ity}\varphi(t)\,dt=\frac{1}{\pi}\,\mathrm{Re}\int_0^{\infty}e^{-ity}\varphi(t)\,dt$. This then yields

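$$f(y)=\frac{1}{\pi^{3/2}}\,\mathrm{Re}\int_0^{i\infty}\frac{\exp\left(\frac{y^2}{4w}\right)}{\sqrt{w}}\,\frac{\psi'(w)}{\psi(w)}\,dw=\frac{\sqrt{2}}{\pi^{3/2}}\,\mathrm{Re}\int_0^{i\infty}\frac{\exp\left(\frac{y^2}{4w}\right)}{(2w)^{3/2}}\,\frac{dw}{1+\frac{1}{\sqrt{2w}}\exp\left(\frac{1}{4w}\right)Q\!\left(\frac{1}{\sqrt{2w}}\right)} \qquad (34)$$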

based on (16) and (18). The problem is that the last integral has no simple analytic answer; worse yet, its integrand is highly oscillatory, which prevents direct numerical integration.

In an attempt to break this impasse, we introduce the following substitution

$$w=\frac{1}{2\tau^2} \qquad (35)$$

getting

$$f(y)=\frac{\sqrt{2}}{\pi^{3/2}}\,\mathrm{Re}\int_0^{\infty\exp(-i\pi/4)}\frac{\exp\left(\frac{y^2\tau^2}{2}\right)d\tau}{1+\tau\exp\left(\frac{\tau^2}{2}\right)Q(\tau)} \qquad (36)$$

where $\tau$ now follows the complex ray at −45° (from 0 to $\infty\,e^{-i\pi/4}$).
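For readers who wish to verify the passage from (34) to (36), here are the individual pieces of the substitution (35). This spells out a step that the text leaves implicit:

$$w=\frac{1}{2\tau^2},\qquad dw=-\frac{d\tau}{\tau^3},\qquad (2w)^{3/2}=\frac{1}{\tau^3},\qquad \frac{y^2}{4w}=\frac{y^2\tau^2}{2},\qquad \frac{1}{\sqrt{2w}}=\tau,\qquad \frac{1}{4w}=\frac{\tau^2}{2}$$

The factor $\tau^3$ coming from $(2w)^{-3/2}$ cancels the $\tau^{-3}$ in $dw$, and the denominator of (34) becomes $1+\tau\exp(\tau^2/2)\,Q(\tau)$. As $w=iu$ runs along the positive imaginary axis ($u$ from 0 to $\infty$), $\tau=(2iu)^{-1/2}=e^{-i\pi/4}/\sqrt{2u}$ runs from $\infty\,e^{-i\pi/4}$ back to 0. Reversing the direction of integration absorbs the minus sign of $dw$.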

Unfortunately, even after this substitution, special measures are still needed for accurate numerical integration. For one thing, the regions $y^2<1$ and $y^2>1$ have to be dealt with separately.

5. Case of y² < 1

When $y^2<1$, it is legitimate to rotate the ray of the last integration to the positive real axis: the integrand has no singularities between the old and the new path, and it decreases sufficiently fast towards infinity in that sector. We then get

$$f(y)=\sqrt{\frac{2}{\pi^3}}\int_0^{\infty}\frac{\exp\left(\frac{y^2\tau^2}{2}\right)d\tau}{1+\tau\exp\left(\frac{\tau^2}{2}\right)Q(\tau)} \qquad (37)$$

where $\tau$ is real and the integrand is well behaved (no oscillations). Results of the numerical evaluation are displayed in Figure 1 (only the $y>0$ portion of the graph is shown; the curve for $-1<y<0$ is its mirror image).
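A minimal sketch of such a numerical evaluation follows. It uses standard-library Python, and the choices are our own assumptions, not the author's code: composite Simpson's rule, a cutoff where the integrand has decayed to about $e^{-40}$, and $Q$ computed via `math.erf`. Numerator and denominator of (37) are divided by $\exp(\tau^2/2)$ so that nothing overflows for large $\tau$.

```python
import math

SQRT_PI_2 = math.sqrt(math.pi / 2)
PREFACTOR = math.sqrt(2 / math.pi ** 3)    # the constant sqrt(2/pi^3) in front of (37)


def Q(tau):
    """Q(tau) = int_0^tau exp(-z^2/2) dz for real tau."""
    return SQRT_PI_2 * math.erf(tau / math.sqrt(2))


def integrand_37(tau, y):
    """exp(y^2 tau^2/2) / (1 + tau exp(tau^2/2) Q(tau)).

    Numerator and denominator are both divided by exp(tau^2/2), which gives
    the same value but avoids overflow for large tau.
    """
    return math.exp((y * y - 1) * tau * tau / 2) / (math.exp(-tau * tau / 2) + tau * Q(tau))


def f_inner(y, steps=20000):
    """Composite Simpson's rule for (37); valid only for y^2 < 1."""
    if y * y >= 1:
        raise ValueError("(37) applies only when y^2 < 1")
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    # the integrand decays like exp(-(1 - y^2) tau^2 / 2); stop where that is exp(-40)
    tau_max = math.sqrt(80 / (1 - y * y))
    h = tau_max / steps
    total = integrand_37(0.0, y) + integrand_37(tau_max, y)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand_37(k * h, y)
    return PREFACTOR * total * h / 3


if __name__ == "__main__":
    for y in (0.0, 0.25, 0.5, 0.75, 0.9):
        print(f"f({y}) = {f_inner(y):.6f}")
```

At $y=0$ this gives roughly 0.246, which can be compared with the rational approximation (55) of Section 9 once the adjustments in (54) are undone.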

Figure 1. Resulting PDF for $y^2<1$. Note the logarithmic-type singularity at $y=\pm1$.

To investigate the nature of this singularity, we note that, for large $\tau$, the integrand of (37) (including its constant factor) tends to

$$\frac{2}{\pi^2\tau}\exp\left(-\frac{\tau^2\left(1-y^2\right)}{2}\right) \qquad (38)$$

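since $Q(\tau)\to\sqrt{\pi/2}$ as $\tau\to\infty$. Integrating the last function over $\tau$ from 1 to infinity (it is not integrable at 0, and the choice of the lower limit does not affect the singular behaviour) yields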

$$\frac{1}{\pi^2}\,\Gamma\!\left(0,\frac{1-y^2}{2}\right) \qquad (39)$$

where the second factor is a special case of the incomplete gamma function defined by

$$\Gamma(0,z)\overset{\text{def}}{=}-\gamma-\ln z-\sum_{k=1}^{\infty}\frac{(-z)^k}{k\,k!} \qquad (40)$$

and $\gamma\approx0.5772$ is Euler's constant. Since $\Gamma(0,z)\approx-\ln z$ for small $z$, this indicates that adding $\ln(1-y^2)/\pi^2$ to $f(y)$ makes the resulting function finite at $y=\pm1$ (and thus amenable to a simple Padé-type approximation, constructed later on).
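The claimed cancellation can be observed numerically. The following sketch is our own illustration in standard-library Python. It re-implements (37) with Simpson's rule and evaluates $f(y)$ ever closer to $y=1$: $f(y)$ keeps growing, while $f(y)+\ln(1-y^2)/\pi^2$ levels off. The step count is our assumption, chosen so that the slowly decaying integrand is resolved.

```python
import math

SQRT_PI_2 = math.sqrt(math.pi / 2)


def f_inner(y, steps=40000):
    """Simpson evaluation of (37) for y^2 < 1 (overflow-free form of the integrand)."""
    def integrand(tau):
        q = SQRT_PI_2 * math.erf(tau / math.sqrt(2))          # Q(tau)
        return math.exp((y * y - 1) * tau * tau / 2) / (math.exp(-tau * tau / 2) + tau * q)

    tau_max = math.sqrt(80 / (1 - y * y))    # integrand is below exp(-40) beyond this
    h = tau_max / steps
    total = integrand(0.0) + integrand(tau_max)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return math.sqrt(2 / math.pi ** 3) * total * h / 3


def regularized(y):
    """f(y) + ln(1 - y^2)/pi^2, which should stay finite as y -> 1."""
    return f_inner(y) + math.log(1 - y * y) / math.pi ** 2


if __name__ == "__main__":
    for y in (0.9, 0.99, 0.999):
        print(f"y = {y}: f(y) = {f_inner(y):.5f}, f(y) + ln(1-y^2)/pi^2 = {regularized(y):.5f}")
```

Between $y=0.99$ and $y=0.999$, $f(y)$ should increase by about $\ln(10)/\pi^2\approx0.23$, in line with (39), while the regularized value barely moves.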

6. Case of y² > 1

The situation becomes somewhat more difficult when $y^2>1$. Now we can rotate the −45° ray of the (36) integration to −90° (the negative imaginary axis), since the integrand decreases sufficiently fast towards infinity in this range of directions. The new path contributes 0 to the real part of (36), since the integrand along the −90° ray is real while $d\tau$ is purely imaginary. But in moving from −45° to −90° we cross infinitely many singularities located between the two rays; these can be found numerically (a rather tedious procedure) by solving

$$1+\tau\exp\left(\frac{\tau^2}{2}\right)Q(\tau)=0 \qquad (41)$$

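The locations of the first 50 of these (they are all simple poles, slowly converging to the line $\mathrm{Im}\,\tau=-\mathrm{Re}\,\tau$) are shown in Figure 2.

Figure 2. First 50 roots of (41).

We can then evaluate (36) using the residue theorem. Note that, at each of these poles (say at $\tau_p$), the $\tau$ derivative of $1+\tau\exp(\tau^2/2)\,Q(\tau)$ equals $-1/\tau_p$, since

$$\left.\frac{d}{d\tau}\left(1+\tau\exp\left(\frac{\tau^2}{2}\right)Q(\tau)\right)\right|_{\tau=\tau_p}=\exp\left(\frac{\tau_p^2}{2}\right)Q(\tau_p)+\tau_p^2\exp\left(\frac{\tau_p^2}{2}\right)Q(\tau_p)+\tau_p\exp\left(\frac{\tau_p^2}{2}\right)\exp\left(-\frac{\tau_p^2}{2}\right) \qquad (42)$$

and

$$1+\tau_p\exp\left(\frac{\tau_p^2}{2}\right)Q(\tau_p)=0 \qquad (43)$$

implying

$$\exp\left(\frac{\tau_p^2}{2}\right)Q(\tau_p)=-\frac{1}{\tau_p} \qquad (44)$$

so that (42) reduces to $-1/\tau_p-\tau_p+\tau_p=-1/\tau_p$.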

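The residue of the integrand of (36) at $\tau_p$ is therefore $-\tau_p\exp(\tau_p^2y^2/2)$, and the sector between the two rays is traversed clockwise. This means that each pole (with the exception of the first one; see the next paragraph) contributes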

$$\sqrt{\frac{2}{\pi^3}}\,\mathrm{Re}\left[2\pi i\,\tau_p\exp\left(\frac{\tau_p^2y^2}{2}\right)\right] \qquad (45)$$

to (36).
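Because (41) has no closed-form solutions, the poles have to be located numerically. One straightforward approach is Newton's method, using the derivative (42) in the simplified form $\exp(\tau^2/2)\,Q(\tau)(1+\tau^2)+\tau$. This is a sketch under our own assumptions, not necessarily the procedure used here or in [1]. $Q$ is evaluated for complex $\tau$ through its Maclaurin series $Q(\tau)=\sum_{k\ge0}(-1/2)^k\tau^{2k+1}/(k!\,(2k+1))$, which is adequate for $|\tau|$ up to a few units. The starting points are chosen near the poles quoted below.

```python
import cmath


def Q(tau, terms=200):
    """Q(tau) = int_0^tau exp(-z^2/2) dz for complex tau, from its Maclaurin series.

    Term k is (-1/2)^k tau^(2k+1) / (k! (2k+1)); fine for |tau| up to a few units.
    """
    power_term = tau                       # (-tau^2/2)^k * tau / k!, starting at k = 0
    total = 0j
    for k in range(terms):
        total += power_term / (2 * k + 1)
        power_term *= -tau * tau / (2 * (k + 1))
    return total


def D(tau):
    """Left-hand side of (41): 1 + tau exp(tau^2/2) Q(tau)."""
    return 1 + tau * cmath.exp(tau * tau / 2) * Q(tau)


def D_prime(tau):
    """Derivative (42), collected as exp(tau^2/2) Q(tau) (1 + tau^2) + tau."""
    return cmath.exp(tau * tau / 2) * Q(tau) * (1 + tau * tau) + tau


def newton_root(tau0, tol=1e-13, max_iter=50):
    """Newton iteration for a root of (41) starting from tau0."""
    tau = complex(tau0)
    for _ in range(max_iter):
        step = D(tau) / D_prime(tau)
        tau -= step
        if abs(step) < tol:
            break
    return tau


if __name__ == "__main__":
    tau1 = newton_root(-1.3j)              # the pole on the negative imaginary axis
    tau2 = newton_root(1.85 - 3.45j)       # the second pole
    for tau in (tau1, tau2):
        # at a root, D'(tau_p) should coincide with -1/tau_p
        print(tau, abs(D(tau)), D_prime(tau), -1 / tau)
```

On the imaginary axis, (41) reduces to the condition that Dawson's integral attains its maximum, which is why the first root lands at $\tau_1\approx-1.30693\,i$. Finding hundreds of roots this way needs good starting guesses (for instance, extrapolating from the previous roots), and that is the tedious part mentioned below.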

The pole on the imaginary axis, at $\tau_1=-1.30693\,i$, lies on the new path and must be avoided by an infinitesimal half circle, so it contributes only one half of the above amount, namely

$$\sqrt{\frac{2}{\pi}}\;1.30693\,\exp\left(-\frac{1.30693^2\,y^2}{2}\right) \qquad (46)$$

The last function gives the asymptotic behaviour of $f(y)$ as $y^2\to\infty$ and provides an excellent approximation to this PDF when $|y|>1.8$ (its maximum absolute error is about $2\times10^{-6}$). Unfortunately, building a numerical solution for the full range of $y^2>1$ values by adding the contributions of sufficiently many of these poles (as done by [1]) is rather tedious: finding hundreds of these poles (needed to reach good accuracy, especially when $y^2$ approaches 1) is a non-trivial task, and the resulting convergence is quite slow.

Nevertheless, we mention that adding the contribution of the second pole, namely

$$-2\sqrt{\frac{2}{\pi}}\;\mathrm{Im}\left(\tau_2\exp\left(\frac{\tau_2^2y^2}{2}\right)\right) \qquad (47)$$

where $\tau_2=1.84906-3.45208\,i$, to (46) results in an equally accurate approximation (its error is less than $2\times10^{-6}$) in the $|y|>1.7$ tails of $f(y)$.
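Both tail formulas are cheap to evaluate. The short Python sketch below is our own transcription of (46) and (47). Its constants are the pole locations quoted above, and the function names are hypothetical.

```python
import cmath
import math

TAU1 = 1.30693                       # |tau_1|; the pole itself sits at -1.30693 i
TAU2 = complex(1.84906, -3.45208)    # second pole, as quoted in the text


def tail_one_pole(y):
    """(46): half-residue contribution of the pole on the imaginary axis."""
    return math.sqrt(2 / math.pi) * TAU1 * math.exp(-TAU1 ** 2 * y * y / 2)


def tail_two_poles(y):
    """(46) + (47): also includes the contribution of the second pole."""
    second = -2 * math.sqrt(2 / math.pi) * (TAU2 * cmath.exp(TAU2 ** 2 * y * y / 2)).imag
    return tail_one_pole(y) + second


if __name__ == "__main__":
    for y in (1.7, 2.0, 3.0):
        print(y, tail_one_pole(y), tail_two_poles(y))
```

The second-pole term decays like $\exp(\mathrm{Re}(\tau_2^2)\,y^2/2)\approx\exp(-4.25\,y^2)$, much faster than (46), so it matters only near the lower end of the $|y|>1.7$ range.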

7. Alternate Solution

We now backtrack and deal directly with (36). Even though its integrand is still highly oscillatory (with a frequency that increases as $\tau\to\infty\,e^{-i\pi/4}$), we can mitigate the problem by dividing the range of integration into two segments. First we integrate from 0 to $1-i$ (which poses no numerical difficulty), obtaining what we will call the first component of (36); then we integrate from $1-i$ to infinity along the −45° ray, obtaining the second component.

To carry out the latter integration, we first note that, for large $\tau$, $Q(\tau)$ has the following asymptotic expansion

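$$Q(\tau)\simeq\sqrt{\frac{\pi}{2}}-\exp\left(-\frac{\tau^2}{2}\right)\left(\frac{1}{\tau}-\frac{1}{\tau^3}+\frac{3}{\tau^5}-\frac{5\cdot3}{\tau^7}+\cdots\right) \qquad (48)$$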

Substituting this into the integrand of (36) and further expanding in powers of $1/\tau$ (up to and including the 6th power), while keeping the two exponential terms fixed (i.e. not expanding them in $\tau$), yields

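$$\left(\sqrt{\frac{2}{\pi}}\,\frac{1}{\tau}-\frac{2\exp\left(-\frac{\tau^2}{2}\right)}{\pi\tau^4}+\frac{6\exp\left(-\frac{\tau^2}{2}\right)}{\pi\tau^6}\right)\exp\left(\frac{\tau^2\left(y^2-1\right)}{2}\right) \qquad (49)$$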

which can be integrated over $\tau$ analytically, from $1-i$ to $\infty(1-i)$. We then numerically integrate, over the same ray, the difference between the integrands of (36) and (49). Since the resulting integrand approaches zero very quickly as $\tau$ increases, the integration range becomes effectively finite ($1-i$ to $6-6i$ already achieves very high accuracy), which eliminates any troublesome oscillation. Adding the real parts of the last two results (and multiplying by $\sqrt{2/\pi^3}$) then provides the second component of $f(y)$.
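The expansions (48) and (49) can be sanity-checked on the real axis, where exact values are easy to obtain with `math.erfc`. This is our own check in standard-library Python. It sets $y=0$ in (49), so that the last factor is $\exp(-\tau^2/2)$. It illustrates the size of the neglected terms, but it does not test behaviour along the −45° ray used in the text.

```python
import math

SQRT_PI_2 = math.sqrt(math.pi / 2)


def q_tail_exact(tau):
    """sqrt(pi/2) - Q(tau) = int_tau^inf exp(-z^2/2) dz, computed accurately via erfc."""
    return SQRT_PI_2 * math.erfc(tau / math.sqrt(2))


def q_tail_series(tau):
    """The correction term of (48), truncated after -15/tau^7."""
    return math.exp(-tau * tau / 2) * (1 / tau - 1 / tau ** 3 + 3 / tau ** 5 - 15 / tau ** 7)


def inv_den_exact(tau):
    """1 / (1 + tau exp(tau^2/2) Q(tau)) for real tau."""
    q = SQRT_PI_2 - q_tail_exact(tau)
    return 1 / (1 + tau * math.exp(tau * tau / 2) * q)


def inv_den_series(tau):
    """(49) evaluated at y = 0, i.e. with the factor exp(-tau^2/2)."""
    e = math.exp(-tau * tau / 2)
    return (math.sqrt(2 / math.pi) / tau - 2 * e / (math.pi * tau ** 4)
            + 6 * e / (math.pi * tau ** 6)) * e


def rel_err(approx, exact):
    return abs(approx - exact) / abs(exact)


if __name__ == "__main__":
    for tau in (3.0, 4.0, 6.0, 10.0):
        print(tau, rel_err(q_tail_series(tau), q_tail_exact(tau)),
              rel_err(inv_den_series(tau), inv_den_exact(tau)))
```

The relative error of (48) shrinks roughly like $105/\tau^8$ (the first omitted term). The error of (49) is far smaller, because its neglected terms also carry an extra factor $\exp(-\tau^2/2)$.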

This technique yields the graph of Figure 3 (drawn for $y>1$; the curve for $y<-1$ is its mirror image).

8. Monte-Carlo Verification

One can always get a good idea of the distribution of any sample statistic, however complicated and inaccessible to analytic treatment, by generating on a computer a random sample with the required properties and computing the statistic's value. This is repeated as many times as possible, and the results are displayed in a histogram.

We have done this with (1), using $n=100$ and generating one million of its random values. Plotting the resulting histogram together with the theoretical asymptotic PDF of the last three sections yields Figure 4; visual comparison indicates good agreement between the two. The small discrepancy still discernible (mainly in the $0<y<1$ range) is due to the fact that $n=100$ is not large enough to have reached the $n\to\infty$ limit (making the sample size substantially larger would exceed our computer's capacity).
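A scaled-down version of this experiment takes only a few lines of standard-library Python. This is our own sketch, not the author's code. The seed, the number of replications (10,000 instead of one million), the inverse-CDF Cauchy generator $\tan(\pi(U-1/2))$ and the histogram bins are all our choices. By the Cauchy-Schwarz inequality, every simulated value of (1) must satisfy $|(1)|\le\sqrt{n}$.

```python
import math
import random


def self_normalized_sum(sample):
    """Statistic (1): the sum of the observations over the root of their sum of squares."""
    return sum(sample) / math.sqrt(sum(x * x for x in sample))


def simulate(n=100, reps=10_000, seed=1):
    """Draw `reps` Cauchy samples of size n and return the values of (1)."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        # standard Cauchy variates via the inverse CDF, tan(pi (U - 1/2))
        sample = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]
        values.append(self_normalized_sum(sample))
    return values


def folded_histogram(values, width=0.1, bins=30):
    """Density histogram of |(1)| on [0, width * bins), folded as in Figure 4."""
    counts = [0] * bins
    for v in values:
        k = int(abs(v) / width)
        if k < bins:
            counts[k] += 1
    return [c / (len(values) * width) for c in counts]


if __name__ == "__main__":
    density = folded_histogram(simulate())
    for k, d in enumerate(density):
        print(f"{k * 0.1:4.1f}-{(k + 1) * 0.1:4.1f}: {d:.3f}")
```

Because the histogram is folded, its heights should be compared with $2f(y)$, not with $f(y)$.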

Note that, to be more economical, we have folded the negative and positive parts of the distribution into a single graph, effectively plotting the PDF of the absolute value of (1).

9. Accurate Approximation

First, recall that we already have an excellent approximation to $f(y)$ in the $|y|>1.7$ region, given by the sum of (46) and (47); what remains is a similarly accurate way of dealing with $|y|\le1.7$.

Figure 3. Resulting PDF when $y^2>1$.

Figure 4. Comparing theoretical and empirical PDFs.

We already know that adding

$$\frac{\ln\left|1-y^2\right|}{\pi^2} \qquad (50)$$

to $f(y)$ removes the corresponding singularity when $y^2<1$. We can easily verify that the same is true when $y^2>1$, since one of the terms in the analytic result of the (49) integration (once we take its real part and multiply by $\sqrt{2/\pi^3}$) equals

$$\frac{1}{\pi^2}\,\mathrm{Re}\,\Gamma\!\left(0,\frac{i\left(y^2-1\right)}{2}\right) \qquad (51)$$

where

$$\mathrm{Re}\,\Gamma(0,iz)=-\gamma-\ln z-\sum_{k=1}^{\infty}\frac{(-1)^kz^{2k}}{2k\,(2k)!} \qquad (52)$$

whose singular part, $-\ln|1-y^2|/\pi^2$, is exactly cancelled by adding (50). This proves that

$$f(y)+\frac{\ln\left|1-y^2\right|}{\pi^2} \qquad (53)$$

is singularity-free for all values of $y$; unfortunately, that still does not make it sufficiently smooth, as the next paragraph explains.

There is yet another subtle issue that causes great difficulty when trying to build an approximate formula for (53): the function has a small, hardly noticeable kink (a discontinuous second derivative) at $y^2=2$. This is not an artifact of the new technique, since the same (rather surprising) phenomenon can be confirmed by the old technique of adding residues. Luckily, we can identify yet another term of the (49) contribution responsible for this discontinuity, and remove it by subtracting the offending term from (53), thus getting

$$f(y)+\frac{\ln\left|1-y^2\right|}{\pi^2}-\frac{2\,\mathrm{Re}\left[\left(2-y^2\right)^{3/2}\right]\left(3y^2-11\right)}{15\pi^2} \qquad (54)$$

This function is (finally!) smooth enough, though not perfectly so (tiny issues remain with higher derivatives at $y^2=2$, $y^2=3$, etc.), to be fitted in the $|y|<1.7$ range by a ratio of two polynomials, minimizing the total squared error. This results in the following approximation

$$\frac{0.666457-0.82876\,y^2+0.401154\,y^4-0.0800733\,y^6+0.00203206\,y^{10}}{1-0.572265\,y^2+0.0327094\,y^4+0.0207377\,y^6+0.00042241\,y^{10}+0.000261666\,y^{14}} \qquad (55)$$

Its absolute error never exceeds $6\times10^{-6}$, which is more than sufficient for any practical application. The form of this expression and the individual powers of $y$ have been chosen somewhat arbitrarily; we do not claim that our choices are optimal.

Do not forget that (55) approximates (54), not $f(y)$ itself. To approximate $f(y)$, subtract $\ln|1-y^2|/\pi^2$ from (55) and add back $2\,\mathrm{Re}\left[(2-y^2)^{3/2}\right](3y^2-11)/(15\pi^2)$.
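Putting (46), (47), (54) and (55) together gives a complete and cheap approximation to $f(y)$. The Python sketch below is our own assembly of these published formulas, and its function names are hypothetical. For $|y|>1.7$ it uses the two-pole tail. For $|y|\le1.7$ it converts (55) back into an approximation of $f(y)$ by reversing the two adjustments in (54), noting that $\mathrm{Re}\left[(2-y^2)^{3/2}\right]=0$ when $y^2>2$. The function is undefined at $y=\pm1$, where $f$ has its logarithmic singularity.

```python
import cmath
import math

PI2 = math.pi ** 2
TAU1 = 1.30693
TAU2 = complex(1.84906, -3.45208)


def pade_55(y):
    """Rational approximation (55) to the smoothed function (54), for |y| <= 1.7."""
    y2 = y * y
    num = (0.666457 - 0.82876 * y2 + 0.401154 * y2 ** 2 - 0.0800733 * y2 ** 3
           + 0.00203206 * y2 ** 5)
    den = (1 - 0.572265 * y2 + 0.0327094 * y2 ** 2 + 0.0207377 * y2 ** 3
           + 0.00042241 * y2 ** 5 + 0.000261666 * y2 ** 7)
    return num / den


def f_approx(y):
    """Approximate limiting PDF f(y) of (1); raises ValueError at y = +-1."""
    y2 = y * y
    if abs(y) > 1.7:
        # tails: (46) + (47)
        first = math.sqrt(2 / math.pi) * TAU1 * math.exp(-TAU1 ** 2 * y2 / 2)
        second = -2 * math.sqrt(2 / math.pi) * (TAU2 * cmath.exp(TAU2 ** 2 * y2 / 2)).imag
        return first + second
    # undo the two adjustments made in (54); Re[(2 - y^2)^(3/2)] vanishes for y^2 > 2
    kink = 2 * (2 - y2) ** 1.5 * (3 * y2 - 11) / (15 * PI2) if y2 < 2 else 0.0
    return pade_55(y) - math.log(abs(1 - y2)) / PI2 + kink


if __name__ == "__main__":
    for y in (0.0, 0.5, 0.9, 1.1, 1.5, 1.7, 2.0, 3.0):
        print(f"f({y}) ~ {f_approx(y):.6f}")
```

A useful self-consistency check is that the two branches should agree closely at $|y|=1.7$, where both approximations are claimed to be accurate.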

10. Conclusion

The aim of this article was to demonstrate that finding the distribution of a relatively simple sample statistic requires a skillful use of characteristic functions and a whole gamut of sophisticated mathematical techniques, including real and complex analysis, Fourier transforms, and curve fitting. We hope that students of Statistics can benefit from the ingenuity of the authors of the original derivation of this distribution as presented in [1], and from the extra details included in this article. The latter include a different numerical approach to building the resulting PDF, expressing it in the form of an accurate Padé-type approximation (discovering an interesting discontinuity in the process), and verifying the answer by Monte Carlo simulation.

Conflicts of Interest

The author declares no conflicts of interest regarding the publication of this paper.

Cite this paper

Vrbik, J. (2019) A Note on “Limit Distributions of Self-Normalized Sums” Using Cauchy-Generated Samples. Applied Mathematics, 10, 863-875. doi: 10.4236/am.2019.1011062.

References

[1] Logan, B.F., Mallows, C.L., Rice, S.O. and Shepp, L.A. (1973) Limit Distributions of Self-Normalized Sums. The Annals of Probability, 1, 788-809.
https://doi.org/10.1214/aop/1176996846
[2] Efron, B. (1969) Student's t-Test under Symmetry Conditions. Journal of the American Statistical Association, 64, 1278-1302.
https://doi.org/10.1080/01621459.1969.10501056
[3] Spataru, A. (2014) Convergence and Precise Asymptotics for Series Involving Self-Normalized Sums. Journal of Theoretical Probability, 29, 267-276.
https://doi.org/10.1007/s10959-014-0560-1
[4] Sneddon, I.N. (2010) Fourier Transforms. Dover Publications.
[5] Viola, M. How Does One Derive the Following Formula of Integration? Mathematics Stack Exchange.
https://math.stackexchange.com/q/3323687

  

Copyright © 2020 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.