High-moment Jarque-Bera tests for arbitrary distribution functions

The Jarque-Bera (JB) goodness-of-fit test for normality is a celebrated and powerful one. In this paper, we consider general Jarque-Bera tests for any distribution function (df) having at least 4k finite moments, k ≥ 2. The tests use as many moments as possible, whereas the classical JB test is supposed to test only the skewness and kurtosis of normal variates. But our results unveil the relations between the coefficients in the classical JB test and the moments, showing that it really depends on the first eight moments. This provides a new explanation for the power of such tests. General chi-square tests for an arbitrary model, not only the normal one, are also derived. We make use of the modern functional empirical process approach, which makes it easier to handle statistics based on high moments and allows the generalization of the JB test both in the number of involved moments and in the underlying distribution. Simulation studies are provided, with comparisons to the Kolmogorov-Smirnov test and the classical JB test.


Introduction
In this paper, we are concerned with generalizations of Jarque-Bera's (JB) [4] tests based on the arbitrary first (4k) moments, k ≥ 2, rather than on the first eight ones as usual (see [2], page 69, for a reminder of JB tests). We obtain general statistics that allow statistical tests for any distribution function G provided it has enough moments. As a reminder, the classical JB test belongs to the class of omnibus moment tests, i.e. those which assess simultaneously whether the skewness and kurtosis of the data are consistent with a Gaussian model. This test has proved optimal asymptotic power and good finite-sample properties (see [4]). A detailed description of that test and related in-depth analyses can be found in Bowman and Shenton, D'Agostino, D'Agostino et al., etc. (see [5], [6], [7] and [8]).

Let X, X_1, X_2, ... be a sequence of independent and identically distributed random variables (r.v.'s) defined on the same probability space (Ω, A, P). For each n ≥ 1, the empirical skewness and kurtosis coefficients related to the sample X_1, ..., X_n are defined by

(1.1) b_{n,2} = μ_{n,3} / μ_{n,2}^{3/2} and a_{n,2} = μ_{n,4} / μ_{n,2}^{2},

where μ_{n,ℓ} = n^{-1} Σ_{i=1}^{n} (X_i - \bar{X}_n)^ℓ denotes the ℓth centered empirical moment. These statistics are designed to estimate the theoretical skewness and kurtosis given by b_2 = E(X - m)^3 / σ^3 and a_2 = E(X - m)^4 / σ^4, where m = E(X) and σ^2 = Var(X) respectively denote the mean and the variance of X, which is supposed to be nondegenerate. Here and in all the sequel, E stands for the mathematical expectation with respect to the probability P. Now, under the hypothesis

H0 : X follows a Gaussian law,

we have b_2 = 0 and a_2 = 3, and the JB statistic

(1.2) T_n = (n/6) ( b_{n,2}^2 + (1/4) (a_{n,2} - 3)^2 )

has an asymptotic chi-square distribution with two degrees of freedom under the null hypothesis of normality. Jarque-Bera's test consists in rejecting H0 when T_n is far from zero.
We will find below that the constants 6 and 24 appearing in (1.2) are, actually, closely related to the first four even moments of an N(0,1) random variable, which are 1, 3, 15 and 105, and a more convenient form of (1.2) is T_n = n ( b_{n,2}^2 / 6 + (a_{n,2} - 3)^2 / 24 ).
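To make this concrete, here is a minimal sketch (not the paper's code) of the classical JB statistic in the convenient form above, using the centered empirical moments defined earlier; `jb_statistic` is a hypothetical helper name.

```python
import random

def jb_statistic(xs):
    """Classical Jarque-Bera statistic T_n = n * (b_{n,2}^2/6 + (a_{n,2}-3)^2/24)."""
    n = len(xs)
    mean = sum(xs) / n
    # centered empirical moments mu_{n,l} = (1/n) * sum_i (x_i - mean)^l
    mu = lambda l: sum((x - mean) ** l for x in xs) / n
    b = mu(3) / mu(2) ** 1.5   # empirical skewness b_{n,2}
    a = mu(4) / mu(2) ** 2     # empirical kurtosis a_{n,2}
    return n * (b ** 2 / 6 + (a - 3) ** 2 / 24)

random.seed(0)
sample = [random.gauss(0, 1) for _ in range(2000)]
# under H0 (normality), T_n is asymptotically chi^2 with 2 degrees of freedom,
# whose 95% quantile is about 5.99
print(jb_statistic(sample))
```

For a perfectly symmetric sample, the skewness term vanishes and only the kurtosis term contributes.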
Our objective here is to generalize JB's test to a general df G by considering high moments m_ℓ = E(X^ℓ), ℓ ≥ 1, with m_1 ≡ m, instead of the first eight moments only. We base our methods on the remark that, for a random variable X ∼ N(m, σ^2), one has E(X - m)^{2p-1} = 0 and E(X - m)^{2p} / σ^{2p} = (2p)! / (2^p p!) for p ≥ 1. Actually, JB's test only checks the third and fourth moments of X, while the coefficients of the JB statistic (1.2) use the first eight moments of X. Our guess is that we would have better tests if we were able to simultaneously check all the first (2k) moments for some k ≥ 2. To this purpose, we consider the following statistics, that is, the normalized centered empirical moments (NCEM),

(1.3) b_{n,p} = μ_{n,2p-1} / μ_{n,2}^{(2p-1)/2} and a_{n,p} = μ_{n,2p} / μ_{n,2}^{p},

where m_{n,ℓ} = n^{-1} Σ_{i=1}^{n} X_i^ℓ and μ_{n,ℓ} = n^{-1} Σ_{i=1}^{n} (X_i - \bar{X}_n)^ℓ are the ℓth non-centered and centered empirical moments. By the classical law of large numbers, the statistics in (1.3) are, for each fixed p, consistent estimators of

(1.4) b_p = μ_{2p-1} / μ_2^{(2p-1)/2} and a_p = μ_{2p} / μ_2^{p},

whenever the (4p)th moment exists. Finally, we consider C^1-class functions (f_p)_{1≤p≤k} and (g_p)_{1≤p≤k} and denote f = (f_1, ..., f_k) and g = (g_1, ..., g_k).
Our general test is based on the following statistic: for k ≥ 2,

(1.5) T_n(f, g, k) = Σ_{1≤p≤k} ( f_p(b_{n,p}) + g_p(a_{n,p}) ),

which almost surely (a.s.) tends to

(1.6) T(f, g, k) = Σ_{1≤p≤k} ( f_p(b_p) + g_p(a_p) )

as n → +∞. For an independent and identically distributed sequence X_1, X_2, ... of r.v.'s associated with a distribution function G having a finite (4k)th moment, Theorem 1 below gives the asymptotic normality of √n (T_n(f, g, k) - T(f, g, k)). From such a general result, we are able to derive a normality test by using it with b_p = 0 and a_p = (2p)! / (2^p p!) for 2 ≤ p ≤ k, and rejecting normality for a large value of T_n(f, g, k).
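The statistic can be sketched as follows. This is an illustrative implementation under stated assumptions: the exact forms of b_{n,p} and of the summation range are partly lost in this extract, so we take a_{n,p} = μ_{n,2p}/μ_{n,2}^p as in the text, take b_{n,p} = μ_{n,2p-1}/μ_{n,2}^{p-1/2} by analogy with the empirical skewness, and sum over p = 2, ..., k since the p = 1 terms are degenerate (b_{n,1} = 0, a_{n,1} = 1).

```python
import math

def T_n(xs, f, g, k):
    """Sketch of T_n(f, g, k) = sum_p ( f_p(b_{n,p}) + g_p(a_{n,p}) ).
    The forms of b_{n,p} and the range p = 2..k are assumptions (see lead-in)."""
    n = len(xs)
    m = sum(xs) / n
    # centered empirical moments mu_{n,l}, l = 0, ..., 2k
    mu = [sum((x - m) ** l for x in xs) / n for l in range(2 * k + 1)]
    total = 0.0
    for p in range(2, k + 1):
        b_np = mu[2 * p - 1] / mu[2] ** (p - 0.5)  # normalized odd moment
        a_np = mu[2 * p] / mu[2] ** p              # normalized even moment
        total += f(p, b_np) + g(p, a_np)
    return total

# a normality-oriented choice: penalize b_{n,p} away from 0 and a_{n,p}
# away from the normal even moments a_p = (2p)!/(2^p p!)
normal_a = lambda p: math.factorial(2 * p) / (2 ** p * math.factorial(p))
f0 = lambda p, b: b ** 2
g0 = lambda p, a: (a - normal_a(p)) ** 2
```

The functions f0 and g0 above are one possible choice among the many allowed C^1 functions; for p = 2 they reduce to the squared skewness and squared excess kurtosis of the classical JB statistic (up to the constants 6 and 24).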
We are going to establish the general asymptotic normality of T_n(f, g, k) for any df G with 4k finite moments. These results provide efficient tests for an arbitrary df. Next, we will derive chi-square tests that generalize JB's test for higher moments and for arbitrary df's as well.
Our results will show that these tests, based on the first 2k moments, need, in fact, the first 4k moments for computing the variance. This unveils that the classical JB test is not based only on the kurtosis and the skewness, but also on the sixth and the eighth moments. To describe the complete form of the Jarque-Bera method, put bj(p) and aj(p) for the asymptotic variances defined in Section 2. The JB test for an N(m, σ^2) r.v. will be shown to derive from the following general law

(1.7) n ( (b_{n,p} - b_p)^2 / bj(p) + (a_{n,p} - a_p)^2 / aj(p) ) ∼ χ^2_2,

with the particular coefficients p = 2, b_p = 0 and a_p = 3. This may be a new explanation of the power of the classical JB tests, since a successful test of normality means that the sample is from a df having the same first eight moments as the N(0,1) r.v., which is highly improbable for a non-normal r.v.
As an illustration of what precedes, consider the double-gamma distribution γ_d((1 + √13)/2, 1), with probability density f(x) = b^a / (2Γ(a)) |x|^{a-1} exp(-b|x|), where a = (1 + √13)/2 and b = 1. This r.v. is centered and has a kurtosis coefficient equal to 3. It is nevertheless rejected from normality by the JB test. If only the skewness and the kurtosis mattered, this would not be the case. Actually, the rejection comes from the coefficients aj(2) and bj(2), which are very different between the standard normal distribution and this specific distribution.
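This choice of a can be checked numerically. The sketch below uses the fact that for γ_d(a, 1) the odd moments vanish by symmetry and the even absolute moments are E X^{2q} = Γ(a + 2q)/Γ(a); the value a = (1 + √13)/2 is precisely the positive root of a^2 - a - 3 = 0, which makes the kurtosis equal to 3.

```python
import math

# a is chosen so that the double-gamma gamma_d(a, 1) has kurtosis exactly 3:
# a^2 - a - 3 = 0, i.e. a = (1 + sqrt(13)) / 2
a = (1 + math.sqrt(13)) / 2
# even moments of gamma_d(a, 1): E X^{2q} = Gamma(a + 2q) / Gamma(a)
m2 = math.gamma(a + 2) / math.gamma(a)   # = a * (a + 1)
m4 = math.gamma(a + 4) / math.gamma(a)   # = a * (a + 1) * (a + 2) * (a + 3)
kurtosis = m4 / m2 ** 2                  # = (a + 2)(a + 3) / (a (a + 1))
print(round(kurtosis, 10))  # 3.0, the same kurtosis as N(0, 1)
```

So this distribution matches the normal skewness and kurtosis exactly, yet differs in its higher moments, which is what the JB coefficients detect.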
The rest of the paper is organized as follows. In Subsection 2.1 of Section 2, we begin by giving a concise reminder of the modern theory of functional empirical processes, which is the main theoretical tool we use for finding the asymptotic law of (1.5). Next, in Subsection 2.2, we establish general results on the consistency of (1.5) and its asymptotic law; we consider particular cases in Subsection 2.3, propose universal chi-square tests in Subsection 2.4 and finally state the proofs in Subsection 2.5. We end the paper with Section 3, where simulation results concerning the normal and double-exponential models are given.
We mention that, in all the sequel, the limits are meant as n → +∞; this will not be repeated unless necessary.

RESULTS AND PROOFS
2.1. A reminder of the functional empirical process. Since the functional empirical process is our key tool here, we give a brief reminder on this process, associated with X_1, X_2, ..., and defined for each n ≥ 1 by

(2.1) G_n(f) = (1/√n) Σ_{i=1}^{n} ( f(X_i) - E f(X) ),

where f is a real measurable function defined on R such that

(2.2) E f(X)^2 < +∞.

It is known (see van der Vaart [3], pages 81-93) that G_n converges, at least in finite distributions, to a functional Gaussian process G with covariance function Γ(f, g) = E(f(X) g(X)) - E f(X) E g(X). G_n is linear, that is, for f and g satisfying (2.2) and for (a, b) ∈ R^2, we have a G_n(f) + b G_n(g) = G_n(af + bg). This linearity will be useful in our proofs. We are now in position to state our main results.
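The definition above can be sketched directly. This is an illustrative helper, not the paper's code: since G_n centers at the true expectation E f(X), that expectation must be supplied by the user.

```python
import math
import random

def G_n(f, xs, Ef):
    """Functional empirical process G_n(f) = n^{-1/2} * sum_i (f(X_i) - E f(X)).
    Ef is the (known) expectation of f(X) under the true law of X."""
    return sum(f(x) - Ef for x in xs) / math.sqrt(len(xs))

random.seed(1)
xs = [random.gauss(0, 1) for _ in range(5000)]
# with f = h_2 (x -> x^2) and X ~ N(0,1): E f(X) = 1, and G_n(f) is
# approximately N(0, Var f(X)) = N(0, 2) for large n
print(G_n(lambda x: x * x, xs, 1.0))

# linearity: a*G_n(f) + b*G_n(g) = G_n(a*f + b*g), with matching centering
lhs = 2 * G_n(lambda x: x, xs, 0.0) + 3 * G_n(lambda x: x * x, xs, 1.0)
rhs = G_n(lambda x: 2 * x + 3 * x * x, xs, 2 * 0.0 + 3 * 1.0)
print(abs(lhs - rhs))  # ~0 up to rounding
```

The linearity check at the end mirrors the identity used repeatedly in the proofs below.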

2.2. Statements of results. We first introduce the notation A(ℓ), B(p), C(p), bj(p) and aj(p) used below.
Here are our main results.
2.3. Particular cases and consequences.

2.3.1.
A general test. Let G be an arbitrary df with a finite (4k)th moment for some k ≥ 2, that is, ∫ x^{4k} dG(x) < +∞. We want to check whether a sample X_1, ..., X_n is from G. We then select C^1-functions f_i and g_i, i = 1, ..., k, compute the observed value t*_n(f, g, k) of √n (T_n(f, g, k) - T(f, g, k)), and report the p-value of the test, that is p = P(|N(0,1)| ≥ |t*_n(f, g, k)| / s), where s^2 is either the exact variance σ^2_k or its plug-in estimator σ^2_{k,n}. Our guess is that using a greater value of k makes the test more powerful, since equality in distribution of univariate r.v.'s means equality of all moments when they exist (see page 213 in [1]). For k = 2, this result depends on the first eight moments. Then finding another df G_1 for which the p-value exceeds 5% would suggest it has the same first eight moments as G, which is highly improbable. Simulation studies in Section 3 support our findings. Remark that we have as many choices as possible for the functions f_i and g_i.
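The two-sided normal p-value used here has a closed form in terms of the complementary error function, p = P(|N(0,1)| ≥ |t|) = 2(1 - Φ(|t|)) = erfc(|t|/√2), which the following sketch computes with the standard library only.

```python
import math

def two_sided_normal_pvalue(t):
    """p = P(|N(0,1)| >= |t|) = 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2))."""
    return math.erfc(abs(t) / math.sqrt(2))

# a standardized statistic of 1.96 gives the familiar 5% threshold
print(round(two_sided_normal_pvalue(1.96), 4))  # 0.05
```

This is the p-value reported for the observed standardized statistic t*_n(f, g, k)/s throughout the simulations.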
Unfortunately, in the simulation studies reported below, we noticed that the plug-in estimator σ^2_{k,n} may hugely overestimate the exact variance, which leads to accepting any data as following the model, or significantly underestimate it, which leads to rejecting data from the model itself. This is why we only use the exact variance here. Now let us show how to derive chi-square tests from Theorem 1.

2.3.2.
Generalized JB test and tests for symmetrical df's. Suppose that X has a symmetrical distribution. We have from Theorem 1 that

√n ( (b_{n,p} - b_p), (a_{n,p} - a_p) ) = ( G_n(B(p)), G_n(C(p)) ) + o_P(1).

Since X is symmetrical, that is, μ_{2ℓ-1} = 0 for ℓ ≥ 1, we may without loss of generality suppose that m_1 = 0, since replacing X by X - m_1 affects neither the (b_{n,p}, a_{n,p})'s nor the (b_p, a_p)'s. Then (2.4) and (2.5) give the expressions of B(p) and C(p) in terms of the functions h_j : x ↦ x^j. By reminding that h_p h_q = h_{p+q} for p ≥ 0 and q ≥ 0, we observe that the product B(p) × C(p) only includes functions h_j with odd j's, and then E G_n(B(p)) G_n(C(p)) = 0. Thus the limiting covariance matrix Σ_p satisfies (Σ_p)_{11} = Var(B(p)) = bj(p), (Σ_p)_{22} = Var(C(p)) = aj(p) and (Σ_p)_{12} = 0. We get

Corollary 2. Let ∫ x^{4p} dG(x) < ∞ for p ≥ 2 and let G be a symmetrical df. We have

(2.9) n ( b_{n,p}^2 / bj(p) + (a_{n,p} - a_p)^2 / aj(p) ) → χ^2_2.

For a standard normal random variable, we get bj(2) = 6 and aj(2) = 24, and the normality JB test becomes a particular case of (2.9), which is a general chi-square test for an arbitrary df with a finite (4p)th moment.
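The values bj(2) = 6 and aj(2) = 24, and the vanishing cross-covariance, can be checked by a small Monte-Carlo experiment. This sketch (not the paper's simulation code) estimates the variances of √n b_{n,2} and √n (a_{n,2} - 3), and their covariance, under N(0,1) data.

```python
import random

rng = random.Random(3)

def skew_kurt(n):
    """Empirical skewness b_{n,2} and kurtosis a_{n,2} of an N(0,1) sample."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    m = sum(xs) / n
    mu = lambda l: sum((x - m) ** l for x in xs) / n
    return mu(3) / mu(2) ** 1.5, mu(4) / mu(2) ** 2

# Monte-Carlo estimates of bj(2) = Var(sqrt(n) b_{n,2}),
# aj(2) = Var(sqrt(n) (a_{n,2} - 3)) and their covariance under N(0,1)
n, B = 300, 1500
pairs = [skew_kurt(n) for _ in range(B)]
var_b = n * sum(b * b for b, _ in pairs) / B
var_a = n * sum((a - 3) ** 2 for _, a in pairs) / B
cov   = n * sum(b * (a - 3) for b, a in pairs) / B
print(var_b, var_a, cov)  # roughly 6, 24 and 0
```

The near-zero covariance illustrates the odd-moment argument above: B(2) × C(2) only contains odd powers, whose normal expectations vanish.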
It is now time to prove Theorem 1 before considering the simulation studies.

2.5.
Proofs. Since G has at least its first 4k moments finite, we are entitled to use the finite-distribution convergence of the functional empirical process G_n as below. Let us begin by giving the asymptotic law of μ_{n,ℓ}. Denoting h_ℓ(x) = x^ℓ, we have √n (m_{n,ℓ} - m_ℓ) = G_n(h_ℓ), and expanding the centered moments in terms of the non-centered ones yields √n (μ_{n,ℓ} - μ_ℓ) = G_n(A(ℓ)) + o_p(1), where A(ℓ) is defined in (2.4) and where we used the linearity of the functional empirical process. By observing that μ_ℓ = Σ_{p=0}^{ℓ} C_ℓ^p (-m_1)^{ℓ-p} m_p, we finally obtain

(2.10) √n (μ_{n,ℓ} - μ_ℓ) = G_n(A(ℓ)) + o_p(1).

Now the law of b_{n,p} is derived as follows. By the delta-method applied to the ratio defining b_{n,p}, and by noticing from (2.10) that μ_{n,ℓ} → μ_ℓ in probability for all ℓ ≤ 2k, we get √n (b_{n,p} - b_p) = G_n(B(p)) + o_p(1), where B(p) is given in (2.5). By the very same method, we have √n (a_{n,p} - a_p) = G_n(C(p)) + o_p(1), where C(p) is stated in (2.6). The delta-method also yields the asymptotic law of T_n(f, g, k). This completes the proof of the theorem. The proof of the corollary is a simple consequence of the theorem.

Simulation and Applications
3.1. Scope of the study. We want to focus on illustrating how the general test performs for usual laws such as the normal and double-gamma ones. It is clear that the generality of our results, which are applicable to arbitrary df's with some finite kth moment (k ≥ 2), deserves extended simulation studies for different classes of df's. We particularly have to pay attention to the choice of k and of the functions f_i and g_i, depending on the specific model we want to test.
In this paper, we want to set up a general and workable method to simulate and test two symmetrical models: the normal one and the double-exponential one with density f(x) = (λ/2) exp(-λ|x|). We expect to find a test that accepts normality for normal data and rejects double-exponential data, to confirm this with the Jarque-Bera test, and to have another test that does exactly the contrary.
Once these results are achieved, we will be in a position to handle a larger-scale simulation study following the outlined method. In particular, fitting financial data to the generalized hyperbolic model is one of the most interesting applications of our results.
3.2. The frame. We first choose all the functions f_i equal to f_0 and all the functions g_i equal to g_0. We fix k = 3, that is, we work with the first twelve moments. As a general method, we consider two df's G_1 and G_2. We fix one of them, say G_1, and compute T(f, g, k) = T(f, g, k, G_1) and the variance σ^2_k from the exact distribution function G_1. We generate samples of size n from one of the df's (either G_1 or G_2) and compute T_n(f, g, k). We repeat this B times, report the mean value t* of the replicated values of T*_n = √n (T_n(f, g, k) - T(f, g, k)) / σ_k, and report the p-value p = P(|N(0,1)| ≥ |t*|). The simulation outcomes will be considered conclusive if p is high for samples from G_1 and low for samples from G_2. The results are compared with those given by the Kolmogorov-Smirnov test (KST) and, when the data are Gaussian, with the outcomes of the classical JB test.
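The frame above can be sketched generically as follows. This is an illustrative skeleton, not the paper's code; `simulation_pvalue` is a hypothetical helper, and the toy statistic at the end (the empirical second moment, with exact value 1 and Var(X^2) = 2 under N(0,1)) merely stands in for T_n(f, g, k) and σ^2_k.

```python
import math
import random

def simulation_pvalue(sampler, T_fn, T_exact, sigma, n, B, seed=0):
    """Monte-Carlo frame of Section 3: draw B samples of size n, average the
    standardized statistic T*_n = sqrt(n) * (T_n - T) / sigma over the
    replications, and report the p-value p = P(|N(0,1)| >= |t*|)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(B):
        xs = [sampler(rng) for _ in range(n)]
        total += math.sqrt(n) * (T_fn(xs) - T_exact) / sigma
    t_star = total / B
    return math.erfc(abs(t_star) / math.sqrt(2))  # two-sided normal p-value

# toy illustration (not the paper's statistic): T_n = empirical second moment,
# with exact value T = 1 and sigma^2 = Var(X^2) = 2 under N(0,1)
m2 = lambda xs: sum(x * x for x in xs) / len(xs)
p = simulation_pvalue(lambda r: r.gauss(0, 1), m2, 1.0, math.sqrt(2), n=200, B=200)
print(p)  # high: the N(0,1) data agree with the N(0,1) model
```

Swapping in the double-exponential sampler, or the exact values of a wrong model, drives the averaged t* away from zero and the p-value toward zero, which is the acceptance/rejection criterion used in the tables below.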
3.3. The results. We consider the following cases: G_1 is a Gaussian law N(m, σ^2); G_2 is the double-exponential law E_d(λ) with probability density f_2(x) = (λ/2) exp(-λ|x|); and G_3 is the double-gamma law γ_d(a, b) with probability density f_3(x) = b^a / (2Γ(a)) |x|^{a-1} exp(-b|x|).

The normal model N(m, σ^2). The choice f_0(x) = g_0(x) = x^2 is natural, since the Jarque-Bera test may be derived from our result for these functions and for k = 2. The model is determined by its theoretical moments; we recall that the variance of our statistic depends on the first 4k moments.

Normal Model
Simulation study. Testing the model with N(0,1) data gives conclusive outcomes for n = 20 (values of T_n(f, g, k), T*_n and the p-value). Our test rejects the E_d(1) model from n = 11 on, while JB's test rejects it only for n ≥ 22. We see here the advantage brought by the value k = 3 in our statistic. The KST has problems rejecting the false E_d(1) model even for n = 1000, contrary to the JB test.
Testing the double-gamma data versus the normal model. We have similar results. Our test rejects the γ_d(a_0, 1) model from n = 12 on, while JB's test rejects it only for n ≥ 18. We see here the advantage brought by the value k = 3 in our statistic. Although the first four moments of a γ_d(a_0, 1) law are 0, 1, 0 and 3, that is, the same as those of a standard normal r.v., this model is rejected. We already pointed out that the coefficients 6 and 24 are in fact based on the first eight moments, and the discrepancy in moments higher than 4 results in the rejection.
Analysing the tables above, we conclude that our test performs better than JB's test against a double-gamma df with the same skewness and kurtosis as a normal df for small sample sizes around ten, and this is a real advantage for small data sizes. Even for k = 2, our test performs well for the small values n = 11 and n = 12.
The double-exponential model. We point out that the statistic T_n(f, g, k) does not depend on λ; we therefore only consider λ = 1 in the following. We always use f_0(x) = g_0(x) = x^2. The model is determined by its theoretical moments.
Simulation. Testing the model with E d (λ) data gives the following outcomes, for n = 800.
          T_n(f, g, k)   T*_n    p%
E_d(1)    7858.0174      -0.41   41.370

The simulation results are very stable and consistently suggest acceptance.
Testing normal data. Using normal data gives

          T_n(f, g, k)   T*_n     p%
N(0,1)    236.019        -3.044   0.11

The N(0,1) model is rejected. We noticed that the rejection of normal data is automatically obtained for large sizes, when n is greater than 900. For n between 500 and 900, rejection is frequent but acceptance occurs now and then. We also noticed that the variance of T*_n is high and does not allow rejecting normal data for small sizes. This leads us to consider other functions. We now consider the class of functions θu + (1 + u^p)^p, p even.
We obtain good results for n = 150 with θ = 0.1 and p = 2. In this case, the exact value of the statistic is 11.600. The double-exponential E_d(1) model is confirmed according to the following table

          T_n(f, g, k)   T*_n      p%
E_d(1)    7.968          -0.7973   21.38

while the normal model is rejected, as illustrated below:

          T_n(f, g, k)   T*_n    p%
N(0,1)    3.001          -1.87   3.01

It is important to mention here that the KST is very powerful in rejecting the normal model with double-exponential and double-gamma data, with extremely low p-values.

Conclusion and perspectives.
We proposed a general test for an arbitrary model. The methods are based on functional empirical process theory, which readily provides asymptotic laws from which statistical tests are derived. They depend on an integer k such that the pertaining df has its first 4k moments finite. We obtained two kinds of tests: a general one, based on functions f_i and g_i, i = 1, ..., k, with an asymptotic normal law; and chi-square tests derived from these results, which are valid for general df's and include the Jarque-Bera test of normality. Both kinds use arbitrary moments. We only undertook simulation studies for the first kind of test. Our simulation studies showed high performance for normality against other symmetrical laws such as the double-exponential and double-gamma ones. For suitable choices of f_i, g_i and k, the test performs well for small samples (n = 20), both for accepting the normal model and for rejecting other models. We also showed that, for suitable choices of f_i and g_i, the test for the double-exponential model is also successful, but for sizes greater than n = 150. In upcoming papers, we will focus on detailed results for specific models and try to find out, in each case, suitable values of the parameters of the tests ensuring good performance for small data. A paper will also be devoted to simulation studies for the chi-square tests and their applications to financial data.