1. Introduction
The F distribution is one of the most frequently used distributions in statistics, and it arises in many practical situations. For example, the test statistic for testing equality of variances of two independent normal distributions follows an F distribution. Likewise, the test statistic for testing equality of means of k independent normal distributions with homogeneous variance follows an F distribution.
Johnson and Kotz [1] give a comprehensive review of the approximations to the cumulative distribution function (cdf) of the F distribution. Li and Martin [2] propose a shrinking factor approximation method and approximate the cdf of the F distribution by the cdf of the $\chi^2$ distribution. On the other hand, considering the problem of testing equality of variances of two independent normal distributions, Wong [3] derives the modified signed log-likelihood ratio statistic. As a result, a normal approximation for the cdf of the F distribution is obtained. The approximation by Wong [3] has a theoretical order of convergence $O(n^{-3/2})$.
In this paper, we consider the problem of testing equality of means of k independent normal distributions with homogeneous variance. Rather than the standard one-way ANOVA approach, we derive an adjusted log-likelihood ratio statistic, which is asymptotically distributed as a $\chi^2$ distribution and whose mean is exactly the same as the mean of that $\chi^2$ distribution. As a result, a very accurate new $\chi^2$ approximation for the cdf of the F distribution is obtained.
2. Bartlett Corrected Log-Likelihood Ratio Statistic
Let $y_1, \ldots, y_n$ be independent and identically distributed random variables with joint log-likelihood function $\ell(\theta)$, where $\theta$ is a p-dimensional vector parameter. A frequently used asymptotic method for testing the hypothesis
$$H_0: \psi(\theta) = \psi_0 \quad \text{versus} \quad H_a: \psi(\theta) \neq \psi_0 \tag{1}$$
is based on the asymptotic distribution of the log-likelihood ratio statistic. In particular, the log-likelihood ratio statistic is defined as
$$W = 2\left[\ell(\hat{\theta}) - \ell(\tilde{\theta})\right],$$
where $\hat{\theta}$ is the unconstrained maximum likelihood estimator of $\theta$, which is obtained by maximizing the log-likelihood function with respect to $\theta$, and $\tilde{\theta}$ is the constrained maximum likelihood estimator of $\theta$, which is obtained by maximizing the log-likelihood function with respect to $\theta$ subject to the constraint that $\psi(\theta) = \psi_0$. Generally, this constrained maximum likelihood estimator of $\theta$ can be obtained by the Lagrange multiplier method. With the regularity conditions stated in Cox and Hinkley [4], it is well known that W is asymptotically distributed as a $\chi^2_r$ distribution, where r is the degrees of freedom, which is the difference between the number of parameters estimated without the constraint and the number of parameters estimated under the constraint. Hence, the observed level of significance for testing the hypothesis in (1) is
$P(\chi^2_r \geq w)$
, where $w$ is the observed value of the log-likelihood ratio statistic W. Note that Cox and Hinkley [4] show that this method of obtaining the observed level of significance has order of convergence of only $O(n^{-1})$.
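As a minimal sketch of this first-order procedure, the Python snippet below computes W and its chi-square p-value for one simple case: testing a normal mean against a fixed value when the variance is unknown, so that r = 1. The example model, the function names (`normal_loglik`, `lr_test_normal_mean`) and the data are illustrative assumptions and do not come from the paper.

```python
import math
from scipy.stats import chi2


def normal_loglik(y, mu, sigma2):
    """Log-likelihood of an i.i.d. normal sample with mean mu and variance sigma2."""
    n = len(y)
    rss = sum((yi - mu) ** 2 for yi in y)
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - rss / (2.0 * sigma2)


def lr_test_normal_mean(y, mu0):
    """Return (W, p-value) for H0: mu = mu0 with sigma^2 unknown (illustrative case, r = 1)."""
    n = len(y)
    # Unconstrained MLE: sample mean and the 1/n variance estimate.
    mu_hat = sum(y) / n
    sigma2_hat = sum((yi - mu_hat) ** 2 for yi in y) / n
    # Constrained MLE: mu is fixed at mu0, only sigma^2 is estimated.
    sigma2_tilde = sum((yi - mu0) ** 2 for yi in y) / n
    w = 2.0 * (normal_loglik(y, mu_hat, sigma2_hat) - normal_loglik(y, mu0, sigma2_tilde))
    # One constraint is imposed, so W is referred to a chi-square with 1 degree of freedom.
    return w, float(chi2.sf(w, df=1))


sample = [4.1, 5.3, 4.8, 5.9, 5.1]  # made-up data, for illustration only
w_obs, p_value = lr_test_normal_mean(sample, mu0=4.5)
print(f"W = {w_obs:.4f}, first-order p-value = {p_value:.4f}")
```

With a sample this small the chi-square reference distribution is only a rough guide, which is the motivation for the corrections discussed next.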
There exist many ways of improving the accuracy of the log-likelihood ratio statistic. Barndorff-Nielsen and Cox [5] and Brazzale et al. [6] give detailed reviews of some higher-order asymptotic methods and their applications. Recently, Davison et al. [7] derived a directional test for a vector parameter of interest in linear exponential families. The method is quite complicated, both in theory and in computation.
In this paper, we propose a statistic that is very similar to the Bartlett corrected log-likelihood ratio statistic. Bartlett [8] [9] shows that the expected value of W can be expressed as
$$E(W) = r\left(1 + \frac{b}{n}\right) + O(n^{-2}),$$
where b is known as the Bartlett factor. Since $E(W)$ does not equal the mean of the $\chi^2_r$ distribution, Bartlett [8] [9] proposes to adjust the log-likelihood ratio statistic by
$$W^* = \frac{W}{1 + b/n}$$
such that $E(W^*) = r$ with rate of convergence $O(n^{-2})$. Lawley [10] shows that in fact all cumulants of $W^*$ agree with those of a $\chi^2_r$ distribution to the same order. Lawley's proof is very complicated. Barndorff-Nielsen and Cox [11] discuss a much simpler derivation based on the saddlepoint approximation. However, the Bartlett factor, b, is in general very difficult to obtain. This has limited the use of the Bartlett corrected log-likelihood ratio statistic in applied statistics.
In this paper, we propose to adjust the log-likelihood ratio statistic W such that the adjusted log-likelihood ratio statistic has exactly the same mean as the $\chi^2_r$ distribution. In other words, let
$$\tilde{W} = \frac{r\,W}{E(W)}. \tag{2}$$
Then $\tilde{W}$ is asymptotically distributed as a $\chi^2_r$ distribution. Thus, the observed level of significance for testing the hypothesis in (1) is $P(\chi^2_r \geq \tilde{w})$, where $\tilde{w}$ is the observed value of $\tilde{W}$. Note that this adjusted log-likelihood ratio statistic is just a modified version of the Bartlett corrected log-likelihood ratio statistic.
In the next section, the proposed adjusted log-likelihood ratio statistic for testing the equality of means of k homoscedastic normally distributed populations is derived. By comparing to the standard F-test in the one-way ANOVA approach, an approximation of the cdf of the F distribution is obtained.
3. Main Result
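Let $y_{ij}$ be independent normally distributed random variables with mean $\mu_i$ and a common variance $\sigma^2$, where $i = 1, \ldots, k$ and $j = 1, \ldots, n_i$. Write $N = n_1 + \cdots + n_k$ for the total sample size, $\bar{y}_{i\cdot}$ for the mean of the i-th sample and $\bar{y}_{\cdot\cdot}$ for the overall mean. Our aim is to test
$$H_0: \mu_1 = \cdots = \mu_k \quad \text{versus} \quad H_a: \text{not all } \mu_i \text{ are equal}. \tag{3}$$
From the one-way ANOVA approach, we have the following sums of squares:
$$SS_B = \sum_{i=1}^{k} n_i (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2, \qquad SS_W = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2, \qquad SS_T = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{\cdot\cdot})^2 = SS_B + SS_W,$$
and the degrees of freedom are $k - 1$, $N - k$ and $N - 1$, respectively.
For testing the hypothesis in (3), the F-test is used. Denote the test statistic as
$$F = \frac{SS_B / (k-1)}{SS_W / (N-k)}. \tag{4}$$
It is well known that $F$ is distributed as the F distribution with degrees of freedom $(k-1, N-k)$. Hence, the observed level of significance for testing the hypothesis in (3) is $P(F_{k-1,N-k} \geq f)$, with $f$ being the observed value of $F$.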
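The classical computation can be sketched in a few lines of Python. The function name `one_way_anova` and the three small samples are hypothetical; the sums of squares follow the usual between-group and within-group definitions, and SciPy's `f.sf` gives the upper tail probability of the F distribution.

```python
from scipy.stats import f as f_dist


def one_way_anova(groups):
    """Return (F statistic, df1, df2, p-value) for the one-way ANOVA F-test."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    df1, df2 = k - 1, n_total - k
    f_stat = (ss_between / df1) / (ss_within / df2)
    # Observed level of significance: upper tail of F with (df1, df2) degrees of freedom.
    return f_stat, df1, df2, float(f_dist.sf(f_stat, df1, df2))


# Made-up data for three groups, for illustration only.
data = [[5.1, 4.9, 5.6, 5.0], [6.2, 5.8, 6.0], [5.5, 5.9, 6.1, 5.7, 5.4]]
f_obs, df1, df2, p_value = one_way_anova(data)
print(f"F = {f_obs:.4f} on ({df1}, {df2}) df, p-value = {p_value:.4f}")
```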
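From the likelihood analysis point of view, let $\theta = (\mu_1, \ldots, \mu_k, \sigma^2)$, and, with $N = n_1 + \cdots + n_k$ denoting the total sample size, the log-likelihood function can be written as
$$\ell(\theta) = -\frac{N}{2} \log \sigma^2 - \frac{1}{2\sigma^2} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \mu_i)^2.$$
It can be shown that the unconstrained maximum likelihood estimator is $\hat{\theta} = (\hat{\mu}_1, \ldots, \hat{\mu}_k, \hat{\sigma}^2)$, where
$$\hat{\mu}_i = \bar{y}_{i\cdot} = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij}, \qquad \hat{\sigma}^2 = \frac{1}{N} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2.$$
Therefore
$$\ell(\hat{\theta}) = -\frac{N}{2} \log \hat{\sigma}^2 - \frac{N}{2}.$$
When the null hypothesis in (3) is true, the log-likelihood function can be written as
$$\ell(\mu, \sigma^2) = -\frac{N}{2} \log \sigma^2 - \frac{1}{2\sigma^2} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \mu)^2,$$
and the constrained maximum likelihood estimator is $\tilde{\theta} = (\tilde{\mu}, \ldots, \tilde{\mu}, \tilde{\sigma}^2)$, where
$$\tilde{\mu} = \bar{y}_{\cdot\cdot} = \frac{1}{N} \sum_{i=1}^{k} \sum_{j=1}^{n_i} y_{ij}, \qquad \tilde{\sigma}^2 = \frac{1}{N} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{\cdot\cdot})^2.$$
Thus, we have
$$\ell(\tilde{\theta}) = -\frac{N}{2} \log \tilde{\sigma}^2 - \frac{N}{2}.$$
Therefore, the log-likelihood ratio statistic is
$$W = 2\left[\ell(\hat{\theta}) - \ell(\tilde{\theta})\right] = N \log \frac{\tilde{\sigma}^2}{\hat{\sigma}^2} = N \log\left(1 + \frac{k-1}{N-k} F\right),$$
where $F$ is the test statistic in (4), and W is asymptotically distributed as a $\chi^2$ distribution with $k - 1$ degrees of freedom.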
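The link between the two approaches can be checked numerically. The sketch below, with a hypothetical helper `anova_lr_statistic` and made-up data, computes W once from the two maximized log-likelihoods and once as N log(1 + (k - 1)F / (N - k)), where N is the total sample size; the two values should agree up to rounding.

```python
import math


def anova_lr_statistic(groups):
    """Return W computed from the likelihoods and W computed from the F statistic."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    ss_total = sum((y - grand_mean) ** 2 for g in groups for y in g)
    # Variance MLEs without and with the equal-means constraint.
    sigma2_hat = ss_within / n_total
    sigma2_tilde = ss_total / n_total
    # Maximized log-likelihoods; the common constant -(N/2) log(2*pi) cancels in W.
    loglik_hat = -0.5 * n_total * math.log(sigma2_hat) - 0.5 * n_total
    loglik_tilde = -0.5 * n_total * math.log(sigma2_tilde) - 0.5 * n_total
    w_likelihood = 2.0 * (loglik_hat - loglik_tilde)
    # Same statistic written as a function of the ANOVA F statistic.
    f_stat = ((ss_total - ss_within) / (k - 1)) / (ss_within / (n_total - k))
    w_from_f = n_total * math.log1p((k - 1) * f_stat / (n_total - k))
    return w_likelihood, w_from_f


data = [[5.1, 4.9, 5.6, 5.0], [6.2, 5.8, 6.0], [5.5, 5.9, 6.1, 5.7, 5.4]]  # made-up data
w_lik, w_f = anova_lr_statistic(data)
print(f"W from likelihoods = {w_lik:.6f}, W from F = {w_f:.6f}")
```

Because W is a monotone function of F, any statement about the distribution of W translates directly into a statement about the cdf of the F distribution.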
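Our proposed method requires $E(W)$. Since $F$ is distributed as the F distribution with $(k-1, N-k)$ degrees of freedom, where $N$ is the total sample size,
$$E(W) = \int_0^\infty N \log\left(1 + \frac{k-1}{N-k}\, x\right) g(x)\, dx, \tag{5}$$
where $g(\cdot)$ is the probability density function of the F distribution with degrees of freedom $(k-1, N-k)$. Therefore, the observed level of significance for testing the hypothesis in (3) based on the proposed adjusted log-likelihood ratio statistic is
$$P\left(\chi^2_{k-1} \geq \frac{(k-1)\, N \log\left(1 + \frac{k-1}{N-k}\, f\right)}{E(W)}\right),$$
where $E(W)$ is defined in (5) and $f$ is the observed value of the test statistic given in (4).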
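The integral in (5) is straightforward to evaluate with general-purpose quadrature. The following is a minimal sketch, assuming SciPy's `integrate.quad` for the integration and writing W as N log(1 + (k - 1)F / (N - k)) with N the total sample size; the function names `expected_w` and `adjusted_pvalue` and the input values are illustrative and not taken from the paper.

```python
import math
from scipy import integrate
from scipy.stats import chi2, f as f_dist


def expected_w(k, n_total):
    """Numerically evaluate E(W) when F has (k - 1, n_total - k) degrees of freedom."""
    df1, df2 = k - 1, n_total - k

    def integrand(x):
        # W written as a function of the F statistic, weighted by the F density.
        return n_total * math.log1p(df1 * x / df2) * f_dist.pdf(x, df1, df2)

    value, _abs_error = integrate.quad(integrand, 0.0, math.inf)
    return value


def adjusted_pvalue(f_obs, k, n_total):
    """Observed level of significance based on the mean-adjusted statistic."""
    df1, df2 = k - 1, n_total - k
    w_obs = n_total * math.log1p(df1 * f_obs / df2)
    # Rescale W so that its mean equals the chi-square mean df1.
    w_adjusted = df1 * w_obs / expected_w(k, n_total)
    return float(chi2.sf(w_adjusted, df=df1))


# Hypothetical inputs: k = 4 groups, 20 observations in total, observed F = 3.2.
print(f"E(W) = {expected_w(4, 20):.6f}")
print(f"adjusted p-value = {adjusted_pvalue(3.2, 4, 20):.6f}")
print(f"exact F p-value  = {float(f_dist.sf(3.2, 3, 16)):.6f}")
```

Printing the exact F-test p-value alongside gives a quick check on how close the adjusted chi-square p-value is for a given design.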
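By re-indexing the above approximation, let X be distributed as the $F_{u,v}$ distribution, where $u = k - 1$ and $v = N - k$ are the corresponding degrees of freedom, so that $N = u + v + 1$. Then the cdf of X is $P(X \leq x)$ for $x > 0$. Hence, the log-likelihood ratio statistic is
$$W = (u + v + 1) \log\left(1 + \frac{u}{v} X\right).$$
Since W is asymptotically distributed as a $\chi^2_u$ distribution, we have
$$P(X \leq x) \approx P\left(\chi^2_u \leq (u + v + 1) \log\left(1 + \frac{u}{v} x\right)\right).$$
However, this approximation has order of convergence $O(n^{-1})$ only.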
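To see how crude the unadjusted approximation can be, the sketch below compares it with the exact cdf from SciPy. It assumes the re-indexed form W = (u + v + 1) log(1 + ux/v) referred to a chi-square distribution with u degrees of freedom; the function name `unadjusted_chi2_cdf` and the chosen (u, v) pairs are illustrative.

```python
import math
from scipy.stats import chi2, f as f_dist


def unadjusted_chi2_cdf(x, u, v):
    """First-order approximation to P(F_{u,v} <= x) using W without mean adjustment."""
    if x <= 0:
        return 0.0
    w = (u + v + 1) * math.log1p(u * x / v)
    return float(chi2.cdf(w, df=u))


for u, v in [(2, 10), (2, 1000)]:
    exact = float(f_dist.cdf(1.0, u, v))
    approx = unadjusted_chi2_cdf(1.0, u, v)
    print(f"(u, v) = ({u}, {v}): exact = {exact:.4f}, unadjusted = {approx:.4f}")
```

The discrepancy is noticeable for small v and shrinks as v grows, which is the behaviour expected of a first-order approximation.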
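The proposed approach gives
$$\tilde{W} = \frac{u\, W}{E(W)},$$
where
$$E(W) = \int_0^\infty (u + v + 1) \log\left(1 + \frac{u}{v}\, t\right) f_{u,v}(t)\, dt$$
and $f_{u,v}(\cdot)$ is the probability density function of the $F_{u,v}$ distribution. As a result,
$$P(X \leq x) \approx P\left(\chi^2_u \leq \frac{u\,(u + v + 1) \log\left(1 + \frac{u}{v}\, x\right)}{E(W)}\right).$$
Note that $E(W)$ does not have a closed form solution, but it can be obtained numerically by software such as R, Maple and Matlab. Table 1 records some values of $E(W)$ for selected values of $(u, v)$. Moreover,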
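$E(W) \to \infty$ as $u \to \infty$ for fixed $v$. Hence, the proposed approximation will be problematic when u is large. Nevertheless, the F distribution has the inverse property
$$P(F_{u,v} \leq x) = 1 - P(F_{v,u} \leq 1/x),$$
which can be applied to obviate this problem. Thus, the proposed approximation is:
$$P(F_{u,v} \leq x) \approx \begin{cases} P\left(\chi^2_u \leq \dfrac{u\,(u + v + 1) \log\left(1 + \frac{u}{v}\, x\right)}{E(W_{u,v})}\right), & u \leq v, \\[2ex] 1 - P\left(\chi^2_v \leq \dfrac{v\,(u + v + 1) \log\left(1 + \frac{v}{u x}\right)}{E(W_{v,u})}\right), & u > v, \end{cases} \tag{6}$$
where $E(W_{a,b}) = \int_0^\infty (a + b + 1) \log\left(1 + \frac{a}{b}\, t\right) f_{a,b}(t)\, dt$ and $f_{a,b}(\cdot)$ is the probability density function of the $F_{a,b}$ distribution.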
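The approximation, including the use of the inverse property, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: E(W) is evaluated with SciPy's `integrate.quad`, and the rule for switching to the inverse form (here, whenever u > v) is an assumption made for this sketch. The names `expected_w`, `direct_approx` and `approx_f_cdf` are hypothetical.

```python
import math
from scipy import integrate
from scipy.stats import chi2, f as f_dist


def expected_w(u, v):
    """Numerically evaluate E(W) with W = (u + v + 1) * log(1 + u X / v), X ~ F(u, v)."""
    n = u + v + 1

    def integrand(x):
        return n * math.log1p(u * x / v) * f_dist.pdf(x, u, v)

    value, _abs_error = integrate.quad(integrand, 0.0, math.inf)
    return value


def direct_approx(x, u, v):
    """Mean-adjusted chi-square approximation to P(F_{u,v} <= x), without the inverse step."""
    w = (u + v + 1) * math.log1p(u * x / v)
    return float(chi2.cdf(u * w / expected_w(u, v), df=u))


def approx_f_cdf(x, u, v):
    """Approximate P(F_{u,v} <= x); switching when u > v is an assumption of this sketch."""
    if x <= 0:
        return 0.0
    if u <= v:
        return direct_approx(x, u, v)
    # Inverse property: P(F_{u,v} <= x) = 1 - P(F_{v,u} <= 1/x).
    return 1.0 - direct_approx(1.0 / x, v, u)


for u, v, x in [(1, 10, 1.0), (3, 12, 2.0), (10, 2, 0.7)]:
    exact = float(f_dist.cdf(x, u, v))
    approx = approx_f_cdf(x, u, v)
    print(f"(u, v) = ({u}, {v}), x = {x}: exact = {exact:.6f}, approx = {approx:.6f}")
```

Each call recomputes the integral for E(W); in repeated use the value could be cached per (u, v) pair, in the spirit of tabulating it once.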
4. Numerical Comparisons
Wong [3] gives a simple and accurate normal approximation to the cdf of the $F_{u,v}$ distribution, which has order of convergence $O(n^{-3/2})$. It takes the form
where $\Phi(\cdot)$ is the cdf of the standard normal distribution,
It is of interest to compare the proposed method to the approximation by Wong [3].
Figures 1(a)-8(a) are the plots of the cumulative distribution functions for the $F_{u,v}$ distribution for various u and v obtained by the exact method, the approximation by Wong [3], and the proposed method. The differences between the two approximated cumulative distribution functions and the exact cumulative distribution function are barely noticeable. To explore the accuracy of the two approximations, we examine the relative error, which is defined as the difference between the approximate and the exact cdf values, divided by the exact cdf value.
Figure 1. (a) cdf with (u,v) = (1,1); (b) Relative error.
Figure 2. (a) cdf with (u,v) = (1,2); (b) Relative error.
Figure 3. (a) cdf with (u,v) = (1,10); (b) Relative error.
Figure 4. (a) cdf with (u,v) = (2,1); (b) Relative error.
Figure 5. (a) cdf with (u,v) = (2,2); (b) Relative error.
Figure 6. (a) cdf with (u,v) = (2,10); (b) Relative error.
Figure 7. (a) cdf with (u,v) = (10,2); (b) Relative error.
Figure 8. (a) cdf with (u,v) = (15,2); (b) Relative error.
Figures 1(b)-8(b) are the plots of the corresponding relative errors. It is clear that the proposed method outperforms the approximation by Wong [3] in all the cases considered.
5. Conclusion
In this paper, a simple chi-square approximation to the cumulative distribution function of the F distribution is obtained via an adjusted log-likelihood ratio statistic. Numerical comparisons illustrate that the new approximation outperforms the higher-order asymptotic method discussed in Wong (2008), regardless of how small the degrees of freedom are.