1. Goodness-of-Fit in Beta Distribution
The random variable $X$ has a Beta distribution with two parameters $\alpha$ and $\beta$ if it has a probability density function of the form [1]:
$$f(x;\alpha,\beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}, \qquad 0 < x < 1,\ \alpha > 0,\ \beta > 0, \tag{1.1}$$
where $\alpha$ is known as the first shape parameter and $\beta$ as the second shape parameter. Recently, Rahman and Amin [2] explored three estimation procedures: the Method of Moments Estimate (MME), a slight variation of the MME that uses the median instead of the mean (MDE), and the Maximum Likelihood Estimate (MLE). They showed that all three methods perform similarly and suggested using the MME for convenience.
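As an illustration of the MME mentioned above, the following minimal Python sketch uses the standard moment-matching formulas $\hat{\alpha} = \bar{x}\,c$ and $\hat{\beta} = (1-\bar{x})\,c$ with $c = \bar{x}(1-\bar{x})/s^2 - 1$. The function name `beta_mme` and the use of the unbiased sample variance are assumptions made for this sketch; the exact variant used in [2] is not stated here.

```python
from statistics import fmean, variance


def beta_mme(sample):
    """Method-of-moments estimates (alpha, beta) for data in (0, 1).

    Assumption: the unbiased sample variance is used for s^2.
    """
    m = fmean(sample)
    v = variance(sample)                 # unbiased sample variance
    c = m * (1.0 - m) / v - 1.0          # common factor of both estimates
    if c <= 0:
        raise ValueError("sample variance is too large for a Beta fit")
    return m * c, (1.0 - m) * c


if __name__ == "__main__":
    alpha_hat, beta_hat = beta_mme([0.2, 0.4, 0.6, 0.8])
    print(alpha_hat, beta_hat)
```

The guard on `c` reflects that moment matching only yields valid shape parameters when $s^2 < \bar{x}(1-\bar{x})$.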
Here, we intend to test:
$H_0$: the sample is from a Beta distribution (1.1).
$H_1$: the sample is not from a Beta distribution (1.1).
There are many tests for checking the goodness-of-fit of a specific density function. In practice, people tend to use the Chi-Square goodness-of-fit test because it is easy to understand and to compute. The Shapiro-Wilk and Shapiro-Francia tests are usually applied to the Normal distribution. Here, we intend to use the Shapiro-Francia test along with other commonly used tests, namely the Anderson-Darling, Kolmogorov-Smirnov, and usual Chi-Square tests, for the Beta distribution.
1.1. Anderson-Darling Test
The Anderson-Darling test assesses whether a sample comes from a specified distribution. It makes use of the fact that, given a hypothesized underlying distribution and assuming the data do arise from it, the values of the cumulative distribution function (CDF) evaluated at the data can be assumed to follow a uniform distribution. Let $x_1, x_2, \ldots, x_n$ be a random sample. The Anderson-Darling statistic $A^2$ is given by Anderson and Darling [3] as follows:
$$A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln F\!\left(x_{(i)}\right) + \ln\!\left(1 - F\!\left(x_{(n+1-i)}\right)\right)\right], \tag{1.2}$$
where $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$ are the ordered measurements and $F$ is the CDF of (1.1). Extensive research has been conducted on the asymptotic distribution of this statistic. Here, however, we propose using the simulated distribution of the statistic under the null hypothesis to obtain the upper-tail p-value.
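The Anderson-Darling statistic in its usual computational form can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `anderson_darling` is hypothetical, and the hypothesized CDF is passed in as a callable `cdf` (an assumption, so that the block does not depend on any particular Beta CDF implementation).

```python
import math


def anderson_darling(sample, cdf):
    """A^2 = -n - (1/n) * sum_{i=1..n} (2i-1) [ln F(x_(i)) + ln(1 - F(x_(n+1-i)))].

    `cdf` is the hypothesized CDF F. Values of F equal to 0 or 1
    make the logarithm undefined and raise a ValueError.
    """
    # A CDF is nondecreasing, so sorting F(x) equals applying F to the sorted x.
    u = sorted(cdf(x) for x in sample)
    n = len(u)
    total = 0.0
    for i in range(1, n + 1):
        total += (2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
    return -n - total / n


if __name__ == "__main__":
    # Uniform(0, 1) null hypothesis: the CDF is the identity on (0, 1).
    print(anderson_darling([0.75, 0.25], lambda x: x))
```

Large values of the statistic indicate disagreement with the hypothesized distribution, which is why an upper-tail p-value is used.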
1.2. Kolmogorov-Smirnov Test
The Kolmogorov-Smirnov test (K-S test or KS test) (Kolmogorov [4] and Smirnov [5]) is a nonparametric test of the equality of continuous or discontinuous, one-dimensional probability distributions that can be used to test whether a sample came from a given reference probability distribution. The Kolmogorov-Smirnov statistic quantifies the distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution. The empirical distribution function $F_n$ for $n$ independent and identically distributed (i.i.d.) ordered observations $x_{(i)}$, and the resulting statistic $D_n$, are defined as
$$F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I\!\left(x_{(i)} \le x\right), \qquad D_n = \sup_{x}\left|F_n(x) - F(x)\right|, \tag{1.3}$$
where $I(\cdot)$ is the indicator function and $F$ is the CDF of the null hypothesis distribution. A wide range of research in the literature addresses the asymptotic distribution of this statistic. Here, however, we propose using the simulated distribution of the statistic under the null hypothesis to obtain the upper-tail p-value.
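Because the empirical distribution function is a step function, the supremum in the Kolmogorov-Smirnov statistic only needs to be checked just before and at each ordered observation. The sketch below illustrates this; the name `ks_statistic` and the callable `cdf` argument are assumptions made for illustration.

```python
def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov distance sup_x |F_n(x) - F(x)| for a continuous F.

    `cdf` is the hypothesized CDF F, passed in as a callable.
    """
    u = sorted(cdf(x) for x in sample)
    n = len(u)
    # EDF just after the i-th ordered point is (i + 1) / n, just before it is i / n.
    d_plus = max((i + 1) / n - u[i] for i in range(n))
    d_minus = max(u[i] - i / n for i in range(n))
    return max(d_plus, d_minus)


if __name__ == "__main__":
    # Uniform(0, 1) null hypothesis: the CDF is the identity on (0, 1).
    print(ks_statistic([0.25, 0.75], lambda x: x))
```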
1.3. Shapiro-Francia Test
The Shapiro-Francia test is a statistical test for the normality of a population, based on sample data. It was introduced by S. S. Shapiro and R. S. Francia in 1972 as a simplification of the Shapiro-Wilk test [6]. Let $x_{(i)}$ be the $i$th ordered value from a sample of size $n$. Let $m_i$ be the mean of the $i$th order statistic when making $n$ independent draws from a normal distribution. The test statistic is
$$W' = \frac{\left[\sum_{i=1}^{n}\left(x_{(i)} - \bar{x}\right)\left(m_i - \bar{m}\right)\right]^2}{\sum_{i=1}^{n}\left(x_{(i)} - \bar{x}\right)^2\,\sum_{i=1}^{n}\left(m_i - \bar{m}\right)^2}, \tag{1.4}$$
where $\bar{x}$ is the mean of the sample and $\bar{m}$ is the mean of the $m_i$'s. Note that this is a left-tailed test. Rahman and Pearson [7] showed that practical computation of the $m_i$'s can be done exclusively through Monte-Carlo simulation. In implementing this test for the Beta distribution, the Uniform$(0,1)$ distribution can be used in the same way that the Standard Normal $N(0,1)$ is used in the test for the Normal distribution. In the case of Uniform$(0,1)$, $m_i = i/(n+1)$. To show the accuracy of this choice, we also implement Monte-Carlo simulation, as was done by Rahman and Pearson [7] for the Normal distribution.
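The statistic is the squared correlation between the ordered sample and the expected order statistics. The sketch below is illustrative only: the name `shapiro_francia` and the optional `scores` argument are hypothetical. By default it uses the Uniform(0,1) expected order statistics $i/(n+1)$, and simulated means can be supplied through `scores`. Whether the raw data or CDF-transformed data should be passed in is not specified here and is left to the caller.

```python
from statistics import fmean


def shapiro_francia(sample, scores=None):
    """Squared correlation between the ordered sample and the scores m_i.

    Default scores are the Uniform(0, 1) expected order statistics i / (n + 1).
    Small values of the statistic are evidence against the null (left-tailed).
    """
    x = sorted(sample)
    n = len(x)
    if scores is None:
        m = [i / (n + 1) for i in range(1, n + 1)]
    else:
        m = list(scores)
    xbar, mbar = fmean(x), fmean(m)
    sxm = sum((a - xbar) * (b - mbar) for a, b in zip(x, m))
    sxx = sum((a - xbar) ** 2 for a in x)
    smm = sum((b - mbar) ** 2 for b in m)
    return sxm * sxm / (sxx * smm)


if __name__ == "__main__":
    print(shapiro_francia([0.9, 0.1, 0.55, 0.5]))
```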
1.4. Chi-Square Goodness-of-Fit Test
The standard Chi-Square goodness-of-fit statistic is computed as
$$\chi^2 = \sum_{i=1}^{k}\frac{(O_i - E_i)^2}{E_i}, \tag{1.5}$$
where $k$ stands for the number of groups, $O_i$ stands for the observed count in the $i$th group, and $E_i$ stands for the expected count under $H_0$ in the $i$th group. Note that $\chi^2$ approximately follows a Chi-Square distribution with $k-3$ degrees of freedom, as both parameters of the Beta distribution are assumed to be unknown.
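With equal-probability groups, every expected count equals $n/k$, and an observation falls in group $\lfloor k\,F(x) \rfloor$ under the hypothesized CDF $F$. The following sketch rests on that assumption. The function name `chi_square_gof` and the callable `cdf` are hypothetical, and the p-value lookup in the Chi-Square distribution is left out.

```python
def chi_square_gof(sample, cdf, k):
    """Chi-square statistic with k equal-probability groups under the CDF `cdf`.

    Returns (statistic, degrees of freedom), with k - 3 degrees of freedom
    because two Beta parameters are treated as estimated.
    """
    n = len(sample)
    observed = [0] * k
    for x in sample:
        j = min(int(cdf(x) * k), k - 1)   # clamp F(x) == 1 into the last group
        observed[j] += 1
    expected = n / k                      # equal probability 1/k per group
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return stat, k - 3


if __name__ == "__main__":
    # Uniform(0, 1) null hypothesis: the CDF is the identity on (0, 1).
    print(chi_square_gof([0.1, 0.3, 0.6, 0.9], lambda x: x, 4))
```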
2. Simulation Results
One thousand samples are generated from each of seven distributions: four Beta distributions, two mixtures of a Beta and a Normal distribution, each governed by a mixing parameter, and one truncated normal distribution with a given truncation value. Sample sizes of 25, 50, and 100 are considered. The proportions of rejections are then computed at the 0.05 level of significance and are presented in Table 1.
In Table 1, the tests are denoted AD for the Anderson-Darling test, KS for the Kolmogorov-Smirnov test, WE for the Shapiro-Francia test using $m_i = i/(n+1)$, WS for the Shapiro-Francia test using $m_i$ from simulation, and CS for the Chi-Square test.
For all tests except CS, the p-values are determined by simulation using the following algorithm.
Step 1: Generate a sample from one of the distributions mentioned above.
Step 2: Estimate the parameters $\alpha$ and $\beta$ as if $H_0$ is true.
Step 3: Compute the test statistic and save it.
Step 4: Generate 1000 samples from a Beta distribution with the parameter values estimated in Step 2. Compute the respective test statistic for each sample to construct the simulated distribution. For the WS test, also obtain the simulated $m_i$ at this step.
Step 5: Obtain the p-value by comparing the test statistic from Step 3 with the simulated distribution from Step 4, and save it.
Step 6: Repeat Step 1 through Step 5 to generate 1000 p-values.
Step 7: Count the number of p-values from Step 6 that fall below 0.05; the resulting proportions of rejections are displayed in Table 1.
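The steps above can be sketched as a parametric bootstrap. This is a minimal Python illustration, not the MATLAB code used for the reported results. The helper names (`beta_mme`, `we_statistic`, `simulated_p_value`, `rejection_rate`) are hypothetical, the MME with the unbiased variance is assumed for Step 2, and the WE statistic is used because it needs no Beta CDF. For statistics that depend on the fitted CDF (AD, KS), whether parameters are re-estimated for each simulated sample is not specified above and is not handled here.

```python
import random
from statistics import fmean, variance


def beta_mme(sample):
    """Step 2: method-of-moments estimates of (alpha, beta)."""
    m, v = fmean(sample), variance(sample)
    c = m * (1.0 - m) / v - 1.0
    return m * c, (1.0 - m) * c


def we_statistic(sample):
    """Squared correlation of the ordered sample with m_i = i / (n + 1)."""
    x = sorted(sample)
    n = len(x)
    m = [i / (n + 1) for i in range(1, n + 1)]
    xbar, mbar = fmean(x), fmean(m)
    sxm = sum((a - xbar) * (b - mbar) for a, b in zip(x, m))
    sxx = sum((a - xbar) ** 2 for a in x)
    smm = sum((b - mbar) ** 2 for b in m)
    return sxm * sxm / (sxx * smm)


def simulated_p_value(sample, statistic, n_sim=1000, lower_tail=True, rng=None):
    """Steps 2 to 5 for a single sample."""
    rng = rng or random.Random()
    alpha, beta = beta_mme(sample)                      # Step 2
    observed = statistic(sample)                        # Step 3
    n = len(sample)
    sims = [statistic([rng.betavariate(alpha, beta) for _ in range(n)])
            for _ in range(n_sim)]                      # Step 4
    if lower_tail:                                      # Step 5
        hits = sum(s <= observed for s in sims)
    else:
        hits = sum(s >= observed for s in sims)
    return hits / n_sim


def rejection_rate(draw_sample, statistic, n_rep=1000, n_sim=1000,
                   level=0.05, lower_tail=True, rng=None):
    """Steps 1, 6 and 7: proportion of p-values below `level`."""
    rng = rng or random.Random()
    rejections = 0
    for _ in range(n_rep):
        sample = draw_sample(rng)                       # Step 1
        p = simulated_p_value(sample, statistic, n_sim, lower_tail, rng)
        rejections += p < level
    return rejections / n_rep


if __name__ == "__main__":
    rng = random.Random(2024)
    draw = lambda r: [r.betavariate(2.0, 3.0) for _ in range(25)]
    # Small replication counts keep the demo fast; the text uses 1000 and 1000.
    print(rejection_rate(draw, we_statistic, n_rep=50, n_sim=200, rng=rng))
```

The `lower_tail` flag is there because the Shapiro-Francia variants are left-tailed, while AD and KS use the upper tail.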
Note that in the CS computation a separate number of groups $k$ is used for each of the sample sizes 25, 50, and 100. In addition, the group boundaries are chosen so that every group has equal probability.
MATLAB software is used in all computations and the codes are readily available from the primary author.
Table 1. Proportions of rejections of $H_0$ at the 0.05 level of significance. In each row, the first AD to CS group refers to the first distribution of the pair and the second AD to CS group to the second.
| Distributions            | $n$ | AD    | KS    | WE    | WS    | CS    | AD    | KS    | WE    | WS    | CS    |
| Beta, Beta               | 25  | 0.041 | 0.059 | 0.029 | 0.021 | 0.092 | 0.026 | 0.018 | 0.000 | 0.000 | 0.092 |
|                          | 50  | 0.062 | 0.038 | 0.029 | 0.028 | 0.054 | 0.039 | 0.026 | 0.000 | 0.001 | 0.080 |
|                          | 100 | 0.055 | 0.048 | 0.023 | 0.024 | 0.053 | 0.048 | 0.039 | 0.000 | 0.000 | 0.069 |
| Beta, Beta               | 25  | 0.047 | 0.046 | 0.002 | 0.017 | 0.196 | 0.057 | 0.057 | 0.032 | 0.031 | 0.130 |
|                          | 50  | 0.055 | 0.052 | 0.001 | 0.017 | 0.122 | 0.060 | 0.056 | 0.031 | 0.027 | 0.070 |
|                          | 100 | 0.048 | 0.047 | 0.002 | 0.016 | 0.118 | 0.031 | 0.038 | 0.016 | 0.024 | 0.052 |
| Beta-Normal mixture (two) | 25  | 0.957 | 1.000 | 1.000 | 1.000 | 0.424 | 0.969 | 1.000 | 1.000 | 1.000 | 0.410 |
|                          | 50  | 1.000 | 1.000 | 1.000 | 1.000 | 0.660 | 0.998 | 1.000 | 1.000 | 1.000 | 0.604 |
|                          | 100 | 1.000 | 1.000 | 1.000 | 1.000 | 0.944 | 1.000 | 1.000 | 1.000 | 1.000 | 0.920 |
| Truncated normal         | 25  | 0.227 | 1.000 | 1.000 | 1.000 | 0.223 |       |       |       |       |       |
|                          | 50  | 0.541 | 1.000 | 1.000 | 1.000 | 0.150 |       |       |       |       |       |
|                          | 100 | 0.942 | 1.000 | 1.000 | 1.000 | 0.316 |       |       |       |       |       |
In Table 1, we notice that the AD and KS tests attain the nominal level of significance more accurately than the other tests. The WE and WS tests underestimate the level of significance, whereas the CS test overestimates it. For the Beta-Normal mixture alternatives, the CS test shows lower power. For the Truncated Normal alternative, the AD and CS tests show lower power.
3. Application
We applied the above-mentioned goodness-of-fit procedures to the relative humidity data of air in May 2007 from the Haarweg Wageningen weather station [8].
Table 2. Relative humidity.
| 0.40 | 0.44 | 0.50 | 0.55 | 0.58 | 0.62 | 0.65 | 0.69 | 0.72 | 0.72 |
| 0.73 | 0.75 | 0.77 | 0.80 | 0.81 | 0.81 | 0.83 | 0.83 | 0.85 | 0.85 |
| 0.85 | 0.86 | 0.86 | 0.87 | 0.87 | 0.89 | 0.92 | 0.94 | 0.94 | 0.97 |
The respective p-values are 0.510 for AD, 0.000 for KS, 0.000 for WE, 0.000 for WS, and 0.160 for CS (Table 2).
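As a starting point for reproducing this analysis, the relative humidity values of Table 2 can be entered directly and the Beta parameters estimated under $H_0$. The sketch below assumes the MME with the unbiased sample variance; the variable names are hypothetical, and no attempt is made to reproduce the reported p-values.

```python
from statistics import fmean, variance

# Relative humidity values from Table 2 (n = 30).
humidity = [
    0.40, 0.44, 0.50, 0.55, 0.58, 0.62, 0.65, 0.69, 0.72, 0.72,
    0.73, 0.75, 0.77, 0.80, 0.81, 0.81, 0.83, 0.83, 0.85, 0.85,
    0.85, 0.86, 0.86, 0.87, 0.87, 0.89, 0.92, 0.94, 0.94, 0.97,
]

# Method-of-moments estimates under H0 (assumed estimator for this sketch).
m = fmean(humidity)
v = variance(humidity)
c = m * (1.0 - m) / v - 1.0
alpha_hat, beta_hat = m * c, (1.0 - m) * c

print(f"n = {len(humidity)}, mean = {m:.4f}, variance = {v:.4f}")
print(f"alpha_hat = {alpha_hat:.3f}, beta_hat = {beta_hat:.3f}")
```

These estimates would play the role of Step 2 of the simulation algorithm when a p-value is computed for this single sample.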
4. Conclusion and Remarks
The performance of the Kolmogorov-Smirnov test is more consistent than that of the other tests considered. The Shapiro-Francia test performs adequately, and simulated means of the order statistics are not needed, since the standard uniform means give results similar to those of the simulated means. The Chi-Square test performed poorly in terms of power. The Anderson-Darling test is not reliable for various alternatives.
It is also demonstrated that, in testing for the Beta distribution using the Shapiro-Francia test, there is no need to use simulated means of the order statistics; it suffices to use $m_i = i/(n+1)$.