A Note on Beta Distribution Goodness of Fit

Abstract

The Beta distribution is widely used in engineering and industrial applications. Goodness-of-fit procedures for it are revisited here. The Shapiro-Francia statistic is adapted to the Beta distribution. A comparative simulation study of the Anderson-Darling, Kolmogorov-Smirnov, Shapiro-Francia, and Chi-square goodness-of-fit tests for the Beta distribution is performed.


1. Goodness-of-Fit in Beta Distribution

The random variable X has a Beta distribution with two parameters α and β if it has a probability density function of the form [1]:

f(x; \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha - 1} (1 - x)^{\beta - 1}, \quad 0 < x < 1, \; \alpha > 0, \; \beta > 0, \qquad (1.1)

where α is known as the first shape parameter and β as the second shape parameter. Recently, Rahman and Amin [2] explored three different estimation procedures: the Method of Moments Estimate (MME), a slight variation of the MME that uses the median instead of the mean (MDE), and the Maximum Likelihood Estimate (MLE). They showed that all three methods have similar performance and suggested using the MME for convenience.
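For reference, the MME follows directly from equating the sample mean and variance to their Beta counterparts. The authors' computations were done in MATLAB (see Section 2); the following is only a minimal Python sketch of the moment estimator, assuming data strictly inside (0, 1):

```python
import numpy as np

def beta_mme(x):
    """Method-of-moments estimates of (alpha, beta) from data in (0, 1)."""
    xbar = np.mean(x)
    s2 = np.var(x, ddof=1)                 # sample variance
    c = xbar * (1.0 - xbar) / s2 - 1.0     # common factor from the moment equations
    return xbar * c, (1.0 - xbar) * c
```

These estimates are what Step 2 of the simulation algorithm in Section 2 plugs into the hypothesized CDF.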

Here, we intend to test:

H_0: the sample is from a Beta distribution (1.1).

H_1: the sample is not from a Beta distribution (1.1).

There are many tests to check goodness-of-fit for a specific density function. In practice, people tend to use the Chi-square goodness-of-fit test, as it is easy to understand and its computations are straightforward. The Shapiro-Wilk and Shapiro-Francia tests are usually implemented for the Normal distribution. Here, we use the Shapiro-Francia test along with other commonly used tests, namely the Anderson-Darling, Kolmogorov-Smirnov, and the usual Chi-Square test, for the Beta distribution.

1.1. Anderson-Darling Test

The Anderson-Darling test assesses whether a sample comes from a specified distribution. It makes use of the fact that, given a hypothesized underlying distribution and assuming the data do arise from this distribution, the cumulative distribution function (CDF) evaluated at the data follows a uniform distribution. Let X_1, X_2, \ldots, X_n be a random sample. The Anderson-Darling statistic A^2 is given by Anderson and Darling [3] as follows:

A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} (2i - 1)\left[ \ln\big(F(Y_i)\big) + \ln\big(1 - F(Y_{n+1-i})\big) \right], \qquad (1.2)

where Y_1, Y_2, \ldots, Y_n are the ordered measurements and F is the CDF of (1.1). Extensive research has been conducted on the asymptotic distribution of this statistic, but here we simulate the distribution of the statistic under the null hypothesis to obtain the upper-tail p-value.
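As an illustration of (1.2), a minimal Python sketch (not the authors' MATLAB code) that evaluates A^2 against a Beta CDF with given, e.g. MME-fitted, parameters might read:

```python
import numpy as np
from scipy import stats

def anderson_darling_beta(x, a_hat, b_hat):
    """Anderson-Darling statistic A^2 of Eq. (1.2) against Beta(a_hat, b_hat)."""
    y = np.sort(x)                           # ordered measurements Y_1 <= ... <= Y_n
    n = len(y)
    u = stats.beta.cdf(y, a_hat, b_hat)      # F(Y_i) under the hypothesized Beta
    i = np.arange(1, n + 1)
    s = np.sum((2 * i - 1) * (np.log(u) + np.log(1.0 - u[::-1])))
    return -n - s / n                        # large values are evidence against H0
```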

1.2. Kolmogorov-Smirnov Test

The Kolmogorov-Smirnov test (K-S test or KS test) (Kolmogorov [4] and Smirnov [5]) is a nonparametric test of the equality of continuous or discontinuous one-dimensional probability distributions that can be used to test whether a sample came from a given reference probability distribution. The Kolmogorov-Smirnov statistic quantifies the distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution. With F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I(X_i \le x) denoting the empirical distribution function of n independent and identically distributed (i.i.d.) observations X_1, \ldots, X_n, the Kolmogorov-Smirnov statistic is defined as

D_n = \sup_{x} \left| F_n(x) - F(x) \right|, \qquad (1.3)

where F(x) is the CDF of the null-hypothesis distribution. In the literature, a wide range of research has been done to obtain the asymptotic distribution of this statistic, but here we simulate the distribution of the statistic under the null hypothesis to obtain the upper-tail p-value.
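A corresponding sketch of (1.3), again illustrative Python rather than the authors' MATLAB, computes D_n from the sorted sample and the fitted Beta CDF (scipy.stats.kstest with a callable CDF gives the same statistic):

```python
import numpy as np
from scipy import stats

def ks_statistic_beta(x, a_hat, b_hat):
    """Kolmogorov-Smirnov statistic D_n of Eq. (1.3) against Beta(a_hat, b_hat)."""
    y = np.sort(x)
    n = len(y)
    u = stats.beta.cdf(y, a_hat, b_hat)      # F(X_(i)) under the hypothesized Beta
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - u)               # largest gap where F_n exceeds F
    d_minus = np.max(u - (i - 1) / n)        # largest gap where F exceeds F_n
    return max(d_plus, d_minus)
```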

1.3. Shapiro-Francia Test

The Shapiro-Francia test is a statistical test for the normality of a population, based on sample data. It was introduced by S. S. Shapiro and R. S. Francia in 1972 as a simplification of the Shapiro-Wilk test [6]. Let X_{(i)} be the i-th ordered value from a sample of size n, and let m_{i:n} be the mean of the i-th order statistic of n independent draws from a standard normal distribution. The test statistic is

W' = \frac{\left[ \sum_{i=1}^{n} \big( X_{(i)} - \bar{X} \big)\big( m_i - \bar{m} \big) \right]^2}{\sum_{i=1}^{n} \big( X_{(i)} - \bar{X} \big)^2 \, \sum_{i=1}^{n} \big( m_i - \bar{m} \big)^2}, \qquad (1.4)

where \bar{X} is the mean of the sample and \bar{m} is the mean of the m_i's. Note that this is a left-tailed test. Rahman and Pearson [7] showed that practical computation of the m_{i:n}'s can be done using exclusive Monte Carlo simulation. In implementing this test for the Beta distribution, the Uniform U(0,1) distribution can be used in the same way that the Standard Normal N(0,1) is used when testing for the Normal distribution. In the case of U(0,1), m_i = i/(n+1). To assess the accuracy of this choice, we also implement the Monte Carlo simulation of Rahman and Pearson [7], as was done for the Normal distribution.
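A sketch of the WE variant of (1.4) with m_i = i/(n+1) is below. Whether the ordered data are first transformed by the fitted Beta CDF is not spelled out above, so the sketch follows the text literally and correlates the ordered observations with the uniform order-statistic means; either way, the simulated null distribution of Section 2 calibrates the test.

```python
import numpy as np

def shapiro_francia_we(x):
    """W' of Eq. (1.4) with m_i = i/(n+1) (the WE variant); small values reject H0."""
    y = np.sort(x)                           # ordered sample X_(1), ..., X_(n)
    n = len(y)
    m = np.arange(1, n + 1) / (n + 1.0)      # means of U(0,1) order statistics
    r = np.corrcoef(y, m)[0, 1]              # correlation of ordered data with m_i
    return r ** 2                            # left-tailed: compare with simulated null
```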

1.4. Chi-Square Goodness-of-Fit Test

The standard Chi-Square goodness-of-fit statistic is computed as

\chi^2 = \sum_{k=1}^{g} \frac{\left( O_k - E_k \right)^2}{E_k}, \qquad (1.5)

where g is the number of groups, O_k is the observed count in the k-th group, and E_k is the expected count under H_0 in the k-th group. Note that \chi^2 approximately follows a Chi-Square distribution with g - 2 - 1 degrees of freedom, since both parameters of the Beta distribution are assumed unknown and must be estimated.
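A sketch of (1.5) with equal-probability groups under the fitted Beta (the grouping rule stated after the algorithm in Section 2); the function name is illustrative, and the reference distribution with g - 2 - 1 degrees of freedom follows the text above:

```python
import numpy as np
from scipy import stats

def chi_square_gof_beta(x, a_hat, b_hat, g):
    """Chi-square statistic of Eq. (1.5) with g equal-probability groups under Beta(a_hat, b_hat)."""
    edges = stats.beta.ppf(np.linspace(0.0, 1.0, g + 1), a_hat, b_hat)  # group boundaries
    observed, _ = np.histogram(x, bins=edges)
    expected = len(x) / g                         # equal probability => equal expected counts
    chi2 = np.sum((observed - expected) ** 2 / expected)
    p_value = stats.chi2.sf(chi2, g - 2 - 1)      # g - 2 - 1 df, both parameters estimated
    return chi2, p_value
```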

2. Simulation Results

One thousand samples are generated from the B(2.0, 2.0), B(0.5, 0.5), B(0.5, 3.5), B(4.0, 1.5), BN(0.25, 0.5, 0.5, 0.25, 0.25), BN(0.25, 1.5, 1.5, 0.25, 0.25), and TN(0, 1, 0.25) distributions, where B(α, β) denotes the Beta distribution, BN(γ, α, β, μ, σ) denotes a mixture of a Beta and a Normal distribution with mixing parameter γ, and TN(μ, σ, η) denotes a Normal distribution truncated at the value η. Sample sizes of 25, 50, and 100 are considered. The proportions of rejections at the α = 0.05 level of significance are then computed and presented in Table 1.

In Table 1, the tests are denoted AD for the Anderson-Darling test, KS for the Kolmogorov-Smirnov test, WE for the Shapiro-Francia test using m_i = i/(n+1), WS for the Shapiro-Francia test using m_i from simulation, and CS for the Chi-Square test.

For all tests except CS, p-values are determined by simulation using the following algorithm (an illustrative code sketch is given below, after the software note).

  • Step 1: Generate a sample from a distribution mentioned above.

  • Step 2: Estimate parameters α and β as if H 0 is true.

  • Step 3: Compute the test statistic and save.

  • Step 4: Generate 1000 samples from a Beta distribution with estimated parameter values in Step 2. Compute the respective test statistic to construct the simulated distribution. For WS test, also obtain simulated m i at this step.

  • Step 5: Obtain the p-value by comparing the test statistic from Step 3 with the simulated distribution from Step 4, and save it.

  • Step 6: Repeat Step 1 through Step 5 to generate 1000 p-values.

  • Step 7: Count the number of p-values from Step 6 that fall below 0.05; the resulting proportions of rejections are displayed in Table 1.

Note that in the CS computation, g = 4 is used for n = 25, g = 7 for n = 50, and g = 10 for n = 100; in addition, the groups are chosen so that each group has equal probability under the fitted distribution.

MATLAB software is used in all computations and the codes are readily available from the primary author.
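The authors' MATLAB code is not reproduced here. As an illustration only, a minimal Python sketch of Steps 2-5 (the simulated p-value for a single sample) might look as follows; whether the parameters are re-estimated for each simulated sample is not stated explicitly above, so the sketch re-estimates them, which is the usual parametric-bootstrap choice. Steps 1, 6, and 7 simply repeat this for 1000 generated samples.

```python
import numpy as np
from scipy import stats

def beta_mme(x):
    """Method-of-moments fit (Section 1), repeated to keep the sketch self-contained."""
    xbar, s2 = np.mean(x), np.var(x, ddof=1)
    c = xbar * (1.0 - xbar) / s2 - 1.0
    return xbar * c, (1.0 - xbar) * c

def simulated_p_value(x, statistic, n_sim=1000, tail="upper"):
    """Steps 2-5: simulated p-value of `statistic(sample, a, b)` under a fitted Beta null.
    Use tail='upper' for AD, KS, CS and tail='lower' for the Shapiro-Francia W'."""
    a_hat, b_hat = beta_mme(x)                       # Step 2: estimate alpha, beta under H0
    t_obs = statistic(x, a_hat, b_hat)               # Step 3: observed test statistic
    t_sim = np.empty(n_sim)
    for j in range(n_sim):                           # Step 4: simulate the null distribution
        xs = stats.beta.rvs(a_hat, b_hat, size=len(x))
        a_s, b_s = beta_mme(xs)                      # re-estimate for each simulated sample
        t_sim[j] = statistic(xs, a_s, b_s)
    if tail == "upper":                              # Step 5: compare observed vs simulated
        return np.mean(t_sim >= t_obs)
    return np.mean(t_sim <= t_obs)
```

For example, simulated_p_value(sample, anderson_darling_beta) would give the AD p-value of Step 5 for one sample; the WE statistic of Section 1.3 ignores the fitted parameters, so it would need a small wrapper such as lambda s, a, b: shapiro_francia_we(s) together with tail="lower".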

Table 1. Proportions of rejections of H_0 at α = 0.05.

                 B(2.0, 2.0)                              B(0.5, 0.5)
  n      AD      KS      WE      WS      CS       AD      KS      WE      WS      CS
  25     0.041   0.059   0.029   0.021   0.092    0.026   0.018   0.000   0.000   0.092
  50     0.062   0.038   0.029   0.028   0.054    0.039   0.026   0.000   0.001   0.080
  100    0.055   0.048   0.023   0.024   0.053    0.048   0.039   0.000   0.000   0.069

                 B(0.5, 3.5)                              B(4.0, 1.5)
  n      AD      KS      WE      WS      CS       AD      KS      WE      WS      CS
  25     0.047   0.046   0.002   0.017   0.196    0.057   0.057   0.032   0.031   0.130
  50     0.055   0.052   0.001   0.017   0.122    0.060   0.056   0.031   0.027   0.070
  100    0.048   0.047   0.002   0.016   0.118    0.031   0.038   0.016   0.024   0.052

                 BN(0.25, 0.5, 0.5, 0.25, 0.25)           BN(0.25, 1.5, 1.5, 0.25, 0.25)
  n      AD      KS      WE      WS      CS       AD      KS      WE      WS      CS
  25     0.957   1.000   1.000   1.000   0.424    0.969   1.000   1.000   1.000   0.410
  50     1.000   1.000   1.000   1.000   0.660    0.998   1.000   1.000   1.000   0.604
  100    1.000   1.000   1.000   1.000   0.944    1.000   1.000   1.000   1.000   0.920

                 TN(0, 1, 0.25)
  n      AD      KS      WE      WS      CS
  25     0.227   1.000   1.000   1.000   0.223
  50     0.541   1.000   1.000   1.000   0.150
  100    0.942   1.000   1.000   1.000   0.316

In Table 1, we notice that the AD and KS tests hold the level of significance more accurately than the other tests. The WE and WS tests underestimate the level of significance, while CS overestimates it. For the Beta-Normal mixture alternatives, the CS test shows lower power. For the truncated Normal alternative, AD and CS show lower power.

3. Application

We applied the above-mentioned goodness-of-fit procedures to the relative humidity data of air in May 2007 from the Haarweg Wageningen weather station [8], shown in Table 2.

Table 2. Relative humidity.

0.40  0.44  0.50  0.55  0.58  0.62  0.65  0.69  0.72  0.72
0.73  0.75  0.77  0.80  0.81  0.81  0.83  0.83  0.85  0.85
0.85  0.86  0.86  0.87  0.87  0.89  0.92  0.94  0.94  0.97

For the data in Table 2, the respective p-values are 0.510 for AD, 0.000 for KS, 0.000 for WE, 0.000 for WS, and 0.160 for CS.
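For illustration only, the mechanics of the fit for the Table 2 data can be sketched in Python as below (MME fit plus the KS distance against the fitted Beta); this does not reproduce the simulated p-values reported above, which require the bootstrap of Section 2.

```python
import numpy as np
from scipy import stats

# Relative humidity values from Table 2
humidity = np.array([
    0.40, 0.44, 0.50, 0.55, 0.58, 0.62, 0.65, 0.69, 0.72, 0.72,
    0.73, 0.75, 0.77, 0.80, 0.81, 0.81, 0.83, 0.83, 0.85, 0.85,
    0.85, 0.86, 0.86, 0.87, 0.87, 0.89, 0.92, 0.94, 0.94, 0.97,
])

# MME fit under H0 (Section 1)
xbar, s2 = humidity.mean(), humidity.var(ddof=1)
c = xbar * (1.0 - xbar) / s2 - 1.0
a_hat, b_hat = xbar * c, (1.0 - xbar) * c

# KS distance D_n against the fitted Beta; a simulated p-value would follow from Section 2
d_n = stats.kstest(humidity, stats.beta(a_hat, b_hat).cdf).statistic
print(a_hat, b_hat, d_n)
```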

4. Conclusion and Remarks

The performance of the Kolmogorov-Smirnov test is more consistent than that of the other tests considered. The Shapiro-Francia test has adequate performance, and simulated means of the order statistics are not needed, since the standard uniform order-statistic means give similar results. The Chi-Square test performed poorly in terms of power. The Anderson-Darling test is not reliable against some of the alternatives.

It is also demonstrated that in testing for the Beta distribution using the Shapiro-Francia test, there is no need to use simulated means of the order statistics; it suffices to use m_i = i/(n+1).

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Evans, M., Hastings, N., Peacock, B. et al. (2000) Statistical Distributions. 4th Edition, John Wiley & Sons Ltd.
[2] Rahman, M. and Amin, M.I. (2024) A Note on Parameter Estimation in Beta Distribution. Far East Journal of Theoretical Statistics, 68, 255-262.
https://doi.org/10.17654/0972086324015
[3] Anderson, T.W. and Darling, D.A. (1954) A Test of Goodness-of-Fit. Journal of the American Statistical Association, 49, 765-769.
https://doi.org/10.1080/01621459.1954.10501232
[4] Kolmogorov, A.N. (1933) Sulla determinazione empirica di una legge di distribuzione. Giornale dell'Istituto Italiano degli Attuari, 4, 83-91.
[5] Smirnov, N. (1948) Table for Estimating the Goodness of Fit of Empirical Distributions. Annals of Mathematical Statistics, 19, 279-281.
https://doi.org/10.1214/aoms/1177730256
[6] Shapiro, S.S. and Francia, R.S. (1972) An Approximate Analysis of Variance Test for Normality. Journal of the American Statistical Association, 67, 215-216.
https://doi.org/10.1080/01621459.1972.10481232
[7] Rahman, M. and Pearson, L.M. (2000) Shapiro-Francia W' Statistic Using Exclusive Monte Carlo Simulation. Journal of the Korean Data & Information Science Society, 11, 139-155.
[8] Raschke, M. (2011) Empirical Behavior of Tests for the Beta Distribution and Their Application in Environmental Research. Stochastic Environmental Research and Risk Assessment, 25, 79-89.
https://doi.org/10.1007/s00477-010-0410-3

Copyright © 2025 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.