Application of Equality Test of Coefficients of Variation to the Heteroskedasticity Test

DOI: 10.4236/ajcm.2020.101005

Abstract

The presence of heteroskedasticity in a regression model may bias the standard deviations of the parameters estimated by the Ordinary Least Squares (OLS) method. In that case, several hypothesis tests on the model may be biased, for example Chow's coefficient stability test (or structural change test), Student's t-test and Fisher's F-test. Most heteroskedasticity tests in the literature are based on a comparison of variances. Although many tests of equality of coefficients of variation (CVs) have appeared in the literature, to our knowledge the first and only use of the coefficient of variation to detect heteroskedasticity was proposed by Li and Yao in 2017. This paper therefore offers an approach that determines the existence of heteroskedasticity with a test of equality of coefficients of variation. A Monte Carlo study of robustness and performance suggests that our method performs better than some tests in the literature. The results of this study contribute to the exploitation of the CV as a statistical measure of dispersion. They help economists and practitioners to better verify their hypotheses before making a decision based on a forecast, and thus to contribute effectively to the economic and sustainable development of a company.

Share and Cite:

Tovohery, J.M., Totohasina, A. and Rajaonasy, F.D. (2020) Application of Equality Test of Coefficients of Variation to the Heteroskedasticity Test. American Journal of Computational Mathematics, 10, 73-89. doi: 10.4236/ajcm.2020.101005.

1. Introduction

The Gauss-Markov theorem states that the least squares estimator is BLUE, the Best Linear Unbiased Estimator, in the sense that it has the lowest variance among linear unbiased estimators ( [1], p. 53). However, the presence of heteroskedasticity in a regression model may bias the standard deviations of the parameters obtained by the Ordinary Least Squares (OLS) method ( [2], p. 31). In this case, several hypothesis tests on the model under consideration may be biased, for example Chow's coefficient stability test (or structural change test) ( [3], p. 25), Student's t-test and Fisher's F-test. Heteroskedasticity tests are already available in the literature. Examples include the Levene test, the Goldfeld-Quandt test, the White test, the Glejser test and the Breusch-Pagan test. Most of these tests are based on the comparison of variances.

Today, tests of comparison of Coefficients of Variation (CVs) have appeared in the literature. Examples include the Curto test [4], the application of the Rényi divergence proposed by Pardo (1999) [5], the test based on a numerical approach by Gokpinar (2015) [6], the Forkman test [7], McKay and Miller’s statistics [8].

To our knowledge, the first use of the coefficient of variation in the detection of heteroskedasticity was offered by Li and Yao (2017) [9]. Thus, the question is: “is it possible to find an application of these CV equality tests to detect the existence of heteroskedasticity?”

The rest of this article is organized as follows: Section 2 will discuss the position of our problem; Section 3 will present a state of the art on heteroskedasticity test; Section 4 will propose an approach to using a CV equality test when detecting heteroskedasticity; and finally, a conclusion is given at the end.

2. Position of Problem

We have a simple linear regression model

$$y_t = a_0 + a_1 x_t + \epsilon_t, \quad t = \overline{1, n} \tag{1}$$

such that the $\epsilon_t$ are the errors made when applying the model. We want to check whether the variance of the errors is constant for t ranging from 1 to n, that is, whether the model is homoskedastic or heteroskedastic. Figure 1 shows an example of a homoskedastic model, and Figures 2-4 show three examples of heteroskedastic models. Note that these four models all have the same regression line equation: $y = x + 2$.

Figure 1. Homoskedastic model ($\sigma_\epsilon^2$ = constant).

Figure 2. Heteroskedastic model ($\sigma_\epsilon^2$ increases with the exogenous variable).

Figure 3. Heteroskedastic model ($\sigma_\epsilon^2$ decreases with the exogenous variable).

Figure 4. Heteroskedastic model ($\sigma_\epsilon^2$ has a concave shape).
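The four situations of Figures 1-4 can be imitated numerically. The sketch below draws points around the common line y = x + 2 with four standard-deviation profiles. The profiles, the range of x and all names (`simulate`, `PROFILES`) are assumptions chosen for illustration; they are not the settings used to draw the figures.

```python
import random

def simulate(n, sd_of_x, seed=0):
    """Draw n points from y = x + 2 + eps, where eps ~ N(0, sd_of_x(x)^2)."""
    rng = random.Random(seed)
    xs = [rng.uniform(1.0, 10.0) for _ in range(n)]
    ys = [x + 2.0 + rng.gauss(0.0, sd_of_x(x)) for x in xs]
    return xs, ys

# Hypothetical standard-deviation profiles, one per figure.
PROFILES = {
    "constant (Figure 1)": lambda x: 1.0,
    "increasing (Figure 2)": lambda x: 0.3 * x,
    "decreasing (Figure 3)": lambda x: 3.0 / x,
    "concave (Figure 4)": lambda x: 0.2 + 0.1 * (x - 1.0) * (10.0 - x),
}

if __name__ == "__main__":
    for name, profile in PROFILES.items():
        xs, ys = simulate(200, profile)
        # Largest deviation from the common regression line y = x + 2.
        print(name, max(abs(y - (x + 2.0)) for x, y in zip(xs, ys)))
```

Only the spread around the line changes from one profile to the next; the regression line itself is the same in all four cases.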

3. State of the Art on the Homoskedasticity Test

We consider the general linear regression model $Y = Xa + \epsilon$. The various tests mentioned below all test the following hypotheses:

$$\begin{cases} \text{null hypothesis } H_0: \sigma_{\epsilon_t} = \sigma, \ t = \overline{1, n} \ (\text{constant}); \\ \text{alternative hypothesis } H_1: \text{there are } t_1 \text{ and } t_2 \text{ such that } \sigma_{t_1} \neq \sigma_{t_2}. \end{cases}$$

3.1. Breusch-Pagan Test

The Breusch-Pagan test assumes that the squared errors $\epsilon_t^2$ are related to the dependent variable Y. According to Leblond (2003) ( [2], p. 31), the Breusch-Pagan test is done in four steps:

1) Recover the residuals $\epsilon_t$ of the regression;

2) Generate the squared residuals ($\epsilon_t^2$);

3) Regress the squared residuals on the dependent variable of the original regression ($\epsilon_t^2 = \hat{a}_0 + \hat{a}_1 y_t$, where $\hat{a}_0$ and $\hat{a}_1$ are to be determined);

4) Test whether the coefficients are jointly significant (perform the F-test):

$$F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)} \tag{2}$$

where k is the number of explanatory variables $x_i$, n is the sample size and $R^2$ is the coefficient of determination of the regression of $\epsilon^2$ on Y.

Decision-making: We accept the null hypothesis $H_0$ at the confidence level $(1 - \alpha) \times 100\%$ if $F < F^{\alpha}_{k;\, n-k-1}$, where $F^{\alpha}_{k;\, n-k-1}$ is the critical value of the F-distribution at risk $\alpha$ with $k$ and $n - k - 1$ degrees of freedom.
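The four steps above can be sketched for the simple regression case as follows. This is a minimal illustration, assuming one explanatory variable (k = 1) and an auxiliary regression of the squared residuals on y, as step 3 reads; the helper names `ols_fit`, `r_squared` and `breusch_pagan_f` are hypothetical. The critical value is not computed here and would have to be read from an F table.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a0 + a1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def r_squared(x, y):
    """Coefficient of determination of the simple regression of y on x."""
    a0, a1 = ols_fit(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def breusch_pagan_f(x, y, k=1):
    """F statistic of Equation (2) for a simple regression of y on x."""
    a0, a1 = ols_fit(x, y)                                   # step 1
    e2 = [(yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y)]  # step 2
    r2 = r_squared(y, e2)                                    # step 3
    n = len(x)
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))             # step 4
```

The returned value is then compared with the critical value of the F-distribution with k and n - k - 1 degrees of freedom.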

3.2. Goldfeld-Quandt Test

The Goldfeld-Quandt test assumes that there is an explanatory variable $X_i$ that influences the variance of the errors, such that $E(\epsilon^2 \mid X_i) = \sigma^2 + h(X_i)$, where h is an increasing function ( [10], p. 103). The test is summarized as follows:

1) Sort the observations according to the increasing or decreasing values of the explanatory variable $X_i$ suspected of being the source of heteroskedasticity.

2) Divide the observations into two groups:

$$Y_1 = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_{n_1} \end{pmatrix}, \quad Y_2 = \begin{pmatrix} y_{n_2+1} \\ y_{n_2+2} \\ \vdots \\ y_n \end{pmatrix},$$

$$X_1 = \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,k} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,k} \\ \vdots & \vdots & & \vdots \\ x_{n_1,1} & x_{n_1,2} & \cdots & x_{n_1,k} \end{pmatrix}, \quad X_2 = \begin{pmatrix} x_{n_2+1,1} & x_{n_2+1,2} & \cdots & x_{n_2+1,k} \\ x_{n_2+2,1} & x_{n_2+2,2} & \cdots & x_{n_2+2,k} \\ \vdots & \vdots & & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,k} \end{pmatrix}$$

where $n_1 = n/3$ and $n_2 = 2n/3$.

3) Calculate the error variance estimators for each sub-sample:

$$\hat{\sigma}_1^2 = \frac{(Y_1 - X_1 \hat{a})' (Y_1 - X_1 \hat{a})}{n_1 - k - 1} = \frac{\sum_{i=1}^{n_1} e_i^2}{n_1 - k - 1} \tag{3}$$

$$\hat{\sigma}_2^2 = \frac{(Y_2 - X_2 \hat{a})' (Y_2 - X_2 \hat{a})}{n - n_2 - k - 1} = \frac{\sum_{i=n_2+1}^{n} e_i^2}{n - n_2 - k - 1} \tag{4}$$

where $\hat{a}$ is the least squares estimator of the parameter a, $e_i = y_i - (\hat{a}_0 + \hat{a}_1 x_{i1} + \cdots + \hat{a}_k x_{ik})$ and k is the number of explanatory variables of the model.

4) Calculate the Goldfeld-Quandt statistic:

$$GQ = \frac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2} \tag{5}$$

The GQ statistic follows the F-distribution with $n_1 - k - 1$ and $n - n_2 - k - 1$ degrees of freedom, noted $F_{n_1-k-1;\, n-n_2-k-1}$.

Decision-making: The null hypothesis $H_0$ is rejected at the confidence level $(1 - \alpha) \times 100\%$ if $GQ > F_{n_1-k-1;\, n-n_2-k-1;\, \alpha}$.
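A minimal sketch of steps 1) to 4) for a simple regression (k = 1) is given below. Two points are assumptions rather than statements from the text: the residuals come from a single fit on the whole sorted sample, as Equations (3) and (4) read, and n/3 and 2n/3 are rounded down with integer division. The names `ols_fit` and `goldfeld_quandt` are hypothetical.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a0 + a1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def goldfeld_quandt(x, y, k=1):
    """Return (GQ, df1, df2) following Equations (3)-(5)."""
    # Step 1: sort the observations by increasing x.
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    # Step 2: group boundaries (assumption: integer parts of n/3 and 2n/3).
    n = len(xs)
    n1, n2 = n // 3, (2 * n) // 3
    # Step 3: squared residuals and the two variance estimators.
    a0, a1 = ols_fit(xs, ys)
    e2 = [(yi - a0 - a1 * xi) ** 2 for xi, yi in zip(xs, ys)]
    df1, df2 = n1 - k - 1, n - n2 - k - 1
    s1 = sum(e2[:n1]) / df1
    s2 = sum(e2[n2:]) / df2
    # Step 4: the Goldfeld-Quandt ratio.
    return s1 / s2, df1, df2
```

The ratio is compared with the critical value of the F-distribution with df1 and df2 degrees of freedom, which is not computed in this sketch.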

3.3. Glejser's Test

The Glejser test can detect both heteroskedasticity and the form that this heteroskedasticity takes ( [1], p. 150). It assumes that there is a relationship between the error $\epsilon$ of the model and the variable $X_i$ assumed to be the cause of heteroskedasticity. The steps of the test are summarized as follows:

Step 1: Determination of the residuals generated by the suspected variable $X_i$.

1) Regress Y on $X_i$. This gives the simple regression model $Y_k = a X_{ki} + b + \epsilon_k$, $k = \overline{1, n}$.

2) Calculate the estimators of a and b using the Ordinary Least Squares method: $\hat{a}$ and $\hat{b}$.

3) Estimate the model's errors $\epsilon_k$ by the residuals: $e_k = Y_k - (\hat{a} X_{ki} + \hat{b})$, $k = \overline{1, n}$.

Thus, the vector of residuals $e_k$ is known.

Step 2: Proposal of possible forms of existing heteroskedasticity.

Glejser suggests testing different forms of possible relationships between $|e|$ and $X_i$, for example:

1) Type 1:

$$|e_k| = a_0 + a_1 X_{ki} + v_k, \quad k = \overline{1, n}, \tag{6}$$

where $v_k$ is the residual of this model. This relationship generates heteroskedasticity of type $\hat{\sigma}_{e_k}^2 = k^2 X_{ki}^2$, where the factor k is a non-zero real constant (not the observation index). Thus, the variance of the errors is a function of the square of the suspected explanatory variable $X_i$.

2) Type 2:

$$|e_k| = a_0 + a_1 \sqrt{X_{ki}} + v_k, \quad k = \overline{1, n}. \tag{7}$$

This relationship generates heteroskedasticity of type $\hat{\sigma}_{e_k}^2 = k^2 X_{ki}$. In this case, the variance of the errors is proportional to the values of the suspected explanatory variable $X_i$.

3) Type 3:

$$|e_k| = a_0 + a_1 \frac{1}{X_{ki}} + v_k, \quad k = \overline{1, n}. \tag{8}$$

This relationship leads to heteroskedasticity of type $\hat{\sigma}_{e_k}^2 = k^2 X_{ki}^{-2}$.

Step 3: Detection of heteroskedasticity.

Significance test of the regression coefficient $a_1$:

$$t^* = \frac{|\hat{a}_1|}{\hat{\sigma}_{\hat{a}_1}}, \tag{9}$$

with

$$\hat{\sigma}_{\hat{a}_1} = \sqrt{\frac{\hat{\sigma}_v^2}{\sum_{k=1}^{n} \left( h(X_{ki}) - \overline{h(X_i)} \right)^2}}$$

and

$$\hat{\sigma}_v^2 = \frac{1}{n - 2} \sum_{k=1}^{n} \left[ |e_k| - \left( \hat{a}_0 + \hat{a}_1 h(X_{ki}) \right) \right]^2,$$

where

$$h(x) = \begin{cases} x, & \text{for a type 1 relationship;} \\ \sqrt{x}, & \text{for a type 2 relationship;} \\ \dfrac{1}{x}, & \text{for a type 3 relationship.} \end{cases}$$

$t^*$ follows the t-distribution with $n - 2$ degrees of freedom.

Decision-making: The null hypothesis $H_0$ is rejected at the confidence level $(1 - \alpha) \times 100\%$ if there is a $t^*$ such that $t^* > t_{n-2;\, \alpha}$.

If the existence of heteroskedasticity is validated, then the relationship with the highest $t^*$ represents the form of the existing heteroskedasticity.
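The three steps can be sketched as below for strictly positive values of the suspected variable (required by the square root and the inverse). The standard error is taken as the square root of the residual variance divided by the sum of squared deviations of h(X), which is the usual OLS formula and is assumed here; the names `ols_fit` and `glejser_t` are hypothetical.

```python
import math

def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a0 + a1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def glejser_t(x, y):
    """Return the statistic t* of Equation (9) for each of the three forms."""
    # Step 1: absolute residuals of the regression of y on x.
    a0, a1 = ols_fit(x, y)
    abs_e = [abs(yi - a0 - a1 * xi) for xi, yi in zip(x, y)]
    # Step 2: the three candidate transformations h(x).
    forms = {
        "type 1": lambda v: v,
        "type 2": math.sqrt,
        "type 3": lambda v: 1.0 / v,
    }
    n = len(x)
    result = {}
    for name, h in forms.items():
        hx = [h(v) for v in x]
        b0, b1 = ols_fit(hx, abs_e)
        mh = sum(hx) / n
        s2_v = sum((ae - b0 - b1 * hv) ** 2 for ae, hv in zip(abs_e, hx)) / (n - 2)
        se_b1 = math.sqrt(s2_v / sum((hv - mh) ** 2 for hv in hx))
        # Step 3: significance statistic of the slope.
        result[name] = abs(b1) / se_b1
    return result
```

Each value is compared with the critical value of the t-distribution with n - 2 degrees of freedom; the form with the largest statistic is the candidate form of heteroskedasticity.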

3.4. White’s Test

White's test consists in testing the existence of a relationship between the squared residuals and one or more explanatory variables or their squares. The test procedure can be summarized as follows:

Step 1: Determination of the model's residuals.

When the parameters of the model $Y = Xa + \epsilon$ are estimated, we have the estimation of the residuals: $e = Y - X\hat{a}$.

Step 2: Regression of $e^2$ on $x_1, x_1^2, \ldots, x_k$ and $x_k^2$, and validation.

1) We consider the model:

$$e_i^2 = a_1 x_{i1} + b_1 x_{i1}^2 + a_2 x_{i2} + b_2 x_{i2}^2 + \cdots + a_k x_{ik} + b_k x_{ik}^2 + a_0 + v_i, \quad i = \overline{1, n}, \tag{10}$$

which can be written in matrix form: $E = Wu + v$, where

$$E = \begin{pmatrix} e_1^2 \\ e_2^2 \\ \vdots \\ e_n^2 \end{pmatrix}, \quad u = \begin{pmatrix} a_0 \\ a_1 \\ b_1 \\ \vdots \\ a_k \\ b_k \end{pmatrix}, \quad v = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} \quad \text{and} \quad W = \begin{pmatrix} 1 & x_{11} & x_{11}^2 & \cdots & x_{1k} & x_{1k}^2 \\ 1 & x_{21} & x_{21}^2 & \cdots & x_{2k} & x_{2k}^2 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 1 & x_{n1} & x_{n1}^2 & \cdots & x_{nk} & x_{nk}^2 \end{pmatrix}$$

2) The estimator of u is: $\hat{u} = (W'W)^{-1} W'E$.

3) Calculate the variance of the errors: $\hat{\sigma}_\epsilon^2 = \dfrac{\sum_{i=1}^{n} \hat{v}_i^2}{n - k - 1}$, with $\hat{v} = E - W\hat{u}$.

4) Calculate the variance-covariance matrix of the parameters $a_i$ and $b_i$: $\hat{\Omega}_u = \hat{\sigma}_\epsilon^2 (W'W)^{-1}$.

In this case, the variance $\hat{\sigma}_{\hat{u}_i}^2$ of the i-th element of the vector $\hat{u}$ is the i-th element of the diagonal of $\hat{\Omega}_u$.

5) Significance test of the parameters $a_1, b_1, \ldots, a_k, b_k$: we calculate $t_{a_i}^* = \dfrac{|\hat{a}_i|}{\hat{\sigma}_{\hat{a}_i}}$ and $t_{b_i}^* = \dfrac{|\hat{b}_i|}{\hat{\sigma}_{\hat{b}_i}}$, $i = \overline{1, k}$.

The statistics $t_{a_i}^*$ and $t_{b_i}^*$ follow the t-distribution with $n - k - 1$ degrees of freedom.

Decision-making: The null hypothesis $H_0$ is rejected at the confidence level $(1 - \alpha) \times 100\%$ if there is a $t_{u_i}^*$ such that $t_{u_i}^* > t_{n-k-1;\, \alpha}$. That is, the null hypothesis $H_0$ is rejected if there is a parameter $u_i$ significantly different from 0.
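For a simple regression (k = 1), Step 2 reduces to regressing the squared residuals on x and x squared. The sketch below works under that restriction and keeps the divisor n - k - 1 used in the text for the error variance; the small matrix inversion routine and the names `invert` and `white_t_stats` are hypothetical helpers, not part of the original procedure.

```python
import math

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    a = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]

def white_t_stats(x, e2, k=1):
    """Return (u_hat, [t*_a1, t*_b1]) for e^2 = a0 + a1*x + b1*x^2 + v."""
    n = len(x)
    w = [[1.0, xi, xi * xi] for xi in x]
    wtw = [[sum(r[i] * r[j] for r in w) for j in range(3)] for i in range(3)]
    wte = [sum(r[i] * ei for r, ei in zip(w, e2)) for i in range(3)]
    inv = invert(wtw)
    # u_hat = (W'W)^{-1} W'E
    u = [sum(inv[i][j] * wte[j] for j in range(3)) for i in range(3)]
    # Residuals of the auxiliary regression and their variance.
    v = [ei - sum(ui * wi for ui, wi in zip(u, r)) for r, ei in zip(w, e2)]
    s2 = sum(vi * vi for vi in v) / (n - k - 1)
    # Standard errors from the diagonal of s2 * (W'W)^{-1}.
    se = [math.sqrt(s2 * inv[i][i]) for i in range(3)]
    return u, [abs(u[i]) / se[i] for i in (1, 2)]
```

The two returned statistics are compared with the critical value of the t-distribution with n - k - 1 degrees of freedom.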

3.5. ANOVA Methods

To determine the existence of heteroskedasticity, researchers have proposed the analysis of variance method, commonly called ANOVA. According to the application example presented in ( [1], p. 147-148), the application of ANOVA consists in dividing the observations into several classes of values. Following this example by R. Bourbonnais, we propose the following steps:

1) Order the observations according to the increasing values of the explanatory variable $X_i$ suspected to be the source of heteroskedasticity.

2) Group the values of the variable $X_i$ into z classes of values. To determine z, one of the following expressions can be used ( [11], p. 33):

a) $z = Int(\sqrt{n})$, where n is the total number of observations and $Int(\cdot)$ is the integer part function;

b) Sturges' formula: $z = Int(1 + 3.3 \log_{10}(n))$;

c) Yule's formula: $z = Int(2.5 \sqrt[4]{n})$.

3) Group the values of the variable to be explained Y according to their corresponding classes ($y_i$ in the class corresponding to $x_i$). Thus, we obtain z samples of Y.

4) Apply the ANOVA test to the z samples of Y, then draw a conclusion.

In the following subsections, we will present some ANOVA tests that can be done in step 4.
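Steps 1) to 3) can be sketched as follows. Rule a) is read here as the integer part of the square root of n, which is an assumption, and the classes are built as consecutive blocks of nearly equal size in the ordered sample, which is only one possible way of forming classes of values. The names `n_classes` and `split_into_classes` are hypothetical.

```python
import math

def n_classes(n, rule="sqrt"):
    """Number of classes z for n observations under one of the three rules."""
    if rule == "sqrt":
        return math.isqrt(n)                    # Int(sqrt(n))
    if rule == "sturges":
        return int(1 + 3.3 * math.log10(n))     # Int(1 + 3.3 log10(n))
    if rule == "yule":
        return int(2.5 * n ** 0.25)             # Int(2.5 * n^(1/4))
    raise ValueError("unknown rule: " + rule)

def split_into_classes(x, y, z):
    """Order the observations by x and return z consecutive groups of y."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    n = len(x)
    bounds = [(j * n) // z for j in range(z + 1)]
    return [[y[i] for i in order[bounds[j]:bounds[j + 1]]] for j in range(z)]
```

For n = 100 the three rules give 10, 7 and 7 classes respectively. The groups returned by `split_into_classes` are the z samples of Y to which the tests of step 4 are applied.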

3.5.1. Bartlett’s Test

Bartlett's statistic¹ is defined as follows:

$$B = \frac{Q}{L} \tag{11}$$

where

$$Q = (n - z) \ln\left( \sum_{i=1}^{z} \frac{n_i - 1}{n - z} s_i^2 \right) - \sum_{i=1}^{z} (n_i - 1) \ln(s_i^2),$$

$$L = 1 + \frac{1}{3(z - 1)} \left( \sum_{i=1}^{z} \frac{1}{n_i - 1} - \frac{1}{n - z} \right),$$

$n = \sum_{i=1}^{z} n_i$, $n_i$ is the number of observations belonging to the i-th class and $s_i^2$ is the variance of the Y values of the i-th class ( [12], p. 273).

Remark: Bartlett's statistic B follows the chi-square distribution with $z - 1$ degrees of freedom, noted $\chi^2_{z-1}$, if the residuals are independent and follow the standard normal distribution.

Decision-making: The homoskedasticity hypothesis is rejected at the confidence level $(1 - \alpha) \times 100\%$ if $B > \chi^2_{z-1;\, \alpha}$.
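A minimal sketch of Bartlett's statistic for z groups of Y values is shown below. It assumes that the group variances are the usual unbiased sample variances (divisor n_i - 1); the function name `bartlett_stat` is hypothetical.

```python
import math
from statistics import variance

def bartlett_stat(groups):
    """Bartlett's statistic B = Q / L for a list of samples."""
    z = len(groups)
    sizes = [len(g) for g in groups]
    n = sum(sizes)
    s2 = [variance(g) for g in groups]          # unbiased sample variances
    pooled = sum((ni - 1) * v for ni, v in zip(sizes, s2)) / (n - z)
    q = (n - z) * math.log(pooled) - sum(
        (ni - 1) * math.log(v) for ni, v in zip(sizes, s2)
    )
    l = 1.0 + (sum(1.0 / (ni - 1) for ni in sizes) - 1.0 / (n - z)) / (3.0 * (z - 1))
    return q / l
```

When all group variances are equal, Q is zero and so is B; the statistic grows as the group variances move apart.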

3.5.2. Levene’s Test

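Levene's statistic, proposed by Howard Levene in 1960 ( [13], p. 4), is defined as follows:

$$F = \frac{n - z}{z - 1} \cdot \frac{\sum_{i=1}^{z} n_i \left( \bar{Z}_{i\cdot} - \bar{Z}_{\cdot\cdot} \right)^2}{\sum_{i=1}^{z} \sum_{j=1}^{n_i} \left( Z_{ij} - \bar{Z}_{i\cdot} \right)^2} \tag{12}$$

where,

• z is the number of groups or value categories obtained,

• $n_i$ is the number of observations belonging to the i-th class, and $n = \sum_{i=1}^{z} n_i$,

• $Z_{ij} = |Y_{ij} - \bar{Y}_{i\cdot}|$, where $\bar{Y}_{i\cdot}$ is the mean of the Y values of the i-th class,

• $\bar{Z}_{i\cdot}$ (average of the $Z_{ij}$ in the i-th class),

• $\bar{Z}_{\cdot\cdot}$ (average of all $Z_{ij}$).

Remark: Levene's F statistic follows the F-distribution with $z - 1$ and $n - z$ degrees of freedom, noted $F_{z-1;\, n-z}$. Bartlett's test is not robust if the normality assumption of the residuals is not verified. However, the Levene test is stable even in the absence of this hypothesis.

Decision making: The null hypothesis is rejected at the confidence level $(1 - \alpha) \times 100\%$ if $F > F_{z-1;\, n-z;\, \alpha}$.

3.5.3. Brown-Forsythe's Test

The Brown-Forsythe test is an improvement on the Levene test. To get the Brown-Forsythe statistic, just change the group mean $\bar{Y}_{i\cdot}$ to $\tilde{Y}_{i\cdot}$, where $\tilde{Y}_{i\cdot}$ is the median of the i-th group of values. Brown-Forsythe's statistic is more robust than Levene's.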
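The Levene and Brown-Forsythe statistics differ only in how each group is centered, so a single function with a `center` argument can illustrate both. The sketch assumes the usual form of Levene's statistic, a ratio of between-group to within-group variation of the absolute deviations scaled by (n - z)/(z - 1); the function name `levene_stat` is hypothetical.

```python
from statistics import mean, median

def levene_stat(groups, center=mean):
    """Levene's statistic (center=mean) or Brown-Forsythe's (center=median)."""
    z = len(groups)
    n = sum(len(g) for g in groups)
    # Absolute deviations of each value from the center of its own group.
    dev = [[abs(v - center(g)) for v in g] for g in groups]
    group_means = [mean(d) for d in dev]
    grand_mean = sum(sum(d) for d in dev) / n
    between = sum(len(d) * (m - grand_mean) ** 2 for d, m in zip(dev, group_means))
    within = sum((v - m) ** 2 for d, m in zip(dev, group_means) for v in d)
    return (n - z) / (z - 1) * between / within
```

Calling `levene_stat(groups)` gives Levene's version and `levene_stat(groups, center=median)` gives the Brown-Forsythe version; both are compared with the critical value of the F-distribution with z - 1 and n - z degrees of freedom.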

3.5.4. Hartley’s Test

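We define Hartley's statistic ( [14], p. 14) by:

$$H = \frac{s_{\max}^2}{s_{\min}^2} \tag{13}$$

where $s_{\max}^2 = \max_i s_i^2$, $s_{\min}^2 = \min_i s_i^2$, and $s_i^2$ = variance of the Y values of the i-th group, $i = \overline{1, z}$.

Remark: The Hartley test cannot be used if the group sizes are not equal. The critical values of the H statistic are tabulated in the Hartley table.

Decision making: We reject the null hypothesis at the confidence level $(1 - \alpha) \times 100\%$ if H exceeds the corresponding critical value of the Hartley table.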

3.5.5. Cochran’s Test

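Cochran's statistic is defined as follows:

$$C = \frac{\max_i s_i^2}{\sum_{i=1}^{z} s_i^2} \tag{14}$$

where $s_i^2$ is the variance of the Y values of the i-th group.

Remarks: Cochran's test cannot be used if the group sizes are not equal. The critical values of the C statistic are tabulated in Cochran's table.

Decision making: We reject the null hypothesis at the confidence level $(1 - \alpha) \times 100\%$ if C exceeds the corresponding critical value of Cochran's table.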
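Both Hartley's and Cochran's statistics depend only on the group variances, so they can be sketched together. The sketch assumes the usual definitions, the largest variance divided by the smallest for Hartley and by the sum of all variances for Cochran, and enforces the equal group size requirement mentioned in the remarks; all function names are hypothetical.

```python
from statistics import variance

def _group_variances(groups):
    """Sample variances of the groups; the groups must have equal sizes."""
    if len({len(g) for g in groups}) != 1:
        raise ValueError("Hartley's and Cochran's tests need equal group sizes")
    return [variance(g) for g in groups]

def hartley_stat(groups):
    """Largest group variance divided by the smallest one."""
    s2 = _group_variances(groups)
    return max(s2) / min(s2)

def cochran_stat(groups):
    """Largest group variance divided by the sum of all group variances."""
    s2 = _group_variances(groups)
    return max(s2) / sum(s2)
```

For the two groups [1, 2, 3] and [0, 5, 10], the variances are 1 and 25, so H = 25 and C = 25/26. The critical values still have to be read from the respective tables.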

3.6. Zhaoyuan Li and Jianfeng Yao Test

Zhaoyuan Li and Jianfeng Yao [9] proposed two measures to detect heteroskedasticity in a multivariate linear model.

1) Test based on the likelihood ratio:

(15)

where and ( [9], p. 9).

follows the standard normal distribution, and is the Euler’s constant ( [9], p. 10).

Decision making: the assumption is rejected at the confidence level, if, where is the quantile of at the risk threshold. For, we have.

2) Coefficient of variation test:

(16)

where.

follows the standard normal distribution ( [9], p. 11).

Decision making: the assumption is rejected at the confidence level, if.

This last test shows a trend in the use of coefficient of variation in the detection of heteroskedasticity.

4. Application of the Equality Test of Coefficients of Variation to the Heteroskedasticity Test

4.1. Our Approach

In this section, we will show that the test of equality of coefficients of variation allows us to detect the existence of heteroskedasticity. The steps of our approach can be summarized as follows:

1) Estimate the parameter a of the regression model of Y on X, noted $\hat{a}$.

2) Estimate the model's residuals: $e = Y - X\hat{a}$.

3) Calculate the squared residuals: $e_i^2$.

4) As in the Goldfeld-Quandt method, divide the squared residuals into two groups:

the first group contains $e_1^2, \ldots, e_{n_1}^2$ and the second group contains $e_{n_2+1}^2, \ldots, e_n^2$, where $n_1 = n/3$ and $n_2 = 2n/3$.

5) Calculate the Johannes Forkman’s statistic ( [7], p. 10):

(17)

where for,

, ,

and.

Decision making: if, then we accept at the confidence level. is the quantile of F-distribution with and degrees of freedom.

We chose Forkman’s statistic because is stable for all, where ( [7], p. 11).
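The approach can be sketched end to end for a simple regression. Several points below are assumptions and not statements of the text: the statistic is taken to be Forkman's two-sample F-type ratio, c1^2 / (1 + c1^2 (m1 - 1)/m1) divided by the same expression for the second group, where c is the sample coefficient of variation and m the group size; the observations are sorted by x as in the Goldfeld-Quandt method; and n/3 and 2n/3 are rounded down. The names `ols_fit`, `forkman_f` and `cv_test_statistic` are hypothetical.

```python
from statistics import mean, stdev

def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a0 + a1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def forkman_f(sample1, sample2):
    """Assumed two-sample CV ratio; returns (F, df1, df2)."""
    def adjusted_cv2(sample):
        m = len(sample)
        c2 = (stdev(sample) / mean(sample)) ** 2    # squared sample CV
        return c2 / (1.0 + c2 * (m - 1) / m)
    f = adjusted_cv2(sample1) / adjusted_cv2(sample2)
    return f, len(sample1) - 1, len(sample2) - 1

def cv_test_statistic(x, y):
    """Steps 1) to 5): CV ratio of the two outer thirds of squared residuals."""
    order = sorted(range(len(x)), key=lambda i: x[i])   # assumption: sort by x
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    a0, a1 = ols_fit(xs, ys)                                   # step 1
    e2 = [(yi - a0 - a1 * xi) ** 2 for xi, yi in zip(xs, ys)]  # steps 2 and 3
    n = len(e2)
    n1, n2 = n // 3, (2 * n) // 3                              # step 4
    return forkman_f(e2[:n1], e2[n2:])                         # step 5
```

Two samples with the same coefficient of variation, such as [1, 2, 3] and [2, 4, 6], give a ratio of 1. The returned ratio is compared with a quantile of the F-distribution with the two returned degrees of freedom.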

4.2. Monte Carlo Simulation

We now test the robustness of the measures proposed in the literature and of the one we have proposed.

4.2.1. Methodology

Like the Gleisjer method, our simulation consists of generating two variables X and Y of size, such as and (see the Section 3.3). Thus, we consider 3 forms of heteroskedasticity: 1), 2) and 3) ( [1], p. 151).

Moreover, in order to enrich the forms of heteroskedasticity studied, we also propose to take the other three forms considered by Li and Yao: 4) , 5) and 6) , where is a random variable following the standard normal distribution ( [9], p. 15).

In this simulation, we consider only the simple regression model. We repeat times this test, and we count the number k of times the test rejects the hypothesis at the 95% confidence level. Then, the probability is calculated.

As p is a random variable, then we repeat these procedures several times (1000 times), then we calculate. We really put ourselves in the case where the error is significantly not negligible (value of sufficiently different from 0).

So, if, then the test is considered robust. In addition, the measure with the highest is the measure considered most sensitive to the type of error i considered ().
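The counting procedure can be sketched generically: a data generator and a decision rule are passed in, the rule is applied a number of times to obtain a rejection frequency p, and the whole procedure is repeated to average p. Averaging p over the outer repetitions is an assumption about the summary used, and every name below (`rejection_probability`, `mean_rejection_probability`, the `rejects` and `generate` callables) is hypothetical.

```python
import random

def rejection_probability(rejects, generate, n_tests, rng):
    """Fraction k / n_tests of generated samples for which the test rejects H0."""
    k = sum(1 for _ in range(n_tests) if rejects(*generate(rng)))
    return k / n_tests

def mean_rejection_probability(rejects, generate, n_tests, n_outer, seed=0):
    """Average of the rejection frequency p over n_outer repetitions."""
    rng = random.Random(seed)
    ps = [rejection_probability(rejects, generate, n_tests, rng)
          for _ in range(n_outer)]
    return sum(ps) / n_outer

if __name__ == "__main__":
    def generate(rng, n=30):
        # Heteroskedastic sample around y = x + 2 (illustrative choice).
        x = [rng.uniform(1.0, 10.0) for _ in range(n)]
        y = [xi + 2.0 + rng.gauss(0.0, 0.3 * xi) for xi in x]
        return x, y

    def toy_rejects(x, y):
        # Placeholder rule, not one of the tests of Section 3: compares the
        # squared deviations from y = x + 2 for large and small values of x.
        hi = sum((yi - xi - 2.0) ** 2 for xi, yi in zip(x, y) if xi > 5.5)
        lo = sum((yi - xi - 2.0) ** 2 for xi, yi in zip(x, y) if xi <= 5.5)
        return hi > 3.0 * lo

    print(mean_rejection_probability(toy_rejects, generate, 100, 20))
```

Any of the tests described earlier can be plugged in as `rejects` once it is wrapped so that it returns True when the null hypothesis is rejected at the chosen level.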

As we want to test the robustness of the test, then it would be better to check whether the test in question detects small variations or not. During the simulations we did, we took, , and. We took, because it is already different from 0, but judged subjectively low value.

In Table 1, the probabilities, , , , , , and correspond respectively to the rejection probabilities of the null hypothesis of the Breusch-Pagan, Goldfeld-Quandt, Glejser, White, Bartlett, Levene, Li and Yao tests, and our proposal.

4.2.2. Simulation Results

From Table 1, we obtain the classifications in Tables 2-6.

4.3. Discussion

First of all, these simulations indicate that the Levene test is the most robust and sensitive of all the tests considered in this study.

However, these results show that, among the six forms of heteroskedasticity proposed, our proposal can detect four for, and five for.

In general, our proposal fails to detect the only form of heteroskedasticity (whether for or.)

Furthermore, it is the second best test to detect the heteroskedasticity of type for.

In addition, our proposal seems better than the Li and Yao test, which is, to our knowledge, the first attempt to use the coefficient of variation to detect heteroskedasticity.

Finally, these results contribute to the justification of the weakness of Bartlett's test. Indeed, we see from these results that this test is less robust than our proposal.

Table 1. Results of Monte Carlo simulations.

Table 2. Classification of tests in ascending order according to their numbers of wrong acceptances of H0.

Table 3. Classification in ascending order of tests according to their sensitivities to the three types of heteroskedasticity proposed by Glejser for.

Table 4. Classification in ascending order of tests according to their sensitivities to the three types of heteroskedasticity proposed by Glejser for.

Table 5. Classification in ascending order of tests according to their sensitivities to the three types of heteroskedasticity considered by Li and Yao for.

Table 6. Classification in ascending order of tests according to their sensitivities to the three types of heteroskedasticity considered by Li and Yao for.

5. Conclusions

In this paper, we proposed a technique to detect the existence of heteroskedasticity by an equality test of the coefficients of variation. To illustrate the state of the art, we first recalled some tests from the literature that detect the existence of heteroskedasticity, such as the Breusch-Pagan test, the Goldfeld-Quandt test, the Glejser test, the White test and some heteroskedasticity tests based on an analysis of variance (ANOVA): Bartlett's test, Levene's test, Brown-Forsythe's test, Hartley's test and Cochran's test.

Next, we also presented the heteroskedasticity test of Zhaoyuan Li and Jianfeng Yao. To the best of our knowledge, the Zhaoyuan Li and Jianfeng Yao test was the first attempt to use coefficients of variation to determine the existence of heteroskedasticity.

Among the equality tests of coefficients of variation available in the literature, we have considered Forkman’s test to illustrate our approach, as it is a robust and stable test for a sample with size. The results of our performance tests have shown that our approach can detect 5 types of heteroskedasticity among the 6 types considered in this paper.

At the end of this analysis, we affirm that the equality test of coefficients of variation allows us to detect the existence of possible heteroskedasticity in a simple regression model. Thus, our study contributes to the reapplication of several equality tests of coefficients of variation that have already appeared in the literature.

Acknowledgements

We thank the Editor and the referee for their comments and assistance.

NOTES

1Maurice Stevenson Bartlett (June 18, 1910-January 8, 2002).

Conflicts of Interest

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] Bourbonnais, R. (2015) Économétrie: Cours et exercices corrigés. 9e édition, Dunod.
https://docplayer.fr/66598708-Econometrie-cours-et-exercices-corriges-regis-bourbonnais-9-e-edition.html
[2] Leblond, S. (2003) Guide d’économétrie appliquée. Université de Montréal, Montréal.
http://www2.cirano.qc.ca/~mccauslw/ECN3949/GuideEconometrie.pdf
[3] Hamisultane, H. (2002) Économétrie. Licence. France.
https://halshs.archives-ouvertes.fr/cel-01261163
[4] Curto, J.D. and Pinto, J.C. (2009) The Coefficient of Variation Asymptotic Distribution in the Case of Non-IID Random Variables. Journal of Applied Statistics, 36, 21-32.
https://doi.org/10.1080/02664760802382491
[5] Pardo, M.C. and Pardo, J.A. (2000) Use of Rényi Divergence to Test for the Equality of the Coefficients of Variation. Journal of Computational and Applied Mathematics, 116, 93-104.
https://doi.org/10.1016/S0377-0427(99)00312-X
[6] Gokpinar, G. and Esra, G. (2015) A Computational Approach for Testing Equality of Coefficients of Variation in k Normal Populations. Hacettepe Journal of Mathematics and Statistics, 44, 1197-1213.
https://doi.org/10.15672/HJMS.2014317482
[7] Krishnamoorthy, K. and Meesook, L. (2013) Improved Tests for the Equality of Normal Coefficients of Variation. In: Computational Statistics, Springer, New York.
https://doi.org/10.1007/s00180-013-0445-2
[8] Shipra, B., Kibria, B.M.G. and Sharma, D. (2012) Testing the Population Coefficient of Variation. Journal of Modern Applied Statistical Methods, 11, 325-335.
http://digitalcommons.wayne.edu/jmasm/vol11/iss2/5
https://doi.org/10.22237/jmasm/1351742640
[9] Zhaoyuan, L. and Jianfeng, Y. (2017) Testing for Heteroscedasticity in High-dimensional Regressions. Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong.
https://arxiv.org/abs/1510.00097
https://doi.org/10.1016/j.ecosta.2018.01.001
[10] Crépon, B. (2003) Économétrie linéaire.
http://www.crest.fr/ckfinder/userfiles/files/Pageperso/crepon/poly20052006.pdf
[11] Chekroun, A. (2017) Statistiques descriptives et exercices. Université Abou Bekr Belkaid Tlemcen-Algérie.
https://www.coursehero.com/file/27893347/chekroun-statistiquespdf/
[12] Bertoneche, M. (1979) Existence d’hétéroscédasticité dans le modèle de marché appliqué aux bourses européennes de valeurs mobilières. Journal de la société statistique de Paris, tome 120, 270-276.
http://www.numdam.org/item/JSFS_1979__120_4_270_0
[13] Gastwirth, J.L., Gel, Y.R. and Miao, W. (2009) The Impact of Levene’s Test of Equality of Variances on Statistical Theory and Practice. Statistical Science, 24, 343-360.
https://doi.org/10.1214/09-STS301
[14] Vessereau, A. (1974) Essais interlaboratoires pour l’estimation de la fidélité des méthodes d’essais. Revue de statistique appliquée, tome 22, 5-48.
http://www.numdam.org/item/RSA_1974__22_1_5_0/

  

Copyright © 2020 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.