Heteroskedasticity-Consistent Covariance Matrix Estimators in Small Samples with High Leverage Points

The aim of this paper is to demonstrate the impact of high leverage observations on the performance of prominent and popular Heteroskedasticity-Consistent Covariance Matrix Estimators (HCCMEs) with the help of computer simulation. First, we identify the high leverage observations, then remove them and recalculate the HCCMEs without these observations in order to compare the HCCME performances with and without high leverage points. We identify high leverage observations with the Minimum Covariance Determinant (MCD). In the simulation runs we select different covariates and disturbance term variances from the related literature in order to compare the percentage difference between the expected value of each HCCME and the true covariance matrix, as well as the symmetric loss function. Our results reveal that eliminating the high leverage (high MCD distance) observations improves the HCCME performances considerably, and under some settings substantially, depending on the degree of leverage. We hope that practitioners will benefit from these findings in applied work.


Introduction
An important assumption of the classical linear regression model is homoskedasticity, that is, the assumption that the variances of the error terms are constant across observations.

Heteroskedasticity-Consistent Covariance Matrix Estimators
We consider the linear model y = Xβ + ε, in which y is the T × 1 vector of the dependent variable, X is the T × k matrix of regressors, and ε is the T × 1 vector of disturbance terms. The disturbance terms are pairwise uncorrelated and are assumed to have flexible variances so as to allow heteroskedasticity, i.e. Cov(ε) = Σ = diag(σ1², ..., σT²).
We define the covariance matrix belonging to the OLS coefficient estimator as Ω = (X′X)⁻¹X′ΣX(X′X)⁻¹. Here, the unique unknown is the Σ matrix, whose diagonal elements are the variances of the error terms. If the variances of the error terms are assumed to be equal, then Σ = σ²I and Ω is estimated by OLS as σ̂²(X′X)⁻¹, where σ̂² is the variance of the error terms estimated as ε̂′ε̂/(T − k) from the OLS residuals ε̂. Under heteroskedasticity of unknown form, replacing Σ in Ω with a diagonal matrix built from the squared OLS residuals ε̂i² leads to the HCCMEs we list with references as follows: HC0 by White uses ε̂i² directly; HC1 by Hinkley (1977) [4] scales it by T/(T − k); HC2 by Horn et al. (1975) [5] uses ε̂i²/(1 − hii); HC3 by Efron (1982) [6] uses ε̂i²/(1 − hii)², where H is the hat matrix X(X′X)⁻¹X′ and hii is its i-th diagonal element; HC4 by Cribari-Neto (2004) [7] uses ε̂i²/(1 − hii)^δi with δi = min{4, Thii/k}; and HC5 by Cribari et al. (2007) [8] refines the exponent further by also taking the maximal leverage into account. Indeed, HC3 resembles the one-delete jackknife estimator whose formula was provided to MacKinnon and White (1985) [18] by an anonymous referee.
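The estimators above can be summarized in a short sketch. The function below follows the common textbook formulations of HC0-HC4 (HC5 is omitted because its exponent involves further tuning constants); it is an illustration, not the authors' GAUSS code.

```python
import numpy as np

def hccme(X, y, kind="HC0"):
    """Sandwich covariance estimators for OLS under heteroskedasticity.

    A minimal sketch using the standard HC0-HC4 weights; details may
    differ from the paper's exact implementation.
    """
    T, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y                     # OLS coefficients
    e = y - X @ beta                             # OLS residuals
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # leverages h_ii

    if kind == "HC0":                            # White
        w = e**2
    elif kind == "HC1":                          # Hinkley (1977)
        w = e**2 * T / (T - k)
    elif kind == "HC2":                          # Horn et al. (1975)
        w = e**2 / (1 - h)
    elif kind == "HC3":                          # Efron (1982), jackknife-like
        w = e**2 / (1 - h)**2
    elif kind == "HC4":                          # Cribari-Neto (2004)
        delta = np.minimum(4.0, T * h / k)
        w = e**2 / (1 - h)**delta
    else:
        raise ValueError(kind)

    meat = X.T @ (w[:, None] * X)                # X' diag(w) X
    return XtX_inv @ meat @ XtX_inv              # sandwich estimate of Omega
```

Note that the only difference among the estimators is the weight placed on each squared residual, which is why high leverage points (large hii) affect HC2-HC4 so strongly.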

Robust Estimation
The model we introduced in the previous section can be estimated by OLS, which still preserves its popularity due to its favorable statistical properties and ease of computation. But OLS is very sensitive to outliers and bad leverage points: one can change the OLS estimate of a coefficient arbitrarily by playing with just one of the observations, which means that the breakdown value of OLS is 1/T. The main purpose of robust regression techniques is to protect the estimation against misleading results due to outliers and bad leverage points. These outliers may stem from recording or measurement errors, and if that is the case then one has to get rid of the bad effects arising from them. If there are no such mistakes and these exceptional observations belong to the original data set, one must be careful to preserve them, since they may explain some important facts about the data generating process. Many robust methods proposed so far suffer from shortcomings, the most important of which is weakness in detecting the outliers and bad leverage points. In many cases we face the coordinated action of such observations, grouping themselves so that they are able to mask their deceiving behavior. Although many robust techniques fail in such cases, the method we use in this study, namely the MCD, is guaranteed to overcome this handicap. The observations in any regression analysis can be classified into four groups (see [14]):
1) Regular observations with internal X and well-fitting y.
2) Vertical outliers with internal X but non-fitting y.
3) Good leverage points with outlying X and well-fitting y.
4) Bad leverage points with outlying X and non-fitting y.
Good leverage points are very valuable to OLS since they pull the regression line toward the target. On the other hand, bad leverage points and outliers are extremely harmful, since they strongly pull the estimated regression line in the wrong direction. With this classification in hand, a robust method must be able to diagnose and classify each observation into the four categories above correctly. In this study we make use of the MCD to detect the observations with high distances of the covariates.
More technically, the MCD, initiated by Rousseeuw and Van Driessen (1999) [17], has the objective of finding the h observations out of T whose regressor covariance matrix has the lowest determinant. The MCD estimate of the center is the average of these h observations, and the MCD estimate of spread is their covariance matrix. Indeed, the objective of the MCD is to find h observations forming a subset, say H1, of the T observations (here we use just the regressors; we do not include the response variable in the MCD calculations) in such a way that the average of these h observations is L = (1/h) Σ_{i ∈ H1} Xi, where Xi is the i-th observation of the regressors. This is a statistic for the location of the regressors. The statistic for the covariance is S = (1/h) Σ_{i ∈ H1} (Xi − L)(Xi − L)′.
The distance for the i-th observation, di, is defined on both the location and covariance statistics as di = [(Xi − L)′ S⁻¹ (Xi − L)]^(1/2). The details can be found in Rousseeuw and Van Driessen (1999) [17].
We have coded a GAUSS procedure to return the observations with the lowest MCD distances and included it in the Appendix. We flag the robust MCD distances that exceed the χ² critical values and remove the corresponding observations to improve the HCCME performances (see Zaman et al. (2001) [19] for a sound application similar to ours).
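The detection step can be sketched in a few lines. The following is a simplified version of the FAST-MCD idea of Rousseeuw and Van Driessen (1999), written for illustration only (random starting subsets refined by C-steps); it is not the authors' GAUSS procedure, and the subset size, trial count, and convergence cap are illustrative choices.

```python
import numpy as np

def mcd_distances(X, h=None, n_trials=100, seed=0):
    """Approximate MCD robust distances for the rows of X (regressors only)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    T, p = X.shape
    if h is None:
        h = (T + p + 1) // 2                  # common default MCD coverage
    rng = np.random.default_rng(seed)
    best_det, best = np.inf, None
    for _ in range(n_trials):
        idx = rng.choice(T, size=h, replace=False)
        for _ in range(10):                   # C-steps: keep h closest points
            L = X[idx].mean(axis=0)
            S = np.cov(X[idx], rowvar=False).reshape(p, p)
            d2 = np.einsum("ij,jk,ik->i", X - L, np.linalg.inv(S), X - L)
            new_idx = np.argsort(d2)[:h]
            if set(new_idx) == set(idx):
                break
            idx = new_idx
        det = np.linalg.det(np.cov(X[idx], rowvar=False).reshape(p, p))
        if det < best_det:                    # keep lowest-determinant subset
            best_det, best = det, idx
    L = X[best].mean(axis=0)
    S = np.cov(X[best], rowvar=False).reshape(p, p)
    d2 = np.einsum("ij,jk,ik->i", X - L, np.linalg.inv(S), X - L)
    return np.sqrt(d2)                        # robust distances d_i
```

Observations whose squared distance di² exceeds the χ² critical value with p degrees of freedom (e.g. 5.02 at the 0.975 level for p = 1) would then be flagged and removed, mirroring the procedure described above.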

Simulation Runs
The literature includes voluminous papers with different simulation designs to test the HCCME performances; almost all of the papers cited in the Introduction could be named here. The main shortcoming of these designs is that they are tied to particular settings and do not generalize. In order to cover the behaviors of different design matrices and error term variances, we make use of the patterns listed in Table 1. The distributions we include for the covariates and error term variances are selected to reflect progressively heavier leverage and error term variance patterns. The simple regression model we use in the simulation runs is y_t = β0 + β1 x_t + ε_t. We define λ = σ²max/σ²min to account for the degree of heteroskedasticity. Note that λ equals 1 under homoskedasticity and becomes higher in case of more intensive heteroskedasticity. The simulation program is coded in GAUSS 7 and we set the Monte Carlo sample size to 10,000 replications.
The program first generates the design matrix entries and then the error terms with the variances listed in Table 1. The dependent variable values are then fixed according to the simple regression model. The MCD procedure code is run to detect the covariates with high leverages (i.e. with MCD distances larger than the critical χ² values). These detected observations are removed from the data set. We estimate Ω by the HCCMEs with the original (full) sample and with the sample without high leverages (short sample). Since the true covariances are different for the full and short samples, we calculate percentage differences to set the ground for comparisons. We also calculated the quasi-t statistics, which are quite common in such studies, but do not report them since they parallel the percentage deviations. We also prepared the symmetric, entropy and quadratic losses but prefer to report just the symmetric loss in order to save space, because the losses are similar to each other and to the percentage deviations as well.
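The core of one simulation cell can be sketched as follows. This is an illustrative Python analogue of the GAUSS program for a single setting, restricted to HC0 for brevity; the coefficients, the variance pattern rising with the covariate, and the replication count are our own assumptions, not the paper's exact design.

```python
import numpy as np

def percentage_deviation(T=20, n_rep=2000, seed=0):
    """Monte Carlo percentage deviation of HC0 from the true Var(b1_hat).

    Simple regression y_t = b0 + b1*x_t + e_t with heteroskedastic errors;
    we compare the Monte Carlo mean of the HC0 estimate of the slope
    variance against the true sandwich value for a fixed design.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(size=T)
    X = np.column_stack([np.ones(T), x])
    sig2 = 1.0 + 4.0 * x                      # illustrative variance pattern
    XtX_inv = np.linalg.inv(X.T @ X)
    # True covariance: (X'X)^-1 X' Sigma X (X'X)^-1 with Sigma known
    true_cov = XtX_inv @ (X.T @ (sig2[:, None] * X)) @ XtX_inv

    acc = np.zeros((2, 2))
    for _ in range(n_rep):
        y = X @ np.array([1.0, 1.0]) + rng.normal(size=T) * np.sqrt(sig2)
        e = y - X @ (XtX_inv @ X.T @ y)       # OLS residuals
        acc += XtX_inv @ (X.T @ (e[:, None]**2 * X)) @ XtX_inv  # HC0
    est_cov = acc / n_rep
    return 100.0 * (est_cov[1, 1] - true_cov[1, 1]) / true_cov[1, 1]
```

In the paper's design the same comparison is repeated for each HCCME, for each covariate and variance pattern of Table 1, and for the full and short (high-leverage-free) samples.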
Tables 2-6 display the percentage deviations of the estimators from the true values for the diagonal entries of Ω. Cases 1-5 correspond to covariates following the patterns explained in the first column of Table 1.

Table 1. Covariates and error term variances. (Columns: Covariates (Xi); Error Term Variances (σi).)

Table 2. HCCME performances at full and short samples, Case 1.

For each covariate pattern, the error term variances are generated from the second column of Table 1 (Cases a, b, c, and d), in which c0 = 0.1, c1 = 0.2, c3 = 0.3 and d0 = 0.285. L is the number of high leverage observations removed. The column heads are F for the full sample and S for the short sample (that is, the sample free from the high leverage observations). Since this paper has the specific purpose of assessing HCCME performance in small samples, we use the sample sizes T = 20, 30, 40, 50, 60, 80 and 100. Table 2 displays the performances of the six HCCMEs for the full and short samples. In this setting the covariates are intentionally generated from the uniform distribution so that they carry no leverage points. In Case a, homoskedasticity with error term variances set equal to 1, removing the occasional high leverage points or not does not make much difference: the full and short samples have similar percentage errors. We generate different covariates each time we run the simulation and keep records of the covariates. One point that deserves attention is the huge percentage errors of HC4 and HC5 when the sample size is limited to 20. Furthermore, HC0 by White has the next greatest percentage difference, as does HC3; the top performers are HC2 followed by HC1. The other point to mention is that the estimators are sometimes biased downward and sometimes upward, as indicated by the positive and negative differences.
We introduce heteroskedasticity in Case b. This time the short sample estimates do not perform better than the full sample estimates when the sample size is large (over 50), since the high leverage observations are limited, and when they do exist the leverages indicated by the MCD distances are very low. But the short sample HCCMEs are slightly better when T < 50, and especially when T = 20.
For Case c, the short sample estimates are better than the full sample estimates, and the difference is even more pronounced when the sample size gets lower. Note that the estimation performance becomes much better when the sample size increases. We note the superior performance of HC2 and the inferior performance of HC4 and HC5. In Case d, the short sample estimates are better than the full sample estimates at almost all sample sizes. The performances are very unstable when T = 20. It is interesting to note that HC4 and HC5, which are claimed to be better than the others and were introduced more recently, do much worse than all the others, including White's estimator, whose small-sample bias is well established.
In Case 2, we generate the covariates from the standard normal distribution to obtain more leverage points with higher MCD distances. The short sample estimates have lower percentage differences than the full sample, especially when the sample size is small. All estimators perform better when the sample size increases to even 40: percentage differences of more than 10 shrink to less than 3 when T increases to 80. When heteroskedasticity is introduced in Case b, the percentage differences at T = 20 and 30 are large, and they become mild soon after T = 50. The largest differences belong to HC4 and HC5 at T = 20; they are more than 2.5 times the true variances. Although the percentage differences get lower, surprises remain possible; for instance, HC4 and HC5 have percentage differences larger than 20% even at T = 80. Similar comments hold for Cases c and d.
In Case 3, we generate the covariates from the t distribution with 3 degrees of freedom. Note that the density of this distribution has thick tails, allowing high leverage covariates. The short sample estimates are slightly better than the full sample ones with high leverage points. The performance of the estimators improves as the sample size increases, and the difference becomes smaller under homoskedasticity for the intercept term's variance. Regarding the variance of the slope coefficient, the estimates without the high leverage observations are much better, especially at the small sample sizes of 20 and 30. This difference becomes smaller as the sample size increases, and it is preserved in Case b. The gap becomes much more apparent in Case c when the sample size is 20 or 30. Note that the best performer is HC2 followed by HC3, regardless of whether the sample is large or small. The same holds for Case d as well. In Case 4, the covariates are generated from the lognormal distribution to amplify the number and degree of leverage points. This time the differences in performance are drastic. The HC4 and HC5 estimators yield estimates much higher than the true values; indeed, we observed much larger numbers and preferred to report them as ">1000", meaning the estimated value is more than ten times the true value. More interestingly, this sometimes happens at sample sizes of 80 and even 100. Note that HC2 comes up with reasonable estimates when the high leverage observations are removed, while even HC2, the best performer, fails when there are high leverage covariates. The situation becomes even worse when heteroskedasticity is introduced in Cases c and d. The other point that deserves attention is the further deterioration in the performance of HC4 and HC5. These findings suggest that one should refrain from using these estimators, especially when high leverage and heteroskedasticity are present simultaneously.
Finally, in Case 5, we draw the covariates from the ratio of two standard normals to allow very large and very small values. The main goal is to have an arbitrary number of high leverage points with arbitrary degrees of leverage. This time we observe the failure of HC4 and HC5 again, even in Case a of homoskedasticity. All other HCCMEs perform well, especially when the sample size is greater than 30. When heteroskedasticity is introduced in Case b, the full sample estimates become much worse and the short sample HCCMEs are much better by comparison. Again HC2 is the best, followed by HC1, HC3, and HC0. Both HC4 and HC5, but especially HC5, perform very poorly. The slope coefficient's variance in Case d deserves attention, since almost all estimators fail very badly in the full sample; this case is a proper example of the benefit of detecting and eliminating the high leverage points. Note that with this removal, full sample percentage differences greater than 1000 are tamed to deviations of less than 20%.
The computer code for all settings is available from the authors upon request. The initializations in the program can be modified to try different alternatives. The regressors and the variance patterns are also available upon request, so that the same simulation results can be reproduced.
Although the percentage differences give a very sound idea of the HCCME performances in the full and short samples, we include the symmetric loss as well in Table 7. The symmetric loss can be formulated as in Sun and Sun (2005) [20] and can be used to assess the performances.
Case 1, with covariates generated from the uniform distribution, has no high leverage observations, or the high leverage observations are very limited. That is why the short and full sample losses are very close. Still, the overall picture reveals that the short sample symmetric losses are slightly lower. The difference between the short and full samples becomes significant when the covariates are generated from the normal distribution in Case 2. We note that the symmetric losses of the short sample are substantially lower than those of the full sample for all patterns of heteroskedasticity. The differences are larger when the sample sizes are low, at T = 20 and 30. We skip Case 3, which gives similar results, in order to save space; in Case 4, the difference becomes massive, especially at lower sample sizes. Indeed, these results are in line with the tables of percentage differences. The differences between HCCME performances reflected in the symmetric losses are drastic, sometimes more than 20-fold (HC0 in Case 4a, T = 20, and, without tight bounds, the same case for HC5). Similar comments apply to Case 5 as well. The other point that deserves attention is the increase in the symmetric losses of the short sample for HC4 (Table 8).

Concluding Remarks
Table 6. HCCME performances at full and short samples, Case 5.

The purpose of this paper is the improvement of the HCCMEs through the removal of the high leverage points, and this purpose is realized under the settings we used. Although there are exceptional cases where the full sample performance is better than the short sample, in general the elimination of high leverage observations helps improve the HCCME performances. The study at the same time compares the HCCMEs with one another. According to this comparison, the HCCME by Horn et al. (1975) [5] is the best performer under almost all settings, with and without the high leverage points. This estimator is followed by Hinkley's (1977) [4] estimator. The improvements in the HCCMEs by Cribari-Neto (2004) [7] and Cribari et al. (2007) [8] make them good competitors in the absence of high leverage observations; if the high leverage observations are not removed, both HC4 and HC5 perform very badly. Efron's (1982) [6] jackknife estimator appears as sometimes the second and sometimes the third best performer, depending on the setting. Regarding the underestimation and overestimation of the HCCMEs, the percentage differences we report for White's HCCME are always negative, which suggests that HC0 underestimates the true covariance matrix. The same is true for HC1, despite a few exceptional occasions of zero and positive figures, whereas HC2 is negative for the majority of the cases. To the contrary, HC3, HC4, and HC5 are almost always positive: they overestimate the true covariance. Note that the removal of the high leverage points places HC4 and HC5 in the list of the top three performers.
The other contribution of the paper is the surprise in the performance of the two HCCMEs introduced most recently, HC4 and HC5. We document that these two estimators are the worst performers and that their percentage differences are dramatically high, with HC4 slightly better than HC5. This finding is in line with MacKinnon (2011) [10].
Under homoskedasticity, OLS is the best performer and there is no need to make use of the HCCMEs; nor is there a significant improvement from detecting and removing the high leverage points. For the remaining settings, with heteroskedasticity and high leverage present, detecting and removing the high leverage observations improves the HCCME performances considerably.


Table 3. HCCME performances at full and short samples, Case 2.

Table 4. HCCME performances at full and short samples, Case 3.

Table 5. HCCME performances at full and short samples, Case 4.
Acknowledgements
We are grateful to the participants of the Conference on Econometrics, Operations Research and Statistics, especially Prof. James MacKinnon, who kindly shared his positive opinion on the paper we presented.