Empirical Determination of the Tolerable Sample Size for OLS Estimator in the Presence of Multicollinearity (ρ)

This paper investigates the tolerable sample size needed for the Ordinary Least Squares (OLS) estimator to be used when multicollinearity is present among the exogenous variables of a linear regression model. A regression model with a constant term (β0) and two independent variables (with β1 and β2 as their respective regression coefficients) that exhibit multicollinearity was considered. A Monte Carlo study of 1000 trials was conducted at eight levels of multicollinearity (0, 0.25, 0.5, 0.7, 0.75, 0.8, 0.9 and 0.99) and eight sample sizes (10, 20, 40, 80, 100, 150, 250 and 500). At each specification, the true regression coefficients were set at unity while 1.5, 2.0 and 2.5 were taken as the hypothesized values. The power rate was obtained at every multicollinearity level for the aforementioned sample sizes. The results show that, whether or not the hypothesized values depart greatly from the true values, once the multicollinearity level is very high (i.e. 0.99) the sample size needed for error-free estimation or inference must be greater than five hundred.


Introduction
Researchers disagree about the effect of sample size on multicollinearity: some argue that the multicollinearity problem can be solved by increasing the sample size, while others say that the problem grows as the sample size increases. [1] stated that the multicollinearity problem could be solved by increasing the sample size if the multicollinearity is due to errors of measurement, or if the intercorrelation happens to exist only in the original sample but not in the population [2]. Because of these arguments, this paper investigates the tolerable sample size needed for the Ordinary Least Squares estimator to be used when multicollinearity is present among the exogenous variables of a linear regression model, before it can be said that the multicollinearity problem is solved by increasing the sample size.
Regression theory postulates that there exists a stochastic relationship between a variable $Y$ and a set of other variables $(X_1, X_2, \ldots, X_n)$. In other words, $Y$ (called the dependent, endogenous or explained variable) depends on other observed variables $X_1, X_2, \ldots, X_n$ (called independent, exogenous or explanatory variables). However, one of the assumptions of this model is that the explanatory variables are independent of one another. This is often not the case with economic variables. Variables like age and years of experience do exhibit a form of linear relationship. When this assumption is violated, the result is the multicollinearity problem [3].
Multicollinearity could be perfect or imperfect. When it is perfect, the estimates obtained are not unique [4]. If multicollinearity is not perfect, the OLS estimator has been shown to be unbiased but inefficient. Other consequences or indications of the multicollinearity problem include:
1. Small changes in the data can produce significant changes in the parameter estimates (regression coefficients).
2. The regression coefficients may have wrong signs and/or unreasonable magnitudes.
3. Regression coefficients have high standard errors, which result in very low values of the t-statistic and thus affect the significance of the parameters [3] [5].
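The third consequence can be made concrete for the two-regressor case. For a model with exactly two explanatory variables, the sampling variance of each slope estimate is inflated by the factor 1 / (1 - r^2), where r is the correlation between the regressors. The sketch below evaluates that factor at the eight multicollinearity levels used in this study. The function name `vif_two_regressors` is hypothetical, and the population ρ is used in place of the sample correlation, so the values are only indicative.

```python
# Variance inflation factor for a model with exactly two regressors.
# Assumption: the population correlation rho stands in for the sample
# correlation, so the printed values are approximate.

RHO_LEVELS = [0, 0.25, 0.5, 0.7, 0.75, 0.8, 0.9, 0.99]


def vif_two_regressors(rho: float) -> float:
    """Return 1 / (1 - rho^2), the inflation of Var(beta_hat_j), j = 1, 2."""
    if abs(rho) >= 1:
        raise ValueError("perfect collinearity: OLS estimates are not unique")
    return 1.0 / (1.0 - rho ** 2)


if __name__ == "__main__":
    for rho in RHO_LEVELS:
        print(f"rho = {rho:<5} VIF = {vif_two_regressors(rho):.2f}")
```

At ρ = 0.99 the factor is roughly 50, so the standard error of each slope is about seven times larger than it would be with uncorrelated regressors at the same sample size.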
Thus, the presence of multicollinearity in a data set affects not only parameter estimation using the OLS estimator but also inferences on the parameters of the model. Consequently, with generated collinear data, this paper attempts to determine empirically the most tolerable sample size, that is, the sample size at which a power rate of 0.99 or 1 is obtained with the Ordinary Least Squares (OLS) estimator.

Methodology
Consider the regression model of the form $y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + u_t$ (1). If these variables are correlated, then $X_1$ and $X_2$ can be generated with Equation (2), where $|\rho| < 1$ is the value of the correlation between the two variables [6] [7].
Monte Carlo experiments were performed 1000 times for eight sample sizes (n = 10, 20, 40, 80, 100, 150, 250 and 500) and eight levels of multicollinearity (ρ = 0, 0.25, 0.5, 0.7, 0.75, 0.8, 0.9 and 0.99) with stochastic regressors that are normally distributed. At a particular specification of n and ρ (a scenario), the first replication was obtained by generating $X_{1t}$ and $X_{2t}$ using Equation (2) such that they exhibit correlation ρ. The values $y_t$ in Equation (1) were obtained by taking the true regression coefficients as unity. This process was continued until all 1000 replications had been done. Another scenario was then started, until all the scenarios were completed. For each replication in a scenario, the OLS estimator was used to estimate the regression coefficients, and the hypothesis about the true regression coefficient was tested at the 0.05 level of significance using the t-statistic in order to examine the type II error of the regression coefficients. All of this was done by writing a computer program using the Time Series Processor (TSP) software.
The results on the type II error rates of the OLS estimator reported by [8] were used: the type II error rate (β) was subtracted from 1 to obtain the power rate for every sample size at all levels of multicollinearity. These power rates were then examined at all levels of multicollinearity for all the selected sample sizes. The sample size with a power rate of 0.999 or 1.0 was chosen as the most tolerable sample size at each level of multicollinearity and for the different parameter values, following [9] on the effects of multicollinearity on the power rates of the Ordinary Least Squares estimator.
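One scenario of the procedure described above can be sketched as follows. This is an illustration in Python with NumPy and SciPy, not the TSP program used in the study. Several details are assumptions made only for the sketch: the regressors are built as $X_1 = z_1$ and $X_2 = \rho z_1 + \sqrt{1 - \rho^2}\, z_2$ with independent standard normal $z_1$, $z_2$ (one common way to obtain correlation ρ, which may differ from Equation (2)), the errors are standard normal, the t-test is two-sided, and only the hypothesis on $\beta_1$ is tested. The function name `simulate_power` is hypothetical.

```python
import numpy as np
from scipy import stats


def simulate_power(n, rho, hypothesized, reps=1000, alpha=0.05, seed=0):
    """Estimate the power of the t-test of H0: beta_1 = hypothesized.

    All true coefficients equal 1, so any hypothesized value other than 1
    is false and every rejection is a correct decision.
    """
    rng = np.random.default_rng(seed)
    true_beta = np.ones(3)                        # beta_0 = beta_1 = beta_2 = 1
    crit = stats.t.ppf(1 - alpha / 2, df=n - 3)   # two-sided critical value (assumed)
    rejections = 0
    for _ in range(reps):
        # Assumed construction giving corr(X1, X2) = rho in the population.
        z1 = rng.standard_normal(n)
        z2 = rng.standard_normal(n)
        x1 = z1
        x2 = rho * z1 + np.sqrt(1 - rho ** 2) * z2
        X = np.column_stack([np.ones(n), x1, x2])
        y = X @ true_beta + rng.standard_normal(n)  # assumed N(0, 1) errors

        # OLS estimates and their standard errors.
        xtx_inv = np.linalg.inv(X.T @ X)
        beta_hat = xtx_inv @ X.T @ y
        resid = y - X @ beta_hat
        sigma2 = resid @ resid / (n - 3)
        se = np.sqrt(sigma2 * np.diag(xtx_inv))

        # Reject H0 when the t-statistic exceeds the critical value.
        t_stat = (beta_hat[1] - hypothesized) / se[1]
        if abs(t_stat) > crit:
            rejections += 1
    return rejections / reps   # power rate = 1 - type II error rate


if __name__ == "__main__":
    for n in (10, 100, 500):
        print(n, simulate_power(n, rho=0.99, hypothesized=1.5, reps=200))
```

Looping `simulate_power` over the eight sample sizes and eight values of ρ would reproduce the grid of scenarios, although the resulting numbers need not match the paper's tables because of the assumptions listed above.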

Results and Discussion
The most tolerable sample sizes at different levels of multicollinearity and different possible combinations of the parameter values are summarized for $\beta_0$, $\beta_1$ and $\beta_2$ in Tables 1-8.
When the true values of $\beta_1$ and $\beta_2$ are maintained and that of $\beta_0$ is allowed to change, the tolerable sample sizes required for the parameter $\beta_0$ to have a power rate of 0.99 or 1 were determined at different levels of multicollinearity and hypothesized values. The results are shown in Table 1. Likewise, when the true values of $\beta_0$ and $\beta_2$ are maintained and that of $\beta_1$ is allowed to change, the tolerable sample sizes required for the parameter $\beta_1$ to have a power rate of 0.99 or 1 were determined at different levels of multicollinearity and hypothesized values. The results are shown in Table 2.
When the true values of $\beta_0$ and $\beta_1$ are maintained and that of $\beta_2$ is allowed to change, the tolerable sample sizes required for the parameter $\beta_2$ to have a power rate of 0.99 or 1 were determined at different levels of multicollinearity and hypothesized values. The results are shown in Table 3.
Similar results were obtained for all other possible combinations of the parameter values.
From Table 1 to Table 8, the tolerable sample size decreases as the hypothesized values depart from the true values at all lower levels of multicollinearity, whereas at higher levels of multicollinearity the required tolerable sample size increases as the hypothesized values depart from the true values. At a very high level of multicollinearity (0.99), however, the tolerable sample size must be greater than 500 before a result with the required power rate can be obtained.
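The selection rule behind the tables can be sketched as a small helper. It assumes that the tolerable sample size is the smallest of the studied sample sizes whose power rate reaches the target, which is one reading of the rule stated in the Methodology. The function name `tolerable_sample_size` and the power rates in `example_power` are hypothetical and are not results from this study.

```python
from typing import Mapping, Optional


def tolerable_sample_size(power_by_n: Mapping[int, float],
                          target: float = 0.99) -> Optional[int]:
    """Return the smallest n whose power rate reaches the target.

    None means that no studied sample size is sufficient, i.e. the
    tolerable sample size lies beyond the largest n considered.
    """
    for n in sorted(power_by_n):
        if power_by_n[n] >= target:
            return n
    return None


# Hypothetical power rates, invented purely for illustration.
example_power = {10: 0.21, 20: 0.48, 40: 0.83, 80: 0.992,
                 100: 0.998, 150: 1.0, 250: 1.0, 500: 1.0}

if __name__ == "__main__":
    print(tolerable_sample_size(example_power))
```

A return value of `None` for a scenario with ρ = 0.99 would correspond to the "greater than 500" entries discussed above.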

Conclusion
In conclusion, at every multicollinearity level the most tolerable sample size was obtained as the one with the highest power rate, which was attained at a sample size equal to or greater than five hundred. This study has revealed that, whether or not the hypothesized values depart greatly from the true values, once the multicollinearity level is very high (i.e. 0.99) the sample size needed for error-free estimation or inference must be greater than five hundred, provided that increasing the sample size is the method used to correct for the presence of multicollinearity.

Table 1. The tolerable sample sizes for $\beta_0$ when the true values of $\beta_1$ and $\beta_2$ are maintained and that of $\beta_0$ is allowed to change, at different levels of multicollinearity.

Table 2. The tolerable sample sizes for $\beta_1$ when the true values of $\beta_0$ and $\beta_2$ are maintained and that of $\beta_1$ is allowed to change, at different levels of multicollinearity.

Table 3. The tolerable sample sizes for $\beta_2$ when the true values of $\beta_0$ and $\beta_1$ are maintained and that of $\beta_2$ is allowed to change, at different levels of multicollinearity.

Table 4. The tolerable sample sizes for $\beta_1$ when the true value of $\beta_0$ is maintained and those of $\beta_1$ and $\beta_2$ are allowed to change, at different levels of multicollinearity.

Table 5. The tolerable sample sizes for $\beta_2$ when the true value of $\beta_0$ is maintained and those of $\beta_1$ and $\beta_2$ are allowed to change, at different levels of multicollinearity.

Table 6. The tolerable sample sizes for $\beta_0$ when all the values of $\beta_0$, $\beta_1$ and $\beta_2$ are allowed to change, at different levels of multicollinearity.

Table 7. The tolerable sample sizes for $\beta_1$ when all the values of $\beta_0$, $\beta_1$ and $\beta_2$ are allowed to change, at different levels of multicollinearity.

Table 8. The tolerable sample sizes for 2