This paper investigates the tolerable sample size needed for the Ordinary
Least Squares (OLS) estimator to be used in the presence of multicollinearity
among the exogenous variables of a linear regression model. A regression model
with a constant term (*β*_{0})
and two independent variables (with *β*_{1} and *β*_{2} as their respective
regression coefficients) that exhibit multicollinearity was considered. A Monte
Carlo study of 1000 trials was conducted at eight levels of multicollinearity
(0, 0.25, 0.5, 0.7, 0.75, 0.8, 0.9 and 0.99) and eight sample sizes (10, 20, 40, 80,
100, 150, 250 and 500). At each specification, the true regression coefficients
were set at unity while 1.5, 2.0 and 2.5 were taken as the hypothesized values.
The power rate was obtained at every multicollinearity level for the
aforementioned sample sizes. The results indicate that, whether or not the
hypothesized values depart greatly from the true values, once the
multicollinearity level is very high (i.e. 0.99) the sample size
needed for error-free estimation or inference
must be greater than five hundred.

Researchers have long disagreed about the effect of sample size on multicollinearity: some argue that the problem can be solved by increasing the sample size, while others hold that it grows as the sample size increases. [

Regression theory postulates that there exists a stochastic relationship between a variable

Multicollinearity could be perfect or imperfect. When it is perfect, estimates obtained are not unique [

1. Small changes in the data can produce significant changes in the parameter estimates (regression coefficients).

2. The regression coefficients may have wrong signs and/or unreasonable magnitudes.

3. Regression coefficients have high standard errors which result in very low values of the t-statistic and thus affect the significance of the parameters [
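The third point can be made concrete for the two-regressor case studied here. As a sketch under the usual textbook assumptions (homoscedastic errors with variance σ², and r denoting the sample correlation between the two regressors; both symbols are introduced here for illustration and do not appear in the original text), the variance of an OLS slope estimate is:

```latex
\operatorname{Var}(\hat{\beta}_j)
  = \frac{\sigma^2}{(1 - r^2)\,\sum_{t=1}^{n} (x_{jt} - \bar{x}_j)^2},
  \qquad j = 1, 2,
\qquad\text{so that}\qquad
\mathrm{VIF} = \frac{1}{1 - r^2}.
% r = 0.90:  VIF = 1 / 0.19   \approx 5.26
% r = 0.99:  VIF = 1 / 0.0199 \approx 50.25
```

At r = 0.99 the variance is inflated roughly fifty-fold, so the standard error is about seven times larger than with uncorrelated regressors. Because the sum of squares in the denominator grows roughly in proportion to n, offsetting that inflation by sample size alone would require on the order of fifty times as many observations.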

Thus, the presence of multicollinearity in a data set affects not only parameter estimation with the OLS estimator but also inference on the parameters of the model. Consequently, using generated collinear data, this paper investigates empirically the most tolerable sample size, that is, the sample size at which a power rate of 0.99 or 1 is obtained with the ordinary least squares (OLS) estimator.
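One way to read the "most tolerable sample size" criterion is as a simple selection rule over the power rates obtained at each sample size. The sketch below assumes the rule is "the smallest sample size whose power rate reaches the threshold"; that reading, the function name `tolerable_sample_size`, and the power rates in `example` are all illustrative assumptions, not the authors' procedure or results.

```python
def tolerable_sample_size(power_by_n, threshold=0.99):
    """Return the smallest n whose power rate reaches the threshold.

    Returns None when no sample size in the grid reaches it, which would
    correspond to reporting a value beyond the largest n considered.
    """
    for n in sorted(power_by_n):
        if power_by_n[n] >= threshold:
            return n
    return None


# Hypothetical power rates, for illustration only (NOT results from the paper).
example = {10: 0.21, 20: 0.38, 40: 0.64, 80: 0.90,
           100: 0.95, 150: 0.992, 250: 1.0, 500: 1.0}
print(tolerable_sample_size(example))
```

When no sample size in the grid reaches the threshold, the function returns `None`, which a report could render as "greater than the largest sample size tried".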

Consider the regression model of the form

where

Now, suppose

where

Monte Carlo experiments were performed 1000 times for eight sample sizes (n = 10, 20, 40, 80, 100, 150, 250 and 500) and eight levels of multicollinearity (ρ = 0, 0.25, 0.5, 0.7, 0.75, 0.8, 0.9 and 0.99) with stochastic regressors that are normally distributed. At a particular specification of n and
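The design described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the regressors are drawn as bivariate standard normal with correlation ρ, the error term is standard normal, and the test uses a two-sided large-sample critical value of 1.96 rather than an exact t quantile. The function name `power_rate` and all of its parameters are hypothetical.

```python
import numpy as np


def power_rate(n, rho, beta_true=(1.0, 1.0, 1.0), beta_hyp=(1.0, 1.5, 1.0),
               coef=1, trials=1000, crit=1.96, seed=0):
    """Share of trials in which H0: beta[coef] = beta_hyp[coef] is rejected.

    Assumptions (not taken from the paper): unit-variance normal regressors
    with correlation rho, standard normal errors, normal critical value.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])      # regressor covariance
    beta_true = np.asarray(beta_true, dtype=float)
    rejections = 0
    for _ in range(trials):
        x = rng.multivariate_normal(np.zeros(2), cov, size=n)
        X = np.column_stack([np.ones(n), x])      # add the constant term
        y = X @ beta_true + rng.standard_normal(n)
        xtx_inv = np.linalg.inv(X.T @ X)
        b = xtx_inv @ X.T @ y                     # OLS estimates
        resid = y - X @ b
        s2 = resid @ resid / (n - 3)              # residual variance, 3 parameters
        se = np.sqrt(s2 * xtx_inv[coef, coef])    # standard error of b[coef]
        t_stat = (b[coef] - beta_hyp[coef]) / se
        rejections += int(abs(t_stat) > crit)
    return rejections / trials


if __name__ == "__main__":
    for rho in (0.0, 0.9, 0.99):
        print(rho, power_rate(n=100, rho=rho, trials=200))
```

Running the loop over the full grid of sample sizes and multicollinearity levels would give one power rate per specification; with these assumptions the rate should fall as ρ approaches 0.99 for a fixed n.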

The summary of the most tolerable sample sizes at different levels of multicollinearity and different possible combinations of the parameter values are shown for

When the true values of

Table 1. The tolerable sample sizes when the true values of *β*_{1} and *β*_{2} are maintained and that of *β*_{0} is changed, at different levels of multicollinearity.

| Parameter values (*β*_{0}, *β*_{1}, *β*_{2}) | ρ = 0 | ρ = 0.25 | ρ = 0.5 | ρ = 0.7 | ρ = 0.75 | ρ = 0.8 | ρ = 0.9 | ρ = 0.99 |
|---|---|---|---|---|---|---|---|---|
| 1.5, 1, 1 | 250 | 250 | 250 | 250 | 250 | 250 | 250 | 250 |
| 2, 1, 1 | 80 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
| 2.5, 1, 1 | 20 | 20 | 20 | 20 | 40 | 40 | 40 | 40 |

Table 2. The tolerable sample sizes when the true values of *β*_{0} and *β*_{2} are maintained and that of *β*_{1} is allowed to change, at different levels of multicollinearity.

| Parameter values (*β*_{0}, *β*_{1}, *β*_{2}) | ρ = 0 | ρ = 0.25 | ρ = 0.5 | ρ = 0.7 | ρ = 0.75 | ρ = 0.8 | ρ = 0.9 | ρ = 0.99 |
|---|---|---|---|---|---|---|---|---|
| 1, 1.5, 1 | 150 | 150 | 150 | 250 | 250 | 250 | 250 | >500 |
| 1, 2, 1 | 40 | 40 | 80 | 80 | 80 | 100 | 250 | >500 |
| 1, 2.5, 1 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | >500 |

Table 3. The tolerable sample sizes when the true values of *β*_{0} and *β*_{1} are maintained and that of *β*_{2} is allowed to change, at different levels of multicollinearity.

| Parameter values (*β*_{0}, *β*_{1}, *β*_{2}) | ρ = 0 | ρ = 0.25 | ρ = 0.5 | ρ = 0.7 | ρ = 0.75 | ρ = 0.8 | ρ = 0.9 | ρ = 0.99 |
|---|---|---|---|---|---|---|---|---|
| 1, 1, 1.5 | 150 | 150 | 250 | 250 | 250 | 250 | 250 | 500 |
| 1, 1, 2 | 80 | 40 | 40 | 40 | 40 | 80 | 80 | >500 |
| 1, 1, 2.5 | 20 | 20 | 20 | 20 | 40 | 80 | 80 | 500 |

Table 4. The tolerable sample sizes when the true value of *β*_{0} is maintained and those of *β*_{1} and *β*_{2} are allowed to change, at different levels of multicollinearity.

| Parameter values (*β*_{0}, *β*_{1}, *β*_{2}) | ρ = 0 | ρ = 0.25 | ρ = 0.5 | ρ = 0.7 | ρ = 0.75 | ρ = 0.8 | ρ = 0.9 | ρ = 0.99 |
|---|---|---|---|---|---|---|---|---|
| 1, 1.5, 2 | 150 | 150 | 150 | 250 | 250 | 250 | 500 | >500 |
| 1, 1.5, 2.5 | 150 | 150 | 150 | 250 | 250 | 250 | 500 | >500 |
| 1, 2, 1.5 | 40 | 40 | 40 | 80 | 80 | 80 | 100 | >500 |
| 1, 2, 2.5 | 40 | 40 | 40 | 80 | 80 | 80 | 100 | >500 |
| 1, 2.5, 1.5 | 40 | 40 | 40 | 80 | 80 | 80 | 100 | >500 |
| 1, 2.5, 2 | 40 | 40 | 40 | 80 | 80 | 80 | 100 | >500 |

Table 5. The tolerable sample sizes when the true value of *β*_{0} is maintained and those of *β*_{1} and *β*_{2} are allowed to change, at different levels of multicollinearity.

| Parameter values (*β*_{0}, *β*_{1}, *β*_{2}) | ρ = 0 | ρ = 0.25 | ρ = 0.5 | ρ = 0.7 | ρ = 0.75 | ρ = 0.8 | ρ = 0.9 | ρ = 0.99 |
|---|---|---|---|---|---|---|---|---|
| 1, 1.5, 2 | 150 | 150 | 150 | 250 | 250 | 250 | 500 | >500 |
| 1, 1.5, 2.5 | 150 | 150 | 150 | 250 | 250 | 250 | 500 | >500 |
| 1, 2, 1.5 | 40 | 40 | 40 | 80 | 80 | 80 | 100 | >500 |
| 1, 2, 2.5 | 40 | 40 | 40 | 80 | 80 | 80 | 100 | >500 |
| 1, 2.5, 1.5 | 40 | 40 | 40 | 80 | 80 | 80 | 100 | >500 |
| 1, 2.5, 2 | 40 | 40 | 40 | 80 | 80 | 80 | 100 | >500 |

Table 6. The tolerable sample sizes when the values of *β*_{0}, *β*_{1} and *β*_{2} are all allowed to change, at different levels of multicollinearity.

| Parameter values (*β*_{0}, *β*_{1}, *β*_{2}) | ρ = 0 | ρ = 0.25 | ρ = 0.5 | ρ = 0.7 | ρ = 0.75 | ρ = 0.8 | ρ = 0.9 | ρ = 0.99 |
|---|---|---|---|---|---|---|---|---|
| 1.5, 2.5, 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| 2, 1.5, 2.5 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
| 2.5, 2, 1.5 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 |

Table 7. The tolerable sample sizes when the values of *β*_{0}, *β*_{1} and *β*_{2} are all allowed to change, at different levels of multicollinearity.

| Parameter values (*β*_{0}, *β*_{1}, *β*_{2}) | ρ = 0 | ρ = 0.25 | ρ = 0.5 | ρ = 0.7 | ρ = 0.75 | ρ = 0.8 | ρ = 0.9 | ρ = 0.99 |
|---|---|---|---|---|---|---|---|---|
| 2, 1.5, 2.5 | 100 | 100 | 100 | 150 | 250 | 250 | 500 | 500 |
| 2.5, 2, 1.5 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 500 |
| 1.5, 2.5, 2 | 40 | 40 | 40 | 40 | 40 | 40 | 80 | 80 |

Table 8. The tolerable sample sizes when the values of *β*_{0}, *β*_{1} and *β*_{2} are all allowed to change, at different levels of multicollinearity.

| Parameter values (*β*_{0}, *β*_{1}, *β*_{2}) | ρ = 0 | ρ = 0.25 | ρ = 0.5 | ρ = 0.7 | ρ = 0.75 | ρ = 0.8 | ρ = 0.9 | ρ = 0.99 |
|---|---|---|---|---|---|---|---|---|
| 2.5, 2, 1.5 | 150 | 150 | 250 | 250 | 250 | 250 | 500 | 500 |
| 1.5, 2.5, 2 | 40 | 40 | 40 | 40 | 40 | 80 | 80 | >500 |
| 2, 1.5, 2.5 | 40 | 40 | 80 | 80 | 80 | 150 | 250 | 500 |

Likewise, when the true values of

When the true values of

The summary of the tolerable sample sizes at different levels of multicollinearity and hypothesized values are shown in

Similar results were also obtained for all other possible combinations of the parameter values.

From

In conclusion, at every multicollinearity level the most tolerable sample size was taken as the one with the highest power rate, which was attained at a sample size equal to or greater than five hundred. This study has shown that, whether or not the hypothesized values depart greatly from the true values, once the multicollinearity level is very high (i.e. 0.99) the sample size needed for error-free estimation or inference must be greater than five hundred, provided that increasing the sample size is the method used to correct for the presence of multicollinearity.