Performance of Existing Biased Estimators and the Respective Predictors in a Misspecified Linear Regression Model
1. Introduction
It is well known that misspecification of the linear model is unavoidable in practical situations. Misspecification may occur by including some irrelevant explanatory variables or by excluding some relevant explanatory variables. When relevant explanatory variables are excluded from a regression model, they become part of the error term, and the mean of the error term of the model is then no longer zero. Furthermore, the excluded variables may be correlated with the variables retained in the model. According to the assumptions of the linear regression model, the error term should be independently and identically normally distributed with mean zero and variance $\sigma^2$. Therefore, one or more assumptions of the linear regression model are violated when the model is misspecified, and hence the estimators become biased and inconsistent.
Further, it is well known that the ordinary least squares estimator (OLSE) loses its desirable properties when multicollinearity exists among the explanatory variables in the regression model. To overcome this problem, biased estimators based on the sample model alone, or on the sample model combined with exact or stochastic restrictions, have been used in the literature. The motivation of this article is to examine the performance of the existing biased estimators in the misspecified linear regression model when multicollinearity exists.
Sarkar [1] examined the consequences of omitting some relevant explanatory variables from a linear regression model when multicollinearity exists among the explanatory variables. Recently, Şiray [2] and Wu [3] discussed the efficiency of the r-d class estimator and the r-k class estimator over some existing estimators, respectively. Teräsvirta [4] discussed biased estimation with stochastic linear restrictions in a regression model misspecified by including an irrelevant variable with incorrectly specified prior information. Later, the efficiency of the Mixed Regression Estimator (MRE) under a regression model misspecified by excluding a relevant variable, with correctly specified prior information, was discussed by Mittelhammer [5], Ohtani and Honda [6], Kadiyala [7] and Trenkler and Wijekoon [8]. Further, the superiority of the MRE over the Ordinary Least Squares Estimator (OLSE) under the misspecified regression model with incorrectly specified sample and prior information was discussed by Wijekoon and Trenkler [9]. Hubert and Wijekoon [10] considered the improvement of the Liu estimator under a misspecified regression model with stochastic restrictions.
In this paper, the performance of the existing biased estimators of the linear regression model based on the sample information, namely the Principal Component Regression Estimator (PCRE) introduced by Massy [11], the Ridge Estimator (RE) defined by Hoerl and Kennard [12], the r-k class estimator proposed by Baye and Parker [13], the Almost Unbiased Ridge Estimator (AURE) proposed by Singh et al. [14], the Liu Estimator (LE) proposed by Liu [15], the Almost Unbiased Liu Estimator (AULE) proposed by Akdeniz and Kaçıranlar [16], and the r-d class estimator proposed by Kaçıranlar and Sakallıoğlu [17], is examined under the misspecified regression model without combining any prior information with the sample model. A generalized form representing all the above estimators is used so that these estimators and their respective predictors can be compared easily.
The rest of this article is organized as follows. The model specification and the respective OLSE are given in Section 2. In Section 3, a generalized form to represent the estimators under the misspecified regression model is proposed. In Section 4, the Mean Square Error Matrix (MSEM) and Scalar Mean Square Error (SMSE) comparisons between two generalized estimators and their respective predictors are considered. In Section 5, a numerical example and a Monte Carlo simulation study are given to illustrate the theoretical results under the SMSE criterion. Finally, some concluding remarks are stated in Section 6. The references and the Appendix are given at the end of the paper.
2. Model Specification
Assume that the true regression model is given by

$y = X\beta + Z\gamma + \varepsilon$, (2.1)

where $y$ is the $n \times 1$ vector of observations on the dependent variable, $X$ and $Z$ are the $n \times l$ and $n \times p$ matrices of observations on the $l + p$ regressors, $\beta$ and $\gamma$ are the $l \times 1$ and $p \times 1$ vectors of unknown coefficients, and $\varepsilon$ is the $n \times 1$ vector of disturbances with mean vector zero and dispersion matrix $\sigma^2 I$.
Let us say that the researcher misspecifies the regression model by excluding the $p$ regressors in $Z$, and fits

$y = X\beta + u$. (2.2)
According to Singh et al. [14], by applying the spectral decomposition to the symmetric matrix $X'X$ (since $X'X$ is a positive definite matrix) we have $X'X = T\Lambda T'$, where $T$ is the orthogonal matrix of eigenvectors and $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_l)$, $\lambda_i$ being the $i$th eigenvalue of $X'X$. Let $T_r$ be the matrix of the remaining columns of $T$ after deleting $l - r$ columns, where $r \le l$. Hence, $T_r'X'XT_r = \Lambda_r = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_r)$.
Let $X^{*} = XT$ and $\alpha = T'\beta$; then models (2.1) and (2.2) can be written in canonical form as

$y = X^{*}\alpha + Z\gamma + \varepsilon$, (2.3)

$y = X^{*}\alpha + u$. (2.4)
The OLS estimator of model (2.4) is given by

$\hat{\alpha} = (X^{*\prime}X^{*})^{-1}X^{*\prime}y = \Lambda^{-1}X^{*\prime}y$. (2.5)

Using (2.3), $\hat{\alpha}$ can be written as

$\hat{\alpha} = \alpha + \Lambda^{-1}X^{*\prime}Z\gamma + \Lambda^{-1}X^{*\prime}\varepsilon$. (2.6)

Hence, the expectation vector and the dispersion matrix of $\hat{\alpha}$ are given by

$E(\hat{\alpha}) = \alpha + \delta$, where $\delta = \Lambda^{-1}X^{*\prime}Z\gamma$, (2.7)

and

$D(\hat{\alpha}) = \sigma^2\Lambda^{-1}$, (2.8)

respectively.
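To make the canonical-form algebra concrete, the following R sketch (using simulated data and arbitrarily chosen coefficients, not the study data) computes the OLSE (2.5) of the misspecified model and compares it with its expectation (2.7); every quantity in the sketch is an assumption for illustration only.

```r
# Minimal sketch: bias of the canonical OLSE when relevant regressors Z are omitted.
set.seed(1)
n <- 50; l <- 3; p <- 2
X <- matrix(rnorm(n * l), n, l)               # retained regressors
Z <- matrix(rnorm(n * p), n, p)               # omitted (relevant) regressors
beta <- c(1, 2, 3); gamma <- c(0.5, -0.5); sigma <- 1
y <- X %*% beta + Z %*% gamma + rnorm(n, 0, sigma)   # true model (2.1)

eig  <- eigen(t(X) %*% X)                     # spectral decomposition X'X = T Lambda T'
Tmat <- eig$vectors
Lam  <- diag(eig$values)
Xs   <- X %*% Tmat                            # canonical regressors X* = X T
alpha_hat <- solve(Lam, t(Xs) %*% y)          # OLSE of the misspecified model, (2.5)

alpha <- t(Tmat) %*% beta                     # true canonical coefficients
delta <- solve(Lam, t(Xs) %*% Z %*% gamma)    # bias term in (2.7)
cbind(alpha, alpha + delta, alpha_hat)        # truth, E(alpha_hat), one realisation
```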
3. Modified Biased Estimators, Predictors and Their Generalized Form
To combat multicollinearity, several researchers have introduced different types of biased estimators in place of the OLSE. Seven such estimators, namely the RE, AURE, LE, AULE, PCRE, r-k class estimator and r-d class estimator, are given below, respectively:
$\hat{\beta}_{RE} = (X'X + kI)^{-1}X'y$, (3.1)

$\hat{\beta}_{AURE} = \left[I - k^2(X'X + kI)^{-2}\right]\hat{\beta}$, (3.2)

$\hat{\beta}_{LE} = (X'X + I)^{-1}(X'y + d\hat{\beta})$, (3.3)

$\hat{\beta}_{AULE} = \left[I - (1 - d)^2(X'X + I)^{-2}\right]\hat{\beta}$, (3.4)

$\hat{\beta}_{PCRE} = T_r(T_r'X'XT_r)^{-1}T_r'X'y$, (3.5)

$\hat{\beta}_{rk} = T_r(T_r'X'XT_r + kI_r)^{-1}T_r'X'y$, (3.6)

$\hat{\beta}_{rd} = T_r(T_r'X'XT_r + I_r)^{-1}(T_r'X'XT_r + dI_r)(T_r'X'XT_r)^{-1}T_r'X'y$, (3.7)

where $k > 0$, $0 < d < 1$, $I_r$ is the $r \times r$ identity matrix, and $\hat{\beta} = (X'X)^{-1}X'y$ is the corresponding OLS estimator.
Further, Xu and Yang [18] showed that Equations (3.5)-(3.7) can be rewritten as

$\hat{\beta}_{PCRE} = T_rT_r'\hat{\beta}$, (3.8)

$\hat{\beta}_{rk} = T_rT_r'\hat{\beta}_{RE}$, (3.9)

$\hat{\beta}_{rd} = T_rT_r'\hat{\beta}_{LE}$. (3.10)
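The following R sketch (toy data; the values of k, d and r are arbitrary assumptions) computes the seven estimators from the definitions (3.1)-(3.7) and confirms numerically that the PCRE, r-k and r-d class estimators coincide with the forms (3.8)-(3.10).

```r
# Minimal sketch of the seven biased estimators and the Xu-Yang identities.
set.seed(2)
n <- 30; l <- 4
X <- matrix(rnorm(n * l), n, l)
y <- X %*% c(1, -1, 2, 0.5) + rnorm(n)
k <- 0.5; d <- 0.5; r <- 2                         # assumed shrinkage parameters and rank

S    <- t(X) %*% X
bols <- solve(S, t(X) %*% y)                       # OLSE
Tmat <- eigen(S)$vectors
Tr   <- Tmat[, 1:r, drop = FALSE]                  # first r eigenvectors of X'X
Lr   <- t(Tr) %*% S %*% Tr                         # Lambda_r

b_re   <- solve(S + k * diag(l), t(X) %*% y)                                             # (3.1)
b_aure <- (diag(l) - k^2 * solve(S + k * diag(l)) %*% solve(S + k * diag(l))) %*% bols   # (3.2)
b_le   <- solve(S + diag(l), t(X) %*% y + d * bols)                                      # (3.3)
b_aule <- (diag(l) - (1 - d)^2 * solve(S + diag(l)) %*% solve(S + diag(l))) %*% bols     # (3.4)
b_pcr  <- Tr %*% solve(Lr, t(Tr) %*% t(X) %*% y)                                         # (3.5)
b_rk   <- Tr %*% solve(Lr + k * diag(r), t(Tr) %*% t(X) %*% y)                           # (3.6)
b_rd   <- Tr %*% solve(Lr + diag(r)) %*% (Lr + d * diag(r)) %*% solve(Lr, t(Tr) %*% t(X) %*% y)  # (3.7)

# Xu-Yang forms (3.8)-(3.10): discrepancies should be numerically zero
c(max(abs(b_pcr - Tr %*% t(Tr) %*% bols)),
  max(abs(b_rk  - Tr %*% t(Tr) %*% b_re)),
  max(abs(b_rd  - Tr %*% t(Tr) %*% b_le)))
```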
In the case of misspecification, the RE, AURE, LE, AULE, PCRE, r-k class estimator and r-d class estimator for the model (2.4) can be written as

$\hat{\alpha}_{RE} = W_1\hat{\alpha}$, (3.11)

$\hat{\alpha}_{AURE} = W_2\hat{\alpha}$, (3.12)

$\hat{\alpha}_{LE} = W_3\hat{\alpha}$, (3.13)

$\hat{\alpha}_{AULE} = W_4\hat{\alpha}$, (3.14)

$\hat{\alpha}_{PCRE} = W_5\hat{\alpha}$, (3.15)

$\hat{\alpha}_{rk} = W_6\hat{\alpha}$, (3.16)

$\hat{\alpha}_{rd} = W_7\hat{\alpha}$, (3.17)

respectively, where $W_1 = (\Lambda + kI)^{-1}\Lambda$, $W_2 = I - k^2(\Lambda + kI)^{-2}$, $W_3 = (\Lambda + I)^{-1}(\Lambda + dI)$, $W_4 = I - (1 - d)^2(\Lambda + I)^{-2}$, $W_5 = T'T_rT_r'T$, $W_6 = W_5W_1$ and $W_7 = W_5W_3$.
It is clear that $W_1$ and $W_3$ are positive definite, and $W_5$, $W_6$ and $W_7$ are non-negative definite. Now consider $1 - \dfrac{k^2}{(\lambda_i + k)^2} = \dfrac{\lambda_i(\lambda_i + 2k)}{(\lambda_i + k)^2} > 0$ and $1 - \dfrac{(1 - d)^2}{(\lambda_i + 1)^2} = \dfrac{(\lambda_i + d)(\lambda_i + 2 - d)}{(\lambda_i + 1)^2} > 0$. Hence, $W_2$ and $W_4$ are also positive definite.
Since the RE, AURE, LE, AULE, PCRE, r-k class estimator and r-d class estimator are all based on $\hat{\alpha}$, we can use the following generalized form:

$\hat{\alpha}_G = G\hat{\alpha}$, (3.18)

where $G$ is a positive definite matrix if it stands for $W_1$, $W_2$, $W_3$ and $W_4$, and it is a non-negative definite matrix if it stands for $W_5$, $W_6$ and $W_7$.
The expectation vector, bias vector, dispersion matrix and mean square error matrix of the generalized estimator can be calculated as

$E(\hat{\alpha}_G) = G(\alpha + \delta)$, (3.19)

$D(\hat{\alpha}_G) = \sigma^2G\Lambda^{-1}G'$, (3.20)

$\mathrm{MSEM}(\hat{\alpha}_G) = \sigma^2G\Lambda^{-1}G' + b_Gb_G'$, (3.21)

where $b_G = \mathrm{Bias}(\hat{\alpha}_G) = E(\hat{\alpha}_G) - \alpha = (G - I)\alpha + G\delta$.
Based on (3.19)-(3.21), the respective expectation vector, bias vector and dispersion matrix of the RE, AURE, LE, AULE, PCRE, r-k class estimator and r-d class estimator can easily be obtained; they are given in Table A1 in the Appendix.
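As a small numerical illustration of (3.19)-(3.21), the following R sketch evaluates the bias vector, dispersion matrix, MSEM and SMSE of the generalized estimator for the ridge weight matrix $W_1$; the eigenvalues, coefficients, bias term $\delta$ and shrinkage parameter below are arbitrary assumptions.

```r
# Minimal sketch of (3.19)-(3.21) for the generalized estimator alpha_G = G alpha_hat.
l <- 3
lambda <- c(5, 1, 0.05)                      # assumed eigenvalues of X'X
Lam    <- diag(lambda)
alpha  <- c(1, 2, 3)                         # assumed true canonical coefficients
delta  <- c(0.3, -0.2, 0.1)                  # assumed omitted-variable bias term
sigma2 <- 1
k      <- 0.5
G      <- solve(Lam + k * diag(l)) %*% Lam   # W1, the ridge case of the generalized form

bias_G <- (G - diag(l)) %*% alpha + G %*% delta      # bias vector b_G
disp_G <- sigma2 * G %*% solve(Lam) %*% t(G)         # dispersion matrix, (3.20)
msem_G <- disp_G + bias_G %*% t(bias_G)              # MSEM, (3.21)
smse_G <- sum(diag(msem_G))                          # SMSE = trace of the MSEM
smse_G
```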
By using the approach of Kadiyala [7] and Equations (2.3) and (2.4), the generalized prediction function can be defined as follows:

$\hat{y}_G = X^{*}\hat{\alpha}_G$, (3.22)

$y = X^{*}\alpha + Z\gamma + \varepsilon$, (3.23)

where $y$ is the actual value and $\hat{y}_G$ is the corresponding predicted value.
The MSEM of the generalized predictor is given by

$\mathrm{MSEM}(\hat{y}_G) = E\left[(\hat{y}_G - y)(\hat{y}_G - y)'\right] = \sigma^2M_G + d_Gd_G'$, (3.24)

where $M_G = (X^{*}G\Lambda^{-1}X^{*\prime} - I)(X^{*}G\Lambda^{-1}X^{*\prime} - I)'$ and $d_G = X^{*}b_G - Z\gamma$.
Note that the predictors based on the OLSE, RE, AURE, LE, AULE, PCRE, r-k class estimator and r-d class estimator are denoted by $\hat{y}_{OLSE}$, $\hat{y}_{RE}$, $\hat{y}_{AURE}$, $\hat{y}_{LE}$, $\hat{y}_{AULE}$, $\hat{y}_{PCRE}$, $\hat{y}_{rk}$ and $\hat{y}_{rd}$, respectively.
4. Mean Square Error Comparisons
4.1. Mean Square Error Matrix (MSEM) Comparison of Generalized Estimators
If two generalized biased estimators $\hat{\alpha}_{G_1} = G_1\hat{\alpha}$ and $\hat{\alpha}_{G_2} = G_2\hat{\alpha}$ are given, the estimator $\hat{\alpha}_{G_2}$ is said to be superior to $\hat{\alpha}_{G_1}$ in the MSEM sense if and only if $\mathrm{MSEM}(\hat{\alpha}_{G_1}) - \mathrm{MSEM}(\hat{\alpha}_{G_2})$ is non-negative definite.
Let us consider

$\mathrm{MSEM}(\hat{\alpha}_{G_1}) - \mathrm{MSEM}(\hat{\alpha}_{G_2}) = \sigma^2\left(G_1\Lambda^{-1}G_1' - G_2\Lambda^{-1}G_2'\right) + b_1b_1' - b_2b_2'.$

Now let $D = G_1\Lambda^{-1}G_1' - G_2\Lambda^{-1}G_2'$, $b_1 = (G_1 - I)\alpha + G_1\delta$ and $b_2 = (G_2 - I)\alpha + G_2\delta$; then the above difference can be written as

$\mathrm{MSEM}(\hat{\alpha}_{G_1}) - \mathrm{MSEM}(\hat{\alpha}_{G_2}) = \sigma^2D + b_1b_1' - b_2b_2'.$ (4.1)
The following theorem can be stated for the superiority of $\hat{\alpha}_{G_2}$ over $\hat{\alpha}_{G_1}$ with respect to the MSEM criterion.
Theorem 1: If $G_1\Lambda^{-1}G_1'$ is positive definite, $\hat{\alpha}_{G_2}$ is superior to $\hat{\alpha}_{G_1}$ in the MSEM sense when the regression model is misspecified due to excluding relevant variables if and only if $\lambda_1 < 1$ and $b_2'(\sigma^2D + b_1b_1')^{-1}b_2 \le 1$, where $\lambda_1$ is the largest eigenvalue of $G_2\Lambda^{-1}G_2'\left(G_1\Lambda^{-1}G_1'\right)^{-1}$, $D = G_1\Lambda^{-1}G_1' - G_2\Lambda^{-1}G_2'$, $b_1 = (G_1 - I)\alpha + G_1\delta$ and $b_2 = (G_2 - I)\alpha + G_2\delta$.
Proof: Assume that $G_1\Lambda^{-1}G_1'$ is positive definite, which implies that it is nonsingular. Due to Lemma 3 (see Appendix), $D = G_1\Lambda^{-1}G_1' - G_2\Lambda^{-1}G_2'$ is positive definite if $\lambda_1 < 1$, where $\lambda_1$ is the largest eigenvalue of $G_2\Lambda^{-1}G_2'\left(G_1\Lambda^{-1}G_1'\right)^{-1}$. Hence, according to Lemma 2 (see Appendix), the difference in (4.1) is non-negative definite if and only if $b_2'(\sigma^2D + b_1b_1')^{-1}b_2 \le 1$, which completes the proof.
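As a quick numerical illustration, the following R sketch evaluates the two conditions of Theorem 1 for the ridge weight matrix $W_1$ (as $G_1$) and the Liu weight matrix $W_3$ (as $G_2$); the eigenvalues, coefficients, bias term and shrinkage parameters are arbitrary assumptions, so the conditions may or may not hold for them.

```r
# Minimal sketch: checking the two conditions of Theorem 1 for assumed toy quantities.
l <- 3
lambda <- c(5, 1, 0.05); Lam <- diag(lambda)
alpha  <- c(1, 2, 3); delta <- c(0.3, -0.2, 0.1); sigma2 <- 1
k <- 0.5; d <- 0.5
G1 <- solve(Lam + k * diag(l)) %*% Lam                   # ridge weights (W1)
G2 <- solve(Lam + diag(l)) %*% (Lam + d * diag(l))       # Liu weights (W3)

V1 <- G1 %*% solve(Lam) %*% t(G1)                        # dispersion factor of G1
V2 <- G2 %*% solve(Lam) %*% t(G2)                        # dispersion factor of G2
b1 <- (G1 - diag(l)) %*% alpha + G1 %*% delta            # bias vector of G1
b2 <- (G2 - diag(l)) %*% alpha + G2 %*% delta            # bias vector of G2

lam_max <- max(Re(eigen(V2 %*% solve(V1))$values))       # condition: lambda_1 < 1
D  <- V1 - V2
qf <- t(b2) %*% solve(sigma2 * D + b1 %*% t(b1)) %*% b2  # condition: quadratic form <= 1
c(lambda_max = lam_max, quadratic_form = qf)
```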
4.2. Scalar Mean Square Error (SMSE) Comparison of Generalized Estimators
If two generalized biased estimators $\hat{\alpha}_{G_1}$ and $\hat{\alpha}_{G_2}$ are given, the estimator $\hat{\alpha}_{G_2}$ is said to be superior to $\hat{\alpha}_{G_1}$ in the SMSE sense if and only if $\mathrm{SMSE}(\hat{\alpha}_{G_1}) - \mathrm{SMSE}(\hat{\alpha}_{G_2}) = \mathrm{tr}\left[\mathrm{MSEM}(\hat{\alpha}_{G_1})\right] - \mathrm{tr}\left[\mathrm{MSEM}(\hat{\alpha}_{G_2})\right] \ge 0$.
The following theorem can be stated for the superiority of $\hat{\alpha}_{G_2}$ over $\hat{\alpha}_{G_1}$ with respect to the SMSE criterion.
Theorem 2: $\hat{\alpha}_{G_2}$ is superior to $\hat{\alpha}_{G_1}$ in the SMSE sense when the regression model is misspecified due to excluding relevant variables if and only if $\sigma^2\mathrm{tr}(D) \ge b_2'b_2 - b_1'b_1$, where $D = G_1\Lambda^{-1}G_1' - G_2\Lambda^{-1}G_2'$, $b_1 = (G_1 - I)\alpha + G_1\delta$ and $b_2 = (G_2 - I)\alpha + G_2\delta$.
Proof: Let us consider $\mathrm{SMSE}(\hat{\alpha}_{G_1}) - \mathrm{SMSE}(\hat{\alpha}_{G_2}) = \mathrm{tr}\left[\mathrm{MSEM}(\hat{\alpha}_{G_1}) - \mathrm{MSEM}(\hat{\alpha}_{G_2})\right]$. Using (4.1), this difference can be written as $\sigma^2\mathrm{tr}(D) + b_1'b_1 - b_2'b_2$. Then $\hat{\alpha}_{G_2}$ is superior to $\hat{\alpha}_{G_1}$ if $\sigma^2\mathrm{tr}(D) + b_1'b_1 - b_2'b_2 \ge 0$, which holds if and only if $\sigma^2\mathrm{tr}(D) \ge b_2'b_2 - b_1'b_1$, which completes the proof.
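The SMSE condition of Theorem 2 is straightforward to evaluate numerically; the sketch below does so for the same arbitrarily assumed toy quantities used above, again with the ridge weight matrix as $G_1$ and the Liu weight matrix as $G_2$.

```r
# Minimal sketch: checking the Theorem 2 condition sigma^2 * tr(D) >= b2'b2 - b1'b1.
l <- 3
lambda <- c(5, 1, 0.05); Lam <- diag(lambda)
alpha  <- c(1, 2, 3); delta <- c(0.3, -0.2, 0.1); sigma2 <- 1
k <- 0.5; d <- 0.5
G1 <- solve(Lam + k * diag(l)) %*% Lam                   # ridge weights (W1)
G2 <- solve(Lam + diag(l)) %*% (Lam + d * diag(l))       # Liu weights (W3)

bias <- function(G) (G - diag(l)) %*% alpha + G %*% delta
D  <- G1 %*% solve(Lam) %*% t(G1) - G2 %*% solve(Lam) %*% t(G2)
b1 <- bias(G1); b2 <- bias(G2)

lhs <- sigma2 * sum(diag(D))
rhs <- sum(b2^2) - sum(b1^2)
lhs >= rhs   # TRUE means the G2-based estimator has the smaller SMSE here
```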
4.3. Mean Square Error Matrix (MSEM) Comparison of Generalized Predictors
If two generalized predictors $\hat{y}_{G_1}$ and $\hat{y}_{G_2}$ are given, the predictor $\hat{y}_{G_2}$ is said to be superior to $\hat{y}_{G_1}$ in the MSEM sense if and only if $\mathrm{MSEM}(\hat{y}_{G_1}) - \mathrm{MSEM}(\hat{y}_{G_2})$ is non-negative definite.
Let us consider this MSEM difference of the two generalized predictors.
The following theorem can be stated for the superiority of $\hat{y}_{G_2}$ over $\hat{y}_{G_1}$ with respect to the MSEM criterion.
Theorem 3: $\hat{y}_{G_2}$ is superior to $\hat{y}_{G_1}$ in the MSEM sense when the regression model is misspecified due to excluding relevant variables if and only if $N$ is non-negative definite, $d_2 \in \mathcal{C}(N)$ and $d_2'N^{-}d_2 \le 1$, where $N = \sigma^2(M_1 - M_2) + d_1d_1'$ with $M_j = (X^{*}G_j\Lambda^{-1}X^{*\prime} - I)(X^{*}G_j\Lambda^{-1}X^{*\prime} - I)'$ and $d_j = X^{*}b_j - Z\gamma$ for $j = 1, 2$, $\mathcal{C}(N)$ stands for the column space of $N$ and $N^{-}$ is an independent choice of g-inverse of $N$.
Proof: Using (3.24), the MSEM difference of the two generalized predictors can be written as

$\mathrm{MSEM}(\hat{y}_{G_1}) - \mathrm{MSEM}(\hat{y}_{G_2}) = \sigma^2(M_1 - M_2) + d_1d_1' - d_2d_2'.$ (4.2)

After some straightforward calculation, Equation (4.2) can be written as $N - d_2d_2'$, where $M_j = (X^{*}G_j\Lambda^{-1}X^{*\prime} - I)(X^{*}G_j\Lambda^{-1}X^{*\prime} - I)'$, $d_j = X^{*}b_j - Z\gamma$ and $N = \sigma^2(M_1 - M_2) + d_1d_1'$. Due to Lemma 1 (see Appendix), $N - d_2d_2'$ is a non-negative definite matrix if and only if $N$ is non-negative definite, $d_2 \in \mathcal{C}(N)$ and $d_2'N^{-}d_2 \le 1$, where $\mathcal{C}(N)$ stands for the column space of $N$ and $N^{-}$ is an independent choice of g-inverse of $N$, which completes the proof.
Note that, obviously, the conditions derived under Theorem 1 are only sufficient conditions in this context. Consequently, we may say that there are situations where $\hat{y}_{G_2}$ is superior to $\hat{y}_{G_1}$ in the MSEM sense.
4.4. Scalar Mean Square Error (SMSE) Comparison of Generalized Predictors
Using (4.2), the SMSE difference of the two generalized predictors can be written as

$\mathrm{SMSE}(\hat{y}_{G_1}) - \mathrm{SMSE}(\hat{y}_{G_2}) = \sigma^2\mathrm{tr}(M_1 - M_2) + d_1'd_1 - d_2'd_2.$

The following theorem can be stated for the superiority of $\hat{y}_{G_2}$ over $\hat{y}_{G_1}$ with respect to the SMSE criterion.
Theorem 4: $\hat{y}_{G_2}$ is superior to $\hat{y}_{G_1}$ in the SMSE sense when the regression model is misspecified due to excluding relevant variables if and only if $\sigma^2\mathrm{tr}(M_1 - M_2) \ge d_2'd_2 - d_1'd_1$.
Proof: $\hat{y}_{G_2}$ is superior to $\hat{y}_{G_1}$ if $\mathrm{SMSE}(\hat{y}_{G_1}) - \mathrm{SMSE}(\hat{y}_{G_2}) \ge 0$. By the above expression, this holds if and only if $\sigma^2\mathrm{tr}(M_1 - M_2) \ge d_2'd_2 - d_1'd_1$, which completes the proof.
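The quantities appearing in Theorem 4 are also easy to evaluate numerically. The following R sketch does so for a small simulated design; the design matrices, coefficients and shrinkage parameters are arbitrary assumptions, with the ridge and Liu weight matrices again used as $G_1$ and $G_2$.

```r
# Minimal sketch: checking the Theorem 4 condition for the predictors.
set.seed(5)
n <- 20; l <- 3; p <- 1
X <- matrix(rnorm(n * l), n, l); Z <- matrix(rnorm(n * p), n, p)
beta <- c(1, 2, 3); gamma <- 0.5; sigma2 <- 1; k <- 0.5; d <- 0.5

eig   <- eigen(t(X) %*% X); Tmat <- eig$vectors; Lam <- diag(eig$values)
Xs    <- X %*% Tmat                                # canonical regressors X*
alpha <- t(Tmat) %*% beta
delta <- solve(Lam, t(Xs) %*% Z %*% gamma)         # omitted-variable bias term

G1 <- solve(Lam + k * diag(l)) %*% Lam             # ridge weights (W1)
G2 <- solve(Lam + diag(l)) %*% (Lam + d * diag(l)) # Liu weights (W3)

pred_parts <- function(G) {                        # M_G and d_G of (3.24)
  A <- Xs %*% G %*% solve(Lam) %*% t(Xs) - diag(n)
  b <- (G - diag(l)) %*% alpha + G %*% delta
  list(M = A %*% t(A), d = Xs %*% b - Z %*% gamma)
}
p1 <- pred_parts(G1); p2 <- pred_parts(G2)
lhs <- sigma2 * sum(diag(p1$M - p2$M))
rhs <- sum(p2$d^2) - sum(p1$d^2)
lhs >= rhs   # TRUE means the G2-based predictor has the smaller SMSE here
```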
Based on Theorem 1, Theorem 2, Theorem 3 and Theorem 4, we can obtain the corresponding results for each of the biased estimators and the respective predictors by plugging in the relevant values of $G_1$, $G_2$, $b_1$, $b_2$ and $\delta$. The results are summarized in Tables A2-A6 in the Appendix.
5. Illustration of Theoretical Results
5.1. Numerical Example
To illustrate our theoretical results, we consider a dataset on total national research and development expenditures, expressed as a percentage of gross national product, by country for 1972-1986. It represents the relationship between the dependent variable $Y$, the percentage spent by the United States, and the four independent variables $X_1$, $X_2$, $X_3$ and $X_4$. The variable $X_1$ represents the percentage spent by the former Soviet Union, $X_2$ that spent by France, $X_3$ that spent by West Germany, and $X_4$ that spent by Japan. The data were discussed in Gruber [19] and have been analysed by Akdeniz and Erol [20], Li and Yang [21], among others.
are 312.932, 0.754, 0.045, 0.037, 0.002, the condition number is 299 and Variance Inflation Factor (VIF) values are 6.91, 21.58, 29.75, 1.79. Since condition number is greater than 100 and first three VIF values are greater than 5, which implies the existence of serious multicollinearity in the data set.
After the standardization of the data, the corresponding OLS estimates were computed. For the standardized data (since there are ten observations and four parameters), the estimate of $\sigma^2$ was obtained accordingly.
Table 1 shows the estimated SMSE values of the OLSE, RE, AURE, LE, AULE, PCRE, r-k class estimator and r-d class estimator for the regression model under different levels of misspecification with respect to the shrinkage parameters (k/d), where $l$ denotes the number of variables in the model and $p$ denotes the number of misspecified variables. Table 2 shows the estimated SMSE values of the predictors based on the OLSE, RE, AURE, LE, AULE, PCRE, r-k class estimator and r-d class estimator for the regression model under the same settings for some selected shrinkage parameters (k/d). For simplicity, we choose the shrinkage parameter values k and d in the range (0, 1).

Table 1. Estimated SMSE values of the estimators.

From Table 1, it can be observed that the minimum SMSE of the estimators depends on the values of the shrinkage parameters and the level of misspecification, which agrees with our theoretical findings.

Table 2. Estimated SMSE values of the predictors.

According to Table 2, it can be observed that the predictors behave differently from the respective estimators, which also agrees with our theoretical findings.
5.2. Monte Carlo Simulation Study
For further clarification, a Monte Carlo simulation study is carried out under different levels of misspecification using R 3.2.5. Following McDonald and Galarneau [22], the explanatory variables are generated as

$x_{ij} = (1 - \rho^2)^{1/2}z_{ij} + \rho z_{i,l+p+1}, \quad i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, l + p,$

where the $z_{ij}$ are independent standard normal pseudo-random numbers, and $\rho$ is specified so that the theoretical correlation between any two explanatory variables is $\rho^2$. The dependent variable is generated by using the equation

$y_i = \beta_1x_{i1} + \beta_2x_{i2} + \cdots + \beta_{l+p}x_{i,l+p} + \varepsilon_i, \quad i = 1, 2, \ldots, n,$

where $\varepsilon_i$ is a normal pseudo-random number with mean zero and variance one. In this study, we choose $\beta$ as the normalized eigenvector corresponding to the largest eigenvalue of $X'X$, for which $\beta'\beta = 1$. We consider the following setup to investigate the effects of different degrees of multicollinearity on the estimators:
$\rho = 0.9$: condition number = 6.06 and VIF = (4.84, 4.83, 4.82, 4.81, 4.87);
$\rho = 0.99$: condition number = 20.12 and VIF = (46.09, 46.12, 46.02, 45.97, 46.56);
$\rho = 0.999$: condition number = 64 and VIF = (458.3, 459.2, 458.1, 457.8, 463.4).
Three different sets of observations are considered by selecting three different sample sizes, where $l$ denotes the number of variables in the model and $p$ denotes the number of misspecified variables. For simplicity, we select the values of k and d in the range (0, 1).
The simulation is repeated 2000 times by generating new pseudo-random numbers, and the simulated SMSE values of the estimators and the predictors are obtained using

$\widehat{\mathrm{SMSE}}(\hat{\alpha}_G) = \frac{1}{2000}\sum_{t=1}^{2000}\left(\hat{\alpha}_G^{(t)} - \alpha\right)'\left(\hat{\alpha}_G^{(t)} - \alpha\right)$

and

$\widehat{\mathrm{SMSE}}(\hat{y}_G) = \frac{1}{2000}\sum_{t=1}^{2000}\left(\hat{y}_G^{(t)} - y^{(t)}\right)'\left(\hat{y}_G^{(t)} - y^{(t)}\right),$

respectively.
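A condensed version of this simulation design can be sketched as follows. The sample size, the numbers of retained and omitted regressors, $\rho$, $k$ and the omitted coefficients $\gamma$ are assumptions chosen purely for illustration, and only the OLSE and the RE are included to keep the sketch short.

```r
# Condensed sketch of the Section 5.2 design: McDonald-Galarneau regressors,
# a model that omits the last p regressors, and simulated SMSE values.
set.seed(4)
n <- 50; l <- 5; p <- 1; rho <- 0.99; k <- 0.5; nrep <- 2000

Zmat <- matrix(rnorm(n * (l + p + 1)), n, l + p + 1)
W    <- sqrt(1 - rho^2) * Zmat[, 1:(l + p)] + rho * Zmat[, l + p + 1]  # regressors
X    <- W[, 1:l]                                   # retained regressors
Zr   <- W[, (l + 1):(l + p), drop = FALSE]         # omitted relevant regressors

beta  <- eigen(t(X) %*% X)$vectors[, 1]            # normalized so that beta'beta = 1
gamma <- rep(0.5, p)                               # assumed omitted coefficients
S     <- t(X) %*% X

smse_ols <- 0; smse_re <- 0
for (t in seq_len(nrep)) {
  y     <- X %*% beta + Zr %*% gamma + rnorm(n)    # new errors in each replication
  b_ols <- solve(S, t(X) %*% y)
  b_re  <- solve(S + k * diag(l), t(X) %*% y)
  smse_ols <- smse_ols + sum((b_ols - beta)^2)
  smse_re  <- smse_re  + sum((b_re  - beta)^2)
}
c(OLSE = smse_ols / nrep, RE = smse_re / nrep)     # simulated SMSE values
```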
Tables 3-5 show the estimated SMSE values of the estimators for the regression model under weak, moderate and high multicollinearity, respectively, for the selected values of the shrinkage parameters (k/d). Tables 6-8 show the corresponding estimated SMSE values of the predictors for the regression model, respectively.

Table 3. Estimated SMSE values of the estimators under weak multicollinearity.

According to Table 3, the estimator with the smallest SMSE depends on the range of the shrinkage parameters under the different levels of misspecification for weak multicollinearity.

Table 4. Estimated SMSE values of the estimators under moderate multicollinearity.

According to Table 4, the estimator with the smallest SMSE again depends on the range of the shrinkage parameters under the different levels of misspecification for moderate multicollinearity.

Table 5. Estimated SMSE values of the estimators under high multicollinearity.

According to Table 5, the superior estimator depends on both the range of the shrinkage parameters and the level of misspecification for high multicollinearity.

Table 6. Estimated SMSE values of the predictors under weak multicollinearity.

Table 7. Estimated SMSE values of the predictors under moderate multicollinearity.

Table 8. Estimated SMSE values of the predictors under high multicollinearity.

According to Tables 6-8, the predictor with the smallest SMSE depends on the range of the shrinkage parameters under the different levels of misspecification for each degree of multicollinearity.

Table 9. Shrinkage parameter ranges for superior estimators and predictors.

From Tables 3-8, we can summarise the results as shown in Table 9.
6. Conclusions
In this study, a common set of superiority conditions was obtained for comparisons among the biased estimators (RE, AURE, LE, AULE, PCRE, r-k class estimator and r-d class estimator) and their respective predictors, by using a generalized form, for the misspecified linear regression model when multicollinearity exists among the explanatory variables. Furthermore, the theoretical findings were illustrated by using a numerical example and a Monte Carlo simulation study.
The simulation study shows that the LE and RE outperform the other estimators when weak multicollinearity exists, and that the RE, r-k class and r-d class estimators outperform the other estimators when moderate and high multicollinearity exist, for selected values of the shrinkage parameters, respectively. It can also be noted that the predictors based on the LE and RE are always superior to the other predictors for the selected values of the shrinkage parameters when multicollinearity exists among the explanatory variables.
One limitation of this study is the assumption that the error variance is the same for all models, even when relevant variables are omitted from the model.
Appendix
Lemma 1 (Baksalary and Kala [23]): Let $B$ be an $n \times n$ symmetric matrix, $b$ an $n \times 1$ vector and $\lambda$ a positive real number. Then the following conditions are equivalent:
i) $\lambda B - bb'$ is non-negative definite;
ii) $B$ is non-negative definite, $b \in \mathcal{C}(B)$ and $b'B^{-}b \le \lambda$, where $\mathcal{C}(B)$ stands for the column space of $B$ and $B^{-}$ is an independent choice of g-inverse of $B$.
Lemma 2 (Trenkler and Toutenburg [24]): Let $\hat{\beta}_1$ and $\hat{\beta}_2$ be two linear estimators of $\beta$. Suppose that $D = D(\hat{\beta}_1) - D(\hat{\beta}_2)$ is positive definite; then $\mathrm{MSEM}(\hat{\beta}_1) - \mathrm{MSEM}(\hat{\beta}_2)$ is non-negative definite if and only if $b_2'(D + b_1b_1')^{-1}b_2 \le 1$, where $D(\hat{\beta}_j)$, $\mathrm{MSEM}(\hat{\beta}_j)$ and $b_j$ denote the dispersion matrix, mean square error matrix and bias vector of $\hat{\beta}_j$, respectively, $j = 1, 2$.
Lemma 3 (Wang et al. [25]): Let $A$ and $B$ be $n \times n$ matrices with $A$ positive definite and $B$ non-negative definite; then $A - B$ is positive definite if and only if $\lambda_1 < 1$, where $\lambda_1$ is the largest eigenvalue of the matrix $BA^{-1}$.
Table A1. Expectation vector, Bias vector and Dispersion matrix of the estimators.