Performance of Existing Biased Estimators and the Respective Predictors in a Misspecified Linear Regression Model

In this paper, the performance of existing biased estimators (Ridge Estimator (RE), Almost Unbiased Ridge Estimator (AURE), Liu Estimator (LE), Almost Unbiased Liu Estimator (AULE), Principal Component Regression Estimator (PCRE), r-k class estimator and r-d class estimator) and their respective predictors is considered in a misspecified linear regression model when multicollinearity exists among the explanatory variables. A generalized form is used to compare these estimators and predictors in the mean square error sense. Further, theoretical findings are established using the mean square error matrix and the scalar mean square error. Finally, a numerical example and a Monte Carlo simulation study are used to illustrate the theoretical findings. The simulation study revealed that LE and RE outperform the other estimators when weak multicollinearity exists, and that RE, the r-k class and the r-d class estimators outperform the other estimators when moderate and high multicollinearity exist for certain values of the shrinkage parameters, respectively. The predictors based on LE and RE are always superior to the other predictors for certain values of the shrinkage parameters.


Introduction
It is well known that the misspecification of the linear model is unavoidable in practical situations. Misspecification may occur due to including some irrelevant explanatory variables or excluding some relevant explanatory variables. Excluding some relevant explanatory variables from a regression model causes these variables to become part of the error term. In this case the mean of the error term of the model is not zero. Furthermore, the excluded variables may be correlated with the variables in the model. According to the assumptions of the linear regression model, the error term of the model should be independently and identically normally distributed with mean zero and variance $\sigma^2$. Therefore, one or more assumptions of the linear regression model will be violated when the model is misspecified, and hence the estimators become biased and inconsistent.
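The mechanism described above can be written out in a few lines. This is a sketch under assumed notation: $X_1$ collects the regressors kept in the model, $X_2$ the omitted ones, and the regressors are treated as fixed.

```latex
\begin{aligned}
y &= X_1\beta_1 + X_2\beta_2 + \varepsilon, \qquad E(\varepsilon) = 0,\\
y &= X_1\beta_1 + u, \qquad u = X_2\beta_2 + \varepsilon,\\
E(u) &= X_2\beta_2 \neq 0 \quad \text{unless } X_2\beta_2 = 0,\\
E\!\left[(X_1'X_1)^{-1}X_1'y\right] &= \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2 .
\end{aligned}
```

The last line is the usual omitted-variable bias of OLS. It vanishes only when $X_1'X_2\beta_2 = 0$, for instance when the omitted regressors are orthogonal to the retained ones.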
Further, it is well known that the ordinary least squares estimator (OLSE) does not hold its desirable properties if multicollinearity exists among the explanatory variables in the regression model. To overcome this problem, biased estimators based on the sample model $y = X\beta + \varepsilon$, or on the sample model combined with exact or stochastic restrictions, have been used in the literature. The motivation of this article is to examine the performance of the existing biased estimators in the misspecified linear regression model when multicollinearity exists.
Sarkar [1] examined the consequences of omitting some relevant explanatory variables from a linear regression model when multicollinearity exists among the explanatory variables. Recently, Şiray [2] and Wu [3] discussed the efficiency of the r-d class estimator and the r-k class estimator over some existing estimators, respectively. Teräsvirta [4] discussed the case of biased estimation with stochastic linear restrictions in a regression model misspecified by including an irrelevant variable, with incorrectly specified prior information. Later, the efficiency of the Mixed Regression Estimator (MRE) under a regression model misspecified by excluding a relevant variable, with correctly specified prior information, was discussed by Mittelhammer [5], Ohtani and Honda [6], Kadiyala [7] and Trenkler and Wijekoon [8]. Further, the superiority of the MRE over the Ordinary Least Squares Estimator (OLSE) under the misspecified regression model with incorrectly specified sample and prior information was discussed by Wijekoon and Trenkler [9]. Hubert and Wijekoon [10] considered the improvement of the Liu estimator under a misspecified regression model with stochastic restrictions.
In this paper, the performance of existing biased estimators of the linear regression model based on the sample information is examined under the misspecified regression model, without combining any prior information with the sample model. The estimators considered are the Principal Component Regression Estimator (PCRE) introduced by Massy [11], the Ridge Estimator (RE) defined by Hoerl and Kennard [12], the r-k class estimator proposed by Baye and Parker [13], the Almost Unbiased Ridge Estimator (AURE) proposed by Singh et al. [14], the Liu Estimator (LE) proposed by Liu [15], the Almost Unbiased Liu Estimator (AULE) proposed by Akdeniz and Kaçıranlar [16], and the r-d class estimator proposed by Kaçıranlar and Sakallıoğlu [17]. A generalized form representing all of the above estimators is used so that the estimators and their respective predictors can be compared easily.
The rest of this article is organized as follows. The model specification and the respective OLSE are given in section 2. In section 3, a generalized form to represent the estimators under the misspecified regression model is proposed. In section 4, the Mean Square Error Matrix (MSEM) and Scalar Mean Square Error (SMSE) comparisons between two generalized estimators and their respective predictors are considered. In section 5, a numerical example and a Monte Carlo simulation are given to illustrate the theoretical results under the SMSE criterion. Finally, some concluding remarks are stated in section 6. The references and Appendix are given at the end of the paper.

Model Specification
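Assume that the true regression model is given by

$$y = X_1\beta_1 + X_2\beta_2 + \varepsilon, \tag{2.1}$$

where $y$ is the $n \times 1$ vector of observations on the dependent variable, $X_1$ and $X_2$ are the $n \times l$ and $n \times p$ matrices of observations on the regressors, $\beta_1$ and $\beta_2$ are the $l \times 1$ and $p \times 1$ vectors of unknown coefficients, and $\varepsilon$ is the $n \times 1$ vector of disturbances with $E(\varepsilon) = 0$ and $E(\varepsilon\varepsilon') = \sigma^2 I$.

Let us say that the researcher misspecifies the regression model by excluding $p$ regressors as

$$y = X_1\beta_1 + u. \tag{2.2}$$

According to Singh et al. [14], by applying the spectral decomposition of the symmetric matrix $X_1'X_1$ ($X_1'X_1$ is a positive definite matrix) we have

$$T'X_1'X_1T = \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_l),$$

where $T = (t_1, t_2, \ldots, t_l)$ is the orthogonal matrix and $\lambda_i > 0$ is the $i$th eigenvalue of $X_1'X_1$. Let $T_r = (t_1, t_2, \ldots, t_r)$ be the remaining columns of $T$ after deleting $l - r$ columns, where $r \le l$. Hence, $T_r'X_1'X_1T_r = \Lambda_r = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_r)$. Let $Z = X_1T$, $\gamma = T'\beta_1$ and $\delta = X_2\beta_2$; then models (2.1) and (2.2) can be written in canonical form as

$$y = Z\gamma + \delta + \varepsilon, \tag{2.3}$$

$$y = Z\gamma + u. \tag{2.4}$$

The OLS estimator of model (2.4) is given by

$$\hat{\gamma} = (Z'Z)^{-1}Z'y = \Lambda^{-1}Z'y.$$

Using (2.3), $\hat{\gamma}$ can be written as

$$\hat{\gamma} = \gamma + \Lambda^{-1}Z'\delta + \Lambda^{-1}Z'\varepsilon.$$

Hence, the expectation vector and the dispersion matrix of $\hat{\gamma}$ are given by

$$E(\hat{\gamma}) = \gamma + \Lambda^{-1}Z'\delta, \qquad D(\hat{\gamma}) = \sigma^2\Lambda^{-1}.$$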
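The step from the OLS estimator of the misspecified canonical model to its expectation and dispersion can be spelled out as follows. This is a sketch under assumed notation: $Z$ denotes the canonical regressors with $Z'Z = \Lambda$ (the diagonal matrix of eigenvalues), $\gamma$ the canonical coefficients, and $\delta$ the contribution of the omitted regressors to the mean of $y$.

```latex
\begin{aligned}
\hat{\gamma} &= (Z'Z)^{-1}Z'y
             = \Lambda^{-1}Z'(Z\gamma + \delta + \varepsilon)
             = \gamma + \Lambda^{-1}Z'\delta + \Lambda^{-1}Z'\varepsilon,\\
E(\hat{\gamma}) &= \gamma + \Lambda^{-1}Z'\delta
             \qquad \text{since } E(\varepsilon) = 0,\\
D(\hat{\gamma}) &= \Lambda^{-1}Z'\,(\sigma^2 I)\,Z\Lambda^{-1}
             = \sigma^2\Lambda^{-1}Z'Z\Lambda^{-1}
             = \sigma^2\Lambda^{-1}.
\end{aligned}
```

The term $\Lambda^{-1}Z'\delta$ is the bias that misspecification adds to the OLSE. It is divided by the eigenvalues, so small eigenvalues (strong multicollinearity) can magnify it.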

Modified Biased Estimators, Predictors and Their Generalized Form
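To combat multicollinearity, several researchers have introduced different types of biased estimators in place of the OLSE. Seven such estimators are the RE, AURE, LE, AULE, PCRE, r-k class estimator and r-d class estimator, given below respectively:

$$\hat{\beta}_{RE} = (X'X + kI)^{-1}X'y, \tag{3.1}$$

$$\hat{\beta}_{AURE} = \left[I - k^2(X'X + kI)^{-2}\right]\hat{\beta}, \tag{3.2}$$

$$\hat{\beta}_{LE} = (X'X + I)^{-1}(X'X + dI)\hat{\beta}, \tag{3.3}$$

$$\hat{\beta}_{AULE} = \left[I - (1 - d)^2(X'X + I)^{-2}\right]\hat{\beta}, \tag{3.4}$$

$$\hat{\beta}_{PCR} = T_r(T_r'X'XT_r)^{-1}T_r'X'y, \tag{3.5}$$

$$\hat{\beta}_{rk} = T_r(T_r'X'XT_r + kI_r)^{-1}T_r'X'y, \tag{3.6}$$

$$\hat{\beta}_{rd} = T_r(T_r'X'XT_r + I_r)^{-1}(T_r'X'y + d\,T_r'\hat{\beta}_{PCR}), \tag{3.7}$$

where $\hat{\beta} = (X'X)^{-1}X'y$ is the OLSE, $k > 0$, $0 < d < 1$ and $T_r = [t_1, t_2, \ldots, t_r]$. Further, Xu and Yang [18] showed that Equations (3.5)-(3.7) could be rewritten as

$$\hat{\beta}_{PCR} = T_rT_r'\hat{\beta}, \qquad \hat{\beta}_{rk} = T_rT_r'\hat{\beta}_{RE}, \qquad \hat{\beta}_{rd} = T_rT_r'\hat{\beta}_{LE}.$$

In the case of misspecification, the RE, AURE, LE, AULE, PCRE, r-k class estimator and r-d class estimator for the model (2.4) can be written as [...], respectively. Each of these is a linear function of the OLSE $\hat{\gamma}$ of model (2.4), so they can be represented in the generalized form

$$\hat{\gamma}_{(j)} = G_{(j)}\hat{\gamma},$$

where $G_{(j)}$ is the matrix corresponding to the $j$th estimator. The expectation vector, bias vector, dispersion matrix and the mean square error matrix can be calculated with

$$E(\hat{\gamma}_{(j)}) = G_{(j)}E(\hat{\gamma}), \tag{3.19}$$

$$B(\hat{\gamma}_{(j)}) = E(\hat{\gamma}_{(j)}) - \gamma, \tag{3.20}$$

$$D(\hat{\gamma}_{(j)}) = G_{(j)}D(\hat{\gamma})G_{(j)}', \tag{3.21}$$

$$MSEM(\hat{\gamma}_{(j)}) = D(\hat{\gamma}_{(j)}) + B(\hat{\gamma}_{(j)})B(\hat{\gamma}_{(j)})'.$$

Based on (3.19) to (3.21), the respective expectation vector, bias vector and dispersion matrix of the RE, AURE, LE, AULE, PCRE, r-k class estimator and r-d class estimator can easily be obtained and are given in Table A1 in the Appendix.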
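In the canonical form, where the cross-product matrix of the regressors is diagonal with the eigenvalues on its diagonal, each of the seven estimators amounts to multiplying every component of the OLS estimate by a scalar factor. The sketch below computes those factors from the standard textbook definitions of the estimators. The function names are hypothetical, and the code assumes that the eigenvalues are sorted so that the first `r` components are the ones retained by the principal-component step; neither is taken from the paper.

```python
def shrinkage_factors(eigenvalues, k, d, r):
    """Per-component multipliers g_i such that estimate_i = g_i * ols_i.

    Assumes the canonical form: Z'Z is diagonal with entries `eigenvalues`,
    sorted so that the first r components are kept by the PCR step.
    k > 0 is the ridge parameter and 0 < d < 1 the Liu parameter.
    """
    ridge = [lam / (lam + k) for lam in eigenvalues]
    liu = [(lam + d) / (lam + 1.0) for lam in eigenvalues]
    # Almost unbiased versions: I - k^2 (L + kI)^-2 and I - (1-d)^2 (L + I)^-2.
    aure = [1.0 - (k / (lam + k)) ** 2 for lam in eigenvalues]
    aule = [1.0 - ((1.0 - d) / (lam + 1.0)) ** 2 for lam in eigenvalues]
    # Principal-component truncation: keep the first r components only.
    keep = [1.0 if i < r else 0.0 for i in range(len(eigenvalues))]
    return {
        "OLSE": [1.0] * len(eigenvalues),
        "RE": ridge,
        "AURE": aure,
        "LE": liu,
        "AULE": aule,
        "PCRE": keep,
        "r-k": [s * g for s, g in zip(keep, ridge)],
        "r-d": [s * g for s, g in zip(keep, liu)],
    }


def apply_factors(factors, gamma_ols):
    """Multiply each component of the canonical OLS estimate by its factor."""
    return [g * c for g, c in zip(factors, gamma_ols)]


# Illustrative values only (not from the paper's data).
factors = shrinkage_factors([4.0, 1.0], k=1.0, d=0.5, r=1)
ridge_estimate = apply_factors(factors["RE"], [2.0, -1.0])
```

Because every factor for RE and LE lies strictly between 0 and 1, these estimators shrink each component towards zero, and the shrinkage is strongest for the components with the smallest eigenvalues.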
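By using the approach of Kadiyala [7] and Equations (2.3) and (2.4), the generalized prediction function can be defined as follows:

$$y_0 = Z\gamma + \delta, \qquad \hat{y}_{(j)} = Z\hat{\gamma}_{(j)},$$

where $Z$ and $\delta$ are as in the canonical model (2.3), $y_0$ is the actual value and $\hat{y}_{(j)}$ is the corresponding predictor value.

The MSEM of the generalized predictor is given by

$$MSEM(\hat{y}_{(j)}) = E\left[(\hat{y}_{(j)} - y_0)(\hat{y}_{(j)} - y_0)'\right] = Z\,MSEM(\hat{\gamma}_{(j)})Z' - Z\,B(\hat{\gamma}_{(j)})\delta' - \delta B(\hat{\gamma}_{(j)})'Z' + \delta\delta'.$$

Let us consider [...]; then the above difference can be written as [...]. The following theorem can be stated for the superiority of $\hat{\gamma}_{(j)}$ over $\hat{\gamma}_{(i)}$ with respect to the MSEM criterion.

Theorem 1: [...] $\hat{\gamma}_{(j)}$ is superior to $\hat{\gamma}_{(i)}$ in the MSEM sense when the regression model is misspecified due to excluding relevant variables if and only if [...] $\le 1$.

Proof: [...] Hence, according to Lemma 2 (see Appendix), [...], which completes the proof.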

Scalar Mean Square Error (SMSE) Comparison of Generalized Estimators
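If two generalized biased estimators $\hat{\gamma}_{(i)}$ and $\hat{\gamma}_{(j)}$ are given, the estimator $\hat{\gamma}_{(j)}$ is said to be superior to $\hat{\gamma}_{(i)}$ in the SMSE sense if and only if

$$SMSE(\hat{\gamma}_{(i)}) - SMSE(\hat{\gamma}_{(j)}) \ge 0.$$

The following theorem can be stated for the superiority of $\hat{\gamma}_{(j)}$ over $\hat{\gamma}_{(i)}$ with respect to the SMSE criterion. [...]

Proof: Let us consider [...]. Using (4.1) we can write [...]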
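The SMSE criterion can be illustrated numerically. The sketch below assumes the canonical form with a diagonal generalized matrix, so each estimator is described by a list of per-component multipliers `g`. The SMSE is then the sum of the variances plus the sum of the squared biases. The argument `shift` is a hypothetical name for the part of the expected OLS estimate caused by the omitted regressors; all numbers are illustrative and not taken from the paper.

```python
def smse(g, eigenvalues, gamma, shift, sigma2):
    """Scalar MSE of the estimator g_i * ols_i in canonical form.

    variance_i = sigma2 * g_i^2 / lambda_i
    bias_i     = (g_i - 1) * gamma_i + g_i * shift_i
    where shift_i is the misspecification term in E(ols_i) - gamma_i.
    """
    variance = sum(sigma2 * gi * gi / lam for gi, lam in zip(g, eigenvalues))
    bias = [(gi - 1.0) * c + gi * a for gi, c, a in zip(g, gamma, shift)]
    return variance + sum(b * b for b in bias)


def is_superior(g_j, g_i, eigenvalues, gamma, shift, sigma2):
    """True when estimator j is at least as good as estimator i in SMSE."""
    smse_i = smse(g_i, eigenvalues, gamma, shift, sigma2)
    smse_j = smse(g_j, eigenvalues, gamma, shift, sigma2)
    return smse_i - smse_j >= 0.0


# Illustrative comparison of a ridge-type estimator (k = 1) with OLS.
eigenvalues = [4.0, 1.0]
gamma = [1.0, 1.0]
shift = [0.5, -0.5]          # assumed effect of the omitted regressors
ols = [1.0, 1.0]
ridge = [lam / (lam + 1.0) for lam in eigenvalues]
ridge_beats_ols = is_superior(ridge, ols, eigenvalues, gamma, shift, 1.0)
```

Which estimator wins depends on the shrinkage parameter, the eigenvalues and the size of the misspecification term, which is why superiority conditions are stated as inequalities rather than as a fixed ranking.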

Mean Square Error Matrix (MSEM) Comparison of Generalized Predictors
The following theorem can be stated for the superiority of $\hat{y}_{(j)}$ over $\hat{y}_{(i)}$ with respect to the MSEM criterion: [...], where $\mathcal{R}(A)$ stands for the column space of $A$ and $A^{-}$ is an independent choice of g-inverse of $A$.

Proof: Using (4.1), the MSEM difference of the two generalized predictors can be written as [...]. After some straightforward calculation, equation (5.1) can be written as [...], where $\mathcal{R}(A)$ stands for the column space of $A$ and $A^{-}$ is an independent choice of g-inverse of $A$, which completes the proof.
Note that the conditions derived under Theorem 1 are obviously sufficient for $A \ge 0$. Consequently, we may say that there are situations where $\hat{y}_{(j)}$ is superior to $\hat{y}_{(i)}$ in the MSEM sense.

Scalar Mean Square Error (SMSE) Comparison of Generalized Predictors
Using (4.2), the SMSE difference of the two generalized predictors can be written as [...]. The following theorem can be stated for the superiority of [...], which completes the proof.

Here $X_3$ is the percentage spent by West Germany and $X_4$ that spent by Japan. The data were discussed in Gruber [19] and have been analysed by Akdeniz and Erol [20] and Li and Yang [21], among others. Now we assemble the data as follows:

Numerical Example
Note that the eigenvalues of the [...]. For the standardized data (since there are ten observations and four parameters), we obtain [...]. From Table 1, it can be observed that the minimum SMSE of the estimators depends on the values of the shrinkage parameters and the level of misspecification, which agrees with our theoretical findings.

Open Journal of Statistics

The Monte Carlo simulation was carried out at different levels of misspecification using R 3.2.5. Following McDonald and Galarneau [22], we can get explanatory variables as follows:

$$x_{ij} = (1 - \rho^2)^{1/2} z_{ij} + \rho z_{i,m+1}; \quad i = 1, 2, \ldots, n, \; j = 1, 2, \ldots, m,$$

where $z_{ij}$ is an independent standard normal pseudo random number, $m$ is the number of explanatory variables, and $\rho$ is specified so that the theoretical correlation between any two explanatory variables is given by $\rho^2$. A dependent variable is generated by using the equation

$$y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_m x_{im} + \varepsilon_i; \quad i = 1, 2, \ldots, n,$$

where $\varepsilon_i$ is a normal pseudo random number with mean zero and variance one.
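The data-generating scheme of McDonald and Galarneau can be sketched as follows. The paper's simulation was run in R; this is an independent Python illustration using only the standard library, with hypothetical function names and an arbitrary coefficient vector chosen purely for demonstration. Each regressor mixes its own standard normal draw with one draw shared by all regressors in the same row, which is what produces a pairwise correlation of $\rho^2$.

```python
import math
import random


def generate_regressors(n, n_vars, rho, rng):
    """x_ij = sqrt(1 - rho^2) * z_ij + rho * z_{i, n_vars + 1}."""
    scale = math.sqrt(1.0 - rho * rho)
    rows = []
    for _ in range(n):
        # n_vars private draws plus one draw shared across the row.
        z = [rng.gauss(0.0, 1.0) for _ in range(n_vars + 1)]
        rows.append([scale * z[j] + rho * z[n_vars] for j in range(n_vars)])
    return rows


def generate_response(x_rows, beta, rng):
    """y_i = sum_j beta_j * x_ij + eps_i with eps_i ~ N(0, 1)."""
    return [
        sum(b * x for b, x in zip(beta, row)) + rng.gauss(0.0, 1.0)
        for row in x_rows
    ]


def sample_correlation(u, v):
    """Pearson correlation of two equally long sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


rng = random.Random(2017)                 # fixed seed for reproducibility
X = generate_regressors(50, 4, 0.9, rng)  # n = 50, theoretical correlation 0.81
beta = [0.5, 0.5, 0.5, 0.5]               # hypothetical coefficients
y = generate_response(X, beta, rng)
```

To study misspecification with this design, one would fit the model on only some of the generated columns and treat the remaining columns as the omitted regressors.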
In this study, we choose [...]. From Tables 3-8, we can summarise the results as shown in Table 9.

Conclusions
In this study, a common form of superiority conditions was obtained for comparison among the biased estimators (RE, AURE, LE, AULE, PCRE, r-k class estimator and r-d class estimator) and their predictors by using a generalized form for the misspecified linear regression model when the explanatory variables are multicollinear. Furthermore, the theoretical findings were illustrated by using a numerical example and a Monte Carlo simulation study.
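The simulation study shows that the LE and RE outperform the other estimators when weak multicollinearity exists, and that the RE, r-k class and r-d class estimators outperform the other estimators when moderate and high multicollinearity exist for certain values of the shrinkage parameters, respectively.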

Based on Theorem 1, Theorem 2, Theorem 3 and Theorem 4, we can obtain the corresponding results for each of the biased estimators and their respective predictors by plugging in the values for [...]. The results are summarized in Tables A2-A6 in the Appendix.

To illustrate our theoretical results, we consider a dataset which gives Total National Research and Development Expenditures as a Percent of Gross National Product by Country: 1972-1986. It represents the relationship between the dependent variable $Y$, the percentage spent by the United States, and the four other independent variables $X_1$, $X_2$, $X_3$ and $X_4$. The variable $X_1$ represents the percentage spent by the former Soviet Union and $X_2$ that spent by France.

[...] when $n = 50$, where $l$ denotes the number of variables in the model and $p$ denotes the number of misspecified variables. For simplicity, we select values of $k$ and $d$ in the range $(0, 1)$. The simulation is repeated 2000 times by generating new pseudo random numbers, and the simulated SMSE values of the estimators and predictors are obtained using the following equations: [...]

Tables 3-5 show the estimated SMSE values of the estimators for the regression model when [...] for the selected values of the shrinkage parameters ($k$/$d$), respectively. Tables 6-8 show the corresponding estimated SMSE values of the predictors for the regression model, respectively.
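A simulated SMSE is usually computed as the average, over the replications, of the squared distance between the estimate and the true parameter vector. The sketch below assumes that usual Monte Carlo average, since the paper's own equations are not reproduced here. It is demonstrated on a one-parameter toy problem (a shrunken sample mean), not on the paper's regression design; all names and values are hypothetical.

```python
import random


def simulated_smse(estimates, truth):
    """Average squared distance between each replication's estimate and truth."""
    total = 0.0
    for est in estimates:
        total += sum((e - t) ** 2 for e, t in zip(est, truth))
    return total / len(estimates)


def shrunken_mean_replications(mu, n, shrink, replications, rng):
    """Toy problem: estimate mu by shrink * (sample mean of n N(mu, 1) draws)."""
    results = []
    for _ in range(replications):
        sample_mean = sum(rng.gauss(mu, 1.0) for _ in range(n)) / n
        results.append([shrink * sample_mean])
    return results


rng = random.Random(12345)  # fixed seed for reproducibility
draws = shrunken_mean_replications(
    mu=1.0, n=50, shrink=0.9, replications=2000, rng=rng
)
estimated = simulated_smse(draws, [1.0])

# Theoretical value for comparison: variance plus squared bias,
# shrink^2 * (1 / n) + (shrink - 1)^2 * mu^2 = 0.0162 + 0.01 = 0.0262.
theoretical = 0.9 ** 2 * (1.0 / 50) + (0.9 - 1.0) ** 2 * 1.0 ** 2
```

With 2000 replications the simulated value should lie close to the theoretical one. The same averaging applies to predictors, with the predicted and actual values in place of the estimate and the true parameter.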

Mean Square Error Comparisons

Mean Square Error Matrix (MSEM) Comparison of Generalized Estimators

Note that the predictors based on the OLSE, RE, AURE, LE, AULE, PCRE, r-k class estimator and r-d class estimator are denoted by [...], respectively.

Table 1 shows the estimated SMSE values of the OLSE, RE, AURE, LE, AULE, PCRE, r-k class estimator and r-d class estimator for the regression model when [...].

Table 1. Estimated SMSE values of the estimators.

Table 3. Estimated SMSE values of the estimators when [...]

Table 4. Estimated SMSE values of the estimators when [...]

Table 5. Estimated SMSE values of the estimators when [...]

Table 6. Estimated SMSE values of the predictors when [...]

Table 7. Estimated SMSE values of the predictors when [...]

Table 8. Estimated SMSE values of the predictors when [...]

According to Table 8, it can be observed that $\hat{y}_d$ is superior to the other predictors when [...]