Optimal Generalized Biased Estimator in Linear Regression Model

The paper introduces a new biased estimator, namely the Generalized Optimal Estimator (GOE), for the multiple linear regression model when multicollinearity exists among the predictor variables. The stochastic properties of the proposed estimator are derived, and the proposed estimator is compared with other existing biased estimators based on sample information under the Scalar Mean Square Error (SMSE) criterion by means of a Monte Carlo simulation study and two numerical illustrations.


Introduction
To overcome the multicollinearity problem in the linear regression model, several biased estimators have been defined in place of the Ordinary Least Squares Estimator (OLSE) to estimate the regression coefficients, and their properties have been discussed in the literature. Some of these estimators are based only on sample information, such as the Ridge Estimator (RE) [1], the Almost Unbiased Ridge Estimator (AURE) [2], the Liu Estimator (LE) [3] and the Almost Unbiased Liu Estimator (AULE) [4]. However, for each such estimator the researcher has to derive its properties and compare its superiority over the others when selecting a suitable estimator for a practical situation. Therefore [5] proposed the Generalized Unrestricted Estimator (GURE) β̂_GURE = Aβ̂ to represent the RE, AURE, LE and AULE, where β̂ is the OLSE and A is a positive definite matrix which depends on the corresponding estimator (RE, AURE, LE or AULE).
Researchers are still trying to find better estimators by changing the matrix A and comparing the result with the already proposed estimators based on sample information. Instead of changing A, in this research we introduce a more efficient new biased estimator based on the optimal choice of A.
The rest of the paper is organized as follows. The model specification and estimation are given in Section 2. In Section 3 we propose a new biased estimator, namely the Generalized Optimal Estimator (GOE), and obtain its stochastic properties. In Section 4 we compare the proposed estimator with some existing biased estimators under the Scalar Mean Square Error criterion by using a Monte Carlo simulation and two real data sets. Finally, some concluding remarks are given in Section 5.

Model Specification and Estimation
First we consider the multiple linear regression model

y = Xβ + ε,  (1)

where y is an n × 1 observable random vector, X is an n × p known design matrix of rank p, β is a p × 1 vector of unknown parameters and ε is an n × 1 vector of disturbances. The Ordinary Least Squares Estimators of β and σ² are given by β̂ = S⁻¹X′y and

σ̂² = (y − Xβ̂)′(y − Xβ̂)/(n − p),  (2)

respectively, where S = X′X. The Ridge Estimator (RE), Almost Unbiased Ridge Estimator (AURE), Liu Estimator (LE) and Almost Unbiased Liu Estimator (AULE) are some of the biased estimators proposed to solve the multicollinearity problem which are based only on sample information. The estimators are given below:

RE: β̂_RE(k) = (S + kI)⁻¹X′y, k > 0,  (3)
AURE: β̂_AURE(k) = (I − k²(S + kI)⁻²)β̂, k > 0,  (4)
LE: β̂_LE(d) = (S + I)⁻¹(S + dI)β̂, 0 < d < 1,  (5)
AULE: β̂_AULE(d) = (I − (1 − d)²(S + I)⁻²)β̂, 0 < d < 1.  (6)

Since RE, LE, AURE and AULE are based on the OLSE, [5] proposed a generalized form to represent these four estimators, the Generalized Unrestricted Estimator (GURE), which is given as

β̂_GURE = A_(i)β̂, i = 1, 2, 3, 4,

where A_(i) is a positive definite matrix, with A_(1) = (S + kI)⁻¹S, A_(2) = I − k²(S + kI)⁻², A_(3) = (S + I)⁻¹(S + dI) and A_(4) = I − (1 − d)²(S + I)⁻² yielding the RE, AURE, LE and AULE, respectively. The bias vector, dispersion matrix and mean square error matrix of β̂_GURE are given as

Bias(β̂_GURE) = (A_(i) − I)β,
D(β̂_GURE) = σ²A_(i)S⁻¹A_(i)′, and
MSEM(β̂_GURE) = σ²A_(i)S⁻¹A_(i)′ + (A_(i) − I)ββ′(A_(i) − I)′,  (7)

respectively.
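Since each of the four estimators is simply A_(i)β̂ for a suitable matrix A_(i), they are straightforward to compute once S = X′X is formed. The following minimal sketch is ours, not from the paper; the function names and the default shrinkage values are illustrative only.

```python
import numpy as np

def gure_matrices(X, k=0.5, d=0.5):
    """Transformation matrices A_(i) of the GURE form beta_i = A_(i) @ beta_ols.

    k is the ridge parameter (RE, AURE); d is the Liu parameter (LE, AULE).
    """
    p = X.shape[1]
    S = X.T @ X                        # S = X'X
    I = np.eye(p)
    S_k = np.linalg.inv(S + k * I)     # (S + kI)^(-1)
    S_1 = np.linalg.inv(S + I)         # (S + I)^(-1)
    return {
        "RE":   S_k @ S,                       # A_(1): ridge
        "AURE": I - k**2 * (S_k @ S_k),        # A_(2): almost unbiased ridge
        "LE":   S_1 @ (S + d * I),             # A_(3): Liu
        "AULE": I - (1 - d)**2 * (S_1 @ S_1),  # A_(4): almost unbiased Liu
    }

def gure(X, y, A):
    """GURE estimate A @ beta_ols for a given transformation matrix A."""
    beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
    return A @ beta_ols
```

For k = 0 (or d = 1) every A_(i) reduces to the identity, so each estimator collapses back to the OLSE.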

Instead of changing the matrix A_(i) to introduce a new biased estimator, in this research we obtain the optimal choice of A_(i) by minimizing the mean square error of the GURE, in the scalar sense of the trace of its MSEM, with respect to A_(i).

The Proposed Estimator
From (7), the following equation can be obtained by taking the trace operator:

SMSE(β̂_GURE) = tr(MSEM(β̂_GURE)) = σ² tr(AS⁻¹A′) + β′Bβ,  (8)

where, to ease the notation, we write A for A_(i) and

B = (I − A)′(I − A) = I − A − A′ + A′A.  (10)

By minimizing (8) with respect to A, the optimum A can be obtained from

∂SMSE(β̂_GURE)/∂A = σ² ∂tr(AS⁻¹A′)/∂A + ∂(β′Bβ)/∂A.  (9)

Now we will use the following three results (see [6], p. 521, 522) to obtain the derivatives in (9). Let M and X be any two matrices with proper order. Then

(a) ∂tr(MX)/∂X = M′,
(b) ∂tr(MX′)/∂X = M,
(c) ∂tr(X′XM)/∂X = X(M + M′).

Since tr(AS⁻¹A′) = tr(A′AS⁻¹), by applying (c) with M = S⁻¹ we can obtain

σ² ∂tr(AS⁻¹A′)/∂A = 2σ²AS⁻¹.  (11)

Now we consider β′Bβ = tr(Bββ′) = tr(ββ′) − tr(Aββ′) − tr(A′ββ′) + tr(A′Aββ′). By using (a), (b) and (c) we can obtain

∂tr(Aββ′)/∂A = ββ′,  (12)
∂tr(A′ββ′)/∂A = ββ′, and  (13)
∂tr(A′Aββ′)/∂A = 2Aββ′,  (14)

respectively. Hence

∂(β′Bβ)/∂A = 2Aββ′ − 2ββ′.  (15)

Substituting (11) and (15) into (9), we can derive

∂SMSE(β̂_GURE)/∂A = 2σ²AS⁻¹ + 2Aββ′ − 2ββ′ = 2A(σ²S⁻¹ + ββ′) − 2ββ′.  (16)

Equating (16) to a null matrix, we can obtain the optimal matrix

A_opt = ββ′(σ²S⁻¹ + ββ′)⁻¹.  (17)

Note that σ²S⁻¹ + ββ′ is positive definite, so its inverse exists (see [6], p. 494). Now we are ready to propose a biased estimator, namely the Generalized Optimal Estimator (GOE), as

β̂_GOE = A_opt β̂ = ββ′(σ²S⁻¹ + ββ′)⁻¹β̂.  (18)

The bias vector, dispersion matrix, mean square error matrix and scalar mean square error of β̂_GOE can be obtained as

Bias(β̂_GOE) = (A_opt − I)β,  (19)
D(β̂_GOE) = σ²A_opt S⁻¹A_opt′,  (20)
MSEM(β̂_GOE) = σ²A_opt S⁻¹A_opt′ + (A_opt − I)ββ′(A_opt − I)′, and
SMSE(β̂_GOE) = tr(MSEM(β̂_GOE)),

respectively.
Note that P = ββ′ is symmetric, and by the Sherman-Morrison formula (see [6], p. 494) the inverse in A_opt = ββ′(σ²S⁻¹ + ββ′)⁻¹ can be evaluated explicitly. Now we can write the optimal matrix as

A_opt = ββ′S/(σ² + β′Sβ) = PS/(σ² + β′Sβ).

One can further check that R = S^(1/2)PS^(1/2)/(β′Sβ) satisfies R′ = R and R² = R; therefore R is a symmetric and idempotent matrix, and A_opt = (β′Sβ/(σ² + β′Sβ))S^(−1/2)RS^(1/2). Now the bias vector, dispersion matrix, mean square error matrix and scalar mean square error of β̂_GOE can be rewritten as

Bias(β̂_GOE) = −σ²β/(σ² + β′Sβ),
D(β̂_GOE) = σ²(β′Sβ)ββ′/(σ² + β′Sβ)²,
MSEM(β̂_GOE) = σ²ββ′/(σ² + β′Sβ), and
SMSE(β̂_GOE) = σ²β′β/(σ² + β′Sβ),  (21)

respectively.
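As an independent numerical check of the derivation, the following sketch (ours, not part of the paper) evaluates the SMSE of a GURE directly from the trace of (7) and confirms that the matrix ββ′(σ²S⁻¹ + ββ′)⁻¹ attains the value σ²β′β/(σ² + β′Sβ) and cannot be improved by small perturbations.

```python
import numpy as np

def smse_gure(A, S, beta, sigma2):
    """SMSE of a GURE with matrix A, i.e. the trace of (7):
    sigma^2 tr(A S^{-1} A') + beta'(A - I)'(A - I)beta."""
    S_inv = np.linalg.inv(S)
    bias = (A - np.eye(len(beta))) @ beta
    return sigma2 * np.trace(A @ S_inv @ A.T) + bias @ bias

def optimal_A(S, beta, sigma2):
    """Optimal matrix b b'(sigma^2 S^{-1} + b b')^{-1} from the derivation."""
    P = np.outer(beta, beta)
    return P @ np.linalg.inv(sigma2 * np.linalg.inv(S) + P)
```

Because the SMSE in (8) is a convex quadratic in A, the stationary point obtained by equating (16) to a null matrix is the global minimum.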
For practical situations, the unknown parameters β and σ² have to be replaced by suitable estimates. For β, the OLSE, RE, LE, AURE or AULE can be used. For σ², we can use either (2) or the estimator obtained by replacing β̂ in (2) by the RE, LE, AURE or AULE accordingly. In the next section we discuss the superiority of the resulting estimators by using a simulation study, and we then use numerical examples for further illustration.
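A feasible version of the estimator with OLSE plug-ins (the variant referred to as GOE-OLSE below) can be sketched as follows; the function name is ours, and we use the identity Sβ̂ = X′y to avoid an explicit matrix inverse.

```python
import numpy as np

def goe_olse(X, y):
    """GOE with OLSE plug-ins.

    beta_GOE = A_opt @ b with A_opt = b b'S / (sigma2 + b'Sb); since
    S b = X'y for the OLSE b, this equals b b' X'y / (sigma2 + b'Sb).
    """
    n, p = X.shape
    S = X.T @ X
    b = np.linalg.solve(S, X.T @ y)        # OLSE of beta
    resid = y - X @ b
    sigma2 = resid @ resid / (n - p)       # estimate (2) of sigma^2
    return np.outer(b, b) @ (X.T @ y) / (sigma2 + b @ S @ b)
```

Replacing b (and the corresponding estimate of σ²) by the RE, LE, AURE or AULE in the same way gives GOE-RE, GOE-LE, GOE-AURE and GOE-AULE.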

Monte Carlo Simulation
To study the behavior of our proposed estimator, we perform a Monte Carlo simulation study by considering different levels of multicollinearity. Following [7], we generate the explanatory variables as

x_ij = (1 − γ²)^(1/2) z_ij + γ z_i,p+1, i = 1, 2, …, n, j = 1, 2, …, p,

where the z_ij are independent standard normal pseudo-random numbers, and γ is specified so that the theoretical correlation between any two explanatory variables is given by γ². The dependent variable is generated by using the equation

y_i = β₁x_i1 + β₂x_i2 + ⋯ + β_p x_ip + ε_i, i = 1, 2, …, n,

where ε_i is a normal pseudo-random number with mean zero and variance σ_i². [8] have noted that if the MSE is a function of σ² and β, and if the explanatory variables are fixed, then, subject to the constraint β′β = 1, the MSE is minimized when β is the normalized eigenvector corresponding to the largest eigenvalue of the X′X matrix. In this study we choose the normalized eigenvector corresponding to the largest eigenvalue of X′X as the coefficient vector β, with n = 50, p = 4 and σ_i² = 1. Four different levels of correlation are considered by selecting γ = 0.8, 0.9, 0.99 and 0.999. Table 1 is obtained from estimated SMSE values computed by using Equations (7) and (21) for different shrinkage parameter d or k values selected from the interval (0, 1). The SMSE values of GOE-OLSE, GOE-RE, GOE-LE, GOE-AURE and GOE-AULE are obtained by substituting the OLSE, RE, LE, AURE and AULE, respectively, for β (together with the corresponding estimate of σ²) in Equation (21).
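The data-generating scheme described above can be sketched as follows (our code; the estimator-comparison loop and the SMSE bookkeeping for Table 1 are omitted).

```python
import numpy as np

def make_collinear_X(n, p, gamma, rng):
    """McDonald-Galarneau design: x_ij = sqrt(1 - gamma^2) z_ij + gamma z_{i,p+1},
    so any two columns have theoretical correlation gamma^2."""
    Z = rng.standard_normal((n, p + 1))
    return np.sqrt(1.0 - gamma**2) * Z[:, :p] + gamma * Z[:, [p]]

def simulate_once(n=50, p=4, gamma=0.99, sigma=1.0, rng=None):
    """One replicate: beta is the normalized eigenvector of X'X belonging to
    the largest eigenvalue (minimizing the MSE subject to beta'beta = 1)."""
    rng = rng or np.random.default_rng()
    X = make_collinear_X(n, p, gamma, rng)
    vals, vecs = np.linalg.eigh(X.T @ X)   # eigenvalues in ascending order
    beta = vecs[:, -1]                     # eigenvector of largest eigenvalue
    y = X @ beta + sigma * rng.standard_normal(n)
    return X, y, beta
```

With γ = 0.999 the pairwise correlations are about 0.998, reproducing the severe-multicollinearity setting of the study.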

Numerical Example
To further illustrate the behavior of our proposed estimator, we consider two data sets. First we consider the data set on Portland cement originally due to [9]. This data set has since been widely used by many researchers, such as [10]-[13]. The data came from an experimental investigation of the heat evolved during the setting and hardening of Portland cements of varied composition, and of the dependence of this heat on the percentages of four compounds in the clinkers from which the cement was produced; the four compounds considered by [9] serve as the explanatory variables. For this data set we obtain the following results: a) the eigenvalues of X′X: 105, 810, 5965 and 44663; b) the condition number = 20.58464. Table 2 can be obtained from estimated SMSE values computed by using Equations (7) and (21) for different shrinkage parameter d or k values selected from the interval (0, 1). The SMSE values of GOE-OLSE, GOE-RE, GOE-LE, GOE-AURE and GOE-AULE are obtained by substituting the OLSE, RE, LE, AURE and AULE, respectively, for β (together with the corresponding estimate of σ²) in Equation (21).
From Table 2 we can notice that GOE-OLSE, GOE-RE, GOE-LE, GOE-AURE and GOE-AULE have the smallest scalar mean square error values compared with the RE, LE, AURE and AULE. Therefore we can suggest the GOE to estimate the regression coefficients. When k is large, GOE-OLSE has a smaller SMSE than GOE-RE; when d is small, GOE-OLSE has a smaller SMSE than GOE-LE. Now we consider the second data set, on Total National Research and Development Expenditures as a Percent of Gross National Product, originally due to [14] and later considered by [15]-[17].
The four columns of the 10 × 4 matrix X comprise the data on x₁, x₂, x₃ and x₄, respectively, and y is the response variable.
For this particular data set, we obtain the following results: a) the eigenvalues of X′X: 302.9626, 0.7283 and 0.00345; b) the condition number = 93.68. Table 3 can also be obtained from estimated SMSE values computed by using Equations (7) and (21) for different shrinkage parameter d or k values selected from the interval (0, 1). The SMSE values of GOE-OLSE, GOE-RE, GOE-LE, GOE-AURE and GOE-AULE are obtained by substituting the OLSE, RE, LE, AURE and AULE, respectively, for β (together with the corresponding estimate of σ²) in Equation (21).
From Table 3, we can say that GOE-OLSE is superior to the RE, LE, AURE and AULE, as well as to GOE-RE, GOE-LE, GOE-AURE and GOE-AULE.

Conclusions
In this paper we proposed a new biased estimator, namely the Generalized Optimal Estimator (GOE), for the multiple linear regression model when multicollinearity exists among the independent variables. The proposed estimator is superior to the biased estimators which are based on sample information and takes the form β̂_GOE = ββ′(σ²S⁻¹ + ββ′)⁻¹β̂. Based on Tables 1-3, it can be concluded that the proposed estimator has the smallest scalar mean square error values compared with the RE, LE, AURE and AULE. We can also suggest that GOE-OLSE is the best estimator compared with GOE-RE, GOE-LE, GOE-AURE and GOE-AULE.
