Variable Selection via Biased Estimators in the Linear Regression Model

The Least Absolute Shrinkage and Selection Operator (LASSO) is used for variable selection and for handling the multicollinearity problem simultaneously in the linear regression model. However, LASSO produces estimates with high variance when the number of predictors exceeds the number of observations and when high multicollinearity exists among the predictor variables. To handle this problem, the Elastic Net (ENet) estimator was introduced by combining the LASSO and Ridge Estimator (RE) penalties. The solutions of LASSO and ENet are obtained using the Least Angle Regression (LARS) and LARS-EN algorithms, respectively. In this article, we propose an alternative algorithm that overcomes these issues by combining LASSO with other existing biased estimators, namely the Almost Unbiased Ridge Estimator (AURE), Liu Estimator (LE), Almost Unbiased Liu Estimator (AULE), Principal Component Regression Estimator (PCRE), r-k class estimator and r-d class estimator. Further, we examine the performance of the proposed algorithms using a Monte Carlo simulation study and real-world examples. The results show that the LARS-rk and LARS-rd algorithms, which combine LASSO with the r-k class and r-d class estimators, outperform the other algorithms under moderate and severe multicollinearity.


Introduction
Due to their simplicity and interpretability, linear regression models play a significant role in modern statistical methods. The linear regression model aims to capture the linear relationship between the dependent variable and the non-stochastic explanatory variables for prediction purposes.
Let us consider the linear regression model

$$y = X\beta + \varepsilon, \qquad (1)$$

where $y$ is the $n \times 1$ vector of observations on the dependent variable, $X$ is the $n \times p$ matrix of observations on the non-stochastic predictor variables, $\beta$ is the $p \times 1$ vector of unknown coefficients, and $\varepsilon$ is the $n \times 1$ vector of random error terms, which are independent and identically normally distributed with mean zero and common variance $\sigma^2$, that is $\varepsilon \sim N(0, \sigma^2 I)$. It is well known that the Ordinary Least Squares Estimator (OLSE) is the Best Linear Unbiased Estimator (BLUE) of the unknown parameter vector in model (1); it is obtained by minimizing the Error Sum of Squares (ESS) with respect to $\beta$ and is defined as

$$\hat{\beta}_{OLSE} = (X'X)^{-1}X'y.$$

However, the OLSE is unstable and produces parameter estimates with high variance when multicollinearity exists in $X$. As a curative action for the multicollinearity problem, biased estimators have been used by many researchers. The following biased estimators are popular in the statistical literature:

Principal Component Regression Estimator (PCRE) [1]:
$$\hat{\beta}_{PCRE} = T_r (T_r' X'X T_r)^{-1} T_r' X'y,$$
where $T_r$ is the $p \times r$ matrix whose columns are the eigenvectors of $X'X$ corresponding to its $r$ largest eigenvalues.

Ridge Estimator (RE) [2]:
$$\hat{\beta}_{RE} = (X'X + kI)^{-1} X'y,$$
where $k > 0$ is the regularization parameter, and $I$ is the $p \times p$ identity matrix.

r-k class estimator [3]:
$$\hat{\beta}_{rk} = T_r (T_r' X'X T_r + kI_r)^{-1} T_r' X'y.$$
Note that the r-k class estimator is a combination of PCRE and RE.

Almost Unbiased Ridge Estimator (AURE) [4]:
$$\hat{\beta}_{AURE} = \left(I - k^2 (X'X + kI)^{-2}\right) \hat{\beta}_{OLSE}.$$

Liu Estimator (LE) [5]:
$$\hat{\beta}_{LE} = (X'X + I)^{-1} (X'X + dI) \hat{\beta}_{OLSE},$$
where $0 < d < 1$ is the regularization parameter.

Almost Unbiased Liu Estimator (AULE) [6]:
$$\hat{\beta}_{AULE} = \left(I - (1 - d)^2 (X'X + I)^{-2}\right) \hat{\beta}_{OLSE}.$$

r-d class estimator [7]:
$$\hat{\beta}_{rd} = T_r (T_r' X'X T_r + I_r)^{-1} (T_r' X'X T_r + dI_r) (T_r' X'X T_r)^{-1} T_r' X'y.$$
Note that the r-d class estimator is a combination of PCRE and LE.

According to Kayanan and Wijekoon [8], the estimators RE, AURE, LE, AULE, PCRE, r-k class and r-d class can be represented in the generalized form

$$\hat{\beta}_{G} = G \hat{\beta}_{OLSE},$$

where $G$ is the matrix corresponding to the respective estimator. In recent studies, Kayanan and Wijekoon [8] have shown that the r-k class and r-d class estimators outperform the other estimators for a selected range of regularization parameter values when multicollinearity exists among the predictor variables. However, biased estimators introduce heavy bias when the number of predictor variables is high, and the final model may contain some irrelevant predictor variables as well. To handle this issue, Tibshirani [9] proposed the Least Absolute Shrinkage and Selection Operator (LASSO) as

$$\hat{\beta}_{LASSO} = \arg\min_{\beta} (y - X\beta)'(y - X\beta) \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le t,$$

where $t \ge 0$ is a tuning parameter. Note that we cannot find an analytic solution for LASSO, since $\sum_{j=1}^{p} |\beta_j|$ is a non-differentiable function. Tibshirani [9] and Fu [10] used the standard quadratic programming technique and the shooting algorithm, respectively, to find solutions for LASSO. Apart from these two methods, the Least Angle Regression (LARS) algorithm proposed by Efron et al. [11] is a popular method in the recent literature for finding LASSO solutions. LASSO handles both the multicollinearity problem and variable selection simultaneously in the linear regression model. However, LASSO fails to outperform RE when high multicollinearity exists among the predictors, and it is unstable when the number of predictors is higher than the number of observations [12]. To overcome this problem, Zou and Hastie [12] introduced the Elastic Net (ENet) estimator, which combines the LASSO and RE penalties. The LARS-EN algorithm, a modified version of the LARS algorithm, has been used to obtain solutions for ENet.
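To make these definitions concrete, the following Python sketch (our illustration, not code from the paper; it assumes a standardized design matrix X and centered response y) computes the OLSE and the main biased estimators above.

```python
import numpy as np

def olse(X, y):
    # OLSE: (X'X)^{-1} X'y
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge(X, y, k):
    # RE: (X'X + kI)^{-1} X'y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def liu(X, y, d):
    # LE: (X'X + I)^{-1} (X'X + dI) beta_OLSE
    p = X.shape[1]
    S = X.T @ X
    return np.linalg.solve(S + np.eye(p), (S + d * np.eye(p)) @ olse(X, y))

def top_r_eigvecs(X, r):
    # Eigenvectors of X'X for the r largest eigenvalues (the matrix T_r)
    eigvals, eigvecs = np.linalg.eigh(X.T @ X)
    return eigvecs[:, np.argsort(eigvals)[::-1][:r]]

def pcre(X, y, r):
    # PCRE: T_r (T_r' X'X T_r)^{-1} T_r' X'y
    T = top_r_eigvecs(X, r)
    return T @ np.linalg.solve(T.T @ X.T @ X @ T, T.T @ X.T @ y)

def rk_class(X, y, r, k):
    # r-k class: PCRE combined with ridge shrinkage
    T = top_r_eigvecs(X, r)
    return T @ np.linalg.solve(T.T @ X.T @ X @ T + k * np.eye(r), T.T @ X.T @ y)

def rd_class(X, y, r, d):
    # r-d class: PCRE combined with Liu shrinkage
    T = top_r_eigvecs(X, r)
    A = T.T @ X.T @ X @ T
    alpha = np.linalg.solve(A, T.T @ X.T @ y)          # PCR coefficients
    return T @ np.linalg.solve(A + np.eye(r), (A + d * np.eye(r)) @ alpha)
```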
In this work, we propose a generalized version of the LARS algorithm that combines LASSO with other biased estimators, namely AURE, LE, AULE, PCRE, and the r-k class and r-d class estimators. Further, we compare the prediction performance of the proposed algorithm with the existing LASSO and ENet algorithms using a Monte Carlo simulation study and real-world examples. The structure of the rest of the article is as follows: Section 2 contains the proposed algorithm, Section 3 presents the comparison of the proposed algorithms, Section 4 concludes the article, and references are provided at the end of the paper.

Generalized Least Angle Regression (GLARS) Algorithm
Based on Efron et al. [11] and Hettigoda [13], we now propose the GLARS algorithm as follows:
Step 1: Standardize the predictor variables $X$ to have mean zero and standard deviation one, and center the response variable $y$ to have mean zero.
Step 2: Start with the initial estimate $\hat{\beta}_0 = 0$ and the residual $r_0 = y$.
Step 3: Find the predictor $X_{j_1}$ most correlated with $r_0$, that is, $X_{j_1} = \arg\max_{X_j} |X_j' r_0|$. Then increase the estimate of $\beta_{j_1}$ in the direction of the sign of its correlation with the residual until some other predictor $X_{j_2}$ has as much correlation with the current residual as $X_{j_1}$. In a similar way, the $i$-th variable $X_{j_i}$ eventually earns its way into the active set, and then GLARS proceeds in the equiangular direction between $X_{j_1}, \ldots, X_{j_i}$. Continue adding variables to the active set in this way, moving in the direction defined by the least angle. In the intermediate steps, the coefficient estimates are updated using the following formula:

$$\hat{\beta}_{i+1} = \hat{\beta}_i + \alpha_i u_i,$$

where $\alpha_i \in [0, 1]$ represents how far the estimate moves in that direction before another variable enters the model and the direction changes again, and $u_i$ is the equiangular vector.
The direction $u_i$ is calculated using the following formula:

$$u_i = E_i (E_i' E_i)^{-1} E_i' r_i$$

for LARS-LASSO, where $E_i$ is the matrix whose columns are the predictors $X_j$ for any $j$ in the active set and $r_i$ is the current residual; for the other estimators, $(E_i' E_i)^{-1}$ is replaced by the corresponding matrix of the chosen biased estimator (see Table 1). Step 4: If a nonzero coefficient crosses zero, drop the corresponding variable from the active set and recompute the direction. Step 5: Repeat Step 3 until $\alpha_i = 1$.
The regularization parameters of the biased estimators are selected by minimizing the Root Mean Square Error (RMSE) criterion, which is described in Section 3. We can use GLARS to combine LASSO with any of the estimators listed in Table 1.
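The following is a minimal Python sketch of the GLARS iteration as we read Steps 1-5 (the function name, the step-length search and the stopping rule are our construction, not the authors' reference implementation). Setting k = 0 gives the ordinary least-squares direction of LARS-LASSO; k > 0 gives the ridge-based variant, and the other estimators of Table 1 are obtained by swapping the bracketed inverse in the direction computation. The drop step (Step 4) is omitted for brevity.

```python
import numpy as np

def glars(X, y, k=0.0, max_steps=None):
    """LARS-type forward selection whose progression direction is
    computed from a biased estimator on the active set (sketch)."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y.copy()                          # Step 2: residual starts at y
    active = []
    max_steps = max_steps or min(n - 1, p)
    for _ in range(max_steps):
        c = X.T @ r                       # current correlations
        if not active:                    # Step 3: most correlated predictor
            active.append(int(np.argmax(np.abs(c))))
        A = np.array(active)
        XA = X[:, A]
        # Direction from the chosen estimator (ridge shown; k = 0 is OLS/LASSO)
        delta = np.linalg.solve(XA.T @ XA + k * np.eye(len(A)), XA.T @ r)
        u = XA @ delta                    # equiangular-type direction
        # Step length: smallest alpha at which an inactive predictor becomes
        # as correlated with the residual as the active set (exact for k = 0)
        C = np.max(np.abs(c[A]))
        a = X.T @ u
        alpha, entering = 1.0, None
        for j in range(p):
            if j in active:
                continue
            for num, den in ((C - c[j], C - a[j]), (C + c[j], C + a[j])):
                if den > 1e-12 and 0 < num / den < alpha:
                    alpha, entering = num / den, j
        beta[A] += alpha * delta          # update the coefficient estimates
        r = y - X @ beta
        if entering is None:              # Step 5: alpha = 1, stop
            break
        active.append(entering)
    return beta
```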

Selection of Regularization Parameter Values
According to Efron et al. [11] and Zou and Hastie [12], the conventional tuning parameter of LARS-LASSO is the shrinkage fraction

$$s = \frac{\sum_{j=1}^{p} |\hat{\beta}_j|}{\max \sum_{j=1}^{p} |\hat{\beta}_j|},$$

which lies in $[0, 1]$ and is typically selected by cross-validation.
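As an illustration of parameter selection, the sketch below (our construction, assuming the glars() function above and a held-out validation split) chooses the biased-estimator parameter k by minimizing validation RMSE over a grid; the LASSO fraction s can be handled the same way.

```python
import numpy as np

def select_k(X_tr, y_tr, X_val, y_val, grid):
    # Return the grid value with the smallest validation RMSE
    best_k, best_rmse = None, np.inf
    for k in grid:
        beta = glars(X_tr, y_tr, k=k)
        rmse = np.sqrt(np.mean((y_val - X_val @ beta) ** 2))
        if rmse < best_rmse:
            best_k, best_rmse = k, rmse
    return best_k, best_rmse

# Example: k_hat, _ = select_k(Xtr, ytr, Xva, yva, np.linspace(0.01, 1.0, 100))
```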

Comparison of Proposed Algorithms
The proposed algorithms are compared with the LARS-LASSO and LARS-EN algorithms using the RMSE criterion, which measures the expected prediction error of the algorithms and is defined as

$$\mathrm{RMSE}(\hat{y}) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2},$$

where $\hat{y}_i$ denotes the predicted value of $y_i$.
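Since the tables below report median cross-validated RMSE, a k-fold evaluation along the following lines can reproduce the criterion (a sketch; the fold construction and the fit interface are our assumptions).

```python
import numpy as np

def cv_rmse(X, y, fit, n_folds=10, seed=1):
    # `fit(X_train, y_train) -> beta`; returns the median RMSE over folds
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    scores = []
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        beta = fit(X[train], y[train])
        resid = y[fold] - X[fold] @ beta
        scores.append(np.sqrt(np.mean(resid ** 2)))
    return np.median(scores)

# Example: cv_rmse(X, y, lambda Xt, yt: glars(Xt, yt, k=0.5))
```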
The explanatory variables are generated using the widely used formula

$$x_{i,j} = (1 - \rho^2)^{1/2} z_{i,j} + \rho z_{i,p+1}, \quad i = 1, \ldots, n, \quad j = 1, \ldots, p,$$

where $z_{i,j}$ are independent standard normal pseudo-random numbers, and $\rho$ is the theoretical correlation between any two explanatory variables.
In this study, we used a linear regression model with 100 observations and 20 predictors. The dependent variable is generated using the following equation:

$$y_i = x_i' \beta + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2).$$

The cross-validated RMSE values of the algorithms are displayed in Figures 1-3, and the median cross-validated RMSE values are displayed in Tables 2-4. From Figures 1-3 and Tables 2-4, we can observe that the LARS-PCRE, LARS-rk and LARS-rd algorithms show better performance compared to the other algorithms under weak, moderate and high multicollinearity, respectively.
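For reference, a data-generating sketch consistent with the description above (the coefficient vector beta and the noise level are illustrative placeholders, not the paper's exact settings):

```python
import numpy as np

def make_data(n=100, p=20, rho=0.9, sigma=1.0, seed=1):
    # Predictors with pairwise correlation rho, response from model (1)
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, p + 1))
    X = np.sqrt(1.0 - rho ** 2) * z[:, :p] + rho * z[:, [p]]
    beta = np.ones(p)                     # illustrative coefficients
    y = X @ beta + sigma * rng.standard_normal(n)
    return X, y
```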

Real-World Examples
Two real-world examples, namely the Prostate Cancer Data [15] and the UScrime dataset [16], are considered to compare the performance of the proposed algorithms.
Prostate Cancer Data: The predictors are eight clinical measures: log cancer volume (lcavol), log prostate weight (lweight), age, log of the amount of benign prostatic hyperplasia (lbph), seminal vesicle invasion (svi), log capsular penetration (lcp), Gleason score (gleason) and percentage of Gleason scores 4 or 5 (pgg45). The response is the log of prostate-specific antigen (lpsa), and the dataset has 97 observations. The Variance Inflation Factor (VIF) values of the predictor variables are 3.09, 2.97, 2.47, 2.05, 1.95, 1.37, 1.36 and 1.32, and the condition number is 243, which indicates high multicollinearity among the predictor variables. Stamey et al. [15] examined the correlation between the level of prostate-specific antigen and these eight clinical measures. Further, Tibshirani [9], Efron et al. [11] and Zou and Hastie [12] used these data to examine the performance of the LASSO, the LARS algorithm and the ENet estimator. This dataset is available in the "lasso2" R package. We used 67 observations to fit the model and 30 observations to calculate the RMSE. The cross-validated RMSE of the algorithms is displayed in Table 5, and the coefficient paths of each algorithm are displayed in Figure 4.
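The train-test protocol for the Prostate Cancer Data can be sketched as follows (assuming the data have been exported from the "lasso2" R package to a CSV file named prostate.csv with lpsa in the last column; the file name, column layout and the value of k are our assumptions):

```python
import numpy as np

data = np.loadtxt("prostate.csv", delimiter=",", skiprows=1)
X, y = data[:, :-1], data[:, -1]
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize predictors (Step 1)
y = y - y.mean()                            # center the response
X_train, y_train = X[:67], y[:67]           # 67 observations to fit the model
X_test, y_test = X[67:], y[67:]             # 30 observations for the RMSE
beta = glars(X_train, y_train, k=0.1)       # k chosen for illustration only
rmse = np.sqrt(np.mean((y_test - X_test @ beta) ** 2))
print(f"test RMSE: {rmse:.3f}")
```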
From Table 5, we can observe that the LARS-rd algorithm outperforms the other algorithms on the Prostate Cancer Data.
UScrime Data: For this dataset, the cross-validated RMSE of the algorithms is displayed in Table 6, and the coefficient paths of each algorithm are displayed in Figure 5.
From Table 6, we can observe that the LARS-rd algorithm outperforms the other algorithms on the UScrime Data, and Figure 5 allows the variable-selection behaviour of the algorithms to be compared through their coefficient paths.

Conclusions
This study showed that the proposed LARS-rk and LARS-rd algorithms perform well in the high-dimensional linear regression model when moderate and high multicollinearity, respectively, exist among the predictor variables.
The appropriate algorithm for a particular practical problem can be chosen based on the variables of interest and prediction performance by referring to the plot of coefficient paths.