Double-Penalized Quantile Regression in Partially Linear Models

In this paper, we propose double-penalized quantile regression estimators for partially linear models, together with an iterative algorithm for solving the resulting optimization problem. Numerical examples illustrate that the proposed method outperforms the least-squares-based method in finite samples with respect to the non-causal selection rate (NSR) and the median of the model error (MME) when the error distribution is heavy-tailed. Finally, we apply the proposed methodology to analyze the ragweed pollen level dataset.


Introduction
Since semiparametric regression models combine parametric and nonparametric components, they are much more flexible than the linear regression model, while the effect of each covariate remains easier to interpret than in a completely nonparametric regression. Semiparametric regression models are therefore very popular in practical applications. In this paper, we consider the partially linear model Y = X^T β + g(T) + ε, where β is a p-dimensional unknown parameter vector with true value β_0, g(·) is a twice-differentiable unknown smooth function, and ε is a random error independent of X with E(ε) = 0. Since [1] first applied the partially linear model to study the relationship between weather and electricity sales, this model has received a considerable amount of research attention over the past several decades.
In practice, many potential explanatory variables may be involved in such a model, but the number of important ones is usually relatively small. Selecting the important explanatory variables is therefore often one of the main goals of real data analysis. In this paper, we are interested in automatic selection and estimation for the parametric component, treating g(·) as a nuisance effect. Several approaches have been developed in the literature. For example, in the kernel smoothing framework, [2] first extended the penalized least squares criterion to partially linear models. [3] introduced a class of sieve estimators using a penalized least squares technique for semiparametric regression models. [4] studied variable selection for semiparametric regression models. [5] considered variable selection for partially linear models when the covariates are measured with additive errors. [6] combined the ideas of profiling and the adaptive Elastic-Net [7] to select the important variables in X. In the spline smoothing framework, [8] achieved sparsity in the linear part of partially linear models for high-dimensional data using the SCAD penalty [9], with the nonparametric function estimated by polynomial regression splines. [10] applied a shrinkage penalty to the parametric component to identify the significant variables and used a smoothing spline to estimate the nonparametric component.
It is important to note that many of these methods are closely related to the classical least squares method. It is well known that least squares is not robust and can produce large bias when there are outliers in the dataset; outliers can therefore cause serious problems for least-squares-based variable selection. In this article, we propose double-penalized quantile regression estimators. Based on the quantile regression loss function (the check function), we apply a shrinkage penalty to the parametric part to identify the significant variables, and use a smoothing spline to estimate the nonparametric component. Simulation studies illustrate that the proposed method achieves consistent variable selection when there are outliers in the dataset or the error term follows a heavy-tailed distribution.
The rest of this paper is organized as follows. In Section 2, we introduce the double-penalized quantile regression estimators in a partially linear regression model, and then propose an iterative algorithm to solve the resulting optimization problem. In Section 3, simulation studies are conducted to compare the finite-sample performance of the existing and proposed methods. In Section 4, we apply the proposed method to a real dataset. Finally, we conclude with a few remarks in Section 5.

Double-Penalized Quantile Regression Estimators
Suppose that (Y_i, X_i, t_i), i = 1, ..., n, satisfy the following partially linear regression model,
Y_i = X_i^T β + g(t_i) + ε_i.
Without loss of generality, we assume that the t_i are ordered, 0 ≤ t_1 < t_2 < ... < t_n ≤ 1, and that g is smooth enough that the roughness penalty ∫ (g''(t))^2 dt is well defined.
To simultaneously achieve selection of the important variables and estimation of the nonparametric function g(·), [10] proposed double-penalized least squares (DPLS) estimators obtained by minimizing
Σ_{i=1}^n (Y_i − X_i^T β − g(t_i))^2 + λ_1 ∫ (g''(t))^2 dt + n Σ_{j=1}^p p_{λ_2}(|β_j|),
where p_{λ_2}(|β_j|) is nonnegative and nondecreasing in |β_j|. Under some regularity conditions, [10] proved that the proposed estimators can be as efficient as the oracle estimator.
To our knowledge, the ordinary least squares (OLS) estimator is not robust: if there are outliers in the dataset or the error follows a heavy-tailed distribution, it can produce large bias. In contrast to the least squares method, quantile regression, introduced by [11], serves as a robust alternative, since the asymptotic properties of the quantile regression estimator do not depend on the variance of the error. In the following, we introduce double-penalized quantile regression (DPQR) in partially linear models. For 0 < τ < 1, the DPQR estimators are obtained by minimizing the following function,
Σ_{i=1}^n ρ_τ(Y_i − X_i^T β − g(t_i)) + λ_1 ∫ (g''(t))^2 dt + n Σ_{j=1}^p p_{λ_2}(|β_j|),    (4)
where ρ_τ(u) = u(τ − I(u < 0)) is the check function and t_1 < ... < t_n are the order statistics of the random sample {t_i}. Writing G = (g(t_1), ..., g(t_n))^T, by the smoothing spline representation of [12] we have ∫ (g''(t))^2 dt = G^T K G, where K = ∇^T R^{−1} ∇ is an n × n matrix; ∇ is the (n − 2) × n matrix of second differences, with entries ∇_{i,i} = 1/h_i, ∇_{i,i+1} = −(1/h_i + 1/h_{i+1}), ∇_{i,i+2} = 1/h_{i+1} for h_i = t_{i+1} − t_i; and R is a symmetric tridiagonal matrix of order (n − 2) with elements r_{ii} = (h_i + h_{i+1})/3 and r_{i,i+1} = r_{i+1,i} = h_{i+1}/6.

Algorithm
To solve the optimization problem (4), we propose the following iterative algorithm. The estimation procedure is stated as follows:
Step 1. Given Ĝ_n, obtain the estimator β̂_n by minimizing the objective function Σ_{i=1}^n ρ_τ(Y_i − X_i^T β − Ĝ_{n,i}) + n Σ_{j=1}^p p_{λ_2}(|β_j|).
Step 2. Given β̂_n, obtain Ĝ_n by minimizing Σ_{i=1}^n ρ_τ(Y_i − X_i^T β̂_n − G_i) + λ_1 G^T K G over G.
Step 3. Repeat Step 1 and Step 2 until convergence.
Remark 1. In the above algorithm, the initial estimators are obtained by minimizing the double-penalized least squares objective Σ_{i=1}^n (Y_i − X_i^T β − G_i)^2 + λ_1 G^T K G. Given β and λ_1, this yields Ĝ_n = (I_n + λ_1 K)^{−1}(Y − Xβ); plugging Ĝ_n back into the objective gives a profiled least squares problem in β, and the resulting estimator of the nonparametric component is Ĝ_n = (I_n + λ_1 K)^{−1}(Y − X β̂_n).
Remark 2. Since the check function ρ_τ(·) is not smooth, we use the majorization-minimization (MM) algorithm introduced by [13] to carry out the optimization in Step 1 and Step 2.
As advocated in [14], the check function can be approximated, for some small ε > 0, by its perturbed version
ρ_τ^ε(u) = ρ_τ(u) − (ε/2) ln(ε + |u|).
Furthermore, ρ_τ^ε(u) can be majorized at u^(k) by the following surrogate function given in [14],
ζ_τ^ε(u | u^(k)) = (1/4)[ u^2 / (ε + |u^(k)|) + (4τ − 2) u ] + c,
where c is a constant not depending on u. The penalty functions can be approximated by the local quadratic approximation advocated in [15],
p_λ(|β_j|) ≈ p_λ(|β_j^(0)|) + (1/2) { p'_λ(|β_j^(0)|) / |β_j^(0)| } (β_j^2 − (β_j^(0))^2)  for β_j ≈ β_j^(0).
After these approximations, the minimization problems in Step 1 and Step 2 become quadratic and can be solved in closed form. In our implementation, we set ε = 10^{−6}.
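For intuition, here is a minimal sketch (Python/NumPy; function name ours) of the MM iteration of [14] for an unpenalized linear quantile regression. Dropping constants, the surrogate gives each residual the weight 1/(ε + |r_i^(k)|) plus a linear tilt (4τ − 2)u/4, so every MM step reduces to a weighted least squares solve; the penalized Steps 1 and 2 simply add the LQA penalty terms to the same quadratic system.

```python
import numpy as np

def mm_quantile_regression(X, y, tau, eps=1e-6, max_iter=500, tol=1e-10):
    """Minimize sum_i rho_tau(y_i - x_i^T beta) via the MM surrogate:
    at iterate beta^(k), minimize
      (1/4) sum_i [ r_i^2 / (eps + |r_i^(k)|) + (4*tau - 2) * r_i ],
    a weighted least squares problem with a linear tilt."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # least squares start
    for _ in range(max_iter):
        r = y - X @ beta
        w = 1.0 / (eps + np.abs(r))               # surrogate weights
        # Normal equations: X^T W X beta = X^T W y + (2*tau - 1) X^T 1
        A = X.T @ (X * w[:, None])
        b = X.T @ (w * y) + (2.0 * tau - 1.0) * X.sum(axis=0)
        beta_new = np.linalg.solve(A, b)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

As a quick check of the mechanics: with an intercept-only design, the fitted coefficient approaches the sample τ-quantile of y, as it should for the check loss.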

Simulation Study
In this section, we conduct simulation studies to evaluate the finite-sample performance of the proposed estimators. We simulate 100 datasets from model (6) with sample sizes n = 80, 100, 150.
In this simulation, we choose β_0 = (1, 2, 2, 2, 0, 0, 0, 0)^T, the X_i's follow an 8-dimensional standard normal distribution, and the error term follows one of two distributions: the standard normal distribution and the Cauchy distribution. We compare our proposed estimators (DPQR) with the DPLS estimators and the oracle estimator based on quantile regression. To measure the finite-sample performance for the parametric component, we calculate the non-causal selection rate (NSR) [9], the positive selection rate (PSR) [16], and the median of the model error (MME) advocated by [9], where the model error of an estimator β̂ is defined as ME = (β̂ − β_0)^T E(XX^T)(β̂ − β_0). The simulation results are reported in Table 1 and Table 2. From Table 1, we see that all methods attain the same PSR and NSR when the error term follows the standard normal distribution, although the DPLS estimator yields a smaller MME than the DPQR and oracle estimators. In contrast, when the error follows a Cauchy distribution, Table 2 shows that the MME of our proposed method is smaller than that of the DPLS method. In variable selection, the PSR is around 1 for all three methods; what distinguishes DPQR from DPLS is the NSR. Indeed, the NSR of the DPQR estimator is as close to 1 as that of the oracle estimator, while the NSR of the DPLS estimator is only about 30%. This illustrates that our proposed method yields consistent variable selection under heavy-tailed errors.
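The selection criteria can be computed per replication as sketched below (Python/NumPy). The definitions here are our reading of the cited criteria and are stated as assumptions: PSR as the fraction of truly nonzero coefficients that are selected, NSR as the fraction of truly zero coefficients correctly estimated as zero (the convention under which the oracle attains NSR = 1), and ME = (β̂ − β_0)^T Σ (β̂ − β_0) with Σ = E(XX^T); the authoritative definitions are those of [9] and [16].

```python
import numpy as np

def selection_metrics(beta_hat, beta0, Sigma):
    """PSR, NSR, and model error for one simulated fit.
    Definitions assumed (see lead-in); oracle selection gives PSR = NSR = 1."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    beta0 = np.asarray(beta0, dtype=float)
    causal = beta0 != 0
    psr = np.mean(beta_hat[causal] != 0)     # truly nonzero kept
    nsr = np.mean(beta_hat[~causal] == 0)    # truly zero dropped
    d = beta_hat - beta0
    me = d @ Sigma @ d                       # model error
    return psr, nsr, me
```

The MME reported in Tables 1 and 2 would then be the median of me over the 100 replications; since the covariates here are standard normal, Σ = I_8.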

Real Data Application
In this section, we illustrate the proposed double-penalized quantile regression method through an application to the ragweed pollen level data; the estimated regression coefficients are reported in Table 3. The dataset consists of 87 observations and contains the following four variables: ragweed (the daily ragweed pollen level, in grains/m^3), rain x_1 (indicator of significant rain on the following day: 1 = at least 3 hours of steady, or brief but intense, rain; 0 = otherwise), temperature x_2 (temperature of the following day, in degrees Fahrenheit), wind speed x_3 (wind speed forecast for the following day, in knots), and day (day number in the current ragweed pollen season). Ragweed is the response variable, and the rest are the explanatory variables.
The goal is to understand the effect of the explanatory variables on ragweed and to obtain accurate models for predicting it. Following [17], we take y = ragweed. The histogram of y in Figure 1(a) indicates that the response is rather skewed, suggesting that there are outliers in the response or that the error follows a heavy-tailed distribution. In addition, we plot y against day in Figure 1(b), which reveals a strong nonlinear relationship between y and the day number. Consequently, a semiparametric regression model with a nonparametric baseline g(day) is very reasonable. In this paper, we add some quadratic and interaction terms and consider a more complex semiparametric regression model.
In the following, we apply the DPLS and DPQR methods to fit the semiparametric regression model. For the DPQR method, we take τ = 0.5. The results are summarized in Table 3, from which we find that, by the DPQR method, there also exist nonlinear relationships between y and temperature and between y and wind speed.

Discussion
In this paper, we introduced a double-penalized quantile regression method for partially linear models. The merits of the proposed methodology were illustrated via simulation studies and a real data analysis. According to the numerical results, the proposed method achieves consistent variable selection when there are outliers in the dataset or the error follows a heavy-tailed distribution.

Here β is a √n-consistent estimator of β_0; for example, the least squares estimator can be used.

Figure 1. (a) Histogram of y and (b) y against day.

Table 1. Simulation results under normal error.

Table 2. Simulation results under Cauchy error.