A One-Step Variable Selection Procedure for SCAD Penalized Quantile Regression Models
1. Introduction
Analyzing high-dimensional data has become an important task in diverse research fields such as biology, genetics, economics, and finance.¹ Emerging statistical methods that can analyze and interpret large datasets have therefore gained increasing attention. Selecting the covariates that best represent the response variable is a key part of this process, but traditional variable (or component) selection methods such as subset selection, stepwise selection, and backward and forward selection are neither stable nor accurate when the number of covariates is close or equal to the number of observations. These procedures are also not applicable when the number of observations is less than the number of covariates.
Multiple studies on penalized regression methods have addressed these drawbacks. For example, the LASSO (least absolute shrinkage and selection operator) penalty was proposed by Tibshirani (1996), the adaptive LASSO by Zou (2006), the elastic net by Zou and Hastie (2005), and the smoothly clipped absolute deviation (SCAD) penalty by Fan and Li (2001). These methods perform well in terms of prediction accuracy and efficiency. Some of them, notably SCAD and the adaptive LASSO, also have the oracle property,² but the supporting theory is limited to a fixed, relatively small number of covariates (Fan & Li, 2001). In the case of high-dimensional data, however, both classical and penalized techniques face problems of speed, stability, and accuracy (Fan & Lv, 2008).
To overcome these issues, various variable selection methods have been proposed in the context of quantile regression (QR). These include penalized QR approaches using the LASSO penalty, and those using a non-convex penalty such as SCAD; see, e.g., Wang et al. (2012), Sherwood and Wang (2016), Khan et al. (2019) and the references therein. Several further penalized QR approaches have been developed more recently. For instance, Sherwood and Maidman (2022) proposed a method for simultaneous estimation and variable selection in an additive nonlinear QR model in a high-dimensional setting. The R package rqPen of Sherwood et al. (2025) readily solves this problem. Song et al. (2022) consider, as a generalized regression model, the single-index regression model with the SCAD and Laplace error penalty. Additionally, Chen and Lee (2023) addressed sparse linear quantile regression through the so-called
approach, which focuses on selecting a subset of covariates; see, e.g., Mai (2024) for an extensive literature review.
Using the SCAD penalty, the present paper introduces a computationally feasible and efficient approach, called the one-step quantile estimation procedure, for selecting the nonzero QR parameters. The proposed method is similar in spirit to the one-step local linear approximation estimator of Zou and Li (2008). Given ordinary least squares estimates as initial values, these authors obtain a one-step estimator for the nonzero linear regression parameters by maximizing a penalized likelihood function. We use the same idea within the context of a penalized additive QR model.³ However, we replace the usual
-norm objective function by a smooth parametric approximation of this function, via iterative least squares computations. This approach offers an efficient way to obtain conditional quantile estimates in high-dimensions.
The rest of the paper is organized as follows. Section 2 briefly describes the standard QR minimization problem. Section 3 first introduces an initial QR estimator without variable selection. Next, a final QR estimator with variable selection is proposed. Finally, we discuss SCAD consistency in variable selection in this section. In Section 4, we assess the finite-sample properties of the estimator through a Monte Carlo simulation study. This section also contains a comparison with a similar one-step LASSO estimator. Section 5 contains a real data illustration. Section 6 provides some summary remarks.
2. Preliminaries
Let $y_i$ be a response variable which depends on a $p \times 1$ vector of random covariates $x_i = (x_{i1}, \ldots, x_{ip})^\top$ of the $i$th observation, $i = 1, \ldots, n$. In the “traditional”, additive QR framework, interest is focused on the $\tau$th ($0 < \tau < 1$) conditional quantile of $y_i$ given $x_i$, i.e.,
$$Q_\tau(y_i \mid x_i) = x_i^\top \beta(\tau), \qquad (1)$$
where $\beta(\tau)$ is an unknown $p \times 1$ parameter vector, including an intercept, depending on $\tau$. Note that, to include an intercept term, the corresponding covariate must take the value 1 for each observation.
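For a fixed value of $\tau$, the parameter vector $\beta(\tau)$ can be estimated by minimizing the objective function
$$\sum_{i=1}^{n} \rho_\tau\big(y_i - x_i^\top \beta(\tau)\big), \qquad (2)$$
where $\rho_\tau(u) = u\{\tau - I(u < 0)\}$, $u \in \mathbb{R}$, is the quantile check function, and $I(\cdot)$ is the usual indicator function. To simplify the notation from now on, we suppress the argument $\tau$ of $\beta(\tau)$. We also omit the subscript “$\tau$” from $\rho_\tau$ whenever no confusion is caused.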
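As a minimal sketch, the check function and the objective in (2) can be written in a few lines of Python. The function names `check_loss` and `qr_objective`, the list-of-rows layout of the design matrix, and the toy data are illustrative assumptions, not part of the paper.

```python
def check_loss(u, tau):
    """Quantile check function: u * (tau - I(u < 0))."""
    indicator = 1.0 if u < 0 else 0.0
    return u * (tau - indicator)


def qr_objective(beta, X, y, tau):
    """Sum of check losses of the residuals y_i - x_i' beta."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        fitted = sum(b * x for b, x in zip(beta, x_i))
        total += check_loss(y_i - fitted, tau)
    return total


# Intercept-only toy example: the single covariate equals 1 for every row.
X = [[1.0], [1.0], [1.0], [1.0], [1.0]]
y = [1.0, 2.0, 3.0, 4.0, 100.0]

# Objective evaluated at the sample median (3.0) for tau = 0.5.
print(qr_objective([3.0], X, y, tau=0.5))  # 50.5
```

In this toy example the objective is smallest at the sample median, and the outlying value 100 contributes only linearly, which is the source of the robustness of QR.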
Linear programming algorithms can be employed to minimize (2). To describe the basic solution (feasible or not) of the QR minimization problem, we first introduce some additional notation. In particular, let
, and let
denote a
-dimensional index, which identifies proper
observations in the sample of size
. Further, let
denote the
submatrix of
with rows
. Likewise, let
be a
-dimensional vector with coordinates
. Then the basic solution (Abdelmalek, 1974; Koenker & Bassett, 1978) of the QR minimization problem may be expressed as
(3)
Clearly, only the
data points identified by
supply information about the QR estimator of
, although the whole observed sample is typically used to find the “optimal”
. Observe that the design matrix
is partitioned into a critical active dataset, whose cardinality is
, and a non-active complement, yielding
with
a
-dimensional zero vector.
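The idea that only as many data points as there are parameters determine the QR estimate can be illustrated by brute force. The sketch below, for an intercept and one slope, enumerates every pair of observations, fits the line that interpolates the pair exactly, and keeps the pair with the smallest check-loss objective. All names and the toy data are hypothetical, and the enumeration is only feasible for very small samples; it is not the linear programming algorithm used in practice.

```python
from itertools import combinations


def check_loss(u, tau):
    """Quantile check function: u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def objective(beta, X, y, tau):
    """Check-loss objective for a model with intercept and one slope."""
    return sum(check_loss(yi - beta[0] * xi[0] - beta[1] * xi[1], tau)
               for xi, yi in zip(X, y))


def solve_2x2(A, b):
    """Solve A z = b for a 2 x 2 matrix A; return None if A is singular."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if abs(det) < 1e-12:
        return None
    z0 = (b[0] * A[1][1] - A[0][1] * b[1]) / det
    z1 = (A[0][0] * b[1] - A[1][0] * b[0]) / det
    return [z0, z1]


def best_basic_solution(X, y, tau):
    """Enumerate all index pairs h and keep the interpolating fit with the
    smallest objective value. Returns (value, h, beta)."""
    best = None
    for h in combinations(range(len(y)), 2):
        beta = solve_2x2([X[h[0]], X[h[1]]], [y[h[0]], y[h[1]]])
        if beta is None:
            continue
        value = objective(beta, X, y, tau)
        if best is None or value < best[0]:
            best = (value, h, beta)
    return best


# Toy data: roughly linear, with one gross outlier in the last observation.
X = [[1.0, float(k)] for k in range(6)]
y = [1.0, 3.1, 4.9, 7.2, 9.0, 30.0]
value, h, beta = best_basic_solution(X, y, tau=0.5)
print(h, beta, value)
```

The selected fit passes exactly through the two observations indexed by `h`; the remaining observations only influence which pair is chosen.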
3. Description of the QR Estimators
It is well known that conditional QR estimates are severely affected by the so-called curse-of-dimensionality when
is large. To tackle this issue, several authors have proposed penalized QR methods which identify relevant and irrelevant components consistently. In this section, first we discuss an initial estimator for
based on a parametric smooth approximation of the objective function (2). Then we will explain how to use it in variable selection when constructing a one-step sparse conditional quantile estimator.
3.1. An Initial QR Estimator without Variable Selection
Muggeo et al. (2011) proposed an exact path-following algorithm leading to the QR solution (3). For a fixed value of
, they approximate
by an asymmetric “bent arc” expressed as a piecewise polynomial denoted by
, where
is a small but known positive constant. Specifically, let
,
, be a partition of the sample units. Then the approximation
is defined as follows
(4)
where
,
, are weights given by
(5)
and where
,
, are the following indicator functions
Note that we suppressed the dependence of
and
on
. This simplification in notation also holds in the rest of the paper.
The smooth objective function takes the form
(6)
where
, i.e., the cardinality of the set
. Now for minimizing Equation (6), the idea is to define a piecewise linear solution using a decreasing sequence of values, say
, such that
and
. Additionally, let
be a fixed value and
the corresponding minimizer of (6), with
chosen such that
. Given this setup, the set of constant weight indicator functions is given by
. Then the
-dimensional gradient vector of
can be written as follows
(7)
where
,
,
. Here
and
is an
-dimensional vector of ones. Solving
gives the
-dimensional weighted least squares estimator
(8)
where
(9)
For every new value of
, information from the previous solution is used. Then, it is not hard to show that
(10)
where is an estimate of the random error
. Thus if
, then
, with
, is an iteratively weighted least squares (IWLS) estimator of
.
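To make the IWLS idea concrete, the following sketch iterates weighted least squares fits for a model with an intercept and one slope. It does not implement the piecewise polynomial approximation of Muggeo et al. (2011); instead it uses a simpler, commonly used weighting in which each residual receives weight tau or 1 - tau divided by its absolute value, with the absolute value bounded below by a small constant `c`. The function names, the default values, and the toy data are assumptions made for illustration.

```python
def weighted_ls(x, y, w):
    """Weighted least squares fit of y on (1, x); returns (intercept, slope)."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx * swx
    slope = (sw * swxy - swx * swy) / det
    intercept = (swy - slope * swx) / sw
    return intercept, slope


def iwls_quantile(x, y, tau, c=1e-6, n_iter=200):
    """Iteratively reweighted least squares for a linear quantile fit.

    Starts from ordinary least squares and repeatedly refits with weights
    (tau or 1 - tau) / max(|residual|, c)."""
    intercept, slope = weighted_ls(x, y, [1.0] * len(y))
    for _ in range(n_iter):
        weights = []
        for xi, yi in zip(x, y):
            r = yi - intercept - slope * xi
            side = tau if r >= 0 else 1.0 - tau
            weights.append(side / max(abs(r), c))
        intercept, slope = weighted_ls(x, y, weights)
    return intercept, slope


# Toy data: nine points on the line y = 1 + 2x and one gross outlier.
x = [float(k) for k in range(10)]
y = [1.0 + 2.0 * xi for xi in x]
y[-1] = 100.0
print(iwls_quantile(x, y, tau=0.5))
```

With these data the median fit moves from the least squares line, which is pulled towards the outlier, to the line through the nine clean points. A smaller `c` gives a closer approximation to the check loss but produces larger weights.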
Suppose that we have
with sign vector
. The proposed path-following algorithm is based on the following theorem.
Theorem 1. (Muggeo et al., 2011) Let
be a fixed value such that
. Then under the assumption that for any
, ,
, the solution curve evaluated at
is equal to the solution of (2), namely
.
3.2. A Final QR Estimator with Variable Selection
It is well known that conditional QR estimates are severely affected by the so-called curse-of-dimensionality when
is large. To tackle this issue, several authors have proposed dimension reduction methods. The general framework for these methods can be characterized by a set of nonzero coefficients
, and
is the cardinality of
. The set
is unknown and can be estimated by various (non)parametric approaches such as kernel-based smoothing or B-spline basis functions. Suppose that the true parameter vector
is sparse. Hence, we can write
, where
is a
vector and
. Moreover, it is unknown which coefficients are nonzero and which are zero. To this end, one can use a penalty function
with tuning parameter
which depends on
, and where for ease of presentation, we set
.
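In this paper, we consider the SCAD penalty $p_\lambda(\cdot)$, which is defined on $[0, \infty)$ by its first derivative as
$$p'_\lambda(\theta) = \lambda \left\{ I(\theta \le \lambda) + \frac{(a\lambda - \theta)_+}{(a - 1)\lambda} I(\theta > \lambda) \right\}, \qquad (11)$$
where $a > 2$ and $\lambda > 0$ are tuning parameters, and where $(z)_+ = \max(z, 0)$. Note that (11) is symmetric, non-convex on $[0, \infty)$, and singular at the origin. Its derivative vanishes outside $[-a\lambda, a\lambda]$. Following the recommendation of Fan and Li (2001), we use $a = 3.7$ in the numerical applications. The parameter $\lambda$ controls the complexity of the selected model, and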
as
. The function (11) overcomes some of the undesirable properties of the LASSO penalty, where
.
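The SCAD penalty of Fan and Li (2001) and its derivative are simple to code. The sketch below uses the standard closed forms with the default a = 3.7; the function names are illustrative.

```python
def scad_derivative(theta, lam, a=3.7):
    """First derivative of the SCAD penalty, evaluated at |theta|."""
    theta = abs(theta)
    if theta <= lam:
        return lam
    return max(a * lam - theta, 0.0) / (a - 1.0)


def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty: linear near zero, quadratic in between, then constant."""
    theta = abs(theta)
    if theta <= lam:
        return lam * theta
    if theta <= a * lam:
        return (2.0 * a * lam * theta - theta ** 2 - lam ** 2) / (2.0 * (a - 1.0))
    return (a + 1.0) * lam ** 2 / 2.0


for t in (0.5, 2.0, 5.0):
    print(t, scad_penalty(t, lam=1.0), scad_derivative(t, lam=1.0))
```

For small coefficients the derivative equals lambda, exactly as for the LASSO penalty, whereas for coefficients larger than a times lambda it is zero, so large coefficients are not shrunk.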
For a fixed
, let
,
, denote the univariate conditional (marginal) quantile estimator of
obtained through the IWLS method of Section 3.1. Then, for given values of
and
, the proposed final one-step conditional QR estimator
of
can be obtained by minimizing the following penalized objective function
(12)
Thus, we use
as an initial estimate to determine the penalty weights for individual coefficients in QR. This adaptive approach is a natural extension of the local linear approximation as proposed by Zou and Li (2008) and briefly discussed in Section 1. The approach allows the procedure to apply smaller penalties to coefficients with strong evidence of being nonzero, thereby improving selection accuracy.⁴
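The role of the initial estimate can be sketched as follows: each coefficient receives a penalty weight equal to the SCAD derivative evaluated at the absolute value of its initial estimate, and the one-step estimator then minimizes a weighted L1-penalized check loss. The code below only computes the weights and evaluates such an objective. The relative scaling of the loss and the penalty term is an assumption, and all names and numbers are hypothetical.

```python
def check_loss(u, tau):
    """Quantile check function: u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def scad_derivative(theta, lam, a=3.7):
    """First derivative of the SCAD penalty, evaluated at |theta|."""
    theta = abs(theta)
    if theta <= lam:
        return lam
    return max(a * lam - theta, 0.0) / (a - 1.0)


def lla_weights(beta_init, lam, a=3.7):
    """One penalty weight per coefficient, based on the initial estimate."""
    return [scad_derivative(b, lam, a) for b in beta_init]


def one_step_objective(beta, X, y, tau, weights):
    """Check loss plus weighted L1 penalty (scaling is an assumption)."""
    loss = sum(check_loss(yi - sum(b * x for b, x in zip(beta, xi)), tau)
               for xi, yi in zip(X, y))
    penalty = sum(w * abs(b) for w, b in zip(weights, beta))
    return loss + penalty


# Hypothetical initial estimates: one large, one small, one exactly zero.
weights = lla_weights([5.0, 0.1, 0.0], lam=1.0)
print(weights)  # [0.0, 1.0, 1.0]
```

The large initial estimate receives zero weight and is therefore left unpenalized in the second step, while the small and zero estimates receive the full weight lambda.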
Let
be the number of nonzero estimates. Then, following Sherwood and Wang (2016), we choose the value of
that minimizes the following high-dimensional Bayesian-type information criterion (BIC):
(13)
In the case
in (12) is replaced by an initial estimator not depending on
, simulation results by Lee et al. (2014) show that if the choice of
is based on (13), all relevant components are selected consistently as
. Further, similar to Theorem 1, it follows that
is equal in sign to the true
(denoted by
) if and only if .
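Tuning parameter selection by an information criterion of this type can be sketched as below. The code assumes a generic high-dimensional form, log(loss) + k · C_n · log(n) / (2n), where k is the number of nonzero estimates and C_n is a slowly diverging constant; taking C_n = log p is an illustrative assumption, and the exact constants of (13) may differ. The candidate fits are hypothetical numbers.

```python
import math


def qr_bic(loss, n_nonzero, n, p, c_n=None):
    """Generic high-dimensional BIC-type criterion for a QR fit.

    loss      : sum of check losses of the fitted residuals
    n_nonzero : number of nonzero coefficient estimates
    c_n       : diverging constant; defaults to log(p) (an assumption)
    """
    if c_n is None:
        c_n = math.log(p)
    return math.log(loss) + n_nonzero * c_n * math.log(n) / (2.0 * n)


def select_lambda(fits, n, p):
    """fits is a list of (lam, loss, n_nonzero); return the best lam."""
    return min(fits, key=lambda f: qr_bic(f[1], f[2], n, p))[0]


# Hypothetical fits along a grid of tuning parameters.
fits = [(0.1, 50.0, 10), (0.5, 52.0, 3), (1.0, 90.0, 1)]
print(select_lambda(fits, n=100, p=200))
```

The criterion trades a small increase in loss for a much sparser model, so the middle candidate is preferred here over the densest and the sparsest fits.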
3.3. Asymptotic Properties
Due to the non-smoothness and non-convexity of the SCAD penalty function (11), the classical Karush-Kuhn-Tucker optimality condition is not applicable to study the asymptotic properties of the final estimator
. One commonly used technique to circumvent this in QR is to decompose a non-convex objective function as the difference of two convex (check loss) functions; see, e.g., Wu and Liu (2009) and Sherwood and Wang (2016). Using this technique to decompose the SCAD function, and assuming that certain regularity conditions hold, the following theorem can be proven.
Theorem 2. (Consistency) Assume that
. Suppose
is a continuous function on
and there is some
such that
as
, and
. In addition, let
be the SCAD function and
, then we have
.
The result follows from similar arguments as in the proof of Theorem 4.2(i) in Noh and Lee (2014) and, hence, has been omitted from the text. Zhao and Yu (2006) study the consistency of LASSO variable selection in a high-dimensional linear regression setting.
Remark (Oracle property) Under certain regularity conditions on the design of the model, Amin et al. (2015), Theorem 3.2, proved sparsity and asymptotic normality of the SCAD-penalized quantile estimator
.
3.4. Selecting c
Following Muggeo et al. (2011), we now present an algorithm for selecting
. Ideally,
should be small enough to get a good approximate solution to the objective function (12). However, for small values of
estimation will be more difficult as the objective function
approaches
having a step-function gradient. The proposed pseudocode is given in Algorithm 1. High values of
lead to high
and then to better convergence performance but to more biased estimates; small values of
are likely to cause failures in the estimating algorithm. Throughout the next two sections, we set
and
. These choices are recommended by Muggeo et al. (2011). In particular, these authors provide guidance on selecting
and
when
is large (
). Among other things, their results indicate that it is possible to reduce
(or equivalently
) without affecting the convergence performance of the IWLS.
Algorithm 1. An adaptive algorithm for selecting c.
4. Numerical Studies
4.1. Simulation
In this example, we simulate 100 datasets from the
-dimensional linear regression model
(14)
where
and
, and where
is an independent and identically distributed (i.i.d.) random error. The nonzero parameters are given by
,
,
, and the rest of the
’s are zero.
We generate the covariates
from the
multivariate normal distribution, where
with
and
. For the distribution of the error term, we consider the following three scenarios: 1) ; 2) , i.e., the Student
distribution with 3 degrees of freedom; and 3) a contaminated
distribution having 10% outliers from a Cauchy distribution, denoted by C10-
. We show results for quantile levels
, and 0.75. We focus on the one-step quantile regression method with SCAD penalty (denoted by QS), and also consider the one-step quantile regression method with LASSO penalty (denoted by QL) as a “benchmark”.⁵ As a second benchmark for comparison, we compute the Oracle estimator, which is the least squares estimator of the true model.
To assess the performance of both quantile regression methods, we adopt the following criteria: 1) NN: average number of nonzero covariates; 2) True Positives (TP): average number of covariates that are correctly included in the model; 3) False Positives (FP): average number of covariates that are incorrectly included in the model; 4) AE: average of the absolute value of the estimation error which is defined as .
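For a single simulated dataset, the four criteria can be computed as in the sketch below; the reported values are then averages over the replications. The sketch assumes that the estimation error is the sum of the absolute differences between estimated and true coefficients, and the coefficient values in the example are hypothetical.

```python
def selection_metrics(beta_hat, beta_true, tol=0.0):
    """NN, TP, FP and absolute estimation error for one replication."""
    selected = [abs(b) > tol for b in beta_hat]
    active = [b != 0 for b in beta_true]
    nn = sum(selected)
    tp = sum(1 for s, a in zip(selected, active) if s and a)
    fp = sum(1 for s, a in zip(selected, active) if s and not a)
    ae = sum(abs(bh - bt) for bh, bt in zip(beta_hat, beta_true))
    return {"NN": nn, "TP": tp, "FP": fp, "AE": ae}


# Hypothetical truth with three active covariates and one false inclusion.
beta_true = [3.0, 1.5, 0.0, 0.0, 2.0]
beta_hat = [2.9, 1.4, 0.2, 0.0, 2.1]
print(selection_metrics(beta_hat, beta_true))
```

By construction NN = TP + FP, which gives a quick consistency check on reported results.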
Table 1 summarizes the simulation results. The variable selection performance of QS is quite decent for all values of
and
. Indeed, except for one case (QL with
,
, and ), all computed TP values are equal to the true active model size of 3, with standard errors equal to zero. Hence, to keep the paper clear and readable, all computed TP values have been omitted from Table 1. The QL method tends to have substantially higher AE values than QS at different quantile levels for almost all
values and error distributions. In the case , the QS method is near the oracle method in terms of AE, but the performance of QL is relatively poor. Also, the advantage of the QS method over the QL method can be seen by its lower AE values when the distribution of
is heavy-tailed. In general, the proposed QS method tends to be efficient for the three error distributions under study.
Table 1. Variable selection performance of one-step quantile regression with LASSO (QL) and SCAD (QS) type penalties for
. Standard errors are in parentheses. NN = Average number of nonzero covariates; FP = False positives; AE = Average of the absolute values of the estimation error.
Method | | | NN | FP | AE | NN | FP | AE | NN | FP | AE |
QL | 200 | | 4.05 (1.07) | 1.05 (1.07) | 1.16 (0.41) | 3.75 (0.87) | 0.75 (0.87) | 1.16 (0.34) | 4.25 (1.17) | 1.25 (1.17) | 1.15 (0.40) |
QS | | | 3.27 (0.57) | 0.27 (0.57) | 0.48 (0.26) | 3.29 (0.57) | 0.29 (0.57) | 0.44 (0.24) | 3.25 (0.52) | 0.58 (0.52) | 0.47 (0.25) |
Oracle | | | 3.00 (0.00) | 0.37 (0.17) | 0.00 (0.00) | 3.00 (0.00) | 0.31 (0.14) | 0.00 (0.00) | 3.00 (0.00) | 0.34 (0.17) | 0.00 (0.00) |
QL | 400 | | 4.11 (1.14) | 1.10 (1.13) | 1.30 (0.41) | 3.69 (0.83) | 0.68 (0.83) | 1.15 (0.34) | 3.89 (1.09) | 0.90 (1.09) | 1.36 (0.41) |
QS | | | 3.26 (0.54) | 0.26 (0.54) | 0.50 (0.25) | 6.14 (16.60) | 3.10 (16.37) | 0.80 (1.98) | 3.27 (0.53) | 0.27 (0.53) | 0.49 (0.25) |
Oracle | | | 3.00 (0.00) | 0.35 (0.18) | 0.00 (0.00) | 3.00 (0.00) | 0.31 (0.16) | 0.00 (0.00) | 3.00 (0.00) | 0.36 (0.20) | 0.00 (0.00) |
QL | 200 | | 3.87 (0.98) | 0.87 (0.98) | 1.37 (0.55) | 3.55 (0.69) | 0.55 (0.69) | 1.20 (0.38) | 3.92 (0.93) | 0.92 (0.93) | 1.32 (0.42) |
QS | | | 3.17 (0.40) | 0.17 (0.40) | 0.52 (0.52) | 3.17 (0.38) | 0.17 (0.38) | 0.48 (0.31) | 3.16 (0.37) | 0.16 (0.37) | 0.53 (0.31) |
Oracle | | | 3.00 (0.00) | 0.40 (0.22) | 0.00 (0.00) | 3.00 (0.00) | 0.35 (0.20) | 0.00 (0.00) | 3.00 (0.00) | 0.36 (0.23) | 0.00 (0.00) |
QL | 400 | | 4.19 (1.31) | 1.17 (1.28) | 1.63 (0.58) | 3.60 (0.83) | 0.59 (0.83) | 1.39 (0.43) | 4.11 (1.18) | 1.11 (1.17) | 1.57 (0.56) |
QS | | | 3.20 (0.45) | 0.20 (0.45) | 0.51 (0.34) | 6.13 (16.60) | 3.11 (16.48) | 0.87 (2.50) | 3.20 (0.47) | 0.20 (0.47) | 0.50 (0.29) |
Oracle | | | 3.00 (0.00) | 0.41 (0.18) | 0.00 (0.00) | 3.00 (0.00) | 0.35 (0.17) | 0.00 (0.00) | 3.00 (0.00) | 0.43 (0.21) | 0.00 (0.00) |
QL | 200 | C10- | 3.96 (0.86) | 0.96 (0.86) | 1.21 (0.37) | 3.55 (0.73) | 0.55 (0.73) | 1.10 (0.34) | 3.94 (1.08) | 0.94 (0.94) | 1.21 (0.39) |
QS | | | 3.44 (0.80) | 0.60 (0.99) | 0.97 (1.71) | 3.53 (1.26) | 0.68 (1.49) | 0.95 (1.84) | 3.48 (1.14) | 0.64 (1.36) | 1.03 (1.92) |
Oracle | | | 3.00 (0.00) | 0.32 (0.18) | 0.00 (0.00) | 3.00 (0.00) | 0.34 (0.17) | 0.00 (0.00) | 3.00 (0.00) | 0.37 (0.16) | 0.00 (0.00) |
QL | 400 | | 4.07 (1.13) | 1.07 (1.13) | 1.43 (0.45) | 3.56 (0.81) | 0.56 (0.81) | 1.21 (0.36) | 3.88 (1.06) | 0.88 (1.06) | 1.36 (0.45) |
QS | | | 5.14 (13.63) | 2.29 (13.77) | 3.33 (17.24) | 4.30 (9.71) | 1.42 (9.53) | 1.00 (2.04) | 3.39 (0.99) | 0.53 (1.30) | 0.91 (0.91) |
Oracle | | | 3.00 (0.00) | 0.36 (0.21) | 0.00 (0.00) | 3.00 (0.00) | 0.31 (0.18) | 0.00 (0.00) | 3.00 (0.00) | 0.35 (0.24) | 0.00 (0.00) |
4.2. Comparing Execution Time
We next compare the execution time of the QL and QS methods with initial quantile parameter estimates obtained by two methods: 1) the IWLS method of Section 3.1, and 2) the well-known quantile regression function rq in the R package quantreg, denoted here by the shorthand notation “bet”. The simulation setting is the same as in Section 4.1 with Scenario 1. However, in this case
,
, and
. Figure 1(a) and Figure 1(b) show plots of the timings, obtained with the R package microbenchmark, for three different values of
, with “neval” denoting the number of replicates.
Figure 1. Autoplots of microbenchmark timings: plot (a) is for (
,
), and plot (b) is for (
,
).
Figure 1(a) shows that there is hardly any difference between quantile estimates obtained with the QS-IWLS method and the QL-bet method for most values of
. However, we see from Figure 1(b) that the difference between both methods is more noteworthy for the case
. Indeed, QL is about twice as fast as QS for all values of
. This difference in computing time was also noted for simulation results obtained with the Student
error distribution and with the contaminated
error distribution, as specified earlier in Section 4.1. Thus, in terms of computing time, there is a price to be paid for using the proposed one-step QS-IWLS estimation method when
is large.
5. An Illustrative Application
In this section, we illustrate the one-step SCAD penalized QR estimator using a real dataset corresponding to beta-carotene levels (ng/mL) in 315 patients. The dataset is available online at http://lib.stat.cmu.edu/datasets/Plasma_Retinol. It is also included in the R package Lock5Data under the name NutritionStudy. The dataset contains observations on the following 14 variables: Age (years), Cholesterol (mg per day), Fiber (grams consumed per day), Sex (1 = Female, 0 = Male), Smoking Status (1 = Never, 2 = Former, 3 = Current Smoker), Quetelet (Weight/Height^2), Vitamin Use (1 = Yes, fairly often, 2 = Yes, not often, 3 = No), Calories (number consumed per day), Fat (grams consumed per day), Alcohol (number of drinks consumed per week), Betadiet (dietary beta-carotene consumed (mcg per day)), Retdiet (dietary retinol consumed (mcg per day)), Betaplasma (plasma beta-carotene (ng/mL)), and Retplasma (plasma retinol (ng/mL)). Study subjects were patients who had an elective surgical procedure during a three-year period to biopsy or remove a lesion of the lung, colon, breast, skin, ovary or uterus that was found to be non-cancerous. The medical interest of the work was to investigate the relation between personal characteristics and diet factors on the one hand, and plasma concentrations of retinol, beta-carotene, and other carotenoids on the other, since low plasma concentrations of these micronutrients may be associated with a greater risk of developing certain types of cancer.
A boxplot of the response variable (Retplasma) indicated several outliers. In addition, the Shapiro-Wilk test showed that this variable is far from normally distributed with a p-value of 0.00. There was one extremely high (203) leverage point in the series “Alcohol” that was deleted prior to the analysis. Thus the actual set of observations consists of
values. All series are standardized, except for the integer series Age, Sex, Smoking Status, and Vitamin Use. Figure 2 shows the estimated additive fits of seven covariates. It appears that the relationship between the response variable and each of these covariates is nonlinear. The remaining six covariates seem to be linearly related to Retplasma. So, it is natural to analyze the complete dataset via penalized QR.
5.1. Low Dimension
Figure 2. Plots of the estimated additive functions for seven predictors, with associated standard errors (dotted lines), using the R function gam.
To assess the performance of the one-step variable selection method, we randomly selected 200 observations from the complete dataset. We used these observations for model fitting and variable selection. The remaining 114 observations were used for checking the prediction performance of the fitted model. This procedure was repeated 100 times through the bootstrapping technique. Given this setup, and to guard against false positives, we declare a covariate to be relevant when it is selected at least 60% of the time.
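The resampling scheme described above can be sketched as follows. The callback `select` stands in for a variable selection fit (QS or QL) on the training indices and is purely hypothetical, as are the function names and the toy selector; only the sample sizes, the number of repetitions, and the 60% threshold come from the text.

```python
import random


def selection_frequencies(n_obs, n_train, n_rep, select, seed=0):
    """Percentage of repetitions in which each covariate is selected.

    select(train_indices) must return the names of the selected covariates;
    here it is a placeholder for the actual QS or QL fit."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_rep):
        train = rng.sample(range(n_obs), n_train)
        for name in select(train):
            counts[name] = counts.get(name, 0) + 1
    return {name: 100.0 * c / n_rep for name, c in counts.items()}


def relevant(freqs, threshold=60.0):
    """Covariates selected in at least `threshold` percent of repetitions."""
    return sorted(name for name, f in freqs.items() if f >= threshold)


def toy_select(train):
    """Toy selector: always picks Age, picks Alcohol only in some splits."""
    chosen = ["Age"]
    if 0 in train:
        chosen.append("Alcohol")
    return chosen


freqs = selection_frequencies(n_obs=314, n_train=200, n_rep=100,
                              select=toy_select)
print(freqs, relevant(freqs))
```

In an actual analysis the held-out indices of each split would also be used to compute the prediction error of the fitted model.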
Panel I of Table 2 contains the variable selection results for QL and QS, respectively. For
, QL identifies different covariates than QS. However, for
both methods select the variables Age and Alcohol as relevant covariates. On the other hand, when
we see that the variables Age and Smoke1 are selected frequently. In terms of median absolute prediction error (MAPE) computed across all covariates, we have MAPE (QS) = 0.27 and MAPE (QL) = 0.37 for
. When
, MAPE (QL) = MAPE (QS) = 0.36, and when
the MAPEs are equal to 0.36 for both methods.
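A minimal sketch of the prediction criterion, assuming that MAPE is the median of the absolute differences between observed and predicted responses on the held-out observations; the function name and the numbers are illustrative.

```python
from statistics import median


def mape(y_true, y_pred):
    """Median absolute prediction error over held-out observations."""
    return median(abs(t - p) for t, p in zip(y_true, y_pred))


# Hypothetical held-out responses and predictions.
print(mape([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.0, 6.0]))  # 0.75
```

Because the median is used instead of the mean, a few poorly predicted observations have little effect on the criterion.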
Table 2. Frequency (≥60%) of covariates selected by QL and QS at three different quantiles based on 100 bootstrap replicates. Selected frequencies are in parentheses.
Method | | | |
I. Low dimension (Original dataset), |
QL | Age (61), Alcohol (95), Betaplasma (97) | Age (99), Alcohol (94) | Age (99), Alcohol (76), Sex (85), Smoke1 (71) |
QS | Sex (100) | Age (90), Alcohol (62), Sex (96) | Age (78), Smoke1 (70) |
II. High dimension (Augmented dataset), |
QL | Alcohol (84), Betaplasma (70) | Age (98), Alcohol (89) | |
QS | | Age (85) | |
5.2. High Dimension
To investigate the QL and QS methods in high dimensions, we add 200 irrelevant covariates to the dataset. These new variables are randomly drawn from a U(0, 1) distribution. Hence, the augmented dataset consists of 215 covariates (200 irrelevant and 15 relevant ones). Panel II of Table 2 summarizes the variable selection results obtained with the same procedure as above. For
and
, no covariates are selected by QS, whereas QL selects covariates at these quantiles. Clearly, QS tends to produce sparser models than QL. Further, in terms of MAPE, the QS method performs better than QL at
. Specifically, MAPE (QS) = 0.28 and MAPE (QL) = 0.37. For
MAPE (QL) = MAPE (QS) = 0.36, while for
MAPE (QL) = 0.35 and MAPE (QS) = 0.34.
Figure 3 presents a boxplot of the prediction errors of QL and QS at different quantiles using the augmented plasma dataset, and based on 100 bootstrapped replications. The QS method at
provides a good prediction, followed by the same method at
and
. Overall, the one-step QS method is found to be a good variable selection procedure in detecting the significant variables, even when some irrelevant variables are included in the dataset under study. The prediction errors of the proposed method are also smaller than those using the one-step QL method.
Figure 3. Boxplot of the prediction errors for the augmented plasma dataset.
6. Summary
In this paper, we proposed a one-step estimation procedure for variable selection in sparse, high-dimensional, linear QR models. In particular, we used the SCAD penalty with an iteratively weighted least squares estimator to determine the penalty weights. In a simulation study, we showed that the resulting one-step QS method performs better than a one-step QL method under three different error distributions. Additionally, we illustrated the merit of our method via a real data example.
A limitation of the proposed procedure is its computational cost, as briefly discussed in Section 4.2. However, this cost is outweighed by an increase in estimation accuracy when the sample size and/or the number of covariates increases. Another issue worth mentioning is the practical implication of the IWLS method. For instance, in fields like genomics or financial modeling, where a large number of observations is analyzed, our method’s robustness to heavy-tailed distributions and outliers would be particularly advantageous.
Several possible extensions to this paper can be pursued. For instance, one may wish to estimate nonlinear regressions via the QS method. Perhaps, in that case, efficiency may be further enhanced by a more prudent choice of the initial estimator than using the IWLS method. It may also be of interest to assess the robustness of the IWLS method by changing the level of temporal correlation among the covariates. Selecting the optimal SCAD tuning parameter
by a different data-driven method (e.g., AIC) and then investigating its impact on the QS estimates is another fruitful future research direction.
NOTES
¹ Throughout the paper the number of variables is denoted by p and the number of measurements (observations) by n. By high-dimensional data, we refer to the case in which p is assumed to be large, possibly larger than n.
² The oracle property of a method means that it can correctly select the nonzero parameters with probability converging to one, and that the estimators of the nonzero parameters are asymptotically normal with the same mean and covariance that they would have if the active covariates were known a priori.
³ The concept of additivity, as a simplifying structure, is basic to many proposed models in science. Within economics, Deaton and Muellbauer (1980) provide many microeconomic examples in which the structure is convenient for analysis and important for interpretability. Indeed, additivity is widely used in parametric, semi- and nonparametric analysis of economic data.
⁴ Note that in (12) can be written as , the above minimization problem can be restated as an unpenalized weighted QR problem; see, e.g., Sherwood and Wang (2016).
⁵ The QL method is also a one-step adaptive procedure using the same IWLS initial estimates as the proposed QS method to ensure the comparison is on equal footing.