Automatic Variable Selection for Single-index Random Effects Models with Longitudinal Data

We consider the problem of variable selection for the single-index random effects models with longitudinal data. An automatic variable selection procedure is developed using smooth-threshold. The proposed method shares some of the desired features of existing variable selection methods: the resulting estimator enjoys the oracle property; the proposed procedure avoids the convex optimization problem and is flexible and easy to implement. Moreover, we use the penalized weighted deviance criterion for a data-driven choice of the tuning parameters. Simulation studies are carried out to assess the performance of our method, and a real dataset is analyzed for further illustration.


Introduction
With the increasing availability of longitudinal data, both theoretical and applied works in longitudinal data analysis have become more popular in recent years.Diggle et al. [1] provided an excellent overview of the longitudinal data analysis.To avoid the so-called "curse of dimensionality" in the multivariate nonparametric regression with longitudinal data and to generate an association correlation structure between the repeated measurements, we consider the following single-index random effects models with longitudinal data, ( ) where 0 β is a 1 p × index coefficients vector associated with the covariates ij X ; i b are independent 1 q × vectors of random effects with mean zero and covariance matrix Φ , ( ) g ⋅ is an unknown link function; ij ε are independent mean zero random variables with variance 2 > 0 ε σ . Here Φ is a positive definite matrix de- pending on a parameter vector φ ; ij X and ij Y are the observable random variables, and ij Z are 1 q × known fixed design vectors.We suppose that i b and ij ε are mutually independent and follow gaussian dis- tribution, and 0 1 β = with the first nonzero element of 0 β being positive to ensure identifiability.Pang and Xue [2] considered estimators of parameters and non-parameter for model (1).Yang et al. [3] considered simultaneous confidence band for the model (1).
Since the single-index models are popular and efficient modeling tools in multivariate nonparametric regression, the single-index models have recently received much attention, including those from Carroll et al. [4], Xia et al. [5], Zhu and Xue [6], Wang et al. [7], and among others.Pang and Xue [2], Yang et al. [3] and Chen et al. [8] considered the single-index models for longitudinal/panel data.Further, random effects models have become very popular for the analysis of longitudinal or panel data, because they are flexible and widely applicable.Given the importance of the random effects models, it is not surprising that methodologies for random effects models have emerged in the extensive literatures, such as Zeger and Diggle [9], Ke and Wang [10], Wu and Zhang [11] and Field et al. [12], and among others.However, it has a lot of challenges for the studies and the applications of single-index models with longitudinal data when the random effects in the models exist.Pang and Xue [2] proposed an iterative estimation procedure to estimate the index parameter vector and the link function, and they proved the asymptotic properties of the resulting estimators.However, in practical application, we do not know which covariates X have significant effects on the corresponding variable Y.In this paper, we consider the problem of variable selection for the single-index random effects models with longitudinal data.
Various penalty functions have been used in the variable selection literature for linear regression models.Frank and Friedman [13] considered the q L penalty, which yields a "Bridge Regression".Tibshirani [14] proposed the Lasso, which can be viewed as a solution to the penalized least squares with the 1 L penalty.Zou [15] further developed the adaptive lasso.Through combining both ridge ( ) and lasso ( ) L penalty together, Zou and Hastie [16] proposed the Elastic-Net, which also has the sparsity property, to solve the collinearity problems.Fan and Li [17] proposed the SCAD penalty method and proved that the SCAD estimators enjoy the Oracle properties.All these variable selection procedures are based on penalized estimation using penalty functions, which have a singularity at zero.Consequently, these estimation procedures require convex optimization, which incurs a computational burden.To overcome this problem, Ueki [18] developed a new variable selection procedure called the smooth-threshold estimating equations that can automatically eliminate irrelevant parameters by setting them as zero.In addition, the resulting estimator enjoys the oracle property in the sense that Fan and Li [17] suggested.Li et al. [19] focus on marginal longitudinal generalized linear models and develop a variable selection technique.
Motivated by the idea of Ueki [18] and Li et al. [19], an automatic variable selection procedure is developed for the single-index random effects models.There are two difficulties.One notable difficulty in our setting is that we have to treat the nuisance parameters Φ and 2 ε σ involved in the working covariance matrix, which affect the final estimator of β .Computationally, we need to update the values of these nuisance parameters together with the main parameter of interest.We propose an iterative algorithm to implement the procedures in Section 2 and obtain the efficient estimator of β .The proposed method shares some of the desired features of existing variable selection methods: the resulting estimator enjoys the oracle property; the proposed procedure avoids the convex optimization problem and is flexible and easy to implement.Moreover, we use the penalized weighted deviance criterion for a data-driven choice of the tuning parameters, see, Li et al. [19].Simulation studies are carried out to assess the performance of our method, and a real dataset is analyzed for further illustration.
The paper is organized as follows.In Section 2, the iterative estimation procedure is given for model (1) and the asymptotic properties of the proposed estimator are established in Section 3. In Section 4 simulation studies are conducted to evaluate the performance of the proposed method, and a real data set is analyzed to illustrate the proposed method.

Estimation Procedure
Throughout this paper, let 0 β be the fixed true value of β and let n → ∞ , while the m is uniformly bounded.the number of true zero parameters.Suppose that the sample comes from model (1).Let . Model (1) can be rewritten as It is easy to see that ( ) ( ) , where m I is the m m × identity matrix and i Z is m q × known fixed design matrix.A naive idea to estimate 0 β is to minimize Since 0 1 β = means that the true value of 0 β is the boundary point on the unit sphere, ( ) does not have derivative at the point 0 β .However, we must use the derivative of ( ) on 0 β , when con- structing the estimating equation for 0 β .The "delete-one-component" method (see Zhu and Xue [6], Wang et al. [7]) is used to solve this problem.The detail is as follows.Let be a 1 p − dimensional parameters vector deleting the rth component r β .Without loss of generality, we may assume that the true vector 0 β has a positive component r β .Then, the true parameter ( ) r β satisfies the constraint β is infinitely differentiable in a neighborhood of the true parameter ( ) r β , the Jacobian matrix is where ( ) Step 0: We first give a consistent estimator of 0 β , which is denoted by β  .
Step 1: Estimation of the link function ( ) g ⋅ and its derivative ( ) g′ ⋅ .Given the initial estimator β  , we apply the local linear regression technique in Fan and Gijbels [20] to estimate the link function and its derivative.
The estimators of ( ) g ⋅ and ( ) g′ ⋅ are obtained by minimizing the weighted sum of squares with respect to a and b where ( ) ( ) K ⋅ be a kernel function, and Specifically, the local linear estimators of ( ) ( ) g′ ⋅ are defined as ( ) for the initial estimator β  .By some simple calculations, we have ; ; ; , ; , 0,1.
Step 2: Estimation of the variance components.To obtain the estimator of index parameter, we need to get the consistent estimators of the variance components.Suppose that the variance-covariance matrix for model ( 2) σ can be written as Then we can obtain the Λ 's estimator ( ) ( ) ( ) Step 3: Estimation of parameter.Based on the initial estimator β  and the estimator of i Λ , the estimator ( ) can be obtained by solving the following estimating equation  Motivated by the idea of Ueki [18] and Li et al. [19], we can use the following smooth-threshold estimating equations to estimate ( ) where ∆ is the diagonal matrix whose diagonal elements are ( ) ( )  Unfortunately, we cannot direct obtain the estimator of ( ) r β by solving (3), because (3) involves i δ , which need be chosen using some data driven criteria.For the choice of i δ , Ueki [18] suggested that i δ can be cho- sen by ( ) , where ( ) , λ γ are two tuning parameters, which can be computed by a penalized weighted deviance criterion, see Li et al. [19].Similarly, we can define the active set which is the set of indices of nonzero parameters.Replacing ∆ in Equation ( 3) by ∆ with diagonal elements ˆ, 1, , , we propose the following modified iterative procedure for A β , where . Repeat (4) until convergence.
We denote the final estimator of A β by ˆA β .

Remark 1:
In Step 0, we need to choose a suitable initial estimator of 0 β .For the numerical studies and real data analysis in Section 4, the initial estimator can be obtained using two steps.In the first step, we use inde- In the second step, we average Remark 2: It is well-known that the convergence rate of the estimator ( ) is slower than that of the es-timator ( ) ĝ u if the same bandwidth is used.This leads to a slower convergence rate than root-n for the estimator β of 0 β .This motivates us to introduce another bandwidth 1 h to control the variability of the estimator of ( ) g u ′ .Remark 3: We use following penalized weighted deviance criterion (see Li et al. [19]) to select tuning parameters ( ) , λ γ : where ∑ denotes the number of nonzero parameters with ( ) with the deviance residual ( )  .We can choose ( ) , λ γ by minimizing the ( ) , PWD λ γ .

Asymptotic Properties
In this section, we assume, under the regularity conditions, the initial estimator using the full model is consistent and asymptotically normally distributed by solving the GEE (see Liang and Zeger [21]).Following Fan and Li [17], it is possible to prove the oracle properties for the estimators, including n -consistency, variable selection consistency, and asymptotic normality.Theorem 2. Suppose that the conditions of Theorem 1 hold, as n → ∞ , we have 1) variable selection consistency, i.e. ( ) 2) asymptotic normality, i.e.
( ) ( ) ( ) where Ω is the limit in probability of as n → ∞ .The proof of Theorem 1 and Theorem 2 can be obtained similarly to the proof of Theorem 1 and Theorem 2 in Li et al. [19].

Numerical Studies
In this subsection, we conduct simulation studies to illustrate the finite sample properties of proposed procedure.Throughout the simulation studies, we take Epanechnikov kern for estimating the link function, and the bandwidth h is chosen by the cross validation (CV) method.
For each case we repeat the experiment 100 times and applied the penalized weighted deviance criterion to select the tuning parameters.We consider the following example.
For a single-index random effects model, ( ) where ( ) T 0 3 2, 0, 0.5, 0, 0 β = , ij X is a five-dimensional vector with independent uniform [0,1], i b is a normal variable with mean zero and variance 2 b σ , ij ε is a standard normal variable.For the simulations, we consider the number of subjects n = 50, 100 subjects and m = 3.For comparison, we Consider 0.5,1 , respectively.Based on the experiment time M = 100, the simulation results are reported in Table 1.In the tables, values in the column labeled "Correct" denote the average number of coefficients of the true zeros, correctly set to zero, and those in the column labeled "Incorrect" denote the average number of the true nonzeros incorrectly set to zero.
Table 1 and Figure 1 indicate the following simulation results: 1) From Table 1, it is easy to see that "correct" increases to 3 (true number) as n increases.Therefore, the proposed method is able to correctly identify the true submodel.
2) From Table 1, we find that "correct" increases to 3 as b σ and the ε σ decrease, respectively.3) Figure 1 shows that the estimators of β have asymptotic normality.

Application to Real Data
The data set comes from an epileptic study (Thall and Vail [22], Bai et al. [23], and Pang and Xue [2]).Two different treatments (placebo and antiepileptic drug progabide) were administered to 59 epileptics during the experimental period.Patients were randomized to receive either of the two treatments.The patients attended clinic visits every two weeks for four consecutive times and the number of seizures occurring over the previous two weeks was reported.For this dataset, the number of seizures in a two-week period (NS) is taken as the response variable, the logarithm of age in year (LA), and the baseline seizure count (which is divided by 4 and then log-transformed, let BSC) are considered as the covariates.A scientific question here is whether the drug helps to reduce the rate of epileptic seizures.To illustrate the proposed method, we consider the following single-index model, ( ) Table 1.Variable selections for model (5) using our method.  .Therefore, the proposed method is feasible in practical application.

Concluding Remarks
In this paper, we have done automatic variable select to parameters of index β for single-index random effects model with longitudinal data.We further derive the asymptotic distributions for estimator of β for single-in- dex random effects model.The proposed estimator has good asymptotic behavior and select number of zero parameters very close to the nominal level in our simulation study.A real data analysis illustrates the practical use of the variable select.The methodology in this paper is general and widely applicable, and therefore, we expect further research along these lines to yield deep theoretical results with interesting applications for other nonparametric or semiparametric models with random effects.
We partition 0 β into active (nonzero) and inactive (zero) coefficients as follows: let Based on the estimation procedure in Pang and Xue[2] and Yang et al.[3], we outline the iterative steps for estimating procedures for 0 β , ( ) g ⋅ and its derivative ( ) g′ ⋅ .

2 ε σ and 2 b
is the m-vector of ones.Assume that the random effects i b and the error ij ε are Gaus- sian distributed, then the observation i Y have independent β Λ distributions.Based on the esti- mator β  and the estimator ( ) ˆ, g u β  , the log-likelihood function for Therefore equation (3) can yield a sparse solution.

Theorem 1 .Note 1 :
Under mild regularity conditions, for any positive λ and γ such that1 The mild regularity conditions in Theorem 1 are same with the conditions in Yang et al.[3] Theorem 1.
with independent uniform [0,1].The estimation procedure proposed in Section 2 is used to estimate the single-index model (6), the non zero estimated of the index coefficients (standard error of estimated) are 1 β = 0.8342 (0.0563)