Composite Quantile Regression for Nonparametric Model with Random Censored Data

Composite quantile regression should provide an estimation efficiency gain over a single quantile regression. In this paper, we extend composite quantile regression to the nonparametric model with randomly censored data. The asymptotic normality of the proposed estimator is established. The proposed method is applied to lung cancer data. Extensive simulations are reported, showing that the proposed method works well in practical settings.


Introduction
Consider the following nonparametric regression model with randomly censored data:

$$T = m(U) + \sigma(U)\varepsilon, \tag{1}$$

where $m(\cdot)$ is an unknown smooth function, $\sigma(\cdot)$ is a positive function representing the standard deviation, and $\varepsilon$ is the random error with mean 0 and variance 1.
Let $C$ denote the censoring variable, whose distribution may depend on $U$, where $U$ is a vector of observed covariates. In this paper, we focus on random right censoring: we only observe the triples $(U_i, Y_i, \delta_i)$, where $Y_i = \min(T_i, C_i)$ and $\delta_i = I(T_i \le C_i)$ are the observed response variable and the censoring indicator, respectively, and $T_i$ is the survival time.

Censored quantile regression was first studied by [1] for fixed censoring. [2] proposed an estimator for a conditional quantile assuming that the regression models at lower quantiles are all linear. A recursively weighted estimation procedure that can be regarded as a generalization of the Kaplan-Meier estimator to conditional quantiles was described in that paper. Afterward, [3] presented an alternative approach that is based on the Nelson-Aalen estimator of the cumulative hazard function but still requires the same global-linearity assumption as Portnoy's. Their method provides a more direct approach to the asymptotic theory and a simpler computational algorithm. More recently, [4] proposed to overcome the global-linearity assumption by directly estimating the conditional censoring distribution nonparametrically using the local Kaplan-Meier method. Their computational algorithm is more stable and simpler to implement than Portnoy's or Peng and Huang's. Moreover, the local nonparametric estimator on which the model is based performs best when the covariates can be assumed independent.
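The right-censoring mechanism and the Kaplan-Meier estimator mentioned above can be illustrated with a minimal sketch. The function names and the toy numbers are hypothetical and chosen only for illustration; the estimator shown is the ordinary (unconditional) Kaplan-Meier product-limit estimator, not the conditional versions discussed in [2]-[4].

```python
def observe_right_censored(survival_times, censoring_times):
    """Return the observed pairs (Y, delta) with Y = min(T, C), delta = 1{T <= C}."""
    return [(min(t, c), 1 if t <= c else 0)
            for t, c in zip(survival_times, censoring_times)]


def kaplan_meier(observations):
    """Product-limit estimate of the survival function.

    Returns a list of (event_time, S(event_time)) steps, one per distinct
    uncensored observation time.
    """
    survival = 1.0
    steps = []
    event_times = sorted({y for y, d in observations if d == 1})
    for t in event_times:
        at_risk = sum(1 for y, _ in observations if y >= t)
        events = sum(1 for y, d in observations if y == t and d == 1)
        survival *= 1.0 - events / at_risk
        steps.append((t, survival))
    return steps


# Toy data (hypothetical): latent survival times T and censoring times C.
T = [2.0, 3.0, 5.0, 7.0]
C = [4.0, 1.0, 6.0, 6.5]
obs = observe_right_censored(T, C)
km_steps = kaplan_meier(obs)
```

Only `obs` would be available to the analyst in practice; the latent `T` values for censored subjects are never seen.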
Intuitively, the composite quantile regression (CQR) should provide estimation efficiency gain over a single quantile regression; see [5]. A composite quantile regression model assumes that there exist common covariate effects in a range of quantiles such that the quantile levels only differ in terms of the intercept. From a more general regression perspective, composite quantile regression seeks to model a set of parallel regression curves, and thus it can be viewed as a compromise between a set of quantile regression curves with different intercepts and slopes and a single summary regression curve. [6] proposed the local polynomial CQR estimators (LCQR) for estimating the nonparametric regression function and its derivative. It is shown that the local CQR method can significantly improve the estimation efficiency of the local least squares estimator for commonly used non-normal error distributions. Furthermore, [7] studied semiparametric CQR estimates for the semiparametric varying-coefficient partially linear model. They compared CQR with least squares and quantile regression, and the results showed that CQR outperformed both. [8] considered CQR estimates for single-index models. Recently, [9] extended the CQR method to the linear model with randomly censored data. This motivates us to extend the CQR method to the nonparametric model with censored data (LCQRC).

The paper is organized as follows. In Section 2, local composite quantile regression for the nonparametric model with censored data is introduced, and the main theoretical results are also given in this section. Both simulation examples and a real data application are given in Section 3 to illustrate the proposed procedures. Final remarks are given in Section 4. The technical proofs are deferred to the Appendix.
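The idea of a common slope with quantile-specific intercepts, described in the introduction, can be sketched as a composite objective that sums check losses over several quantile levels. This is a minimal illustration for an uncensored linear model with a scalar covariate; the function names, the toy data, and the equally spaced levels $\tau_k = k/(q+1)$ are assumptions made for the example, not details taken from this paper.

```python
def check_loss(residual, tau):
    """Quantile check loss: rho_tau(r) = r * (tau - 1{r < 0})."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def cqr_objective(intercepts, slope, xs, ys, taus):
    """Composite quantile objective: one intercept per level, one shared slope."""
    total = 0.0
    for a_k, tau in zip(intercepts, taus):
        for x, y in zip(xs, ys):
            total += check_loss(y - a_k - slope * x, tau)
    return total


q = 3
taus = [k / (q + 1) for k in range(1, q + 1)]  # assumed levels: 0.25, 0.5, 0.75

# Toy data lying exactly on the line y = 1 + 2x (hypothetical).
xs = [0.0, 1.0]
ys = [1.0, 3.0]
loss_parallel = cqr_objective([0.0, 1.0, 2.0], 2.0, xs, ys, taus)
loss_exact = cqr_objective([1.0, 1.0, 1.0], 2.0, xs, ys, taus)
```

Minimizing this objective over the intercepts and the shared slope gives a set of parallel fitted lines, which is the sense in which CQR sits between separate quantile fits and a single summary fit.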

Local Composite Quantile Regression with Censored Data
We first consider an ideal situation where $F_0(t \mid U_i)$, the conditional cumulative distribution function of the survival time $T_i$ given $U_i$, is assumed to be known. In this case, a weight function is defined for each observation in terms of $F_0$. When $F_0$ is unknown, it is estimated locally using a smooth kernel function $K(\cdot)$ and a bandwidth $h$ converging to zero as $n \to \infty$. By plugging in this estimate, the estimator is obtained via minimizing the locally weighted objective function, which is built from the $q$ check loss functions at $q$ quantile positions.
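As a rough illustration of the two ingredients named above, the sketch below computes a kernel-weighted (local) Kaplan-Meier estimate of the conditional distribution function and a censoring weight derived from it. It assumes the local Kaplan-Meier form with Nadaraya-Watson kernel weights and the redistribution-of-mass weight associated with the method of [4]; whether these match the exact definitions intended in this section is an assumption. The Epanechnikov kernel, the function names, and the toy data are hypothetical, and ties among observed times are not handled.

```python
def epanechnikov(z):
    """Epanechnikov kernel, supported on (-1, 1)."""
    return 0.75 * (1.0 - z * z) if abs(z) < 1.0 else 0.0


def local_km_cdf(t, u, data, h, kernel=epanechnikov):
    """Kernel-weighted Kaplan-Meier estimate of F(t | u).

    data is a list of triples (U_i, Y_i, delta_i); h is the bandwidth.
    """
    raw = [kernel((u - ui) / h) for ui, _, _ in data]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("no observations within the bandwidth of u")
    b = [r / total for r in raw]  # normalized local weights
    survival = 1.0
    for j, (_, yj, dj) in enumerate(data):
        if dj == 1 and yj <= t and b[j] > 0.0:
            at_risk = sum(bk for bk, (_, yk, _) in zip(b, data) if yk >= yj)
            survival *= 1.0 - b[j] / at_risk
    return 1.0 - survival


def censoring_weight(delta, cdf_at_censoring, tau):
    """Assumed weight: 1 for uncensored points or points censored beyond the
    tau-th quantile, otherwise (tau - F) / (1 - F) with F = F(C_i | U_i)."""
    if delta == 1 or cdf_at_censoring > tau:
        return 1.0
    return (tau - cdf_at_censoring) / (1.0 - cdf_at_censoring)


# Toy data (hypothetical): identical covariate values, so the local estimate
# coincides with the ordinary Kaplan-Meier estimate.
data = [(0.5, 1.0, 1), (0.5, 2.0, 0), (0.5, 3.0, 1), (0.5, 4.0, 1)]
cdf_at_3 = local_km_cdf(3.0, 0.5, data, h=1.0)  # 0.625
```

When all covariate values are equal the local weights are uniform, which gives a simple check that the local estimator reduces to the classical one.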

Denote by $f_U(\cdot)$ the marginal density function of the covariate $U$. To prove the main results in this paper, the following technical conditions are imposed.

 
The properties obtained under Assumption A1 are needed for deriving the asymptotic normality result. Assumption A2 ensures that the expectation of the estimating function has a unique zero, and it is needed to establish the asymptotic distribution. Assumptions A3-A6 are the same conditions used to establish the asymptotic normality of local composite quantile regression ([6]).
We state the asymptotic normality of the proposed estimator in the following theorem.
Theorem 1. Assume that the triples $(U_i, Y_i, \delta_i)$ constitute an i.i.d. multivariate random sample, and that the censoring variable $C_i$ is independent of $T_i$ conditional on the covariate $U_i$. Suppose that $u_0$ is an interior point of the support of $f_U(\cdot)$. Then the proposed estimator at $u_0$, suitably normalized, converges in distribution to a normal limit.

The simulations use censoring rates (CR) of 20%, 30% and 40%. For each censoring rate, the sample sizes are taken to be 100 and 200. To evaluate the finite sample performance of our estimator, two distance measures are computed, the first being the mean absolute deviation error (MADE).

Numerical Studies
In this section, we conduct simulation studies to assess the finite sample performance of the proposed procedures and illustrate the proposed methodology on a lung cancer data set. Moreover, we compare the performance of the newly proposed method with LCQR ([6]) and nonparametric quantile regression with censored data (NQRC), which was proposed by [10].
In the proposed computational procedure, we take 100 m Y  and . The bandwidth $h^*$ can be obtained by the 10-fold cross-validation method (see [4]), and we use the short-cut strategy to select the bandwidth $h$ (see [6]).

Example 1
The data are generated from a model in which the value of the constant $c$ determines the censoring proportion. In our simulations, we consider three censoring rates: 20%, 30% and 40%. Two distance measures are used to evaluate the estimator: the mean absolute deviation error (MADE) and the mean squared error (MSE). From Tables 1 and 2, we can make the following observations: the proposed method performs better than LCQR and NQRC. Moreover, the LCQRC estimators become much more accurate as the sample size increases. Figure 1 summarizes the curve estimates for the three censoring rates of 20%, 30% and 40% with different sample sizes. It shows that the LCQRC estimate is very close to the true curve.
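The two distance measures named in this example can be sketched as simple averages over a set of evaluation points. The exact formulas are not shown in the text, so averaging over an evenly weighted grid is an assumption; the function names and the toy values are hypothetical.

```python
def made(estimates, truths):
    """Mean absolute deviation error over the evaluation points (assumed form)."""
    return sum(abs(e - t) for e, t in zip(estimates, truths)) / len(truths)


def mse(estimates, truths):
    """Mean squared error over the evaluation points (assumed form)."""
    return sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(truths)


# Toy values (hypothetical): estimated and true regression function on a grid.
m_hat = [1.0, 2.0, 4.0]
m_true = [1.0, 3.0, 2.0]
made_value = made(m_hat, m_true)  # (0 + 1 + 2) / 3
mse_value = mse(m_hat, m_true)    # (0 + 1 + 4) / 3
```

In a simulation study these quantities would be computed once per replication and then summarized by their means and standard deviations across replications.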

Example 2
It is necessary to investigate the effect of heteroscedastic errors. The observations $(U_i, Y_i, \delta_i)$, $i = 1, \ldots, n$, are generated in the same way as in Example 1.

Copyright © 2013 SciRes.

Example 3
As an illustration, we now apply the proposed LCQRC to the lung cancer data. The data contain 228 observations on ten variables. The censoring percentage is 27%, so the estimators are expected to perform well. More details about the study can be found in [11], and the dataset is available in an R package. We are interested in estimating survival time (in days) conditional on age (in years). Here, we use model (1) to fit the data. Next, we report and compare results with LCQR and NQRC for estimating the survival time. The results for LCQR, LCQRC and NQRC are given in Table 5. They show that LCQRC performs better than LCQR and NQRC. Figure 3 summarizes the results for LCQRC5. It shows that the proposal is valid.

Conclusion
In this work, we have focused on the LCQR for the nonparametric model with censored data, and its nice theoretical properties have been proven. The proposed approaches are demonstrated by simulation examples and a real data application. In addition, we believe the method can be extended to the varying coefficient model (see [7]).
The estimator is the minimizer of a criterion function, to which the identity in [13] is applied. By the conditional independence of $T$ and $C$ given $U$, and because the error is symmetric, the result follows. This completes the proof.
We obtain the estimated local weights. Consider estimating the value of $m(\cdot)$ at a point $u_0$.

Assumption A1 is needed for the local Kaplan-Meier estimator. It allows us to obtain the required local expansions and to establish the uniform consistency and the linear representation of the local Kaplan-Meier estimator.

Figure 3. Curve estimates for lung cancer data.
is any value sufficiently large to exceed all U f

The data are generated in the same way as in Example 1. The means and standard deviations of MADE, MSE, RMADE and RMSE are reported in Tables 3 and 4, respectively.

Table 4. Simulation results of $\hat{m}$ with $n = 200$ for Example 2.
Model (1) is used to fit the lung cancer data, where $T$ is $\log_{10}$(survival time) and $U$ is age/100. To evaluate the performance of our estimator, two distance measures were computed.