Concave Group Selection in the Nonparametric Additive Accelerated Failure Time Model

In this paper, we study the nonparametric accelerated failure time (AFT) additive regression model, in which the covariates have nonparametric effects, for high-dimensional censored data. We establish the asymptotic properties of the penalized estimator based on the GMCP in the nonparametric AFT model.


Introduction
With the development of the Internet, high-dimensional data have become widely collected, especially in medical research and finance, where the outcomes or responses are often censored; the study of high-dimensional censored data is therefore meaningful. However, because of the "curse of dimensionality", the analysis of high-dimensional data is extremely difficult, and special methods must be adopted. As the number of dimensions increases, the performance of methods designed for high-dimensional data structures declines rapidly. In low-dimensional spaces, Euclidean distance is often used to measure the similarity between observations; in high-dimensional spaces, this notion of similarity breaks down, which makes mining high-dimensional data severely challenging. On the one hand, the performance of data mining algorithms based on index structures deteriorates; on the other hand, many mining methods based on distances over the entire space fail. By reducing the number of dimensions, the data can be mapped from a high-dimensional to a low-dimensional space, after which low-dimensional methods can be applied. The study of effective dimensionality reduction methods is therefore significant in statistics.
A variable selection method has been proposed for the AFT model with high-dimensional predictors, consisting of a set of algorithms based on two techniques widely used for variable selection in survival analysis: the Buckley-James method and the Dantzig selector.
In this article, we apply the GMCP (Group Minimax Concave Penalty) penalty method for the first time to the study of a high-dimensional nonparametric accelerated failure time additive regression model (2.1) (MCP, [23]). The weighted least squares solution of the model under the GMCP penalty is given. We also derive the group coordinate descent algorithm used to compute the GMCP estimate in this model. Our simulation results show that the weighted least squares estimator based on the GMCP penalty works well in the high-dimensional nonparametric accelerated failure time additive regression model and is superior to the GLasso (Group Least Absolute Shrinkage and Selection Operator) penalty method.
The rest of the paper is organized as follows. In Section 2, we describe the nonparametric accelerated failure time additive regression (NP-AFT-AR) model and our research methods. In Section 3, we give the asymptotic oracle property of the GMCP estimator. Simulation results are given in Section 4, and an application to real data is given in Section 5. Section 6 concludes.

Model
In this paper, we study the following nonparametric accelerated failure time additive regression (NP-AFT-AR) model to describe the relationship between the covariates X_j and the failure time T:

log T = η_0 + Σ_{j=1}^{p} f_j(X_j) + ε,  (2.1)

where X = (X_1, …, X_p) is a 1 × p vector of covariates, the f_j are unknown smooth functions with zero means, i.e., E f_j(X_j) = 0, and ε is a random error term with mean zero and finite variance σ². We consider the high-dimensional setting with small sample size, n < p, and assume that some additive components f_j are zero. The main purpose of our research is to distinguish the nonzero components from the zero components; the second goal is to estimate the functional form of the nonzero components in order to obtain a more parsimonious model.
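To make the data-generating mechanism concrete, the following sketch simulates right-censored data from an additive AFT model of the form above. The particular component functions, the censoring distribution, and all variable names are illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 200                      # n < p: high-dimensional setting

# Covariates and two nonzero additive components (illustrative choices)
X = rng.uniform(0, 1, size=(n, p))
f1 = lambda x: np.sin(2 * np.pi * x)   # E f1(U) = 0 for U ~ Uniform(0, 1)
f2 = lambda x: x**2 - 1.0 / 3.0        # E f2(U) = 0 for U ~ Uniform(0, 1)

eps = rng.normal(0, 0.5, n)
log_T = f1(X[:, 0]) + f2(X[:, 1]) + eps   # intercept eta_0 = 0

# Random right censoring: observe Y = min(T, C) and delta = 1{T <= C}
log_C = rng.normal(1.0, 1.0, n)
log_Y = np.minimum(log_T, log_C)
delta = (log_T <= log_C).astype(int)      # 1 = event observed, 0 = censored
```

Only the first two of the p additive components are nonzero here, matching the sparsity assumption in the model.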

Weighted Least Squares Estimation
We define the Kaplan-Meier weights w_i from the censored sample and use them to form a weighted least squares criterion. To carry out variable selection at the group and individual-variable levels simultaneously, we penalize this criterion. In our case, the GMCP penalty function applies the minimax concave penalty

ρ(t; λ, γ) = λ ∫_0^t (1 − x/(γλ))_+ dx,  t ≥ 0,

to the norm of each group of coefficients.
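As a reference point for the weighting scheme, the following is a minimal sketch of Stute-type Kaplan-Meier weights for weighted least squares in censored regression; the function name and its exact interface are our own, not the paper's:

```python
import numpy as np

def kaplan_meier_weights(y, delta):
    """Kaplan-Meier (Stute) weights for weighted least squares in the AFT model.

    y     : observed (possibly censored) log failure times
    delta : censoring indicators (1 = event observed, 0 = censored)
    Returns (weights, order): weights are aligned with the sorted sample.
    """
    n = len(y)
    order = np.argsort(y, kind="stable")
    d = np.asarray(delta)[order]
    w = np.zeros(n)
    prod = 1.0
    for i in range(n):                       # i is the 0-based rank
        w[i] = d[i] * prod / (n - i)         # delta_(i) / (n - i + 1) in 1-based form
        prod *= ((n - i - 1) / (n - i)) ** d[i]
    return w, order
```

When no observation is censored, the weights reduce to the uniform weights 1/n, so the criterion reduces to ordinary least squares.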

Weighted Least Square Estimation of GMCP Penalty
where γ is a parameter that controls the concavity of ρ and λ is the penalty parameter; we require λ ≥ 0 and γ > 1. For any m × 1 vector a, ||a||_1 denotes its L1 norm. We can conduct group or component selection and estimation by minimizing the penalized criterion: if the estimated group norm is zero, the function component f_j is deleted; otherwise it is selected. Further, the individual basis functions within a group can be selected.
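The MCP function above has a simple closed form on either side of the knot t = γλ. The following sketch (our own helper, with illustrative names) evaluates it:

```python
import numpy as np

def mcp_penalty(t, lam, gamma):
    """Minimax concave penalty rho(t; lam, gamma) for t >= 0, gamma > 1.

    Integrating lam * (1 - x / (gamma * lam))_+ from 0 to t gives
    lam * t - t^2 / (2 * gamma) on [0, gamma * lam], constant afterwards.
    """
    t = np.asarray(t, dtype=float)
    return np.where(t <= gamma * lam,
                    lam * t - t**2 / (2 * gamma),
                    gamma * lam**2 / 2)
```

The penalty increases at rate λ near zero (like the lasso) but flattens out at the constant γλ²/2 beyond t = γλ, which is what reduces the bias on large coefficients.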

Computation
We derive a group coordinate descent algorithm for computing the GMCP estimate β̂. This algorithm is a natural extension of the standard coordinate descent algorithm ([27]) and has also been used to compute penalized estimates based on concave penalty functions ([28]). The group coordinate descent algorithm optimizes the objective function with respect to a single group at a time, iteratively cycling through all groups until convergence is reached. It is particularly suitable for computing β̂, since the single-group problem has a simple closed-form solution; see (2.11) below.
In particular, when γ = ∞, the single-group solution reduces to the group soft-thresholding operator (1 − λ/||z||)_+ z, which is the GLasso estimate for a single-group model ([29]).
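The two single-group operators can be sketched as follows, under the standard orthonormal-group form of the solution (function names and the specific closed form below are a sketch of the usual group firm-thresholding result, not a verbatim transcription of (2.11)):

```python
import numpy as np

def group_soft(z, lam):
    """Group soft-thresholding S(z, lam) = (1 - lam / ||z||)_+ z (GLasso solution)."""
    norm = np.linalg.norm(z)
    if norm <= lam:
        return np.zeros_like(z)
    return (1 - lam / norm) * z

def gmcp_single_group(z, lam, gamma):
    """Single-group GMCP solution (group firm thresholding), gamma > 1."""
    if np.linalg.norm(z) <= gamma * lam:
        return gamma / (gamma - 1) * group_soft(z, lam)
    return z          # no shrinkage once ||z|| exceeds gamma * lam
```

As γ grows, the rescaling factor γ/(γ − 1) tends to 1 and the thresholding region covers all of z, so the GMCP update converges to the GLasso update, consistent with the γ = ∞ limit above.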
The group coordinate descent algorithm can now be implemented as follows.
Suppose the current values of the group parameters are β_j^(s), j = 1, …, p, and let r = y − Σ_j B_j β_j^(s) denote the current residuals. For any given (λ, γ), we use (2.11) to cycle through the groups one at a time, starting from an initial value β^(0). The proposed group coordinate descent algorithm is as follows. For s = 0, 1, 2, …, carry out the following calculation until convergence; for j = 1, …, p, repeat the following steps.
Step 1: Calculate z_j = n^{-1} B_j' r + β_j^(s).
Step 2: Update β_j^(s+1) by applying the single-group solution (2.11) to (z_j; λ, γ).
Step 3: Update r ← r − B_j(β_j^(s+1) − β_j^(s)).
The last step ensures that r holds the current values of the residuals. Although the objective function is not necessarily convex, it is convex with respect to a single group when the coefficients of all the other groups are fixed.
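The three steps above can be sketched as a complete cyclic algorithm. This is a minimal implementation assuming each group's basis matrix has been standardized so that B_j'B_j / n = I (the usual normalization for this closed form); the single-group update is the standard group firm-thresholding operator, and all names are illustrative:

```python
import numpy as np

def group_soft(z, lam):
    norm = np.linalg.norm(z)
    return np.zeros_like(z) if norm <= lam else (1 - lam / norm) * z

def gmcp_update(z, lam, gamma):
    # Single-group GMCP solution (group firm thresholding)
    if np.linalg.norm(z) <= gamma * lam:
        return gamma / (gamma - 1) * group_soft(z, lam)
    return z

def group_coordinate_descent(B_list, y, lam, gamma=3.0, tol=1e-6, max_iter=200):
    """Cycle through groups until convergence; B_j assumed standardized."""
    n = len(y)
    beta = [np.zeros(B.shape[1]) for B in B_list]
    r = y.astype(float).copy()                 # current residuals
    for _ in range(max_iter):
        max_change = 0.0
        for j, Bj in enumerate(B_list):
            z = Bj.T @ r / n + beta[j]         # Step 1: partial-residual projection
            new = gmcp_update(z, lam, gamma)   # Step 2: single-group GMCP solution
            r -= Bj @ (new - beta[j])          # Step 3: keep residuals current
            max_change = max(max_change, np.max(np.abs(new - beta[j])))
            beta[j] = new
        if max_change < tol:
            break
    return beta
```

Because Step 3 updates the residuals in place, each group update costs only one matrix-vector product per group, which is what makes the algorithm attractive when p is large.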

Asymptotic Oracle Properties of GMCP
Let |A| denote the cardinality of any set A. We impose the following conditions.

(C1) The failure time T and the censoring time C are independent (random censorship model);
(C2) given the failure time T, the censoring indicator is independent of the covariates X;
(C3) the relevant second moments are finite; and
(C4) Denote τ_T and τ_C as the least upper bounds of the supports of T and C, respectively.
These assumptions correspond to the conditions in [30]. In the random censorship model, (C1) is a basic assumption. (C2) states that, given the failure time T, the censoring indicator is independent of X. (C3) requires the second moments needed for least-squares estimation. (C4) ensures that the probability of an event being observed is greater than zero, which guarantees the consistency of the estimator.
(C5) is a fundamental condition for the consistency and convergence-rate results in the proofs, and is used in the entropy calculation. In this subsection, we use a simplified notation for the oracle least squares estimator, i.e., the least squares estimator computed as if the set of nonzero components were known. Of course, it is not a real estimator, since the oracle set is unknown.
We first consider the case where the 2-norm GMCP objective function is convex. This necessarily requires c_min > 0, where c_min is the smallest eigenvalue of Σ, and recall Σ = n^{-1} B'B. As in [32], we define a function that arises from an upper bound for the tail probabilities of chi-square distributions, given in Lemma A.2 in the Appendix; it is derived from an exponential inequality for chi-square random variables of [33]. This is the price we must pay in searching for a lower-dimensional space that contains the true model.
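For reference, one standard exponential tail bound of this type is the Laurent-Massart inequality; whether it coincides exactly with the inequality in [33] depends on that reference, so we state it here only as the usual form of such a bound:

```latex
% Laurent--Massart tail bound: for $Z \sim \chi^2_m$ and any $x > 0$,
\[
  P\bigl(Z \ge m + 2\sqrt{m x} + 2x\bigr) \le e^{-x}.
\]
```

Bounds of this form control the maximum of many chi-square variables simultaneously, which is what the entropy calculation in the proofs requires.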

Numerical Simulation
In this section, we conduct simulation studies to evaluate the performance of the proposed GMCP method and to compare it with the GSCAD and GLasso methods.

Scenario 1 (Covariates Are Independent)
In this scenario, we consider independent covariates and set the intercept η_0 = 0; the logarithms of the failure times are generated from model (2.1). Based on 100 replications, the component selection results for the GMCP, GSCAD and GLasso methods are given in Table 1, the estimation results in Table 2 and Table 4, and the estimated function components are plotted in Figure 1.
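The selection columns of such tables typically count how many truly nonzero components were selected and how many zero components were wrongly selected. A minimal helper for tabulating these counts (our own illustrative function, not the paper's code) might look like:

```python
def selection_metrics(true_support, selected):
    """Counts of correctly and incorrectly selected components.

    true_support : indices of the truly nonzero components
    selected     : indices of the components chosen by the method
    Returns (correct, incorrect).
    """
    true_support, selected = set(true_support), set(selected)
    correct = len(true_support & selected)     # true nonzero components selected
    incorrect = len(selected - true_support)   # zero components wrongly selected
    return correct, incorrect
```

Averaging these counts over the 100 replications gives the kind of selection summary reported in the tables.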

Scenario 2 (Covariates Are Correlated)
In this scenario, we consider correlated covariates and set the intercept η_0 = 0; the covariates X = (X_1, …, X_p) are generated from a multivariate distribution with correlated components. The simulation results are reported in Tables 5-8. The conclusions for Scenario 2 are very similar to those for Scenario 1: as the censoring rate increases, the estimation and selection performance of all methods deteriorates. The results in Table 6 and Table 8 show that the GMCP estimator is more accurate than the estimator under the GLasso approach. The results in Table 5 and Table 7 show that the GMCP method conducts component selection more precisely than the GLasso method, which selects many zero component functions as nonzero. To examine the estimated nonparametric functions from the GMCP, we plot them along with the true function components in Figure 3.

Application to the NP-AFT-AR Model
In this section, we use the lung adenocarcinoma data of Shedden (2008) to conduct an empirical analysis illustrating the proposed method; for more information, see [36]. From Figure 5, we find that as the dimension grows, the GLasso estimates deteriorate, while the GMCP estimates are largely unaffected. The analysis of these real data therefore shows that the GMCP penalty outperforms the GLasso penalty in accuracy at the same computational cost; under the same conditions, the GMCP method is more suitable than GLasso.

Concluding Remarks
In this paper, we study the weighted least squares estimation and selection properties of the GMCP in the NP-AFT-AR model with high-dimensional data. Our simulation results show that GLasso tends to select some unimportant variables, whereas GMCP enjoys the asymptotic oracle property, which implies selection consistency.

Conflicts of Interest
The author declares no conflicts of interest regarding the publication of this paper.
Appendix
The quantities h, t, k are defined in (3.2).
This lemma is a restatement of the exponential inequality for chi-square distributions of [33].