Dependence Model Selection for Semi-Competing Risks Data

We consider the problem of selecting a model for the dependence between the terminal event and the non-terminal event in semi-competing risks data. When the relationship between the two events is unspecified, the distribution of the non-terminal event is not identifiable, so we cannot make inferences about the non-terminal event without extra assumptions. Thus, an association model for semi-competing risks data is necessary, and it is important to select an appropriate dependence model for a given data set. We construct the likelihood function for semi-competing risks data and use it to select an appropriate dependence model. Simulation studies show that the proposed approach performs well. Finally, we apply our method to a bone marrow transplant data set.


Introduction
Semi-competing risks data [1] are often encountered in biomedical studies in which a terminal event censors a non-terminal event. A common example of the terminal event is death, and the non-terminal event is usually disease progression or relapse. When the relationship between the two events is unspecified, the distribution of the non-terminal event is not identifiable, so we cannot make inferences about the non-terminal event without extra assumptions. Thus, an association model for semi-competing risks data is necessary for copula-based approaches [1]-[6], and it is important to select an appropriate dependence model for a given data set. In this paper, we construct the likelihood function under several candidate models for semi-competing risks data and use it to select the best-fitting model. In simulations, we compare our proposed methods with the method of Hsieh et al. [4].
This paper is organized as follows. In Section 2, we introduce the structure of semi-competing risks data and copula models. In Section 3, we derive the likelihood function for semi-competing risks data and introduce three model selection methods. In Section 4, we examine the finite-sample performance of the proposed methods and compare them with the method of Hsieh et al. [4]. In Section 5, we use the Bone Marrow Transplant data from Klein and Moeschberger [7] to illustrate our suggested methods. Finally, we give concluding remarks in Section 6.

Data and Model Assumption
Semi-competing risks data consist of a terminal event and a non-terminal event, where the terminal event may censor the non-terminal event. Let T be the time from the initial event (e.g., disease diagnosis) to the non-terminal event (e.g., disease progression), D be the time from the initial event to the terminal event (e.g., death), and C be the time from the initial event until loss to follow-up or the end of the study. In general, we assume that C is independent of $(T, D)$. With semi-competing risks data, we are interested in the dependence structure between T and D, which must be specified to ensure the validity of inference on the non-terminal event time T. The most commonly used model for dependence is the copula model [8]. We assume that $(T, D)$ follows a copula model,
$$\Pr(T > t, D > d) = C_\alpha\{S_T(t), S_D(d)\},$$
where $S_T$ and $S_D$ are the marginal survival functions of T and D, and $C_\alpha(\cdot, \cdot)$ is a parametric copula function defined on the unit square and indexed by a single real parameter, $\alpha$, which is related to Kendall's tau [9]. To define Kendall's tau, suppose that $(T_1, D_1)$ and $(T_2, D_2)$ are two independent realizations of the joint distribution. Then $\tau$ is the difference between the probability of concordance and the probability of discordance of these two observations, namely,
$$\tau = \Pr\{(T_1 - T_2)(D_1 - D_2) > 0\} - \Pr\{(T_1 - T_2)(D_1 - D_2) < 0\}.$$
We consider three Archimedean copula (AC) families, each characterized by its generator function: the Clayton copula, the Frank copula, and the Gumbel copula. For each family, there is a known relationship between Kendall's tau $\tau$ and the copula parameter $\alpha$.
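To make the three families concrete, the following minimal sketch codes each copula and its Kendall's tau relationship. It uses common textbook parameterizations, which are an assumption: the parameter called `theta` here need not coincide with the paper's $\alpha$, and all function names are illustrative.

```python
import math

# NOTE: the parameter name `theta` and these parameterizations are assumptions
# (common textbook forms); the paper's own alpha may be defined differently.

def clayton_cdf(u, v, theta):
    """Clayton copula, theta > 0: (u^-theta + v^-theta - 1)^(-1/theta)."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def gumbel_cdf(u, v, theta):
    """Gumbel copula, theta >= 1: exp(-[(-log u)^theta + (-log v)^theta]^(1/theta))."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))

def frank_cdf(u, v, theta):
    """Frank copula, restricted here to theta > 0 (positive association)."""
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta

def clayton_tau(theta):
    return theta / (theta + 2.0)

def gumbel_tau(theta):
    return 1.0 - 1.0 / theta

def frank_tau(theta, n_steps=2000):
    """tau = 1 - (4/theta) * (1 - D1(theta)), with D1 the first Debye function."""
    h = theta / n_steps
    # Midpoint rule for the integral of t / (exp(t) - 1) over (0, theta).
    integral = h * sum((k + 0.5) * h / math.expm1((k + 0.5) * h)
                       for k in range(n_steps))
    return 1.0 - 4.0 / theta * (1.0 - integral / theta)

def tau_to_clayton(tau):
    return 2.0 * tau / (1.0 - tau)

def tau_to_gumbel(tau):
    return 1.0 / (1.0 - tau)

def tau_to_frank(tau, lo=1e-6, hi=100.0, tol=1e-10):
    """Invert frank_tau by bisection; tau is increasing in theta on (lo, hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if frank_tau(mid) < tau:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Under these parameterizations, `tau_to_clayton(0.5)` and `tau_to_gumbel(0.5)` both return 2.0, while the Frank parameter has no closed form and is found numerically.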

The Proposed Model Selection Methods
In statistical analysis, model selection is an important issue: when several candidate models are considered for a data set, which one is the most appropriate? For semi-competing risks data, the observed data can be denoted as $\{(X_i, Y_i, \delta_{1i}, \delta_{2i}) \mid i = 1, 2, \ldots, n\}$, where $X = \min(T, D, C)$, $Y = \min(D, C)$, $\delta_1 = I(T \le Y)$, and $\delta_2 = I(D \le C)$. To specify the dependence of $(T, D)$, we usually assume that $(T, D)$ follows an Archimedean copula (AC) model. Our goal is to choose the best copula model for the dependence between T and D from a set of candidate copula models, using the likelihood function as the selection criterion. Therefore, we need to derive the likelihood function under different copula models. We derive the likelihood function by considering each possible censoring situation of an observation in turn.
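The censoring situations can be sketched as follows. This is the standard derivation under a survival copula model, offered as an illustration rather than a restoration of the paper's own displays. It assumes the usual observed variables $X = \min(T, D, C)$, $Y = \min(D, C)$, $\delta_1 = I(T \le Y)$, $\delta_2 = I(D \le C)$, marginal survival functions $S_T$, $S_D$ with densities $f_T$, $f_D$, and the joint model $\Pr(T > t, D > d) = C_\alpha\{S_T(t), S_D(d)\}$; factors involving the independent censoring time are dropped.

```latex
\begin{align*}
(\delta_1, \delta_2) = (1, 1):\quad
  & \frac{\partial^2 C_\alpha(u, v)}{\partial u \, \partial v}
    \bigg|_{u = S_T(x),\, v = S_D(y)} f_T(x)\, f_D(y), \\
(\delta_1, \delta_2) = (1, 0):\quad
  & \frac{\partial C_\alpha(u, v)}{\partial u}
    \bigg|_{u = S_T(x),\, v = S_D(y)} f_T(x), \\
(\delta_1, \delta_2) = (0, 1):\quad
  & \frac{\partial C_\alpha(u, v)}{\partial v}
    \bigg|_{u = S_T(y),\, v = S_D(y)} f_D(y), \\
(\delta_1, \delta_2) = (0, 0):\quad
  & C_\alpha\{S_T(y), S_D(y)\}.
\end{align*}
```

In the last two cases the non-terminal event is censored, so $x = y$ and only the event $T > y$ is known.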
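Summarizing the above situations, we can derive the likelihood function for a single observation. We can use the Kaplan-Meier estimator for the survival function of the terminal event time D, and we then obtain the likelihood function for the n observations as the product of the individual contributions. Based on this likelihood function, we consider three approaches to selecting an appropriate copula model. Each approach selects the candidate model with the largest maximized likelihood $L_i^j$, where $L_i^j$ is the corresponding maximized likelihood described in (4) for the $j$th candidate model based on the $i$th approach ($i = 1, 2, 3$). For the first method, $L_1^j$, we use the estimator $\hat{S}(x)$ of $S(x)$ proposed by Lakhal et al. [3], which extends that of Zheng and Klein [10], so that the likelihood depends only on the observed data $\{(X_i, Y_i, \delta_{1i}, \delta_{2i}) \mid i = 1, 2, \ldots, n\}$ and the copula parameter.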
Now this function involves only one unknown parameter, $\alpha$. Next, we apply the function optimize() in R to obtain the maximized likelihood, which defines $L_1^j$. For the second method, $L_2^j$, Kaplan and Meier [11] noted that the survival function can be written in product-limit form over $k$ observed time points, with the discrete hazards at those points collected in a vector $h$. Following Lakhal et al. [3], we can estimate the parameter $\alpha$; this estimator was also studied by Wang [2] and Heuchenne et al. [6]. From the above, we can write the likelihood as a function of $h$ given the estimate of $\alpha$ and the observed data. Next, we use the particle swarm optimization (PSO) algorithm of Kennedy and Eberhart [12], a computational approach that optimizes the likelihood function by iteratively trying to improve a candidate solution, to obtain the MLE of $h$, denoted $\hat{h}_{\mathrm{mle}}$; the resulting maximized likelihood defines $L_2^j$. For the third method, $L_3^j$, we again use the PSO algorithm to obtain the maximized likelihood, which defines $L_3^j$.
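As a rough illustration of how PSO iteratively improves candidate solutions, the sketch below implements a basic inertia-weight variant of the algorithm as a maximizer. It is not the authors' implementation: the function name `pso_maximize`, the tuning constants, and the toy objective (a stand-in for a log-likelihood in $h$) are all assumptions made for the example.

```python
import random

def pso_maximize(objective, bounds, n_particles=30, n_iter=200,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5, seed=0):
    """Maximize `objective` over a box given as a list of (low, high) bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # each particle's best position
    best_val = [objective(p) for p in pos]      # and the value attained there
    g = max(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # best position of the swarm

    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Pull toward the particle's own best and the swarm's best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_cognitive * r1 * (best_pos[i][d] - pos[i][d])
                             + c_social * r2 * (g_pos[d] - pos[i][d]))
                # Move, then clip back into the box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val > best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val > g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val

# Toy stand-in for a log-likelihood (hypothetical), maximized at (1, -2).
def toy_loglik(h):
    return -(h[0] - 1.0) ** 2 - (h[1] + 2.0) ** 2

best_h, best_value = pso_maximize(toy_loglik, [(-5.0, 5.0), (-5.0, 5.0)])
```

In a model selection setting, one would run such a maximizer once per candidate copula and keep the candidate with the largest maximized value. Because PSO is stochastic, fixing the seed makes runs reproducible.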

Simulation Studies
This section examines the performance of the proposed model selection methods and compares them with the method of Hsieh et al. [4] through several simulation settings.
Simulated data are generated from three copula models: the Clayton, Frank, and Gumbel models. For data simulated from each of these copulas, all three are considered as candidate models. We consider two settings with different censoring rates, one high and one low; in one of these settings, the censoring rate is about 22% for T and about 14% for D. In each setting, we also consider three values of Kendall's tau, $\tau = 0.2$, $0.5$, and $0.8$, and determine which candidate model best fits the simulated data.
The sample size is 100 with 500 replications.
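One replication of such a simulation could be generated along the following lines. This is a minimal sketch for the Clayton case only, using the textbook parameterization in which Kendall's tau equals theta / (theta + 2). The exponential margins, the uniform censoring time, and all rate values are hypothetical choices for illustration; they are not the paper's settings.

```python
import math
import random

def sample_clayton_pair(theta, rng):
    """Draw (u, v) from a Clayton copula (theta > 0) by conditional inversion."""
    u = 1.0 - rng.random()  # uniform on (0, 1]
    w = 1.0 - rng.random()
    v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** -theta + 1.0) ** (-1.0 / theta)
    return u, v

def simulate_semicompeting(n, tau, rate_t=1.0, rate_d=0.5, censor_max=5.0, seed=0):
    """Return n observations (x, y, delta1, delta2).

    Assumed (hypothetical) design: exponential margins for T and D with the
    given rates, and a censoring time C ~ Uniform(0, censor_max) independent
    of (T, D).
    """
    rng = random.Random(seed)
    theta = 2.0 * tau / (1.0 - tau)   # Clayton: tau = theta / (theta + 2)
    data = []
    for _ in range(n):
        u, v = sample_clayton_pair(theta, rng)
        # The copula links the survival functions, so invert S_T and S_D.
        t = -math.log(u) / rate_t     # non-terminal event time
        d = -math.log(v) / rate_d     # terminal event time
        c = rng.uniform(0.0, censor_max)
        y = min(d, c)                 # observed terminal time
        x = min(t, y)                 # observed non-terminal time
        delta1 = 1 if t <= y else 0   # non-terminal event observed
        delta2 = 1 if d <= c else 0   # terminal event observed
        data.append((x, y, delta1, delta2))
    return data

sample = simulate_semicompeting(n=100, tau=0.5)
```

Changing `censor_max` shifts the censoring rate, which is one simple way to build a high and a low censoring setting.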
Tables 1-4 summarize the simulation results, giving the percentage of replications in which each model is selected for each type of simulated data. Note that $A_i$ denotes the method based on the $A_i^j$ approach, $i = 1, 2, 3$, and $D_k$ denotes the method of Hsieh et al. [4].
From the results, the three proposed selection methods perform better than the method of Hsieh et al. [4], especially for the Frank and Gumbel models. Thus, our proposed methods are more stable than that of Hsieh et al. [4]. The tables also show that the probability of choosing the correct model rises with increasing Kendall's tau, and that performance under the low censoring rate is better than under the high censoring rate. Based on these comparisons, we recommend the first method, $A_1$, because it takes less computing time than $A_2$ and $A_3$.

Data Analysis
In this section, we apply our proposed methods to the bone marrow transplant data from Klein and Moeschberger [7]. There were 137 leukemia patients, analyzed here as an ALL group, an AML low-risk group, an AML high-risk group, and all patients combined. The results are shown in Tables 5-8. Our methods choose the Clayton copula for the ALL group, the AML high-risk group, and all patients, and select the Gumbel copula for the AML low-risk group. The method of Hsieh et al. [4] chooses the Gumbel copula for the ALL group, the AML high-risk group, and all patients, and selects the Frank copula for the AML low-risk group.

Concluding Remarks
In this paper, we study the model selection problem for semi-competing risks data. Because the non-terminal event is dependently censored by the terminal event, we cannot make inferences about the non-terminal event without extra assumptions. Thus, an association model for semi-competing risks data is necessary, and so is a method for selecting that association model. We construct the likelihood function for semi-competing risks data under a copula model and propose three approaches based on the likelihood function to select the best-fitting model.
The simulation analysis shows that the proposed methods are more stable than the method of Hsieh et al. [4], and that $A_1$ takes less time than $A_2$ and $A_3$. When covariates are present, we can stratify the data by the covariates and apply the model selection approach within each stratum; a continuous covariate can first be grouped into a categorical variable. Finally, we apply our proposed methods to the Bone Marrow Transplant data. Based on the selected model, an interesting problem is to construct a goodness-of-fit test, which is left as future work.