Transformation Models for Survival Data Analysis with Applications

When the event of interest never occurs for a proportion of subjects during the study period, survival models with a cure fraction are more appropriate for analyzing the data. To account for a possibly non-linear relationship between the response variable and the covariates, we propose a class of generalized transformation models, motivated by the transformed promotion time cure model of Zeng et al. [1], in which fractional polynomials are used instead of a simple linear combination of the covariates. Statistical properties of the proposed models are investigated, including identifiability of the parameters, asymptotic consistency, and asymptotic normality of the estimated regression coefficients. A simulation study is carried out to examine the performance of the power selection procedure. The generalized transformation cure rate models are applied to the First National Health and Nutrition Examination Survey Epidemiologic Follow-up Study (NHANES1) to examine the relationship between the survival time of patients and several risk factors.


1. Introduction
Survival data analysis is an important topic in statistics that focuses on the expected duration of time until one or more events occur, such as death or the onset of cancer in a targeted population. Survival time T is a continuous nonnegative random variable representing the time of an event. The probability that a subject survives beyond time t is given by the survival function $S(t) = P(T > t) = \exp\{-H(t)\}$, where $H(t) = \int_0^t h(u)\,du$ is the cumulative hazard, which represents the total amount of risk up to time t, and h(t) is the hazard function. In a standard survival model, it is often assumed that all subjects, including censored ones, will eventually experience the event of interest, so that the monotone decreasing survival function S(t) goes to 0 as t tends to infinity. Covariates such as gender, age, weight, blood pressure, heart rate, and stage of surgery are usually incorporated into survival models. In this paper, we assume that the covariates are independent of time.
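As a small numerical illustration of how the hazard, the cumulative hazard, and the survival function are related, the sketch below approximates $S(t) = \exp\{-H(t)\}$ by integrating a user-supplied hazard with the trapezoid rule. The function name `survival_from_hazard` and the example hazards are hypothetical choices made only for illustration.

```python
import math

def survival_from_hazard(hazard, t, steps=10_000):
    """Approximate S(t) = exp(-H(t)), where H(t) is the integral of the hazard over [0, t].

    The cumulative hazard is computed with the composite trapezoid rule.
    """
    if t <= 0:
        return 1.0  # S(0) = 1: no risk has accumulated yet
    dt = t / steps
    cumulative = 0.0
    for k in range(steps):
        a, b = k * dt, (k + 1) * dt
        cumulative += 0.5 * (hazard(a) + hazard(b)) * dt
    return math.exp(-cumulative)

# Constant hazard 0.1: H(5) = 0.5, so S(5) = exp(-0.5)
print(survival_from_hazard(lambda u: 0.1, 5.0))
# Linearly increasing hazard 2u: H(1) = 1, so S(1) = exp(-1)
print(survival_from_hazard(lambda u: 2.0 * u, 1.0))
```

For constant or linear hazards the trapezoid rule is exact up to rounding; for other hazards the number of steps controls the accuracy.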
Cox [2] introduced the idea of separating time t and the individual covariate vector x in the hazard function, which led to the popular proportional hazards model $h(t, x) = h_0(t)\exp(x'\beta)$, where $h_0(t)$ is the baseline hazard function and β is a vector of regression coefficients. However, in some situations the event of interest never occurs for a significant proportion of subjects. For example, in a cancer clinical trial the endpoint of interest is often recurrence, and for some patients the disease never relapses after treatment. These patients are considered cured. Subjects with long-term censored times can sometimes be viewed as "cured" as well. Survival models with a cure fraction are very popular for analyzing data from this type of cancer clinical trial.
Motivated by the transformed promotion time cure model introduced by Zeng et al. [1], we propose a class of generalized transformation models to characterize the non-linear relationship between the survival function S(t) and related covariates. Statistical properties of the proposed models are investigated, including identifiability, asymptotic consistency, and asymptotic normality of the estimated regression coefficients. Powers of the fractional polynomials within the proposed models are selected based on the likelihood function. A simulation study is carried out to examine the performance of the power selection procedure. The generalized transformation cure rate models are applied to coronary heart disease and cancer related medical data from both observational cohort studies and clinical trials.
The first cure rate model is the mixture cure rate model proposed by Berkson and Gage [3], which combines the cured and non-cured populations as a weighted sum. In their model, the survival function for the entire population, denoted by $S_1(t)$, is given by
$$S_1(t) = \pi + (1 - \pi) S(t),$$
where π is the proportion in the cured group and S(t) is the survival function for the non-cured group. Notice that $S_1(t)$ is not a proper survival function, since $\lim_{t \to \infty} S_1(t) = \pi > 0$. This mixture model has been discussed extensively by many authors, including Farewell [4], Gray and Tsiatis [5], Sposto et al. [6], Laska and Meisner [7], Sy and Taylor [8], and Lu and Ying [9].
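The improperness of the mixture survival function can be seen numerically. The sketch below evaluates $S_1(t) = \pi + (1 - \pi)S(t)$ with a hypothetical cure proportion of 30% and a hypothetical exponential survival function for the non-cured group; $S_1(t)$ levels off at π instead of decaying to 0.

```python
import math

def mixture_cure_survival(t, pi, s_uncured):
    """Berkson-Gage mixture: S1(t) = pi + (1 - pi) * S(t), with pi the cured proportion."""
    return pi + (1.0 - pi) * s_uncured(t)

# Hypothetical example: 30% cured, exponential (rate 1) survival for the non-cured group
s_exp = lambda t: math.exp(-t)
for t in (0.0, 1.0, 10.0, 50.0):
    print(t, mixture_cure_survival(t, 0.3, s_exp))
```

At t = 50 the exponential term is negligible and the printed value is essentially 0.3, the cure proportion.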
Even though the mixture model introduced by Berkson and Gage [3] is attractive and widely used, it has several drawbacks. One of them is that the mixture model cannot have a proportional hazards structure if the covariates are modeled through π. Ibrahim et al. [10] also pointed out that, from the Bayesian point of view, a mixture model sometimes yields an improper posterior distribution when noninformative improper priors are used. Yakovlev and Tsodikov [11], Tsodikov [12], Chen et al. [13], and Zeng et al. [1] proposed and studied the promotion time cure model. Instead of dividing the population into two sub-populations, so that some subjects are long-term survivors with probability π and the others have a proper survival function S(t), the promotion time cure model specifies the population survival function as
$$S(t) = \exp\{-\theta F(t)\}, \tag{1.1}$$
where θ > 0 and F(t) is a proper distribution function, so that the cure fraction is $\exp(-\theta)$. The promotion time cure model avoids the drawbacks of a mixture model and has a proportional hazards structure through the cure rate parameter. Chen et al. [13] also proposed classes of noninformative and informative priors for the promotion time cure rate model that lead to proper posterior distributions.
The promotion time cure rate model and the mixture cure rate model are linked by a mathematical relationship and can be rewritten in a uniform format. Zeng et al. [1] proposed a general promotion time cure model with transformation. Their model includes the proportional hazards model and the proportional odds model as special cases. To take into account the unknown and unobservable risk factors of each individual, they introduced a subject-specific frailty variable $\xi_i$ into model (1.1). The survival function for the time to relapse is then given by
$$S(t \mid \xi_i) = \exp\{-\xi_i \theta F(t)\}. \tag{1.2}$$
Different parametric distributions may be used for the frailty $\xi_i$. The most commonly used one is the gamma distribution with mean 1 and variance γ, where γ ≥ 0; the mean of the gamma distribution must equal one for the model to be identifiable. Taking expectations with respect to $\xi_i$ on both sides of (1.2), the survival function becomes
$$S(t) = \{1 + \gamma \theta F(t)\}^{-1/\gamma}, \tag{1.3}$$
with the convention that γ = 0 gives $\exp\{-\theta F(t)\}$. As Zeng et al. [1] pointed out, (1.3) motivates a very wide class of transformation cure models of the form
$$S(t \mid X) = \exp[-G_\gamma\{\theta(X) F(t)\}], \tag{1.4}$$
where $G_\gamma(\cdot)$ is a known transformation, for example from the logarithmic family
$$G_\gamma(x) = \frac{\log(1 + \gamma x)}{\gamma} \ (\gamma > 0), \qquad G_0(x) = x, \tag{1.5}$$
or from the Box-Cox type family
$$G_\gamma(x) = \frac{(1 + x)^{\gamma} - 1}{\gamma} \ (\gamma > 0), \qquad G_0(x) = \log(1 + x). \tag{1.6}$$
The proportional hazards model in (1.1) is a special case of the transformation families (1.5) and (1.6), corresponding to γ = 0 and γ = 1, respectively. Another popular survival model, the proportional odds model, is also a special case of (1.5) and (1.6), when γ = 1 and γ = 0, respectively.
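The gamma-frailty step from (1.2) to (1.3) rests on the Laplace transform of a gamma variable with mean 1 and variance γ: $E\{\exp(-\xi a)\} = (1 + \gamma a)^{-1/\gamma}$. The sketch below checks this identity by Monte Carlo, with a playing the role of θF(t) and assuming (1.2) has the form $S(t \mid \xi) = \exp\{-\xi\theta F(t)\}$. The sample size, seed, value of a, and function names are arbitrary choices for illustration.

```python
import math
import random

def frailty_marginal_survival(a, gamma):
    """Closed form of E[exp(-xi * a)] for xi ~ Gamma(shape=1/gamma, scale=gamma)."""
    if gamma == 0:
        return math.exp(-a)  # degenerate frailty: xi = 1 with probability one
    return (1.0 + gamma * a) ** (-1.0 / gamma)

def monte_carlo_marginal(a, gamma, n=100_000, seed=1):
    """Monte Carlo estimate of the same expectation."""
    rng = random.Random(seed)
    # random.gammavariate(alpha, beta) uses shape alpha and scale beta, so the mean is 1
    draws = (rng.gammavariate(1.0 / gamma, gamma) for _ in range(n))
    return sum(math.exp(-xi * a) for xi in draws) / n

a = 1.5  # stands in for theta * F(t)
print(frailty_marginal_survival(a, 0.5), monte_carlo_marginal(a, 0.5))
```

The two printed numbers should agree to about two decimal places; increasing `n` tightens the agreement.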
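From model (1.4), since F(t) → 1 as t → ∞, the cure fraction is $\lim_{t \to \infty} S(t \mid X) = \exp[-G_\gamma\{\theta(X)\}]$. The covariates can be modeled through a known, strictly positive, increasing link function $\theta(X) = \eta(X'\beta)$, where β is the regression vector including an intercept term.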
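The special cases of the two transformation families and the cure fraction $\exp[-G_\gamma\{\theta(X)\}]$ can be checked directly. The sketch below codes the logarithmic family (1.5) and the Box-Cox type family (1.6) and confirms that both reduce to the proportional hazards cure fraction $\exp(-\theta)$ and the proportional odds cure fraction $1/(1 + \theta)$ at the stated values of γ. The value θ = exp(0.4), which takes η to be the exponential function, is a hypothetical choice.

```python
import math

def G_log(x, gamma):
    """Family (1.5): log(1 + gamma*x)/gamma, with the limit x at gamma = 0."""
    if gamma == 0:
        return x
    return math.log1p(gamma * x) / gamma

def G_boxcox(x, gamma):
    """Family (1.6): ((1 + x)^gamma - 1)/gamma, with the limit log(1 + x) at gamma = 0."""
    if gamma == 0:
        return math.log1p(x)
    return ((1.0 + x) ** gamma - 1.0) / gamma

def survival(F_t, theta, G, gamma):
    """S(t | X) = exp{-G_gamma(theta(X) F(t))}, where F_t = F(t) lies in [0, 1]."""
    return math.exp(-G(theta * F_t, gamma))

def cure_fraction(theta, G, gamma):
    """Limit of S(t | X) as t grows, using F(infinity) = 1."""
    return survival(1.0, theta, G, gamma)

theta = math.exp(0.4)  # hypothetical eta(X'beta) with eta = exp
# Proportional hazards: (1.5) with gamma = 0 and (1.6) with gamma = 1 both give exp(-theta)
print(cure_fraction(theta, G_log, 0), cure_fraction(theta, G_boxcox, 1))
# Proportional odds: (1.5) with gamma = 1 and (1.6) with gamma = 0 both give 1/(1 + theta)
print(cure_fraction(theta, G_log, 1), cure_fraction(theta, G_boxcox, 0))
```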
In this paper, we extend the transformed promotion time cure model proposed by Zeng et al. [1] to a more general class of transformation models, in which fractional polynomials are used instead of the simple linear combination of the covariates. We investigate the statistical properties of the proposed models and discuss estimation and model selection procedures. The rest of the paper is organized as follows. In Section 2, we introduce the generalized transformation models and study their identifiability and asymptotic properties. In Section 3, simulation studies are conducted to assess the performance of the power selection procedure. In Section 4, the proposed models are applied to real data and compared with other models. Conclusions and discussion are given in Section 5. Proofs of the theorems in Section 2 are provided in the Appendix.

2. Proposed Models and Their Properties
In survival data analysis, the relationship between hazard rates and covariates is quite often nonlinear. Motivated by Zeng et al. [1], we propose a generalized transformation cure model that uses a general additive function of the covariates in place of the linear predictor X'β inside the strictly positive increasing link function η(·). Additive models were introduced by Stone [15] and are defined by $f(X) = \sum_{j=1}^{p} f_j(X_j)$, where the $f_j(\cdot)$ are arbitrary univariate functions, one for each covariate. Additive models retain the important additive feature of linear regression models and are much more flexible in practice. Royston and Altman [14] suggested using fractional polynomials, a family of functions of positive covariates, for each $f_j(X_j)$. For simplicity, consider a single covariate X first. A fractional polynomial of degree m is defined as
$$\phi_m(X; \beta, p) = \beta_0 + \sum_{j=1}^{m} \beta_j X^{(p_j)}, \qquad X^{(p_j)} = \begin{cases} X^{p_j}, & p_j \neq 0, \\ \log X, & p_j = 0, \end{cases} \tag{2.1}$$
where $p = (p_1, \ldots, p_m)$ with $p_1 \le \cdots \le p_m$ is a real-valued vector of powers. When a power is repeated, each repetition is multiplied by a further factor of log X; for example, the powers (p, p) give the terms $X^{(p)}$ and $X^{(p)}\log X$. Royston and Altman [14] pointed out that special attention should be paid to low-order fractional polynomials of degree one and two, since models of degree higher than two are rarely used in practice. They also suggested that the powers be chosen from the set {−2, −1, −0.5, 0, 0.5, 1, 2, …, max(3, m)}, since this set is rich enough to cover the conventional polynomials of interest. The best powers can then be determined by the maximum likelihood method.
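The definition in (2.1), together with Royston and Altman's convention that each repetition of a power picks up an extra factor of log X, can be sketched as follows. The function names are hypothetical, and the coefficients in `betas` are assumed to be listed in the same order as the sorted powers.

```python
import math

FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)  # candidate set for degree m <= 3

def fp_power(x, p):
    """X^(p): x**p for p != 0 and log(x) for p == 0; x must be positive."""
    if x <= 0:
        raise ValueError("fractional polynomials require positive covariates")
    return math.log(x) if p == 0 else x ** p

def fp_terms(x, powers):
    """Basis terms for sorted powers p1 <= ... <= pm.

    Each time a power repeats, the term gains one more factor of log(x).
    """
    terms, prev, reps = [], None, 0
    for p in sorted(powers):
        reps = reps + 1 if p == prev else 0
        terms.append(fp_power(x, p) * math.log(x) ** reps)
        prev = p
    return terms

def fp_value(x, beta0, betas, powers):
    """phi_m(x) = beta0 + sum_j beta_j * term_j(x)."""
    return beta0 + sum(b * t for b, t in zip(betas, fp_terms(x, powers)))

print(fp_terms(2.0, (0, 0)))       # terms log(2) and log(2)**2
print(fp_terms(4.0, (-1, 2, 2)))   # terms 4**-1, 4**2 and 4**2 * log(4)
print([fp_power(1.5, p) for p in FP_POWERS])
```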
For some data sets, especially data from medical studies, fractional polynomials may give a better fit than conventional polynomials. In our proposed models we use a fractional polynomial instead of X'β in the link function η(·). Even though fractional polynomials of degree higher than two are not used very often in practice, we consider the following general form for the function θ(X):
$$\theta(X) = \eta\Big( \sum_{j=0}^{q} \beta_j X_j + \sum_{i=q+1}^{p} \sum_{k=1}^{3} \beta_{ik} X_i^{(\alpha_{ik})} \Big), \tag{2.2}$$
where $X_1, \ldots, X_q$ are categorical covariates, such as ordinal covariates or dummy variables, and $X_{q+1}, \ldots, X_p$ are positive continuous covariates. An intercept term $\beta_0$ is included in (2.2) by setting $X_0 \equiv 1$. Moreover, we assume that $\alpha_i = (\alpha_{i1}, \alpha_{i2}, \alpha_{i3}) \neq 0$ for $q + 1 \le i \le p$, i.e., a fractional polynomial of degree three is used for each continuous covariate $X_i$, with the terms $X_i^{(\alpha_{ik})}$ defined as in (2.1). In a typical survival analysis setting, survival times are often right censored: for some subjects we do not know exactly when the failure occurred, but we do know that the survival time exceeds some time point C. Suppose that there are n subjects subject to right censoring. For the ith individual, the survival time and the fixed censoring time are denoted by $T_i$ and $C_i$, respectively. The $T_i$'s are assumed to be independent and identically distributed with distribution function F.
The observed time for the ith subject is $Y_i = \min(T_i, C_i)$. The exact survival time $T_i$ is observed only if the failure occurs before censoring; otherwise $Y_i$ equals the censoring time. Each subject is described by the triple of random variables $(Y_i, X_i, \Delta_i)$, where $X_i$ is the covariate vector and $\Delta_i$ is the failure indicator,
$$\Delta_i = \begin{cases} 1, & T_i \le C_i, \\ 0, & T_i > C_i. \end{cases}$$
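In a proportional hazards model, the regression coefficient β is estimated by maximizing the partial likelihood function
$$L(\beta) = \prod_{i:\, \Delta_i = 1} \frac{\exp(X_i'\beta)}{\sum_{j:\, Y_j \ge Y_i} \exp(X_j'\beta)}.$$
In model (1.4) with link function (2.2), if the parameter γ is given, write $\theta_i = \theta(X_i)$ and let F be a step function with jumps $F\{Y_i\}$ at the observed failure times. Following the discussion in Zeng et al. [1], the maximum likelihood estimates of (β, F) maximize
$$L_n(\beta, F) = \prod_{i:\, \Delta_i = 1} \theta_i F\{Y_i\}\, G_\gamma'\{\theta_i F(Y_i)\}\, e^{-G_\gamma\{\theta_i F(Y_i)\}} \prod_{i:\, \Delta_i = 0,\, Y_i < \infty} e^{-G_\gamma\{\theta_i F(Y_i)\}} \prod_{i:\, Y_i = \infty} e^{-G_\gamma(\theta_i)}. \tag{2.4}$$
The three products in (2.4) correspond to failures, censored cases, and subjects who never experience failure or censoring, respectively. The estimate of (β, F) can be obtained by the nonparametric maximum likelihood approach, using the Newton-Raphson algorithm iteratively.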
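The sketch below evaluates the logarithm of the likelihood (2.4) for a given γ, reading its three products as failures, finite censored times, and subjects who neither fail nor are censored (coded here as `y = math.inf`). It assumes the logarithmic family (1.5) for $G_\gamma$ and a step-function F whose jumps sit at the observed failure times. The helper names `G15`, `dG15`, and `cure_loglik` are hypothetical, and the code only evaluates the likelihood; it does not carry out the maximization.

```python
import math

def G15(x, gamma):
    """Transformation family (1.5)."""
    return x if gamma == 0 else math.log1p(gamma * x) / gamma

def dG15(x, gamma):
    """Derivative of (1.5): 1/(1 + gamma*x), which equals 1 when gamma = 0."""
    return 1.0 / (1.0 + gamma * x)

def cure_loglik(data, theta, jumps, gamma):
    """Log-likelihood for S(t|X) = exp{-G(theta(X) F(t))} with a step-function F.

    data:  list of (y, delta); y = math.inf marks subjects never failed nor censored
    theta: list of positive values theta(X_i)
    jumps: dict {failure time: jump size F{y}}, with jump sizes summing to 1
    """
    times = sorted(jumps)

    def F(y):
        return sum(jumps[s] for s in times if s <= y)

    ll = 0.0
    for (y, delta), th in zip(data, theta):
        if delta == 1:            # failure: density contribution
            a = th * F(y)
            ll += math.log(th) + math.log(jumps[y]) + math.log(dG15(a, gamma)) - G15(a, gamma)
        elif math.isinf(y):       # cured and never censored: F(infinity) = 1
            ll -= G15(th, gamma)
        else:                     # right censored at a finite time y
            ll -= G15(th * F(y), gamma)
    return ll

data = [(1.0, 1), (2.0, 1), (1.5, 0), (math.inf, 0)]
print(cure_loglik(data, [1.0] * 4, {1.0: 0.5, 2.0: 0.5}, gamma=0.0))
```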

2.1. Model Identifiability
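For the statistical properties of our proposed models, we first discuss the identifiability of the generalized transformation models. Suppose that we use models (1.4) and (1.5) with the link function defined in (2.2). The observed-data likelihood function of the parameters (γ, β, F) depends on them only through the conditional survival function S(y | X) in (1.4) and its derivative in y, so identifiability amounts to showing that S(y | X) determines (γ, β, F) uniquely.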
The following two lemmas give sufficient conditions for identifiability of a more general class of transformations that includes the transformation (1.5) as a special case. Proofs of the lemmas are given in the Appendix.

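Lemma 1. Suppose that $G_\gamma(\cdot)$ satisfies conditions (G1) and (G2), where (G1) requires $G_\gamma(x)$ to be strictly monotonic and twice continuously differentiable, and that θ(X) takes at least two different nonzero values. If $G_\gamma\{\theta(X)F(y)\} = G_{\tilde\gamma}\{\tilde\theta(X)\tilde F(y)\}$ for all y and X, then $\gamma = \tilde\gamma$, $\theta(X) = \tilde\theta(X)$, and $F = \tilde F$.

It can be shown that the transformation family given in (1.5) satisfies both conditions (G1) and (G2). Direct differentiation of (1.5) gives $G_\gamma(0) = 0$, $G_\gamma'(0) = 1$, and $G_\gamma''(0) = -\gamma$. Other transformation families can also be considered as long as conditions (G1) and (G2) hold. For example, the Box-Cox type transformation (1.6) discussed in Zeng et al. [1] also satisfies (G1) and (G2), with $G_\gamma'(0) = 1$ and $G_\gamma''(0) = \gamma - 1$.

Now consider the function
$$\theta(X) = \eta\Big( \sum_{j=0}^{q} \beta_j X_j + \sum_{i=q+1}^{p} \sum_{m,n} \beta_{imn} X_i^{p_{im}} (\log X_i)^{q_{in}} \Big), \tag{2.6}$$
where η(·) is strictly monotonic, $p_{im}$ and $q_{in}$ are not both zero, and $\sum_{m,n}$ denotes a finite summation, since the number of parameters in our proposed models is finite.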
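Conditions (G1) and (G2) concern the smoothness of $G_\gamma$ and its behavior near zero. The sketch below checks, by central finite differences, the derivatives at zero of the two families, which follow from direct differentiation: for (1.5), $G_\gamma'(0) = 1$ and $G_\gamma''(0) = -\gamma$; for the Box-Cox family (1.6), $G_\gamma'(0) = 1$ and $G_\gamma''(0) = \gamma - 1$. In both cases the ratio $G_\gamma''(0)/G_\gamma'(0)$ is strictly monotone in γ, the kind of property a derivative-ratio argument at s = 0 relies on; reading this as the content of (G2) is an assumption made here, not a statement from the text. The step size `h` and function names are hypothetical.

```python
import math

def G15(x, g):
    """Logarithmic family (1.5)."""
    return x if g == 0 else math.log1p(g * x) / g

def Gbc(x, g):
    """Box-Cox type family (1.6)."""
    return math.log1p(x) if g == 0 else ((1.0 + x) ** g - 1.0) / g

def derivs_at_zero(G, g, h=1e-4):
    """Central finite-difference estimates of G'(0) and G''(0)."""
    d1 = (G(h, g) - G(-h, g)) / (2.0 * h)
    d2 = (G(h, g) - 2.0 * G(0.0, g) + G(-h, g)) / h ** 2
    return d1, d2

for g in (0.0, 0.5, 1.0, 2.0):
    print(g, derivs_at_zero(G15, g), derivs_at_zero(Gbc, g))
```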
The function in (2.6) is more general than the one defined in (2.2). The following lemma shows that the parameters $\beta_j$ and $\beta_{imn}$ in this function are all identifiable.
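Lemma 2. For the function θ(X) in (2.6), if $\theta(X) = \tilde\theta(X)$ for all X, then $\beta_j = \tilde\beta_j$ and $\beta_{imn} = \tilde\beta_{imn}$ for all j, i, m, and n.
Based on the results in Lemma 1 and Lemma 2, we have the following theorem on the identifiability of the generalized transformation models.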
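Theorem 1. For the generalized transformation models defined in (1.4) and (1.5) with the link function specified in (2.2), if $S(y \mid X; \gamma, \beta, F) = S(y \mid X; \tilde\gamma, \tilde\beta, \tilde F)$ for any $y \in \mathbb{R}^+$ and any X, then $\gamma = \tilde\gamma$, $\beta = \tilde\beta$, and $F = \tilde F$. In other words, the generalized transformation models are identifiable.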

2.2. Estimation
Zeng et al. [1] discussed semiparametric transformation models for survival data with a cure fraction and established theorems describing the asymptotic properties of the maximum likelihood estimates of (β, F), where β is the vector of coefficients and F(·) is the promotion time cumulative distribution function in the model. In our proposed generalized transformation cure models, fractional polynomials are used instead of the simple linear combination of the covariates. Similar to Theorems 1 and 2 in Zeng et al. [1], we can prove the asymptotic properties of the maximum likelihood estimates of (β, F) in the proposed models. To obtain consistency and asymptotic normality, we make the following assumptions:
(C1) The covariate X belongs to a compact set $\mathcal{X}$.
(C2) The vector of regression coefficients β belongs to a compact set $\mathcal{B}_0$. The true value of β, denoted by $\beta_0$, belongs to the interior of $\mathcal{B}_0$.
(C3) F is a distribution function with jumps at the observed failure times (Δ = 1). The true F, denoted by $F_0$, is differentiable.
(C5) The positive link function η(·) is strictly increasing and twice continuously differentiable in X.
(C6) The transformation G is such that the required derivative of G(x) exists and is continuous.

3. Simulations
In this section, we conduct simulations to study the empirical properties of the generalized transformation models and to examine the performance of the proposed power selection procedure. The model used in this simulation was given in Zeng et al. [1] and has a survival function of the form
$$S(t \mid X) = \exp[-G_\gamma\{\theta(X) F(t)\}], \tag{3.1}$$
where $G_\gamma(\cdot)$ belongs to the transformation family (1.5). For the purpose of illustration, only one continuous covariate $X_1$ and one categorical covariate $X_2$ are considered in the simulation. Specifically, we take γ = 0 in (1.5) and consider the link function
$$\theta(X) = \eta\big(\beta_0 + \beta_1 X_1^{(p_0)} + \beta_2 X_2\big),$$
where $p_0$ is a power varying from −2 to 2 and, following (2.1), $X_1^{(0)} = \log X_1$. Covariate $X_1$ is uniformly distributed on [0.5, 2], and covariate $X_2$ is a Bernoulli random variable with success probability 0.5. The coefficients $\beta_0$, $\beta_1$, and $\beta_2$ are fixed constants. Survival times of subjects with covariates $X_1$ and $X_2$ are then generated. Each subject has a chance of being cured, and we set the survival time T equal to ∞ for the cured population. Under (3.1) and (1.5), the ith individual in the simulated data set is cured with probability $\lim_{t \to \infty} S(t \mid X_i) = \exp[-G_\gamma\{\theta(X_i)\}]$. Otherwise, with probability $1 - \exp[-G_\gamma\{\theta(X_i)\}]$, the life time $T_i$ is finite and is generated by the inverse transform method from the conditional distribution of the uncured subjects, using a random variable U uniformly distributed on [0, 1]. Each subject is right-censored with probability q < 1, for example q = 80%; that is, the censoring time $C_i$ of the ith individual equals ∞ with probability 20%. For the rest of the population, the censoring time is generated from an exponential distribution with mean one.
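The cure-and-censoring scheme described above can be sketched as follows: cured subjects get T = ∞, uncured subjects get a finite T by inverse transform, and each subject is censored with probability q, otherwise C = ∞. With γ = 0 the model is $S(t \mid X) = \exp\{-\theta F(t)\}$, so the cure probability is $\exp(-\theta)$ and, given that a subject is not cured, $P(T \le t) = \{1 - e^{-\theta F(t)}\}/(1 - e^{-\theta})$. The choices η = exp, F equal to the Exp(1) distribution function, the coefficient values, and the seed are hypothetical assumptions for illustration, not the settings used for Tables 1 and 2.

```python
import math
import random

def simulate(n, p0, beta=(-0.5, 1.0, -0.5), q=0.8, seed=2024):
    """Simulate rows (Y, Delta, X1, X2) from S(t|X) = exp{-theta(X) F(t)} (gamma = 0).

    Hypothetical assumptions: eta = exp, F = Exp(1) CDF, and the values in `beta`.
    """
    rng = random.Random(seed)
    b0, b1, b2 = beta
    rows = []
    for _ in range(n):
        x1 = rng.uniform(0.5, 2.0)               # continuous covariate on [0.5, 2]
        x2 = 1 if rng.random() < 0.5 else 0      # Bernoulli(0.5) covariate
        x1p = math.log(x1) if p0 == 0 else x1 ** p0
        theta = math.exp(b0 + b1 * x1p + b2 * x2)
        if rng.random() < math.exp(-theta):      # cured with probability exp(-theta)
            t = math.inf
        else:
            # Solve (1 - exp(-theta v)) / (1 - exp(-theta)) = U for v = F(T)
            u = rng.random()
            v = -math.log(1.0 - u * (1.0 - math.exp(-theta))) / theta
            t = -math.log(1.0 - v)               # inverse of the Exp(1) CDF
        # Censored with probability q (exponential, mean one); otherwise C = infinity
        c = rng.expovariate(1.0) if rng.random() < q else math.inf
        y = min(t, c)
        delta = 1 if math.isfinite(t) and t <= c else 0  # cured and uncensored: Delta = 0
        rows.append((y, delta, x1, x2))
    return rows

rows = simulate(2000, p0=-1.0)
print(sum(d for _, d, _, _ in rows), sum(math.isinf(y) for y, _, _, _ in rows))
```

The explicit `math.isfinite(t)` check matters: without it, a cured subject who is never censored (T = C = ∞) would be wrongly recorded as a failure.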
Table 1 shows the power selection results under the proposed generalized transformation model based on 200 simulated data sets with q = 80% and sample sizes 2000 and 5000. The columns labeled "mean" give the average of the selected powers, and the columns labeled "freq." give the number of times the true power was selected in the 200 simulations. When the sample size is 5000, the power selection procedure works well: the true power is selected in more than 50% of the simulations, and the mean of the selected powers is very close to the true value in most cases. For example, when $p_0 = -1$ the true power is selected 104 times and the mean of the selected powers is −1.005. When the sample size decreases to 2000, the power selection results are less accurate. For both sample sizes, the rates of selecting the true power are higher when $p_0 = -2$ or $p_0 = 2$ than in the other cases, because powers are selected only in the range from −2 to 2. This also explains why the means of the selected powers tend to be smaller in absolute value when $p_0 = -2$ or $p_0 = 2$: in these cases the selected power can err only toward zero. If powers beyond −2 and 2 were allowed, these two cases would be less likely to be underestimated in absolute value. Table 2 presents more detailed power selection results with n = 5000 and q = 80% based on 200 simulations, where each column represents one scenario. For example, when $p_0 = -1$, the true power −1 is selected 104 times, powers −1.5 and −0.5 are selected 44 and 42 times, respectively, and powers −2 and 0 are selected 5 times each. These results indicate that the selected powers are all centered around the true power.
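The summary statistics reported for $p_0 = -1$ can be recomputed from the Table 2 frequencies. Reading the two powers selected 5 times each as −2 and 0, the counts sum to 200 and reproduce both the reported mean of −1.005 and the 104/200 rate of selecting the true power.

```python
# Frequencies of the selected power when the true power is p0 = -1 (n = 5000, q = 80%)
counts = {-2.0: 5, -1.5: 44, -1.0: 104, -0.5: 42, 0.0: 5}

n_sims = sum(counts.values())                                   # number of simulated data sets
mean_power = sum(p * k for p, k in counts.items()) / n_sims     # average selected power
hit_rate = counts[-1.0] / n_sims                                # share selecting the true power
print(n_sims, mean_power, hit_rate)  # 200 -1.005 0.52
```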
In this simulation we assume that the probability of each subject being censored is q = 80%. In fact, the value of q has little effect on the performance of the power selection procedure: when q takes different values while the other factors in the simulation remain the same, the power selection results show a pattern very similar to that for q = 80%.
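The power selection procedure evaluated above picks, from a finite grid, the power whose fitted model attains the largest log-likelihood. The sketch below shows that mechanism on a deliberately simplified stand-in, a Gaussian linear model $y = b_0 + b_1 x^{(p)} + \varepsilon$ whose profile log-likelihood has a closed form, rather than the cure model (3.1). The data-generating values, seed, and function names are hypothetical; only the grid A and the definition of $x^{(p)}$ follow the paper.

```python
import math
import random

GRID = (-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2)  # the power set A

def transform(x, p):
    """x^(p) as in (2.1): x**p for p != 0, log(x) for p == 0."""
    return math.log(x) if p == 0 else x ** p

def profile_loglik(xs, ys, p):
    """Gaussian profile log-likelihood (up to a constant) of y = b0 + b1 * x^(p) + error."""
    z = [transform(x, p) for x in xs]
    n = len(z)
    zbar, ybar = sum(z) / n, sum(ys) / n
    szz = sum((zi - zbar) ** 2 for zi in z)
    szy = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, ys))
    b1 = szy / szz                     # least-squares slope
    b0 = ybar - b1 * zbar              # least-squares intercept
    rss = sum((yi - b0 - b1 * zi) ** 2 for zi, yi in zip(z, ys))
    return -0.5 * n * math.log(rss / n)

def select_power(xs, ys, grid=GRID):
    """Return the grid power with the largest profile log-likelihood."""
    return max(grid, key=lambda p: profile_loglik(xs, ys, p))

rng = random.Random(7)
xs = [rng.uniform(0.5, 2.0) for _ in range(500)]
ys = [1.0 + 2.0 / x + rng.gauss(0.0, 0.05) for x in xs]   # true power is -1
print(select_power(xs, ys))
```

In the cure-model setting, `profile_loglik` would be replaced by the maximized log-likelihood of (2.4) for each candidate power; the grid search itself is unchanged.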

4. Applications
In this section, we illustrate the application of the proposed generalized transformation models and compare them with the Cox proportional hazards model and the Zeng et al. [1] transformation cure model by analyzing data from the First National Health and Nutrition Examination Survey Epidemiologic Follow-up Study (NHANES1). The NHANES1 data set is from the Diverse Populations Collaboration (DPC), a pooled database contributed by a group of investigators to examine heterogeneity of results in epidemiological studies. The database includes 21 observational cohort studies, 3 clinical trials, and 3 national samples. In NHANES1, information on 14,407 individuals was collected in four cohorts from 1971 to 1992. In this analysis, we use data from two of the four cohorts, the black female cohort and the black male cohort. After dropping all observations with missing values, 2027 patients remain in these two cohorts, including 1265 black females and 762 black males. The survival times of the 2027 patients are used as the response variable; the endpoint is overall survival, with follow-up through 1992. In the two cohorts, 848 patients (about 42% of the total) had died by the end of follow-up, with a maximum observed survival time of 7691 days. The survival times of the remaining 1179 patients were right censored, and 115 of them were followed for longer than 7691 days. We consider these 115 patients as cured subjects.
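The rule used above to define cured subjects (censored patients followed beyond the largest observed failure time) can be written as a short helper. The function name `label_cured` and the toy inputs are hypothetical.

```python
def label_cured(times, events):
    """Flag censored subjects (event == 0) followed beyond the largest observed failure time."""
    max_failure = max(t for t, d in zip(times, events) if d == 1)
    return [d == 0 and t > max_failure for t, d in zip(times, events)]

# Toy data: the largest failure time is 7691, so only censored times above it are flagged
print(label_cured([100, 7691, 8000, 500, 9000], [1, 1, 0, 0, 0]))
```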
Covariates were selected by fitting the Cox model with the stepwise backward elimination algorithm, and the selected covariates are used in the different survival models. The fitted Cox model shows that males have a higher hazard rate than females and older patients have a higher hazard rate than younger patients. People with diabetes or coronary heart disease face a higher hazard rate than people without such disease. The hazard of death increases by 0.4% when the Sbp level of a patient increases by 1 mmHg. The results also show that the higher the BMI of a patient, the lower the hazard rate she or he faces. In particular, the hazard decreases by about 1.2% when BMI increases by 1 kg/m², which is not quite reasonable. Values of BMI typically range from 15 kg/m² to 60 kg/m²; a BMI from 21 kg/m² to 25 kg/m² is considered normal weight, and 30 kg/m² or greater is considered obese. It is well known that obesity increases the hazard of developing many coronary heart diseases or even death. The relationship between survival time and BMI may therefore not be linear, and a transformation of the covariate BMI may be needed for the NHANES1 data.
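Under the proportional hazards model, each one-unit increase in a covariate multiplies the hazard by exp(β), so the reported effects correspond to exp(β) ≈ 1.004 per mmHg of Sbp and exp(β) ≈ 0.988 per kg/m² of BMI. The sketch below compounds these per-unit effects over larger changes; the 20 mmHg and 10 kg/m² changes are hypothetical illustrations. For example, the linear BMI term implies that moving from 25 kg/m² to 35 kg/m² (into the obese range) lowers the hazard by roughly 11%, which is the implausible pattern that motivates transforming BMI.

```python
def hazard_ratio_from_percent(pct_per_unit, units):
    """Hazard ratio for a change of `units`, when each unit changes the hazard by pct_per_unit percent."""
    return (1.0 + pct_per_unit / 100.0) ** units

print(hazard_ratio_from_percent(0.4, 20))    # Sbp higher by 20 mmHg, about 1.083
print(hazard_ratio_from_percent(-1.2, 10))   # BMI higher by 10 kg/m^2, about 0.886
```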
Within the Zeng et al. [1] model with transformation family (1.5), the transformation γ = 0 is chosen by maximum likelihood. The observed log-likelihood for different values of γ is shown in Figure 1. The corresponding estimates of regression coefficients are summarized in Table 5; the results are comparable with those of the Cox proportional hazards model.
There are three continuous covariates in our analysis: Age, BMI, and Sbp. The main relationship of interest is between mortality and BMI. In the next step, we focus on choosing an appropriate power for BMI within our proposed models from the set A = (−2, −1.5, −1, −0.5, 0, 0.5, 1, 1.5, 2). To do so, we fit models (4.1)-(4.4). In model (4.1), when we fix Age and Sbp, power $p_{01} = -2$ is selected for BMI; the observed log-likelihood is plotted in Figure 2(a). The corresponding estimates of regression coefficients are listed in Table 6. We now compare the Cox model, the Zeng et al. [1] models, and the proposed models by using the Brier score. The Brier score was originally proposed by Brier [16] to verify the accuracy of weather forecasts and was later adapted to survival data. At a time point t it can be written as
$$BS(t) = \frac{1}{n} \sum_{i=1}^{n} \{ I(Y_i > t) - \hat S(t \mid X_i) \}^2,$$
where n is the total sample size, $Y_i$ is the observed survival time of the ith patient, and $\hat S(t \mid X_i)$ is the predicted survival probability of the ith patient at time t under the fitted model. The Brier score ranges from 0 to 1 and takes its minimum value 0 for a perfect prediction of survival status; the lower the Brier score, the better the prediction. To compare the models, we calculated the Brier scores at the first quartile Q1, the median Q2, and the third quartile Q3 of the 848 uncensored survival times in the NHANES1 study. The results are summarized in Table 7. The proposed model has the smallest Brier scores at all three time points. For example, at the median uncensored survival time Q2 = 3894.5 days, the Brier score is 0.1396 for the Cox model and 0.1308 for the Zeng et al. [1] model, and it drops to 0.1297 for the selected proposed model. This indicates that the selected proposed model predicts the survival outcome at least as well as the other two models.
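The sketch below computes an unweighted Brier score at a time t: the mean squared difference between the observed survival status I(Y_i > t) and the predicted survival probability. It assumes that simple form and ignores censoring; for censored data, versions weighted by the inverse probability of censoring are commonly used instead. The function name and the toy inputs are hypothetical.

```python
def brier_score(t, observed_times, predicted_survival):
    """Unweighted Brier score at time t: mean of (I(Y_i > t) - S_hat(t | X_i))**2."""
    n = len(observed_times)
    return sum(
        (float(y > t) - s) ** 2 for y, s in zip(observed_times, predicted_survival)
    ) / n

# Toy example: statuses at t = 2 are (dead, alive, alive); predictions 0.2, 0.9, 0.6
print(brier_score(2.0, [1.0, 3.0, 5.0], [0.2, 0.9, 0.6]))   # about 0.07
```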

5. Conclusions and Discussion
In this paper, we proposed a class of generalized transformation models. Zeng et al. [1] introduced semiparametric transformation models for survival data with a cure fraction, which include the commonly used proportional hazards and proportional odds cure rate models as special cases. Following the structure suggested in Zeng et al. [1], covariates related to the event of interest were modeled through a link function η(X'β), where η(·) is a known, strictly positive, increasing function, such as the exponential function. In our proposed models, we used generalized additive models instead of X'β in the link function η(·). Specifically, we considered the fractional polynomials proposed by Royston and Altman [14]. We proved that the proposed model is identifiable as long as the transformation family $G_\gamma(\cdot)$ satisfies some very general conditions. To select the powers in the fractional polynomials, we proposed choosing them from the set A = (−2, −1.5, −1, −0.5, 0, 0.5, 1, 1.5, 2) by comparing likelihood values. Simulation results showed that the power selection procedure works well. A possible improvement would be to treat the power as a parameter and estimate it by maximum likelihood rather than selecting it from the set A.
The proposed generalized transformation models can be applied to a variety of survival data. Even though cure models are motivated by clinical trials whose endpoint is not death, such as relapse-free survival time, they can be applied to overall survival time as well. In this article, the application of the proposed models is illustrated by examining the relationship between the survival time of a patient and several risk factors, based on data from two cohorts of the First National Health and Nutrition Examination Survey Epidemiologic Follow-up Study. In terms of the Brier scores, the selected proposed model provides better predictions than the Cox proportional hazards model and the Zeng et al. [1] transformation cure model. It should be pointed out that, even though the Brier score is commonly used in practice for model comparison, it has its own disadvantages. For instance, although the Brier score can be calculated at any time point, it does not discriminate between competing models over the whole time period. Other model comparison methodologies will be explored in future work; for example, receiver operating characteristic (ROC) curves may be used to measure the differences between the models over all relevant time periods.

Appendix: Proofs of the Main Results
In this Appendix, we first prove the lemmas on model identifiability stated in Section 2.1. We then show the asymptotic properties of the semiparametric estimates in the proposed models under conditions (C1)-(C6) given in Section 2.2. The proofs of Theorems 2 and 3 are similar to those of Theorems 1 and 2 in Zeng et al. [1], with some modifications.

a. Proofs of Model Identifiability
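Proof of Lemma 1: Suppose that θ(X) can take two different nonzero values $\alpha_1$ and $\alpha_2$, and let $\tilde\alpha_1$ and $\tilde\alpha_2$ denote the corresponding values of $\tilde\theta(X)$. Then we have the following two equations in y:
$$G_\gamma\{\alpha_k F(y)\} = G_{\tilde\gamma}\{\tilde\alpha_k \tilde F(y)\}, \qquad k = 1, 2. \tag{A.1}$$
The inverse function of $G_{\tilde\gamma}$ exists because of the monotonicity of $G_{\tilde\gamma}$. Applying it to both sides of (A.1), we get
$$\tilde\alpha_k \tilde F(y) = G_{\tilde\gamma}^{-1}[G_\gamma\{\alpha_k F(y)\}], \qquad k = 1, 2. \tag{A.2}$$
We want to show that $\gamma = \tilde\gamma$. Both sides of (A.2) are monotonic in y, which implies that neither $\tilde\alpha_1$ nor $\tilde\alpha_2$ can be zero; otherwise the left-hand side would vanish identically while the right-hand side varies with y. Taking the ratio of the two equations in (A.2) and letting s = F(y), we obtain an equation in s that holds for all s in the range of F. Calculating the first and second order derivatives of both sides, plugging in s = 0, and taking the ratio of the two resulting equations, we would obtain $\alpha_1 = \alpha_2$ if $\gamma \neq \tilde\gamma$. This contradiction leads to $\gamma = \tilde\gamma$. It then follows from (A.2) that $\tilde\theta(X)\tilde F(y) = \theta(X) F(y)$, and therefore $\tilde\theta(X) = \theta(X)$ and $\tilde F = F$.

Proof of Lemma 2: Since η(·) is strictly monotonic, $\theta(X) = \tilde\theta(X)$ implies that the arguments of η(·) coincide. Consider one continuous covariate, for example $X_{q+1}$, and only the terms involving it:
$$\sum_{m,n} \beta_{q+1,mn} X_{q+1}^{p_{q+1,m}} (\log X_{q+1})^{q_{q+1,n}} = \sum_{m,n} \tilde\beta_{q+1,mn} X_{q+1}^{p_{q+1,m}} (\log X_{q+1})^{q_{q+1,n}}. \tag{A.5}$$
Without loss of generality, assume that both sides of (A.5) use the same powers $p_{q+1,m}$ and $q_{q+1,n}$, since we can always add terms with zero coefficients to both sides. Moving all terms to the left-hand side gives
$$\sum_{m,n} (\beta_{q+1,mn} - \tilde\beta_{q+1,mn}) X_{q+1}^{p_{q+1,m}} (\log X_{q+1})^{q_{q+1,n}} = 0. \tag{A.6}$$
Because the function on the left-hand side of (A.6) is analytic and vanishes on some interval $I \subset \mathbb{R}^+$, it vanishes for every $X_{q+1} > 0$. The distinct terms $X_{q+1}^{p}(\log X_{q+1})^{q}$ have different orders as $X_{q+1} \to \infty$, and since their sum is always zero, the coefficient of each term must be zero. Therefore $\beta_{q+1,mn} = \tilde\beta_{q+1,mn}$ for all m and n.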
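To prove the identifiability of the coefficient of a categorical covariate, for example $X_1$, we use the fact that $X_1$ takes at least two different values; comparing the arguments of η(·) at two such values, with the other covariates held fixed, shows that $\beta_1 = \tilde\beta_1$.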

b. Proofs of Strong Consistency of the Maximum Likelihood Estimates
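Let $\mathbb{E}_n$ denote the empirical measure of n iid observations and E the expectation. That is, for any measurable function f(Y, Δ, X), $\mathbb{E}_n f = n^{-1}\sum_{i=1}^{n} f(Y_i, \Delta_i, X_i)$ and $E f = E\{f(Y, \Delta, X)\}$. Suppose that there are n independent right-censored observations, and for the ith observation we have $\{Y_i, \Delta_i, X_i\}$.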
In applications, the indicator $\Delta_i$ may take an additional value to distinguish the cured and uncured populations, which does not affect the proofs of consistency and asymptotic normality of the maximum likelihood estimates.
Consider the modified semiparametric version of the observed-data likelihood function of the parameters (β, F). Let $(\hat\beta_n, \hat F_n)$ be the estimates of β and F that maximize this likelihood, where the jumps of $\hat F_n$ are required to sum to one; this restriction is handled by the method of Lagrange multipliers, with $\hat\lambda_n$ denoting the Lagrange multiplier. Equation (A.11) can then be rewritten accordingly.
In fact, the right-hand side of Equation (A.14) converges, and we next consider the difference between the two sides.
Letting n → ∞ and then ε → 0, we obtain (A.17). We then calculate the right-hand side of (A.17) by using conditional expectations. The relevant limit exists and is positive, so there exist positive constants $c_0$ and $c_1$ bounding it for any T, which gives (A.18). Combining (A.17) and (A.18), we obtain the required bound for every i. □

Based on the results in Lemma 4, for a given M there exists a class of functions $\mathcal{F}_M$ that is a Donsker class (van der Vaart and Wellner [18]), because the denominator involved is bounded away from zero. Following the calculations in (A.18), and using the expression of the estimate with $C_n$ a normalizing constant, $Y_k$ is bounded away from zero. The calculation here is similar to that in (A.18). Because $(\hat\beta_n, \hat F_n)$ are the maximum likelihood estimates, (A.9) yields (A.25).

For the strong convergence of the maximum likelihood estimates, we need to show that the limits satisfy $\beta^* = \beta_0$ and $F^* = F_0$. Letting n → ∞ in (A.25) and applying Jensen's inequality, we obtain an inequality in which equality holds if and only if $\beta^* = \beta_0$ and $F^* = F_0$, since the model is identifiable. It therefore remains to establish (A.26) for the maximum likelihood estimates. From (A.26), we conclude that any subsequence of $\hat\beta_n$ (also denoted by $\hat\beta_n$) has a further subsequence converging to $\beta_0$, so that $\hat\beta_n \to \beta_0$.

 c. Proofs of Asymptotic Normality of the Maximum Likelihood Estimates
We consider the likelihood function, write it in terms of its components, and denote the log-likelihood function accordingly. It then suffices to construct perturbation functions H(t) satisfying conditions (1)-(3), where condition (3) concerns ν ∈ ℝ small enough and d denotes the dimension of β. In particular, we can construct H(t) satisfying conditions (1)-(3) through a function h(t) defined on [0, ∞) with bounded total variation. The total variation of h is
$$V(h) = \sup \sum_{j=1}^{k} |h(t_j) - h(t_{j-1})|,$$
where the supremum is taken over all finite partitions $0 = t_0 < t_1 < \cdots < t_k < \infty$. One can show that H(y) in (A.31) satisfies conditions (1)-(3), and Equation (A.30) can be rewritten accordingly. We then consider a modified semiparametric version of the likelihood function. For any β and any step function F, condition (2)′ holds, namely that the perturbed jumps sum to one when ν is small enough. After some algebra, we obtain (A.41). Now consider functions h with bounded total variation, and let $\mathcal{H}$ denote the set of such functions. This set is a Donsker class, and Theorem 3.3.1 in van der Vaart and Wellner [18] is then applied to the left-hand side of (A.41).

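The total variation used in the normality argument is a supremum over partitions. The sketch below computes the variation of h over one given partition; refining a partition can only increase this sum, so finer grids give lower bounds that approach the total variation for well-behaved h. The function name and the test functions are hypothetical.

```python
import math

def partition_variation(h, partition):
    """Sum of |h(t_j) - h(t_{j-1})| over one partition t_0 < t_1 < ... < t_k.

    The total variation of h is the supremum of this sum over all finite partitions.
    """
    pts = sorted(partition)
    return sum(abs(h(b) - h(a)) for a, b in zip(pts, pts[1:]))

# sin on [0, 2*pi] rises by 1, falls by 2, and rises by 1: total variation 4
grid = [2 * math.pi * k / 10_000 for k in range(10_001)]
print(partition_variation(math.sin, grid))
# For a monotone function, any partition gives h(end) - h(start)
print(partition_variation(lambda t: t * t, [0.0, 0.5, 1.0, 2.0]))
```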
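Theorem 2. Under conditions (C1)-(C6), the maximum likelihood estimates $(\hat\beta_n, \hat F_n)$ are strongly consistent: $\hat\beta_n \to \beta_0$ almost surely and $\hat F_n$ converges to $F_0$ uniformly.
Theorem 3. Under conditions (C1)-(C6), the suitably centered and scaled maximum likelihood estimates converge weakly to a Gaussian process. Sketched proofs of Theorems 2 and 3 are provided in the Appendix.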
With probability $1 - \exp[-G_\gamma\{\theta(X_i)\}]$, the survival time $T_i$ is finite and follows the conditional distribution of the uncured subjects.

Instead of using the linear terms as in the Zeng et al. [1] models, we use four fractional-polynomial expressions in the function θ(·), which define models (4.1)-(4.4). In model (4.2), we fix BMI and Sbp and look for a transformation of Age. Power $p_{02} = 1$ is selected based on the log-likelihood, which is plotted in Figure 2(b); the selected model corresponds to the Zeng et al. [1] model. In many statistical models the inverse of BMI, $BMI^{-1}$ (a lean body mass index), is used, so we fit model (4.3), in which $BMI^{-1}$ and Sbp are fixed. In model (4.4), $BMI^{-2}$ and Sbp are fixed. Both model (4.3) and model (4.4) select power 1 for Age; the results are plotted in Figures 2(c) and 2(d). In summary, the best transformation among models (4.1)-(4.4) is chosen based on the log-likelihood.

Lemma 3. Under conditions (C1)-(C6), the limit function $F^*$ is a proper distribution function.
The structure of the limit function $F^*$ can be derived from the results of Lemmas 3 and 4. In particular, Lemma 3 establishes the required convergence; a proof of this lemma was given in Zeng et al. [1].

Lemma 4. Under conditions (C1)-(C6), the stated convergence holds for 0 ≤ y < ∞, where $\beta_0$ is the true value of β and $F_0$ is the true promotion time cumulative distribution function. (Its proof uses a calculation similar to that in (A.18), with $Y_k$ in the denominator instead of s.)

Since G(·) is a monotone decreasing function, it follows that $\hat F_n(y)$ converges to $F_0(y)$ uniformly in y on [0, M] for any fixed M, and hence uniformly in y on [0, ∞) because of the continuity of $F_0$.

Lemma 5. For any β and any distribution function F with a density, inequality (A.28) holds, where $\beta_0$ is the true value of β and $F_0$ is the true promotion time cumulative distribution function.

Lemma 6. With the notation defined in (A.39), an identity holds that involves the likelihood function given in (A.27).
Proof: The identity follows from (A.32) and (A.38). Using the left-hand side of (A.41) and the notation in (A.39), Equation (A.41) can be simplified.

Lemma 7. Under conditions (C1)-(C6), the linear operator Ω is invertible.
The hazard function h(t) is defined as the instantaneous failure rate at time t, conditional on survival until time t or later.

Table 1 .
Results of power selection under the proposed generalized transformation model based on 200 simulated data sets with q = 80% and sample sizes n = 2000 and n = 5000.

Table 2 .
Results of power selection under the proposed generalized transformation model based on 200 simulated data sets with sample size n = 5000 and the probability of each subject being right-censored q = 80%.

The covariates used in the different survival models for the NHANES1 study are Age, systolic blood pressure (Sbp), Sex, body mass index (BMI), diabetes (Diab), and coronary heart disease (Chd). Summary statistics of the continuous covariates are listed in Table 3. Diab and Chd are categorical and take only the values 0 and 1, for absence and presence of the corresponding disease. Among the 2027 patients in the two cohorts, 121 had diabetes and 82 had coronary heart disease. The results of the Cox proportional hazards model are summarized in Table 4. All covariates are highly significant.

Table 3 .
Summary statistics of continuous covariates in the NHANES1 study.

Table 4 .
Fitted Cox proportional hazards model for the NHANES1 study.

Table 5 .
Estimates of regression coefficients in the Zeng et al. [1] model based on transformation family (1.5) with γ = 0 for the NHANES1 study.

Table 7 .
Brier scores for different survival models for the NHANES1 study.
* The proof is similar to that of Theorem 2 and is thus omitted.