Given a sample of regression data from $(Y, Z)$, a new diagnostic plotting method is proposed for checking the hypothesis $H_0$: the data are from a given Cox model with the time-dependent covariates $Z$. It compares two estimates of the marginal distribution $F_Y$ of $Y$. One is an estimate of the modified expression of $F_Y$ under $H_0$, based on a consistent estimate of the parameter under $H_0$ and on the baseline distribution of the data. The other is the Kaplan-Meier estimator of $F_Y$, together with its confidence band. The new plot, called the marginal distribution plot, can be viewed as a test of $H_0$. The main advantage of the test over the existing residual tests arises when the data do not satisfy any Cox model or the Cox model is mis-specified: the new test is still valid, whereas the residual tests are not and often make a type II error with a very large probability.

In this paper, we propose a new diagnostic plotting method for the proportional hazards (PH) model [

Denote the conditional hazard function and survival function of $Y$ given $Z$ by $h_{Y|Z}(\cdot \mid \cdot)$ and $S_{Y|Z}(\cdot \mid \cdot)$, respectively, or simply $h(\cdot \mid \cdot)$ and $S(\cdot \mid \cdot)$. Let $(Y_1, Z_1), \cdots, (Y_n, Z_n)$ be i.i.d. copies of $(Y, Z)$ with the distribution function $F_{Y,Z}$. The PH model was first defined by $h(t \mid z) = \exp(\beta' z) h_o(t)$, where $h_o$ is the baseline hazard function, $z$ is a $p \times 1$ covariate vector, $\beta$ is a $p \times 1$ parameter vector, $h_o$ and $\beta$ are unknown, and $p$ does not depend on $n$. The model is referred to as the time-independent covariate PH (TIPH) model. This model has been extended in two ways: 1) the covariate is time-dependent, i.e., $z = z(t)$ is a function of time $t$; 2) the regression coefficient is time-varying, i.e., $\beta = \beta(t)$ is a function of time $t$. For the time-dependent covariates PH (TDPH) model, Kalbfleisch and Prentice [

Often the external time-dependent covariate $Z_i$ can be written as $Z_i = Z_i(t) = B(U_i, g_i(t))$, where $B(\cdot, \cdot)$ is a function, $U_1, \cdots, U_n$ are i.i.d. copies of the time-independent random vector $U$, and $g_i(\cdot)$ is a function of time $t$. A simple example of $B(U, g(t))$ is $w(U) g(t)$, where $w(\cdot)$ is a function not depending on time $t$. Without loss of generality (WLOG), we can assume $w(U) = U$. Two simple examples of $g_i(t)$ are (a) $g_i(t) = \mathbf{1}(t \ge a_i)$ and (b) $g_i(t) = \mathbf{1}(t \ge a_i)(t - a_i)$, where $a_i$ is a constant but may depend on subject $i$ [

An important step in the data analysis under the PH model is to check whether the model is indeed appropriate for the data. To this end, it is desirable to have diagnostic plotting methods for the PH model. In the literature, some diagnostic plotting methods under the semi-parametric set-up are designed to inspect whether the data follow the TIPH model $h(t \mid z) = h_o(t) \exp(\beta' z)$ (i.e., $g(t)$ is constant). One well-known diagnostic method for the PH model is the log-minus-log plot (log-log plot).

Several other graphical methods using residuals to check the PH model assumption have been proposed in the literature [

We provide a new diagnostic plotting method for the PH model. The main idea is to plot the Kaplan-Meier estimator (KME) of S Y against a proper estimator of the marginal distribution of Y under the selected model. Thus it is called the marginal distribution (MD) plot. The MD plot can be described as a 5-step procedure: 1) Fit the Cox model you have in mind to obtain the regression coefficients. 2) Choose a reference value for the covariate Z , say z o , such that there exist many observations in its neighborhood, say N ( z o ) . 3) Estimate the survival function of Y for Z = z o using N ( z o ) . 4) Use the estimator in 3) to estimate the marginal survival function of Y . 5) Compare the estimator of the marginal survival function in 4) with the KME of S Y .

The paper is organized as follows. In Section 2, we propose the MD plot and other supplementary diagnostic plots. In Section 3, we present simulation results on the performance of the plot. We also compare the MD plot to the current residual plots. In Section 4, we apply the new diagnostic plot to the long-term breast cancer follow-up data analyzed in Wong et al. [

The assumptions and notation are given in § 2.1. The idea of the marginal approach is introduced in § 2.2. The method is explained in § 2.3 and § 2.4.

Let Θ p h be the collection of all PH models specified by

$$h_{Y|Z}(t \mid z(t)) = e^{\beta(t)' z(t)} h_o(t) \tag{1}$$

where $\beta(t)$ and $z(t)$ are now possibly vectors of functions of $t$ [

$$H_0: \text{the data are from Model (1) with } Z(t) = B(U, g(t)) \text{ and given } B(\cdot, \cdot) \text{ and } g(\cdot), \tag{2}$$

where $U$ is a $p$-dimensional random covariate vector and the baseline hazard function $h_o(\cdot)$ is unknown. Let $\Theta$ be the collection of all possible joint distribution functions $F_{Y,U}$ of $(Y, U)$. Notice that $F_{Y,U}$ does not need to belong to $\Theta_{ph}$. Abusing notation, by $z(t) = u g(t)$ we mean that $u g(t) = (u_1 g_1(t), \cdots, u_p g_p(t)) = D(g_1(t), \cdots, g_p(t)) (u_1, \cdots, u_p)'$, where $D(g_1(t), \cdots, g_p(t))$ is a $p \times p$ diagonal matrix with diagonal elements $g_i(t)$.

Our method involves the mode, say a vector $c \in \mathbb{R}^p$, of the distribution of the random vector $U$. That is, $\forall\, \epsilon > 0$ and $\forall\, \eta \in \mathbb{R}^p$, $\mathrm{pr}(\|U - c\| < \epsilon) \ge \mathrm{pr}(\|U - \eta\| < \epsilon)$, where $\|\cdot\|$ is a norm, e.g., $\|U\| = \max_i |U_i|$, where $U = (U_1, \cdots, U_p)$.

Proposition 1. If $(Y, U)$ satisfies Model (1), then for each $c \in \mathbb{R}^p$, $h_{Y|W}(t \mid w(t)) = h_1(t) \exp\{\beta(t)' w(t)\}$, where $h_1(t) = h_o(t) \exp\{\beta(t)' c\}$ and $W(t) = Z(t) - c$.
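The identity in Proposition 1 follows by factoring the exponential in Model (1). The short derivation below is added for completeness and uses only the symbols defined in the proposition.

```latex
h_{Y|Z}(t \mid z(t))
  = h_o(t)\, e^{\beta(t)' z(t)}
  = \underbrace{h_o(t)\, e^{\beta(t)' c}}_{h_1(t)}\; e^{\beta(t)' \{z(t) - c\}}
  = h_1(t)\, e^{\beta(t)' w(t)}, \qquad w(t) = z(t) - c .
```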

In view of Proposition 1, hereafter, WLOG, we can assume that

AS1. The zero vector $0$ is a mode of $U$ and it satisfies $\|B(0, g(t))\| = 0$.

Otherwise, let $U_o$ be a mode of $U$ and define $W(t) = Z(t) - c$, where $c = Z(U_o)$. Then by Proposition 1, Model (1) is equivalent to another PH model, in which $h_1$ takes the place of the baseline hazard function $h_o$ and $h_1(t) = h_{Y|W}(t \mid 0)$ is also unknown.

Since S Y | U may not satisfy the PH model in the null hypothesis H 0 (see (2)), one can define a new conditional survival function of a new response variable, say Y ∗ , for given Z such that S Y ∗ | Z satisfies the PH model in the null hypothesis H 0 . Correspondingly, one can define the new marginal survival function of Y ∗ , say S Y ∗ . That is,

$$S_{Y^*}(t) = E\big(S_{Y^*|U}(t \mid U)\big), \quad \text{where } S_{Y^*|U}(t \mid u) = \exp\Big\{-\int_0^t e^{\beta(x)' B(u, g(x))} h_o(x)\, dx\Big\}. \tag{3}$$
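For intuition, formula (3) can be evaluated numerically once $\beta$, $g$ and $h_o$ are fixed. The sketch below is only an illustration under assumptions the paper does not make: a univariate $u$ with $B(u, g(x)) = u\,g(x)$, a constant $\beta$, and a plain trapezoid rule. The function names and the `steps` parameter are hypothetical.

```python
import math

def conditional_survival(t, u, beta, g, h0, steps=1000):
    """Approximate exp(-int_0^t exp(beta*u*g(x)) * h0(x) dx) by the trapezoid rule."""
    if t <= 0:
        return 1.0
    dx = t / steps

    def integrand(x):
        return math.exp(beta * u * g(x)) * h0(x)

    # Trapezoid rule: half weight on the two end points, full weight inside.
    total = 0.5 * (integrand(0.0) + integrand(t))
    for k in range(1, steps):
        total += integrand(k * dx)
    return math.exp(-total * dx)

def marginal_survival(t, us, beta, g, h0):
    """Sample analogue of the expectation in (3): average over covariate values."""
    return sum(conditional_survival(t, u, beta, g, h0) for u in us) / len(us)
```

With `g = lambda x: 1.0` this reduces to the TIPH case, and with `beta = 0` it returns the baseline survival function.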

Notice that $h_o$ and $S_o$ are equivalent if $h_o$ exists. Abusing notation, write $S_{Y^*|U}(t \mid u) = S(t, u, S_o, \beta)$. Notice that $S_{Y^*}$ is a function of the unknown parameter $\beta$. Given the distribution function $F_{Y,U}$, if $\beta$ is not a function of $t$, then $\beta$ is the almost sure limit of the maximum partial likelihood estimator (MPLE) of $\beta$ under $H_0$ (see Example 1 below); otherwise, it is conceivable that $\beta(t)$ is some limiting point of the estimator of $\beta(t)$ [

Example 1. Assume that $F_{Y,Z}$ is a uniform distribution in the region $A_1 \cup A_2$, where $A_1$ is the set bounded by the four straight lines $y = 0$, $y = 1$, $x - y = 0$ and $x - y = -1$, and $A_2$ is the set bounded by $y = 0$, $y = 1$, $x = 3$ and $x = 4$. Then the family of distributions $\{S_{Y|Z}(\cdot \mid z) : z \in (-1, 1) \cup (3, 4)\}$ does not satisfy the PH model, and $S_Y(t) = S_o(t) = 1 - t$ for $t \in [0, 1]$. If one fits the TIPH model $H_0: h_{Y|Z}(t \mid z) = h_o(t) \exp(\beta z)$ with data from $F_{Y,Z}$ without knowing $F_{Y,Z}$, then $S_{Y^*}(t) = E\big((1 - t)^{e^{\beta Z}}\big)$ by (3), where $\beta \approx -0.045$, which is the limit of the MPLE based on the random sample from $F_{Y,Z}$ under $H_0$. $S_{Y^*}$ and $F_{Y^*,Z}$ are uniquely determined by $F_{Y,Z}$ and $H_0$.

Lemma 1. If $S_{Y|U}$ does not follow the model defined in $H_0$ (see (2)), then (a) $S_{Y^*|U} \ne S_{Y|U}$, and (b) $S_o(t) = S_{Y^*|U}(t \mid 0) = S_{Y|U}(t \mid 0)$. Otherwise, (a) $S_{Y^*|U} = S_{Y|U}$, (b) $S_o(t) = S_{Y^*|U}(t \mid 0) = S_{Y|U}(t \mid 0)$ and (c) $S_{Y^*} = S_Y$.

The proof of Lemma 1 is trivial and is skipped.

Motivated by Lemma 1, the new plotting method we propose here is to plot an estimator of $S_{Y^*}$, say $\hat S_{Y^*}$, against $\hat S_Y$ (the KME of $S_Y$) together with the 95% confidence band of $\hat S_Y$, and

to check whether the graphs of $y = \hat S_Y(x)$ and $y = \hat S_{Y^*}(x)$ are close (e.g., whether $y = \hat S_{Y^*}(x)$ is within or outside the 95% confidence band of $\hat S_Y$),

where $\hat S_{Y^*}(t) = \sum_{i=1}^n \hat S_{Y^*|U}(t \mid u_i)/n$, $\hat S_{Y^*|U}(t \mid u) = S(t, u, \hat S_o, \hat\beta)$ (see (3)), $\hat\beta$ is a consistent estimator of $\beta$ under $H_0$, and $\hat S_o$ is a consistent estimator of $S_o = S_{Y|U}(\cdot \mid 0)$ under the assumption that $F_{Y,U} \in \Theta$, even if the data do not satisfy the pre-assumed PH model in $H_0$ (see Remark 1). In the latter case, it is conceivable that $\hat S_{Y^*}$ gives a close image of $S_{Y^*}$. If the graphs of $\hat S_Y$ and $\hat S_{Y^*}$ are close, then it suggests that $H_0$ in (2) is true. We thus call this plot the marginal distribution plot.
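The comparison described here relies on the KME of $S_Y$. As a reminder of how that estimator is computed from right-censored data, the following is a minimal product-limit sketch in plain Python. The function name `kaplan_meier` and its list-based interface are illustrative assumptions, not something the paper specifies, and the confidence band is omitted.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : observed times, i.e. min(Y_i, C_i)
    events : 1 if the time is an exact observation, 0 if right censored
    Returns a list of (t, S(t)) pairs at the distinct exact observation times.
    """
    event_times = sorted(set(t for t, d in zip(times, events) if d))
    surv = 1.0
    curve = []
    for t in event_times:
        # Subjects still under observation just before time t.
        at_risk = sum(1 for x in times if x >= t)
        # Exact (uncensored) observations at time t.
        deaths = sum(1 for x, d in zip(times, events) if x == t and d)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve
```

For complete data (all `events` equal to 1) the result coincides with the empirical survival function.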

Since the main issue of the MD plot is the estimator $\hat S_{Y^*}$, and it is not trivial to find $\hat S_{Y^*}$ due to $h_o(\cdot)$ in (3), the main focus of the paper is how to construct $\hat S_{Y^*}$. We shall explain in detail how to obtain $\hat S_{Y^*}$ for various $g(t)$. We also give $\hat S_{Y^*}$ for the general piecewise continuous $g(t)$ or $B(\cdot, g(t))$.

For simplicity, we shall first explain our method when U or Z is a univariate covariate and z ( t ) = B ( u , g ( t ) ) = u g ( t ) . We introduce the generalization to the case of a covariate vector (or matrix) in Remark 3 and to the time-dependent model for general B ( u , g ( t ) ) in § 2.3.4.

Suppose that $(Y_1, U_1, C_1), \cdots, (Y_n, U_n, C_n)$ are i.i.d. copies of a random vector $(Y, U, C)$, where $Y$ may be subject to right censoring by the censoring variable $C$. The success of the MD plot relies on proper estimators of $S_o$, $S_{Y^*|U}(t \mid u)$ and $S_{Y^*}$. Denote them by $\hat S_o$, $\hat S_{Y^*|U}(t \mid u)$ and $\hat S_{Y^*}$, respectively. Since $0$ is a mode of the covariate $U$ by assumption AS1, a consistent estimator of $S_o$ is the KME, denoted by $\hat S_o$, based on the data satisfying $|U_i| < \epsilon_n$, where $\epsilon_n = r n^{-1/k_o}$ with $k_o > 1$ (e.g., $\epsilon_n = r n^{-1/(3p)}$), and $r$ is a given positive number (e.g., the inter-quartile range or the standard deviation of the $U_i$'s). WLOG, let the first $n_o$ observations be all the observations satisfying $|U_i| < \epsilon_n$. If the $Y_i$'s are right censored, then the KME $\hat S_o$ is based on $(U_i, Y_i \wedge C_i, \delta_i)$, $i = 1, \cdots, n_o$, where $\delta_i = \mathbf{1}(Y_i \le C_i)$. For ease of explanation, we only consider the case of complete data in this section. The extension to right-censored data is straightforward, and we present simulation results with right-censored data in Section 3. For complete data, the KME of $S_o$ based on the first $n_o$ observations is

$$\hat S_o(t) = \frac{1}{n_o} \sum_{i=1}^{n_o} \mathbf{1}(Y_i > t) = \frac{\sum_{i=1}^n \mathbf{1}(Y_i > t)\, \mathbf{1}(|U_i| < r n^{-1/(3p)})}{\sum_{j=1}^n \mathbf{1}(|U_j| < r n^{-1/(3p)})}. \tag{4}$$

$\hat S_{Y^*|U}(t \mid u)$ will be introduced in detail in § 2.3, and

$$\hat S_{Y^*}(t) = \sum_{i=1}^n \hat S_{Y^*|U}(t \mid u_i)/n.$$
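A minimal sketch of the baseline estimator in (4) for complete data and a univariate covariate, assuming the mode has already been shifted to 0 as in AS1. The function name and its arguments (`baseline_survival`, `r`, `p`) are illustrative choices, not the authors' code.

```python
def baseline_survival(t, y, u, r, p=1):
    """Empirical survival at time t using only observations with |U_i| < r * n**(-1/(3p))."""
    n = len(y)
    eps = r * n ** (-1.0 / (3 * p))   # shrinking window around the mode 0
    near = [yi for yi, ui in zip(y, u) if abs(ui) < eps]
    if not near:
        raise ValueError("no observations fall inside the window around the mode")
    return sum(1 for yi in near if yi > t) / len(near)
```

For example, with `y = [0.5, 1.5, 2.5, 3.5]`, `u = [0.0, 0.1, 5.0, -0.05]` and `r = 1`, the window keeps three observations and the estimate at `t = 1.0` is 2/3.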

Remark 1. Since $\hat S_o(t)$ is a kernel estimator, it is well known that under certain regularity conditions, $\hat S_o(t)$ converges to $S_o(t)$ in probability and

$$\frac{1}{n} \sum_{i=1}^n S(t, U_i, \hat S_o, \beta) \text{ converges to } S_{Y^*}(t) \text{ in probability for each given } \beta,$$

and $\forall\, F_{Y,U} \in \Theta$. If the given PH model holds, then $S_Y(t) = S_{Y^*}(t)$ and we expect the graphs of $\hat S_Y(t)$ and $\hat S_{Y^*}(t)$ to be close, as $\hat S_Y$ and $\hat S_{Y^*}$ are consistent estimators of $S_Y$ and $S_{Y^*}$, respectively. Otherwise it is likely that $S_Y(t) \ne S_{Y^*}(t)$ and the two curves $y = \hat S_Y(x)$ and $y = \hat S_{Y^*}(x)$ are apart.

Remark 2. One may wonder whether $\hat S_o$ in (4) can be replaced by the existing estimators of the baseline survival function under the PH model, denoted by $\tilde S_o$. For instance, under the TIPH model, several consistent estimators of $S_o$ can be obtained from standard statistical packages; for example, the Breslow estimator of the baseline survival function can be obtained by applying survfit.coxph() in R. However, if the data do not satisfy the given TIPH model, $\tilde S_o$ is inconsistent, whereas $\hat S_o$ given in (4) is still consistent. In fact, given a joint distribution of a random vector $(Y, U)$, if it does not satisfy the TIPH model, there exists at least one pair of a survival function $S_1$ and a vector $b$ satisfying $E[\{S_1(t)\}^{\exp(bU)}] = S_Y(t)$, e.g., $(S_1, b) = (S_Y, 0)$. It means that the estimator $\tilde S_Y$ using $\tilde S_o$ will always suggest that the given TIPH model fits the regression data even though the data set may not satisfy the TIPH model, and hence $\tilde S_o$ is not a proper choice. The simulation study in § 3.1 suggests that under the assumption in Example 1, $\tilde S_Y$ converges to $S_Y(t)$ in probability and using $\tilde S_Y$ fails to identify the wrong model assumption.

We shall first illustrate the main idea through three typical cases when β is a constant: 1) g ( t ) = 1 i.e., the TIPH model, 2) g ( t ) = ( 1 ( t < a ) , 1 ( t ≥ a ) ) i.e., the PWPH model, 3) g ( t ) = ( t − a ) 1 ( t ≥ a ) i.e., the LDPH model. We also discuss the general case of g ( t ) .

1) Case of g ( t ) = 1 , i.e., the TIPH model. Since S ( t , u , S o , β ) = { S o ( t ) } exp ( β u ) by (3), define

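$$\hat S_{Y^*|U}(t \mid u) = \{\hat S_o(t)\}^{\exp(\hat\beta u)} \quad \text{and} \quad \hat S_{Y^*}(t) = \frac{1}{n} \sum_{i=1}^n \{\hat S_o(t)\}^{\exp(\hat\beta U_i)}, \tag{5}$$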

where $\hat\beta$ is a consistent estimator of $\beta$ under the selected PH model, e.g., the MPLE.
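For the TIPH case, the plug-in estimate averages $\{\hat S_o(t)\}^{\exp(\hat\beta U_i)}$ over the sample, following the identity $S(t, u, S_o, \beta) = \{S_o(t)\}^{\exp(\beta u)}$ stated above. A minimal sketch, assuming a univariate covariate and that the value $\hat S_o(t)$ and the estimate $\hat\beta$ have already been computed elsewhere; the function name is hypothetical.

```python
import math

def md_survival_tiph(s0_t, u, beta_hat):
    """Average of s0_t ** exp(beta_hat * u_i) over all covariate values u_i.

    s0_t     : value of the baseline survival estimate at the time of interest
    u        : list of (mode-centred) covariate values
    beta_hat : estimated regression coefficient, e.g. the MPLE
    """
    return sum(s0_t ** math.exp(beta_hat * ui) for ui in u) / len(u)
```

Evaluating this on a grid of time points gives the curve that the MD plot compares with the KME of $S_Y$.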

Remark 3. Even though $U$ in (4) and (5) is a random variable, it is easy to extend to the case that $U$ is a vector. Assume $U = (U_1, \cdots, U_p)$ is a $p$-dimensional random vector, $n$ is much larger than $p$ and $U$ is bounded. We can define $\|U\| = \max_{1 \le i \le p} |U_i|$ or $\sum_{i=1}^p U_i^2$. In the simulation study, we know the mode of $U$. In applications, the sample mode is not well defined. We choose a “sample mode” of the $U_i$'s, say $c \in \mathbb{R}^p$, as follows: 1) Select a proper radius $r$ and choose points $q$ such that the neighborhood of $q$ with radius $r$, $N_r(q)$, contains more than 20 (or $n^{1 - \frac{1}{3p}}$) observations. Of course, we do not want $r$ to be too large. 2) Among these points $q$, choose the one, say $c$, with the largest number of observations in $N_r(c)$ among all $N_r(q)$. 3) Then set $U_i^* = U_i - c$, $i = 1, \cdots, n$ (in view of Proposition 1).
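The three steps of Remark 3 can be sketched as follows. This is one possible reading rather than the authors' procedure: it assumes the candidate points $q$ are the observations themselves and uses the max norm; `sample_mode`, `recenter` and `min_count` are hypothetical names.

```python
def sample_mode(points, radius, min_count=20):
    """Pick the observation whose max-norm ball of the given radius holds the most points.

    points : list of equal-length tuples (one tuple per observation)
    Returns (c, count); c is None if no ball contains more than min_count points.
    """
    def dist(a, b):
        return max(abs(x - y) for x, y in zip(a, b))

    best, best_count = None, 0
    for q in points:
        count = sum(1 for x in points if dist(x, q) < radius)
        # Step 1: keep only candidates with enough neighbours;
        # Step 2: among those, keep the most populated neighbourhood.
        if count > min_count and count > best_count:
            best, best_count = q, count
    return best, best_count

def recenter(points, c):
    """Step 3: shift every observation so that the chosen sample mode becomes 0."""
    return [tuple(x - ci for x, ci in zip(pt, c)) for pt in points]
```

Ties are broken in favour of the first candidate encountered, which is an arbitrary choice.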

2) Case of g ( t ) = ( 1 ( t < a ) , 1 ( t ≥ a ) ) , i.e., the PWPH model with one cut point. Now β = ( β 1 , β 2 ) ′ and

$$S(t, u, v, S_o, \beta) = (S_o(a))^{\{\exp(\beta_1 u) - \exp(\beta_2 v)\} \mathbf{1}(t \ge a)}\, (S_o(t))^{\exp\{\beta_1 u \mathbf{1}(t < a) + \beta_2 v \mathbf{1}(t \ge a)\}}.$$

The covariate $z(t)$ in (2) is $z(t) = D(\mathbf{1}(t < a), \mathbf{1}(t \ge a)) (u, v)'$, where $D$ is a $2 \times 2$ diagonal matrix. Then $\hat S_{Y|U,V}(t \mid u, v) = S(t, u, v, \hat S_o, \hat\beta)$ and

#Math_315# (6)

where a i may depend on the i -th observation. And g ( t ) = ( 1 ( t < a ) , 1 ( t ≥ a ) ) ′ corresponds to a special case of the PWPH model with one cut-point. For the PWPH model with more than one cut-point, the derivation of S ^ Y ∗ is similar. Typically for the PWPH model with two cut-points,

$$h(t \mid u, r, v) = \exp\{\beta' z(t)\} h_o(t) = \exp\{\beta_1 u \mathbf{1}(t < a) + \beta_2 r \mathbf{1}(t \in [a, b)) + \beta_3 v \mathbf{1}(t \ge b)\} h_o(t),$$

where $\beta = (\beta_1, \beta_2, \beta_3)'$, the covariate $z(t) = D(\mathbf{1}(t < a), \mathbf{1}(a \le t < b), \mathbf{1}(t \ge b)) (u, r, v)'$, and $D$ is a $3 \times 3$ diagonal matrix. Then (3) yields

$$S(t, u, r, v, S_o, \beta) = \begin{cases} (S_o(t))^{\exp(\beta_1 u)} & \text{if } t \in (-\infty, a) \\ (S_o(a))^{\exp(\beta_1 u)} \left(\dfrac{S_o(t)}{S_o(a)}\right)^{\exp(\beta_2 r)} & \text{if } t \in [a, b) \\ (S_o(a))^{\exp(\beta_1 u)} \left(\dfrac{S_o(b)}{S_o(a)}\right)^{\exp(\beta_2 r)} \left(\dfrac{S_o(t)}{S_o(b)}\right)^{\exp(\beta_3 v)} & \text{if } t \in [b, \infty), \end{cases}$$

$$\hat S_{Y^*}(t) = \sum_{i=1}^n S(t, U_i, R_i, V_i, \hat S_o, \hat\beta)/n,$$

where β ^ is an estimator of β .
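The piecewise expression for two cut-points translates directly into code. The sketch below assumes the baseline survival function is supplied as a callable `s0` and that $S_o(a)$ and $S_o(b)$ are positive; the function name and argument order are illustrative only.

```python
import math

def pwph_survival(t, u, r, v, s0, a, b, beta):
    """S(t, u, r, v, S_o, beta) for the PWPH model with cut-points a < b.

    s0   : callable returning the baseline survival function at a time point
    beta : tuple (beta1, beta2, beta3), one coefficient per time interval
    """
    beta1, beta2, beta3 = beta
    if t < a:
        return s0(t) ** math.exp(beta1 * u)
    first = s0(a) ** math.exp(beta1 * u)            # contribution of [0, a)
    if t < b:
        return first * (s0(t) / s0(a)) ** math.exp(beta2 * r)
    second = (s0(b) / s0(a)) ** math.exp(beta2 * r)  # contribution of [a, b)
    return first * second * (s0(t) / s0(b)) ** math.exp(beta3 * v)
```

Averaging `pwph_survival(t, U_i, R_i, V_i, ...)` over the sample, with the estimated baseline and coefficients plugged in, gives the marginal estimate displayed above.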

3) Case of $g(t) = \mathbf{1}(t \ge a)(t - a)$, i.e., the LDPH model. In this situation, $S(t, u, S_o, \beta)$ does not have a simple form in terms of $S_o(t)$. Let $\hat S_o$ be defined as in (4) and let $a = b_0 < b_1 < \cdots < b_k$ be the discontinuous points of $\hat S_o(t)$ for $t > a$. We propose to estimate $S(t, u, S_o, \beta)$ and $S_{Y^*}(t)$ by

$$\hat S_{Y^*|U}(t \mid u) = \begin{cases} \hat S_o(t) & \text{if } t < b_1 \\ \hat S_o(a) \prod_{i=1}^j \left(\dfrac{\hat S_o(b_i)}{\hat S_o(b_{i-1})}\right)^{e^{(b_i - a) u \hat\beta}} & \text{if } t \in [b_j, b_{j+1}) \end{cases} \tag{7}$$

and $\hat S_{Y^*}(t) = \sum_{i=1}^n \hat S_{Y^*|U}(t \mid U_i)/n$. The reason is as follows. Notice that $\hat S_o$ defined in (4) is a step function. It has the same consistency property as

$$\breve S_o(t) = \begin{cases} \hat S_o(t) & \text{if } t \le a \\ \hat S_o(a) \exp\left(-\int_0^t \breve h_o(x)\, dx\right) & \text{if } t > a, \end{cases} \quad \text{where } \breve h_o(t) = \mathbf{1}(t \in (a_i, b_i])\, h_i, \quad h_i = \frac{1}{\epsilon} \ln \frac{\hat S_o(b_{i-1})}{\hat S_o(b_i)} \text{ for } t \in (b_{i-1}, b_i], \tag{8}$$

$a_i = b_i - \epsilon$, $i = 1, 2, \cdots, k$, $\epsilon = \min\{|b_j - b_{j-1}| : j = 1, \cdots, k\}/n$, and $\breve S_o(t) = \hat S_o(t)$ at the discontinuous points of $\hat S_o(t)$. Since

#Math_346# (9)

$$S(t, u, \breve S_o, \hat\beta) = \begin{cases} \hat S_o(t) & \text{if } t \le a_1 \text{ or } u = 0 \\ \hat S_o(a) \exp\left(-\sum_{i=1}^j \dfrac{\breve h_i \exp\{(a_i - a) u \hat\beta\}}{u \hat\beta} \left[\exp\{u \hat\beta (b_i - a_i)\} - 1\right]\right) & \text{if } u \ne 0,\ t \in [b_j, a_{j+1}),\ j \le k \\ \hat S_o(a) \exp\left(-\sum_{i=1}^j \dfrac{\breve h_i \exp\{(a_i - a) u \hat\beta\}}{u \hat\beta} \left[\exp\{u \hat\beta (t - a_i)\} - 1\right]\right) & \text{if } u \ne 0,\ t \in [a_j, b_j),\ j \le k. \end{cases}$$

Write $\breve S_{Y^*|U}(t \mid u) = S(t, u, \breve S_o, \hat\beta)$. At the discontinuous points $t$ of $\hat S_o$,

$$\begin{aligned} \breve S_{Y^*|U}(t \mid u) &= \hat S_o(a) \exp\left(-\sum_{i=1}^j \frac{\breve h_i \exp\{(a_i - a) u \hat\beta\}}{u \hat\beta} \left[\exp\{u \hat\beta (b_i - a_i)\} - 1\right]\right) \\ &\approx \hat S_o(a) \exp\left(-\sum_{i=1}^j \breve h_i \exp\{(b_i - a) u \hat\beta\} (b_i - a_i)\right) \\ &= \hat S_o(a) \prod_{i=1}^j \left(\frac{\hat S_o(b_i)}{\hat S_o(b_{i-1})}\right)^{\exp\{(b_i - a) u \hat\beta\}} \quad \text{if } u \ne 0,\ t = b_j,\ j \le k \text{ (by (8))}. \end{aligned}$$

Define

$$\hat S_{Y^*|U}(t \mid u) = \begin{cases} \hat S_o(t) & \text{if } t < b_1 \\ \hat S_o(a) \prod_{i=1}^j \left(\dfrac{\breve S_o(b_i)}{\breve S_o(b_{i-1})}\right)^{\exp\{(b_i - a) u \hat\beta\}} & \text{if } t \in [b_j, b_{j+1}). \end{cases}$$

It leads to (7). $\hat S_{Y^*|U}(t \mid u)$ is simpler than $\breve S_{Y^*|U}(t \mid u)$ in implementation and they have the same asymptotic properties. Simulation results in Section 3 suggest that $\hat S_{Y^*}$ is consistent under the selected PH model.
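A minimal sketch of the step-function estimator in (7), assuming a univariate covariate, a callable step function `s0` playing the role of $\hat S_o$, and a pre-computed sorted list `jumps` holding $b_1 < \cdots < b_k$. These names, and the choice to pass the jump points explicitly, are illustrative assumptions.

```python
import math

def ldph_survival(t, u, beta_hat, a, jumps, s0):
    """Estimator (7): product of baseline ratios, each raised to exp((b_i - a) * u * beta_hat).

    jumps : sorted jump points b_1 < ... < b_k of s0 located after the cut-point a
    s0    : callable step function (the baseline survival estimate)
    """
    if not jumps or t < jumps[0]:
        return s0(t)
    value = s0(a)
    prev = a                      # b_0 = a
    for b in jumps:
        if b > t:
            break                 # only jumps with b_i <= t contribute
        ratio = s0(b) / s0(prev)
        value *= ratio ** math.exp((b - a) * u * beta_hat)
        prev = b
    return value
```

When `u = 0` every exponent equals 1, the product telescopes, and the function returns the baseline estimate itself, which is a quick sanity check on an implementation.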

4) The case of other forms of $g(t)$ or $B(\cdot, g(t))$. It can be shown that the estimators $\hat S_{Y^*|U}(t \mid u)$ in the previous three cases of $g(t)$ are all step functions with discontinuous points $b_1, \cdots, b_m$, which are the same as those of $\hat S_o$, and

$$\hat S_{Y^*|U}(t \mid u) = \begin{cases} \hat S_o(t) & \text{if } t < b_1 \\ \hat S_o(a) \prod_{i=1}^j \left(\dfrac{\breve S_o(b_i)}{\breve S_o(b_{i-1})}\right)^{\exp\{B(u, g(b_i)) \hat\beta\}} & \text{if } t \in [b_j, b_{j+1}), \end{cases} \tag{10}$$

where the $b_j$'s are defined in the case of $g(t) = (t - a) \mathbf{1}(t \ge a)$. It can be shown that if $B(u, g(t))$ is piecewise continuous in $t$, then it also leads to a consistent estimator of $S_{Y^*}(t)$.

The MD plot needs to know g ( t ) . One possible way to conjecture the form of the function g ( t ) for a time-dependent covariate is to extend the PWPH plotting method in Wong et al. [

#Math_370# (11)

where

$$\hat S_1(t) = \frac{\sum_{i=1}^n \mathbf{1}(Y_i > t,\ |Z_i - z_1| < r n^{-1/(3p)})}{\sum_{i=1}^n \mathbf{1}(|Z_i - z_1| < r n^{-1/(3p)})}, \tag{12}$$

$r$ is a positive constant and $z_1$ belongs to the support of $Z$. For instance, for a PWPH model defined in (6),

$$\ln \hat S_1(t) = \begin{cases} \exp(\hat\beta_1 u) \ln \hat S_o(t) & \text{if } t < a \\ \{\exp(\hat\beta_1 u) - \exp(\hat\beta_2 v)\} \ln \hat S_o(a) + \exp(\hat\beta_2 v) \ln \hat S_o(t) & \text{if } t \ge a, \end{cases}$$

which corresponds to two lines: y = b 1 x and y = a 2 + b 2 x , and the cut-points can be determined from the PWPH plot (see

Let $b_1 < \cdots < b_m$ be all the distinct exact observations. In view of the expression in (7) under the PH model with $g(t) = (t - a) \mathbf{1}(t \ge a)$, if $b_i \ge a$, then

$$\ln \frac{S_{Y^*|U}(b_{i+1} \mid z)}{S_{Y^*|U}(b_i \mid z)} \approx \exp\{(b_{i+1} - a) z \beta\} \ln \frac{S_o(b_{i+1})}{S_o(b_i)},$$

$$\ln \frac{\hat S_1(b_{i+1})}{\hat S_1(b_i)} \approx \exp\{(b_{i+1} - a) z_1 \hat\beta\} \ln \frac{\hat S_o(b_{i+1})}{\hat S_o(b_i)},$$

$$\hat G_i = \ln\left[\left\{\ln \frac{\hat S_1(b_{i+1})}{\hat S_1(b_i)}\right\} \Big/ \left\{\ln \frac{\hat S_o(b_{i+1})}{\hat S_o(b_i)}\right\}\right] \begin{cases} = 0 & \text{if } b_i < a \\ \approx (b_{i+1} - a) z_1 \hat\beta & \text{if } b_i \ge a, \end{cases}$$

where $\hat S_1$ and $z_1$ are given in (11) and (12). Thus another diagnostic plotting method is

$$\text{to plot } (b_i, \hat G_i), \text{ for } i \in \{1, \cdots, m\} \tag{13}$$

and check whether it appears as a piecewise-linear curve with two pieces: one is $y = 0$ and the other is $y = (x - a) b$. If so, it is likely that $g(t) = (t - a) \mathbf{1}(t \ge a)$, where $a$ is the intersection of the two line segments. We shall call this plotting method the LDPH plot, as $g(t) = (t - a) \mathbf{1}(t \ge a)$ corresponds to the LDPH model. The advantage of the PWPH plot and the LDPH plot is that they provide clues on the cut-points, which are needed in the MD plot unless the cut-point is given.
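The points of the LDPH plot can be computed from two survival estimates as sketched below. The handling of undefined values is an assumption of this sketch: wherever either curve has no drop between consecutive $b_i$ (so a logarithm would be zero or undefined), the point is simply skipped. The function name and interface are hypothetical.

```python
import math

def ldph_plot_points(b, s1, s0):
    """Return the points (b_i, G_i) with G_i = ln[ ln(s1 ratio) / ln(s0 ratio) ].

    b  : sorted distinct exact observation times b_1 < ... < b_m
    s1 : callable survival estimate near the reference covariate value z_1
    s0 : callable baseline survival estimate
    """
    points = []
    for i in range(len(b) - 1):
        values = (s1(b[i]), s1(b[i + 1]), s0(b[i]), s0(b[i + 1]))
        if min(values) <= 0:
            continue                      # a ratio or logarithm would be undefined
        num = math.log(values[1] / values[0])
        den = math.log(values[3] / values[2])
        if num >= 0 or den >= 0:
            continue                      # no drop in one of the curves on this step
        points.append((b[i], math.log(num / den)))
    return points
```

As a check, if `s1(t) = s0(t) ** 2` then every computed value equals `ln 2`, a horizontal line, which corresponds to a time-independent effect.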

Remark 4. If the cut-points in the PWPH or TDPH model vary from observation to observation, then the PWPH plot as in (11) and the LDPH plot as in (13) do not work. However, the cut-points $a_i$ are also observations in such cases, in addition to the $(Y_i, U_i)$'s (in the case of complete data). Thus we do not need to guess the cut-points, and one can replace $\hat S_j$ in (12) by

$$\hat S_j(t) = \frac{\sum_{i=1}^n \mathbf{1}(Y_i > t,\ |U_i - z_j| < r_1 n^{-1/3},\ |a_i - a| < r_2 n^{-1/(3p)})}{\sum_{l=1}^n \mathbf{1}(|U_l - z_j| < r_1 n^{-1/3},\ |a_l - a| < r_2 n^{-1/(3p)})},$$

where $a$ is a predetermined reference cut-point and $r_1$ and $r_2$ are positive constants.

In order to compare the MD plot to the other plots, we present three sets of simulation results in § 3.1, § 3.2 and § 3.3, respectively. The mode is 0 as assumed in AS1. $S_o$ can be a uniform or an exponential distribution. The covariate can be discrete or continuous. The data do not need to be from a PH model. Time $t$ has no unit in the simulations.

Two random samples of n = 30 and n = 300 pseudo random numbers Y i are generated from U(0,1) distribution. For each Y i = y i , generate Z i from U ( y i − 1 , y i ) with probability 0.5 and from U ( 3 , 4 ) with probability 0.5. These ( Y i , Z i ) satisfy the assumptions given in Example 1. Let W i = 1 ( Z i ≥ 3 ) . Then the family of distributions { S Y 1 | Z 1 ( ⋅ | z ) : z ∈ ( − 1 , 1 ) ∪ ( 3 , 4 ) } does not satisfy the null hypothesis H 0 : h ( t | z ) = h o ( t ) exp ( β z ) , but it can be shown that { S Y 1 | W 1 ( ⋅ | z ) : z ∈ { 0 , 1 } } does.
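The sampling scheme just described is easy to reproduce. The sketch below uses Python's standard `random` module; the function name and the `seed` argument are illustrative, and the generated values will of course differ from the samples used in the paper.

```python
import random

def simulate_example1(n, seed=0):
    """Draw n triples (Y_i, Z_i, W_i) following the scheme of Example 1.

    Y ~ U(0, 1); given Y = y, Z ~ U(y - 1, y) with probability 0.5
    and Z ~ U(3, 4) with probability 0.5; W = 1(Z >= 3).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.random()
        if rng.random() < 0.5:
            z = rng.uniform(y - 1.0, y)
        else:
            z = rng.uniform(3.0, 4.0)
        w = 1 if z >= 3 else 0
        data.append((y, z, w))
    return data
```

Calling `simulate_example1(30)` and `simulate_example1(300)` gives samples of the two sizes considered here.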

The sample of size n = 300 is only used for the MD plots in panels (1,3) and (3,3) of

Since S Y | Z does not satisfy the PH model, a proper estimate of S Y ∗ is expected to deviate from S ^ Y . It is seen from the two MD plots with data ( Y i , Z i ) ’s in panels (1,2) and (1,3) of

Since $S_{Y|W}$ satisfies the PH model, a proper estimate of $S_{Y^*}$ should be close to $\hat S_Y$. It is seen from both MD plots with data $(Y_i, W_i)$ in panels (3,2) and (3,3) of

To show that the consistent estimator of $S_o$ is the key in the MD approach, we present in panels (1,1) and (3,1) of

In panels (2,1) and (4,1) of

In panels (2,2) and (4,2) of

We also carried out a simulation study on testing $H_0: h(t \mid z) = h_o(t) \exp(\beta' z)$ with data $(Y_i, Z_i)$, using the residual test in the existing R package. Our simulation study suggests that for $n = 50$, 100 or 200 and with 5000 replications, the residual test fails to reject the incorrect $H_0$ more than 93% of the time. Thus, it is not surprising that the residual plots do not work well.

A sample of complete data with $n = 300$ is generated from the TIPH model $h(t \mid z) = \exp(\beta z^2) h_o(t)$, where $\beta = 1$, $h_o(t) = 1$ and $Z \sim \mathrm{Norm}(0, 1)$. Panels (1,1), (1,2), (1,3) and (3,1) in

The second sample of complete data with size $n = 300$ is generated from the model $h(t \mid z) = \exp(\beta z) h_o(t)$, where $\beta = 1$, $Z \sim \mathrm{Norm}(0, 1)$ and $h_o(t) = 1$, $t > 0$. Panels (2,1), (2,2), (2,3) and (4,1) in

Panels (1,1) and (2,1) in

not easy to distinguish the mis-specified model from the correct one by this pair of residual plots.

Similarly, in

In fact, with the data from the first mis-specified model and with moderate sample sizes $n \ge 50$, our simulation study with 5000 replications suggests that the residual test in the existing R package (e.g., cox.zph()) would fail to reject the mis-specified TIPH model more than 70% of the time. Thus it is not surprising that the residual plots cannot detect the mis-specified TIPH model.

The MD plot in panel (1,3) fits the mis-specified TIPH model with data from the first sample. It successfully identifies that the functional form of the covariates Z is mis-specified for the first data set, as S ^ Y ∗ is almost totally outside the 95 % confidence band of S ^ Y . In other words, the MD approach suggests that the first data set does not follow the PH model h ( t | z ) = exp ( β z ) h o ( t ) . On the other hand, the MD plot in panel (2,3) successfully identifies that the functional form of the covariates Z is correct for the second data set, as S ^ Y ∗ is totally inside the 95 % confidence band of S ^ Y .

Based on the second sample, the modified PWPH plot and the LDPH plot are displayed in panels (3,2) and (4,2); the MD plots under the PWPH and LDPH models are displayed in panels (3,3) and (4,3). The PWPH plot in panel (3,2) suggests that the data are either from a TIPH model or from a PWPH model with one cut-point at $a \approx 0.7$. The MPLE of the regression coefficient under the TIPH model is $\hat\beta = 1.26$, and the MPLE of the regression coefficients under the PWPH model with $k = 1$ is $(\hat\beta_1, \hat\beta_2) = (1.30, 1.20)$ with $SE = (0.170, 0.136)$. Neither $\hat\beta_1$ nor $\hat\beta_2$ is significantly different from $\hat\beta$, as their differences from $\hat\beta$ are within two SEs. Both MD plots in panels (2,3) and (3,3) suggest that the PWPH model with at most one cut-point fits the data, as expected, as both curves of $\hat S_{Y^*}$ are totally inside the 95% confidence band of $\hat S_Y$.

The LDPH plot in panel (4,2) suggests that the LDPH model may fit the data, but it is seen from panel (4,3) that even within the interval [0,1], less than 30% of the curve of $\hat S_{Y^*}$ lies inside the confidence band of $\hat S_Y$. Thus the MD plot suggests that the data are not from the LDPH model, as expected. It is seen that the LDPH plot does not perform as well as the MD plot.

A sample of $n = 300$ right-censored observations is generated under the LDPH model, where $h(t \mid z) = \exp(\beta u g(t)) h_o(t)$, $g(t) = (t - a) \mathbf{1}(t \ge a)$, $a = 0.2$, and $h_o(t) = e^{-t}$. $U$ has a Poisson distribution with mean 1, and $\beta = 10$. The right censoring variable is $C \sim U(1, 2)$.

The modified PWPH plots and the LDPH plot are given in panels (1,1), (2,1) and (3,1) in

Both the PWPH plot and the MD plots with the corresponding qqplot in panels (1,1), (1,2) and (1,3) suggest that the data are not from the TIPH model. We need the information from the PWPH plot to decide the cut-point needed in the MD plots. The PWPH plot in panel (2,1) suggests that the data may be from a PWPH model with a cut-point $a$ satisfying $-\ln S(a \mid 0) \approx 0.05$, that is, $a \approx 0.1$. However, the MD plot in panel (2,2) suggests that the data are not from the PWPH model. This again indicates that the MD plot performs better than the modified PWPH plot.

The LDPH plot in panel (3,1) in

A common situation that will involve the use of the PH model is a long-term clinical follow-up study. In such a study, the impact of a prognostic variable may change at different time periods. This is the case in the breast cancer data analyzed in Wong et al. [

One objective of the study is to investigate whether tumor diameter is significant in predicting early or late relapse. Then the relapse time Y is the response and the tumor diameter Z is the covariate. Clinical consideration and survival plots suggest late failure can be considered at time greater than 5 years from initial breast cancer surgery. Data analysis based on a PWPH model with two cut-points at 2 years and 5 years with covariate Z is carried out in Wong et al. [

We fit the TIPH model with covariate $Z$ (which is essentially continuous) to the data; the regression coefficient is $\hat\beta = 0.35$ and is significant. We shall only present our new methods for the case that the tumor diameter $Z$ is continuous in panel (2,1) of

Wong et al. [

to Z > 2 cm or Z ≤ 2 cm . Then the covariate is discrete, and the standard log-log plots, as well as the PWPH plot proposed by Wong et al. [

Our simulation results and the data analysis suggest that the MD plot has certain advantages over the existing residual plots, especially when the null hypothesis H 0 in (2) is mis-specified or the data are not from any PH model. Our MD plot does not involve residuals studied in the literature, and this is the first difference between the residual approaches and the MD approach.

The MD approach is closely related to H 0 in (2) with Z ( t ) = B ( U , g ( t ) ) , where the parameter β is unknown but B ( ⋅ , g ( t ) ) is given. The MD plot is applicable to all PH models with Z ( t ) = B ( U , g ( t ) ) and with all types of covariates U , provided that B ( ⋅ , g ( t ) ) is given. The assumption of a given B ( ⋅ , g ( t ) ) may be viewed as a drawback of the MD plot if one wants to find a certain PH model to fit the data. There are several ways to overcome this drawback.

1) For the case that B ( u , g ( t ) ) = u g ( t ) , we also propose a modified PWPH plot and LDPH plot in § 2.4 for inferring the functional form of g ( t ) .

2) One can apply the MD approach to several possible typical models. For instance, in § 3.2 and § 3.3, given a data set, the MD approach finds which of the 3 semi-parametric PH models fits.

3) Of course, one can also make use of the existing residual approaches in the literature for guessing g ( t ) . There is no harm to inspect all diagnostic plots available based on the data.

On the other hand, the MD plot can further check the validity of the functional forms suggested by the existing residual plots. As illustrated in the paper, with the help of the confidence band of the KME $\hat S_Y$, it is more reliable and more informative than the residual plots on whether the model suggested by the residual plots is appropriate for the data. Thus the MD plot is a complement to the existing diagnostic methods, not a replacement for them.

As seen from the simulation results, when the data are not from any PH model (see Example 1 in § 3.1) or H 0 is mis-specified (see simulation on the TIPH Model in § 3.2), the MD plots can successfully reject H 0 , provided that n is large enough ( n ≥ 30 ) . Thus the MD plot can play a role of a model checking test ϕ m d , that is,

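$$\phi_{md} = \begin{cases} \text{reject } H_0 & \text{if } \hat S_{Y^*} \text{ lies outside the confidence band (case (I))} \\ \text{do not reject } H_0 & \text{if } \hat S_{Y^*} \text{ lies inside the confidence band (case (II))} \\ \text{inconclusive} & \text{otherwise (case (III))}. \end{cases}$$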
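A three-way rule of this kind can be sketched in a few lines. The sketch reads the rule in line with the rest of the paper, where a curve that stays within the band of the KME supports $H_0$ and a curve entirely outside it speaks against $H_0$. It assumes the estimate and the band limits have been evaluated on a common grid of time points; the function name and the returned labels are hypothetical.

```python
def md_decision(s_star, lower, upper):
    """Classify an MD plot by comparing the estimated curve with the confidence band.

    s_star : values of the estimated marginal survival curve on a time grid
    lower  : lower band limits of the KME on the same grid
    upper  : upper band limits of the KME on the same grid
    """
    inside = [lo <= s <= hi for s, lo, hi in zip(s_star, lower, upper)]
    if all(inside):
        return "do not reject"    # curve entirely inside the band
    if not any(inside):
        return "reject"           # curve entirely outside the band
    return "inconclusive"         # curve partly inside, partly outside
```

In practice the curve often crosses the band, which is the inconclusive case.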

On the contrary, in such cases, our simulation results suggest that the residual plots often cannot detect that $H_0$ is incorrect and the residual tests often do not reject $H_0$. There must be some reasons.

As summarized in Therneau and Grambsch [

Remark 5. In summary, if the existing residual tests reject $H_0$, the decision is likely to be correct. Otherwise, we have no grounds to believe that the test is valid. In this regard, it is interesting to point out that the parameter space for the aforementioned “test” induced by the MD plot is actually $\Theta$. Thus the MD approach is valid even if $F_{Y,U} \notin \Theta_r$. It can also detect the incorrect model assumption $\Theta_r$ by testing the new null hypothesis $H_0: F_{Y,U} \in \Theta_r$ when $n$ is not too small, but the residual approach cannot accomplish this task. This is the second difference between the MD approach and the residual approach. In applications, if $H_0$ is not rejected, the residual tests often make a type II error, and we strongly suggest considering the MD approach in that case.

Of course, very often neither case (I) nor case (II) happens, and then we need more rigorous model checking tests. The difference between $\hat S_Y$ and $\hat S_{Y^*}$ is a natural choice for a test statistic, and this is part of our ongoing research. Another direction is to extend the MD approach to other common regression models, such as the linear regression model and the generalized additive model.

We thank the Editor and the referee for their comments.

Yu, Q.Q., Dong, J.Y. and Wong, G. (2017) Marginal Distribution Plots for Proportional Hazards Models with Time-Dependent Covariates or Time-Varying Regression Coefficients. Open Journal of Statistics, 7, 92-111. https://doi.org/10.4236/ojs.2017.71008