Survival Model Inference Using Functions of Brownian Motion

A family of tests for the presence of regression effect under proportional and non-proportional hazards models is described. The non-proportional hazards model, although not completely general, is very broad and includes a large number of possibilities. In the absence of restrictions, the regression coefficient, $\beta(t)$, can be any real function of time. When $\beta(t) = \beta$, we recover the proportional hazards model, which can then be taken as a special case of the non-proportional hazards model. We study tests of the null hypothesis $H_0: \beta(t) = 0$ for all $t$ against alternatives such as $H_1: \int \beta(t)\,\mathrm{d}F(t) \neq 0$ or $H_1: \beta(t) \neq 0$ for some $t$. In contrast to the now classical approaches based on partial likelihood and martingale theory, the development here is based on Brownian motion, Donsker's theorem and theorems from O'Quigley [1] and Xu and O'Quigley [2]. The usual partial likelihood score test arises as a special case. Large sample theory follows without special arguments, such as the martingale central limit theorem, and is relatively straightforward.


Background
The complex nature of data arising in the context of survival studies is such that it is common to make use of a multivariate regression model. Cox's semi-parametric proportional hazards model [3] has enjoyed wide use in view of its broad applicability. The model makes the key assumption that the regression coefficients do not change with time, and much study has gone into investigating and correcting for potential departures from this assumption [4-10]. Sometimes we can anticipate in advance that the proportional hazards model may be too restrictive. The example which gave rise to our own interest in this question concerned 2174 breast cancer patients, followed over a period of 15 years at the Institut Curie in Paris, France. For these data, as well as for a number of other studies in breast cancer, the presence of non-proportional hazards effects has been observed by several authors. Often this is ignored, but doing so can seriously impact inference.
The model used to make inferences will then often differ from that which can be assumed to have generated the observations. In situations of non-proportional hazards, unless dealing with very large data sets relative to the number of studied covariates, it will often not be feasible to study the whole of $\beta(t)$, which is possibly infinite-dimensional.
Xu and O'Quigley [11] argue that an estimate of average effect can be used in a preliminary analysis of a data set with time-varying regression effects. For a given sample, a single average effect can be estimated more accurately (and more easily) than the whole of $\beta(t)$. Xu and O'Quigley [11] derive an estimate of an average regression effect $\bar\beta$ and provide an interpretation of $\bar\beta$ as a population average effect. It is approximated by $\int \beta(t)\,\mathrm{d}F(t)$ under certain conditions, where $F$ is the marginal distribution function of the failure time random variable $T$. The purpose here is to test the null hypothesis $H_0: \beta(t) = 0$ for all $t$ against alternatives of the form $H_1: \beta(t) \neq 0$ for some $t$. The development is based on Brownian motion, Donsker's theorem and theorems from O'Quigley [1] and Xu and O'Quigley [2]. We show that the usual partial likelihood score test arises as a special case. Large sample theory is straightforward.

Notation
The probability structure, although quite simple, is not the immediate one which would come to mind. The random variables of interest are the failure time $T_i$, the censoring time $C_i$, and the possibly time-dependent covariate $Z_i(t)$, $i = 1, \dots, n$. We view these as a random sample from the distribution of $T$, $C$ and $Z(\cdot)$. It will not be particularly restrictive, and is helpful to our development, to assume that $T$ and $C$ have support on some finite interval. The time-dependent covariate $Z(t)$ is assumed to be a predictable stochastic process and, for ease of exposition, taken to be of dimension one whenever possible. Let $X_i = \min(T_i, C_i)$ and let $\delta_i = I(T_i \le C_i)$ indicate an observed failure. The counting process $N_i$ is defined as $N_i(t) = I\{X_i \le t,\ \delta_i = 1\}$, and we also define $\bar N(t) = \sum_{i=1}^{n} N_i(t)$. The inverse function $\bar N^{-1}(j)$ corresponds to the value $t_j$ at which the $j$th failure occurs. It is of notational convenience to define $Z(t)$ to be, in words, a continuous function equal to zero apart from at the observed failures, at which it assumes the covariate value of the subject that fails. The number of observed failures $k$ is given by $k = \bar N(\infty)$. If there are ties in the data our suggestion is to split them randomly, although there are a number of other suggested ways of dealing with ties. All of the techniques described here require only superficial modification in order to accommodate any of these other approaches for dealing with ties.
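To fix ideas, here is a minimal Python sketch of this setup on simulated data; the distributions, the sample size and the tiny perturbation used to split ties randomly are hypothetical choices for illustration only.

```python
# A minimal sketch of the notation, assuming hypothetical distributions for
# the failure times T, censoring times C and scalar covariates Z.
import numpy as np

rng = np.random.default_rng(0)
n = 100
T = rng.exponential(1.0, n)          # failure times
C = rng.exponential(1.5, n)          # censoring times
Z = rng.normal(0.0, 1.0, n)          # time-fixed covariates for simplicity

X = np.minimum(T, C)                 # observed time X_i = min(T_i, C_i)
delta = (T <= C).astype(int)         # failure indicator delta_i

# Break any ties by a small random perturbation (random splitting).
X = X + rng.uniform(0, 1e-9, n)

# Observed failure times in increasing order; k = number of failures.
order = np.argsort(X)
failure_times = X[order][delta[order] == 1]
k = len(failure_times)
print("number of observed failures k =", k)
```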

Models
Insight is helped when we group the models together under as general a heading as possible. The most general model is then the non-proportional hazards model, written

$$\lambda(t \mid Z) = \lambda_0(t) \exp\{\beta(t) Z(t)\},$$

where $\lambda_0(t)$ is the baseline hazard and $\beta(t)$ is left unrestricted. O'Quigley and Stare [12] introduced the name "partially proportional hazards models" to describe models in which at least one component of the function $\beta(t)$ is constrained to be constant. Such models can be shown to include the stratified proportional hazards model [13], whereby $\lambda(t \mid Z, s) = \lambda_{0s}(t)\exp\{\beta Z(t)\}$ for stratum $s$, as well as random effects models [12].
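The distinction can be made concrete with a small sketch; the baseline hazard and the decaying $\beta(t)$ below are hypothetical choices, not taken from the paper.

```python
# Contrast of proportional and non-proportional hazards: under PH the hazard
# ratio between z = 1 and z = 0 is constant in t; under NPH it varies.
import numpy as np

def hazard(t, z, beta_fn, lambda0=lambda t: 1.0):
    """Non-proportional hazards: lambda(t|z) = lambda0(t) * exp(beta(t) * z)."""
    return lambda0(t) * np.exp(beta_fn(t) * z)

beta_ph  = lambda t: 0.7                  # constant effect: proportional hazards
beta_nph = lambda t: 0.7 * np.exp(-t)     # hypothetical effect decaying with time

for t in (0.5, 1.0, 2.0):
    hr_ph  = hazard(t, 1, beta_ph)  / hazard(t, 0, beta_ph)
    hr_nph = hazard(t, 1, beta_nph) / hazard(t, 0, beta_nph)
    print(f"t={t}: hazard ratio PH={hr_ph:.3f}, NPH={hr_nph:.3f}")
```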

Model Based Probabilities
The probability structure of the model, needed in our development, is described in O'Quigley [1]. We recall the main results in this section. Most often time is viewed as a set of indices to certain stochastic processes, so that, for example, we consider $Z(t)$ to be a random variable having different distributions for different $t$. Also, the failure time variable $T$ can be viewed as a non-negative random variable with distribution $F(t)$ and, whenever the set of indices $t$ to the stochastic process coincides with the support for $T$, not only can we talk about the random variables $Z(t)$, we can also define the probability, under the model, that it is subject $i$ who fails given that there is a failure at time $t$:

$$\pi_i(\beta(t), t) = \frac{Y_i(t)\exp\{\beta(t) Z_i(t)\}}{\sum_{j=1}^{n} Y_j(t)\exp\{\beta(t) Z_j(t)\}},$$

where $Y_i(t)$ is the at-risk indicator. The product of the $\pi_i$'s over the observed failure times gives the partial likelihood [3]. When $\beta = 0$, $\pi_i$ is the empirical distribution that assigns equal weight to each sample subject in the risk set.

Definition 3. In order to distinguish conditionally independent censoring from independent censoring, we say that censoring is conditionally independent when $T$ and $C$ are independent given the covariate $z$. Note that when censoring does not depend upon $z$ the two notions coincide, as they do for an independent censoring mechanism.

For small samples it will be unrealistic to hope to obtain reliable estimates of $\beta(t)$ for all of $t$ so that, often, we take an estimate of some summary measure, in particular $\bar\beta$. It is in fact possible to estimate $\bar\beta$ without estimating $\beta(t)$ [11], although the usual partial likelihood estimate does not accomplish this. In fact the partial likelihood estimate turns out to be equivalent to the solution of an estimating equation based upon $H(t)$, using $\hat H(t)$ as an estimate, whereas, to consistently estimate $\bar\beta$, it is necessary to work with some consistent estimate of $F(t)$, in particular the Kaplan-Meier estimate. We firstly need some definition of what is being estimated when the data are generated by model (1.1) and we are working with model (1.2). This is contained in the following definition of $\bar\beta$.

Definition 4. Let $\bar\beta$ be the constant value satisfying

$$\int \left\{ E_{\beta(t)}(Z \mid t) - E_{\bar\beta}(Z \mid t) \right\} \mathrm{d}F(t) = 0.$$

The definition enables us to make sense of using the estimates developed in [2,11] in general situations and, in the light of the foregoing, we can take these as estimates of $\bar\beta$. Two further corollaries to Theorem 1 also hold under the model. Theorem 1 and its corollaries provide the ingredients necessary for a construction from which several tests can be derived.
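The probabilities $\pi_i$ and the partial likelihood built from them can be sketched directly; the data-generating choices and the function names below are ours, for illustration under the simplest time-fixed covariate case.

```python
# A sketch of pi_i(beta, t) and of the log partial likelihood, the sum over
# observed failures of log pi_i evaluated at the failing subject.
import numpy as np

def pi_probs(beta, t, X, Z):
    """pi_i(beta,t) = Y_i(t) exp{beta Z_i} / sum_j Y_j(t) exp{beta Z_j}."""
    at_risk = (X >= t).astype(float)          # Y_i(t): still at risk at t
    w = at_risk * np.exp(beta * Z)
    return w / w.sum()

def log_partial_likelihood(beta, X, delta, Z):
    logL = 0.0
    for i in np.flatnonzero(delta == 1):      # loop over observed failures
        logL += np.log(pi_probs(beta, X[i], X, Z)[i])
    return logL

rng = np.random.default_rng(0)
n = 50
Z = rng.normal(size=n)
T = rng.exponential(np.exp(-0.5 * Z))         # hypothetical true beta = 0.5
C = rng.exponential(2.0, n)
X, delta = np.minimum(T, C), (T <= C).astype(int)

# With beta = 0, pi assigns equal weight to each member of the risk set.
print(log_partial_likelihood(0.0, X, delta, Z),
      log_partial_likelihood(0.5, X, delta, Z))
```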

Important Empirical Processes
Consider the partial scores introduced by Wei [14], who was interested in goodness of fit for the two-group problem and based a test on the maximum of the score process, large values indicating departures away from proportional hazards in the direction of non-proportional hazards. Considerable exploration of this idea, and substantial generalization via the use of martingale based residuals, has been carried out by Lin, Wei and Ying [15]. These investigations showed that we could work with a much broader class of statistics than those based on the score, so that a wide choice of functions, potentially describing different kinds of departures from the model, are available. Apart from the two-group case, limiting distributions are complicated and usually approximated via simulation. Although the driving idea is that of goodness of fit, the same techniques can be applied to testing for the presence of regression effects against a null hypothesis that $\beta(t) = 0$. Furthermore, working directly with the increments of the process rather than the process itself, we can derive related processes for which the limiting distributions are available analytically. From the previous section the increments of the process can be taken as being independent [16,17]. Thus only the existence of the variance is necessary in order to carry out appropriate standardization and to be able to appeal to the functional central limit theorem. We can then treat our observed process as though arising from a Brownian motion process. Simple calculations allow us to also work with the Brownian bridge, integrated Brownian motion and reflected Brownian motion, processes which will be useful under particular alternatives to the model specified under the null hypothesis. Consider the process

$$U^*(\beta_0, j/k) = \frac{1}{\sqrt{k}} \sum_{i=1}^{j} \frac{Z(t_i) - E_{\beta_0}(Z \mid t_i)}{\sqrt{V_{\beta_0}(Z \mid t_i)}},$$

where $t_1 < \dots < t_k$ are the observed failure times. This process is only defined on $k$ equispaced points of the interval (0, 1], but we extend our definition to the whole interval via linear interpolation so that, for $u$ in the interval from $j/k$ to $(j+1)/k$, the process is linear. As $n$ goes to infinity, under the usual Breslow and Crowley conditions, we have that, for each $j$, $U^*(\beta_0, j/k)$ converges in distribution to a Gaussian process with mean zero and variance equal to $j/k$. This follows directly from Donsker's theorem.

Replacing the unknown parameters by consistent estimates leaves asymptotic properties unaltered.
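A sketch of this construction on simulated data follows; the function name u_star and all numerical choices are ours, assuming the standardized-increment form given above for the simplest time-fixed covariate case.

```python
# The standardized score process U*(beta0, u), evaluated at the k failure
# points and extended to (0, 1] by linear interpolation.
import numpy as np

def u_star(beta0, X, delta, Z):
    """Cumulative standardized score increments at the ordered failures."""
    fail = np.flatnonzero(delta == 1)
    fail = fail[np.argsort(X[fail])]
    incr = []
    for i in fail:
        at_risk = X >= X[i]
        w = np.exp(beta0 * Z[at_risk])
        w = w / w.sum()
        e = np.sum(w * Z[at_risk])                    # E_beta0(Z | t)
        v = np.sum(w * Z[at_risk] ** 2) - e ** 2      # V_beta0(Z | t)
        incr.append((Z[i] - e) / np.sqrt(max(v, 1e-12)))
    incr = np.asarray(incr)
    return np.cumsum(incr) / np.sqrt(len(incr))       # values at u = j/k

rng = np.random.default_rng(0)
n = 300
Z = rng.normal(size=n)
T = rng.exponential(1.0, n)                           # here beta(t) = 0
C = rng.exponential(2.0, n)
X, delta = np.minimum(T, C), (T <= C).astype(int)

U = u_star(0.0, X, delta, Z)
grid = np.arange(1, len(U) + 1) / len(U)
print("U*(0, 1) =", round(U[-1], 2))                  # approx N(0,1) under H0
print("U* at u = 0.5 via interpolation:", round(np.interp(0.5, grid, U), 2))
```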


We choose the * symbol to indicate some kind of standardization, as opposed to the non-standardized $U$; such standardized processes are used to construct different tests. The variance and the number of failure points are used to carry out the standardization. Added flexibility in test construction can be achieved by using two parameters rather than a single parameter $\beta$. In practice these are replaced by quantities which are either fixed or estimated under some hypothesis. For the goodness-of-fit procedures which we consider later we will only use a single parameter, typically $\hat\beta$. Goodness-of-fit tests are most usefully viewed as tests of hypotheses of the form $H_0: \beta(t) = \beta_0$ under some hypothesized value $\beta_0$, treating the process $U^*(\beta_0, \cdot)$ as though it were Brownian motion. From Corollary 3, and a simple application of Slutsky's theorem, the increments have mean zero under the null. A further application of Slutsky's theorem, together with theorems of Cox and of Andersen and Gill [16,17], provides that the increments are asymptotically independent as $k$ becomes large. Apart from the necessity for the existence of the third moment of $Z$, we also require that, as $k$ increases, the fluctuations of the process $U^*(\hat\beta, u)$ between successive failures become sufficiently small in probability, the so-called tightness of the process [18]. We can assume this holds in real applications. We then conclude from Donsker's theorem that the process converges in distribution to Brownian motion.

Suppose that the assumption of zero effect, i.e., $\beta(t) = 0$, is incorrect and, in particular, that $\beta(t)$ is a changing monotonic function of time; without losing generality we will assume this monotonicity to be an increasing one. Now, at each time point $t$, instead of subtracting off the correct conditional mean, we subtract a quantity that is systematically offset from it. The variance is also impacted, but the variance is always positive (thereby impacting the average size but not the average sign of the increments). So, we will observe a trend in the standardized residuals: the early ones tending to be too large, and the later ones tending to be too large also but negatively. A good model for this, at least as a first approximation, would be Brownian motion with drift [1]. For our purposes we note that, moving away from the model of zero regression effects, we anticipate observing some trend rather than zero-mean Brownian motion; for a decreasing $\beta(t)$ the argument is the same with the signs reversed. If, on the other hand, $\beta(t)$ changes over the time period in question in a way that shows no obvious pattern or trend, we would not expect to be able to detect such behavior and the power of the test procedures would be low. We would most likely conclude that there is no effect, a conclusion that, even though not correct, would be reasonable, at least as an approximation.
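The qualitative behavior can be illustrated directly at the level of the limit processes: zero-mean Brownian motion under the null, and approximately Brownian motion with drift under a monotone $\beta(t)$; the drift size 1.8 below is an arbitrary illustrative value.

```python
# A toy illustration: under H0 the standardized process resembles zero-mean
# Brownian motion; under a monotone beta(t) it resembles Brownian motion
# with drift, so its terminal value and its sup both tend to be large.
import numpy as np

rng = np.random.default_rng(1)
k = 200
u = np.arange(1, k + 1) / k

bm    = np.cumsum(rng.normal(0, 1, k)) / np.sqrt(k)   # null: no drift
drift = bm + 1.8 * u                                  # alternative: drift

print("terminal values: null %.2f, drift %.2f" % (bm[-1], drift[-1]))
print("sup |process|:   null %.2f, drift %.2f"
      % (np.abs(bm).max(), np.abs(drift).max()))
```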

Non Proportional and Partially Proportional Hazards Models
The Brownian motion results extend immediately to the case of non-proportional hazards and partially proportional hazards models. The generalization of Equation (1) is natural and would lead to an unstandardized score evaluated at some hypothesized $\beta_0(t)$ which, under the null hypothesis that $\beta(t) = \beta_0(t)$, will be a sum of zero-mean random variables. The range of possible alternative hypotheses is large and, mostly, we will not wish to consider anything too complex. Often the alternative hypothesis will specify an ordering, or a non-zero value, for just one of the components of a vector-valued $\beta(t)$. In the exact same way as in the previous section, all of the calculations lean upon the main theorem and its corollaries. The increments of the process can again be standardized and cumulated, and the resulting process can be made to cover the whole interval (0, 1] continuously by interpolating in exactly the same way as in the previous section. For this process we reach the same conclusion, i.e., that as $n$ goes to infinity, under the usual Breslow and Crowley conditions [17], we have convergence to Brownian motion at each $j = 1, \dots, k$. The issue is that of having consistent estimates, which for an infinite-dimensional unrestricted parameter we cannot achieve. The solution is simply to either restrict these functions or to work with the stratified models in which we do not need to estimate them.

Since the increments of the process are asymptotically independent, we will treat the process as Brownian motion. Several tests can be constructed based on the theory of the previous section. These tests can also be used to construct test-based confidence intervals for parameter estimates, obtained as solutions to an estimating equation. Among these tests are the following. Most tests will be carried out at $t = 1$; nonetheless there may be situations in which we may opt to take a value of $t$ less than one. If we know, for instance, that under both the null and the alternative we can exclude the possibility of effects being persistent beyond some time $\tau$, say, i.e., the hazard ratios beyond that point should be one or very close to one, then we will achieve greater power by taking $t$ to be less than one, specifically some value around $\tau$. A confidence interval for $\beta_0$ can be obtained using normal approximations or by inverting the test statistic itself. Again we can appeal to known results for some well-known functions of Brownian motion.

Maximum Distance from the Origin

Since we are viewing the process as though it were Brownian motion, we can base a test on $\Pr\{\sup_{0 \le u \le t} |U^*(\beta_0, u)| > z\}$. Under the null, and under proportional hazards alternatives, this test, as opposed to the usual score test, would lose power in a way comparable to carrying out a two-sided rather than a one-sided test. Under non-proportional hazards alternatives this test could be of use, an extreme example being crossing hazards, where the usual score test may have power close to zero. As the absolute value of the hazard ratio increases, so would the maximum distance from the origin.
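Tail probabilities for these statistics follow from classical Brownian motion results; the sketch below uses the normal tail for the distance from the origin at $u = 1$ and the standard series expansion for the supremum of $|B|$, with 3.92 echoing the value in the example discussed later.

```python
# P(|B(1)| > z) = 2(1 - Phi(z)) for the distance from the origin at u = 1,
# and the classical series for P(sup_{0<=u<=1} |B(u)| > z).
import numpy as np
from scipy.stats import norm

def p_distance_at_1(z):
    """Two-sided test based on |U*(1)|."""
    return 2 * norm.sf(abs(z))

def p_max_distance(z, terms=50):
    """P(sup_{0<=u<=1} |B(u)| > z), via the classical series expansion."""
    k = np.arange(terms)
    cdf = (4 / np.pi) * np.sum(
        (-1.0) ** k / (2 * k + 1)
        * np.exp(-(2 * k + 1) ** 2 * np.pi ** 2 / (8 * z ** 2)))
    return 1 - cdf

print(p_distance_at_1(3.92))   # approx 8.9e-05
print(p_max_distance(3.92))    # necessarily larger, but still very small
```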

Brownian Bridge
The bridged process $U^*(\hat\beta, u) - u\,U^*(\hat\beta, 1)$ converges in distribution to the Brownian bridge; in particular, for large samples, the classical results for the bridge apply.

The Brownian bridge, also referred to as tied-down Brownian motion for the obvious reason that at $t = 0$ and $t = 1$ the process takes the value 0, will not be particularly useful for carrying out a test at $t = 1$. It is more useful to consider, as a test statistic, the greatest distance of the bridged process from the time axis. We can then appeal to:

Lemma 5. For large samples, $\Pr\{\sup_{0 \le u \le 1} |B^0(u)| > a\} = 2 \sum_{j=1}^{\infty} (-1)^{j+1} \exp(-2 j^2 a^2)$,

which follows as a large sample result since the bridged process converges in distribution to the Brownian bridge.
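The series in Lemma 5 converges very quickly and is simple to evaluate; the sketch below recovers the familiar critical values, about 1.36 at the 5% level and 1.63 at 1%.

```python
# Tail probability for the maximum absolute value of the Brownian bridge,
# the classical Kolmogorov series of Lemma 5.
import numpy as np

def bridge_sup_pvalue(a, terms=100):
    """P(sup_{0<=u<=1} |B0(u)| > a) = 2 sum_{j>=1} (-1)^(j+1) exp(-2 j^2 a^2)."""
    j = np.arange(1, terms + 1)
    return float(2 * np.sum((-1.0) ** (j + 1) * np.exp(-2 * j ** 2 * a ** 2)))

for a in (1.22, 1.36, 1.63):
    print(a, round(bridge_sup_pvalue(a), 4))
```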
It is easily shown that the reflected process is also Brownian motion. Under an alternative in which the sign of the increments changes at some point, a judicious choice of the point of reflection would result in a test statistic that continues to increase under such an alternative, so that a distance from the origin test might have reasonable power. In practice we may not have any ideas on a potential point of reflection. We could then consider trying a whole class of points of reflection and choosing that point which results in the greatest test statistic. A bound for a supremum type test can be derived by applying the results of Davies [19,20]. Under the alternative hypothesis we could imagine increments of the same sign being added together until the value $r$ is reached, at which point the sign of the increments changes; the absolute value of the increments is then strictly greater than zero. Under the null, $r$ is not defined and, following the usual standardization, this set-up fits in with that of Davies [19,20]. We can define $\hat r$ to be the time point maximizing the reflected test statistic. A two-sided test can then be based on this statistic, where $\Phi$ denotes the cumulative normal distribution function and where the estimate provides a consistent estimate of the true value. This sampling strategy is investigated in related work by O'Quigley and Natarajan [23].
Davies also suggests an approximation in which the autocorrelation is not needed. Turning points only occur at the $k$ distinct failure times and, to keep the notation consistent with that of the next section, it suffices to take the candidate reflection points as being located half way between adjacent failures, with $r_0 = 0$ and $r_{k+1}$ taken to be any value greater than the largest failure time. We would, though, require different inferential procedures for this.
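The reflection strategy can be sketched on a simulated path; the toy drift that rises and then falls stands in for a crossing-hazards alternative, and all names and numbers are ours. Note that a p-value for the statistic maximized over reflection points requires the bound of Davies [19,20] rather than a simple normal tail.

```python
# Reflect the standardized process at a candidate point, take the resulting
# distance from the origin, and maximize over candidates when no reflection
# point is known in advance.
import numpy as np

def reflected_terminal(path, j):
    """Terminal distance from the origin after flipping increments beyond j."""
    incr = np.diff(np.concatenate(([0.0], path)))
    incr[j:] *= -1.0
    return abs(np.cumsum(incr)[-1])

def max_over_reflections(path):
    stats = [reflected_terminal(path, j) for j in range(len(path))]
    j = int(np.argmax(stats))
    return stats[j], j

rng = np.random.default_rng(2)
k = 100
u = np.arange(1, k + 1) / k
# Drift up then down: an effect whose sign changes mid-way.
path = np.cumsum(rng.normal(0, 1, k)) / np.sqrt(k) + 3.0 * (0.5 - np.abs(u - 0.5))

stat, j = max_over_reflections(path)
print("unreflected |U*(1)| = %.2f" % abs(path[-1]))
print("max reflected statistic %.2f at r ~ %.2f" % (stat, (j + 1) / k))
```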

Partial Likelihood Score Test
Suppose that we wish to test $H_0: \beta(t) = 0$; we choose to work with $U^*(0, t)$. In the light of Slutsky's theorem it is readily seen that the large sample null distributions of the two test statistics are the same. Next, instead of standardizing each increment by its own conditional variance, we take a simple average of such quantities over the observed failures. If $V$ is defined as this average, and we replace the increment-specific variances by $V$, then the distance from the origin test at time $t = 1$ coincides exactly with the partial likelihood score test. Indeed this observation could be used to construct a characterization of the partial likelihood score test. In epidemiological applications it is often assumed that the conditional variance, $V(Z \mid t)$, is constant through time. Otherwise, time independence is often a good approximation to the true situation, and this gives further motivation to the partial likelihood test.

Multivariate Models
In practice it is the multivariate setting that we are most interested in: testing for the existence of effects in the presence of related covariates, or possibly testing the combined effects of several covariates. In this work we give very little specific attention to the multivariate setting, not because we do not feel it to be important but because the extensions of the univariate results are almost always rather obvious, and the main concepts come through more clearly in the relatively notationally uncluttered univariate case. Nonetheless, some thought is on occasion required. The basic theorem giving a consistent estimate of the distribution of the covariate at each time point $t$ applies equally well when the covariate $Z(t)$ is multidimensional. Everything follows through in the same way and there is no need for additional theorems.
We can, for example, construct a process from the first component alone, $U_1^*(\beta, t)$, where the subscript "1" denotes the first component of the vector. The interesting thing is that $E_\beta(Z_1 \mid t)$ does not require any such additional notation, depending only on the joint distribution. As for the univariate case we can work with any function of the random vector $Z$, the expectation of the function being readily estimated by an application of an immediate generalization of Corollary 3. Note that the process we are constructing is not the same one that we would obtain were we to simply work with only $Z_1$. This is because the $\pi_i$ involve a univariate $Z$ in the former case and a multivariate $Z$ in the latter. As before, the increments of the process can be taken to be independent [16,17], so that only the existence of the variance is necessary to be able to appeal to the functional central limit theorem. This observed process will also be treated as though arising from a Brownian motion process. The same calculations as above allow us to also work with the Brownian bridge, integrated Brownian motion and reflected Brownian motion. Our development is entirely analogous to that for the univariate case. The process is again only defined on $k$ equispaced points of the interval (0, 1] and, again, we extend our definition to the whole interval by linear interpolation; as $n$ goes to infinity, under the usual Breslow and Crowley conditions [17], the same convergence holds for each $j$. Another situation of interest in the multivariate setting is one where we wish to test simultaneously for more than one effect. This situation can come under one of two headings. The first, analogous to an analysis of variance, is where we wish to see if there exists any effect without being particularly concerned about which component or components of the vector $Z$ may be causing the effect. As for an analysis of variance, if we reject the global null we would probably wish to investigate further to determine which of the components appears to be the cause. The second is where we use, for the sake of argument, two covariates to represent a single entity, for instance 3 levels of treatment. Testing for whether or not treatment has an impact would require us to simultaneously consider the two covariates defining the groups. We would then consider, for a two-variable model, $Z(t)$ as a vector with components $Z_1(t)$ and $Z_2(t)$.
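The multivariate risk-set moments can be sketched as follows; the risk-set mean becomes a vector, the variance a matrix, and a component process conditions on the full covariate vector through the weights. Names and data are illustrative.

```python
# E_beta(Z | t) and Cov_beta(Z | t) for an n x p covariate matrix Z.
import numpy as np

def risk_set_moments(beta, t, X, Z):
    """Weighted risk-set mean vector and covariance matrix at time t."""
    R = Z[X >= t]                           # covariate rows of the risk set
    w = np.exp(R @ beta)
    w = w / w.sum()
    e = w @ R                               # vector E_beta(Z | t)
    c = (R - e).T @ ((R - e) * w[:, None])  # matrix Cov_beta(Z | t)
    return e, c

rng = np.random.default_rng(4)
n, p = 100, 2
Z = rng.normal(size=(n, p))
X = rng.exponential(1.0, n)
e, c = risk_set_moments(np.zeros(p), np.median(X), X, Z)
print("E_0(Z|t) =", np.round(e, 2))
print("V_0(Z_1|t) =", round(c[0, 0], 2))   # used for the component process
```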

An Example
The classical example studied in Cox's famous paper [3], concerning the Freireich trial in leukemia, is reconsidered here in the light of the present work. Our development sidesteps the issue of ties, a problem of sufficient importance for the Freireich data that it warranted lengthy discussion in Cox's paper. Here we simply used a random split, although all of the suggested approaches for dealing with ties (see for example Kalbfleisch and Prentice [13]) are accommodated without any additional difficulty. The distance from the origin at the maximum follow-up time was equal to 3.92 (p < 0.001), a result which is to be anticipated since the proportional hazards assumption is known to be a good one for these data, and effects are strong. The partial likelihood test produced the figure 3.94, confirming, at least here, the agreement we would expect given that the estimated conditional variance of the covariate given time is fairly constant with time itself. If the process is reflected at the time point t = 12, roughly the marginal median, then we obtain the value 0.58, which is what we might well expect. On the other hand, if we use the reflected process maximized across all potential times, then we obtain, again, a p-value less than 0.001, suggesting that little has been sacrificed by this more general approach even when the proportional hazards assumption appears well founded.
Proofs of the above lemmas follow from the asymptotic independence of the increments of the process; we can treat any point $b$ contained in the interval (0, 1] as the basis for a test of the null hypothesis $H_0: \beta(t) = 0$. If we wish to consider values of $t$ less than one, we may have knowledge of some $\tau$ of interest. Otherwise we might want to consider several possible values of $\tau$. Control of the Type I error will be lost unless specific account is taken of the multiplicity of tests. One simple way to address this issue is to consider the maximum value achieved by the process during the interval $(0, \tau]$.
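A p-value for the maximum over a sub-interval follows from Brownian scaling: $\sup_{u \le \tau} B(u)$ has the law of $\sqrt{\tau}\,\sup_{u \le 1} B(u)$, so the reflection bound applies directly; the sketch below is a minimal illustration under that assumption.

```python
# Maximum of Brownian motion over (0, tau]: by the reflection principle and
# Brownian scaling, P(sup_{0<u<=tau} B(u) > m) = 2 * (1 - Phi(m / sqrt(tau))).
from scipy.stats import norm

def p_max_on_interval(max_stat, tau):
    """Tail probability for the maximum of B over (0, tau]."""
    return 2 * norm.sf(max_stat / tau ** 0.5)

print(p_max_on_interval(2.5, 1.0))   # maximum taken over the whole interval
print(p_max_on_interval(2.5, 0.5))   # restricted interval, smaller p-value
```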
From these pairs an empirical, i.e. product moment, correlation can be calculated. For this case we would expect, again, the procedure to work well.

At time $t$, under the null hypothesis that $\beta(t) = 0$, the result is the same, and it is no doubt easier to work out all expectations with respect to the joint distribution. Let us then write $U_1^*(\beta, t)$ for the process based on the first component of the vector, or on some collection of components of $Z$; once we see the basic idea, it is clear what to do, even though the notation starts to become slightly cumbersome. Such a process tests for the effect of one covariate having accounted for the effects of the other covariates. The somewhat imprecise notion of "having accounted for" is made precise in the context of a model. It is not, of course, the same test as that based on a model with only $Z_1$.