Adaptive Conditional Hazard Regression Modeling of Multiple Event Times ()
1. Introduction
Cox regression, also called proportional hazard regression, is the commonly used approach for modeling single survival time, failure time, or time-to-event data (see [1] for an extensive overview). The hazard function is assumed to equal a product of an unspecified function of time and the exponent of a linear function of available covariates with estimation addressing only the slope parameters for the covariates while ignoring the underlying dependence of the hazard function on time. Extensions of Cox regression have been developed to handle multiple event times. For the case of recurrent events, Andersen and Gill [2] model the hazard rate (or intensity function) in terms of possibly time-varying covariates while Lin et al. [3] model proportional mean and rate functions for the underlying counting process in terms of possibly time-varying covariates. For the more general case of multiple event times, Prentice et al. [4] use marginal Cox regression models, one for each possible event, to model possibly time-varying covariates. Wei et al. [5] propose an alternative marginal Cox regression approach. What these approaches have in common is that covariates have multiplicative effects while general dependence on time is left un-estimated. The purpose of this article is to formulate and demonstrate an adaptive approach for conditional hazard regression modeling of multiple event times data based on fractional polynomials [6] [7] using an extension of the approach of Knafl [8] for modeling single event time data. This approach estimates the dependence of the conditional hazard rate on current time values, prior time values, and event time order as well as on possibly time-varying covariates.
2. Methods
2.1. Multiple Event Times
Let
for
denote M continuous, possibly correlated random event time variables. Also, let
for
denote associated random indicators for whether
is right censored, that is, only a lower bound for
is observable (
), or whether
is not censored, that is, an actual value for
is observable (
). Furthermore, let
for
be random
covariate vectors consisting of random covariates
for
and
. This formulation allows for time-varying covariates. In practice, many covariates will be time-invariant with
for
. One natural time-varying covariate is event order with values equal to the event time index m.
For
, let
be the
random vector of measurements up to time m where
denotes the transpose of
. Also, let
,
, and
denote values associated with the random variables/vectors
,
, and
and use these to form the associated vector values
for
. Note that
is an actual value for
when
, that is,
, and a lower bound on
when
, that is,
.
For
, denote the conditional distribution function
For
where
and
are
zero-valued vectors so that
for time-invariant covariates with
, the dependence of
on
and
is simpler than in general since it depends only on the single value
. When
, the associated excess time probability function is
(generalizing the survival function for single time event data) and the associated conditional hazard rate function is
where
is the conditional density function for
. Integration gives
so that
and
A common density function f is assumed for
for all m (and so also a common distribution function F, a common excess time probability function S, a common conditional hazard rate function λ, and a common integrated conditional hazard rate function Λ), but different functions for different m may be more appropriate. The above formulation assumes that such differences are accounted for using the event order covariate with values equal to the event time index m. Also, the formulation assumes that the range of possible values for
is not constrained by the values for
. For the special case of increasing event times
(as for recurrent events [9] ), values for
are constrained to be larger than the value for
, but then consecutive differences
with
can be modeled instead using the above formulation.
2.2. Fractional Polynomial Conditional Hazard Rate Modeling
Fractional polynomials [10] [11] are based on power transforms of primary predictors u. For modeling the conditional hazard rate
, the primary predictors u are functions of
and the entries of the vectors
and
. The values for
are positive while the values for
are nonnegative. For most i, the entries
will be nonnegative so that it is reasonable to restrict
to be nonnegative for
and
. Under this assumption, power transforms of primary predictors u can be defined as
for real-valued powers q.
The fractional polynomial conditional hazard rate is defined as
where
is a
vector of individual terms
. Each term is a power transform of a single primary predictor or of a product of multiple power-transformed primary predictors (called a geometric combination, generalizing the standard interaction). The primary predictors for the fractional polynomial are a subset of
and the entries of
and
while
is a
vector of slope parameters
for
. An intercept parameter can be included in the model using the unit transform
as a primary predictor. This formulation requires that
be constrained to take on positive values as addressed later. This restriction could be resolved by modeling instead the natural log of
as a linear function in
(as in [2] [3] [4] [5] ), but then the integral
cannot always be expressed in closed form. The advantage of the above definition is that power transforms
are simply integrated with respect to
, which can also speed up computation of estimates for the parameter vector
. The possible forms for
are a power transform of
including the unit transform
, a power transform of any one entry of either
or
, a power-transformed geometric combination in
combined with one or more entries of
and/or
, and a power-transformed geometric combination in one or more entries of
and/or
but not in
. Consequently,
can be expressed as the product of a function of
and a function of
and
, that is,
where
is either the unit transform or depends on some subset of the entries of
and
but not on
. When
for all j, none of the
depend on
, and so the conditional hazard rate is constant in
and satisfies
Also,
equals the unit transform when
depends only on
. The power
is either the power for a single power transform of
or equals a product of a power
of
and a power
transforming a geometric combination containing the power
of
. In general,
can contain power transforms of the entries of
and
.
Integration gives
where
is the
vector with entries
Assume that
, so that
with
a positive function of time
. The second term
is nonnegative, but not identically 0, for example, when it equals a single indicator variable z. By definition,
is a decreasing function of
for each unique possible value of the entries of
and
, and so
must be an increasing function of
for functions of
and
determined by the fractional polynomial model.
Let
,
, and
. Partition
into
with entries
for
and associated slope parameter vector
as well as
with entries
for
depending only on
and
but not on
with associated slope parameter vector
. The conditional hazard function
then satisfies
so that
Assume for now that
is empty. For models based on a single transform of
,
can be guaranteed to be positive by restricting the associated slope
to be positive-valued. For models based on multiple transforms of
but not on
and
,
can generate complicated integrals
that are not increasing in
in general. Restricting all the associated slopes
for
to be positive-valued is a straightforward way to guarantee that
is increasing in
. More general models with
generate integrals
that need to be increasing in each unique choice for the terms
. This can be guaranteed in a straightforward way once again by restricting all the associated slopes
for
to be positive-valued.
On the other hand, when
is nonempty, the assumption of all positive-valued slope parameters would be too restrictive. For example, consider a model based on two primary predictors:
and a time-invariant indicator variable z for membership in one of two groups of study participants with additive effects to
and z. The term
models the dependence of the conditional hazard function on
in terms of
parameters while
provides for a shift in the conditional hazard function for the group coded as
compared to the conditional hazard function for the group coded as
. In general, this shift can be in the negative or the positive direction, and so the slope of
needs to be allowed to be negative. However, allowing for negative entries for
means that
and
can sometimes take on negative or zero values. See Section 2.3 for how to handle such cases.
Hazard ratios are naturally generated for models based on proportional hazard rates. Hazard ratios are not needed to assess a fractional polynomial model but can be generated if desired. For example, consider the simple fractional polynomial model
based on the current time value
, the previous time value
, and a single covariate z with values 1 and 2. The associated hazard ratio is given by
, that is, the conditional hazard rate for
divided by the conditional hazard rate for
. Note that in general this is a function of the current time
and the previous time
.
2.3. Maximum Likelihood Estimation
Let
denote indexes for n study participants who provide multiple event time data consisting of observations
for
. This allows for missing data, but just for large time indexes
. This is the kind of missing data occurring in the example data set of Section 3. More complex missing data can be readily handled if desired (see [11] ). The total number of measurements is thus
All times
for all participants s can be uncensored, but not all of them should be censored.
Let
be the observed vectors determined by some fractional polynomial model. For each
and
, the conditional likelihood terms
satisfy
so that the associated log-likelihood term
satisfies
The likelihood
equals the product over
and
of the likelihood terms
with associated log-likelihood
The maximum likelihood estimate
of
maximizes the log-likelihood
over
, which is achieved by solving the estimating equations
For simplicity of notation, parameter estimates
are denoted as functions of the index set S for the observed data used in their computation without hat (^) symbols.
The gradient vector
has entries satisfying
for
. The Hessian matrix
has entries satisfying
for
. The gradient vector and Hessian matrix can be used in a Newton-Raphson algorithm to compute the maximum likelihood estimate
.
When
or
for some s, set
for some small positive value
like 0.00001 and also set
However, leave the gradient vector and Hessian matrix unchanged so that maximum likelihood estimation uses actual derivatives with respect to the parameter vector
.
Models based on a single transform with
can be computed directly. In this case,
and the gradient is a scalar value satisfying
so that
The covariance matrix for the parameter estimate vector
can be estimated in two ways: the model-based estimate treats the assumed model as the true model while the robust empirical (or sandwich) estimate allows for the true model to be different from the assumed model. The model-based estimate of the covariance matrix for
equals the
matrix satisfying
that is, the negative of the inverse of the Hessian matrix evaluated at the estimated parameter vector
. The robust empirical estimate equals the
matrix satisfying
where
is the
matrix with entries
for
,
, and
satisfying
Note that the gradient vector
equals the
vector generated
by summing the rows of
. The diagonal entries of
and
are estimates of variances for the parameter estimates corresponding to the entries of
, and so associated standard errors equal the square roots of these diagonal entries. These standard errors can be used to compute tests for zero individual model parameters to use in assessing specific theoretically important models. However, tests for parameters of adaptively generated models are typically significant as a consequence of the adaptive modeling process (as summarized in Section 2.5). Consequently, results for such tests are not reported for adaptive models generated in example analyses, but are reported for theory-based models.
2.4. Likelihood Cross-Validation
Partition the index set S into
disjoint sets
, called folds, for
. Note that all
measurements
for a given index s are allocated to the same fold. The LCV score for some model is defined as
In other words, evaluate the likelihood for each fold
using the parameter vector
computed by maximizing the likelihood for the complement
of the fold, normalize it by the total number of measurements
, and multiply these normalized deleted fold likelihoods together to get the LCV score.
When one or more slopes
for some
are generated as part of the estimation computations, reset
for some large value
such as 700 unless
is already smaller than
. For a model with parameter vector estimate
having one or more slopes
for some
, reset the LCV score to a very small value
such as 10−12. These adjustments guarantee that the adaptive modeling process of Section 2.5 generates models with acceptable parameter vector estimates
.
A larger LCV score indicates a better model, but not necessarily a distinctly (substantially, significantly) better model. A
-based LCV ratio test, analogous to a likelihood ratio test, can be used to decide if there is a distinct improvement or not in the LCV score. Following [7] , these tests are expressed in terms of the percent decrease (PD) in the LCV score for the model with the smaller score compared to the model with the larger score and a cutoff for a distinct PD, changing with the sample size. A PD larger than the cutoff indicates that the model with the larger score distinctly (substantially, significantly) improves on the model with the smaller score. Otherwise, the model with the smaller score is a competitive alternative and, if also simpler, it is a parsimonious, competitive alternative and so a preferable choice. Examples of LCV ratio tests are provided in Section 3.
2.5. Adaptive Model Selection
An effective choice of a fractional polynomial conditional hazard rate model based on a subset of the primary predictors
and the coordinates of
and
is identified adaptively using a heuristic search process controlled by tolerance parameters indicating how much of a change in the LCV score is allowable at each stage of the process. Starting from a base conditional hazard rate model, which is usually the constant model, power transforms and also geometric combinations if requested are systematically added to the model. Then, the expanded model is contracted, removing extraneous transforms if any and adjusting the powers of the remaining transforms. LCV ratio tests are used to decide whether to stop the contraction or continue removing transforms. Only a brief overview of the adaptive modeling process is provided here; details are provided in [11] . Note that geometric combinations generalize standard interactions and so provide for a nonlinear assessment of the concept of moderation [12] . Adaptive modeling applies to both independent and correlated outcomes in a variety of other regression contexts such as linear regression [7] , logistic regression [13] , Poisson regression [14] [15] , and discrete regression [16] .
Power transforms of time values
can cause computational problems for the adaptive modeling process when some of those time values are large. For this reason, fractional polynomial conditional hazard rate models are computed using the normalized event time values
and associated vectors
computed using these normalized times where
is an upper bound on the observed time values. This generates estimates
and
of the conditional hazard rate and the excess time probability function, respectively. Then, define the estimated conditional hazard rate and excess time probability function for the original time values
as
and
for any
,
, and
. This adjustment guarantees that power transforms are computed using bounded time values within the interval (0, 1]. It is not needed when
to start with, but that is unlikely to hold except in rare cases.
2.6. Modeling Constraints
A variety of constraints are needed so that a fractional polynomial provides an appropriate conditional hazard rate model. Specifically, covariate values
for
,
are restricted to be nonnegative (Section 2.2). Power transforms
have powers restricted to satisfy
(Section 2.2). Conditional hazard rates
are adjusted to be positive-valued and their integrals
to be increasing in
(as addressed in Section 2.3). Slope parameters
for
corresponding to power transforms depending on time t are restricted to be positive-valued (as addressed in Section 2.4). Also, fractional polynomial conditional hazard rate models are computed using normalized event times (as addressed in Section 2.5).
3. Example Analyses
Example analyses are presented in this section demonstrating the use and applicability of conditional hazard regression modeling based on fractional polynomials. These analyses use data on recurrence of bladder cancer tumors as described in Section 3.1. Examples are provided of both theory-based modeling and adaptive modeling of these data. Analyses are conducted using SAS® Version 9.4. A SAS macro for conducting adaptive modeling in a variety of regression contexts including conditional hazard regression is available on request from the author.
3.1. Multiple Tumor Event Times for Bladder Cancer Patients
Data are provided in [5] on the recurrence of up to
tumors for
patients initially diagnosed with bladder cancer collected by [17] . Patients have from
to
possibly censored recurrence times for a total of
time measurements. LCV scores are computed using
folds with fold sizes ranging from 15 (17.4%) to 20 (23.3%) patients and from 31 (17.3%) to 42 (23.5%) time measurements so that both fold sizes and fold complement sizes are not proportionately sparse. The cutoff for a distinct percent decrease (PD) in the LCV score for data with
observations is 1.07%.
Patients have follow-up times
ranging from 0 to 64 months for
. Observed tumor recurrence event times
have values increasing in When a patient has a total of
tumor recurrences with
, a censored time
is added to the measurements for this patient, and so then
for that patient and the only possible censored event time for all patients is the last one
. In cases where patients have
total tumor recurrences and
, a fifth censored time could be added but has not been to restrict to at most 4 event times (as in [5] ). One patient has follow-up time
with no observed tumor recurrences. Technically, this introduces a single censored event time
, which is trivial (i.e., the associated likelihood has value 1), and so the follow-up time for this patient has been changed to
so that
is non-trivial. Example analyses use the consecutive times between tumors given by the differences
with
. Censoring is unchanged.
Available covariates include treatment group, the initial number of tumors, and the initial tumor size. Patients are categorized into two treatment groups: 48 (55.8%) receiving a placebo treatment (group = 1) and 38 (44.2%) receiving thiotepa treatment (group = 2). Initial numbers of tumors range from 1 - 8 with 51 (59.3%), 11 (12.8%), 10 (11.6%), and 14 (16.3%) patients having 1, 2, 3, and 4-8, respectively. Initial tumor sizes range from 1 - 7 with 49 (57.0%), 10 (11.6%), 16 (18.6%), and 11 (12.8%) patients having 1, 2, 3, and 4-7, respectively. These covariates are time-invariant. They are denoted as group, number, and size in example analyses for brevity. The primary theory-based model for the data is the one based on these three covariates considered jointly.
Time-varying covariates are also considered in example analyses. These address possible dependence of the conditional hazard rate on event order and time. The dependence predictors include tumor order m with possible values 1 - 4, the current observed time
, the prior observed time
with
, and the prior cumulative time
(or, equivalently, the sum of the prior time values
) with
. Note that the last two of these predictors induce dependence of an order 1 autoregressive type. Dependence predictors of higher order autoregressive types may have value especially for data with relatively large maximum numbers of event times M, but these are not considered in example analyses for brevity.
Of the 179 tumor recurrence times, 86 (48.0%) occur at tumor order 1, 46 (25.7%) at tumor order 2, 27 (15.1%) at tumor order 3, and 20 (11.2%) at tumor order 4. Also, 67 (37.4%) of these recurrence times are censored.
The example analyses are actually conducted using times
normalized by
and their differences. However, the unadjusted symbols
and
are used to denote times in the example analyses to reduce the complexity of the notation. The patient index s is dropped for the same reason.
3.2. Modeling the Conditional Hazard Rate
Existing methods [2] [3] [4] [5] for modeling multiple event times are extensions of Cox regression methods, and so ignore the general dependence on current and prior times except for how those times affect covariates. Thus, they restrict to estimating only such covariate effects. However, fractional polynomial conditional hazard rate models need to estimate dependence effects prior to conducting theory-based or adaptive modeling based on other covariates. This section provides example analyses addressing dependence for the bladder cancer event times using fractional polynomials.
To address the dependence of conditional hazard rates over time, consider models based on the following three time-varying predictors: tumor order m, the current time value
, and the prior time value
. The adaptive additive model in these three dependence predictors is based on untransformed tumor order m (i.e., transformed by the power 1) without an intercept and with LCV score 1.07006. The associated moderation model in these three dependence predictors (i.e., allowing for geometric combinations) is based on the single power transform
without an intercept, almost the same model as the additive model.
An alternate approach for accounting for dependence would be the adaptive model based on the following three time-varying predictors: tumor order m, the current time value
, and the prior cumulative time value
, that is, with the prior time value replaced by the prior cumulative time value. The adaptive additive model in these three dependence predictors is based on untransformed event order m without an intercept, the same model as generated using the prior time value. The associated moderation model in these three dependence predictors is based on
the single power transformed geometric combination
with
an intercept and LCV score 1.07036. The associated additive model with LCV score 1.07006 has non-distinct PD 0.03% (i.e., less than the cutoff of 1.07% for a distinct PD), and so is preferable as a parsimonious, competitive alternative.
This preferable dependence model generates increasing conditional hazard rate estimates of 1.73, 3.47, 5.20, and 6.93 as tumor order m varies from 1 - 4. Estimated excess time probability curves are plotted in Figure 1. The excess time probability decreases with increased times between tumors and more quickly for larger tumor orders m. Also, observed ranges for times between tumors tend to be smaller for larger tumor orders, contributing to higher estimated conditional hazard rates and more quickly decreasing estimated excess time probability curves. This dependence model is used next to address theory-based effects to non-dependence covariates using fractional polynomial conditional hazard rate models.
3.3. Modeling Theory-Based Effects
Analyses of multiple event times using the methods of [2] [3] [4] are supported by SAS PROC PHREG (for proportional hazards regression). Results generated using SAS for these methods applied to the bladder cancer event times are reported in [18] and described in what follows. Results for the method of [5] applied
Figure 1. The estimated excess time probability as a function of time between bladder cancer tumors by tumor order.
to the bladder cancer event times are reported in that article.
The model of [2] is called the intensity model since it models the hazard rate. The joint intensity model in group, number, and size has associated p-values 0.022, <0.001, and 0.538, respectively, generated using model-based standard errors. These results support the conclusion of significant joint effects on the hazard rate due to group and number but not due to size. The estimate for group is negative indicating that thiotepa treatment reduces the hazard rate for times between bladder cancer tumors. The estimate for number is positive indicating that a larger number of initial tumors increases the hazard rate for times between bladder cancer tumors. The model of [3] is called the proportional means model. It has the same slope estimates as the intensity model but uses the robust empirical standard errors to test these slope estimates. The joint proportional means model in group, number, and size has associated p-values 0.075, 0.005, and 0.572, respectively, supporting a significant joint effect due to number (and also increasing in number as before), but not due to group and size.
The model of [4] is called a marginal Cox model since it treats the effects of group, number, and size as changing with tumor order m. It can be applied to times between tumors, and is then called a gap model, or alternately to cumulative times. It is more complex having different slopes for group, number, and size for each of the 4 values for tumor order m. Of these 12 slopes, only two estimates for the gap model are significant: the estimates for number for tumor orders 1 and 4 (
and
, both with positive slope estimates). Results reported in [5] for their alternate marginal Cox model applied to the bladder cancer data are described as "providing some evidence, though not very strong, of thiotepa slowing down tumor recurrence" (they focus on only the effects to thiotepa over the 4 tumor orders controlling for number and size rather than on the effects for all three covariates in combination).
Results for the theory-based conditional hazard rate model based on fractional polynomials are reported in Table 1 including model-based and robust empirical tests for group, number, and size. Robust empirical test results are more conservative in the sense that standard errors are larger and so p-values are larger. However, for both of these two types of tests, estimates for group and number
Table 1. Assessing the theory-based model for times between bladder cancer tumors using conditional hazard rate modeling based on fractional polynomials.
SE—standard error; 1Group has values 1 (placebo) and 2 (thiotepa), number has values 1 - 8, and size has values 1 - 7. 2Tests based on the model controlling for all three covariates as well as for tumor order to address dependence over time.
are significant at
while the estimate for size is not significant. This holds while controlling jointly for all three of these covariates as well as for the dependence predictor untransformed tumor order m as identified in Section 3.1. As for the intensity model of [2] , the estimate for group is negative indicating that thiotepa treatment reduces the hazard rate for times between bladder cancer tumors and the estimate for number is positive indicating that a larger number of initial tumors increases the hazard rate for times between bladder cancer tumors.
The impact of these dependence and non-dependence covariates can be further assessed adaptively by contracting the full model in all 4 covariates to a parsimonious competitive alternative while restricting the contraction to leave the covariates untransformed. The generated model is based on untransformed effects to tumor order, group, and number with the estimate for group negative as before and the estimate for number positive as before. The inclusion of group and number in this adaptively reduced model means that the removal of each of these from the model generates a distinct PD in the LCV score, providing further support for conclusions about the significance of their joint effects on time between bladder cancer tumors.
The above result indicates that there is a distinct effect to group controlling for number. This raises the question of what effect group has by itself, not controlling for number and size, but still allowing for dependence on untransformed tumor order. The estimated slope for group in this model is negative as before, but it is non-significant for both the model-based (
) and the robust empirical (
) tests. This indicates that the conclusion of significance of the effect of group based on theory-based models requires controlling for number.
3.4. Adaptive Modeling Results
This section addresses adaptive modeling of times between bladder cancer tumors. These models assume that dependence is a function of only tumor order m as supported by the results of Section 3.1, but allows for its possible transformation rather than keeping it untransformed as in Sections 3.1-3.2. Section 3.3.1 addresses the impact of group by itself while Section 3.3.2 addresses the combined impact of group and number. Effects to size are not addressed for brevity. This seems reasonable given that size has weak effects for the theory-based models of Section 3.2.
3.4.1. Adaptive Modeling in Terms of Group
The adaptive additive model in tumor order m (to account for dependence) and group is the model based on untransformed tumor order by itself and does not depend on group. This is the same model reported in Section 3.1 with LCV score 1.07006. The associated adaptive moderation model in these two predictors is the model based on the two transformed geometric combinations
and
without an intercept and with LCV score 1.08477. The LCV score for the additive model generates a distinct PD of 1.36%, indicating that group distinctly moderates the effect of tumor order on time between bladder cancer tumors.
Estimated conditional hazard rates for this moderation model are depicted in Figure 2. For all 4 tumor orders, the conditional hazard rate for patients given thiotepa is lower than for patients given placebo treatment with the effect strongest for tumor order 2.
3.4.2. Adaptive Modeling in Terms of Group and Number
In order to generate models in number that are more readily interpretable, analyses in this section use
, that is, number bounded to be no more than 4, combining the smaller categories 4 - 8 into one. The additive model in tumor order m, group, and
is the model based on
,
, and group with LCV score 1.11380. The associated moderation model in these three predictors is the model based on the two transformed geometric combinations
and
without an intercept and with LCV score 1.12023. The LCV score for the additive model generates a non-distinct PD of 0.57%, indicating that the additive model is preferable as a competitive alternative. It is also simpler since the effect of group is the same for all values of tumor order and of
.
Estimated conditional hazard rates for patient on thiotepa treatment based on the above additive model are depicted in Figure 3. Estimated conditional hazard rates are increasing in tumor order with, for each tumor order, essentially the
Figure 2. The estimated conditional hazard rate as a function of tumor order and treatment group (placebo or thiotepa) for times between bladder cancer tumors.
same values for numbers 1 - 3 and much larger values for numbers 4 - 8 combined. In all cases, estimated conditional hazard rates are larger by 1.07 units for patients on placebo treatment (not plotted in Figure 3).
Figure 4 provides plots for excess time probability curves for number 1 compared to numbers 4 - 8, each by group for tumor order 1. For each group, the curves for numbers 4 - 8 decrease more quickly than the curves for number 1 (which is about the same as for number 2 and for number 3, but these are not plotted). For each of the number 1 and 4 - 8 cases, the curves for the placebo
Figure 3. The estimated conditional hazard rate as a function of tumor order and initial number for times between bladder cancer tumors for patients on thiotepa treatment; estimates for placebo treatment are all 1.07 units larger.
Figure 4. The estimated excess time probability as a function of time between bladder cancer tumors by initial number 1 and numbers 4-8, each by group for tumor order 1.
group decrease more quickly than the curves for the thiotepa group with a stronger thiotepa benefit for number 1 than for numbers 4 - 8.
Figure 5 provides plots for excess time probability curves for the same cases as in Figure 4 but for tumor order 2. The patterns are similar to those of Figure 4, but with curves decreasing more quickly than associated curves of Figure 4. Plots are not provided for tumor order 3 - 4, but in general curves tend to decrease more quickly for larger tumor orders, but observed data are also relatively sparse for tumor order 3 - 4.
4. Discussion
An adaptive approach is formulated for analyzing multiple event time data. Conditional hazard rates are modeled in terms of dependence on both time and covariates using fractional polynomials, that is, weighted sums of products of powers of time and other available predictors. Models are restricted so that conditional hazard rates have positive values and so that estimated excess time probability functions decrease with time for functions of the other predictors determined by the model. Maximum likelihood is used to estimate parameters adjusting for right censored event times.
Likelihood cross-validation (LCV) scores are used to compare models. LCV scores are normalized products of deleted likelihoods for
subsets, called folds, of the data computed using parameters estimated using the complements of those folds. A model with a larger LCV score is a better model, but may not be a distinctly better model. LCV ratio tests generalizing standard likelihood ratio tests are used to identify distinct differences in LCV scores. These tests are based on a cutoff for a distinct percent decrease in the LCV score.
Figure 5. The estimated excess time probability as a function of time between bladder cancer tumors by initial number 1 and numbers 4-8, each by group for tumor order 2.
Adaptive searches through alternative models are used to identify effective models for a given data set in the sense that the removal of any of the model’s terms generates a distinct decrease in the LCV score. This is accomplished in two stages: an expansion to grow the model followed by a contraction to prune ineffective terms if any from the expanded model. Geometric combinations equal to products of powers of multiple predictors can optionally be generated. These generalize standard interactions and so provide for a nonlinear assessment of moderation.
Conditional hazards regression is demonstrated using data consisting of up to four possibly censored tumor recurrence times for patients diagnosed with bladder cancer. Dependence is modeled for times between tumor recurrence in terms of tumor order, the current time to recurrence, the prior time to recurrence, and the prior cumulative time. Adaptive analyses indicate that dependence for these times between tumor recurrence is reasonably modeled in terms of tumor order by itself, indicating that adaptive modeling of the tumor recurrence data can reasonably base dependence solely on tumor order. Figure 1 provides plots for associated estimated excess time probability curves by tumor order. This dependence model is used to conduct theory-based modeling of the times between tumor recurrence.
The example analyses start with an assessment of theory-based modeling. The primary theory-based model for times between tumor recurrence is based on the joint covariate effects of treatment group (thiotepa versus placebo), initial number of tumor (1 - 8), and initial size of the tumors (1 - 7). Existing methods for analyzing these data provide conflicting conclusions for this theory-based model. The intensity model of [2] identifies significant effects to group and initial number using model-based tests for zero slopes. The proportional means model of [3] generates the same slope estimates as the intensity model but uses robust empirical tests which identify only a significant effect to initial number. The marginal Cox model of [4] assumes that the slopes for the three covariates change with the four tumor orders, thereby increasing the number of parameters four-fold and so possibly overfitting the data. This model only identifies significant effects to initial number for two of the four tumor orders. The alternate marginal Cox model of [5] focuses on the group effect controlling for the other effects and concludes that there is “some evidence, though not very strong” of a group effect. It is not clear which of these models should be used to reach an appropriate theory-based conclusion.
On the other hand, fractional polynomial modeling indicates that there are significant effects to group and initial number using both model-based and robust empirical tests (Table 1). This approach differs from the approaches of [2] and [3] by estimating the dependence as well as the covariate effects, which results in consistent conclusions about the group effect for both model-based and robust empirical tests. It differs from the marginal Cox approaches of [4] and [5] in using tumor order to model dependence rather than having all covariate effects change with tumor order, which apparently is more effective due to generating a more parsimonious model. All five approaches indicate that initial tumor size has no effect.
Further assessment of the theory-based model using fractional polynomial modeling indicates that there is a distinct effect to group, but only after controlling for the effect to initial number and not when considered by itself. Further adaptive modeling indicates that group considered by itself distinctly moderates the effect of tumor order on times between tumor recurrence. Figure 2 provides a comparison of estimated conditional hazard rates by tumor order and group for this moderation model.
Adaptive modeling also identifies combined effects to tumor order, group, and initial number of tumors, but with the latter covariate adjusted to combine the small categories of 4 - 8 numbers in order to provide for a more readily interpretable effect to initial number of tumors. The additive model in these predictors is preferable over the associated moderation model. Figure 3 displays the estimated conditional hazard rate over tumor order and initial number for the thiotepa group. The conditional hazard rate for each tumor order is almost the same for each of initial numbers 1 - 3 and then increases for initial numbers 4 - 8 combined and this pattern increases with increasing tumor order. In all cases, the conditional hazard rate for patients on placebo treatment is 1.07 units higher. Figure 4 & Figure 5 provide plots of estimated excess time probability curves generated by this model for tumor order 1 and 2, respectively. Plots are not provided for tumor order 3 and 4, but in general these curves tend to decrease more quickly for larger tumor order. However, observed data are relatively sparse for tumor order 3 and 4.
The example analyses are limited in only addressing recurrent event time data and not more general multiple event time data, but the methods have been formulated to handle more general cases. Censoring for these analyses can only occur for the last event for each participant. However, conditional hazards regression has been formulated in terms of previous time values whether they are censored or not and so generalizes to more complex censoring situations. Conditional hazards regression can also be applied to model clustered multiple event time data (as in [19] ) as long as the clusters can be assigned appropriate event orders indexes. Future work is needed to investigate these more complex multiple event time data issues.
In summary, conditional hazard regression modeling has been formulated to model dependence across multiple event times as well as the effects of covariates. It has also been demonstrated using both theory-based and adaptive analyses of times between tumor recurrence for bladder cancer patients. Results of example analyses indicate that adaptive conditional hazard rate modeling can generate useful insights into multiple event time data, but future work is needed to apply this method to more general sets of data.
Acknowledgements
Prior development of the SAS macro used to conduct reported example analyses was supported in part by grants R01 AI57043 from the National Institute of Allergy and Infectious Diseases and R03 MH086132 from the National Institute of Mental Health. Work reported on in this article was not supported by external funding.