Regression Modeling of Individual-Patient Correlated Discrete Outcomes with Applications to Cancer Pain Ratings ()
1. Introduction
Pain ratings are often coded as integer values from 0 - 10 with larger values indicating more pain [1] [2] [3] . These are collected by health care professionals from all kinds of patients, but are especially important for cancer patients [4] . Pain ratings collected from individual patients over multiple time points require modeling methods that account for within-patient correlation. These methods need to allow for outcomes with an arbitrary finite number of discrete numeric values since individual-patient responses can often be limited to a subset of the maximum range of 0 - 10. As an example, Figure 1 provides a plot of daily pain ratings for Cancer Patient 1. This patient provided pain ratings for 86 different days over a period of length 97 days (and so with 11 missing daily pain ratings). Observed pain ratings varied from 1 - 9 with all of these 9 ratings occurring at least one time. The plot suggests that mean pain ratings tended to increase over time with larger variability early on than later in time. Estimation of relationships like this requires regression methods for estimating probabilities, means, variances, and dispersions for observed outcome (dependent, response, y) values as possibly nonlinear functions of time and of other available predictors.
Figure 1. Example pain ratings over time for Cancer Patient 1.
Generalized estimating equations (GEE) methods [5] are a possible choice for modeling such correlated pain ratings. Since pain ratings are polytomous outcomes, one could use the extensions of GEE developed by Lipsitz et al. [6] and Miller et al. [7] to handle categorical outcomes. However, these extensions involve recoding each pain rating as the vector of indicator variables for the pain rating taking on its possible values (except for one value treated as a reference category). In the case of Cancer Patient 1 with 9 possible outcome values, pain ratings at each time would be recorded as a vector of 8 indicator variables with its own 8 × 8 correlation matrix. There would be 86·85/2 = 3655 pairs of such vectors measured at different times, each of whose 8 × 8 correlation matrices would need estimation. Even for simple correlation structures like exchangeable or autoregressive, there would still be a large number 8·8 = 64 of correlation parameters. Moreover, one would need to store the overall correlation matrix of size (8·86)2 = 473,334 entries. Consequently, recoding correlated polytomous outcomes seems only feasible when the outcome has a small number of possible values and is measured at a small number of times. An approach is needed that treats each polytomous outcome measured at one time as univariate so that the size of the associated correlation matrix depends only on the number of measurement times and not also on the number of possible outcome values.
Likelihoods for correlated outcomes can be computationally complex except for limited cases. For this reason, Liang and Zeger [5] formulated GEE methods to avoid having to compute a likelihood by directly specifying estimating equations for mean parameters. Variances are treated as functions of the means as in generalized linear modeling [8] [9] while dispersions are treated as constant. Correlation parameters are estimated using residuals. Prentice and Zhao [10] extend the GEE estimating equations for mean parameters to also include analogous estimating equations for covariance parameters. These GEE approaches are not based on a likelihood function so that model selection criteria such as penalized likelihood criteria [11] and likelihood cross-validation scores [12] are not readily computed
Knafl and Ding [12] define a likelihood-like function L using the multivariate normal density computed using residuals and covariance matrices for categorical outcomes and point out that the GEE estimating equations for mean parameters correspond to differentiating the residual terms of L in the mean parameters while holding the covariances fixed in those parameters (see also [13] ). Similar to Prentice and Zhao [10] , they propose a partial extension of GEE that adds estimating equations for dispersion parameters to the GEE estimating equations for mean parameters, but they still estimate correlation parameters from residuals. Knafl and Meghani [14] consider modeling of individual-patient correlated count outcomes and compare the partial extension of GEE having some estimating equations based on differentiating L to extended linear mixed modeling (ELMM) based on maximizing the function L in all parameters including those for the means, dispersions, and correlations. ELMM generates estimating equations for all parameters as for Prentice and Zhao [10] , but the estimating equations for mean parameters are not the same as for GEE (except in the special case of continuous outcomes treated as normally distributed).
Knafl and Meghani [14] compare the partial extension of GEE to ELMM for modeling individual-patient count outcomes and conclude that ELMM is preferable since it generates competitive models in less time. For that reason, only ELMM is considered here for modeling correlated discrete outcomes. They consider three correlation structures including independent correlations all equal to 0, exchangeable correlations all equal to a constant, and spatial autoregressive order 1 correlations computed as power transforms of a constant autocorrelation parameter. Exchangeable correlations are not selected in their example analyses, and so are not considered in what follows. Only spatial autoregressive order 1 (AR1) correlations based on the autocorrelation parameter ρ are considered in what follows. Independent correlations correspond to the special case with
.
Knafl and Ding [12] formulate and demonstrate an adaptive regression modeling process for identifying nonlinear relationships, controlled by likelihood cross-validation scores for comparing alternative models. These methods extend readily to modeling of discrete outcomes and are used in example analyses.
The objective of the paper is to formulate methods for analyzing discrete outcomes collected longitudinally from individual patients and to demonstrate these methods using analyses of longitudinal data on the daily pain ratings for a single cancer patient as a function of time and the number of daily pain flares. This is achieved in two parts. Section 2 addresses such methods including multinomial, ordinal, and censored Poisson probabilities as well as likelihood-like cross-validation, adaptive regression methods, and the research study whose data are used in example analyses. Section 3 presents the results of example adaptive analyses of these data including among other issues which is the preferable type of probabilities to use, how means and dispersions for the pain ratings change additively with the number of pain flares, and how the number of pain flares moderates the effect of time on means and dispersions.
2. Methods
Let
denote discrete outcomes with a finite number of possible numeric values
for
and observed at N possibly non-consecutive, integer time points
and provided by one individual patient. Let
denote associated probabilities for
and
. The means of these discrete outcomes satisfy
and the variances satisfy
These variances are not a direct function
of the means
as in generalized linear modeling [8] [9] , but they are similar since they can be considered a function
of the
vector
of probabilities
for the outcome
. Define the residuals as
. Combine the outcomes
, the means
, and the residuals
into the N × 1 vectors
,
, and
, respectively.
Let
denote predictor values over times
and over predictors indexed by
for use in modeling probabilities. Combine these into the
vectors
with transposes denoted by
for
. As formulated later, these are combined with a column vector
of coefficient parameters to estimate probabilities. Three alternatives are considered including multinomial probabilities (Section 2.1), ordinal probabilities (Section 2.2), and censored Poisson probabilities (Section 2.3). The size of the parameter vector
varies for these three alternatives.
Let
denote predictor values over times
and over predictors indexed by
for predicting dispersions. Combine these into the
vectors
for
. Let
denote the associated
vector of coefficient parameters. Let
denote dispersion values over times
satisfying
When
for
, the first entry
of
is an intercept parameter. The constant dispersion model corresponds to
for
with
Define the extended variances as
and the extended standard deviations as
for
. These generate standardized residuals
for
. Combine the extended standard deviations and the standardized residuals into the N × 1 vectors
and
, respectively.
Let
denote the N × N AR1 correlation matrix for the vector
. The diagonal entries of
are all equal to 1 while the off-diagonal entries satisfy
where
denotes the absolute value of the difference
for
with
. The entries
are well-defined for
because
have been assumed to be integers. These are spatial AR1 correlations that account for actual distance between observed times as opposed to non-spatial AR1 correlations with
for
as usually used in GEE implementations. The N × N covariance matrix
for the vector
satisfies
where
denotes the N × N diagonal matrix with diagonal entries
.
Let
be the column vector of the probability, dispersion, and correlation parameters. Use the multivariate normal likelihood to define the likelihood-like function
satisfying
where
is the determinant of the covariance matrix
. Note that
and
This formulation has been restricted to address data for each individual patient taking a person-centered approach to modeling longitudinal data [15] [16] , which is possible because substantial amounts of outcome measurements are available for each patient. This formulation readily generalizes to handle the combined longitudinal data for multiple patients as considered in the GEE context. In that case, as pointed out by Knafl and Ding [12] , the GEE estimating equations can be generated by differentiating the residual terms of
while holding the covariance matrix terms fixed. This motivates using the ELMM approach for estimating
based on estimating equations generated by maximizing
. Moreover, the likelihood-like function
can be used to generate model selection criteria. Pan [17] has formulated the quasi-likelihood information criterion (QIC) for GEE model selection. However, the QIC score does not fully account for the correlation structure while model selection criteria based on
fully account for the correlation structure.
The likelihood-like function
can be maximized to generate estimates
by solving for a zero gradient, that is,
where
is the zero vector,
is the partial derivative vector for the probability parameters,
is the partial derivative vector for the dispersion parameters, and
is the partial derivative for the correlation parameter. The Hessian matrix
has nine component submatrices:
for the probability parameters,
for the dispersion parameters,
for the correlation parameter,
and its transpose
,
and its transpose
, and
and its transpose
. Iteratively solve
using Newton's method, that is, given the current value
for
, the next value is given by
The solution to the estimating equations for observations indexed by T is denoted as
The partial derivative vector
varies with the probability type as do
,
, and
. Formulas are provided for these quantities in Sections 2.1-2.3 for multinomial probabilities, ordinal probabilities, and censored Poisson probabilities, respectively. Formulas for partial derivatives common to all three probability types are provided in what follows. Details on computation of derivatives are not provided for brevity; they are available on request from the first author.
The partial derivative vector
has
entries satisfying
where
is the N × 1 vector with entries
for
and
. The derivative
satisfies
where
For spatial AR1 correlations,
is the N × N matrix with diagonal entries all equal to 0 and off-diagonal entries equaling
for
.
has entries
where
is the N × 1 vector with entries
for
and
.
satisfies
where
Formulas for first and second derivatives of
and
are adapted from formulas in [18] . For spatial AR1 correlations,
is the N×N matrix with diagonal entries all equal to 0 and off-diagonal entries
for
.
has entries
for
.
The covariance matrix for the estimate
satisfies
Square roots of the diagonal entries of
can be used to generate
tests of zero individual model parameters. These are useful for fixed models of theoretical importance. However, these tests for parameters of adaptively generated models are usually significant as a consequence of the model selection process, and so are not reported in example analyses of Section 3.
The likelihood-like function
can be used to compute likelihood-like cross-validation (LCV) scores (Section 2.4) for evaluating and comparing alternative models. These scores can be used to control the adaptive modeling process (Section 2.5) for identifying power transforms of the probability predictors and of the dispersion predictors for use in nonlinear modeling of discrete outcomes.
2.1. Multinomial Probabilities
The probabilities
are modeled multinomially using generalized logits with the smallest value
as the reference category (but any other value can be used instead), that is,
for K J × 1 vectors
of coefficient parameters
for
and
. Combine the vectors
over
into the composite
vector
. Altogether, there are K∙J coefficient parameters for modeling the probabilities. Setting
for
generates K intercept parameters. For
, the multinomial probabilities satisfy
for
and
Their partial derivatives
satisfy
and
for
,
, and
.
The derivative vector
has K∙J entries
for
and
satisfying
where
and
is the N × 1 vector with entries
for
,
, and
.
has entries
where
while
is the N × 1 vector with entries
for
,
, and
.
has entries
where
is the N × 1 vector with entries
for
,
, and
.
has entries
for
and
.
2.2. Ordinal Probabilities
For
, define cumulative probabilities
where the values
are assumed to be in increasing order for
. The link function is cumulative logits with logits computed for lower sets of values relative to higher sets of values (but this can be reversed). Formally, for predictor values
,
, the cumulative probabilities
for
and
are modeled ordinally as
for K intercept parameters
and a single J × 1 vector
of slope parameters
for
. Combine the intercept parameters
over
into the K × 1 vector
. Altogether, there are K + J coefficient parameters for modeling the probabilities, which are combined over
and
into the
vector
A zero-intercept model corresponds to setting
, but
for
are nonzero. The cumulative probabilities satisfy
for
and
. The cumulative probabilities are differenced to compute probabilities
that is, for
, define
and then
for
. For
,
, and
, the partial derivatives of
satisfy
The derivative vector
has K + J entries
for
and
for
satisfying
where
while
and
are the N × 1 vectors with entries
for
,
, and
.
has entries
where
while
,
,
, and
are the N × 1 vectors with respective entries
for
and
.
has entries
where
and
are the N × 1 vectors with respective entries
for
,
,
, and
.
has entries
for
and
.
2.3. Censored Poisson Probabilities
The censored Poisson probabilities
are modeled as follows
using the natural log link function for modeling
, for
. There are J coefficient parameters for modeling the probabilities. Setting
for
generates an intercept parameter.
In the special case when
are consecutive integers, that is,
for an integer
and
, truncated Poisson probabilities [19] could be used instead with
where the normalizing constant S satisfies
These are not considered any further.
The first partial derivatives
and
satisfy
for
and
. The associated second partial derivatives satisfy
for
and
.
The derivative vector
has J entries
for
satisfying
where
and
is the N × 1 vector with entries
for
and
.
has entries
where
while
is the N × 1 vector with entries
for
and
.
has entries
where
is the N × 1 vector with entries
for
,
, and
.
has entries
for
.
2.4. Likelihood-Like Cross-Validation
In k-fold cross-validation [20] , observations are partitioned into k disjoint subsets called folds. Parameter estimates computed using the data from the other folds are used to predict fold observations. In k-fold likelihood-like cross-validation (LCV), these deleted fold predictions are scored using the associated likelihood-like function L. The times
are randomly partitioned into k disjoint folds
for
. The same initial seed is used for randomization with all models under consideration to generate compatible LCV scores. Denote the deleted estimate of
using the data with times in the complement
of the fold
as
. For
, denote the union of all folds
for
as
.
is the empty fold satisfying
. The LCV score is
where
is the conditional likelihood-like term for the data in fold
conditioned on the data in the union
of the prior folds using the deleted estimate
of the parameter vector
. Formally,
Larger LCV scores indicate better models.
2.5. Adaptive ELMM
Knafl and Ding [12] formulate adaptive regression methods for searching through alternative models for means and dispersions in a variety of contexts. These methods use adaptive fractional polynomial models [21] . A short overview is provided here (for details, see Chapter 20, [12] ). Adaptive regression methods generalize to ELMM modeling of discrete outcomes and are used in the example analyses of individual-patient pain ratings (Section 3). Model selection is a two-phase heuristic process. First, the model is expanded (or grown) by adding power transforms of predictors for means and dispersions. Then, the model is contracted (or pruned) to a parsimonious set of power transforms by removing transforms from the current model one at a time and adjusting the powers of the remaining transforms. LCV scores are used to evaluate and compare alternative models. Tolerance parameters control the adaptive modeling process. These tolerance parameters indicate how much of a reduction in the LCV score can be tolerated at given stages of the process. Predictors having arbitrary values raised to arbitrary powers can generate floating point overflow problems. To counter this problem, power transformed predictor values are upper bounded to be no larger than 1012.
The adaptive modeling process can optionally generate geometric combinations, that is, products of power transforms of multiple predictors generalizing standard interactions, possibly with the geometric combinations also power transformed, for example,
. This provides for an assessment of nonlinear moderation, generalizing the standard linear form of moderation [22] .
A wide variety of example analyses are provided in [12] demonstrating the usefulness of adaptive regression methods. However, adaptive modeling of discrete outcomes has not been previously addressed.
A SAS® (SAS Institute, Inc., Cary, NC) macro has been developed for conducting adaptive analyses. This macro as well as data and code used to generate the results of the example analyses along with SAS output for those analyses are available from the first author.
2.6. On-Going Study of Cancer Pain
The data analyzed in the example analyses have been collected as part of an on-going study of daily pain and opioid usage for cancer patients. This study is collecting a variety of measures including intensive longitudinal individual-patient data using Ecological Momentary Assessment (EMA) [23] as implemented in the mEMA app [24] . Each patient is providing data on numbers of pain flares, that is, sudden increases in pain, and of opioids taken on each day. Methods for analyzing such individual-patient longitudinal count outcomes using Poisson regression modeling are addressed in Knafl and Meghani [14] . Each patient is also providing data on ratings of worst pain and least pain on a scale of 0 - 10 for each day (as also used in the Brief Pain Inventory [4] ). Methods for analyzing such individual-patient longitudinal pain rating data using discrete regression modeling are addressed above. The pain ratings for Cancer Patient 1 plotted in Figure 1 are daily worst pain ratings and are used in the analyses of Section 3. This on-going study received Institutional Review Board approval. All participants provided written informed consent.
3. Results of Example Analyses
Table 1 contains results for adaptive models for probabilities and dispersions of pain ratings for Cancer Patient 1 over time (as plotted in Figure 1) computed using ELMM and the spatial AR1 correlation structure. LCV scores are based on k = 5 folds with fold sizes ranging from 13 to 20 measurements. Multinomial and ordinal probabilities are based on a single power transform of time while censored Poisson probabilities are based on two power transforms of time. All three probability types have zero intercept terms, meaning that eight intercepts are zero for the multinomial probabilities, the first intercept is zero for the ordinal probabilities, and the one intercept is zero for the censored Poisson probabilities. All three models have dispersions based on a single transform of time with zero intercept terms. The multinomial model has nine parameters (eight slopes for the probability time transform and one slope for the dispersion transform), the ordinal model also has nine parameters (seven intercept parameters for the probabilities, one slope for the probability time transform, and one slope for the dispersion transform), the censored Poisson model has three parameters (one slope for each of two probability time transforms and one slope for the dispersion transform).
The multinomial model has the best (largest) LCV score 0.23083 while the ordinal model has the worst (smallest) LCV score 0.22635. The censored Poisson model has the intermediate LCV score 0.22935, but it is not much smaller than the LCV score for the multinomial model and is based on one-third the number of parameters. Furthermore, the censored model requires only 5.2 minutes of clock time compared to 25.9 or about 5.0 times more for the ordinal model and 42.7 minutes or about 8.2 times more for the multinomial model. Consequently, censored Poisson probabilities are preferable to the other two approaches for modeling the pain ratings of Cancer Patient 1 because they generate a competitive LCV score, are more parsimonious, and require less time to compute. For this reason, only censored Poisson probabilities are considered in subsequent analyses of the pain ratings of Cancer Patient 1.
Table 1. Comparison of Probability Types for Analyzing Daily Pain Ratings of Cancer Patient 1a.
LCV—likelihood cross-validation. aComputed using adaptive extended linear mixed modeling and spatial autoregressive order 1 correlations. bAll models have zero intercept terms.
3.1. Assessment of the Number of Folds
It is possible that a larger number k of folds is more appropriate to use in analyzing the pain ratings of Cancer Patient 1. However, adaptive models for censored Poisson probabilities and dispersions using 10 and 15 folds have smaller LCV scores 0.22706 and 0.22378, respectively. Consequently, only k = 5 folds are used to compute LCV scores in subsequent analyses.
3.2. Independent versus Autoregressive Correlations
The adaptive model for censored Poisson probabilities and dispersions assuming independent correlations has LCV score 0.22349, smaller than the LCV score for the associated model of Table 1, indicating that the spatial AR1 correlation structure is the more appropriate choice. Consequently, only spatial AR1 correlations are considered in subsequent analyses of the pain ratings of Cancer Patient 1.
3.3. Assessment of Constant Dispersions
The adaptive model for censored Poisson probabilities using spatial AR1 correlations and assuming constant dispersions has LCV score 0.19309 smaller than the LCV score for the associated model of Table 1. Thus, dispersions for the pain ratings of Cancer Patient 1 are reasonably considered to be non-constant.
3.4. Assessing Linearity in Time
Using censored Poisson probabilities based on untransformed time, that is, linear in time, with an intercept, the adaptive model in time for the dispersions using spatial AR1 correlations has LCV score 0.20959 smaller than the LCV score for the associated model of Table 1. Thus, the censored Poisson probabilities for the pain ratings of Cancer Patient 1 are reasonably treated as nonlinear in time.
3.5. Adaptive Model in Time
Results of the above analyses indicate that the censored Poisson model of Table 1 provides an appropriate assessment of the dependence on time of the probabilities and dispersions of the pain ratings of Cancer Patient 1. The probabilities for this model are based on
and
without an intercept while the dispersions are based on
without an intercept. The estimated autocorrelation is 0.39 so that correlations decrease quickly with increased days apart, for example, the correlation is less than 0.01 for outcomes 5 or more days apart. Figure 2 contains the plot of estimated mean pain ratings over time, which decrease from 7.8 at day 1 quickly to 4.6 by day 6 and then increase to 7.1 by day 97. Figure 3 contains the plot of estimated dispersions for pain ratings over time, which decrease from 1 over days 1 - 15 to 0.19 at day 35, and remain constant after that (due to upper bounding the dispersion transform).
Estimated probabilities over time for pain ratings 1 - 5 are plotted in Figure 4
Figure 2. Estimated means of pain ratings over time for Cancer Patient 1.
Figure 3. Estimated dispersions of pain ratings over time for Cancer Patient 1.
and for pain ratings 6 - 9 in Figure 5. Estimated probabilities for pain ratings 1 - 5 increase quickly early on and decrease after that. Estimated probabilities for pain ratings 6 - 9 decrease quickly early on and increase after that with some small decreases late in time for pain ratings 6 - 7. Estimated probabilities are all
Figure 4. Estimated probabilities of pain ratings 1 - 5 over time for Cancer Patient 1.
Figure 5. Estimated probabilities of pain ratings 6 - 9 over time for Cancer Patient 1.
smaller than 0.25 for pain ratings 1 - 8. The estimated probability of the highest observed pain rating of 9 is 0.51 on day 1, decreases quickly to 0.03 by day 6, and then increases to 0.33 by day 97. Estimated probabilities over time for a high pain rating of 6 or more are plotted in Figure 6. The estimated probability of a
Figure 6. Estimated probabilities of a high pain rating of 6 or more over time for Cancer Patient 1.
high pain rating of 6 or more starts at 0.89 on day 1, decreases to 0.30 by day 6, and then increases to 0.78 by day 97.
3.6. Adaptive Additive Model in Time and the Number of Pain Flares
Numbers of pain flares for Cancer Patient 1 vary from 0-4 with none missing for the N = 86 time points. By default, the adaptive modeling process generates additive models in multiple predictors. When applied to the pain ratings as a function of the number x of pain flares as well as of time, the generated additive model has probabilities based on a zero intercept,
, and
; dispersions based on a zero intercept and
; estimated autocorrelation 0.37; and LCV score 0.28173. Since this LCV score is larger than the LCV score 0.22935 for the model based on only time, the number of pain flares is reasonably considered to have an additive effect on the censored Poisson probabilities, but not on the dispersions.
Estimated means under this additive model are plotted in Figure 7, which increase nonlinearly over time at higher levels for higher numbers of pain flares. Figure 8 displays the plot of estimated dispersions for the additive model, which decrease nonlinearly from 1 at day 1 to 0.02 by day 97. Plots for estimated probabilities are not provided because that requires two plots similar to Figure 4 and Figure 5 for each of the five observed numbers of pain flares.
Figure 7. Estimated means of pain ratings over time and changing additively with the number of pain flares for Cancer Patient 1.
Figure 8. Estimated dispersions for pain ratings over time under the additive model for Cancer Patient 1.
3.7. Adaptive Moderation of the Effect to Time by the Number of Pain Flares
Optionally, the adaptive modeling process can generate moderation models allowing for additive effects of multiple predictors together with geometric combinations based on those predictors. When applied to the pain ratings as a function of the number x of pain flares as well as of time, the generated moderation model has probabilities based on a zero intercept,
,
, and the four geometric combinations
,
,
, and
; dispersions based on a zero intercept,
, and the one geometric combination
; estimated autocorrelation 0.32; and the LCV score is 0.29730. Since this LCV score is larger than the LCV score 0.28173 for the additive model, the number of pain flares is reasonably considered to moderate the effect of time on the censored Poisson probabilities as well as on the dispersions.
Estimated means under this moderation model are plotted in Figure 9. For 0 - 3 pain flares, estimated means increase nonlinearly over time with some mild decreases late in time in some cases and follow somewhat different patterns. On the other hand, estimated means for 4 pain flares decrease nonlinearly over time. Figure 10 displays the plot of estimated dispersions for the moderation model. Estimated dispersions decrease nonlinearly over time following somewhat different patterns at increasingly higher levels for 1 - 4 pain flares, but at the highest level at 0 pain flares. Plots for estimated probabilities are not provided because that requires two plots similar to Figure 4 and Figure 5 for each the five observed numbers of pain flares. However, Figure 11 provides the plot for estimated probabilities of a high pain rating of 6 or more. Similar to the means of Figure 9, estimated probabilities for 0 - 3 pain flares increase nonlinearly over
Figure 9. Estimated means of pain ratings over time moderated by the number of pain flares for Cancer Patient 1.
Figure 10. Estimated dispersions for pain ratings over time moderated by the number of pain flares for Cancer Patient 1.
Figure 11. Estimated probabilities of a high pain rating of 6 or more over time moderated by the number of pain flares for Cancer Patient 1.
time with some mild decreases late in time in some cases and following somewhat different patterns while estimated probabilities for 4 pain flares decrease nonlinearly over time from a high level at day 1 to essentially zero from around day 40 and later.
4. Summary
Formulations are provided for methods to use in regression modeling of individual-patient longitudinal discrete outcomes allowing for nonlinearity in predictors for probabilities and dispersions for such outcomes along with temporal correlation using spatial autoregression order 1. Three approaches are considered for modeling probabilities of outcome values. The multinomial approach is based on generalized logits with separate intercept and slope parameters for modeling probabilities for outcome values. The ordinal approach is based on cumulative logits with separate intercept parameters and the same slope parameter for modeling cumulative probabilities for outcome values. The censored Poisson approach is based on the log link function with the same intercept and slope parameters for modeling standard Poisson probabilities for all but the largest outcome value, whose value is set so that the probabilities sum to one.
Extended linear mixed modeling is used to estimate model parameters for the three probability types. A likelihood-like function L is defined using the multivariate normal density evaluated using residuals and covariances for discrete outcomes. The function L is maximized by solving estimating equations corresponding to setting the gradient vector equal to zero. Formulations are provided for computing gradient vectors and Hessian matrices for use in estimating models of each probability type. The function L is used to compute likelihood-like cross-validation (LCV) scores for comparing alternative models. These LCV scores are used to control an adaptive modeling process for heuristic search through power transforms of available predictors of outcome probabilities and dispersions.
These methods are used in example adaptive analyses of the longitudinal individual-patient cancer pain ratings of Figure 1. Table 1 contains results for generated models of these pain ratings in time using each of the three probability types. The censored Poisson approach is preferable over the other two approaches for modeling these data because the associated model has a competitive LCV score, is more parsimonious based on fewer parameters (three compared to nine for each of the other two approaches), and is computed in much less time. This is likely to hold for modeling of other longitudinal discrete outcomes collected for individual patients, not just discrete outcomes based on pain ratings, and even of longitudinal discrete outcomes for multiple patients. The censored Poisson model for the example data has estimated probabilities that are nonlinear in time (Figures 4-6) generating associated means (Figure 2) and dispersions (Figure 3) that are also nonlinear in time.
Models are also generated assessing the additive effect of the number of pain flares on means and dispersions (Figure 7 and Figure 8) as well as moderation of the effect of time by the number of pain flares (Figures 9-11). There is an additive effect compared to the model based on only time, but a more substantive moderation effect. These models demonstrate the need to account for nonlinear additive and moderation effects for individual-patient longitudinal discrete outcomes.
Future research is needed to assess the use of ELMM for modeling correlated discrete outcomes for multiple patients in combination. Future research is also needed to compare ELMM to generalized linear mixed modeling.
Acknowledgements
This work was supported in part by the National Institutes of Health/National Institute of Nursing Research Award 1R01NR017853. S. H. Meghani was the Principal Investigator and G. J. Knafl was a consultant for this research project.