Modeling Individual Patient Count/Rate Data over Time with Applications to Cancer Pain Flares and Cancer Pain Medication Usage
1. Introduction
An ongoing study (NIH/NINR 1R01NR017853) of patients with cancer is collecting daily longitudinal count/rate data, including numbers of pain flares per day and numbers of as needed pain medications taken per day. Data are being collected for each study participant over periods of up to five months. A completed study (NIH/NINR RC1NR011591) collected, for cancer patients over 3 months, numbers of around the clock pain medications taken per day per dose, that is, the number of times a medication is taken in a day relative to the number of doses that are supposed to be taken in a day. Standard assumptions of means linear in time and dispersions constant over time are not always appropriate for such data. Also, a Poisson process assumption of independence over time need not always hold. A model selection score needs to be defined for evaluating models for the data and for use in searches over alternative models. Times to conduct these searches need to be as short as possible, especially as the number of time measurements increases.
Approaches are presented for modeling mean counts/rates over time separately for each individual patient, controlling for temporal correlation as well as for time-varying dispersions. These approaches use Poisson regression methods, because count/rate data are being modeled. Generalized estimating equations (GEE) methods [1] [2] provide a natural choice for modeling correlations for such count variables. However, standard GEE methods have limited value, because they assume constant dispersions. Furthermore, GEE methods avoid specification of likelihood functions, which are useful for generating model selection criteria. In what follows, two extensions of GEE methods are formulated and evaluated that address temporal correlation and time-varying means and dispersions for repeated count/rate measurements. A likelihood-like function, that is, a function used like a likelihood but which need not integrate to 1, is defined and used in computing parameter estimates for these extensions along with a model selection criterion for comparing alternative models. Example analyses of selected individual cancer patient count/rate data are presented using adaptive regression methods [3] for identifying possibly nonlinear trajectories for means and dispersions of counts/rates over time while controlling for temporal correlation.
2. Modeling Individual Count/Rate Data
2.1. Generalized Linear Modeling of Means
Let $y_i$ for $1 \le i \le N$ denote count values for an individual patient observed at $N$ distinct times $t_i$ within a general set $T$ of times, that is, $t_i \in T$. Combine these into the $N \times 1$ vector $\mathbf{y}$. Let $\mu_i = \mathrm{E}(y_i)$ denote associated mean or expected counts and combine these into the $N \times 1$ vector $\boldsymbol{\mu}$. Denote the residuals as $e_i = y_i - \mu_i$ for $1 \le i \le N$ and combine these into the $N \times 1$ vector $\mathbf{e} = \mathbf{y} - \boldsymbol{\mu}$. Let $x_{i,j}$ denote predictor values over times $t_i$ and over predictors indexed by $1 \le j \le J$ and combine these into the $J \times 1$ vectors $\mathbf{x}_i$ with transpose denoted by $\mathbf{x}_i^{\mathrm{T}}$ for $1 \le i \le N$. Let $\mathbf{X}$ be the $N \times J$ matrix with rows $\mathbf{x}_i^{\mathrm{T}}$ for $1 \le i \le N$. Let $\boldsymbol{\beta}$ denote the associated $J \times 1$ vector of coefficient parameters. Use generalized linear models [4] [5] of $\mu_i$ for $1 \le i \le N$ with natural log link function $\log(\mu_i) = \mathbf{x}_i^{\mathrm{T}} \boldsymbol{\beta}$ so that $\mu_i = \exp(\mathbf{x}_i^{\mathrm{T}} \boldsymbol{\beta})$ for $1 \le i \le N$. When $x_{i,1} = 1$ for $1 \le i \le N$, the first entry of $\boldsymbol{\beta}$ is an intercept parameter. Treat each $y_i$ as Poisson distributed so that its variance is $\mathrm{V}(y_i) = \mu_i$.
The counts $y_i$ sometimes have associated totals $n_i$, and then the model for the mean counts $\mu_i$ is converted to a model for the means $\mu_i / n_i$ of the rates $y_i / n_i$ using offsets $\log(n_i)$. Formally, replace $\mathbf{x}_i^{\mathrm{T}} \boldsymbol{\beta}$ by $\log(n_i) + \mathbf{x}_i^{\mathrm{T}} \boldsymbol{\beta}$ so that mean counts are $\mu_i = n_i \cdot \exp(\mathbf{x}_i^{\mathrm{T}} \boldsymbol{\beta})$ and then $\mu_i / n_i = \exp(\mathbf{x}_i^{\mathrm{T}} \boldsymbol{\beta})$ are the mean rates.
2.2. Time-Varying Dispersions
Let $u_{i,j}$ denote predictor values over times $t_i$ and over predictors indexed by $1 \le j \le J'$ and combine these into the $J' \times 1$ vectors $\mathbf{u}_i$ for $1 \le i \le N$. Let $\mathbf{U}$ be the $N \times J'$ matrix with rows $\mathbf{u}_i^{\mathrm{T}}$ for $1 \le i \le N$. Let $\boldsymbol{\gamma}$ denote the associated $J' \times 1$ vector of coefficient parameters. Let $\phi_i$ denote dispersion values over times $t_i$ satisfying $\log(\phi_i) = \mathbf{u}_i^{\mathrm{T}} \boldsymbol{\gamma}$ and define the extended variances as $\mathrm{V}_i = \phi_i \cdot \mu_i$ and the extended standard deviations as $\mathrm{SD}_i = (\phi_i \cdot \mu_i)^{1/2}$ for $1 \le i \le N$. Informally, these quantities extend the usual Poisson variances and standard deviations through multiplication by dispersions. These are used to compute the standardized residuals $e_i / \mathrm{SD}_i$. Combine the extended standard deviations into the $N \times 1$ vector $\mathbf{SD}$. When $u_{i,1} = 1$ for $1 \le i \le N$, the first entry $\gamma_1$ of $\boldsymbol{\gamma}$ is an intercept parameter. The constant dispersion model corresponds to $u_{i,1} = 1$ for $1 \le i \le N$ with $J' = 1$. This is the dispersion model used in standard GEE modeling.
When offsets $\log(n_i)$ are used to convert the model for the counts $y_i$ to a model for the rates $y_i / n_i$, they can also be added to the dispersions. The dispersions then satisfy $\log(\phi_i) = \log(n_i) + \mathbf{u}_i^{\mathrm{T}} \boldsymbol{\gamma}$ so that the extended variances for the counts $y_i$ are $\mathrm{V}_i = \phi_i \cdot \mu_i = n_i^2 \cdot \exp(\mathbf{u}_i^{\mathrm{T}} \boldsymbol{\gamma}) \cdot \exp(\mathbf{x}_i^{\mathrm{T}} \boldsymbol{\beta})$ and then the variances for the rates $y_i / n_i$ are $\mathrm{V}_i / n_i^2 = \phi_i' \cdot \mu_i'$ where $\phi_i' = \exp(\mathbf{u}_i^{\mathrm{T}} \boldsymbol{\gamma})$ and $\mu_i' = \exp(\mathbf{x}_i^{\mathrm{T}} \boldsymbol{\beta})$ are the associated dispersions and means for those rates.
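The time-varying dispersion computations of this section can be sketched as follows; the helper name `extended_moments` and the toy data are hypothetical, and a constant dispersion model is obtained by taking a single all-ones dispersion predictor column.

```python
import numpy as np

def extended_moments(y, X, U, beta, gamma):
    """Extended variances V_i = phi_i * mu_i and standardized residuals
    (y_i - mu_i) / sqrt(V_i) under log links for means and dispersions."""
    mu = np.exp(X @ beta)        # means
    phi = np.exp(U @ gamma)      # time-varying dispersions
    V = phi * mu                 # extended variances
    sd = np.sqrt(V)              # extended standard deviations
    z = (y - mu) / sd            # standardized residuals
    return mu, phi, V, sd, z

# Hypothetical data: three times, constant dispersion model (J' = 1).
t = np.array([1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(t), t])
U = np.ones((3, 1))              # u_{i,1} = 1 for all i
y = np.array([2.0, 1.0, 3.0])
mu, phi, V, sd, z = extended_moments(y, X, U, np.array([0.0, 0.3]),
                                     np.array([0.0]))
```

With the dispersion intercept set to zero, the dispersions equal 1 and the extended variances reduce to the usual Poisson variances.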
2.3. Modeling Correlations
Denote the covariance matrix for the count vector $\mathbf{y}$ as $\boldsymbol{\Sigma}$. Use the GEE approach [1] [2] to model the covariance matrix for $\mathbf{y}$ as $\boldsymbol{\Sigma} = \mathbf{D} \cdot \mathbf{R}(\rho) \cdot \mathbf{D}$ where $\mathbf{D}$ is the $N \times N$ diagonal matrix with diagonal entries $\mathrm{SD}_i$ for $1 \le i \le N$ and $\mathbf{R}(\rho)$ is a $N \times N$ correlation matrix with diagonal entries 1 and off diagonal entries $r_{i,i'}$ for $1 \le i \ne i' \le N$ determined by a correlation parameter $\rho$ varying with the assumed correlation structure. Under independent (IND) correlations, $r_{i,i'} = 0$ for $i \ne i'$. A Poisson process generates such correlations. Under exchangeable (EXCH) correlations, $r_{i,i'} = \rho$ for $i \ne i'$, that is, the correlations are constant. Under autoregressive of order 1 (AR1) correlations, $r_{i,i'} = \rho^{|t_i - t_{i'}|}$ for $i \ne i'$ where $|t_i - t_{i'}|$ is the absolute value of $t_i - t_{i'}$. These differences are all assumed to be integers so that the correlations $\rho^{|t_i - t_{i'}|}$ are all well-defined. In general, $\rho^{|t_i - t_{i'}|}$ are spatial AR1 correlations. The special case of non-spatial AR1 correlations with $|t_i - t_{i'}| = |i - i'|$ treats times as equally spaced. The parameter $\rho$ is called the autocorrelation.
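The three working correlation structures can be sketched numerically as follows; the function name `corr_matrix` is a hypothetical illustration.

```python
import numpy as np

def corr_matrix(times, structure, rho=0.0):
    """Working correlation matrix R(rho): IND, EXCH, or spatial AR1 with
    off-diagonal entries rho**|t_i - t_i'| (non-spatial AR1 when the
    times are consecutive integers)."""
    t = np.asarray(times, dtype=float)
    N = len(t)
    if structure == "IND":
        return np.eye(N)
    if structure == "EXCH":
        R = np.full((N, N), rho)
        np.fill_diagonal(R, 1.0)
        return R
    if structure == "AR1":
        lags = np.abs(t[:, None] - t[None, :])
        return rho ** lags       # diagonal lags are 0, giving entries of 1
    raise ValueError(structure)

R = corr_matrix([1, 2, 4], "AR1", rho=0.5)  # spatial AR1 with a gap at time 3
```

With times [1, 2, 4], the gap at time 3 yields correlation 0.5**3 = 0.125 between the first and last measurements, illustrating the spatial AR1 case with unequally spaced times.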
2.4. Possible Extensions
The above formulation can be extended to address repeated measurements of types other than counts/rates and for multiple patients. More complex correlation structures based on multiple correlation parameters can also be considered. One such example is unstructured correlations with different correlations for different pairs of measurements, but this requires data from multiple patients to be reasonably estimated. These extensions are not addressed further.
3. Standard Generalized Estimating Equations Modeling
3.1. Notation and Parameter Estimation
Under standard GEE modeling, dispersions are treated as a constant $\phi$ so that the covariance matrix satisfies $\boldsymbol{\Sigma} = \phi \cdot \mathbf{D}(\boldsymbol{\mu})^{1/2} \cdot \mathbf{R}(\rho) \cdot \mathbf{D}(\boldsymbol{\mu})^{1/2}$ where $\boldsymbol{\mu}$ is the $N \times 1$ vector with entries $\mu_i$ for $1 \le i \le N$ and $\mathbf{D}(\boldsymbol{\mu})$ is the associated diagonal matrix. The generalized estimating equations are given by
$$\mathbf{g}(\boldsymbol{\beta}) = \mathbf{M}^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} \mathbf{e} = \mathbf{0}$$
where $\mathbf{0}$ is the $J \times 1$ vector with all zero entries, $\mathbf{e} = \mathbf{y} - \boldsymbol{\mu}$, and the $N \times J$ matrix $\mathbf{M} = \partial \boldsymbol{\mu} / \partial \boldsymbol{\beta}^{\mathrm{T}}$ with entries $\partial \mu_i / \partial \beta_j = \mu_i \cdot x_{i,j}$ for $1 \le i \le N$ and $1 \le j \le J$. Let $\mathbf{H} = -\mathbf{M}^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} \mathbf{M}$. Note that in the general GEE context with correlated outcomes for multiple subjects, the formulation for $\mathbf{g}$ would equal a sum of terms like $\mathbf{M}^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} \mathbf{e}$ for each subject and $\mathbf{H}$ would equal a sum of terms like $-\mathbf{M}^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} \mathbf{M}$ for each subject. Only one such term is needed here since data for only one subject/patient are being modeled. The GEE process for estimating $\boldsymbol{\beta}$ iteratively solves $\mathbf{g}(\boldsymbol{\beta}) = \mathbf{0}$ as follows. Given the current value $\boldsymbol{\beta}_c$ for $\boldsymbol{\beta}$, the next value is given by $\boldsymbol{\beta}_{c+1} = \boldsymbol{\beta}_c - \mathbf{H}^{-1}(\boldsymbol{\beta}_c) \cdot \mathbf{g}(\boldsymbol{\beta}_c)$, thereby adapting Newton's method with $\mathbf{g}$ in the role of the gradient vector and $\mathbf{H}$ in the role of the Hessian matrix.
The constant dispersion parameter $\phi$ is estimated using the Pearson residuals $e_i / \mu_i^{1/2}$ evaluated at a given value for the mean coefficient parameter vector $\boldsymbol{\beta}$. The bias-adjusted estimate $\hat{\phi}$ of the dispersion parameter $\phi$ satisfies
$$\hat{\phi} = \frac{1}{N - J} \sum_{i=1}^{N} \frac{e_i^2}{\mu_i}$$
assuming $N > J$. Next, the correlation parameter $\rho$ is estimated using standardized errors $\epsilon_i = e_i / (\hat{\phi} \cdot \mu_i)^{1/2}$ for $1 \le i \le N$ as follows. The IND correlation structure has no need for an estimate. For the EXCH correlation structure and a given value $\boldsymbol{\beta}$ for the mean parameter vector, $\rho$ can be estimated by
$$\hat{\rho} = \frac{1}{N \cdot (N - 1)/2 - J} \sum_{1 \le i < i' \le N} \epsilon_i \cdot \epsilon_{i'}$$
assuming $N \cdot (N - 1)/2 > J$. For the AR1 correlation structure and a given value $\boldsymbol{\beta}$ for the mean parameter vector, the autocorrelation $\rho$ can be estimated by
$$\hat{\rho} = \frac{1}{N - 1 - J} \sum_{i=1}^{N-1} \epsilon_i \cdot \epsilon_{i+1}$$
assuming $N - 1 > J$. In the non-spatial AR1 special case, $\hat{\rho}$ directly estimates the lag 1 autocorrelation $\rho$ because $t_{i+1} - t_i = 1$ for $1 \le i \le N - 1$.
For any correlation structure, once the GEE estimate $\hat{\boldsymbol{\beta}}(T)$ of the coefficient parameter vector $\boldsymbol{\beta}$ is computed using the observations indexed by $T$, the GEE estimate of the dispersion parameter $\phi$ is $\hat{\phi}(T) = \hat{\phi}(\hat{\boldsymbol{\beta}}(T))$. The GEE estimate of the correlation parameter $\rho$ is $\hat{\rho}(T)$ computed using $\hat{\boldsymbol{\beta}}(T)$ and $\hat{\phi}(T)$.
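One pass of the standard GEE computation, covering the moment estimates of the dispersion and correlation parameters followed by a Newton-type update of the mean coefficients, can be sketched as below; the function name `gee_step` and the bias adjustments shown follow the formulations in this section but are a hypothetical illustration, not the study's SAS macro.

```python
import numpy as np

def gee_step(y, X, times, beta, structure="AR1"):
    """One standard GEE update: moment estimates of phi and rho at the
    current beta, then a Newton-type step for beta."""
    t = np.asarray(times, dtype=float)
    N, J = X.shape
    mu = np.exp(X @ beta)                        # means under the log link
    e = y - mu                                   # residuals
    phi = np.sum(e**2 / mu) / (N - J)            # bias-adjusted dispersion
    eps = e / np.sqrt(phi * mu)                  # standardized errors
    if structure == "EXCH":
        # sum over distinct pairs i < i'
        rho = ((np.sum(eps) ** 2 - np.sum(eps**2)) / 2) / (N * (N - 1) / 2 - J)
    elif structure == "AR1":
        rho = np.sum(eps[:-1] * eps[1:]) / (N - 1 - J)
    else:                                        # IND
        rho = 0.0
    if structure == "AR1":
        R = rho ** np.abs(t[:, None] - t[None, :])
    else:
        R = np.where(np.eye(N, dtype=bool), 1.0, rho)
    Sigma = phi * np.sqrt(np.outer(mu, mu)) * R  # GEE covariance model
    M = mu[:, None] * X                          # d mu / d beta
    g = M.T @ np.linalg.solve(Sigma, e)          # estimating equations
    H = -M.T @ np.linalg.solve(Sigma, M)         # Hessian-role matrix
    beta_new = beta - np.linalg.solve(H, g)      # Newton-type update
    return beta_new, phi, rho
```

Iterating `gee_step` to convergence yields the GEE estimates for the observations indexed by the supplied times.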
3.2. The Likelihood-Like Function
Let $\boldsymbol{\theta} = (\boldsymbol{\beta}^{\mathrm{T}}, \phi)^{\mathrm{T}}$ be the $(J + 1) \times 1$ vector of the GEE mean and dispersion parameters. The correlation parameter $\rho$ is a function of $\boldsymbol{\beta}$ and $\phi$ and so has not been included in $\boldsymbol{\theta}$. Use the multivariate normal likelihood to define the likelihood-like function $\mathrm{L}(\boldsymbol{\theta})$ satisfying
$$\log \mathrm{L}(\boldsymbol{\theta}) = -\frac{1}{2} \left( N \cdot \log(2\pi) + \log |\boldsymbol{\Sigma}| + \mathbf{e}^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} \mathbf{e} \right)$$
where $|\boldsymbol{\Sigma}|$ is the determinant of the covariance matrix $\boldsymbol{\Sigma}$. The vector $\partial \log \mathrm{L} / \partial \boldsymbol{\beta}$ of partial derivatives of $\log \mathrm{L}$ can be expressed as the sum of two terms. The first term corresponds to differentiating the residual vector part $\mathbf{e}$ of $\mathbf{e}^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} \mathbf{e}$ with respect to $\boldsymbol{\beta}$ holding the covariance part $\boldsymbol{\Sigma}$ fixed in $\log \mathrm{L}$ and equals $\mathbf{M}^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} \mathbf{e}$, the gradient-like quantity used in standard GEE modeling. This fact seems to have been first recognized by Chaganty [6]. One advantage for having a likelihood-like function for GEE models is that it can be used to compute parameter estimates. Another is that it can be used to compute model selection criteria not otherwise available for GEE modeling.
3.3. Likelihood-Like Cross-Validation
Burman [7] defined k-fold cross-validation with observations partitioned into k disjoint subsets called folds. Fold observations are predicted using parameter estimates computed using the data from the other folds. In k-fold likelihood-like cross-validation (LCV), these deleted fold predictions are scored using the associated likelihood-like function L. Randomly partition the times $t_i \in T$ into k disjoint folds $F(h)$ for $1 \le h \le k$. Use the same initial seed for randomization with all models under consideration so that their LCV scores are comparable. Let $\hat{\boldsymbol{\theta}}(T \setminus F(h))$ denote the estimate of $\boldsymbol{\theta}$ using the data with times in the complement $T \setminus F(h)$ of the fold $F(h)$. For $1 \le h \le k$, let $F(<h)$ denote the union of all folds $F(h')$ for $h' < h$ with $F(<1)$ the empty fold and set $F(\le h) = F(<h) \cup F(h)$. Define the LCV score to satisfy
$$\mathrm{LCV} = \left( \prod_{h=1}^{k} \mathrm{L}_h \right)^{1/N}$$
where $\mathrm{L}_h$ is defined as the conditional likelihood-like term for the data in fold $F(h)$ conditioned on the data in the union $F(<h)$ of the prior folds using the deleted estimate $\hat{\boldsymbol{\theta}}(T \setminus F(h))$ of the parameter vector $\boldsymbol{\theta}$. Formally,
$$\mathrm{L}_h = \frac{\mathrm{L}\big( \mathbf{y}(F(\le h)); \hat{\boldsymbol{\theta}}(T \setminus F(h)) \big)}{\mathrm{L}\big( \mathbf{y}(F(<h)); \hat{\boldsymbol{\theta}}(T \setminus F(h)) \big)}$$
with the denominator term for the empty fold $F(<1)$ set to 1. Because fold assignment is random, folds can be empty when the number k of folds is large relative to the number N of measurements, and then those folds are dropped from the computation of the LCV score. Larger LCV scores indicate better models. Note that even if the full data are non-spatial with observations at consecutive integer times $t_i = i$ for $1 \le i \le N$, the folds $F(h)$ and the fold unions $F(<h)$ are not sets of consecutive integer times except in rare cases and so require more general handling.
4. Incorporating Nonconstant Dispersions
4.1. Formulation
GEE modeling can be extended to handle nonconstant dispersions. Let $\boldsymbol{\theta} = (\boldsymbol{\beta}^{\mathrm{T}}, \boldsymbol{\gamma}^{\mathrm{T}})^{\mathrm{T}}$ be the $(J + J') \times 1$ vector of the mean and dispersion parameters. The definition of the likelihood-like function $\mathrm{L}$ given for standard GEE holds using the more general parameter vector $\boldsymbol{\theta}$. Differentiate $\log \mathrm{L}$ with respect to the vector $\boldsymbol{\gamma}$ of dispersion coefficient parameters while holding the correlation parameter $\rho$ fixed in the current parameter vector $\boldsymbol{\theta}$ to provide the $J'$ estimating equations
$$\mathbf{g}_{\gamma} = \frac{\partial^{*} \log \mathrm{L}}{\partial \boldsymbol{\gamma}} = \mathbf{0}$$
where the notation $\partial^{*}$ is used to indicate that this is not the full partial derivative vector for $\boldsymbol{\gamma}$ in $\log \mathrm{L}$ due to not accounting for the effect of $\boldsymbol{\gamma}$ on $\rho$. Now, combine these with the J standard GEE equations $\mathbf{g}_{\beta} = \mathbf{M}^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} \mathbf{e} = \mathbf{0}$ to solve for joint estimates of $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$. Then, iteratively solve for $\boldsymbol{\theta}$ with $\mathbf{g} = (\mathbf{g}_{\beta}^{\mathrm{T}}, \mathbf{g}_{\gamma}^{\mathrm{T}})^{\mathrm{T}}$ in the role of the gradient vector and the $(J + J') \times (J + J')$ matrix $\mathbf{H}$ in the role of the Hessian matrix. $\mathbf{H}$ has four component submatrices: the $J \times J$ matrix $\mathbf{H}_{\beta\beta} = -\mathbf{M}^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} \mathbf{M}$ for the mean coefficients as defined for standard GEE, the $J' \times J'$ matrix $\mathbf{H}_{\gamma\gamma}$ for the $J'$ dispersion coefficients, the $J \times J'$ matrix $\mathbf{H}_{\beta\gamma}$, and its transpose $\mathbf{H}_{\gamma\beta} = \mathbf{H}_{\beta\gamma}^{\mathrm{T}}$.
Note that $\partial \mathrm{V}_i / \partial \gamma_j = \mathrm{V}_i \cdot u_{i,j}$ and $\partial \mathrm{SD}_i / \partial \gamma_j = \mathrm{SD}_i \cdot u_{i,j} / 2$ for $1 \le i \le N$ and $1 \le j \le J'$. Consequently, $\partial \boldsymbol{\Sigma} / \partial \gamma_j$ has entries $(u_{i,j} + u_{i',j}) \cdot \Sigma_{i,i'} / 2$ for $1 \le i, i' \le N$, and $\mathbf{g}_{\gamma}$ has entries
$$\mathrm{g}_{\gamma,j} = \frac{1}{2} \mathbf{e}^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} \frac{\partial \boldsymbol{\Sigma}}{\partial \gamma_j} \boldsymbol{\Sigma}^{-1} \mathbf{e} - \frac{1}{2} \mathrm{tr}\left( \boldsymbol{\Sigma}^{-1} \frac{\partial \boldsymbol{\Sigma}}{\partial \gamma_j} \right)$$
for $1 \le j \le J'$ where tr denotes the trace function. $\mathbf{H}_{\gamma\gamma}$ has entries $\partial \mathrm{g}_{\gamma,j} / \partial \gamma_{j'}$ for $1 \le j, j' \le J'$ obtained by differentiating the entries of $\mathbf{g}_{\gamma}$ once more, using $\partial^2 \boldsymbol{\Sigma} / \partial \gamma_j \partial \gamma_{j'}$ with entries $(u_{i,j} + u_{i',j}) \cdot (u_{i,j'} + u_{i',j'}) \cdot \Sigma_{i,i'} / 4$. $\mathbf{H}_{\beta\gamma}$ has columns
$$-\mathbf{M}^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} \frac{\partial \boldsymbol{\Sigma}}{\partial \gamma_j} \boldsymbol{\Sigma}^{-1} \mathbf{e}$$
for $1 \le j \le J'$. If offsets are included, they are carried along in equations without any effect on derivatives.
4.2. Parameter Estimation
Given a value for the vector $\boldsymbol{\theta}$ of all coefficient parameters, an estimate of the correlation parameter $\rho$ can be based on the associated standardized residuals $\epsilon_i = e_i / \mathrm{SD}_i$. Calculate correlation estimates for the IND, EXCH, and AR1 correlation structures using the same formulas as before but computed with these more general standardized residuals. Iteratively solve $\mathbf{g}(\boldsymbol{\theta}) = \mathbf{0}$ as follows. Given the current value $\boldsymbol{\theta}_c$ for $\boldsymbol{\theta}$, the next value is given by $\boldsymbol{\theta}_{c+1} = \boldsymbol{\theta}_c - \mathbf{H}^{-1}(\boldsymbol{\theta}_c) \cdot \mathbf{g}(\boldsymbol{\theta}_c)$, thereby adapting Newton's method with $\mathbf{g}$ in the role of the gradient vector and $\mathbf{H}$ in the role of the Hessian matrix. The solution to the estimating equations for observations indexed by T is denoted as $\hat{\boldsymbol{\theta}}(T)$ with associated correlation estimate $\hat{\rho}(T)$.
5. Extended Linear Mixed Modeling
5.1. Formulation
GEE modeling can be further extended to handle full parameter estimation through maximizing the likelihood-like function. Let $\boldsymbol{\theta} = (\boldsymbol{\beta}^{\mathrm{T}}, \boldsymbol{\gamma}^{\mathrm{T}}, \rho)^{\mathrm{T}}$ be the $(J + J' + 1) \times 1$ vector of the mean, dispersion, and correlation parameters. The definition of the likelihood-like function $\mathrm{L}$ given for standard GEE holds using this more general parameter vector $\boldsymbol{\theta}$. The likelihood-like function $\mathrm{L}$ is maximized in the coefficient parameter vector $\boldsymbol{\theta}$ by solving the estimating equations $\mathbf{g}(\boldsymbol{\theta}) = \mathbf{0}$ where $\mathbf{g} = \partial \log \mathrm{L} / \partial \boldsymbol{\theta}$ is the vector of standard partial derivatives of $\log \mathrm{L}$. The associated matrix $\mathbf{H} = \partial^2 \log \mathrm{L} / \partial \boldsymbol{\theta} \partial \boldsymbol{\theta}^{\mathrm{T}}$. In this case, $\mathbf{g}$ is a true gradient vector and $\mathbf{H}$ a true Hessian matrix. This approach is extended linear mixed modeling in the sense that if the entries of $\mathbf{y}$ were continuous variables treated as normally distributed with covariance matrix $\boldsymbol{\Sigma}$, then it would be exactly linear mixed modeling. Formulations given in what follows are adapted from those of [8].
The gradient vector $\mathbf{g} = (\mathbf{g}_{\beta}^{\mathrm{T}}, \mathbf{g}_{\gamma}^{\mathrm{T}}, \mathrm{g}_{\rho})^{\mathrm{T}}$. The gradient sub-vector $\mathbf{g}_{\gamma}$ has the same formulation as for extended GEE modeling, only now its entries are standard partial derivatives. The gradient sub-vector $\mathbf{g}_{\beta}$ has entries
$$\mathrm{g}_{\beta,j} = \mathbf{e}^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} \mathbf{m}_j + \frac{1}{2} \mathbf{e}^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} \frac{\partial \boldsymbol{\Sigma}}{\partial \beta_j} \boldsymbol{\Sigma}^{-1} \mathbf{e} - \frac{1}{2} \mathrm{tr}\left( \boldsymbol{\Sigma}^{-1} \frac{\partial \boldsymbol{\Sigma}}{\partial \beta_j} \right)$$
where $\mathbf{m}_j$ is the $N \times 1$ vector with entries $\mu_i \cdot x_{i,j}$ for $1 \le i \le N$ and $1 \le j \le J$, and $\partial \boldsymbol{\Sigma} / \partial \beta_j$ has entries $(x_{i,j} + x_{i',j}) \cdot \Sigma_{i,i'} / 2$. The partial derivative $\mathrm{g}_{\rho} = \partial \log \mathrm{L} / \partial \rho$ satisfies
$$\mathrm{g}_{\rho} = \frac{1}{2} \mathbf{e}^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} \frac{\partial \boldsymbol{\Sigma}}{\partial \rho} \boldsymbol{\Sigma}^{-1} \mathbf{e} - \frac{1}{2} \mathrm{tr}\left( \boldsymbol{\Sigma}^{-1} \frac{\partial \boldsymbol{\Sigma}}{\partial \rho} \right)$$
where $\partial \boldsymbol{\Sigma} / \partial \rho = \mathbf{D} \cdot (\partial \mathbf{R} / \partial \rho) \cdot \mathbf{D}$ with $\mathbf{D}$ the $N \times N$ diagonal matrix with diagonal entries $\mathrm{SD}_i$, tr denotes the trace function, and $\partial \mathbf{R} / \partial \rho$ depends on the correlation structure. For IND correlations, $\partial \mathbf{R} / \partial \rho = \mathbf{0}$. For EXCH correlations, $\partial \mathbf{R} / \partial \rho$ is the $N \times N$ matrix with diagonal entries all equal to 0 and off-diagonal entries all equal to 1. For AR1 correlations, $\partial \mathbf{R} / \partial \rho$ is the $N \times N$ matrix with diagonal entries all equal to 0 and off-diagonal entries equaling $|t_i - t_{i'}| \cdot \rho^{|t_i - t_{i'}| - 1}$ in the $i$th row and $i'$th column for $i \ne i'$.
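The structure-specific derivative of the working correlation matrix with respect to the correlation parameter can be sketched numerically; the helper name `dR_drho` is a hypothetical illustration.

```python
import numpy as np

def dR_drho(times, structure, rho):
    """Derivative of the working correlation matrix with respect to rho:
    zero for IND, all-ones off-diagonal for EXCH, and
    |t_i - t_i'| * rho**(|t_i - t_i'| - 1) off-diagonal for AR1."""
    t = np.asarray(times, dtype=float)
    N = len(t)
    if structure == "IND":
        return np.zeros((N, N))
    if structure == "EXCH":
        return np.ones((N, N)) - np.eye(N)
    lags = np.abs(t[:, None] - t[None, :])
    dR = lags * rho ** (lags - 1)          # AR1 off-diagonal entries
    np.fill_diagonal(dR, 0.0)              # diagonal entries are 0
    return dR
```

For times [1, 2, 4] and rho = 0.5, the entry for the first and last measurements is 3 * 0.5**2 = 0.75, matching the spatial AR1 lag of 3.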
$\mathbf{H}$ has nine component submatrices: the $J \times J$ matrix $\mathbf{H}_{\beta\beta}$ for the mean parameters, the $J' \times J'$ matrix $\mathbf{H}_{\gamma\gamma}$ for the dispersion parameters computed as for extended GEE modeling, the second partial derivative $\partial^2 \log \mathrm{L} / \partial \rho^2$ for the correlation parameter, the $J \times J'$ matrix $\mathbf{H}_{\beta\gamma}$ and its transpose $\mathbf{H}_{\gamma\beta} = \mathbf{H}_{\beta\gamma}^{\mathrm{T}}$, the $J \times 1$ vector $\mathbf{H}_{\beta\rho}$ and its transpose $\mathbf{H}_{\rho\beta} = \mathbf{H}_{\beta\rho}^{\mathrm{T}}$, and the $J' \times 1$ vector $\mathbf{H}_{\gamma\rho}$ and its transpose $\mathbf{H}_{\rho\gamma} = \mathbf{H}_{\gamma\rho}^{\mathrm{T}}$. $\mathbf{H}_{\beta\beta}$ has entries $\partial \mathrm{g}_{\beta,j} / \partial \beta_{j'}$ for $1 \le j, j' \le J$ obtained by differentiating the entries of $\mathbf{g}_{\beta}$. The second partial derivative $\partial^2 \log \mathrm{L} / \partial \rho^2$ satisfies
$$\frac{\partial^2 \log \mathrm{L}}{\partial \rho^2} = \frac{1}{2} \mathbf{e}^{\mathrm{T}} \left( \boldsymbol{\Sigma}^{-1} \frac{\partial^2 \boldsymbol{\Sigma}}{\partial \rho^2} \boldsymbol{\Sigma}^{-1} - 2 \boldsymbol{\Sigma}^{-1} \frac{\partial \boldsymbol{\Sigma}}{\partial \rho} \boldsymbol{\Sigma}^{-1} \frac{\partial \boldsymbol{\Sigma}}{\partial \rho} \boldsymbol{\Sigma}^{-1} \right) \mathbf{e} - \frac{1}{2} \mathrm{tr}\left( \boldsymbol{\Sigma}^{-1} \frac{\partial^2 \boldsymbol{\Sigma}}{\partial \rho^2} - \boldsymbol{\Sigma}^{-1} \frac{\partial \boldsymbol{\Sigma}}{\partial \rho} \boldsymbol{\Sigma}^{-1} \frac{\partial \boldsymbol{\Sigma}}{\partial \rho} \right)$$
where $\partial^2 \boldsymbol{\Sigma} / \partial \rho^2 = \mathbf{D} \cdot (\partial^2 \mathbf{R} / \partial \rho^2) \cdot \mathbf{D}$. For IND and EXCH correlations, $\partial^2 \mathbf{R} / \partial \rho^2 = \mathbf{0}$. For AR1 correlations, $\partial^2 \mathbf{R} / \partial \rho^2$ is the $N \times N$ matrix with diagonal entries all equal to 0 and off-diagonal entries equaling $|t_i - t_{i'}| \cdot (|t_i - t_{i'}| - 1) \cdot \rho^{|t_i - t_{i'}| - 2}$ in the $i$th row and $i'$th column for $i \ne i'$. $\mathbf{H}_{\beta\gamma}$ has entries $\partial \mathrm{g}_{\beta,j} / \partial \gamma_{j'}$ for $1 \le j \le J$ and $1 \le j' \le J'$ obtained by differentiating the entries of $\mathbf{g}_{\beta}$ with respect to the dispersion coefficients using $\partial \boldsymbol{\Sigma} / \partial \gamma_{j'}$. $\mathbf{H}_{\beta\rho}$ has entries $\partial \mathrm{g}_{\beta,j} / \partial \rho$ for $1 \le j \le J$. $\mathbf{H}_{\gamma\rho}$ has entries $\partial \mathrm{g}_{\gamma,j} / \partial \rho$ for $1 \le j \le J'$.
5.2. Parameter Estimation
The parameter vector $\boldsymbol{\theta}$ is estimated by iteratively solving $\mathbf{g}(\boldsymbol{\theta}) = \mathbf{0}$ as follows. Given the current value $\boldsymbol{\theta}_c$ for $\boldsymbol{\theta}$, the next value is given by
$$\boldsymbol{\theta}_{c+1} = \boldsymbol{\theta}_c - \mathbf{H}^{-1}(\boldsymbol{\theta}_c) \cdot \mathbf{g}(\boldsymbol{\theta}_c),$$
thereby using Newton's method with gradient vector $\mathbf{g}$ and Hessian matrix $\mathbf{H}$. The estimation process can be stopped early if $\log \mathrm{L}(\boldsymbol{\theta}_{c+1})$ does not increase by much compared to $\log \mathrm{L}(\boldsymbol{\theta}_c)$. The solution to the estimating equations for observations indexed by T is denoted as $\hat{\boldsymbol{\theta}}(T)$.
The covariance matrix for the parameter estimate vector $\hat{\boldsymbol{\theta}}(T)$ can be computed as $-\mathbf{H}^{-1}(\hat{\boldsymbol{\theta}}(T))$ and the variances corresponding to its diagonal entries can be used to compute z tests of zero individual model parameters. These are useful for fixed models of theoretical importance. On the other hand, tests for parameters of adaptively generated models (as described in Section 6) are usually significant as a consequence of the model selection process, and so results for these tests are not reported for models generated in the example analyses.
6. Modeling Possibly Nonlinear Means and Dispersion over Time
Knafl and Ding [3] provide a detailed formulation for adaptively searching through alternative regression models for means and dispersions in a variety of contexts using adaptive fractional polynomial models [9] . A brief overview is provided here. These methods are used in the example analyses of individual cancer patient count/rate data presented later. Model selection proceeds through two phases. The expansion phase first grows the model adding in alternative power transforms of predictors for means and dispersions. The contraction phase then reduces the model to a parsimonious set of power transforms by removing transforms from the current model one at a time and adjusting the powers of the remaining transforms. Alternative models are evaluated using LCV scores. The modeling process is controlled by tolerance parameters indicating how much of a reduction in the LCV score can be tolerated at given stages of the process. Knafl and Ding [3] also provide a wide variety of example analyses demonstrating the usefulness of these adaptive regression methods. A description of these methods in the standard Poisson regression context is provided in [10] .
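The two-phase search can be caricatured in a few lines; this is a deliberately simplified sketch of the expansion and contraction phases, with a hypothetical function name `adaptive_powers` and a greedy first-improvement rule, whereas the actual algorithm of [3] also adjusts the powers of retained transforms and applies tolerance parameters at multiple stages.

```python
import numpy as np

def adaptive_powers(candidate_powers, score, tol=0.01):
    """Greedy sketch of the expansion/contraction search over power
    transforms: grow the model while the LCV score improves, then prune
    transforms whose removal costs at most a tolerable score reduction."""
    model = []
    best = score(model)
    # Expansion phase: keep adding powers that improve the score.
    improved = True
    while improved:
        improved = False
        for p in candidate_powers:
            if p in model:
                continue
            s = score(model + [p])
            if s > best:
                model, best, improved = model + [p], s, True
    # Contraction phase: drop powers whose score loss is tolerable.
    for p in list(model):
        reduced = [q for q in model if q != p]
        if score(reduced) >= (1 - tol) * best:
            model = reduced
    return model
```

With a toy score favoring the power set {1.0, 2.0}, the expansion phase recovers both powers and the contraction phase retains them.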
A SAS® (SAS Institute, Inc., Cary, NC) macro has been developed for generating adaptive analyses including the reported example analyses. This macro as well as data and code used to generate the results of the example analyses are available from the first author.
7. Example Analyses
7.1. Pain Flare Counts per Day
Figure 1 displays pain flare counts for Cancer Patient 1 over a period of 34 days.
Figure 1. Pain flare counts over time for cancer patient 1.
Pain flares range from 0 to 4 per day and tend to increase over time. Data are available for N = 33 days with a missing value for one day (day 33). These data were collected using Ecological Momentary Assessment (EMA) [11] as implemented in the mEMA app [12] .
Table 1 contains results for adaptive models for means and dispersions of pain flare counts over time using the two modeling approaches extended GEE modeling and extended linear mixed modeling and the three correlation structures IND, AR1, and EXCH. Power transforms reported in Table 1 were generated by adaptively searching through alternative power transforms using the methods described in Section 6. LCV scores are based on k = 5 folds with fold sizes ranging from 2 to 8 measurements and no empty folds. For extended GEE modeling, IND correlations generate the best LCV score 0.38018 over the three correlation structures. For extended linear mixed modeling, IND correlations also generate the best LCV score 0.40622. These results suggest that a Poisson process assumption is reasonable for these pain flare counts.
Extended linear mixed modeling generates better LCV scores than extended GEE for all three correlation structures. Moreover, its computation times are much shorter, ranging from 0.4 to 1.2 minutes compared to 13.9 to 35.5 minutes for extended GEE modeling. These results suggest that extended linear mixed modeling is preferable for modeling these pain flare counts because it generates better LCV scores in less time. Consequently, only extended linear mixed modeling using IND correlations is considered further for these data, generating the model with means based on
without an intercept and dispersions based on
and
without an intercept. Figure 2 displays estimates of mean pain flare counts over time along with unit error bands over time (i.e., the mean ±1 extended standard deviation at each time) to account for variability about the means. Mean pain flare counts increase over time, somewhat close to linearly in time. Variability in pain flare counts is smaller in the middle of the period, somewhat larger at the start of the period and even larger at the end of the period. Figure 3 displays standardized residuals for this model, which range between ±2 without any extreme outliers, suggesting the model is a reasonable fit for these data.
Figure 2. Mean pain flare counts (middle curve) with unit error bounds for cancer patient 1.
Figure 3. Standardized residuals for the model of Figure 2.
Table 1. Adaptive models for means and dispersions of pain flare counts over time for alternative modeling approaches and correlation structures.
AR1—autoregressive of order 1; EXCH—exchangeable; GEE—generalized estimating equations; IND—independent; LCV—likelihood-like cross-validation; LMM—linear mixed modeling. a. The ith time value is denoted as t(i). A 1 corresponds to an intercept parameter; otherwise, the model has a zero intercept. b. Difference in minutes of clock times between the start and end of computations.
The associated model generated using k = 10 folds has similar means based on
without an intercept and simpler dispersions based on
without an intercept. However, the 10-fold LCV score 0.38107 is smaller, suggesting that k = 5 is a better choice for these data. Moreover, there is one empty fold, suggesting that the choice of k = 10 folds is too large for these data with only N = 33 measurements. The associated model generated with k = 5 folds and assuming constant dispersions has a similar model for the means based on
without an intercept but a smaller LCV score 0.37031, suggesting that the dispersions for these data are reasonably treated as nonconstant over time.
7.2. As Needed Pain Medications Taken Counts per Day
Figure 4 displays as needed pain medications taken counts for Cancer Patient 2 over a period of 100 days. As needed pain medications taken counts range from 0 to 4 per day and tend to decrease over time. Data are available for N = 92 days with missing values for eight days (days 4, 14, 52, 56, 74, 81, 85, and 89). These data were also collected using the mEMA app.
Table 2 contains results for adaptive models for means and dispersions of as needed pain medications taken counts over time using the two modeling approaches extended GEE modeling and extended linear mixed modeling and the three correlation structures IND, AR1, and EXCH. LCV scores are based on k = 5 folds with fold sizes ranging from 13 to 21 measurements with no empty folds. For extended GEE modeling, AR1 correlations generate the best LCV score 0.41497 over the three correlation structures. For extended linear mixed modeling, AR1 correlations generate the best LCV score 0.40509. These results indicate that a Poisson process assumption may not be appropriate for these as needed pain medications taken counts.
Figure 4. As needed pain medications taken counts over time for cancer patient 2.
Table 2. Adaptive models for means and dispersions of as needed pain medications taken counts over time for alternative modeling approaches and correlation structures.
AR1—autoregressive of order 1; EXCH—exchangeable; GEE—generalized estimating equations; IND—independent; LCV—likelihood-like cross-validation; LMM—linear mixed modeling. a. The ith time value is denoted as t(i). A 1 corresponds to an intercept parameter; otherwise, the model has a zero intercept. b. Difference in minutes of clock times between the start and end of computations.
Extended GEE modeling generates a better LCV score than extended linear mixed modeling for the IND correlation structure, but the scores for these two approaches are not too different. Extended linear mixed modeling generates a better LCV score than extended GEE modeling for the EXCH correlation structure. Extended GEE modeling generates a better LCV score than extended linear mixed modeling for the AR1 correlation structure. Although this is the best overall LCV score, the associated model for extended linear mixed modeling is more parsimonious with an intercept and one time transform for the means compared to two time transforms, and constant dispersions compared to dispersions based on an intercept and one time transform. Moreover, computation times are substantially shorter for extended linear mixed modeling, ranging from 0.5 to 1.7 minutes compared to 84.5 to 222.7 minutes (i.e., 1.4 to 3.7 hours). These results suggest that extended linear mixed modeling is preferable for modeling these as needed pain medications taken counts because it generates competitive or better scores or more parsimonious models in substantially less time. Consequently, only extended linear mixed modeling using AR1 correlations is considered further for these data, generating the model with means based on
with an intercept, constant dispersions based on an intercept, and estimated autocorrelation
. Figure 5 displays estimates of mean as needed pain medications taken counts over time along with unit error bands over time (i.e., the mean ±1 extended standard deviation at each time) to account for variability about the means. Mean as needed pain medications taken counts decrease nonlinearly over time. Variability in as needed pain medications taken counts is close to constant over time. Figure 6 displays standardized residuals for this model, which range well within ±3 without any extreme outliers, suggesting the model provides a reasonable fit to these data.
Figure 5. Mean as needed pain medications taken counts (middle curve) with unit error bounds for cancer patient 2.
Figure 6. Standardized residuals for the model of Figure 5.
The associated model generated using k = 10 folds is about the same with means based on
with an intercept, constant dispersions, and estimated correlation
. The 10-fold LCV score 0.40958 is larger, suggesting that k = 10 is a better choice for these data. There are no empty folds. The associated model generated using k = 15 folds is similar with means based on
with an intercept, dispersions based on
without an intercept, and estimated correlation
. The 15-fold LCV score 0.40353 is smaller, suggesting that k = 10 is a better choice for these data. There are no empty folds. The associated model generated with k = 15 folds assuming constant dispersions has means based on
with an intercept and close 15-fold LCV score 0.40318. Consequently, models generated by 5, 10, and 15 folds using extended linear mixed modeling are not too different, suggesting that the results are reasonably robust to the choice of the number of folds.
7.3. Around the Clock Pain Medications Taken Rates per Day per Dose
Adherence data for around the clock pain medications were collected using pill bottles equipped with Medication Event Monitoring System (MEMS) devices (AARDEX North America, Boulder, CO) that recorded the date and time of each pill bottle opening and presumably of the taking of the pain medication [13] [14] . Cancer Patient 3 was monitored for a period of 91 days. Counts of around the clock pain medications taken were computed for 30 equal-sized subperiods of 3.03 days each, ranging from 0 to 18. Around the clock pain medications were to be taken five times a day by this patient. Methods for modeling such data assuming the special case of a Poisson process with constant dispersions are provided in [10] . Figure 7 displays around the clock pain medications taken rates per day per dose for Cancer Patient 3. The ideal rate of 1 means that the patient took around the clock pain medications at the appropriate rate over the associated time subperiod. Around the clock pain medications taken rates range from 0 to 1.19 per day per dose and tend to decrease over time. Data are available for N = 30 subperiods with none missing.
Table 3 contains results for adaptive models for means and dispersions of around the clock pain medications taken rates over time using the two modeling approaches extended GEE modeling and extended linear mixed modeling and the three correlation structures IND, AR1, and EXCH. LCV scores are based on k = 5 folds with fold sizes ranging from 2 to 8 measurements with no empty folds. For extended GEE modeling, EXCH correlations generate the best LCV score 0.051583 over the three correlation structures. For extended linear mixed modeling, AR1 correlations generate the best LCV score 0.053856, which is also the best overall LCV score for Table 3 models. For both modeling approaches, the LCV score for IND correlations is quite a bit smaller than the best LCV score over the three correlation structures. These results indicate that a Poisson process assumption may not be appropriate for these around the clock pain medications taken rates.
Figure 7. Around the clock pain medications taken rates per day per dose over time for cancer patient 3.
Table 3. Adaptive models for means and dispersions of around the clock pain medications taken rates per day per dose over time for alternative modeling approaches and correlation structures.
AR1—autoregressive of order 1; EXCH—exchangeable; GEE—generalized estimating equations; IND—independent; LCV—likelihood-like cross-validation; LMM—linear mixed modeling. a. The ith time value is denoted as t(i). A 1 corresponds to an intercept parameter; otherwise, the model has a zero intercept. b. Difference in minutes of clock times between the start and end of computations.
Extended linear mixed modeling generates better LCV scores than extended GEE for the IND and AR1 correlation structures. Its LCV score is smaller for the EXCH correlation structure, but its model is more parsimonious based on one time transform for the means with constant dispersions compared to one time transform plus an intercept for the means with constant dispersions. Furthermore, computation times are much shorter for extended linear mixed modeling, ranging from 0.2 to 0.7 minutes compared to 5.1 to 11.9 minutes. These results suggest that extended linear mixed modeling is preferable for modeling these around the clock pain medications taken rates because it generates the best LCV score in less time. Consequently, only extended linear mixed modeling using AR1 correlations is considered further for these data, generating the model with means based on
without an intercept, dispersions based on
with an intercept, and estimated autocorrelation
. Figure 8 displays estimates of mean around the clock pain medications taken rates over time along with unit error bands over time (i.e., the mean ±1 extended standard deviation at each time) to account for variability about the means. Mean around the clock pain medications taken counts decrease close to linearly over time. Variability in around the clock pain medications taken rates is larger at the end of the period. Figure 9 displays standardized residuals for this model, which range well within ±3 without any extreme outliers, suggesting the model provides a reasonable fit to these data.
The associated model generated using k = 10 folds is somewhat similar with means based on
without an intercept, constant dispersions based on an intercept, and estimated autocorrelation
. However, the 10-fold LCV score 0.052023 is smaller, suggesting that k = 5 is a better choice for these data. Moreover, there is one empty fold, suggesting that the choice of k = 10 folds is too large for these data with only N = 30 measurements. The associated model generated with k = 5 folds and assuming constant dispersions has a model for the means based on
without an intercept, an autocorrelation estimate of
, and a smaller LCV score 0.050386, suggesting that the dispersions for these data are reasonably treated as nonconstant over time.
Figure 8. Mean around the clock pain medications taken rates per day per dose (middle curve) with unit error bounds for cancer patient 3.
Figure 9. Standardized residuals for the model of Figure 8.
8. Discussion
8.1. Summary
Methods are formulated for modeling individual patient count/rate data over time allowing for nonlinear trajectories for means, time-varying dispersions, and temporal correlation. Three correlation structures are considered including IND, EXCH, and spatial AR1 correlations. Two extensions of standard GEE modeling are considered. Extended GEE modeling augments standard GEE mean parameter estimating equations with dispersion parameter estimating equations while using the GEE approach for correlation parameter estimation. Extended linear mixed modeling estimates all model parameters using estimating equations for mean, dispersion, and correlation parameters. These new estimating equations are determined by partial derivatives of a likelihood-like function based on the multivariate normal density. This likelihood-like function is also used to define a likelihood-like cross-validation (LCV) score for evaluating models. LCV scores are used to control adaptive regression modeling of possibly nonlinear means and dispersions over time. It is also possible to generate penalized likelihood-like criteria for model selection generalizing standard penalized likelihood criteria [15] such as the commonly used Akaike information criterion (AIC) and Bayesian information criterion (BIC). Pan [16] has formulated a penalized model selection criterion related to the AIC called the quasi-likelihood information criterion (QIC) for GEE model selection, but the QIC score does not fully account for the correlation structure. Model selection criteria based on the likelihood-like function fully account for the correlation structure.
Example analyses using these methods are provided for three types of count/rate data for individual cancer patients: cancer pain flares per day, as needed cancer pain medications taken per day, and around the clock cancer pain medications taken per day per dose. Extended linear mixed modeling generates either models with better LCV scores or more parsimonious models than extended GEE modeling. Moreover, times to compute models are substantially smaller for extended linear mixed modeling than for extended GEE modeling. Time differences can be extreme for even moderate sample sizes; for example, analyses for the second example data set with 92 observations required at most 1.7 minutes for extended linear mixed modeling compared to up to 3.7 hours for extended GEE modeling. These results indicate that extended linear mixed modeling is preferable for modeling individual patient count/rate data over time, and this is likely to hold in more general modeling situations with other types of data and for combined data for multiple patients.
8.2. Alternative Approaches
The formulation provided here assumes that separate modeling of each patient's longitudinal data is preferable to modeling the combined data for all patients. Separate modeling is a person-centered approach to modeling longitudinal data as opposed to a variable-centered approach using the combined data [17] [18]. It is only feasible when there are substantial numbers of time measurements for each patient. Modeling the combined data for all patients typically involves the assumption that means and dispersions for all patients are reasonably treated as having the same functional form. Knafl et al. [10] provide an example where this assumption is not appropriate, for a specific set of data on rates of medications taken per day for HIV patients on antiretroviral medications. In any case, the methods considered here generalize to handle combined data for multiple patients, to continuous and dichotomous as well as count/rate longitudinal data, and to data for patients with conditions other than cancer.
Multilevel (or hierarchical linear) modeling [19] [20] could alternatively be used to provide for individual patient differences, but such modeling usually accounts for nonlinearity with polynomial models, often simple quadratic models. Polynomial models can be too simplistic for addressing general nonlinearity. Knafl and Ding [3] provide an example for independent data where the polynomial model generating the best LCV score for degrees 0-3 is the degree 0 constant model, but a nonlinear adaptive regression model generates a much better LCV score. Future research is needed to investigate general nonlinearity using adaptive regression methods applied to multilevel models as well as to random effects models [21] and to generalized linear mixed models [22].
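The comparison of polynomial degrees by cross-validated scores can be illustrated with a simple independent-data sketch. The function name and the Gaussian held-out log-likelihood used here are illustrative assumptions, not the adaptive regression machinery of [3] or the exact LCV computation; the toy data are likewise hypothetical.

```python
import numpy as np

def cv_loglik_poly(t, y, degree, k=5, seed=0):
    """k-fold cross-validated Gaussian log-likelihood for a polynomial fit
    of the given degree -- a sketch of degree selection by a CV criterion
    for the independent-data case; higher values are better."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    folds = np.random.default_rng(seed).integers(0, k, size=n)
    total = 0.0
    for f in range(k):
        test = folds == f
        # skip empty folds or training sets too small to fit the degree
        if not test.any() or (~test).sum() <= degree:
            continue
        coef = np.polyfit(t[~test], y[~test], degree)
        resid = y[~test] - np.polyval(coef, t[~test])
        sigma2 = max(resid.var(), 1e-12)  # training-set error variance
        pred = np.polyval(coef, t[test])
        total += np.sum(-0.5 * (np.log(2 * np.pi * sigma2)
                                + (y[test] - pred) ** 2 / sigma2))
    return total

# Compare degrees 0-3 on toy nonlinear data.
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 60)
y = np.exp(np.sin(6 * t)) + rng.normal(0, 0.1, t.size)
scores = {d: cv_loglik_poly(t, y, d) for d in range(4)}
```

An adaptive model would instead search over transformed predictors rather than fixed polynomial degrees; the point of the sketch is only the mechanics of scoring competing mean models on held-out data.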
Spatial AR1 correlations generate better models than independent and exchangeable correlations for two of the three example data sets. This suggests consideration of autoregressive and/or moving average correlations [23] of orders greater than 1. As the number of time points increases, even relatively large autocorrelations can generate small correlations for measurements far apart. For example, the third example data set had an estimated autocorrelation of
with integer time measurements ranging from 1 to 30 so that the smallest correlation is
. The second example data set had an even smaller estimated autocorrelation of
with an even larger range of 1 to 100 integer time measurements so that the smallest correlation is
. These results suggest consideration of banded autoregressive correlation structures, with zero correlations for measurements further apart than some fixed amount.
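A banded structure of this kind can be sketched as follows. The autocorrelation value 0.5 and the bandwidth are hypothetical, chosen only to illustrate the truncation and the decay of spatial AR1 correlations with distance; the estimated autocorrelations from the example data sets are not reproduced here.

```python
import numpy as np

def banded_ar1_corr(times, rho, band):
    """Spatial AR(1) correlations rho**|t_i - t_j|, truncated to zero for
    pairs of measurements more than `band` time units apart."""
    t = np.asarray(times, dtype=float)
    d = np.abs(t[:, None] - t[None, :])
    return np.where(d <= band, rho ** d, 0.0)

# With a hypothetical autocorrelation of 0.5 and integer times 1..30, the
# correlation at the largest distance is 0.5**29, which is negligible, so
# truncating beyond a band of width 5 discards almost no information here.
R = banded_ar1_corr(np.arange(1, 31), 0.5, band=5)
```

One design consideration: hard truncation of a positive definite correlation matrix need not preserve positive definiteness, so a fitted banded structure may require a positive-definiteness check or a correction step.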
Acknowledgements
This work was supported in part by the National Institutes of Health/National Institute of Nursing Research Awards 1R01NR017853 and RC1NR011591. S. H. Meghani was the Principal Investigator for these research projects. G. J. Knafl was a consultant on both projects.