Modeling Individual Patient Count/Rate Data over Time with Applications to Cancer Pain Flares and Cancer Pain Medication Usage

The purpose of this article is to investigate approaches for modeling individual patient count/rate data over time accounting for temporal correlation and non-constant dispersions while requiring reasonable amounts of time to search over alternative models for those data. This research addresses formulations for two approaches for extending generalized estimating equations (GEE) modeling. These approaches use a likelihood-like function based on the multivariate normal density. The first approach augments standard GEE equations to include equations for estimation of dispersion parameters. The second approach is based on estimating equations determined by partial derivatives of the likelihood-like function with respect to all model parameters and so extends linear mixed modeling. Three correlation structures are considered including independent, exchangeable, and spatial autoregressive of order 1 correlations. The likelihood-like function is used to formulate a likelihood-like cross-validation (LCV) score for use in evaluating models. Example analyses are presented using these two modeling approaches applied to three data sets of counts/rates over time for individual cancer patients including pain flares per day, as needed pain medications taken per day, and around the clock pain medications taken per day per dose. Means and dispersions are modeled as possibly nonlinear functions of time using adaptive regression modeling methods to search through alternative models compared using LCV scores. The results of these analyses demonstrate that extended linear mixed modeling is preferable for modeling individual patient count/rate data over time, because in example analyses, it either generates better LCV scores or more parsimonious models and requires substantially less time.


Introduction
An ongoing study (NIH/NINR 1R01NR017853) of patients with cancer is collecting daily longitudinal count/rate data including numbers of pain flares per day and numbers of as needed pain medications taken per day. Data are being collected for each study participant over periods of up to five months. A completed study (NIH/NINR RC1NR011591) collected, for cancer patients over 3 months, numbers of around the clock pain medications taken per day per dose, that is, the number of times a medication is taken in a day relative to the number of doses that are supposed to be taken in a day. Standard assumptions of means linear in time and dispersions constant over time are not always appropriate for such data. Also, a Poisson process assumption of independence over time need not always hold. A model selection score needs to be defined for evaluating models for the data and for use in searches over alternative models. Times to conduct these searches need to be as short as possible, especially as the number of time measurements increases.
Approaches are presented for modeling mean counts/rates over time separately for each individual patient controlling for temporal correlation as well as for time-varying dispersions. These approaches use Poisson regression methods, because count/rate data are being modeled. Generalized estimating equations (GEE) methods [1] [2] provide a natural choice for modeling correlations for such count variables. However, standard GEE methods have limited value, because they assume constant dispersions. Furthermore, GEE methods avoid specification of likelihood functions, which are useful for generating model selection criteria. In what follows, two extensions of GEE methods are formulated and evaluated that address temporal correlation and time-varying means and dispersions for repeated count/rate measurements. A likelihood-like function, that is, a function used like a likelihood but which need not integrate to 1, is defined and used in computing parameter estimates for these extensions along with a model selection criterion for comparing alternative models. Example analyses of selected individual cancer patient count/rate data are presented using adaptive regression methods [3] for identifying possibly nonlinear trajectories for means and dispersions of counts/rates over time while controlling for temporal correlation.

Generalized Linear Modeling of Means
Let y t(i) denote count values for an individual patient observed at N distinct times within a general set T of times, that is, t(i) ∈ T = {t(i): 1 ≤ i ≤ N}. Combine these into the N × 1 vector y. Let μ t(i) = E y t(i) denote associated mean or expected counts and combine these into the N × 1 vector μ. Denote the residuals as e t(i) = y t(i) − μ t(i) for t(i) ∈ T and combine these into the N × 1 vector e = y − μ. Let x t(i),j denote predictor values over times t(i) ∈ T and over predictors indexed by 1 ≤ j ≤ J and combine these into the J × 1 vectors x t(i) with transposes denoted by x t(i)^T for t(i) ∈ T. Let X be the N × J matrix with rows x t(i)^T for 1 ≤ i ≤ N. Let β denote the associated J × 1 vector of coefficient parameters and model the means through the log link as log(μ t(i)) = x t(i)^T ⋅ β for t(i) ∈ T. When x t(i),1 = 1 for t(i) ∈ T, the first entry β 1 of β is an intercept parameter. Treat each y t(i) as Poisson distributed so that its variance is V(μ t(i)) = μ t(i).
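As a minimal numerical sketch of this setup (the function name is illustrative, and the standard Poisson log link log(μ t(i)) = x t(i)^T ⋅ β is assumed), the mean and Poisson variance for one observation can be computed as:

```python
import math

def poisson_mean_variance(x, beta):
    """Compute mu = exp(x^T beta) under the Poisson log link; the Poisson
    variance V(mu) then equals mu itself."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))  # linear predictor x^T beta
    mu = math.exp(eta)
    return mu, mu  # (mean count, Poisson variance)
```

For rate models with totals Y t(i), the offset log(Y t(i)) would simply be added to the linear predictor eta.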
The counts y t(i) sometimes have associated totals Y t(i) > 0, and then the model for the mean counts μ t(i) is converted to a model for the means μ′ t(i) of the rates y′ t(i) = y t(i)/Y t(i) using offsets o t(i) = log(Y t(i)), that is, log(μ t(i)) = x t(i)^T ⋅ β + o t(i) where μ′ t(i) = μ t(i)/Y t(i) are the mean rates.

Time-Varying Dispersions
Let x′ t(i),j denote predictor values over times t(i) ∈ T and over predictors indexed by 1 ≤ j ≤ J′ and combine these into the J′ × 1 vectors x′ t(i) for t(i) ∈ T. Let X′ be the N × J′ matrix with rows x′ t(i)^T for 1 ≤ i ≤ N. Let β′ denote the associated J′ × 1 vector of coefficient parameters. Let φ t(i) denote dispersion values over times t(i) ∈ T satisfying log(φ t(i)) = x′ t(i)^T ⋅ β′, and define the extended variances as σ² t(i) = φ t(i) ⋅ μ t(i) and the extended standard deviations as σ t(i) = φ t(i)^1/2 ⋅ μ t(i)^1/2 for t(i) ∈ T. Informally, these quantities extend the usual Poisson variances and standard deviations through multiplication by dispersions. These are used to compute the standardized residuals stde t(i) = e t(i)/σ t(i). Combine the extended standard deviations into the N × 1 vector σ. When x′ t(i),1 = 1 for t(i) ∈ T, the first entry β′ 1 of β′ is an intercept parameter. The constant dispersion model corresponds to x′ t(i),1 = 1 for t(i) ∈ T with J′ = 1. This is the dispersion model used in standard GEE modeling.
When offsets o t(i) are used to convert the model for the counts y t(i) to a model for the rates y′ t(i), they can also be added to the dispersions. The dispersions then satisfy log(φ t(i)) = x′ t(i)^T ⋅ β′ + o t(i) so that the extended variances for the counts y t(i) are σ² t(i) = φ t(i) ⋅ μ t(i), and then the variances for the rates y′ t(i) are σ² t(i)/Y² t(i).

Modeling Correlations
Denote the covariance matrix for the count vector y as Σ. Use the GEE approach [1] [2] to model the covariance matrix for y as Σ = Diag(σ) ⋅ R(ρ) ⋅ Diag(σ) where Diag(σ) is the N × N diagonal matrix with diagonal entries σ t(i) for t(i) ∈ T and R(ρ) is an N × N correlation matrix with diagonal entries 1 and off diagonal entries R t(i),t(i′) for 1 ≤ i ≠ i′ ≤ N determined by a correlation parameter ρ varying with the assumed correlation structure. Under independent (IND) correlations, R t(i),t(i′) = ρ IND = 0 for 1 ≤ i ≠ i′ ≤ N. A Poisson process generates such correlations. Under exchangeable (EXCH) correlations, R t(i),t(i′) = ρ EXCH for 1 ≤ i ≠ i′ ≤ N, that is, the correlations are constant. Under autoregressive of order 1 (AR1) correlations, R t(i),t(i′) = ρ AR1^|t(i) − t(i′)| for 1 ≤ i ≠ i′ ≤ N, where |t(i) − t(i′)| is the absolute difference between the times. These differences are all assumed to be integers so that the correlations R t(i),t(i′) are all well-defined. In general, R t(i),t(i′) are spatial AR1 correlations. The special case of non-spatial AR1 correlations with t(i) = i treats times as equally spaced. The parameter ρ AR1 is called the autocorrelation.
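The three working correlation structures can be sketched numerically as follows (a minimal NumPy illustration with hypothetical function names, not code from the paper):

```python
import numpy as np

def correlation_matrix(times, structure, rho=0.0):
    """Build the N x N working correlation matrix R(rho) for observation
    times t(i), under IND, EXCH, or spatial AR1 correlations."""
    t = np.asarray(times, dtype=float)
    N = len(t)
    if structure == "IND":
        return np.eye(N)                      # independent: identity matrix
    if structure == "EXCH":
        R = np.full((N, N), rho)              # constant off-diagonal correlations
        np.fill_diagonal(R, 1.0)
        return R
    if structure == "AR1":
        D = np.abs(t[:, None] - t[None, :])   # matrix of time distances
        return rho ** D                       # decays as rho ** |t(i) - t(i')|
    raise ValueError(structure)
```

With equally spaced times t(i) = i this reduces to the non-spatial AR1 case; unequal gaps give the spatial AR1 correlations.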

Possible Extensions
The above formulation can be extended to address repeated measurements of types other than counts/rates and for multiple patients. More complex correlation structures based on multiple correlation parameters can also be considered. One such example is unstructured correlations with different correlations for different pairs of measurements, but this requires data from multiple patients to be reasonably estimated. These extensions are not addressed further.

Notation and Parameter Estimation
Under standard GEE modeling, dispersions are treated as a constant φ 0 so that the covariance matrix satisfies Σ = φ 0 ⋅ Diag(V(μ))^1/2 ⋅ R(ρ) ⋅ Diag(V(μ))^1/2 where V(μ) is the N × 1 vector with entries V(μ t(i)) = μ t(i) for t(i) ∈ T and Diag(V(μ))^1/2 is the N × N diagonal matrix with diagonal entries V^1/2(μ t(i)). The generalized estimating equations are given by g(β) = 0 where 0 is the J × 1 vector with all zero entries, g(β) = D^T ⋅ Σ^−1 ⋅ e, and the N × J matrix D = ∂μ/∂β with entries D t(i),j = ∂μ t(i)/∂β j = x t(i),j ⋅ μ t(i) for t(i) ∈ T and 1 ≤ j ≤ J. Let H(β) = − D^T ⋅ Σ^−1 ⋅ D. Note that in the general GEE context with correlated outcomes for multiple subjects, the formulation for g(β) would equal a sum of terms like D^T ⋅ Σ^−1 ⋅ e for each subject and H(β) would equal a sum of terms like − D^T ⋅ Σ^−1 ⋅ D for each subject. Only one such term is needed here since data for only one subject/patient are being modeled. The GEE process for estimating β iteratively solves g(β) = 0 as follows. Given the current value β u for β, the next value is given by β u+1 = β u − H^−1(β u) ⋅ g(β u), thereby adapting Newton's method with g(β) in the role of the gradient vector and H(β) in the role of the Hessian matrix.
The constant dispersion parameter φ 0 is estimated using the Pearson residuals Pe t(i)(β) = e t(i)/V^1/2(μ t(i)) evaluated at a given value for the mean coefficient parameter vector β. The bias-adjusted estimate φ 0(β) of the dispersion parameter φ 0 satisfies φ 0(β) = ∑ i=1..N Pe t(i)²(β)/(N − J) assuming N − J > 0. Next, the correlation parameter ρ(β) is estimated using the standardized residuals stde t(i)(β) for t(i) ∈ T as follows. The IND correlation structure has no need for an estimate. For the EXCH correlation structure and a given value β for the mean parameter vector, ρ EXCH can be estimated by ρ EXCH(β) = ∑ 1≤i<i′≤N stde t(i)(β) ⋅ stde t(i′)(β)/(N ⋅ (N − 1)/2 − J) assuming N ⋅ (N − 1)/2 − J > 0. For the AR1 correlation structure and a given value β for the mean parameter vector, the autocorrelation ρ AR1 can be estimated by ρ AR1(β) = ∑ i=1..N−1 stde t(i)(β) ⋅ stde t(i+1)(β)/(N − 1 − J) assuming N − 1 − J > 0. In the non-spatial AR1 special case, this is the standard moment estimate of the lag 1 correlation because |t(i) − t(i+1)| = 1 for 1 ≤ i < N.
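A sketch of these moment estimators (illustrative NumPy code with hypothetical names; the AR1 branch implements the non-spatial special case using lag-1 products of standardized residuals):

```python
import numpy as np

def dispersion_and_correlation(e, mu, J, structure="EXCH"):
    """Bias-adjusted moment estimates of the constant dispersion phi_0 and
    the correlation parameter rho from residuals e and fitted means mu."""
    e = np.asarray(e, float)
    mu = np.asarray(mu, float)
    N = len(e)
    pe = e / np.sqrt(mu)                      # Pearson residuals
    phi0 = np.sum(pe ** 2) / (N - J)          # bias-adjusted dispersion
    stde = e / np.sqrt(phi0 * mu)             # standardized residuals
    if structure == "IND":
        rho = 0.0                             # no correlation to estimate
    elif structure == "EXCH":
        s = 0.0
        for i in range(N - 1):                # sum over all pairs i < i'
            s += np.sum(stde[i] * stde[i + 1:])
        rho = s / (N * (N - 1) / 2 - J)
    else:                                     # non-spatial AR1: lag-1 products
        rho = np.sum(stde[:-1] * stde[1:]) / (N - 1 - J)
    return phi0, rho
```

As with any moment estimator, the toy estimates are not constrained to lie in [−1, 1] for small N.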
For any correlation structure, once the GEE estimate β(T) of the coefficient parameter vector β is computed using the observations indexed by t(i) ∈ T, the GEE estimate of the dispersion parameter φ 0 is φ 0 (T) = φ 0 (β(T)). The GEE estimate of the correlation parameter ρ is ρ(T) = ρ(β(T)) computed using β(T) and φ 0 (T).

The Likelihood-Like Function
Let θ = (β^T φ 0)^T be the (J + 1) × 1 vector of the GEE mean and dispersion parameters. The correlation parameter ρ is a function of β and φ 0 and so has not been included in θ. Use the multivariate normal likelihood to define the likelihood-like function L(T;θ) satisfying ℓ(T;θ) = log(L(T;θ)) = − e^T ⋅ Σ^−1 ⋅ e/2 − log(|Σ|)/2 − N ⋅ log(2 ⋅ π)/2 where |Σ| is the determinant of the covariance matrix Σ. The vector ∂ℓ(T;θ)/∂β of partial derivatives of ℓ(T;θ) can be expressed as the sum of two terms. The first term corresponds to differentiating the residual vector part e of ℓ(T;θ) with respect to β holding the covariance part Σ fixed in β and equals g(β), the gradient-like quantity used in standard GEE modeling.
This fact seems to have been first recognized by Chaganty [6]. One advantage of having a likelihood-like function for GEE models is that it can be used to compute parameter estimates. Another is that it can be used to compute model selection criteria not otherwise available for GEE modeling.
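For instance, ℓ(T;θ) can be evaluated directly from the residuals, extended standard deviations, and working correlation matrix (a NumPy sketch under the definitions above; names are illustrative):

```python
import numpy as np

def loglik_like(e, sigma, R):
    """Multivariate-normal-based likelihood-like log value l(T; theta) for
    residuals e, extended standard deviations sigma, and working correlation
    matrix R, with Sigma = Diag(sigma) R Diag(sigma)."""
    e = np.asarray(e, float)
    sigma = np.asarray(sigma, float)
    N = len(e)
    Sigma = np.diag(sigma) @ R @ np.diag(sigma)
    sign, logdet = np.linalg.slogdet(Sigma)   # stable log determinant
    quad = e @ np.linalg.solve(Sigma, e)      # e^T Sigma^{-1} e without inverting
    return -0.5 * (quad + logdet + N * np.log(2 * np.pi))
```

Using `slogdet` and `solve` rather than an explicit inverse keeps the evaluation numerically stable for nearly singular working covariance matrices.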

Likelihood-Like Cross-Validation
Burman [7] defined k-fold cross-validation with observations partitioned into k disjoint subsets called folds. Fold observations are predicted using parameter estimates computed using the data from the other folds. In k-fold likelihood-like cross-validation (LCV), these deleted fold predictions are scored using the associated likelihood-like function L. Randomly partition the times t(i) ∈ T into k disjoint folds T(f) for 1 ≤ f ≤ k. Use the same initial seed for randomization with all models under consideration so that their LCV scores are comparable. Let θ(T\T(f)) denote the estimate of θ using the data with times in the complement T\T(f) of the fold T(f). For 1 ≤ f ≤ k, let T+(f) denote the union of all folds T(f′) for 1 ≤ f′ ≤ f with T+(0) the empty fold and set L(T+(0);θ) = 1. Define the LCV score as LCV = (∏ f=1..k LCV f)^1/N where LCV f is defined as the conditional likelihood-like term for the data in fold T(f) conditioned on the data in the union T+(f − 1) of the prior folds using the deleted estimate θ(T\T(f)) of the parameter vector θ. Formally, LCV f = L(T+(f); θ(T\T(f)))/L(T+(f − 1); θ(T\T(f))). Because fold assignment is random, folds can be empty when the number k of folds is large relative to the number N of measurements, and then those folds are dropped from the computation of the LCV score. Larger LCV scores indicate better models. Note that even if the full data are non-spatial with observations at consecutive integer times t(i) = i for 1 ≤ i ≤ N, the folds T(f) and the fold unions T+(f) are not consecutive integer times except in rare cases and so require more general handling.
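The LCV computation can be sketched as follows, with the model-fitting step and the conditional likelihood-like evaluation left as user-supplied callables (`fit` and `cond_lik` are hypothetical placeholders, as is the per-measurement 1/N normalization used here):

```python
import numpy as np

def lcv_score(times, k, fit, cond_lik, seed=0):
    """Sketch of k-fold likelihood-like cross-validation (LCV).
    fit(train_times) returns deleted parameter estimates; cond_lik(fold,
    prior, theta) returns the likelihood-like value for the fold data
    conditioned on the union of prior folds."""
    rng = np.random.default_rng(seed)       # same seed for all compared models
    times = list(times)
    N = len(times)
    folds = [[] for _ in range(k)]
    for t in times:
        folds[rng.integers(k)].append(t)    # random fold assignment
    folds = [f for f in folds if f]         # drop empty folds
    prior, log_lcv = [], 0.0
    for fold in folds:
        train = [t for t in times if t not in fold]
        theta = fit(train)                  # deleted estimate theta(T \ T(f))
        log_lcv += np.log(cond_lik(fold, prior, theta))
        prior = prior + fold                # T+(f) grows with each fold
    return np.exp(log_lcv / N)              # per-measurement geometric mean
```

Fixing the seed across models is what makes the resulting scores comparable, as noted in the text.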

Formulation
GEE modeling can be extended to handle nonconstant dispersions. Let θ = (β^T β′^T)^T be the (J + J′) × 1 vector of the mean and dispersion parameters. The definition of the likelihood-like function L(T;θ) given for standard GEE holds using the more general parameter vector θ. Differentiate ℓ(T;θ) = log(L(T;θ)) with respect to the vector β′ of dispersion coefficient parameters while holding the correlation parameter ρ fixed in the current parameter vector β′ to provide the J′ estimating equations g(β′) = ∂′ℓ(T;θ)/∂′β′ = 0 where the notation ∂′ℓ(T;θ)/∂′β′ is used to indicate that this is not the full partial derivative vector for ℓ(T;θ) in β′ due to not accounting for the effect of β′ on ρ. Now, combine these with the J standard GEE equations g(β) = 0 and iteratively solve for joint estimates of β and β′. Note that log e(|Σ|) = log e(|R(ρ)|) + 2 ⋅ ∑ t(i)∈T log e(σ t(i)), so that the entries of g(β′) can be expressed in terms of the N × 1 vectors σinvx′ j with entries σinvx′ t(i),j = x′ t(i),j/(2 ⋅ σ t(i)) for t(i) ∈ T and 1 ≤ j ≤ J′. If offsets are included, they are carried along in equations without any effect on derivatives.

Parameter Estimation
Given a value for the vector θ of all coefficient parameters, an estimate of the correlation parameter ρ can be based on the associated standardized residuals stde t(i). Calculate correlation estimates for the IND, EXCH, and AR1 correlation structures using the same formulas as before but computed with these more general standardized residuals. Iteratively solve g(θ) = 0 as follows. Given the current value θ u for θ, the next value is given by θ u+1 = θ u − H^−1(θ u) ⋅ g(θ u). The gradient vector g(θ) = (g(β)^T g(β′)^T g(ρ))^T. The gradient subvector g(β′) = ∂ℓ(T;θ)/∂β′ has the same formulation as for extended GEE modeling, only now its entries are standard partial derivatives. The gradient subvector g(β) = ∂ℓ(T;θ)/∂β has entries that, in contrast to standard GEE modeling, account for the dependence on β of the covariance matrix Σ as well as of the residual vector e.

Parameter Estimation
The parameter vector θ is estimated by iteratively solving g(θ) = 0 as follows. Given the current value θ u for θ, the next value is given by θ u+1 = θ u − H^−1(θ u) ⋅ g(θ u), thereby using Newton's method with gradient vector g(θ) and Hessian matrix H(θ). The estimation process can be stopped early if ℓ(T;θ u+1) does not increase by much compared to ℓ(T;θ u). The solution to the estimating equations g(θ) = 0 for observations indexed by T is denoted as θ(T). The covariance matrix for the parameter estimate vector θ(T) can be computed as −H^−1(θ(T)) and the variances corresponding to its diagonal entries can be used to compute z tests of zero individual model parameters. These are useful for fixed models of theoretical importance. On the other hand, tests for parameters of adaptively generated models (as described in Section 6) are usually significant as a consequence of the model selection process, and so results for these tests are not reported for models generated in the example analyses.
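The iteration with early stopping can be sketched as follows (illustrative code; `grad`, `hess`, and `loglik` stand for user-supplied evaluations of g(θ), H(θ), and ℓ(T;θ)):

```python
import numpy as np

def newton_estimate(theta0, grad, hess, loglik, tol=1e-6, max_iter=50):
    """Newton's method for solving g(theta) = 0 with Hessian H(theta),
    stopped early when the likelihood-like log value barely increases."""
    theta = np.asarray(theta0, float)
    ll = loglik(theta)
    for _ in range(max_iter):
        step = np.linalg.solve(hess(theta), grad(theta))
        theta_new = theta - step            # theta_{u+1} = theta_u - H^{-1} g
        ll_new = loglik(theta_new)
        if ll_new - ll < tol:               # stop when improvement is small
            if ll_new > ll:
                theta, ll = theta_new, ll_new
            break
        theta, ll = theta_new, ll_new
    return theta, ll
```

For a quadratic log value such as ℓ(θ) = −(θ − 2)², a single Newton step lands exactly on the maximizer.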

Modeling Possibly Nonlinear Means and Dispersions over Time
Knafl and Ding [3] provide a detailed formulation for adaptively searching through alternative regression models for means and dispersions in a variety of contexts using adaptive fractional polynomial models [9]. A brief overview is provided here. These methods are used in the example analyses of individual cancer patient count/rate data presented later. Model selection proceeds through two phases. The expansion phase first grows the model adding in alternative power transforms of predictors for means and dispersions. The contraction phase then reduces the model to a parsimonious set of power transforms by removing transforms from the current model one at a time and adjusting the powers of the remaining transforms. Alternative models are evaluated using LCV scores. The modeling process is controlled by tolerance parameters indicating how much of a reduction in the LCV score can be tolerated at given stages of the process. Knafl and Ding [3] also provide a wide variety of example analyses demonstrating the usefulness of these adaptive regression methods. A description of these methods in the standard Poisson regression context is provided in [10].
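One step of such a search can be caricatured as a grid search over candidate powers scored by a model selection criterion (a toy sketch only; the actual adaptive expansion and contraction phases of [3] are considerably more involved, and `score` is a hypothetical stand-in for an LCV-based evaluation):

```python
import numpy as np

def best_power(t, score, powers=(-2, -1, -0.5, 0.5, 1, 2, 3)):
    """Toy sketch of one step of an adaptive power-transform search: try a
    grid of fractional-polynomial powers for the single predictor t and
    keep the one with the largest model-selection score."""
    best_p, best_s = None, -np.inf
    for p in powers:
        x = np.power(np.asarray(t, float), p)   # transform t(i) -> t(i)^p
        s = score(x)                            # e.g., an LCV score
        if s > best_s:
            best_p, best_s = p, s
    return best_p, best_s
```

In the actual methods, such transform choices are made repeatedly while growing and then contracting the model, with LCV scores controlling each decision.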
A SAS ® (SAS Institute, Inc., Cary, NC) macro has been developed for generating adaptive analyses including the reported example analyses. This macro as well as data and code used to generate the results of the example analyses are available from the first author. Pain flare count data for Cancer Patient 1 are available for N = 33 days with a missing value for one day (day 33). These data were collected using Ecological Momentary Assessment (EMA) [11] as implemented in the mEMA app [12]. Table 1 contains results for adaptive models for means and dispersions of pain flare counts over time using the two modeling approaches, extended GEE modeling and extended linear mixed modeling, and the three correlation structures IND, AR1, and EXCH. Power transforms reported in Table 1 were generated by adaptively searching through alternative power transforms using the methods described in Section 6. LCV scores are based on k = 5 folds with fold sizes ranging from 2 to 8 measurements and no empty folds. For extended GEE modeling, IND correlations generate the best LCV score 0.38018 over the three correlation structures. For extended linear mixed modeling, IND correlations also generate the best LCV score 0.40622. These results suggest that a Poisson process assumption is reasonable for these pain flare counts.

Pain Flare Counts per Day
Extended linear mixed modeling generates better LCV scores than extended GEE for all three correlation structures. Moreover, computation times are much shorter, ranging from 0.4 to 1.2 minutes compared to 13.9 to 35.5 minutes. These results suggest that extended linear mixed modeling is preferable for modeling these pain flare counts because it generates better LCV scores in less time. Consequently, only extended linear mixed modeling using IND correlations is considered further for these data, generating the model with means based on t(i)^0.49 without an intercept and dispersions based on t(i)^8.37 and t(i)^0.5 without an intercept. Figure 2 displays estimates of mean pain flare counts over time along with unit error bands over time (i.e., the mean ±1 extended standard deviation at each time) to account for variability about the means. Mean pain flare counts increase over time, somewhat close to linearly in time. Variability in pain flare counts is smaller in the middle of the period, somewhat larger at the start of the period, and even larger at the end of the period. Figure 3 displays standardized residuals for this model, which range between ±2 without any extreme outliers, suggesting the model is a reasonable fit for these data.
The associated model generated using k = 10 folds has similar means based on t(i)^0.5 without an intercept and simpler dispersions based on t(i)^0.2 without an intercept. However, the 10-fold LCV score 0.38107 is smaller, suggesting that k = 5 is a better choice for these data. Moreover, there is one empty fold, suggesting that the choice of k = 10 folds is too large for these data with only N = 33 measurements. The associated model generated with k = 5 folds and assuming constant dispersions has a similar model for the means based on t(i)^0.53 without an intercept but a smaller LCV score 0.37031, suggesting that the dispersions for these data are reasonably treated as nonconstant over time. Data on counts of as needed pain medications taken per day are available for Cancer Patient 2 over N = 92 days. These data were also collected using the mEMA app. Table 2 contains results for adaptive models for means and dispersions of as needed pain medications taken counts over time using the two modeling approaches, extended GEE modeling and extended linear mixed modeling, and the three correlation structures IND, AR1, and EXCH. LCV scores are based on k = 5 folds with fold sizes ranging from 13 to 21 measurements with no empty folds. For extended GEE modeling, AR1 correlations generate the best LCV score 0.41497 over the three correlation structures. For extended linear mixed modeling, AR1 correlations generate the best LCV score 0.40509. These results indicate that a Poisson process assumption may not be appropriate for these as needed pain medications taken counts.

As Needed Pain Medications Taken Counts per Day
Extended GEE modeling generates a better LCV score than extended linear mixed modeling for the IND correlation structure, but the scores for these two approaches are not too different. Extended linear mixed modeling generates a better LCV score than extended GEE modeling for the EXCH correlation structure. Extended GEE modeling generates a better LCV score than extended linear mixed modeling for the AR1 correlation structure.
Although extended GEE modeling with AR1 correlations generates the best overall LCV score, the associated model for extended linear mixed modeling is more parsimonious, with an intercept and one time transform for the means compared to two time transforms, and constant dispersions compared to dispersions based on an intercept and one time transform. Moreover, computation times are substantially shorter for extended linear mixed modeling, ranging from 0.5 to 1.7 minutes compared to 84.5 to 222.7 minutes or 1.4 to 3.7 hours. These results suggest that extended linear mixed modeling is preferable for modeling these as needed pain medications taken counts because it generates competitive or better scores or more parsimonious models in substantially less time. Consequently, only extended linear mixed modeling using AR1 correlations is considered further for these data, generating the model with means based on t(i)^0.4 with an intercept, constant dispersions based on an intercept, and estimated autocorrelation ρ AR1 = 0.45. Figure 5 displays estimates of mean as needed pain medications taken counts over time along with unit error bands over time (i.e., the mean ±1 extended standard deviation at each time) to account for variability about the means. Mean as needed pain medications taken counts decrease nonlinearly over time. Variability in as needed pain medications taken counts is close to constant over time. Figure 6 displays standardized residuals for this model, which range well within ±3 without any extreme outliers, suggesting the model provides a reasonable fit to these data.
The associated model generated using k = 10 folds is about the same with means based on t(i)^0.5 with an intercept, constant dispersions, and estimated autocorrelation ρ AR1 = 0.45. The 10-fold LCV score 0.40958 is larger, suggesting that k = 10 is a better choice for these data. There are no empty folds. The associated model generated using k = 15 folds is similar with means based on t(i)^0.5 with an intercept, dispersions based on t(i)^0.07 without an intercept, and estimated autocorrelation ρ AR1 = 0.46. The 15-fold LCV score 0.40353 is smaller, suggesting that k = 10 is a better choice for these data. There are no empty folds. The associated model generated with k = 15 folds assuming constant dispersions has means based on t(i)^0.5 with an intercept and close 15-fold LCV score 0.40318. Consequently, models generated by 5, 10, and 15 folds using extended linear mixed modeling are not too different, suggesting that the results are reasonably robust to the choice of the number of folds.

Around the Clock Pain Medications Taken Rates per Day per Dose
Adherence data for around the clock pain medications were collected using pill bottles equipped with Medication Event Monitoring System (MEMS) devices (AARDEX North America, Boulder, CO) that recorded the date and time of each pill bottle opening and presumably of the taking of the pain medication [13] [14]. Cancer Patient 3 was monitored for a period of 91 days. Counts of around the clock pain medications taken, ranging from 0 to 18, were computed for 30 equal-sized subperiods of 3.03 days each. Around the clock pain medications were to be taken five times a day by this patient. Methods for modeling such data assuming the special case of a Poisson process with constant dispersions are provided in [10]. Figure 7 displays around the clock pain medications taken rates per day per dose for Cancer Patient 3. The ideal rate of 1 means that the patient took around the clock pain medications at the appropriate rate over the associated time subperiod. Around the clock pain medications taken rates range from 0 to 1.19 per day per dose and tend to decrease over time. Data are available for N = 30 subperiods with none missing. Table 3 contains results for adaptive models for means and dispersions of around the clock pain medications taken rates over time using the two modeling approaches, extended GEE modeling and extended linear mixed modeling, and the three correlation structures IND, AR1, and EXCH. LCV scores are based on k = 5 folds with fold sizes ranging from 2 to 8 measurements with no empty folds. For extended GEE modeling, EXCH correlations generate the best LCV score 0.051583 over the three correlation structures. For extended linear mixed modeling, AR1 correlations generate the best LCV score 0.053856, which is also the best overall LCV score for Table 3 models. For both modeling approaches, the LCV score for IND correlations is quite a bit smaller than the best LCV score over the three correlation structures.
These results indicate that a Poisson process assumption may not be appropriate for these around the clock pain medications taken rates.
Extended linear mixed modeling generates better LCV scores than extended GEE for the IND and AR1 correlation structures. Its LCV score is smaller for the EXCH correlation structure, but its model is more parsimonious, based on one time transform for the means with constant dispersions compared to one time transform plus an intercept for the means with constant dispersions. Furthermore, computation times are much shorter for extended linear mixed modeling, ranging from 0.2 to 0.7 minutes compared to 5.1 to 11.9 minutes. These results suggest that extended linear mixed modeling is preferable for modeling these around the clock pain medications taken rates because it generates the best LCV score in less time. Consequently, only extended linear mixed modeling using AR1 correlations is considered further for these data, generating the model with means based on t(i)^1.1 without an intercept, dispersions based on t(i)^6.1 with an intercept, and estimated autocorrelation ρ AR1 = 0.75. Figure 8 displays estimates of mean around the clock pain medications taken rates over time along with unit error bands over time (i.e., the mean ±1 extended standard deviation at each time) to account for variability about the means. Mean around the clock pain medications taken rates decrease close to linearly over time. Variability in around the clock pain medications taken rates is larger at the end of the period. Figure 9 displays standardized residuals for this model, which range well within ±3 without any extreme outliers, suggesting the model provides a reasonable fit to these data.
The associated model generated using k = 10 folds is somewhat similar with means based on t(i)^0.4 without an intercept, constant dispersions based on an intercept, and estimated autocorrelation ρ AR1 = 0.76. However, the 10-fold LCV score 0.052023 is smaller, suggesting that k = 5 is a better choice for these data. Moreover, there is one empty fold, suggesting that the choice of k = 10 folds is too large for these data with only N = 30 measurements. The associated model generated with k = 5 folds and assuming constant dispersions has a model for the means based on t(i)^1.01 without an intercept, an autocorrelation estimate of ρ AR1 = 0.75, and a smaller LCV score 0.050386, suggesting that the dispersions for these data are reasonably treated as nonconstant over time.

Summary
Methods are formulated for modeling individual patient count/rate data over time allowing for nonlinear trajectories for means, time-varying dispersions, and temporal correlation. Three correlation structures are considered including IND, EXCH, and spatial AR1 correlations. Two extensions of standard GEE modeling are considered. Extended GEE modeling augments standard GEE mean parameter estimating equations with dispersion parameter estimating equations while using the GEE approach for correlation parameter estimation. Extended linear mixed modeling estimates all model parameters using estimating equations for mean, dispersion, and correlation parameters. These new estimating equations are determined by partial derivatives of a likelihood-like function based on the multivariate normal density. This likelihood-like function is also used to define a likelihood-like cross-validation (LCV) score for evaluating models. LCV scores are used to control adaptive regression modeling of possibly nonlinear means and dispersions over time. It is also possible to generate penalized likelihood-like criteria for model selection generalizing standard penalized likelihood criteria [15] such as the commonly used Akaike information criterion (AIC) and Bayesian information criterion (BIC). Pan [16] has formulated a penalized model selection criterion related to the AIC called the quasi-likelihood information criterion (QIC) for GEE model selection, but the QIC score does not fully account for the correlation structure. Model selection criteria based on the likelihood-like function fully account for the correlation structure.
Example analyses using these methods are provided using three types of count/rate data for individual cancer patients including cancer pain flares per day, as needed cancer pain medications taken per day, and around the clock cancer pain medications taken per day per dose. Extended linear mixed modeling generates either better LCV scores or more parsimonious models than extended GEE modeling. Moreover, times to compute models are substantially smaller for extended linear mixed modeling than for extended GEE modeling. Time differences can be extreme for even moderate sample sizes; for example, analyses for the second example data set with 92 observations required at most 1.7 minutes for extended linear mixed modeling compared to up to 3.7 hours for extended GEE. These results indicate that extended linear mixed modeling is preferable for modeling individual patient count/rate data over time. This is likely to hold in more general modeling situations with other types of data and for combined data for multiple patients.

Alternative Approaches
The formulation provided here assumes that separate modeling of each patient's longitudinal data is preferable to modeling the combined data for all patients. Separate modeling is a person-centered approach to modeling longitudinal data as opposed to a variable-centered approach using the combined data [17] [18]. This is only feasible when there are substantial numbers of time measurements for each patient. Modeling the combined data for all patients typically involves the assumption that means and dispersions for all patients are reasonably treated as having the same functional form. Knafl et al. [10] provide an example where this is not an appropriate assumption for a specific set of data on medication taken rates per day for HIV patients on antiretroviral medications. In any case, the methods considered here generalize to handle combined data for multiple patients, not only count/rate longitudinal data but also continuous and dichotomous longitudinal data, and not just data for cancer patients.
Multilevel (or hierarchical linear) modeling [19] [20] could alternatively be used to provide for individual patient differences, but that usually accounts for nonlinearity using polynomial models, often simple quadratic models. Polynomial models can be too simplistic for addressing general nonlinearity. Knafl and Ding [3] provide an example for independent data where the polynomial model generating the best LCV score for degrees 0-3 is the degree 0 constant model, but a nonlinear adaptive regression model generates a much better LCV score. Future research is needed to investigate general nonlinearity using adaptive regression methods applied to multilevel models as well as to random effects models [21] and to generalized linear mixed models [22].
Spatial AR1 correlations generate better models than independent and exchangeable correlations for two of the three example data sets. This suggests consideration of autoregressive and/or moving average correlations [23] of orders more than 1. As the number of time points increases, even relatively large autocorrelations can generate small correlations for larger distances apart. For example, the third example data set had an estimated autocorrelation of ρ AR1 = 0.75 with integer time measurements ranging from 1 to 30 so that the smallest correlation is 0.75^29 ≈ 0.0002. The second example data set had an even smaller estimated autocorrelation of ρ AR1 = 0.45 with an even larger range of 1 to 100 integer time measurements so that the smallest correlation is 0.45^99 ≈ 4.7 × 10^−35. These results suggest consideration of banded autoregressive correlation structures with zero correlations for measurements further apart than some fixed amount.
Figure 1. Pain flare counts over time for Cancer Patient 1.
Table notes: a. The ith time value is denoted as t(i); a 1 corresponds to an intercept parameter, otherwise the model has a zero intercept. b. Difference in minutes of clock times between the start and end of computations.