
The purpose of this article is to investigate approaches for modeling individual patient count/rate data over time accounting for temporal correlation and non-constant dispersions while requiring reasonable amounts of time to search over alternative models for those data. This research addresses formulations for two approaches for extending generalized estimating equations (GEE) modeling. These approaches use a likelihood-like function based on the multivariate normal density. The first approach augments standard GEE equations to include equations for estimation of dispersion parameters. The second approach is based on estimating equations determined by partial derivatives of the likelihood-like function with respect to all model parameters and so extends linear mixed modeling. Three correlation structures are considered including independent, exchangeable, and spatial autoregressive of order 1 correlations. The likelihood-like function is used to formulate a likelihood-like cross-validation (LCV) score for use in evaluating models. Example analyses are presented using these two modeling approaches applied to three data sets of counts/rates over time for individual cancer patients including pain flares per day, as needed pain medications taken per day, and around the clock pain medications taken per day per dose. Means and dispersions are modeled as possibly nonlinear functions of time using adaptive regression modeling methods to search through alternative models compared using LCV scores. The results of these analyses demonstrate that extended linear mixed modeling is preferable for modeling individual patient count/rate data over time, because in example analyses, it either generates better LCV scores or more parsimonious models and requires substantially less time.

An ongoing study (NIH/NINR 1R01NR017853) of patients with cancer is collecting daily longitudinal count/rate data including numbers of pain flares per day and numbers of as needed pain medications taken per day. Data are being collected for each study participant over periods of up to five months. A completed study (NIH/NINR RC1NR011591) collected, for cancer patients over 3 months, numbers of around the clock pain medications taken per day per dose, that is, the number of times a medication is taken in a day relative to the number of doses that are supposed to be taken in a day. Standard assumptions of means linear in time and dispersions constant over time are not always appropriate for such data. Also, a Poisson process assumption of independence over time need not always hold. A model selection score needs to be defined for evaluating models for the data and for use in searches over alternative models. Times to conduct these searches need to be as short as possible, especially as the number of time measurements increases.

Approaches are presented for modeling mean counts/rates over time separately for each individual patient controlling for temporal correlation as well as for time-varying dispersions. These approaches use Poisson regression methods, because count/rate data are being modeled. Generalized estimating equations (GEE) methods [

Let $y_{t(i)}$ denote count values for an individual patient observed at $N$ distinct times within a general set $T$ of times, that is, $t(i) \in T = \{t(i) : 1 \le i \le N\}$. Combine these into the $N \times 1$ vector $y$. Let $\mu_{t(i)} = E y_{t(i)}$ denote associated mean or expected counts and combine these into the $N \times 1$ vector $\mu$. Denote the residuals as $e_{t(i)} = y_{t(i)} - \mu_{t(i)}$ for $t(i) \in T$ and combine these into the $N \times 1$ vector $e = y - \mu$. Let $x_{t(i),j}$ denote predictor values over times $t(i) \in T$ and over predictors indexed by $1 \le j \le J$ and combine these into the $J \times 1$ vector $x_{t(i)}$ with transpose denoted by $x_{t(i)}^T$ for $t(i) \in T$. Let $X$ be the $N \times J$ matrix with rows $x_{t(i)}^T$ for $1 \le i \le N$. Let $\beta$ denote the associated $J \times 1$ vector of coefficient parameters. Use generalized linear models [

The counts $y_{t(i)}$ sometimes have associated totals $Y_{t(i)} > 0$, and then the model for the mean counts $\mu_{t(i)}$ is converted to a model for the means $\mu'_{t(i)}$ of the rates $y'_{t(i)} = y_{t(i)} / Y_{t(i)}$ using offsets $o_{t(i)} = \log(Y_{t(i)})$. Formally, replace $x_{t(i)}^T \cdot \beta$ by $x_{t(i)}^T \cdot \beta + o_{t(i)}$ so that mean counts are $\mu_{t(i)} = \exp(x_{t(i)}^T \cdot \beta + o_{t(i)})$ and then

$$\mu'_{t(i)} = E y'_{t(i)} = \mu_{t(i)} / Y_{t(i)} = \exp(x_{t(i)}^T \cdot \beta)$$

are the mean rates.
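The offset construction above can be illustrated numerically. The following is a minimal sketch, assuming NumPy and a log link; the function name `mean_counts_and_rates`, the design matrix, the coefficients, and the totals are all hypothetical values chosen for illustration and do not come from the study data.

```python
import numpy as np

def mean_counts_and_rates(X, beta, totals):
    """Return mean counts and mean rates for a log-link model with offsets."""
    X = np.asarray(X, dtype=float)
    totals = np.asarray(totals, dtype=float)
    offsets = np.log(totals)             # o_t(i) = log(Y_t(i))
    mu = np.exp(X @ beta + offsets)      # mean counts
    mu_rate = mu / totals                # mean rates, equal to exp(X @ beta)
    return mu, mu_rate

# Hypothetical example: 4 days, intercept plus time, 3 prescribed doses per day.
t = np.arange(1.0, 5.0)
X = np.column_stack([np.ones_like(t), t])
beta = np.array([-0.5, 0.1])
totals = np.full(4, 3.0)
mu, mu_rate = mean_counts_and_rates(X, beta, totals)
```

The offsets cancel when the mean counts are divided by the totals, so the rate model depends on the predictors only.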

Let $x'_{t(i),j}$ denote predictor values over times $t(i) \in T$ and over predictors indexed by $1 \le j \le J'$ and combine these into the $J' \times 1$ vectors $x'_{t(i)}$ for $t(i) \in T$. Let $X'$ be the $N \times J'$ matrix with rows $x'^{T}_{t(i)}$ for $1 \le i \le N$. Let $\beta'$ denote the associated $J' \times 1$ vector of coefficient parameters. Let $\phi_{t(i)}$ denote dispersion values over times $t(i) \in T$ satisfying $\log(\phi_{t(i)}) = x'^{T}_{t(i)} \cdot \beta'$ and define the extended variances as

$$\sigma_{t(i)}^2 = \phi_{t(i)} \cdot V(\mu_{t(i)}) = \phi_{t(i)} \cdot \mu_{t(i)}$$

and the extended standard deviations as $\sigma_{t(i)} = \phi_{t(i)}^{1/2} \cdot \mu_{t(i)}^{1/2}$ for $t(i) \in T$. Informally, these quantities extend the usual Poisson variances and standard deviations through multiplication by dispersions. These are used to compute the standardized residuals $\mathrm{stde}_{t(i)} = e_{t(i)} / \sigma_{t(i)}$. Combine the extended standard deviations into the $N \times 1$ vector $\sigma$. When $x'_{t(i),1} = 1$ for $t(i) \in T$, the first entry $\beta'_1$ of $\beta'$ is an intercept parameter. The constant dispersion model corresponds to $x'_{t(i),1} = 1$ for $t(i) \in T$ with $J' = 1$. This is the dispersion model used in standard GEE modeling.
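As a minimal sketch of these definitions, assuming NumPy and log links for both means and dispersions, the extended standard deviations and standardized residuals can be computed as follows. The function name `extended_sd_and_residuals` and all data values are hypothetical.

```python
import numpy as np

def extended_sd_and_residuals(y, X, beta, X_disp, beta_disp):
    """Extended standard deviations and standardized residuals (log links)."""
    mu = np.exp(X @ beta)                # means; V(mu) = mu for Poisson
    phi = np.exp(X_disp @ beta_disp)     # possibly time-varying dispersions
    sigma = np.sqrt(phi * mu)            # extended standard deviations
    stde = (y - mu) / sigma              # standardized residuals
    return sigma, stde

# Hypothetical counts over 5 days; dispersions use an intercept plus log-time.
t = np.arange(1.0, 6.0)
y = np.array([0.0, 1.0, 1.0, 3.0, 2.0])
X = np.column_stack([np.ones_like(t), np.log(t)])
X_disp = np.column_stack([np.ones_like(t), np.log(t)])
beta = np.array([-0.2, 0.6])
beta_disp = np.array([0.1, 0.3])
sigma, stde = extended_sd_and_residuals(y, X, beta, X_disp, beta_disp)

# Constant dispersion special case: a single intercept column (J' = 1).
sigma_const, _ = extended_sd_and_residuals(
    y, X, beta, np.ones((5, 1)), np.array([0.0]))
```

With a zero dispersion intercept the constant dispersion equals 1 and the extended variances reduce to the usual Poisson variances.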

When offsets $o_{t(i)}$ are used to convert the model for the counts $y_{t(i)}$ to a model for the rates $y'_{t(i)}$, they can also be added to the dispersions. The dispersions then satisfy $\log(\phi_{t(i)}) = x'^{T}_{t(i)} \cdot \beta' + o_{t(i)}$ so that the extended variances for the counts $y_{t(i)}$ are

$$\sigma_{t(i)}^2 = \phi_{t(i)} \cdot \mu_{t(i)} = \exp(x'^{T}_{t(i)} \cdot \beta') \cdot \exp(x_{t(i)}^T \cdot \beta) \cdot \exp^2(o_{t(i)})$$

and then the variances for the rates $y'_{t(i)}$ are

$$\sigma'^{2}_{t(i)} = \sigma_{t(i)}^2 / Y_{t(i)}^2 = \phi'_{t(i)} \cdot \mu'_{t(i)}$$

where

$$\phi'_{t(i)} = \exp(x'^{T}_{t(i)} \cdot \beta').$$
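The step from the count variances to the rate variances relies only on $\exp(o_{t(i)}) = Y_{t(i)}$, where $Y_{t(i)}$ denotes the totals. Written out under the definitions above, the intermediate steps are:

```latex
\sigma'^{2}_{t(i)}
  = \frac{\sigma_{t(i)}^{2}}{Y_{t(i)}^{2}}
  = \frac{\exp(x'^{T}_{t(i)} \cdot \beta') \cdot \exp(x_{t(i)}^{T} \cdot \beta) \cdot \exp(2 \cdot o_{t(i)})}{Y_{t(i)}^{2}}
  = \exp(x'^{T}_{t(i)} \cdot \beta') \cdot \exp(x_{t(i)}^{T} \cdot \beta)
  = \phi'_{t(i)} \cdot \mu'_{t(i)}
```

The factor $\exp(2 \cdot o_{t(i)}) = Y_{t(i)}^{2}$ cancels the squared totals, so the rate variances have the same extended Poisson form as the count variances.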

Denote the covariance matrix for the count vector y as Σ . Use the GEE approach [

The above formulation can be extended to address repeated measurements of types other than counts/rates and for multiple patients. More complex correlation structures based on multiple correlation parameters can also be considered. One such example is unstructured correlations with different correlations for different pairs of measurements, but this requires data from multiple patients to be reasonably estimated. These extensions are not addressed further.

Under standard GEE modeling, dispersions are treated as a constant $\phi_0$ so that the covariance matrix satisfies

$$\Sigma = \phi_0 \cdot \mathrm{Diag}(V^{1/2}(\mu)) \cdot R(\rho) \cdot \mathrm{Diag}(V^{1/2}(\mu))$$

where $V(\mu)$ is the $N \times 1$ vector with entries $V(\mu_{t(i)}) = \mu_{t(i)}$ for $t(i) \in T$. The generalized estimating equations are given by $g(\beta) = 0$ where $0$ is the $J \times 1$ vector with all zero entries, $g(\beta) = D^T \cdot \Sigma^{-1} \cdot e$, and the $N \times J$ matrix $D = \partial\mu/\partial\beta$ has entries $D_{t(i),j} = \partial\mu_{t(i)}/\partial\beta_j = x_{t(i),j} \cdot \mu_{t(i)}$ for $t(i) \in T$ and $1 \le j \le J$. Let $H(\beta) = -D^T \cdot \Sigma^{-1} \cdot D$. Note that in the general GEE context with correlated outcomes for multiple subjects, the formulation for $g(\beta)$ would equal a sum of terms like $D^T \cdot \Sigma^{-1} \cdot e$ for each subject and $H(\beta)$ would equal a sum of terms like $-D^T \cdot \Sigma^{-1} \cdot D$ for each subject. Only one such term is needed here since data for only one subject/patient are being modeled. The GEE process for estimating $\beta$ iteratively solves $g(\beta) = 0$ as follows. Given the current value $\beta_u$ for $\beta$, the next value is given by $\beta_{u+1} = \beta_u - H^{-1}(\beta_u) \cdot g(\beta_u)$, thereby adapting Newton's method with $g(\beta)$ in the role of the gradient vector and $H(\beta)$ in the role of the Hessian matrix.
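The iteration just described can be sketched for a single patient as follows. This is an illustrative sketch assuming NumPy, a log link, and a working correlation matrix `R` and dispersion `phi0` held fixed during the iteration; the function name `gee_fit_means`, the starting value, and the data are hypothetical, and it is not the SAS macro used for the reported analyses.

```python
import numpy as np

def gee_fit_means(y, X, R, beta0, phi0=1.0, max_iter=100, tol=1e-10):
    """Solve g(beta) = 0 by the Newton-like GEE update for one patient."""
    beta = np.array(beta0, dtype=float)
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        root_v = np.sqrt(mu)                              # V^{1/2}(mu)
        Sigma = phi0 * root_v[:, None] * R * root_v[None, :]
        D = X * mu[:, None]                               # D = d mu / d beta
        g = D.T @ np.linalg.solve(Sigma, y - mu)          # gradient-like vector
        H = -D.T @ np.linalg.solve(Sigma, D)              # Hessian-like matrix
        step = np.linalg.solve(H, g)
        beta = beta - step                                # beta_{u+1}
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical daily counts with an intercept and rescaled time as predictors.
y = np.array([0.0, 1.0, 0.0, 2.0, 1.0, 3.0, 2.0, 4.0])
t = np.arange(1.0, 9.0)
X = np.column_stack([np.ones_like(t), t / 8.0])
beta0 = np.array([np.log(y.mean()), 0.0])                 # assumed starting value
beta_ind = gee_fit_means(y, X, np.eye(len(y)), beta0)
beta_ind_phi = gee_fit_means(y, X, np.eye(len(y)), beta0, phi0=2.5)
```

Under independence the equations reduce to the usual Poisson score equations, and the constant dispersion scales $g(\beta)$ and $H(\beta)$ equally, so it does not change the solution for $\beta$.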

The constant dispersion parameter $\phi_0$ is estimated using the Pearson residuals $\mathrm{Pe}_{t(i)}(\beta) = e_{t(i)} / V^{1/2}(\mu_{t(i)})$ evaluated at a given value for the mean coefficient parameter vector $\beta$. The bias-adjusted estimate $\phi_0(\beta)$ of the dispersion parameter $\phi_0$ satisfies $\phi_0(\beta) = \sum_{i=1}^{N} \mathrm{Pe}_{t(i)}^2(\beta) / (N - J)$ assuming $N - J > 0$. Next, the correlation parameter $\rho(\beta)$ is estimated using standardized errors

$$\mathrm{stde}_{t(i)}(\beta) = \phi_0^{-1/2}(\beta) \cdot \mathrm{Pe}_{t(i)}(\beta)$$

for $t(i) \in T$ as follows. The IND correlation structure has no need for an estimate. For the EXCH correlation structure and a given value $\beta$ for the mean parameter vector, $\rho_{\mathrm{EXCH}}$ can be estimated by

$$\rho_{\mathrm{EXCH}}(\beta) = \sum_{i=1}^{N-1} \sum_{i'=i+1}^{N} \mathrm{stde}_{t(i)}(\beta) \cdot \mathrm{stde}_{t(i')}(\beta) / (N \cdot (N-1)/2 - J)$$

assuming $N \cdot (N-1)/2 - J > 0$. For the AR1 correlation structure and a given value $\beta$ for the mean parameter vector, the autocorrelation $\rho_{\mathrm{AR1}}$ can be estimated by

$$\rho_{\mathrm{AR1}}(\beta) = \sum_{i=1}^{N-1} \left( \mathrm{stde}_{t(i)}(\beta) \cdot \mathrm{stde}_{t(i+1)}(\beta) \right)^{1/|t(i) - t(i+1)|} / (N - 1 - J)$$

assuming $N - 1 - J > 0$. In the non-spatial AR1 special case,

$$\rho_{\mathrm{AR1}}(\beta) = \sum_{i=1}^{N-1} \left( \mathrm{stde}_{t(i)}(\beta) \cdot \mathrm{stde}_{t(i+1)}(\beta) \right) / (N - 1 - J)$$

because $|t(i) - t(i+1)| = 1$ for $1 \le i < N$.
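The moment estimates above can be sketched as follows, assuming NumPy. Only the non-spatial AR1 case with unit time gaps is coded, since the spatial formula raises products of residuals to fractional powers. The function name `gee_moment_estimates` and the data are hypothetical.

```python
import numpy as np

def gee_moment_estimates(y, mu, J):
    """Bias-adjusted dispersion plus EXCH and non-spatial AR1 correlations."""
    N = len(y)
    pearson = (y - mu) / np.sqrt(mu)             # Pearson residuals
    phi0 = np.sum(pearson ** 2) / (N - J)        # assumes N - J > 0
    stde = pearson / np.sqrt(phi0)               # standardized errors
    # Sum over all pairs i < i' via (sum^2 - sum of squares) / 2.
    pair_sum = (np.sum(stde) ** 2 - np.sum(stde ** 2)) / 2.0
    rho_exch = pair_sum / (N * (N - 1) / 2.0 - J)
    # Non-spatial AR1: consecutive observations are one time unit apart.
    rho_ar1 = np.sum(stde[:-1] * stde[1:]) / (N - 1 - J)
    return phi0, rho_exch, rho_ar1

# Hypothetical counts with a constant fitted mean (J = 1 mean parameter).
y = np.array([0.0, 1.0, 0.0, 2.0, 1.0, 3.0, 2.0, 4.0])
mu = np.full(len(y), y.mean())
phi0, rho_exch, rho_ar1 = gee_moment_estimates(y, mu, J=1)
```

Because the spatial formula takes the power $1/|t(i) - t(i+1)|$ of a product that can be negative, a general implementation would need a convention for that case, which the text does not specify.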

For any correlation structure, once the GEE estimate $\beta(T)$ of the coefficient parameter vector $\beta$ is computed using the observations indexed by $t(i) \in T$, the GEE estimate of the dispersion parameter $\phi_0$ is $\phi_0(T) = \phi_0(\beta(T))$. The GEE estimate of the correlation parameter $\rho$ is $\rho(T) = \rho(\beta(T))$ computed using $\beta(T)$ and $\phi_0(T)$.

Let $\theta = (\beta^T \; \phi_0)^T$ be the $(J + 1) \times 1$ vector of the GEE mean and dispersion parameters. The correlation parameter $\rho$ is a function of $\beta$ and $\phi_0$ and so has not been included in $\theta$. Use the multivariate normal likelihood to define the likelihood-like function $L(T; \theta)$ satisfying

$$l(T; \theta) = \log(L(T; \theta)) = -e^T \cdot \Sigma^{-1} \cdot e / 2 - \log(|\Sigma|) / 2 - N \cdot \log(2 \cdot \pi) / 2$$

where $|\Sigma|$ is the determinant of the covariance matrix $\Sigma$. The vector $\partial l(T; \theta) / \partial\beta$ of partial derivatives of $l(T; \theta)$ can be expressed as the sum of two terms. The first term corresponds to differentiating the residual vector part $e$ of $l(T; \theta)$ with respect to $\beta$ holding the covariance part $\Sigma$ fixed in $\beta$ and equals $g(\beta)$, the gradient-like quantity used in standard GEE modeling. This fact seems to have been first recognized by Chaganty [

Burman [

$$\mathrm{LCV} = \prod_{f=1}^{k} \mathrm{LCV}_f^{1/N}$$

where $\mathrm{LCV}_f$ is defined as the conditional likelihood-like term for the data in fold $T(f)$ conditioned on the data in the union $T_{+}(f-1)$ of the prior folds using the deleted estimate $\theta(T \setminus T(f))$ of the parameter vector $\theta$. Formally,

$$\mathrm{LCV}_f = L(T(f) \mid T_{+}(f-1); \theta(T \setminus T(f))) = L(T_{+}(f); \theta(T \setminus T(f))) / L(T_{+}(f-1); \theta(T \setminus T(f)))$$

Because fold assignment is random, folds can be empty when the number $k$ of folds is large relative to the number $N$ of measurements, and then those folds are dropped from the computation of the LCV score. Larger LCV scores indicate better models. Note that even if the full data are non-spatial with observations at consecutive integer times $t(i) = i$ for $1 \le i \le N$, the folds $T(f)$ and the fold unions $T_{+}(f)$ are not consecutive integer times except in rare cases and so require more general handling.
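The LCV computation can be sketched as below, assuming NumPy. The helper names `loglik_like`, `lcv_score`, `fit`, and `model` are hypothetical, the toy model (constant mean, constant dispersion, IND correlations with closed-form estimates) stands in for the adaptive models of the article, and folds are assigned deterministically here only to keep the example reproducible.

```python
import numpy as np

def loglik_like(y, mu, sigma, R):
    """l(T; theta) for the multivariate normal likelihood-like function."""
    if len(y) == 0:
        return 0.0                       # empty union of prior folds
    stde = (y - mu) / sigma
    _, logdet_R = np.linalg.slogdet(R)
    quad = stde @ np.linalg.solve(R, stde)
    logdet_Sigma = logdet_R + 2.0 * np.sum(np.log(sigma))
    return -0.5 * (quad + logdet_Sigma + len(y) * np.log(2.0 * np.pi))

def lcv_score(y, fold_of, k, fit, model):
    """LCV = prod_f LCV_f^(1/N), each LCV_f being a ratio of likelihoods."""
    N = len(y)
    prior = np.zeros(N, dtype=bool)      # indicator of T+(f-1)
    log_lcv = 0.0
    for f in range(k):
        fold = fold_of == f
        if not fold.any():
            continue                     # empty folds are dropped
        theta = fit(~fold)               # deleted estimate theta(T \ T(f))
        upto = prior | fold              # indicator of T+(f)
        log_lcv += (loglik_like(y[upto], *model(theta, upto))
                    - loglik_like(y[prior], *model(theta, prior)))
        prior = upto
    return np.exp(log_lcv / N)

# Toy model: constant mean and dispersion, IND correlations (assumption).
y = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 2.0, 4.0, 2.0, 3.0, 5.0])

def fit(mask):
    mu = y[mask].mean()
    phi = np.sum((y[mask] - mu) ** 2 / mu) / (mask.sum() - 1)
    return mu, phi

def model(theta, mask):
    mu, phi = theta
    n = int(mask.sum())
    return np.full(n, mu), np.full(n, np.sqrt(phi * mu)), np.eye(n)

fold_of = np.arange(len(y)) % 5          # deterministic folds for illustration
lcv = lcv_score(y, fold_of, 5, fit, model)
```

Passing boolean masks rather than assuming consecutive times reflects the remark above that folds and fold unions generally do not consist of consecutive integer times.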

GEE modeling can be extended to handle nonconstant dispersions. Let $\theta = (\beta^T \; \beta'^{T})^T$ be the $(J + J') \times 1$ vector of the mean and dispersion parameters. The definition of the likelihood-like function $L(T; \theta)$ given for standard GEE holds using the more general parameter vector $\theta$. Differentiate $l(T; \theta) = \log(L(T; \theta))$ with respect to the vector $\beta'$ of dispersion coefficient parameters while holding the correlation parameter $\rho$ fixed in the current parameter vector $\beta'$ to provide the $J'$ estimating equations $g(\beta') = \partial' l(T; \theta) / \partial'\beta' = 0$ where the notation $\partial' l(T; \theta) / \partial'\beta'$ is used to indicate that this is not the full partial derivative vector for $l(T; \theta)$ in $\beta'$ due to not accounting for the effect of $\beta'$ on $\rho$. Now, combine these with the $J$ standard GEE equations $g(\beta) = 0$ to solve for joint estimates of $\beta$ and $\beta'$. Then, iteratively solve for

$$g(\theta) = (g(\beta)^T \; g(\beta')^T)^T = 0$$

with $g(\theta)$ in the role of the gradient vector and the $(J + J') \times (J + J')$ matrix $H(\theta)$ in the role of the Hessian matrix. $H(\theta)$ has four component submatrices: the $J \times J$ matrix $H(\beta)$ for the mean coefficients as defined for standard GEE, the $J' \times J'$ matrix $H(\beta') = \partial' g(\beta') / \partial'\beta'$ for the $J'$ dispersion coefficients, the $J \times J'$ matrix $H(\beta, \beta') = \partial' g(\beta) / \partial'\beta'$, and its transpose $H(\beta', \beta) = H(\beta, \beta')^T$.

Note that

$$\log(|\Sigma|) = \log(|R(\rho)|) + \sum_{i=1}^{N} \log(\phi_{t(i)}) + \sum_{i=1}^{N} \log(V(\mu_{t(i)})),$$

$$\phi_{t(i)} = \exp(x'^{T}_{t(i)} \cdot \beta')$$

and

$$e^T \cdot \Sigma^{-1} \cdot e = \mathrm{stde}^T \cdot R^{-1}(\rho) \cdot \mathrm{stde}$$

where $\mathrm{stde}$ is the $N \times 1$ vector with entries $\mathrm{stde}_{t(i)} = e_{t(i)} / \sigma_{t(i)}$ for $t(i) \in T$. Consequently, $g(\beta')$ has entries

$$g_j(\beta') = \mathrm{stdex}'^{T}_{j} \cdot R^{-1}(\rho) \cdot \mathrm{stde} - \sum_{i=1}^{N} x'_{t(i),j} / 2$$

for $1 \le j \le J'$ where $\mathrm{stdex}'_j$ is the $N \times 1$ vector with entries

$$\mathrm{stdex}'_{t(i),j} = x'_{t(i),j} \cdot \mathrm{stde}_{t(i)} / 2$$

for $t(i) \in T$. $H(\beta')$ has entries

$$H_{j,j'}(\beta') = -\mathrm{stdexx}''^{T}_{j,j'} \cdot R^{-1}(\rho) \cdot \mathrm{stde} - \mathrm{stdex}'^{T}_{j} \cdot R^{-1}(\rho) \cdot \mathrm{stdex}'_{j'}$$

for $1 \le j, j' \le J'$ where $\mathrm{stdexx}''_{j,j'}$ is the $N \times 1$ vector with entries

$$\mathrm{stdexx}''_{t(i),j,j'} = x'_{t(i),j} \cdot x'_{t(i),j'} \cdot \mathrm{stde}_{t(i)} / 4$$

for $t(i) \in T$. $H(\beta, \beta')$ has columns

$$H_j(\beta, \beta') = -D^T \cdot \mathrm{Diag}(\sigma\mathrm{invx}'_j) \cdot R^{-1}(\rho) \cdot \mathrm{stde} - D^T \cdot \mathrm{Diag}(1/\sigma) \cdot R^{-1}(\rho) \cdot \mathrm{stdex}'_j$$

where $\sigma\mathrm{invx}'_j$ is the $N \times 1$ vector with entries $\sigma\mathrm{invx}'_{t(i),j} = x'_{t(i),j} / (2 \cdot \sigma_{t(i)})$ for $t(i) \in T$ and $1 \le j \le J'$. If offsets are included, they are carried along in equations without any effect on derivatives.
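The expressions for $g(\beta')$ and $H(\beta')$ can be checked numerically against finite differences of $l(T; \theta)$ with $\rho$ held fixed. The sketch below assumes NumPy; the function names, the fixed AR1 working correlation, and the data are hypothetical choices for illustration.

```python
import numpy as np

def loglik_like(beta_d, y, mu, Xd, R):
    """l(T; theta) as a function of the dispersion coefficients only."""
    sigma = np.sqrt(np.exp(Xd @ beta_d) * mu)
    stde = (y - mu) / sigma
    _, logdet_R = np.linalg.slogdet(R)
    return -0.5 * (stde @ np.linalg.solve(R, stde) + logdet_R
                   + 2.0 * np.sum(np.log(sigma)) + len(y) * np.log(2.0 * np.pi))

def dispersion_gradient_hessian(beta_d, y, mu, Xd, R):
    """g(beta') and H(beta') with the correlation matrix R held fixed."""
    sigma = np.sqrt(np.exp(Xd @ beta_d) * mu)
    stde = (y - mu) / sigma
    Rinv = np.linalg.inv(R)
    w = Rinv @ stde
    stdex = Xd * stde[:, None] / 2.0               # columns are stdex'_j
    g = stdex.T @ w - Xd.sum(axis=0) / 2.0
    # stdexx''_{j,j'}^T R^{-1} stde = sum_i x'_ij x'_ij' stde_i w_i / 4
    H = -Xd.T @ (Xd * (stde * w / 4.0)[:, None]) - stdex.T @ Rinv @ stdex
    return g, H

# Hypothetical data: 6 days, fixed means, dispersions in intercept + log-time.
t = np.arange(1.0, 7.0)
y = np.array([0.0, 1.0, 1.0, 3.0, 2.0, 4.0])
mu = np.exp(-0.3 + 0.3 * t)
Xd = np.column_stack([np.ones_like(t), np.log(t)])
beta_d = np.array([0.2, -0.1])
R = 0.3 ** np.abs(t[:, None] - t[None, :])         # fixed AR1 working correlation
g, H = dispersion_gradient_hessian(beta_d, y, mu, Xd, R)

# Central finite differences of l for comparison with the analytic gradient.
h = 1e-5
g_num = np.array([
    (loglik_like(beta_d + h * e, y, mu, Xd, R)
     - loglik_like(beta_d - h * e, y, mu, Xd, R)) / (2.0 * h)
    for e in np.eye(len(beta_d))])
```

Agreement between `g` and `g_num` is a quick sanity check that the signs and the factors of 1/2 and 1/4 have been transcribed consistently.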

Given a value for the vector $\theta$ of all coefficient parameters, an estimate of the correlation parameter $\rho$ can be based on the associated standardized residuals $\mathrm{stde}_{t(i)}$. Calculate correlation estimates for the IND, EXCH, and AR1 correlation structures using the same formulas as before but computed with these more general standardized residuals. Iteratively solve $g(\theta) = 0$ as follows. Given the current value $\theta_u$ for $\theta$, the next value is given by $\theta_{u+1} = \theta_u - H^{-1}(\theta_u) \cdot g(\theta_u)$, thereby adapting Newton's method with $g(\theta)$ in the role of the gradient vector and $H(\theta)$ in the role of the Hessian matrix. The solution to the estimating equations for observations indexed by $T$ is denoted as $\theta(T) = (\beta(T)^T \; \beta'(T)^T)^T$ with associated correlation estimate $\rho(T) = \rho(\theta(T))$.

GEE modeling can be further extended to handle full parameter estimation through maximizing the likelihood-like function. Let $\theta = (\beta^T \; \beta'^{T} \; \rho)^T$ be the $(J + J' + 1) \times 1$ vector of the mean, dispersion, and correlation parameters. The definition of the likelihood-like function $L(T; \theta)$ given for standard GEE holds using this more general parameter vector $\theta$. The likelihood-like function $L(T; \theta)$ is maximized in the coefficient parameter vector $\theta$ by solving the estimating equations

$$g(\theta) = \partial l(T; \theta) / \partial\theta = 0$$

where $\partial l(T; \theta) / \partial\theta$ is the vector of standard partial derivatives of $l(T; \theta)$. The associated matrix is $H(\theta) = \partial g(\theta) / \partial\theta$. In this case, $g(\theta)$ is a true gradient vector and $H(\theta)$ a true Hessian matrix. This approach is extended linear mixed modeling in the sense that if the entries of $y$ were continuous variables treated as normally distributed with $V(\mu) = 1$, then it would be exactly linear mixed modeling. Formulations given in what follows are adapted from those of [

The gradient vector is $g(\theta) = (g(\beta)^T \; g(\beta')^T \; g(\rho))^T$. The gradient subvector $g(\beta') = \partial l(T; \theta) / \partial\beta'$ has the same formulation as for extended GEE modeling, only now its entries are standard partial derivatives. The gradient subvector $g(\beta) = \partial l(T; \theta) / \partial\beta$ has entries

$$g_j(\beta) = \mathrm{stdex}_j^T \cdot R^{-1}(\rho) \cdot \mathrm{stde} - \sum_{i=1}^{N} x_{t(i),j} / 2$$

where $\mathrm{stdex}_j$ is the $N \times 1$ vector with entries

$$\mathrm{stdex}_{t(i),j} = x_{t(i),j} \cdot (y_{t(i)} + \mu_{t(i)}) / (2 \cdot \sigma_{t(i)})$$

for $t(i) \in T$ and $1 \le j \le J$. The partial derivative $g(\rho) = \partial l(T; \theta) / \partial\rho$ satisfies

$$g(\rho) = -\mathrm{stde}^T \cdot \partial R^{-1}(\rho) / \partial\rho \cdot \mathrm{stde} / 2 - \partial(\log(|R(\rho)|)) / \partial\rho / 2$$

where

$$\partial(\log(|R(\rho)|)) / \partial\rho = \mathrm{tr}(R^{-1}(\rho) \cdot \partial R(\rho) / \partial\rho),$$

tr denotes the trace function, and

$$\partial R^{-1}(\rho) / \partial\rho = -R^{-1}(\rho) \cdot \partial R(\rho) / \partial\rho \cdot R^{-1}(\rho).$$

For IND correlations, $\partial R(\rho) / \partial\rho = 0$. For EXCH correlations, $\partial R(\rho) / \partial\rho$ is the $N \times N$ matrix with diagonal entries all equal to 0 and off-diagonal entries all equal to 1. For AR1 correlations, $\partial R(\rho) / \partial\rho$ is the $N \times N$ matrix with diagonal entries all equal to 0 and off-diagonal entries equaling

$$|t(i) - t(i')| \cdot \rho_{\mathrm{AR1}}^{|t(i) - t(i')| - 1}$$

in the $i$th row and $i'$th column for $1 \le i \ne i' \le N$.
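The correlation matrices, their derivatives in $\rho$, and the resulting $g(\rho)$ can be sketched as follows, assuming NumPy and, for the spatial AR1 case, $0 < \rho < 1$ so that fractional powers are defined. The function names `corr_matrix_and_derivative` and `score_rho`, the times, and the residual values are hypothetical.

```python
import numpy as np

def corr_matrix_and_derivative(times, rho, structure):
    """Return R(rho) and dR/drho for IND, EXCH, or spatial AR1 correlations."""
    times = np.asarray(times, dtype=float)
    N = len(times)
    off = ~np.eye(N, dtype=bool)                       # off-diagonal positions
    if structure == "IND":
        return np.eye(N), np.zeros((N, N))
    if structure == "EXCH":
        return np.where(off, rho, 1.0), off.astype(float)
    if structure == "AR1":
        lag = np.abs(times[:, None] - times[None, :])  # |t(i) - t(i')|
        R = np.where(off, rho ** lag, 1.0)
        # Exponent is zeroed on the diagonal to avoid rho ** (-1) there.
        dR = np.where(off, lag * rho ** np.where(off, lag - 1.0, 0.0), 0.0)
        return R, dR
    raise ValueError("structure must be 'IND', 'EXCH', or 'AR1'")

def score_rho(stde, R, dR):
    """g(rho) after substituting dR^{-1}/drho = -R^{-1} dR R^{-1}."""
    u = np.linalg.solve(R, stde)
    return 0.5 * u @ dR @ u - 0.5 * np.trace(np.linalg.solve(R, dR))

# Hypothetical unequally spaced times and standardized residuals.
times = np.array([1.0, 2.0, 4.0, 5.0, 7.5])
stde = np.array([0.4, -1.1, 0.3, 0.9, -0.2])
rho = 0.4
R_ar1, dR_ar1 = corr_matrix_and_derivative(times, rho, "AR1")
g_rho_ar1 = score_rho(stde, R_ar1, dR_ar1)
```

Substituting the derivative of the inverse turns the first term of $g(\rho)$ into a quadratic form in $R^{-1}(\rho) \cdot \mathrm{stde}$, which avoids forming $\partial R^{-1}(\rho) / \partial\rho$ explicitly.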

$H(\theta)$ has nine component submatrices: the $J \times J$ matrix $H(\beta) = \partial g(\beta) / \partial\beta$ for the mean parameters, the $J' \times J'$ matrix $H(\beta') = \partial g(\beta') / \partial\beta'$ for the dispersion parameters computed as for extended GEE modeling, the second partial derivative $H(\rho) = \partial g(\rho) / \partial\rho$ for the correlation parameter, the $J \times J'$ matrix $H(\beta, \beta') = \partial g(\beta) / \partial\beta'$ and its transpose $H(\beta', \beta) = H(\beta, \beta')^T$, the $J \times 1$ vector $H(\beta, \rho) = \partial g(\beta) / \partial\rho$ and its transpose $H(\rho, \beta) = H(\beta, \rho)^T$, and the $J' \times 1$ vector $H(\beta', \rho) = \partial g(\beta') / \partial\rho$ and its transpose $H(\rho, \beta') = H(\beta', \rho)^T$. $H(\beta)$ has entries

$$H_{j,j'}(\beta) = -\mathrm{stdexx}_{j,j'}^T \cdot R^{-1}(\rho) \cdot \mathrm{stde} - \mathrm{stdex}_j^T \cdot R^{-1}(\rho) \cdot \mathrm{stdex}_{j'}$$

for $1 \le j, j' \le J$ where $\mathrm{stdexx}_{j,j'}$ is the $N \times 1$ vector with entries

$$\mathrm{stdexx}_{t(i),j,j'} = x_{t(i),j} \cdot x_{t(i),j'} \cdot \mathrm{stde}_{t(i)} / 4$$

for $t(i) \in T$. The second partial derivative $H(\rho)$ satisfies

$$H(\rho) = -\mathrm{stde}^T \cdot \partial^2 R^{-1}(\rho) / \partial\rho^2 \cdot \mathrm{stde} / 2 - \partial^2(\log(|R(\rho)|)) / \partial\rho^2 / 2$$

where

$$\partial^2(\log(|R(\rho)|)) / \partial\rho^2 = -\mathrm{tr}(R^{-1}(\rho) \cdot \partial R(\rho) / \partial\rho \cdot R^{-1}(\rho) \cdot \partial R(\rho) / \partial\rho) + \mathrm{tr}(R^{-1}(\rho) \cdot \partial^2 R(\rho) / \partial\rho^2).$$

For IND and EXCH correlations, $\partial^2 R(\rho) / \partial\rho^2 = 0$. For AR1 correlations, $\partial^2 R(\rho) / \partial\rho^2$ is the $N \times N$ matrix with diagonal entries all equal to 0 and off-diagonal entries equaling

$$|t(i) - t(i')| \cdot (|t(i) - t(i')| - 1) \cdot \rho_{\mathrm{AR1}}^{|t(i) - t(i')| - 2}$$

in the $i$th row and $i'$th column for $1 \le i \ne i' \le N$. $H(\beta', \beta)$ has entries

$$H_{j,j'}(\beta', \beta) = -\mathrm{stdexx}'^{T}_{j,j'} \cdot R^{-1}(\rho) \cdot \mathrm{stde} - \mathrm{stdex}_j^T \cdot R^{-1}(\rho) \cdot \mathrm{stdex}'_{j'}$$

for $1 \le j \le J$ and $1 \le j' \le J'$ where $\mathrm{stdexx}'_{j,j'}$ is the $N \times 1$ vector with entries

$$\mathrm{stdexx}'_{t(i),j,j'} = x'_{t(i),j'} \cdot \mathrm{stdex}_{t(i),j} / 2$$

for $t(i) \in T$, $1 \le j \le J$, and $1 \le j' \le J'$. $H(\beta, \rho)$ has entries

$$H_j(\beta, \rho) = \mathrm{stdex}_j^T \cdot \partial R^{-1}(\rho) / \partial\rho \cdot \mathrm{stde}$$

for $1 \le j \le J$. $H(\beta', \rho)$ has entries

$$H_j(\beta', \rho) = \mathrm{stdex}'^{T}_{j} \cdot \partial R^{-1}(\rho) / \partial\rho \cdot \mathrm{stde}$$

for $1 \le j \le J'$.

The parameter vector $\theta$ is estimated by iteratively solving $g(\theta) = 0$ as follows. Given the current value $\theta_u$ for $\theta$, the next value is given by

$$\theta_{u+1} = \theta_u - H^{-1}(\theta_u) \cdot g(\theta_u),$$

thereby using Newton's method with gradient vector $g(\theta)$ and Hessian matrix $H(\theta)$. The estimation process can be stopped early if $l(T; \theta_{u+1})$ does not increase by much compared to $l(T; \theta_u)$. The solution to the estimating equations for observations indexed by $T$ is denoted as $\theta(T) = (\beta(T)^T \; \beta'(T)^T \; \rho(T))^T$.

The covariance matrix for the parameter estimate vector $\theta(T)$ can be computed as $-H^{-1}(\theta(T))$, and the variances given by its diagonal entries can be used to compute z tests of whether individual model parameters equal zero. These are useful for fixed models of theoretical importance. On the other hand, tests for parameters of adaptively generated models (as described in Section 6) are usually significant as a consequence of the model selection process, and so results for these tests are not reported for models generated in the example analyses.
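The z tests described above can be sketched as follows, assuming NumPy and a Hessian evaluated at the estimate. The function name `z_tests` is hypothetical, and the toy example uses an ordinary intercept-only Poisson log-likelihood (not the likelihood-like function of this article) only because its Hessian is available in closed form.

```python
import numpy as np
from math import erfc, sqrt

def z_tests(theta_hat, H):
    """Standard errors, z statistics, and two-sided normal p-values."""
    cov = -np.linalg.inv(H)                 # covariance estimate -H^{-1}
    se = np.sqrt(np.diag(cov))
    z = theta_hat / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = np.array([erfc(abs(v) / sqrt(2.0)) for v in z])
    return se, z, p

# Toy check: for l(b) = sum(y * b - exp(b)), the maximizer is log(mean(y))
# and the Hessian there is -sum(y).
y = np.array([0.0, 1.0, 0.0, 2.0, 1.0, 3.0, 2.0, 4.0])
theta_hat = np.array([np.log(y.mean())])
H = np.array([[-y.sum()]])
se, z, p = z_tests(theta_hat, H)
```

In the setting of this article, `theta_hat` would be $\theta(T)$ and `H` would be $H(\theta(T))$, assembled from the submatrices given earlier.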

Knafl and Ding [

A SAS^{®} (SAS Institute, Inc., Cary, NC) macro has been developed for generating adaptive analyses including the reported example analyses. This macro as well as data and code used to generate the results of the example analyses are available from the first author.

Pain flares range from 0 to 4 per day and tend to increase over time. Data are available for N = 33 days with a missing value for one day (day 33). These data were collected using Ecological Momentary Assessment (EMA) [

Extended linear mixed modeling generates better LCV scores than extended GEE for all three correlation structures. Moreover, computation times are much shorter, ranging from 0.4 to 1.2 minutes compared to 13.9 to 35.5 minutes. These results suggest that extended linear mixed modeling is preferable for modeling these pain flare counts because it generates better LCV scores in less time. Consequently, only extended linear mixed modeling using IND correlations is considered further for these data, generating the model with means based on $t(i)^{0.49}$ without an intercept and dispersions based on $t(i)^{8.37}$ and $t(i)^{0.5}$ without an intercept.

Modeling Approach | Correlation Structure | Correlation Estimate | Power Transforms^{a} for Means | Power Transforms^{a} for Dispersions | 5-fold LCV Score | Time^{b} (minutes) |
---|---|---|---|---|---|---|
extended GEE | IND | 0 | $t(i)^{0.54}$ | $t(i)^{0.12}$ | 0.38018 | 13.9 |
extended GEE | EXCH | 0.001 | $t(i)^{0.869}$ | $t(i)^{0.29}$ | 0.34712 | 35.5 |
extended GEE | AR1 | 0.10 | $t(i)^{0.59}$ | $t(i)^{0.1}$ | 0.37404 | 17.6 |
extended LMM | IND | 0 | $t(i)^{0.49}$ | $t(i)^{8.37}$, $t(i)^{0.5}$ | 0.40622 | 0.4 |
extended LMM | EXCH | 0.42 | $t(i)^{0.511}$ | $t(i)^{0.2}$ | 0.36590 | 1.2 |
extended LMM | AR1 | 0.20 | $t(i)^{0.4}$ | 1, $t(i)^{1.01}$, $t(i)^{1.01}$ | 0.37693 | 0.4 |

AR1—autoregressive of order 1; EXCH—exchangeable; GEE—generalized estimating equations; IND—independent; LCV—likelihood-like cross-validation; LMM—linear mixed modeling. a. The i^{th} time value is denoted as t(i). A 1 corresponds to an intercept parameter; otherwise, the model has a zero intercept. b. Difference in minutes of clock times between the start and end of computations.

The standardized residuals for this model range between ±2 without any extreme outliers, suggesting that the model is a reasonable fit for these data.

The associated model generated using $k = 10$ folds has similar means based on $t(i)^{0.5}$ without an intercept and simpler dispersions based on $t(i)^{0.2}$ without an intercept. However, the 10-fold LCV score 0.38107 is smaller, suggesting that $k = 5$ is a better choice for these data. Moreover, there is one empty fold, suggesting that the choice of $k = 10$ folds is too large for these data with only $N = 33$ measurements. The associated model generated with $k = 5$ folds and assuming constant dispersions has a similar model for the means based on $t(i)^{0.53}$ without an intercept but a smaller LCV score 0.37031, suggesting that the dispersions for these data are reasonably treated as nonconstant over time.

Modeling Approach | Correlation Structure | Correlation Estimate | Power Transforms^{a} for Means | Power Transforms^{a} for Dispersions | 5-fold LCV Score | Time^{b} (minutes) |
---|---|---|---|---|---|---|
extended GEE | IND | 0 | 1, $t(i)^{0.5}$ | 1 | 0.37030 | 84.5 |
extended GEE | EXCH | −0.01 | 1, $t(i)^{0.5}$ | 1 | 0.36340 | 202.3 |
extended GEE | AR1 | 0.57 | $t(i)^{-0.12}$, $t(i)^{3}$ | 1, $t(i)^{-1.5}$ | 0.41497 | 222.7 |
extended LMM | IND | 0 | $t(i)^{0.3}$, $t(i)^{0.1}$ | 1 | 0.36845 | 0.5 |
extended LMM | EXCH | −0.01 | 1, $t(i)^{0.4}$ | 1 | 0.37379 | 1.7 |
extended LMM | AR1 | 0.45 | 1, $t(i)^{0.4}$ | 1 | 0.40509 | 0.8 |

AR1—autoregressive of order 1; EXCH—exchangeable; GEE—generalized estimating equations; IND—independent; LCV—likelihood-like cross-validation; LMM—linear mixed modeling. a. The i^{th} time value is denoted as t(i). A 1 corresponds to an intercept parameter; otherwise, the model has a zero intercept. b. Difference in minutes of clock times between the start and end of computations.

Extended GEE modeling generates a better LCV score than extended linear mixed modeling for the IND correlation structure, but the scores for these two approaches are not too different. Extended linear mixed modeling generates a better LCV score than extended GEE modeling for the EXCH correlation structure. Extended GEE modeling generates a better LCV score than extended linear mixed modeling for the AR1 correlation structure. Although this is the best overall LCV score, the associated model for extended linear mixed modeling is more parsimonious, with an intercept and one time transform for the means compared to two time transforms, and with constant dispersions compared to dispersions based on an intercept and one time transform. Moreover, computation times are substantially shorter for extended linear mixed modeling, ranging from 0.5 to 1.7 minutes compared to 84.5 to 222.7 minutes (1.4 to 3.7 hours). These results suggest that extended linear mixed modeling is preferable for modeling these counts of as needed pain medications taken because it generates competitive or better scores or more parsimonious models in substantially less time. Consequently, only extended linear mixed modeling using AR1 correlations is considered further for these data, generating the model with means based on $t(i)^{0.4}$ with an intercept, constant dispersions based on an intercept, and estimated autocorrelation $\rho_{\mathrm{AR1}} = 0.45$.

The associated model generated using $k = 10$ folds is about the same with means based on $t(i)^{0.5}$ with an intercept, constant dispersions, and estimated correlation $\rho_{\mathrm{AR1}} = 0.45$. The 10-fold LCV score 0.40958 is larger, suggesting that $k = 10$ is a better choice for these data. There are no empty folds. The associated model generated using $k = 15$ folds is similar with means based on $t(i)^{0.5}$ with an intercept, dispersions based on $t(i)^{0.07}$ without an intercept, and estimated correlation $\rho_{\mathrm{AR1}} = 0.46$. The 15-fold LCV score 0.40353 is smaller, suggesting that $k = 10$ is a better choice for these data. There are no empty folds. The associated model generated with $k = 15$ folds assuming constant dispersions has means based on $t(i)^{0.5}$ with an intercept and a close 15-fold LCV score of 0.40318. Consequently, models generated by 5, 10, and 15 folds using extended linear mixed modeling are not too different, suggesting that the results are reasonably robust to the choice of the number of folds.

Adherence data for around the clock pain medications were collected using pill bottles equipped with Medication Event Monitoring System (MEMS) devices (AARDEX North America, Boulder, CO) that recorded the date and time of each pill bottle opening and presumably of the taking of the pain medication [

Modeling Approach | Correlation Structure | Correlation Estimate | Power Transforms^{a} for Means | Power Transforms^{a} for Dispersions | 5-fold LCV Score | Time^{b} (minutes) |
---|---|---|---|---|---|---|
extended GEE | IND | 0 | $t(i)^{0.7}$ | 1 | 0.046556 | 5.1 |
extended GEE | EXCH | −0.03 | 1, $t(i)^{5}$ | 1 | 0.051583 | 11.9 |
extended GEE | AR1 | 0.58 | $t(i)^{1.1}$ | 1 | 0.048525 | 6.9 |
extended LMM | IND | 0 | $t(i)^{0.8}$ | 1 | 0.046837 | 0.2 |
extended LMM | EXCH | −0.03 | $t(i)^{0.9}$ | 1 | 0.045251 | 0.7 |
extended LMM | AR1 | 0.75 | $t(i)^{1.1}$ | 1, $t(i)^{-1.5}$ | 0.053856 | 0.2 |

AR1—autoregressive of order 1; EXCH—exchangeable; GEE—generalized estimating equations; IND—independent; LCV—likelihood-like cross-validation; LMM—linear mixed modeling. a. The i^{th} time value is denoted as t(i). A 1 corresponds to an intercept parameter; otherwise, the model has a zero intercept. b. Difference in minutes of clock times between the start and end of computations.

Extended linear mixed modeling generates better LCV scores than extended GEE for the IND and AR1 correlation structures. Its LCV score is smaller for the EXCH correlation structure, but its model is more parsimonious based on one time transform for the means with constant dispersions compared to one time transform plus an intercept for the means with constant dispersions. Furthermore, computation times are much shorter for extended linear mixed modeling, ranging from 0.2 to 0.7 minutes compared to 5.1 to 11.9 minutes. These results suggest that extended linear mixed modeling is preferable for modeling these rates of around the clock pain medications taken because it generates the best LCV score in less time. Consequently, only extended linear mixed modeling using AR1 correlations is considered further for these data, generating the model with means based on $t(i)^{1.1}$ without an intercept, dispersions based on $t(i)^{6.1}$ with an intercept, and estimated autocorrelation $\rho_{\mathrm{AR1}} = 0.75$.

The associated model generated using $k = 10$ folds is somewhat similar with means based on $t(i)^{0.4}$ without an intercept, constant dispersions based on an intercept, and estimated autocorrelation $\rho_{\mathrm{AR1}} = 0.76$. However, the 10-fold LCV score 0.052023 is smaller, suggesting that $k = 5$ is a better choice for these data. Moreover, there is one empty fold, suggesting that the choice of $k = 10$ folds is too large for these data with only $N = 30$ measurements. The associated model generated with $k = 5$ folds and assuming constant dispersions has a model for the means based on $t(i)^{1.01}$ without an intercept, an autocorrelation estimate of $\rho_{\mathrm{AR1}} = 0.75$, and a smaller LCV score 0.050386, suggesting that the dispersions for these data are reasonably treated as nonconstant over time.

Methods are formulated for modeling individual patient count/rate data over time allowing for nonlinear trajectories for means, time-varying dispersions, and temporal correlation. Three correlation structures are considered including IND, EXCH, and spatial AR1 correlations. Two extensions of standard GEE modeling are considered. Extended GEE modeling augments standard GEE mean parameter estimating equations with dispersion parameter estimating equations while using the GEE approach for correlation parameter estimation. Extended linear mixed modeling estimates all model parameters using estimating equations for mean, dispersion, and correlation parameters. These new estimating equations are determined by partial derivatives of a likelihood-like function based on the multivariate normal density. This likelihood-like function is also used to define a likelihood-like cross-validation (LCV) score for evaluating models. LCV scores are used to control adaptive regression modeling of possibly nonlinear means and dispersions over time. It is also possible to generate penalized likelihood-like criteria for model selection generalizing standard penalized likelihood criteria [

Example analyses using these methods are provided using three types of count/rate data for individual cancer patients including cancer pain flares per day, as needed cancer pain medications taken per day, and around the clock cancer pain medications taken per day per dose. Extended linear mixed modeling generates models with either better LCV scores or more parsimonious models than extended GEE modeling. Moreover, times to compute models are substantially shorter for extended linear mixed modeling than for extended GEE modeling. Time differences can be extreme for even moderate sample sizes. For example, analyses for the second example data set with 92 observations required at most 1.7 minutes for extended linear mixed modeling compared to up to 3.7 hours for extended GEE. These results indicate that extended linear mixed modeling is preferable for modeling individual patient count/rate data over time. This is likely to hold in more general modeling situations with other types of data and for combined data for multiple patients.

The formulation provided here assumes that separate modeling of each patient’s longitudinal data is preferable to modeling the combined data for all patients. Separate modeling is a person-centered approach to modeling longitudinal data as opposed to a variable-centered approach using the combined data [

Multilevel (or hierarchical linear) modeling [

Spatial AR1 correlations generate better models than independent and exchangeable correlations for two of the three example data sets. This suggests consideration of autoregressive and/or moving average correlations [

This work was supported in part by the National Institutes of Health/National Institute of Nursing Research Awards 1R01NR017853 and RC1NR011591. S. H. Meghani was the Principal Investigator for these research projects. G. J. Knafl was a consultant on both projects.

The authors declare no conflicts of interest regarding the publication of this paper.

Knafl, G.J. and Meghani, S.H. (2021) Modeling Individual Patient Count/Rate Data over Time with Applications to Cancer Pain Flares and Cancer Pain Medication Usage. Open Journal of Statistics, 11, 633-654. https://doi.org/10.4236/ojs.2021.115038