Empirical Likelihood Based Longitudinal Data Analysis

In longitudinal data analysis, our primary interest is in the estimation of regression parameters for the marginal expectations of the longitudinal responses, and the longitudinal correlation parameters are of secondary interest. The joint likelihood function for longitudinal data is challenging, particularly due to correlated responses. Marginal models, such as generalized estimating equations (GEEs), have received much attention based on the assumption of the first two moments of the data and a working correlation structure. The confidence regions and hypothesis tests are constructed based on the asymptotic normality. This approach is sensitive to the misspecification of the variance function and the working correlation structure which may yield inefficient and inconsistent estimates leading to wrong conclusions. To overcome this problem, we propose an empirical likelihood (EL) procedure based on a set of estimating equations for the parameter of interest and discuss its characteristics and asymptotic properties. We also provide an algorithm based on EL principles for the estimation of the regression parameters and the construction of its confidence region. We have applied the proposed method in two case examples.


Introduction
Longitudinal studies are common in areas such as epidemiology, clinical trials, economics, agriculture, and survey sampling. In longitudinal studies, we are interested in the changes in the variables over time as a function of the covariates, generally under the assumption that observations from different individuals are independent. For example, longitudinal studies are used to characterize growth and ageing, to assess the effect of risk factors on human health, and to evaluate the effectiveness of treatments. To obtain an unbiased, efficient, and reliable estimate, we must properly model the correlation between the repeated responses for each individual. However, the modelling of correlation, especially when the responses are discrete, is a challenging task even if the responses are collected over equispaced time points.
The approaches used for the analysis of longitudinal data can be classified as mixed effects models, transitional models, and marginal regression models. A potential disadvantage of mixed effects models is that they rely on parametric assumptions, which may lead to biased parameter estimates when a model is for the ith subject is fully specified by α .
It has been demonstrated that in some situations the use of an arbitrary working correlation structure may lead to no solution for α , which may break down the entire GEE methodology (see [3]). In an another study [4] showed that the GEE approach may yield an estimator of β , that, although consistent, is less efficient than that of the independence estimating equation approach under an arbitrary working correlation structure. To overcome this difficulty, [5] proposed using a stationary lag correlation structure instead of the working correlation matrix.
The estimate for β is obtained by solving the following estimating equa- can be estimated via the method of moments introduced by [6] and showed that the stationary lag correlation approach produces regression estimates that are consistent and more efficient than those obtained from the independence-assumption-based estimating equation approach (see [4]). Using simulation studies, [7] showed that there is a loss in the efficiency of the GEE estimators when the correlation structures are misspecified. Note, however, that the correlation structure is unknown in practice, and it is better to use a stationary lag-correlation structure which can accommodates the AR(1), EQC, and MA(1) structures. We, therefore, recommend defining a lag correlation structure for the longitudinal responses. When the number of estimations (r) is more than the number of parameters (p) (i.e. r p > ), we have extra information about the parameter for improved efficiency, but it may not be possible to solve the estimating equations directly. To overcome this problem, [8] proposed an adaptive quadratic inference  [12] and [13]. The GEE estimator of β with the independent working correlation is always consistent, so [10] recommended using this correlation as a safe choice. However, ignoring the correlation of observations may lead to an inefficient estimate of the regression coefficients and an underestimate of the standard errors.
The GEE approach requires only the assumption of the existence of the first two marginal moments and a correlation structure. GEE estimators are consistent and asymptotically normal as long as the mean, variance, and correlation structure are correctly specified. Marginal models have satisfactory performance when the assumptions are satisfied. Misspecification can cause estimates based on marginal models to be inefficient and inconsistent, and inference in this situation can be completely inappropriate. Confidence regions and hypothesis tests are based on asymptotic normality, which may not hold since the finite-sample distribution may not be symmetric. These problems motivate us to investigate the applicability of empirical likelihood (EL), a nonparametric likelihood method, based on a set of GEEs for the parameter of interest.
The EL introduced by [14] has properties similar to the parametric likelihood.
The EL combines the reliability of nonparametric methods with the flexibility and effectiveness of the likelihood approach. The EL has many nice properties parallel to those of parametric likelihood, including the ability to carry out hypothesis tests and construct confidence intervals without estimating the variance. The shape of EL confidence regions automatically reflects the emphasis of the observed data set. The EL method also offers advantages in parameter estimation and the formulation of goodness-of-fit tests. The EL has been successfully applied in areas such as linear models, GLMs, survey sampling, variable selection, survival analysis, and time series. We investigate the use of a nonparametric EL in subject-wise longitudinal data analysis, simultaneously estimating within-subject correlations using the method of moments by [6]. We used the adjusted EL to avoid computational issues. We explore the asymptotic properties of the proposed method and assess the method's performance based on a large number of simulations. Our approach provides consistent estimators and has comparable performance to marginal models when the model assumptions are correct. It is superior to marginal models when the variance function and correlation structure are even misspecified.
The remaining part of the paper is organized as follows. In Section 2, we develop the subject-wise EL via a set of GEEs of the parameter of interest and discuss its characteristics. We then introduce an adjusted EL (AEL) inference to longitudinal data. We discuss its characteristics and asymptotic properties in Section 3. In Section 4, we developed an algorithm based on EL principles for the

Empirical-Likelihood-Based Longitudinal Modelling
EL is a nonparametric-likelihood-based approach, introduced by [14], which is an alternative to parametric likelihood and bootstrap methods. This method enables us to fully employ the information available from the data for making asymptotically efficient inference about the population parameters. In this section, we introduce the EL-based longitudinal modelling.
In a seminal paper, [15] introduced the EL for linear models. EL confidence regions for regression coefficients in linear models were studied by [16]. The EL method can also be used to estimate the parameters defined by a set of estimating equations [17]. A comprehensive overview of the EL and its properties can be obtained from [18]. EL methods have attracted increasing attention over the last two decades, and the literature is extensive.
In longitudinal data analysis framework, [19] applied the EL approach using a subject-wise working independence model. This method ignores the within-subject correlation structure. Also note that [20] proposed a subject-wise EL by entering the longitudinal data and obtained asymptotic normality of the maximum EL estimator (MELE) of the regression coefficients. They did not consider the within-subject correlation structure. It is well known that the working-independence assumption may lead to a loss of efficiency in estimation when a within-subject correlation is present. To estimate the within-subject covariance matrices, [21] used the nonparametric sample covariance matrix obtained from the residuals of a GEE using the working-independence assumption. In this work, we show how to incorporate the within-subject correlation structure of the repeated measurements into the EL.
Following [15] and [17], we can extend the EL inference to longitudinal data based on a set of estimating functions ( ) , g β ρ given in (3). We incorporate the within-subject correlation structure of the repeated measurements into the EL using the well-known method of moments estimators by [6] for a given value of β . The profile empirical log-likelihood function of β is defined by The EL is maximized when Under some regularity conditions, we have β is the true parameter value. This conclusion is similar to that for the parametric likelihood ratio function. The vector β can be estimated by minimizing with respect to β . Note that the profile log-likelihood ratio function can be minimized with respect to β when ρ is known. In practice, ρ is unknown, but can be consistently estimated using the method of moments by [6].
The computation of the profile EL function is a key step in EL applications, and it involves constrained maximization. In some situations, the algorithm may fail because of poor initial values of the parameters. Moreover, the poor accuracy of EL confidence regions has been reported by several authors, including [17] [18] [22] [23] [24] and [25]. In the next subsection we will discuss how to address these problems in the context of longitudinal data.

Adjusted Empirical Likelihood
The computation of the profile EL ratio function ( ) l W β given in (7) is a key step in EL applications. The solution for λ must satisfy , g ρ β β [18], the convex hull contains 0 as an interior point with probability 1 as k → ∞ . However, when β is not close to the true parameter value 0 β or when k is small, it is possible that the solution of (5) does not exist. To avoid this problem, [25] with respect to β .
The adjustment is particularly useful because, even for some undesirable values of β , the algorithm guarantees a solution. The confidence regions constructed via the AEL are found to have better coverage probabilities than those for the regular EL and the algorithm provides a promising solution for λ , particularly when the sample size is small. The improved coverage probability is achieved without resorting to more complex procedures such as Bartlett correction or bootstrap calibration.
In the next section, following [17], we state and prove the results on the distributional properties of the adjusted profile EL estimates of β . We construct these theorems based on the GEE with lag correlation given in (3), since the GEE estimate of β under an arbitrary working-correlation structure is not necessarily consistent.

Main Results
In this section, we present the first-order asymptotic properties of β and the adjusted profile empirical log-likelihood ratio statistics. We first introduce some notation and regularity conditions that are used in the theorems and lemma.
Regularity Conditions: where Θ be the natural parameter space of the exponential family distributions presented in (1) and Θ  the interior of Θ . Also, ( ) h η is three times continuously differentiable and A3.
A4. The rank of ( )  is a set of independent and identically distributed random vectors. Let be the adjusted profile empirical log-likelihood ratio function. Then, as k → ∞ , ( ) ρ β is a consistent estimator in the neighbourhood of β ; the correlation (3) and ( ) * l W β attains its minimum value at some point β in the interior of This result corresponds to Lemma 1 in [17] which is about the consistency of maximum empirical likelihood estimates (MELE) for independent and identically distributed data. By following [17], under the regularity conditions A1-A5, we can obtain a subject-wise MELE, as k → ∞ , with probability tending to 1 the equation ( ) * l W β has a solution within the open ball . It is noted that the proof is similar to the proof of Lemma 1 in [19] and the details are omitted here. Theorem 3.2. In addition to the regularity conditions A1-A5, suppose that 2 l W β , where 0 β is the true value of β , is asymptotically chi-squared distributed with degrees of freedom p.
The proof of Theorem 3.3 can be achieved by using similar arguments as those used in the proof of Theorem 2 in [17]. The details are thus omitted here.

Algorithm
To implement our method, we need an efficient algorithm. We minimize the profile EL ratio function ( ) l W β with respect to β using a Newton-Raphson algorithm. At each Newton-Raphson iteration, we compute the Lagrange multiplier for updated values of β and ( ) ρ β . We used the modified Newton-Raphson algorithm proposed by [26] for computing the Lagrange multiplier for a given value of the parameter. We implemented this method, which is numerically stable. The algorithm given in Sections 4.1, 4.2, and 4.3 can easily be extended to the AEL by the addition of a pseudo-value

Computation of Lagrange Multiplier
The Lagrange multiplier λ is estimated by solving the equation In the EL problem, the solution must satisfy The modified Newton-Raphson algorithm for estimating λ for a given value of β and ( ) ρ β is as follows: 2. Let R λ and R λλ be the first and second partial derivatives of R (given in (10)) with respect to λ : Compute R λ and R λλ for c = λ λ and let ( )  Step 2 will guarantee that 0 i p > and the optimization is carried out in the right direction.

Algorithm for Optimizing Profile Empirical Likelihood Ratio Function
be the estimated value of λ for a given β . We minimize the profile EL ratio function defined in (7) over β . The Newton-Raphson algorithm is as follows: Note that to compute l W β and l W ββ , we need to estimate the Lagrange multiplier ( ) λ β as in Section 4.1. In practice, ρ is unknown, and the correlations can be consistently estimated by [24] using the method of moments. The simplified expressions for l W β and l W ββ are as follows. Let R β , R ββ , and R βλ be the first and second partial derivatives of (10) with respect to β and λ ( )

If
The first derivative of ( ) l W β with respect to β is , Similarly, the second derivative of ( ) Following [18], a local quadratic approximation to R leads to

Construction of Confidence Interval
We use the bisection method to construct the lower and upper confidence limits based on the profile EL ratio for β . Let 1. Compute a reasonable lower confidence limit 1,L SE β is the standard error of 1 β using any existing method. We can choose a such that ( ) ( ) We can use this approach to construct the upper confidence limit by setting

Performance Analysis
In this section, we conduct simulation studies to investigate the performance of our EL-based approach. We compute the coverage probabilities based on the ordinary EL, AEL and compare them with those of the GEE approach, which is based on a normal approximation. We also compute the coverage probabilities based on the extended empirical likelihood (EEL) by [27] and [28], which expands the EL domain geometrically and improve the coverage probabilities. We use different working correlations for the comparison. We generate count and continuous responses with different correlation structures and compare the methods under different working correlation structures.

Correlation Models for Stationary Count Data
We consider the stationary correlation models for count data discussed by [29] and [30]. The three models used to generate the data are be the time-independent covariate for the ith individual.
(ii) Poisson Moving Average Order 1 (MA(1)) Model The repeated responses follow the MA lag 1 dynamic model given by We simulated 1000 data sets from each of these models follow the AR(1), EQC, or MA(1) structure, and used EL-based methods to estimate the parameters using different working correlation such as AR (1) ,   from a normal distribution with mean 0 and standard deviation 1. For the analysis, we consider the working correlation to be either a true correlation or a lag correlation. We did not consider other possible values for ρ since the working correlation structure may lead to no solution for α in some situations. Table 1

Misspecified Working Correlation Structure
In the simulation studies discussed in Section 5.1 we considered the correlation structure used to generate the data as the working correlation in the GEE-based modelling. However, in practice, we do not know the correlation structure of the data. As discussed before, if the working correlation is misspecified, we may lose the efficiency of the parameter estimates [3] [4].
We conducted a simulation study to assess the loss of efficiency. We generated repeated counts with the AR(1) correlation structure given in Section 5.1(i) with 0.49 ρ = and 0.70 and 5 m = time points. We used three working correlation structures: EQC, MA(1), and lag correlation. Table 2 gives the results for the GEE, EL, EEL, and AEL. for the same nominal level. In this situation, the GEE with stationary lag correlation performs better than the GEE with a misspecified working correlation.
However, the EL, EEL, and AEL perform as well as the former method, despite being nonparametric methods based on a data-driven likelihood ratio function.
We did not consider all possible cases, for instance, a true EQC or MA(1) correlation model, since under different working correlation structures the correlation parameter α does not exist (see [5]).

Over-Dispersed Stationary Count Data
In this section, we consider the performance of our approach when the variance function is misspecified, in the context of stationary count data. We generate over-dispersed stationary count data it y using ( ) β for the three models discussed in Section 5.1, where i u is a random sample such that ( ) The distribution of u is chosen to be gamma with shape parameter ω and scale parameter 1 ω , where ω is the over-dispersion parameter. We choose over-dispersion parameter 1 4 ω = . However, the GEE, EL, EEL, and AEL CIs are constructed under the assumption that there is no overdispersion. Table 3 gives the average estimated values of the regression coefficients, the corresponding simulated standard errors in parentheses, the coverage probabilities for 1 β and 2 β for the 0.95 and 0.99 confidence levels, and the average width of the CI in parentheses for the independent, AR(1), EQC, and MA(1) models. Table 3 shows that when there is over-dispersion, the EL, EEL, and AEL outperform the GEE. In the AR(1)/AR (1)

Correlation Models for Continuous Data
In this section, we investigate the performance of our EL approach on a class of stationary and nonstationary correlation models for longitudinal continuous data. The random errors ( ) normal distribution with mean 0 and standard deviation 1. Table 4 gives the mean estimated values of the regression coefficients, the corresponding simulated standard errors in parentheses, the simulated coverage probabilities for 1 β and 2 β for the 0.95 and 0.99 confidence levels, and the average width of the CI in parentheses for the independent, AR(1), EQC, and MA(1) models with stationary covariates. Table 5 gives the results for nonstationary covariates.
The coverage probabilities of the intervals based on the EL, EEL, and AEL are similar to those of the GEE. For instance, in the MA(1)/MA(1) case in Table 4 the coverage probabilities of 1

Correlation Models for Misspecified Continuous Data
In this section, we compare the performances of the methods when the correla- However, the confidence regions for the GEE are constructed under the normality assumption. Table 6 gives the mean estimated values of the coefficients and the corresponding simulated standard errors in parentheses. It also includes the coverage probability for 1 β and 2 β for the 0.95 and 0.99 confidence levels and the average width of the CI in parentheses for samples of sizes 50 k = and 100 k = for the independent, AR(1), EQC, and MA(1) models with stationary covariates. Table 7 gives the results for nonstationary covariates.
When the model is misspecified, the EL, EEL, and AEL outperform the GEE.
For example, in the AR(1)/Lag case in Table 6 Table 7 shows that when the covariates are time-dependent the GEE has substantial undercoverage compared with the results for time-independent covariates, as discussed by [9].

Applications
In this section, we illustrate the applicability of our proposed method to two real-world examples.

Health Care Utilization Study
We consider longitudinal health care utilization data [5] that was collected by     Table 8 shows that different working correlations lead to slightly different parameter estimates, but the overall conclusion remains the same. Since the data indicate over-dispersion, the GEE-based approach may be inefficient, as shown  is also significant. The GEE approach is sensitive to the choice of correlation structure. In this real data set, the true correlation structure is unknown, so the lag correlation approach is appropriate since it can accommodate all three correlation structures. The Shapiro-Wilk test shows that the square-root transformed CD4 cell counts are not normally distributed. The GEE-based method is, therefore, not appropriate. We, therefore, conclude that the EL is a better choice.

Conclusions
Longitudinal data modelling using the GEE approach assumes a working correlation model for the within-subject correlation of the responses. When the working correlation is incorrectly specified, the GEE based estimates are not necessarily consistent and may lose efficiency. Any misspecification can cause estimates based on marginal models to be inefficient and misleading conclusions. Also, the construction of a confidence region and hypothesis testing are based on asymptotic normality, which may not hold since the finite-sample distribution may not be symmetric.
Taking these issues into account, we have proposed an EL-based longitudinal modelling based on a data-driven likelihood ratio approach sharing many of the properties of the parametric likelihood. We do not need to specify the complete parametric distribution to perform the inference. We can, therefore, use likelihood methods without assuming that the data come from a known family of distributions. We defined the subject-wise profile EL based on a set of GEEs. The estimation and confidence region construction using the EL approach are proposed, which has advantages over other methods such as those based on normal approximations. We introduced the adjusted EL to avoid any computational issues, which improve the coverage probabilities. A major advantage of EI is that involves no prior assumptions about the shape of an EL-based confidence region, which is data-driven. The construction of the confidence region based on the EL method does not involve any variance estimation.
The proposed approach yields more efficient estimators than the conventional GEE approach and achieves the same asymptotic properties as [17]. Our performance analysis showed that our method for longitudinal count and continuous responses is comparable to the GEE when the model assumptions are satisfied. For instance, when the working correlation is correctly specified, the coverage probabilities of the intervals based on the EL, EEL, and AEL are similar to those of the GEE. CIs based on the regular EL have slight undercoverage compared with those of the GEE; the coverage probabilities are substantially improved with the EEL and AEL. Moreover, these methods are consistently more accurate than the regular EL. When the working correlation is misspecified, the coverage probabilities of the intervals based on the EL, EEL, and AEL are shown to be equally efficient to the GEE estimator with stationary lag correlation structure. Also, the results show that when the working correlation is misspecified, the GEE estimator with stationary lag correlation structure, EL, EEL, and AEL outperforms the GEE with an incorrect working correlation structure. When the model is misspecified such as marginal variance, our method outperforms the GEE. This result shows that EL methods are robust to misspecification. Moreover, the EL-based CI has a data-driven shape, whereas the GEE-based CI is always symmetric due to normal approximation.