Empirical Likelihood Inference for Generalized Partially Linear Models with Longitudinal Data

In this article, we propose a generalized empirical likelihood inference procedure for the parametric component in semiparametric generalized partially linear models with longitudinal data. Based on the extended score vector, we define a generalized empirical likelihood ratio function that incorporates the within-cluster correlation while avoiding direct estimation of the nuisance parameters in the correlation matrix. We show that the proposed statistic is asymptotically chi-squared under suitable conditions, so it can be used to construct confidence regions for the parameters. In addition, we obtain the maximum empirical likelihood estimator of the parameters and establish its asymptotic normality. Simulation studies demonstrate the performance of the proposed method.

In longitudinal studies, repeated measurements from subjects are collected over time, and therefore the responses from the same subject are very likely to be correlated with an unknown structure.
The challenge for longitudinal data lies in how to use the within-cluster information effectively. Early work on EL for longitudinal data ignored the correlation within subjects, e.g., [7] [8]. Some recent studies incorporate the correlation information by constructing the auxiliary random vector from the generalized estimating equations (GEEs) [9] [10]. GEEs use a working correlation matrix to carry the correlation information; this matrix is determined by a small set of nuisance parameters α, which avoids specifying the whole correlation matrix [13]. The advantage of GEEs is that the estimator of the regression parameter β remains consistent even when the working correlation structure is misspecified. However, the GEE estimator can suffer a substantial loss of efficiency under such misspecification. The quadratic inference function (QIF) approach avoids estimating the nuisance correlation parameters α by assuming that the inverse of the working correlation matrix can be approximated by a linear combination of several known basis matrices, and it solves the resulting combined estimating functions using the principle of the generalized method of moments [14] [15]. The QIF approach also takes the within-cluster correlation into account and is more efficient than GEEs when the working correlation is misspecified. It has been applied to many models, including varying coefficient models, partially linear models, single-index models and generalized linear models; recent related works include [16]-[21].
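The QIF assumption that the inverse working correlation lies (approximately) in the span of a few known matrices can be checked numerically. The sketch below assumes the basis choice commonly used with QIF: for an exchangeable (CS) structure, M1 = I and M2 = J - I (zeros on the diagonal, ones elsewhere); for AR-1, M1 = I, a band matrix M2 with ones on the first off-diagonals, and a corner matrix M3 with ones at the (1,1) and (m,m) entries. The helper names (`qif_basis`, `cs_corr`, `ar1_corr`, `fit_inverse`) are hypothetical. It fits the coefficients by least squares and reports the largest residual.

```python
from __future__ import annotations

import numpy as np


def qif_basis(m: int, structure: str) -> list[np.ndarray]:
    """Basis matrices M_k whose linear span is used to approximate R^{-1} (assumed choice)."""
    identity = np.eye(m)
    if structure == "cs":  # exchangeable / compound symmetry
        return [identity, np.ones((m, m)) - identity]
    if structure == "ar1":  # first-order autoregressive
        band = np.eye(m, k=1) + np.eye(m, k=-1)  # ones on the first off-diagonals
        corners = np.zeros((m, m))
        corners[0, 0] = corners[-1, -1] = 1.0  # ones at the two corner entries
        return [identity, band, corners]
    raise ValueError(f"unknown structure: {structure!r}")


def cs_corr(m: int, rho: float) -> np.ndarray:
    """Exchangeable correlation matrix: 1 on the diagonal, rho elsewhere."""
    return (1.0 - rho) * np.eye(m) + rho * np.ones((m, m))


def ar1_corr(m: int, rho: float) -> np.ndarray:
    """AR-1 correlation matrix with entries rho^|j - k|."""
    idx = np.arange(m)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def fit_inverse(corr: np.ndarray, basis: list[np.ndarray]) -> tuple[np.ndarray, float]:
    """Least-squares coefficients a with R^{-1} ~ sum_k a_k M_k, and the largest residual."""
    target = np.linalg.inv(corr).ravel()
    design = np.column_stack([b.ravel() for b in basis])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef, float(np.max(np.abs(target - design @ coef)))


coef_cs, resid_cs = fit_inverse(cs_corr(4, 0.5), qif_basis(4, "cs"))
coef_ar, resid_ar = fit_inverse(ar1_corr(5, 0.5), qif_basis(5, "ar1"))
```

For these two structures the representation is exact, so the residual is at rounding level. With ρ = 0.5, the CS inverse for m = 4 is 1.6 M1 - 0.4 M2, and the AR-1 inverse for m = 5 is (5/3) M1 - (2/3) M2 - (1/3) M3. For other true structures, the same basis gives only an approximation, which is the situation QIF is designed to tolerate.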
More recently, [11] [12] proposed a generalized empirical likelihood (GEL) method, which uses a QIF-based generalized log-empirical likelihood ratio statistic to construct confidence regions for the parameters in generalized linear models (GLMs) with longitudinal data and in partially linear models with longitudinal data.
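QIF-based statistics are built from an extended score vector that stacks one estimating function per basis matrix, g_i(β) = (D_i' A_i^{-1/2} M_k A_i^{-1/2} (y_i - μ_i))_{k=1..s}, combined through the GMM-type quadratic form Q(β) = N ḡ' C^{-1} ḡ with C = N^{-1} Σ g_i g_i'. The minimal sketch below assumes a purely parametric logistic model with the canonical logit link and the exchangeable basis {I, J - I}. Both are illustrative assumptions: in a GPLM the spline term would also enter the linear predictor. The function names are hypothetical.

```python
import numpy as np


def cs_basis(m):
    """Assumed QIF basis for an exchangeable working structure: M1 = I, M2 = J - I."""
    eye = np.eye(m)
    return [eye, np.ones((m, m)) - eye]


def extended_score_logit(x_i, y_i, beta, basis):
    """Stack D_i' A_i^{-1/2} M_k A_i^{-1/2} (y_i - mu_i) over k, assuming a logit link."""
    mu = 1.0 / (1.0 + np.exp(-(x_i @ beta)))
    v = mu * (1.0 - mu)                # diagonal of A_i (Bernoulli variance)
    d = x_i * v[:, None]               # D_i = d mu_i / d beta' under the canonical link
    a_inv_half = 1.0 / np.sqrt(v)      # diagonal of A_i^{-1/2}
    r = y_i - mu
    blocks = [d.T @ (a_inv_half * (m_k @ (a_inv_half * r))) for m_k in basis]
    return np.concatenate(blocks)      # length p * s


def qif_objective(scores):
    """Q = N * gbar' C^{-1} gbar with C = (1/N) sum_i g_i g_i' (rows of scores are g_i)."""
    n_subj = scores.shape[0]
    gbar = scores.mean(axis=0)
    c = scores.T @ scores / n_subj
    return float(n_subj * gbar @ np.linalg.solve(c, gbar))


# Toy data: n clusters of size m with p covariates (independent responses, for illustration).
rng = np.random.default_rng(1)
n, m, p = 50, 4, 2
beta = np.array([0.5, -0.5])
xs = rng.normal(size=(n, m, p))
ys = (rng.random((n, m)) < 1.0 / (1.0 + np.exp(-(xs @ beta)))).astype(float)
basis = cs_basis(m)
scores = np.array([extended_score_logit(xs[i], ys[i], beta, basis) for i in range(n)])
q_value = qif_objective(scores)
```

With M1 = I, the first p entries of g_i reduce to the familiar GLM score X_i'(y_i - μ_i). The remaining blocks carry the within-cluster information that the GEL construction exploits.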
Generalized partially linear models (GPLMs) can be regarded as a compromise between GLMs and fully nonparametric models. A partially linear specification is sometimes chosen to avoid nonparametric modeling of high-dimensional covariates; at other times the model arises naturally because some covariates are categorical. In this article, we extend the GEL method to GPLMs with longitudinal data and adopt the B-spline method to approximate the nonparametric component of the model. Our method incorporates the within-cluster correlation information into the auxiliary random vector. The proposed method does not require estimating the variance of the proposed estimator and is not sensitive to misspecification of the working correlation structure. The rest of this article is organized as follows. We propose the QIF-based EL method for GPLMs in Section 2 and present the corresponding asymptotic results in Section 3. Simulation studies are provided in Section 4, and a real data set is analysed in Section 5. The details of the proofs are provided in the Appendix.

GPLMs with Longitudinal Data
In this article, we consider a longitudinal study with $n$ subjects and $m_i$ observations over time for the $i$th subject ($i = 1, \ldots, n$). Following [22], we replace $\alpha(\cdot)$ by its basis function approximation.

GEL for GPLMs with Longitudinal Data
In most applications of GPLMs, the main interest lies in statistical inference on the regression coefficient $\beta_0$. Similar to [5], we regard the nonparametric function $\alpha(\cdot)$, i.e., the spline coefficient vector $\gamma$, as a nuisance parameter and construct a suitable estimator of it to ensure efficient statistical inference for $\beta$. In this article, we take the QIF estimate $\hat{\gamma}$ as the estimator of $\gamma$. Since (3) carries the within-cluster correlation information, we introduce an auxiliary random vector based on it in order to construct the empirical likelihood ratio function for $\beta$, and we define the generalized empirical log-likelihood ratio function as follows. By the Lagrange multiplier method, we obtain its explicit form, where $\lambda$ is a $ps \times 1$ vector satisfying the associated estimating equation.
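Once the auxiliary vectors have been evaluated at a candidate β, the Lagrange multiplier step is the standard empirical likelihood dual. The multiplier λ solves Σ_i z_i / (1 + λ'z_i) = 0, the statistic is -2 log R = 2 Σ_i log(1 + λ'z_i), and the implied weights are p_i = 1 / (n(1 + λ'z_i)). The minimal numpy sketch below assumes the auxiliary vectors are supplied as the rows of a matrix `z`; that input, and the function names, are hypothetical, and the article's own construction of the vectors is not reproduced. It finds λ by damped Newton steps on the concave function Σ log(1 + λ'z_i).

```python
import numpy as np


def el_lagrange_multiplier(z, tol=1e-10, max_iter=50):
    """Solve sum_i z_i / (1 + lam' z_i) = 0 for lam.

    Uses damped Newton ascent on the concave function f(lam) = sum_i log(1 + lam' z_i),
    keeping every 1 + lam' z_i above 1/n so that the implied weights stay in (0, 1).
    Assumes the origin lies inside the convex hull of the rows of z.
    """
    n, q = z.shape
    lam = np.zeros(q)
    for _ in range(max_iter):
        denom = 1.0 + z @ lam
        grad = (z / denom[:, None]).sum(axis=0)
        if np.max(np.abs(grad)) < tol:
            break
        neg_hess = (z / denom[:, None] ** 2).T @ z  # sum_i z_i z_i' / denom_i^2, positive definite
        step = np.linalg.solve(neg_hess, grad)      # Newton ascent direction
        f_old = np.sum(np.log(denom))
        t = 1.0
        for _ in range(60):                         # step halving
            cand = lam + t * step
            d = 1.0 + z @ cand
            if np.all(d > 1.0 / n) and np.sum(np.log(d)) >= f_old - 1e-12:
                lam = cand
                break
            t *= 0.5
        else:
            break                                   # no acceptable step: stop iterating
    return lam


def neg2_log_el_ratio(z):
    """Return -2 log R = 2 sum_i log(1 + lam' z_i) together with the multiplier lam."""
    lam = el_lagrange_multiplier(z)
    return float(2.0 * np.sum(np.log1p(z @ lam))), lam


rng = np.random.default_rng(0)
z_centered = rng.normal(size=(200, 2))                  # mean-zero rows: small statistic expected
stat_centered, lam_centered = neg2_log_el_ratio(z_centered)
stat_shifted, _ = neg2_log_el_ratio(z_centered + 0.5)  # rows with nonzero mean: larger statistic
```

A confidence region for β is then the set of candidate values whose statistic does not exceed the appropriate chi-squared quantile given by the asymptotic theory. Step halving is used because a full Newton step can push some 1 + λ'z_i below zero, where the logarithm is undefined.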

Asymptotic Properties
For convenience and simplicity, let $C$ denote a positive constant that may take different values at each appearance throughout this paper, and let $\|A\|$ denote the modulus of the largest singular value of a matrix or vector $A$. Before proving our main theorems, we list the regularity conditions used in this paper.
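The norm $\|A\|$ defined above is the spectral norm; for a vector it coincides with the Euclidean length. A small numpy illustration, where the helper name `largest_singular_value` is hypothetical:

```python
import numpy as np


def largest_singular_value(a):
    """||A||: largest singular value of a matrix; for a vector, its Euclidean length."""
    a = np.atleast_2d(np.asarray(a, dtype=float))  # treat a vector as a 1 x d matrix
    return float(np.linalg.svd(a, compute_uv=False)[0])  # singular values are sorted descending


A = np.array([[3.0, 0.0], [0.0, -4.0]])
norm_matrix = largest_singular_value(A)           # approximately 4: singular values are 4 and 3
norm_vector = largest_singular_value([3.0, 4.0])  # approximately 5: the Euclidean length
```

The same quantity is available directly as `np.linalg.norm(A, 2)` for matrices.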

Simulation Studies
In this section, we conduct simulation studies to evaluate the finite-sample performance of the proposed method. We compare GEL with the normal approximation (NA) based method in terms of coverage probability and the length of the resulting confidence regions.
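Coverage probability and average length are simple summaries over the simulated data sets. The helpers below are a hypothetical sketch. `invert_on_grid` turns any statistic for a scalar parameter into a confidence interval by keeping the grid points whose statistic does not exceed a critical value; 3.8415, the 0.95 quantile of the chi-squared distribution with 1 degree of freedom, is assumed for a scalar β. `coverage_and_length` summarizes the replicates. The toy statistic in the usage lines is a normal-approximation statistic for a mean, used only to exercise the helpers; it is neither GEL nor the article's NA statistic.

```python
import numpy as np

CHI2_1DF_Q95 = 3.841458820694124  # 0.95 quantile of chi-square with 1 degree of freedom


def invert_on_grid(stat_fn, grid, crit=CHI2_1DF_Q95):
    """(min, max) of grid points with stat_fn(b) <= crit; assumes the accepted set is an interval.

    stat_fn must map an array of candidate values to an array of statistics.
    """
    grid = np.asarray(grid, dtype=float)
    accepted = grid[np.asarray(stat_fn(grid)) <= crit]
    if accepted.size == 0:
        return None
    return float(accepted.min()), float(accepted.max())


def coverage_and_length(intervals, beta_true):
    """Empirical coverage probability and average length over simulation replicates."""
    bounds = np.asarray(intervals, dtype=float)  # shape (n_reps, 2)
    covered = (bounds[:, 0] <= beta_true) & (beta_true <= bounds[:, 1])
    return float(covered.mean()), float((bounds[:, 1] - bounds[:, 0]).mean())


# Toy usage: 500 replicates, as in the simulation design, with a stand-in statistic.
rng = np.random.default_rng(2024)
grid = np.linspace(-1.0, 1.0, 2001)
intervals = []
for _ in range(500):
    x = rng.normal(loc=0.0, scale=1.0, size=100)
    xbar, s2 = x.mean(), x.var(ddof=1)
    intervals.append(invert_on_grid(lambda b: x.size * (xbar - b) ** 2 / s2, grid))
coverage, avg_length = coverage_and_length(intervals, beta_true=0.0)
```

For a vector-valued β, the same idea applies on a multidimensional grid, with the confidence region reported as a set (or its contour) rather than an interval.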
In the nonparametric estimation, we use the sample quantiles of $U_{ij}$ as knots. Moreover, we use cubic splines and take the number of internal knots to be an integer close to $n^{1/5}$. This choice is consistent with the asymptotic theory in Section 3 and performs well in the simulations.
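The knot rule above can be sketched with a plain numpy Cox-de Boor recursion. Interior knots are placed at equally spaced sample quantiles of the $U_{ij}$, and their number is taken as $n^{1/5}$ rounded to the nearest integer; the rounding rule is an assumption. The function names are hypothetical. Each row of the returned matrix is $B(U_{ij})'$, so $\alpha(U_{ij})$ is approximated by $B(U_{ij})'\gamma$.

```python
import numpy as np


def quantile_knots(u, n_subjects, degree=3):
    """Full knot vector: repeated boundary knots plus interior knots at sample quantiles of u."""
    n_interior = max(1, int(round(n_subjects ** 0.2)))  # assumed rounding of n^{1/5}
    probs = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    interior = np.quantile(u, probs)
    lo, hi = float(np.min(u)), float(np.max(u))
    return np.concatenate([np.repeat(lo, degree + 1), interior, np.repeat(hi, degree + 1)])


def bspline_basis(x, knots, degree=3):
    """Evaluate all B-spline basis functions at x via the Cox-de Boor recursion."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Degree-0 indicators on [t_j, t_{j+1}); empty intervals stay zero.
    B = np.zeros((x.size, len(t) - 1))
    for j in range(len(t) - 1):
        if t[j] < t[j + 1]:
            B[:, j] = (x >= t[j]) & (x < t[j + 1])
    last = np.max(np.nonzero(t[:-1] < t[1:])[0])  # last non-empty interval
    B[x == t[-1], last] = 1.0                     # close it on the right boundary
    for d in range(1, degree + 1):
        B_new = np.zeros((x.size, len(t) - 1 - d))
        for j in range(len(t) - 1 - d):
            left_den = t[j + d] - t[j]
            right_den = t[j + d + 1] - t[j + 1]
            term = np.zeros(x.size)
            if left_den > 0:
                term += (x - t[j]) / left_den * B[:, j]
            if right_den > 0:
                term += (t[j + d + 1] - x) / right_den * B[:, j + 1]
            B_new[:, j] = term
        B = B_new
    return B  # shape (len(x), number of interior knots + degree + 1)


rng = np.random.default_rng(0)
u = rng.uniform(size=300)                 # pooled U_ij values (toy data)
knots = quantile_knots(u, n_subjects=100)  # 100^(1/5) is about 2.51, so 3 interior knots
B = bspline_basis(u, knots)                # cubic basis with 3 + 4 = 7 columns
```

A useful sanity check is that the cubic B-splines form a partition of unity on the data range: every row of `B` sums to one.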

Study 1
Consider a binomial response: clustered binary responses are generated as in [23]. The correlation parameter $\rho$ is taken to be 0.25, 0.5 and 0.75, representing weak, medium and strong correlation, respectively. We generated 500 data sets for each pair $(n, \rho)$. Table 1 lists the EL-based and NA-based confidence intervals of $\beta$ under the CS structure. It shows that the GEL approach gives slightly shorter confidence intervals than the NA method, while the former has a coverage probability more [...]

To study the influence of misspecification on the GEL approach, we compute the GEL confidence interval when the working correlation structure is specified to be CS and AR-1, respectively. Table 2 lists the results when the true structure is CS, and Table 3 lists the results when the true structure is AR-1. The QIF estimator is known to be insensitive to misspecification of the correlation structure. Tables 2 and 3 show that the QIF-based GEL approach gives similar 95% confidence intervals and coverage probabilities even when the correlation structure is misspecified, which indicates that the proposed GEL approach is robust.

Table 3. The average length and the corresponding coverage probabilities of the 95% confidence region of β for GEL when the true correlation structure is AR-1.

[...] The clustered binary responses with an exchangeable correlation structure are also generated as in [23]. The correlation parameter $\rho$ is taken to be 0.3 and 0.8. [...] is similar, so we omit it here.
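The generator of [23] is not reproduced here. As a swapped-in illustration only, the sketch below uses a simple shared-component mixture construction for exchangeable binary data. Set y_ij = W_i when U_ij = 1 and y_ij = Z_ij otherwise, where U_ij ~ Bernoulli(√ρ) and W_i, Z_ij ~ Bernoulli(p) are all independent. Each y_ij is then Bernoulli(p), and two responses in the same cluster share W_i with probability ρ, which gives corr(y_ij, y_ik) = ρ for j ≠ k. The construction assumes ρ ≥ 0 and a common marginal probability p within a cluster, unlike a covariate-dependent GPLM mean. The function name is hypothetical.

```python
import numpy as np


def exchangeable_binary(n_clusters, m, p, rho, rng):
    """Clustered Bernoulli(p) responses with exchangeable within-cluster correlation rho >= 0.

    y_ij = W_i if U_ij = 1 else Z_ij, with U_ij ~ Bernoulli(sqrt(rho)) and
    W_i, Z_ij ~ Bernoulli(p), all independent (illustrative mixture construction).
    """
    w = rng.random((n_clusters, 1)) < p             # shared cluster-level component
    z = rng.random((n_clusters, m)) < p             # independent observation-level components
    u = rng.random((n_clusters, m)) < np.sqrt(rho)  # which component each response copies
    return np.where(u, w, z).astype(int)


rng = np.random.default_rng(0)
y = exchangeable_binary(n_clusters=20000, m=4, p=0.3, rho=0.5, rng=rng)
empirical_corr = np.corrcoef(y[:, 0], y[:, 1])[0, 1]  # should be close to rho = 0.5
```

The weak, medium and strong settings of the study correspond to calling the generator with `rho` set to 0.25, 0.5 and 0.75.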

Example: Infectious Disease Data
To investigate the performance of the proposed method, we analyse an infectious disease data set. This data set has been analysed by many authors, such as [24] [25] [26] [27] [28]. Here we consider a simple logistic model, where $\mu$ is the mean risk of respiratory infection, and $\beta_1$ and $\beta_2$ describe the effects of vitamin A deficiency and sex, respectively. We use two methods: the NA method and the QIF-based GEL method under the CS working correlation. The confidence regions are reported in Figure 2, which shows that GEL gives smaller confidence regions than NA does.
Applying a Taylor expansion to the first two terms in (13), where $\omega_i$ lies between $\eta_i$ and the expansion point, and following the argument of [4], we can prove the required result.

Proof of Theorem 1
Proof. Applying a Taylor expansion to (8), we obtain

Proof of Theorem 2
Proof. We first define the bivariate functions