ESL-Based Robust Estimation for Mean-Covariance Regression with Longitudinal Data
1. Introduction
Longitudinal data arise frequently in many fields, such as biological research and the social sciences. The observations on the same subject are measured repeatedly over time and are thus intrinsically correlated [1]. The covariance matrix of such data is important, since ignoring the correlation structure may lead to inefficient estimators of the mean parameters. Furthermore, the covariance matrix itself may be of scientific interest [2]. However, modeling the covariance matrix is challenging because it is subject to the positive-definiteness constraint and involves many unknown parameters. To avoid this challenge, a common strategy is to specify a working correlation structure [3], which does not permit more general structures and cannot flexibly incorporate covariates that may help explain the covariations. To overcome this limitation, joint modeling of the mean and covariance of longitudinal data has received increasing interest recently; see, for example, [4]-[12]. Among these joint modeling approaches, the modified Cholesky decomposition (MCD) of the covariance matrix proposed in [4] [5] is attractive because it automatically yields a positive definite covariance matrix and its parameters can be interpreted through familiar statistical concepts. As a result, regression techniques and model-based inference can be applied to the parameters in this decomposition; see [7] for more details.
However, the aforementioned approaches are very sensitive to outliers or heavy-tailed distributions. [13] proposed a robust procedure for modeling the correlation matrix of longitudinal data based on an alternative Cholesky decomposition and heavy-tailed multivariate t-distributions with unknown degrees of freedom. It should be pointed out that the use of the multivariate t-distribution alone does not necessarily guarantee robustness. In addition, [14] [15] developed robust generalized estimating equations (GEE) for the regression parameters in joint mean-covariance models by employing Huber's function and leverage-based weights. [16] developed efficient parameter estimation via the MCD for quantile regression with longitudinal data. [17] further proposed a moving average Cholesky factor model, obtained by transforming the MCD, for covariance modeling in composite quantile regression with longitudinal data. Then, [18] carried out smoothed empirical likelihood inference via the MCD for quantile varying coefficient models with longitudinal data. Later, [19] developed quantile estimation via the MCD for longitudinal single-index models.
Although M-type regression and quantile regression procedures can handle outliers and heavy-tailed errors, they may lose efficiency under the normal distribution. To overcome this difficulty, [20] recently proposed a robust variable selection approach by adopting the exponential squared loss (ESL) function with a tuning parameter. They showed that, with a properly selected tuning parameter, the proposed approach not only achieves good robustness with respect to outliers in the dataset, but is also asymptotically as efficient as least squares estimation under normal errors without outliers. Later, several authors employed the ESL function for longitudinal data. For example, [21] proposed an efficient and robust variable selection method for longitudinal generalized linear models based on GEE. [22] proposed a robust and efficient estimation procedure for simultaneous model structure identification and variable selection in generalized partial linear varying coefficient models for longitudinal data. [23] developed a GEE-based robust estimation and empirical likelihood inference approach with the ESL for panel data models. With a similar loss function, [24] proposed a robust variable selection method for modal varying-coefficient models with longitudinal data. [25] proposed modal regression statistical inference for longitudinal semivarying coefficient models, including GEE, empirical likelihood and variable selection. However, to our knowledge, no existing work investigates robust estimation against outliers within the framework of mean-covariance regression analysis for longitudinal data using the ESL function.
In this paper, we propose a robust estimation approach based on the ESL function for the mean parameters and the generalized autoregressive parameters of the within-subject covariance matrices for longitudinal data. We begin with the ESL-based estimation of the mean parameters, pretending that the repeated measurements within a subject are independent. Then, based on the roughly estimated mean parameters, the simultaneous estimation of the mean and generalized autoregressive parameters is carried out using the ESL function. The proposed estimators are shown to be asymptotically normal under certain conditions. Moreover, we develop an iteratively reweighted least squares (IRLS) algorithm [26] to compute the parameter estimates. Numerical studies are carried out to illustrate the finite sample performance of our approach.
The rest of this paper is organized as follows. Section 2 develops the robust estimation procedure. Section 3 establishes the asymptotic properties of the proposed ESL estimators. The IRLS algorithm and a data-driven method for selecting the tuning parameters are presented in Section 4. Simulations are carried out in Section 5. Section 6 analyses a real data set. A discussion is given in Section 7. The technical proofs are provided in the Appendix.
2. Robust Estimation Procedure
2.1. Initial Estimate for the Mean Parameters
Consider $n$ subjects, where each subject is measured repeatedly over time. For the $i$th subject, suppose that $y_{ij}$ is the observed scalar response variable at time $t_{ij}$, and $x_{ij}$ is the corresponding $p\times 1$ covariate vector, $j=1,\ldots,m_i$, $i=1,\ldots,n$. Denote $N=\sum_{i=1}^{n}m_i$. Furthermore, let $y_i=(y_{i1},\ldots,y_{im_i})^{\mathrm{T}}$, $X_i=(x_{i1},\ldots,x_{im_i})^{\mathrm{T}}$, $\varepsilon_i=(\varepsilon_{i1},\ldots,\varepsilon_{im_i})^{\mathrm{T}}$. A longitudinal linear regression model has the form
$$y_i=X_i\beta+\varepsilon_i,\quad i=1,\ldots,n, \qquad (1)$$
where $\beta$ is the $p\times 1$ vector of associated parameters, and $\varepsilon_i$ is the random error satisfying $E(\varepsilon_i)=0$ and $\operatorname{Cov}(\varepsilon_i)=\Sigma_i$.
Define the ESL function $\phi_{\gamma}(t)=\exp(-t^{2}/\gamma)$. We first estimate $\beta$ pretending that the random errors $\varepsilon_{ij}$'s are independent. More specifically, we estimate $\beta$ by maximizing the objective function
$$\sum_{i=1}^{n}\sum_{j=1}^{m_i}\phi_{\gamma_1}\bigl(y_{ij}-x_{ij}^{\mathrm{T}}\beta\bigr), \qquad (2)$$
where $\gamma_1>0$ is a tuning parameter. The resulting initial estimate of $\beta$ is denoted by $\tilde{\beta}$.
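To illustrate how the initial estimate can be computed in practice, the following Python sketch maximizes an ESL-type objective of the form $\sum_{i,j}\exp\{-(y_{ij}-x_{ij}^{\mathrm{T}}\beta)^{2}/\gamma_1\}$ over the pooled observations, treating within-subject measurements as independent. The function names and the use of a general-purpose optimizer are illustrative choices for a self-contained example rather than part of the proposed procedure; Section 4.1 describes the IRLS algorithm actually used.

```python
import numpy as np
from scipy.optimize import minimize

def esl_objective(beta, X, y, gamma):
    """ESL-type objective: sum of exp(-residual^2 / gamma) over all observations."""
    r = y - X @ beta
    return np.exp(-r ** 2 / gamma).sum()

def initial_esl_estimate(X, y, gamma):
    """Initial mean estimate: maximize the ESL objective over the pooled data,
    pretending that within-subject measurements are independent."""
    beta0, *_ = np.linalg.lstsq(X, y, rcond=None)          # OLS starting value
    res = minimize(lambda b: -esl_objective(b, X, y, gamma), beta0, method="BFGS")
    return res.x

# Toy example with a single gross outlier in the response.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=200)
y[0] += 20.0
print(initial_esl_estimate(X, y, gamma=2.0))               # close to (1, 2)
```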
2.2. Simultaneous Estimate for the Mean and Generalized Autoregressive Parameters
Based on the Cholesky decomposition, there exists a lower triangular matrix $T_i$ with diagonal ones such that
$$T_i\Sigma_i T_i^{\mathrm{T}}=D_i,$$
where $D_i=\operatorname{diag}(\sigma_{i1}^{2},\ldots,\sigma_{im_i}^{2})$. In other words, letting $e_i=(e_{i1},\ldots,e_{im_i})^{\mathrm{T}}=T_i\varepsilon_i$, we obtain that
$$\varepsilon_{ij}=\sum_{k=1}^{j-1}\phi_{ijk}\,\varepsilon_{ik}+e_{ij}, \qquad (3)$$
where $\phi_{ijk}$ is the negative of the $(j,k)$th element of $T_i$. It is obvious that the $e_{ij}$'s are uncorrelated with $E(e_{ij})=0$ and $\operatorname{Var}(e_{ij})=\sigma_{ij}^{2}$, $j=1,\ldots,m_i$, $i=1,\ldots,n$. If the $\phi_{ijk}$'s were available, then (1) could be transformed into the following linear model with uncorrelated errors $e_{ij}$'s:
$$y_{ij}=x_{ij}^{\mathrm{T}}\beta+\sum_{k=1}^{j-1}\phi_{ijk}\bigl(y_{ik}-x_{ik}^{\mathrm{T}}\beta\bigr)+e_{ij}. \qquad (4)$$
[4] pointed out that the MCD has a well-founded statistical interpretation, and it has the advantage that the generalized autoregressive parameters $\phi_{ijk}$'s and log-innovation variances $\log\sigma_{ij}^{2}$'s are unconstrained. For simplicity, we assume that $\sigma_{ij}^{2}=\sigma^{2}$ for all $i$ and $j$. Since $\phi_{ijk}$ may depend on the subject and the measurement occasions, we adopt a more parsimonious structure,
$$\phi_{ijk}=w_{ijk}^{\mathrm{T}}\lambda, \qquad (5)$$
where $\lambda$ is the $q\times 1$ regression coefficient vector, and the covariates $w_{ijk}$ may contain the time, the baseline covariates, the interactions and so on.
Based on $\tilde{\beta}$, we can obtain the estimated residuals
$$\tilde{\varepsilon}_{ik}=y_{ik}-x_{ik}^{\mathrm{T}}\tilde{\beta}. \qquad (6)$$
From (4), (5) and (6), we can obtain the simultaneous estimate $(\hat{\beta}^{\mathrm{T}},\hat{\lambda}^{\mathrm{T}})^{\mathrm{T}}$ of $(\beta^{\mathrm{T}},\lambda^{\mathrm{T}})^{\mathrm{T}}$ by maximizing the following objective function:
$$\sum_{i=1}^{n}\sum_{j=1}^{m_i}\phi_{\gamma_2}\bigl(y_{ij}-x_{ij}^{\mathrm{T}}\beta-\tilde{z}_{ij}^{\mathrm{T}}\lambda\bigr), \qquad (7)$$
where $\tilde{z}_{ij}=\sum_{k=1}^{j-1}\tilde{\varepsilon}_{ik}w_{ijk}$ with $\tilde{z}_{i1}=0$, and $\gamma_2>0$ is a tuning parameter. Then we can obtain the estimates of the $\phi_{ijk}$'s in model (3) by combining (5) and $\hat{\lambda}$.
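The following sketch illustrates the construction of the pseudo-covariates $\tilde{z}_{ij}$ from the estimated residuals and the maximization of a joint ESL objective of the form (7). The lag-polynomial choice of $w_{ijk}$, the plain OLS initial fit, and the generic optimizer are illustrative assumptions made to keep the example self-contained, not prescriptions of the proposed procedure.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, m, p, q = 50, 5, 2, 3              # subjects, occasions, dim(beta), dim(lambda)
beta_true, times = np.array([1.0, -0.5]), np.arange(5, dtype=float)

# Generate longitudinal data with AR(1)-type errors so the MCD structure is non-trivial.
Sigma = 0.6 ** np.abs(np.subtract.outer(times, times))
subjects = []
for _ in range(n):
    X_i = np.column_stack([np.ones(m), rng.normal(size=m)])
    y_i = X_i @ beta_true + rng.multivariate_normal(np.zeros(m), Sigma)
    subjects.append((X_i, y_i))

# Initial mean estimate (OLS here for brevity; the paper uses the initial ESL fit).
X_all = np.vstack([X for X, _ in subjects])
y_all = np.concatenate([y for _, y in subjects])
beta_init, *_ = np.linalg.lstsq(X_all, y_all, rcond=None)

def w_vec(tj, tk):
    """Covariates w_ijk in model (5): a polynomial in the time lag (illustrative)."""
    lag = tj - tk
    return np.array([1.0, lag, lag ** 2])

# Pseudo-covariates z_ij = sum_{k<j} residual_ik * w_ijk built from the residuals.
data = []
for X_i, y_i in subjects:
    res = y_i - X_i @ beta_init
    Z_i = np.zeros((m, q))
    for j in range(1, m):
        for k in range(j):
            Z_i[j] += res[k] * w_vec(times[j], times[k])
    data.append((X_i, y_i, Z_i))

def neg_joint_esl(theta, gamma=2.0):
    """Negative of the joint ESL objective in (beta, lambda)."""
    beta, lam = theta[:p], theta[p:]
    val = 0.0
    for X_i, y_i, Z_i in data:
        r = y_i - X_i @ beta - Z_i @ lam
        val += np.exp(-r ** 2 / gamma).sum()
    return -val

fit = minimize(neg_joint_esl, np.concatenate([beta_init, np.zeros(q)]), method="BFGS")
print("beta_hat:", fit.x[:p], "lambda_hat:", fit.x[p:])
```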
3. Asymptotic Properties
Define
,
,
,
,
,
,
,
,
. To establish the asymptotic properties of the proposed ESL estimator, assume that the following regularity conditions hold:
(C1) There exists a positive integer M such that
. This means that n and N have the same order.
(C2) There exists a positive constant C such that
for
,
and
. In addition,
converges to a finite positive definite matrix in probability.
(C3)
and
are continuous with respect to x. Moreover, for any
,
.
(C4)
,
,
,
and
are continuous with respect to x.
(C5)
converges to a finite positive definite matrix.
(C6) For
,
and
are continuous with respect to x.
(C7)
converges to a finite positive definite matrix in probability.
(C8)
and
are continuous with respect to x. Moreover, for any
,
.
(C9)
,
,
,
and
are continuous with respect to x.
Theorem 1 If regularity conditions (C1)-(C5) hold, then
in distribution as
, where
Then if the random error
’s are independent, we have the following corollary, which is similar to Corollary 2 and useful for the choice of tuning parameters in Section 4.2.
Corollary 1 If regularity conditions (C1)-(C4) hold and
’s are independent, then
in distribution as
, where
Theorem 2 If regularity conditions (C1)-(C9) hold, then
in distribution as
, where
In fact, we have
(8)
Then it can be deduced that
and
are asymptotically independent, that is, the following corollaries hold.
Corollary 2 If regularity conditions (C1)-(C9) hold, then
in distribution as
, where
Corollary 3 If regularity conditions (C1)-(C9) hold, then
in distribution as
, where
4. Implementation of the ESL Estimator
4.1. IRLS Algorithm
In this subsection, we develop an IRLS algorithm to calculate the parameter estimates. The IRLS algorithm has been commonly adopted for general M-estimators. Since the maximizers of (2) and (7) can be regarded as special M-estimators, the IRLS algorithm can be carried out to find
and
. In the following, we first develop the IRLS algorithm to find the maximizer of (2); the maximizer of (7) can then be calculated in a similar way. Finally, we summarize the algorithm in detail.
Because
maximizes (2), we have the following normal equation:
or
(9)
Let
, then (9) can be transformed as
or
, where
,
and
. Given the k-th approximation
, we can compute the corresponding weight matrix
with
. Then we have
(10)
Each iteration of (10) does not decrease the objective function (2), that is,
. In fact,
where
. By Jensen's inequality, we have
From expression (10), we see that
minimizes
, or
. Then we have
.
The IRLS algorithm is summarized as follows:
Step 1. Computation of
by maximizing (2). Take the initial value
as the ordinary least squares (OLS) estimator. Given the k-th approximation
, the IRLS iteration updates
through (10). Repeat this iteration until convergence occurs. The resulting estimator is denoted as
.
Step 2. Computation of
by maximizing (7). Take
as the OLS estimator by minimizing
. Similar to (10), given the k-th approximation
, the IRLS iteration updates
where
,
,
, and
. Repeat this iteration until convergence occurs. The resulting estimator is denoted as
.
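A compact sketch of the IRLS update in Step 1 is given below: at each iteration the ESL weights are recomputed at the current iterate and a weighted least squares problem is solved; Step 2 has the same form with the augmented design matrix. The explicit weight formula $w_{ij}=\exp\{-(y_{ij}-x_{ij}^{\mathrm{T}}\beta^{(k)})^{2}/\gamma_1\}$ used in the comments is our reading of the update in (10), and the assertion simply reflects the non-decreasing property of the objective discussed above.

```python
import numpy as np

def irls_esl(X, y, gamma, max_iter=200, tol=1e-8):
    """IRLS for maximizing sum_j exp(-(y_j - x_j' beta)^2 / gamma).  Each step
    recomputes the weights w_j = exp(-r_j^2 / gamma) at the current iterate and
    solves the corresponding weighted least squares problem."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)           # OLS starting value
    obj_old = np.exp(-(y - X @ beta) ** 2 / gamma).sum()
    for _ in range(max_iter):
        w = np.exp(-(y - X @ beta) ** 2 / gamma)           # current ESL weights
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ y)         # weighted LS update
        obj = np.exp(-(y - X @ beta) ** 2 / gamma).sum()
        assert obj >= obj_old - 1e-10                      # objective never decreases
        if obj - obj_old < tol:
            break
        obj_old = obj
    return beta

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(300), rng.normal(size=300)])
y = X @ np.array([0.5, 1.5]) + rng.standard_t(df=2, size=300)   # heavy-tailed errors
print(irls_esl(X, y, gamma=3.0))
```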
4.2. The Choice of Tuning Parameters
In this subsection, we give a data-driven method to determine the tuning parameters
. In order to simplify the calculation with respect to
, we assume that the random error
’s are independent of each other and
’s. Then from Corollary 1, we can obtain that the ratio between the asymptotic variance of the initial ESL estimator and that of the OLS estimator for
is
where
. Therefore, the ideal choice of
is
Then
can be estimated by
, where
with
, and
is the standard deviation of
. Then
can be easily obtained using the grid search approach.
can also be chosen in a similar way.
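The grid search can be organized as in the following sketch. Because the exact expression of the variance ratio cannot be reproduced here, the plug-in criterion below (a sandwich-type efficiency ratio based on the ESL score) is only a structural stand-in with the same spirit: evaluate the estimated ratio over a grid of candidate values scaled by the residual spread and pick the minimizer.

```python
import numpy as np

def variance_ratio(residuals, gamma):
    """Plug-in estimate of the ratio between the asymptotic variance factor of the
    ESL estimator and that of OLS under independent errors.  The sandwich form
    below, based on the ESL score t*exp(-t^2/gamma), is an illustrative stand-in
    for the ratio derived from Corollary 1, not a verbatim transcription."""
    e = residuals - residuals.mean()
    numer = np.mean(e ** 2 * np.exp(-2 * e ** 2 / gamma))
    denom = np.mean((1 - 2 * e ** 2 / gamma) * np.exp(-e ** 2 / gamma)) ** 2
    return numer / denom / np.var(e)

def choose_gamma(residuals, n_grid=60):
    """Grid search for the tuning parameter: candidate values are scaled by the
    residual standard deviation and the estimated variance ratio is minimized."""
    sd = np.std(residuals)
    grid = sd ** 2 * np.linspace(0.5, 30.0, n_grid)
    ratios = [variance_ratio(residuals, g) for g in grid]
    return grid[int(np.argmin(ratios))]

rng = np.random.default_rng(3)
res = rng.normal(size=500)
res[:10] += 8.0                        # a few outlying residuals
print(choose_gamma(res))
```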
5. Simulation Studies
In this section, we conduct some simulation studies to investigate the finite sample performance of the proposed approach. We generate 200 datasets and consider sample sizes
, 100 and 200. In particular, the datasets are generated from the following model:
where
,
,
,
,
,
,
, and
with
.
To investigate robustness, we refer to the above setting as the no contamination (NC) situation and consider the following four contamination schemes:
1)
follows the multivariate t-distribution with 2 degrees of freedom and covariance matrix
.
2)
follows the multivariate t-distribution with 2 degrees of freedom and covariance matrix
, and randomly choose 2% of
to be
.
3)
, and randomly choose 2% of
to be
.
4)
, randomly choose 2% of
to be
and 2% of
to be
.
We compare the proposed ESL method with the OLS method, the M-estimation (M) method in [26] and the quantile regression (QR) method in [27]. Note that the OLS, M and QR methods follow an estimation procedure similar to that of the ESL method, the main difference being that the objective function is replaced by the corresponding counterpart. To assess the finite sample performance, we calculate the mean and standard deviation (SD) of the estimators of
and
. The corresponding simulation results are displayed in Tables 1-5.
From Table 1, it can be observed that the performance of the M, QR and ESL methods is comparable to that of the OLS method when the error follows a normal distribution and there are no outliers in the data. From Tables 2-5, it can be seen that the M, QR and ESL methods significantly outperform the OLS method, particularly in terms of SD, in several contamination cases; moreover, the ESL method always performs best in these cases. More specifically, Table 2 and Table 3 indicate that the M, QR and ESL methods significantly outperform the OLS method with respect to both
and
, when the error follows a heavy-tailed distribution. Table 4 and Table 5 indicate that the M, QR and ESL methods significantly outperform the OLS method with respect to
, when there are outliers in responses; the OLS, M and QR methods perform rather poorly with respect to
, whereas the ESL method still performs well in this case.
6. Real Data Analysis
In this section, we analyse the CD4 cell study, which was previously analysed in [7] [28]. This dataset comprises CD4 cell counts of 369 HIV-infected men, with a total of 2376 measurements collected at different time points over a period of approximately eight and a half years. The number of measurements per individual varies from 1 to 12, and the time points are not equally spaced. We use square roots of the CD4 counts [28] to make the response variable closer to the normal distribution, and the six related covariates are, respectively, time since seroconversion
, age relative to arbitrary origin
, packs of cigarettes smoked per day
, recreational drug use
, number of sexual partners
and mental illness score
. Note that Figure 1 displays the sample regressogram and the local linear fitted curve for the square root of the CD4 count over time, which reflects a polynomial trend with respect to time. In the following, we use the mean model [1].
Table 1. Parameter estimates (with SD in parentheses) for the NC situation.
Table 2. Parameter estimates (with SD in parentheses) for the first contamination situation.
Table 3. Parameter estimates (with SD in parentheses) for the second contamination situation.
Table 4. Parameter estimates (with SD in parentheses) for the third contamination situation.
Table 5. Parameter estimates (with SD in parentheses) for the fourth contamination situation.
Figure 1. Sample regressogram and local linear fitted curve for the CD4 data: square root of CD4+ number versus time.
where
with
. We use a cubic polynomial to model the generalized autoregressive parameters, that is,
.
We apply the OLS, M, QR methods and the proposed ESL method to the CD4 cell study. To assess the prediction performance, we randomly split the data into three parts, each with 369/3 = 123 subjects. We use the first two parts as the training dataset to fit the model, and then assess the out-of-sample performance on the left-out testing dataset (denoted TD). This process is repeated 200 times. We define the median absolute prediction error (MAPE) as the median of
. To illustrate the estimation robustness of the proposed ESL method compared with the other methods, we re-analyse the dataset by including 5% outliers in the dataset, which are randomly generated by replacing
with
. Moreover, we also re-analyse the dataset by including 5% outliers only in the training data set, in order to assess the robustness in terms of prediction performance. The results are displayed in Table 6. It can be observed that the estimates for
are very similar based on different methods, but the estimates for
based on the ESL method are somewhat different from those of the OLS, M and QR methods in the no outlier case. Moreover, the MAPE of the proposed ESL method is only slightly larger than those of the other methods, indicating that these methods have comparable prediction performance in the case of no outliers. In the 5% outliers case, the MAPEs indicate that the ESL, M and QR methods perform similarly and much better than the OLS method in terms of prediction. Comparing the no outlier case with the 5% outliers case, we can see that the estimates based on the ESL method vary less than those of the other methods, especially in terms of the estimates for
; the ESL, M and QR methods are robust to the outliers, while the OLS method is adversely affected by the outliers.
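For completeness, the following sketch shows one way to organize the repeated random splits and the MAPE computation described above. Interpreting the MAPE as the median of the absolute prediction errors $|y_{ij}-x_{ij}^{\mathrm{T}}\hat{\beta}|$ over the testing dataset is an assumption here, and the fitting function is a placeholder for any of the OLS, M, QR or ESL fits.

```python
import numpy as np

def mape_over_splits(subjects, fit_fn, n_splits=200, test_frac=1/3, seed=0):
    """Median absolute prediction error over repeated random subject splits.
    `subjects` is a list of (X_i, y_i) pairs and `fit_fn(X, y) -> beta` fits the
    mean model on the stacked training data (OLS, M, QR or ESL)."""
    rng = np.random.default_rng(seed)
    n, mapes = len(subjects), []
    for _ in range(n_splits):
        idx = rng.permutation(n)
        n_test = int(round(n * test_frac))
        test, train = idx[:n_test], idx[n_test:]
        X_tr = np.vstack([subjects[i][0] for i in train])
        y_tr = np.concatenate([subjects[i][1] for i in train])
        beta = fit_fn(X_tr, y_tr)
        errors = np.concatenate(
            [np.abs(subjects[i][1] - subjects[i][0] @ beta) for i in test])
        mapes.append(np.median(errors))            # MAPE on this testing dataset
    return np.median(mapes)                         # summary over the repeated splits

# Example usage with an OLS fit on synthetic subjects.
rng = np.random.default_rng(4)
subs = []
for _ in range(60):
    X = np.column_stack([np.ones(4), rng.normal(size=4)])
    subs.append((X, X @ np.array([2.0, 1.0]) + rng.normal(size=4)))
ols = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
print(mape_over_splits(subs, ols, n_splits=20))
```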
7. Conclusions
In this paper, based on the ESL function, we proposed a robust estimation approach for the mean and generalized autoregressive parameters with longitudinal data.
Table 6. Parameter estimates and prediction results for the CD4 data.
The generalized autoregressive parameters resulting from the MCD of the covariance matrix are unconstrained and can be well interpreted by statistical concepts in the framework of time series. Then the mean and generalized autoregressive parameters can be estimated via linear regression models using the ESL function. Moreover, the balance between robustness and efficiency can be achieved by choosing appropriate data-adaptive tuning parameters. Under certain conditions, we established the theoretical properties of the proposed estimators. Simulation studies and real data analysis were also carried out to illustrate the finite sample performance of our approach.
Several further problems need to be investigated. First, the dimension of the covariates in the regression models is assumed to be fixed; it would be interesting to extend our approach to high-dimensional settings. Second, the proposed approach can be extended to nonparametric and semiparametric models; for more discussion along this line, references including [10] [29] may be helpful. Moreover, this paper targets the conditional mean of the response given the covariates, which can be an inadequate summary when the conditional distribution of the response is asymmetric. In this case, the conditional mode may be more useful than the conditional mean, and thus modal linear regression [30] would be an interesting extension.
Acknowledgements
We thank the editor and the referee for their comments. This work was supported by the National Natural Science Foundation of China (No. 11971001) and the Beijing Natural Science Foundation (No. 1182002).
Appendix
Lemma 1 If regularity conditions (C1)-(C5) hold, then with probability approaching 1, there exists a local maximizer of (2), denoted as
, such that
Proof of Lemma 1. Let
. It suffices to show that for any given
, there exists a large constant
such that
(1)
for any p-dimensional vector v satisfying that
. Based on the Taylor expansion, we have
where
lies between
and
. Then, for
, we have
and
Using the Cr inequality and condition (C1), we can obtain that
for
, then we have
Therefore, we can deduce that
. Similarly, we get
. For
, we have
Note that
, so we can choose a sufficiently large C such that
dominates both
and
with a probability of at least
. From condition (C3), we get
, then (1) holds. The proof of Lemma 1 is finished.
Proof of Theorem 1. Let
, then
satisfies the following equation:
where
lies between
and
. It can be easily shown that
Since
, then
. Thus,
. Then,
Notice that
. For any
vector
, let
, where
. By the Cr inequality and condition (C1), we have
where
is a constant independent of i. Then by condition (C5),
converges to a finite positive definite matrix, which is denoted by
here. Thus, we obtain that
So the proof of Theorem 1 is completed by the Lyapunov central limit theorem and Slutsky's theorem.
Lemma 2 If regularity conditions (C1)-(C9) hold, then with probability approaching 1, there exists a local maximizer of (7), denoted as
, such that
Proof of Lemma 2. Since
, we can replace
with
during the proof from now on. Then
can be substituted by
, where
. Let
It suffices to show that for any given
, there exists a large constant
such that
(2)
for any
-dimensional vector
satisfying that
. Based on the Taylor expansion, we have
where
lies between
and
. Recall that
. It should be emphasized that
consists of both covariates and
, and thus
is independent of
conditional on
. In addition, from (8), we have
Then, for
, we have
and
Therefore, we can deduce that
. Similarly, we get
. For
, we have
Note that
, so we can choose a sufficiently large C such that
dominates both
and
with a probability of at least
. From condition (C8), we get
, then (2) holds. The proof of Lemma 2 is finished.
Proof of Theorem 2. Let
, then
satisfies the following equation:
where
lies between
and
. It can be easily obtained that
and
Then
Notice that
. For any
vector
, let
. By the Cr inequality and condition (C1), we have
where
is a constant independent of i. Moreover,
So the proof is completed by the Lyapunov central limit theorem and Slutsky’s theorem.
List of Main Symbols