ESL-Based Robust Estimation for Mean-Covariance Regression with Longitudinal Data

When longitudinal data contains outliers, the classical least-squares approach is known to be not robust. To solve this issue, the exponential squared loss (ESL) function with a tuning parameter has been investigated for longitudinal data. However, to our knowledge, there is no paper to investigate the robust estimation procedure against outliers within the framework of mean-covariance regression analysis for longitudinal data using the ESL function. In this paper, we propose a robust estimation approach for the model parameters of the mean and generalized autoregressive parameters with longitudinal data based on the ESL function. The proposed estimators can be shown to be asymptotically normal under certain conditions. Moreover, we develop an iteratively reweighted least squares (IRLS) algorithm to calculate the parameter estimates, and the balance between the robustness and efficiency can be achieved by choosing appropriate data adaptive tuning parameters. Simulation studies and real data analysis are carried out to illustrate the finite sample performance of the proposed approach.

lead to inefficient estimators of the mean parameter. Furthermore, the covariance matrix itself may be of scientific interest [2]. However, it is challenging to model the covariance matrix which suffers from the positive-definiteness constraint and includes many unknown parameters. To avoid this challenge, a common strategy is to specify the working correlation structure [3], which does not permit more general structures and cannot flexibly incorporate covariates that may be helpful to explain the covariations. To overcome this limitation, joint modelling for the mean and covariance of longitudinal data has received increasing interest recently; see, for example, [4]- [12]. Among these joint modeling approaches, the modified Cholesky decomposition (MCD) for the covariance matrix proposed in [4] [5] is attractive owing to the fact that it leads automatically to positive definite covariance matrix, and the parameters in it can be interpreted by suitable statistical concepts. As a result, the regression techniques and model based inference can be adopted to the parameters in this decomposition, see [7] for more details.
However, the aforementioned approaches are very sensitive to outliers or heavy-tailed distributions. [13] proposed a robust procedure for modeling the correlation matrix of longitudinal data based on an alternative Cholesky decomposition and heavy-tailed multivariate t-distributions with unknown degrees of freedom. It should be pointed out that the use of the multivariate t-distribution alone does not necessarily guarantee robustness. In addition, [14] [15] developed robust generalized estimating equations (GEE) for regression parameters in joint mean-covariance model by employing the Huber's function and leverage-based weights. [16] developed an efficient parameter estimation via MCD for quantile regression with longitudinal data. [17] further proposed a moving average Cholesky factor model, which is transformed from MCD, in covariance modeling for composite quantile regression with longitudinal data. Then, [18] carried out smoothed empirical likelihood inference via MCD for quantile varying coefficient models with longitudinal data. Later, [19] developed quantile estimation via MCD for longitudinal single-index models.
Although M-type regression and quantile regression procedures can overcome outliers and heavy-tail errors, they may lose efficiency under the normal distribution. To overcome this difficulty, [20] recently proposed a robust variable selection approach by adopting the exponential squared loss (ESL) function with a tuning parameter. They have showed that, with properly selected tuning parameter, the proposed approach not only achieves good robustness with respect to outliers in the dataset, but also is as asymptotically efficient as the least squares estimation without outliers under normal error. Later, some authors employed the ESL funtion for longitudinal data. For example, [21] propsed an efficient and robust variable selection method for longitudinal generalized linear models based on GEE. [22] proposed a robust and efficient estimation procedure for simultaneous model structure identification and variable selection in generalized partial linear varying coefficient models for longitudinal data. [23] developed GEE-based robust estimation and empirical likelihood inference approach with ESL for panel data models. With a similar loss function, [24] proposed a robust variable selection method in modal varying-coefficient models with longitudinal data. [25] proposed modal regression statistical inference for longitudinal semivarying coefficient models, including GEE, empirical likelihood and variable selection. However, to our knowledge, there is no paper to investigate the robust estimation procedure against outliers within the framework of mean-covariance regression analysis for longitudinal data employing the ESL function.
In this paper, we propose a robust estimation approach for the model parameters of the mean and generalized autoregressive parameters in the within subject covariance matrices for longitudinal data based on the ESL function. We begin with the ESL-based estimation for the mean parameters pretending that the repeated measurements within a subject are independent. Then based on the roughly estimated mean parameters, the simultaneous estimation for the mean and generalized autoregressive parameters is carried out using the ESL function. The proposed estimators can be shown to be asymptotically normal under certain conditions. Moreover, we develop an iteratively reweighted least squares (IRLS) algorithm [26] to calculate the parameter estimates. Numerical studies are carried out to illustrate the finite sample performance of our approach.
The outline of this paper is organized as follows. Section 2 develops the robust estimation procedure. Section 3 establishes the asymptotic properties of the proposed ESL estimators. The IRLS algorithm and a data-driven method for the selection of tuning parameters are presented in Section 4. Simulations are carried out in Section 5. Section 6 analyses a real data set. A discussion is given in Section 7. The technical proofs are provided in the Appendix.

Robust Estimation
where β is the 1 p × vector of associated parameters, From (4), (5) and (6), we can obtain the simultaneous estimate by maximizing the following objective function: , , , , , To establish the asymptotic properties of the proposed ESL estimator, assume that the following regularity conditions hold: (C1) There exists a positive integer M such that (C2) There exists a positive constant C such that and 0 k p ≤ ≤ . In addition, ,  , G x τ are continuous with respect to x. Moreover, are continuous with respect to x.
Theorem 1 If regularity conditions (C1)-(C5) hold, then ( ) ( ) Then if the random error ij ε 's are independent, we have the following corollary, which is similar to Corollary 2 and useful for the choice of tuning parameters in Section 4.2.
Corollary 1 If regularity conditions (C1)-(C4) hold and ij ε 's are independent, then ( ) ( ) Then it can be deduced that β and γ are asymptotically independent, that is, the following corollaries hold.

IRLS Algorithm
In this subsection, we develop an IRLS algorithm to calculate the parameter estimates. The IRLS algorithm has been commonly adopted for general M-estimators. Since the maximizers of (2) and (7) can be regarded as special M-estimators, the IRLS algorithm can be carried out to find β  and θ . In the following, we first develop the IRLS algorithm to find the maximizer of (2), and then we can calculate the maximizer of (7) in a similar way. Later, we summarize the algorithm in detail.
Because β  maximizes (2), we have the following normal equation: (9) can be transformed as ( ) This iteration of (10) will monotonically non-decrease the objective function (2), that is, Based on the Jensen's inequality, we have From expression (10), we can find out that The IRLS algorithm is summarized as follows: Step 1. Computation of β  by maximizing (2). Take the initial value ( ) 0 β  as the ordinary least squares (OLS) estimator. Given the k-th approximation ( ) through (10). Repeat this iteration until the convergence occurs. The resulting estimator is denoted as β  .
Step 2. Computation of ( )ˆ, Repeat this iteration until the convergence occurs. The resulting estimator is denoted as θ .

The Choice of Tuning Parameters
In this subsection, we give a data driving method to determine the tuning parameters { } 1 2 , τ τ . In order to simplify the calculation with respect to 1 τ , we assume that the random error ij ε 's are independent of each other and ij X 's. Then from Corollary 1, we can obtain that the ratio between the asymptotic variance of the initial ESL estimator and that of the OLS estimator for β is ( ) ( ) ( ) , and σ is the standard deviation of ij ε . Then 1,opt τ can be easily obtained using the grid search approach. 2 τ can also be chosen in a similar way.

Simulation Studies
In this section, we conduct some simulation studies to investigate the finite sample performance of the proposed approach. We generate 200 datasets and consider sample sizes 50 n = , 100 and 200. In particular, the datasets are generated from the following model: To investigate robustness, we denote the above datasets as no contamination (NC) situation and consider the following four contaminations: 1) i ε follows the multivariate t-distribution with 2 degrees of freedom and covariance matrix i Σ .
2) i ε follows the multivariate t-distribution with 2 degrees of freedom and covariance matrix i Σ , and randomly choose 2% of ij X to be ( ) We compare the proposed ESL method with the OLS method, the M-estimator (M) in [26] and the quantile regression (QR) method in [27]. Note that the OLS, M and QR methods follow the estimation procedure similar to that of ESL, while the main difference is that the objective function is respectively replaced by their counterparts. To assess the finite sample performance, we calculate the mean and standard deviation (SD) for the estimators of β and γ . The corresponding simulation results are displayed in Tables 1-5.
From Table 1, it can be observed that the performance of the M, QR and ESL methods is comparable to that of the OLS method, when the error follows a normal distribution and there are no outliers in the data. From Tables 2-5, it can be found out that the M, QR and ESL methods outperform significantly the OLS method, particularly in terms of SD, in several contamination cases; moreover, the ESL method always perform best in these cases. More specifically, Ta-ble 2 and Table 3 indicate that the M, QR and ESL methods outperform significantly the OLS method with respect to both β and γ , when the error follows a heavy-tailed error distribution. Table 4 and Table 5 indicate that the M, QR and ESL methods outperform significantly the OLS method with respect to β , when there are outliers in responses; the OLS, M and QR methods perform rather poorly with respect to γ , however, the ESL method sill performs well in this case.

Real Data Analysis
In this section, we analyse the CD4 cell study, which was previously analysed by [7] [28]. This dataset comprises CD4 cell counts of 369 HIV-infected men, and there are totally 2376 values collected at different times for each individual, over a period of approximately eight and a half years. The number of measurements for each individual varies from 1 to 12 and the time points are not equally spaced. We use square roots of the CD4 counts [28] to make the response variable closer to the normal distribution, and the related six covariates are respectively time since seroconversion ij t , age relative to arbitrary origin and mental illness score 5 ij X . Note that Figure 1 displays the sample regressogram and local linear fitted curve for the square root of CD4 count over time, which reflects polynomial trend with respect to time. In the following, we use the mean model [1].     Table 5. Parameter estimates (with SD in parentheses) for the fourth contamination situation.  ( ) > . We use cubic polynomial to model the generalized autoregressive parameters, that is, We apply the OLS, M, QR methods and the proposed ESL method to the CD4 cell study. To assess the prediction performance, we randomly split the data into three parts, each with 369/3 = 123 subjects. We use the first two parts as the training dataset to fit the model, and then assess the out of sample performance on the testing dataset (defined as TD) which is left out. This process is repeated 200 times. We define the median absolute prediction error (MAPE) as median of To illustrate the estimation robustness of the proposed ESL method compared with the other methods, we re-analyse the dataset by including 5% outliers in the dataset, which are randomly generated by replacing ij Y with 300 ij Y + . Moreover, we also re-analyse the dataset by including 5% outliers only in the training data set, in order to assess the robustness in terms of prediction performance. The results are displayed in Table 6. It can be observed that the estimates for γ are very similar based on different methods, but the estimates for β based on the ESL method are somewhat distinct with those of the OLS, M and QR methods in the no outlier case. Moreover, the MAPE of the proposed ESL method is slightly larger than the others, indicating that these methods possess comparative prediction performance in the case of no outlier. In the 5% outliers case, the MAPEs indicate that the ESL, M and QR methods perform similarly but much better than the OLS method in the prediction performance. Comparing the no outlier with 5% outliers case, we can see that the estimates based on the ESL method varies more slightly than the other methods, especially in terms of the estimates for γ ; the ESL, M and QR methods are robust to the outliers, while the OLS method is adversely affected by the outliers.

Conclusions
In this paper, based on the ESL function, we proposed a robust estimation approach for the mean and generalized autoregressive parameters with longitudinal  data. The generalized autoregressive parameters that resulted from the MCD of the covariance matrix are unconstrained and can be well interpreted by statistical concepts in the framework of time series. Then the mean and generalized autoregressive parameters can be estimated via linear regression models using the ESL function. Moreover, the balance between the robustness and efficiency can be achieved by choosing appropriate data adaptive tuning parameters. Under certain conditions, we established the theoretical properties. Simulation studies and real data analysis were also carried out to illustrate the finite sample performance of our approach. Several further problems need to be investigated. First, the dimension of the covariates in regression models is assumed to be fixed, thus it is interesting to extend our approach to the high-dimensional settings. Second, the models can be extended to nonparametric and semiparametric models. For more discussion along this line, references including [10] [29] may be helpful. Moreover, this paper targets the conditional mean of the response given covariates, which suffers from difficulties when the conditional distribution of the response is asymmetric. In this case, the conditional mode may be a more useful summary than the conditional mean, and thus modal linear regression [30] may be an interesting problem.
Lemma 1 If regularity conditions (C1)-(C5) hold, then with probability approaching to 1, there exists a local maximizer of (2), denoted as β  , such that for any p-dimensional vector v satisfying that v C = . Based on the Taylor expansion, we have and ij ε . Then, for 1 I , we have Using the C r inequality and condition (C1), we can obtain that