ESL-Based Robust Estimation for Mean-Covariance Regression with Longitudinal Data
1. Introduction
Longitudinal data arise frequently in many fields, such as biological research and the social sciences. The observations on the same subject are measured repeatedly over time and are thus intrinsically correlated [1]. The covariance matrix of such data is important, since ignoring the correlation structure may lead to inefficient estimators of the mean parameters. Furthermore, the covariance matrix itself may be of scientific interest [2]. However, modeling the covariance matrix is challenging because it is subject to the positive-definiteness constraint and involves many unknown parameters. To avoid this challenge, a common strategy is to specify a working correlation structure [3], which does not permit more general structures and cannot flexibly incorporate covariates that may help explain the covariations. To overcome this limitation, joint modeling of the mean and covariance of longitudinal data has received increasing interest recently; see, for example, [4]-[12]. Among these joint modeling approaches, the modified Cholesky decomposition (MCD) of the covariance matrix proposed in [4] [5] is attractive because it automatically yields a positive definite covariance matrix and its parameters can be interpreted through familiar statistical concepts. As a result, regression techniques and model-based inference can be applied to the parameters in this decomposition; see [7] for more details.
However, the aforementioned approaches are very sensitive to outliers or heavy-tailed distributions. [13] proposed a robust procedure for modeling the correlation matrix of longitudinal data based on an alternative Cholesky decomposition and heavy-tailed multivariate t-distributions with unknown degrees of freedom. It should be pointed out that the use of the multivariate t-distribution alone does not necessarily guarantee robustness. In addition, [14] [15] developed robust generalized estimating equations (GEE) for the regression parameters in joint mean-covariance models by employing Huber's function and leverage-based weights. [16] developed efficient parameter estimation via the MCD for quantile regression with longitudinal data. [17] further proposed a moving average Cholesky factor model, obtained by transforming the MCD, for covariance modeling in composite quantile regression with longitudinal data. Then, [18] carried out smoothed empirical likelihood inference via the MCD for quantile varying coefficient models with longitudinal data. Later, [19] developed quantile estimation via the MCD for longitudinal single-index models.
Although M-type regression and quantile regression procedures can handle outliers and heavy-tailed errors, they may lose efficiency under the normal distribution. To overcome this difficulty, [20] recently proposed a robust variable selection approach by adopting the exponential squared loss (ESL) function with a tuning parameter. They showed that, with a properly selected tuning parameter, the proposed approach not only achieves good robustness with respect to outliers in the dataset, but is also asymptotically as efficient as least squares estimation under normal errors without outliers. Later, several authors employed the ESL function for longitudinal data. For example, [21] proposed an efficient and robust variable selection method for longitudinal generalized linear models based on GEE. [22] proposed a robust and efficient estimation procedure for simultaneous model structure identification and variable selection in generalized partial linear varying coefficient models for longitudinal data. [23] developed a GEE-based robust estimation and empirical likelihood inference approach with the ESL for panel data models. With a similar loss function, [24] proposed a robust variable selection method for modal varying-coefficient models with longitudinal data. [25] proposed modal regression statistical inference for longitudinal semivarying coefficient models, including GEE, empirical likelihood and variable selection. However, to our knowledge, no existing work investigates robust estimation against outliers within the framework of mean-covariance regression analysis for longitudinal data using the ESL function.
In this paper, we propose a robust estimation approach based on the ESL function for the mean parameters and the generalized autoregressive parameters of the within-subject covariance matrices for longitudinal data. We begin with the ESL-based estimation of the mean parameters, pretending that the repeated measurements within a subject are independent. Then, based on the roughly estimated mean parameters, the simultaneous estimation of the mean and generalized autoregressive parameters is carried out using the ESL function. The proposed estimators are shown to be asymptotically normal under certain conditions. Moreover, we develop an iteratively reweighted least squares (IRLS) algorithm [26] to compute the parameter estimates. Numerical studies are carried out to illustrate the finite sample performance of our approach.
The rest of this paper is organized as follows. Section 2 develops the robust estimation procedure. Section 3 establishes the asymptotic properties of the proposed ESL estimators. The IRLS algorithm and a data-driven method for selecting the tuning parameters are presented in Section 4. Simulations are carried out in Section 5. Section 6 analyses a real data set. A discussion is given in Section 7. The technical proofs are provided in the Appendix.
2. Robust Estimation Procedure
2.1. Initial Estimate for the Mean Parameters
Consider $n$ subjects, where each subject is measured repeatedly over time. For the $i$th subject, suppose that $y_{ij}$ is the observed scalar response variable at time $t_{ij}$, and $x_{ij}$ is the corresponding $p\times 1$ covariate vector, $j=1,\ldots,m_i$, $i=1,\ldots,n$. Denote $N=\sum_{i=1}^{n}m_i$. Furthermore, let $y_i=(y_{i1},\ldots,y_{im_i})^{\mathrm{T}}$, $X_i=(x_{i1},\ldots,x_{im_i})^{\mathrm{T}}$, $\varepsilon_i=(\varepsilon_{i1},\ldots,\varepsilon_{im_i})^{\mathrm{T}}$. A longitudinal linear regression model has the form
$$y_i=X_i\beta+\varepsilon_i,\quad i=1,\ldots,n, \qquad (1)$$
where $\beta$ is the $p\times 1$ vector of associated parameters, and $\varepsilon_i$ is the random error satisfying $E(\varepsilon_i)=0$ and $\operatorname{Cov}(\varepsilon_i)=\Sigma_i$.
Define the ESL function $\phi_{\gamma}(t)=\exp(-t^{2}/\gamma)$. We first estimate $\beta$ pretending that the random errors $\varepsilon_{ij}$'s are independent. More specifically, we estimate $\beta$ by maximizing the objective function
$$\sum_{i=1}^{n}\sum_{j=1}^{m_i}\phi_{\gamma_1}\bigl(y_{ij}-x_{ij}^{\mathrm{T}}\beta\bigr), \qquad (2)$$
where $\gamma_1>0$ is a tuning parameter. The resulting initial estimate of $\beta$ is denoted by $\tilde{\beta}$.
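To illustrate how the initial estimate can be computed in practice, the following Python sketch maximizes an ESL-type objective of the form $\sum_{i,j}\exp\{-(y_{ij}-x_{ij}^{\mathrm{T}}\beta)^{2}/\gamma_1\}$ over the pooled observations, treating within-subject measurements as independent. The function names and the use of a general-purpose optimizer are illustrative choices for a self-contained example rather than part of the proposed procedure; Section 4.1 describes the IRLS algorithm actually used.

```python
import numpy as np
from scipy.optimize import minimize

def esl_objective(beta, X, y, gamma):
    """ESL-type objective: sum of exp(-residual^2 / gamma) over all observations."""
    r = y - X @ beta
    return np.exp(-r ** 2 / gamma).sum()

def initial_esl_estimate(X, y, gamma):
    """Initial mean estimate: maximize the ESL objective over the pooled data,
    pretending that within-subject measurements are independent."""
    beta0, *_ = np.linalg.lstsq(X, y, rcond=None)          # OLS starting value
    res = minimize(lambda b: -esl_objective(b, X, y, gamma), beta0, method="BFGS")
    return res.x

# Toy example with a single gross outlier in the response.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=200)
y[0] += 20.0
print(initial_esl_estimate(X, y, gamma=2.0))               # close to (1, 2)
```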
2.2. Simultaneous Estimate for the Mean and Generalized Autoregressive Parameters
Based on the Cholesky decomposition, there exists a lower triangular matrix $T_i$ with diagonal ones such that
$$T_i\Sigma_i T_i^{\mathrm{T}}=D_i,$$
where $D_i=\operatorname{diag}(\sigma_{i1}^{2},\ldots,\sigma_{im_i}^{2})$. In other words, letting $e_i=(e_{i1},\ldots,e_{im_i})^{\mathrm{T}}=T_i\varepsilon_i$, we obtain that
$$\varepsilon_{ij}=\sum_{k=1}^{j-1}\phi_{ijk}\,\varepsilon_{ik}+e_{ij}, \qquad (3)$$
where $\phi_{ijk}$ is the negative of the $(j,k)$th element of $T_i$. It is obvious that the $e_{ij}$'s are uncorrelated with $E(e_{ij})=0$ and $\operatorname{Var}(e_{ij})=\sigma_{ij}^{2}$, $j=1,\ldots,m_i$, $i=1,\ldots,n$. If the $\phi_{ijk}$'s were available, then (1) could be transformed into the following linear model with uncorrelated errors $e_{ij}$'s:
$$y_{ij}=x_{ij}^{\mathrm{T}}\beta+\sum_{k=1}^{j-1}\phi_{ijk}\bigl(y_{ik}-x_{ik}^{\mathrm{T}}\beta\bigr)+e_{ij}. \qquad (4)$$
[4] pointed out that the MCD has a well-founded statistical interpretation, and it has the advantage that the generalized autoregressive parameters $\phi_{ijk}$'s and log-innovation variances $\log\sigma_{ij}^{2}$'s are unconstrained. For simplicity, we assume that $\sigma_{ij}^{2}=\sigma^{2}$ for all $i$ and $j$. Since $\phi_{ijk}$ may depend on the subject and the measurement occasions, we adopt a more parsimonious structure,
$$\phi_{ijk}=w_{ijk}^{\mathrm{T}}\lambda, \qquad (5)$$
where $\lambda$ is the $q\times 1$ regression coefficient vector, and the covariates $w_{ijk}$ may contain the time, the baseline covariates, the interactions and so on.
Based on $\tilde{\beta}$, we can obtain the estimated residuals
$$\tilde{\varepsilon}_{ik}=y_{ik}-x_{ik}^{\mathrm{T}}\tilde{\beta}. \qquad (6)$$
From (4), (5) and (6), we can obtain the simultaneous estimate $(\hat{\beta}^{\mathrm{T}},\hat{\lambda}^{\mathrm{T}})^{\mathrm{T}}$ of $(\beta^{\mathrm{T}},\lambda^{\mathrm{T}})^{\mathrm{T}}$ by maximizing the following objective function:
$$\sum_{i=1}^{n}\sum_{j=1}^{m_i}\phi_{\gamma_2}\bigl(y_{ij}-x_{ij}^{\mathrm{T}}\beta-\tilde{z}_{ij}^{\mathrm{T}}\lambda\bigr), \qquad (7)$$
where $\tilde{z}_{ij}=\sum_{k=1}^{j-1}\tilde{\varepsilon}_{ik}w_{ijk}$ with $\tilde{z}_{i1}=0$, and $\gamma_2>0$ is a tuning parameter. Then we can obtain the estimates of the $\phi_{ijk}$'s in model (3) by combining (5) and $\hat{\lambda}$.
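The following sketch illustrates the construction of the pseudo-covariates $\tilde{z}_{ij}$ from the estimated residuals and the maximization of a joint ESL objective of the form (7). The lag-polynomial choice of $w_{ijk}$, the plain OLS initial fit, and the generic optimizer are illustrative assumptions made to keep the example self-contained, not prescriptions of the proposed procedure.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, m, p, q = 50, 5, 2, 3              # subjects, occasions, dim(beta), dim(lambda)
beta_true, times = np.array([1.0, -0.5]), np.arange(5, dtype=float)

# Generate longitudinal data with AR(1)-type errors so the MCD structure is non-trivial.
Sigma = 0.6 ** np.abs(np.subtract.outer(times, times))
subjects = []
for _ in range(n):
    X_i = np.column_stack([np.ones(m), rng.normal(size=m)])
    y_i = X_i @ beta_true + rng.multivariate_normal(np.zeros(m), Sigma)
    subjects.append((X_i, y_i))

# Initial mean estimate (OLS here for brevity; the paper uses the initial ESL fit).
X_all = np.vstack([X for X, _ in subjects])
y_all = np.concatenate([y for _, y in subjects])
beta_init, *_ = np.linalg.lstsq(X_all, y_all, rcond=None)

def w_vec(tj, tk):
    """Covariates w_ijk in model (5): a polynomial in the time lag (illustrative)."""
    lag = tj - tk
    return np.array([1.0, lag, lag ** 2])

# Pseudo-covariates z_ij = sum_{k<j} residual_ik * w_ijk built from the residuals.
data = []
for X_i, y_i in subjects:
    res = y_i - X_i @ beta_init
    Z_i = np.zeros((m, q))
    for j in range(1, m):
        for k in range(j):
            Z_i[j] += res[k] * w_vec(times[j], times[k])
    data.append((X_i, y_i, Z_i))

def neg_joint_esl(theta, gamma=2.0):
    """Negative of the joint ESL objective in (beta, lambda)."""
    beta, lam = theta[:p], theta[p:]
    val = 0.0
    for X_i, y_i, Z_i in data:
        r = y_i - X_i @ beta - Z_i @ lam
        val += np.exp(-r ** 2 / gamma).sum()
    return -val

fit = minimize(neg_joint_esl, np.concatenate([beta_init, np.zeros(q)]), method="BFGS")
print("beta_hat:", fit.x[:p], "lambda_hat:", fit.x[p:])
```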
3. Asymptotic Properties
Define
,
,
,
,
,
,
,
,
. To establish the asymptotic properties of the proposed ESL estimator, assume that the following regularity conditions hold:
(C1) There exists a positive integer M such that
. This means that n and N have the same order.
(C2) There exists a positive constant C such that
for
,
and
. In addition,
converges to a finite positive definite matrix in probability.
(C3)
and
are continuous with respect to x. Moreover, for any
,
.
(C4)
,
,
,
and
are continuous with respect to x.
(C5)
converges to a finite positive definite matrix.
(C6) For
,
and
are continuous with respect to x.
(C7)
converges to a finite positive definite matrix in probability.
(C8)
and
are continuous with respect to x. Moreover, for any
,
.
(C9)
,
,
,
and
are continuous with respect to x.
Theorem 1 If regularity conditions (C1)-(C5) hold, then
in distribution as
, where
Then if the random error
’s are independent, we have the following corollary, which is similar to Corollary 2 and useful for the choice of tuning parameters in Section 4.2.
Corollary 1 If regularity conditions (C1)-(C4) hold and
’s are independent, then
in distribution as
, where
Theorem 2 If regularity conditions (C1)-(C9) hold, then
in distribution as
, where
In fact, we have
(8)
Then it can be deduced that
and
are asymptotically independent, that is, the following corollaries hold.
Corollary 2 If regularity conditions (C1)-(C9) hold, then
in distribution as
, where
Corollary 3 If regularity conditions (C1)-(C9) hold, then
in distribution as
, where
4. Implementation of the ESL Estimator
4.1. IRLS Algorithm
In this subsection, we develop an IRLS algorithm to calculate the parameter estimates. The IRLS algorithm has been commonly adopted for general M-estimators. Since the maximizers of (2) and (7) can be regarded as special M-estimators, the IRLS algorithm can be carried out to find
and
. In the following, we first develop the IRLS algorithm to find the maximizer of (2); the maximizer of (7) can then be calculated in a similar way. Finally, we summarize the algorithm in detail.
Because
maximizes (2), we have the following normal equation:
or
(9)
Let
, then (9) can be transformed as
or
, where
,
and
. Given the k-th approximation
, we can compute the corresponding weight matrix
with
. Then we have
(10)
Each iteration of (10) does not decrease the objective function (2), that is,
. In fact,
where
. By Jensen's inequality, we have
From expression (10), we see that
minimizes
, or
. Then we have
.
The IRLS algorithm is summarized as follows:
Step 1. Computation of
by maximizing (2). Take the initial value
as the ordinary least squares (OLS) estimator. Given the k-th approximation
, the IRLS iteration updates
through (10). Repeat this iteration until convergence occurs. The resulting estimator is denoted as
.
Step 2. Computation of
by maximizing (7). Take
as the OLS estimator by minimizing
. Similar to (10), given the k-th approximation
, the IRLS iteration updates
where
,
,
, and
. Repeat this iteration until convergence occurs. The resulting estimator is denoted as
.
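A compact sketch of the IRLS update in Step 1 is given below: at each iteration the ESL weights are recomputed at the current iterate and a weighted least squares problem is solved; Step 2 has the same form with the augmented design matrix. The explicit weight formula $w_{ij}=\exp\{-(y_{ij}-x_{ij}^{\mathrm{T}}\beta^{(k)})^{2}/\gamma_1\}$ used in the comments is our reading of the update in (10), and the assertion simply reflects the non-decreasing property of the objective discussed above.

```python
import numpy as np

def irls_esl(X, y, gamma, max_iter=200, tol=1e-8):
    """IRLS for maximizing sum_j exp(-(y_j - x_j' beta)^2 / gamma).  Each step
    recomputes the weights w_j = exp(-r_j^2 / gamma) at the current iterate and
    solves the corresponding weighted least squares problem."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)           # OLS starting value
    obj_old = np.exp(-(y - X @ beta) ** 2 / gamma).sum()
    for _ in range(max_iter):
        w = np.exp(-(y - X @ beta) ** 2 / gamma)           # current ESL weights
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ y)         # weighted LS update
        obj = np.exp(-(y - X @ beta) ** 2 / gamma).sum()
        assert obj >= obj_old - 1e-10                      # objective never decreases
        if obj - obj_old < tol:
            break
        obj_old = obj
    return beta

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(300), rng.normal(size=300)])
y = X @ np.array([0.5, 1.5]) + rng.standard_t(df=2, size=300)   # heavy-tailed errors
print(irls_esl(X, y, gamma=3.0))
```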
4.2. The Choice of Tuning Parameters
In this subsection, we give a data-driven method to determine the tuning parameters
. In order to simplify the calculation with respect to
, we assume that the random error
’s are independent of each other and
’s. Then from Corollary 1, we can obtain that the ratio between the asymptotic variance of the initial ESL estimator and that of the OLS estimator for
is
where
. Therefore, the ideal choice of
is
Then
can be estimated by
, where
with
, and
is the standard deviation of
. Then
can be easily obtained using the grid search approach.
can also be chosen in a similar way.
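The grid search can be organized as in the following sketch. Because the exact expression of the variance ratio cannot be reproduced here, the plug-in criterion below (a sandwich-type efficiency ratio based on the ESL score) is only a structural stand-in with the same spirit: evaluate the estimated ratio over a grid of candidate values scaled by the residual spread and pick the minimizer.

```python
import numpy as np

def variance_ratio(residuals, gamma):
    """Plug-in estimate of the ratio between the asymptotic variance factor of the
    ESL estimator and that of OLS under independent errors.  The sandwich form
    below, based on the ESL score t*exp(-t^2/gamma), is an illustrative stand-in
    for the ratio derived from Corollary 1, not a verbatim transcription."""
    e = residuals - residuals.mean()
    numer = np.mean(e ** 2 * np.exp(-2 * e ** 2 / gamma))
    denom = np.mean((1 - 2 * e ** 2 / gamma) * np.exp(-e ** 2 / gamma)) ** 2
    return numer / denom / np.var(e)

def choose_gamma(residuals, n_grid=60):
    """Grid search for the tuning parameter: candidate values are scaled by the
    residual standard deviation and the estimated variance ratio is minimized."""
    sd = np.std(residuals)
    grid = sd ** 2 * np.linspace(0.5, 30.0, n_grid)
    ratios = [variance_ratio(residuals, g) for g in grid]
    return grid[int(np.argmin(ratios))]

rng = np.random.default_rng(3)
res = rng.normal(size=500)
res[:10] += 8.0                        # a few outlying residuals
print(choose_gamma(res))
```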
5. Simulation Studies
In this section, we conduct some simulation studies to investigate the finite sample performance of the proposed approach. We generate 200 datasets and consider sample sizes
, 100 and 200. In particular, the datasets are generated from the following model:
where
,
,
,
,
,
,
, and
with
.
To investigate robustness, we refer to the above setting as the no contamination (NC) situation and consider the following four contamination schemes:
1)
follows the multivariate t-distribution with 2 degrees of freedom and covariance matrix
.
2)
follows the multivariate t-distribution with 2 degrees of freedom and covariance matrix
, and randomly choose 2% of
to be
.
3)
, and randomly choose 2% of
to be
.
4)
, randomly choose 2% of
to be
and 2% of
to be
.
We compare the proposed ESL method with the OLS method, the M-estimation (M) method in [26] and the quantile regression (QR) method in [27]. Note that the OLS, M and QR methods follow an estimation procedure similar to that of the ESL method, the main difference being that the objective function is replaced by the corresponding counterpart. To assess the finite sample performance, we calculate the mean and standard deviation (SD) of the estimators of
and
. The corresponding simulation results are displayed in Tables 1-5.
From Table 1, it can be observed that the performance of the M, QR and ESL methods is comparable to that of the OLS method when the error follows a normal distribution and there are no outliers in the data. From Tables 2-5, it can be seen that the M, QR and ESL methods significantly outperform the OLS method, particularly in terms of SD, in several contamination cases; moreover, the ESL method always performs best in these cases. More specifically, Table 2 and Table 3 indicate that the M, QR and ESL methods significantly outperform the OLS method with respect to both
and
, when the error follows a heavy-tailed distribution. Table 4 and Table 5 indicate that the M, QR and ESL methods significantly outperform the OLS method with respect to
, when there are outliers in responses; the OLS, M and QR methods perform rather poorly with respect to
, whereas the ESL method still performs well in this case.
6. Real Data Analysis
In this section, we analyse the CD4 cell study, which was previously analysed in [7] [28]. This dataset comprises CD4 cell counts of 369 HIV-infected men, with a total of 2376 measurements collected at different time points over a period of approximately eight and a half years. The number of measurements per individual varies from 1 to 12, and the time points are not equally spaced. We use square roots of the CD4 counts [28] to make the response variable closer to the normal distribution, and the six related covariates are, respectively, time since seroconversion
, age relative to arbitrary origin
, packs of cigarettes smoked per day
, recreational drug use
, number of sexual partners
and mental illness score
. Note that Figure 1 displays the sample regressogram and the local linear fitted curve for the square root of the CD4 count over time, which reflects a polynomial trend with respect to time. In the following, we use the mean model [1].
Table 1. Parameter estimates (with SD in parentheses) for the NC situation.
Table 2. Parameter estimates (with SD in parentheses) for the first contamination situation.
Table 3. Parameter estimates (with SD in parentheses) for the second contamination situation.
Table 4. Parameter estimates (with SD in parentheses) for the third contamination situation.
Table 5. Parameter estimates (with SD in parentheses) for the fourth contamination situation.
Figure 1. Sample regressogram and local linear fitted curve for the CD4 data: square root of CD4+ number versus time.
where
with
. We use a cubic polynomial to model the generalized autoregressive parameters, that is,
.
We apply the OLS, M, QR methods and the proposed ESL method to the CD4 cell study. To assess the prediction performance, we randomly split the data into three parts, each with 369/3 = 123 subjects. We use the first two parts as the training dataset to fit the model, and then assess the out-of-sample performance on the left-out testing dataset (denoted TD). This process is repeated 200 times. We define the median absolute prediction error (MAPE) as the median of
. To illustrate the estimation robustness of the proposed ESL method compared with the other methods, we re-analyse the dataset by including 5% outliers in the dataset, which are randomly generated by replacing
with
. Moreover, we also re-analyse the dataset by including 5% outliers only in the training data set, in order to assess the robustness in terms of prediction performance. The results are displayed in Table 6. It can be observed that the estimates for
are very similar based on different methods, but the estimates for
based on the ESL method are somewhat different from those of the OLS, M and QR methods in the no outlier case. Moreover, the MAPE of the proposed ESL method is only slightly larger than those of the other methods, indicating that these methods have comparable prediction performance in the case of no outliers. In the 5% outliers case, the MAPEs indicate that the ESL, M and QR methods perform similarly and much better than the OLS method in terms of prediction. Comparing the no outlier case with the 5% outliers case, we can see that the estimates based on the ESL method vary less than those of the other methods, especially in terms of the estimates for
; the ESL, M and QR methods are robust to the outliers, while the OLS method is adversely affected by the outliers.
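For completeness, the following sketch shows one way to organize the repeated random splits and the MAPE computation described above. Interpreting the MAPE as the median of the absolute prediction errors $|y_{ij}-x_{ij}^{\mathrm{T}}\hat{\beta}|$ over the testing dataset is an assumption here, and the fitting function is a placeholder for any of the OLS, M, QR or ESL fits.

```python
import numpy as np

def mape_over_splits(subjects, fit_fn, n_splits=200, test_frac=1/3, seed=0):
    """Median absolute prediction error over repeated random subject splits.
    `subjects` is a list of (X_i, y_i) pairs and `fit_fn(X, y) -> beta` fits the
    mean model on the stacked training data (OLS, M, QR or ESL)."""
    rng = np.random.default_rng(seed)
    n, mapes = len(subjects), []
    for _ in range(n_splits):
        idx = rng.permutation(n)
        n_test = int(round(n * test_frac))
        test, train = idx[:n_test], idx[n_test:]
        X_tr = np.vstack([subjects[i][0] for i in train])
        y_tr = np.concatenate([subjects[i][1] for i in train])
        beta = fit_fn(X_tr, y_tr)
        errors = np.concatenate(
            [np.abs(subjects[i][1] - subjects[i][0] @ beta) for i in test])
        mapes.append(np.median(errors))            # MAPE on this testing dataset
    return np.median(mapes)                         # summary over the repeated splits

# Example usage with an OLS fit on synthetic subjects.
rng = np.random.default_rng(4)
subs = []
for _ in range(60):
    X = np.column_stack([np.ones(4), rng.normal(size=4)])
    subs.append((X, X @ np.array([2.0, 1.0]) + rng.normal(size=4)))
ols = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
print(mape_over_splits(subs, ols, n_splits=20))
```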
7. Conclusions
In this paper, based on the ESL function, we proposed a robust estimation approach for the mean and generalized autoregressive parameters with longitudinal data.
Table 6. Parameter estimates and prediction results for the CD4 data.
The generalized autoregressive parameters resulting from the MCD of the covariance matrix are unconstrained and can be well interpreted by statistical concepts in the framework of time series. Then the mean and generalized autoregressive parameters can be estimated via linear regression models using the ESL function. Moreover, the balance between robustness and efficiency can be achieved by choosing appropriate data-adaptive tuning parameters. Under certain conditions, we established the theoretical properties of the proposed estimators. Simulation studies and real data analysis were also carried out to illustrate the finite sample performance of our approach.
Several further problems need to be investigated. First, the dimension of the covariates in the regression models is assumed to be fixed; it would be interesting to extend our approach to high-dimensional settings. Second, the proposed approach can be extended to nonparametric and semiparametric models; for more discussion along this line, references including [10] [29] may be helpful. Moreover, this paper targets the conditional mean of the response given the covariates, which can be an inadequate summary when the conditional distribution of the response is asymmetric. In this case, the conditional mode may be more useful than the conditional mean, and thus modal linear regression [30] would be an interesting extension.
Acknowledgements
We thank the editor and the referee for their comments. This work was supported by the National Natural Science Foundation of China (No. 11971001) and the Beijing Natural Science Foundation (No. 1182002).
Appendix
Lemma 1 If regularity conditions (C1)-(C5) hold, then with probability approaching 1, there exists a local maximizer of (2), denoted as
, such that
Proof of Lemma 1. Let
. It suffices to show that for any given
, there exists a large constant
such that
(1)
for any p-dimensional vector v satisfying that
. Based on the Taylor expansion, we have
where
lies between
and
. Then, for
, we have
and
Using the Cr inequality and condition (C1), we can obtain that
for
, then we have
Therefore, we can deduce that
. Similarly, we get
. For
, we have
Note that
, so we can choose a sufficiently large C such that
dominates both
and
with a probability of at least
. From condition (C3), we get
, then (1) holds. The proof of Lemma 1 is finished.
Proof of Theorem 1. Let
, then
satisfies the following equation:
where
lies between
and
. It can be easily shown that
Since
, then
. Thus,
. Then,
Notice that
. For any
vector
, let
, where
. By the Cr inequality and condition (C1), we have
where
is a constant independent of i. Then by condition (C5),
converges to a finite positive definite matrix, which is denoted by
here. Thus, we obtain that
So the proof of Theorem 1 is completed by the Lyapunov central limit theorem and Slutsky's theorem.
Lemma 2 If regularity conditions (C1)-(C9) hold, then with probability approaching 1, there exists a local maximizer of (7), denoted as
, such that
Proof of Lemma 2. Since
, we can replace
with
during the proof from now on. Then
can be substituted by
, where
. Let
It suffices to show that for any given
, there exists a large constant
such that
(2)
for any
-dimensional vector
satisfying that
. Based on the Taylor expansion, we have
where
lies between
and
. Recall that
. It should be emphasized that
consists of both covariates and
, and thus
is independent of
conditional on
. In addition, from (8), we have
Then, for
, we have
and
Therefore, we can deduce that
. Similarly, we get
. For
, we have
Note that
, so we can choose a sufficiently large C such that
dominates both
and
with a probability of at least
. From condition (C8), we get
, then (2) holds. The proof of Lemma 2 is finished.
Proof of Theorem 2. Let
, then
satisfies the following equation:
where
lies between
and
. It can be easily obtained that
and
Then
Notice that
. For any
vector
, let
. By the Cr inequality and condition (C1), we have
where
is a constant independent of i. Moreover,
So the proof is completed by the Lyapunov central limit theorem and Slutsky’s theorem.
List of Main Symbols