Robust Inference for Time-Varying Coefficient Models with Longitudinal Data

Time-varying coefficient models are useful in longitudinal data analysis. Various efforts have been invested for the estimation of the coefficient functions, based on the least squares principle. Related work includes smoothing spline and kernel methods among others, but these methods suffer from the shortcoming of non-robustness. In this paper, we introduce a local M-estimation method for estimating the coefficient functions and develop a robustified generalized likelihood ratio (GLR) statistic to test if some of the coefficient functions are constants or of certain parametric forms. The robustified GLR test is robust against outliers and the error distribution. This provides a useful robust inference tool for the models with longitudinal data. The bandwidth selection issue is also addressed to facilitate the implementation in practice. Simulations show that the proposed testing method is more powerful in some situations than its counterpart based on the least squares principle. A real example is also given for illustration.


Introduction
The defining characteristic of a longitudinal data study is that individuals are measured repeatedly over a given time period, longitudinal studies are in contrast to cross-sectional studies, in which a single outcome is measured for each individual.The repeated measurements within each subject are generally correlated with each other, but different individuals can be assumed to be independent.The primary advantage of a longitudinal study is its effectiveness for studying changes over time.Statistical research in this field has been very active, and many pa-rametric models have been developed.See Diggle et al. [1], Davidian and Giltinan [2], Vonesh and Chinchilli [3], and the references therein.
As overwhelming longitudinal data exist in biomedical studies, there are increasing demands for a generally applicable inference tool for analysing these datasets.The parametric models are efficient for analysing longitudinal data but may suffer from mis-specification.To reduce possible modeling bias, different nonparametric and semiparametric methods have been studied in the literature, for example Zeger and Diggle [4], Müller [5], Eubank et al. [6], He et al. [7] [8], among others.In particular, a useful nonparametric model for analysing time-varying effects of covariates receives much attention.Examples include Hoover et al. [9], Fan and Zhang [10], Huang et al. [11] and others.These works focus on the least-squares based estimation.While they are useful in some applications, the shortcoming for lack of robustness is naturally raised.This motivates us to consider a robust inference tool for the time-varying coefficient models.
In this paper, we consider a local M-estimation approach based on local polynomial smoother and a robustified "generalized likelihood ratio (GLR)" statistic to test if parts of the coefficients are constants or of certain parametric forms.This in particular allows one to nonparametrically check the goodness-of-fit of the usual linear models widely used in practice (see for example Diggle et al. [1]).We conduct extensive simulations to demonstrate that the proposed estimation method is robust against outliers and error distributions, and that the robustified GLR tests are powerful than its counterpart when the error deviates away from normality.
This paper is organized as follows.In section 2, the local M-estimation approach is introduced; a data-driven bandwidth selection rule is also given.In Section 3, we focus on the robustified GLR tests.Simulations are conducted in Section 4, where the performances of the different tests are compared, and a real example is also used to illustrate the proposed method.Finally, the paper is concluded by a discussion.

Local M-Estimation
Consider the following time-varying coefficient model, where is a zero-mean correlated stochastic process that cannot be explained by the covariates ( ) . As in Hoover et al. [9] we regard the repeated observations ( )  , as a random sample from model (1), that is, where is the observed covariates for the ith subject at time ij t , and For this model, we assume that the measurements on the responses for different subjects are independent, but ij X may be correlated at different time points within each subject, ( ) i t ε 's are also independent for different subjects.Model (1) or ( 2) is a useful extension to the usual linear model for longitudinal/panel data analysis in Lindsey [12], Jones [13], Diggle, et al. [1], Hand and Crower [14], among others.
There are several methods for estimating the coefficient ( ) t β , for example, the smoothing spline method in Brumback and Rice [15], Hoover et al. [9], and Chiang et al. [16], the kernel smoother in Hoover et al. [9], Wu et al. [17], Wu and Chiang [18], and Chiang et al. [16], and other methods such as the two-step estimation in Fan and Zhang [10] and the global smoothing procedure using basis function approximations in Huang et al. [11].These methods are all based on the least squares principle and suffer from the shortcoming of non-robustness.It is worthy of developing a robust estimation and testing method for the model (1).Generally speaking, one can develop different testing methods for different estimating approaches.To introduce our method, we begin with the local polynomial estimation (see Hoover et al. [9]), that is, to find rl b 's to minimize ( ) ( ) where ( ) ( ) ⋅ , and d denotes the order of the polynomial used in smoothing.
The above local least squares based estimation is not robust against outliers and heavy tailed errors.To fix this problem, we propose to estimate the coefficient function by minimizing ( ) ( ) where ( ) ρ ⋅ is an outlier resistant function.The coefficient functions ( ) , the solutions of the above optimization problem.The resulting estimators are so-called the local M-type estimator of ( ) ψ ⋅ , then the solutions of (4) solve the following equa- tions: ( ) ( ) ( ) The above method is in the same spirit of the local M-estimation studied for cross-sectional data in Fan and Jiang [19] and Jiang and Mack [20].It can be shown that the estimator is n -consistent under certain conditions.The estimator involves a selection of the outlier resistant function.Much work in the literature has demonstrated that the Huber's k ψ -class, i.e., ( ) , is satisfactory for robust estimation in the location model and nonparametric regression (Huber [21]; Jiang and Mack [20]), where a method for determining the parameter k was studied in Jiang and Mack [20].However, our experience shows that a rule of thumb for the choice of k, such as where s is the robust standard deviation of the residuals (Serfling [22]), is simple and satisfactory in the present situation.Other choices of the outlier resistant function are also possible, such as ( ) which leads to the least absolute deviation estimation (Jiang et al., [23]).The Newton algorithm can be used to find the solutions to the Equations ( 5).If the initial values of the parameters for iteration are good enough, for example, from the least squares estimation, then the iterative solutions can be found in a few steps, which is theoretically verified in Jiang and Mack [20]) for nonparametrically modeling dependent data.In our simulations, the least squares based estimators in (2) will be employed as the initial values.

Bandwidth Selection
The performance of the estimator in (4) depends on the smoothing parameter h.There are several approaches to the selection of the bandwidth, such as the cross-validation (CV) and generalized cross-validation (GCV) criteria in Hastie and Tibshirani [24].We here extend the CV method to the present situation for determining the bandwidth h.Specifically, denote by ( ) ( ) ˆi t − β the solution of (4) or (5) based on all the observations without the measurements for ith subject.Then the bandwidth h can be estimated by the minimizer ˆcv h of ( ) where The minimization of ( 6) is time-consuming for simulations, even though it is not for real data analysis.We here suggest a pre-determination of ˆcv h before simulations.Specifically, the procedure is detailed as follows: 1) Generate several samples (20 for instance) from (2), then minimize (6) to get the ˆcv h for each sample and compute the average cv h of the ˆcv h 's; 2) Fix the bandwidth cv h in each simulation to find the estimator in (4).In step 1), the computational burden can be further reduced if one uses an easy-evaluated bandwidth as initial value, such as the one from the GCV criterion for the local least squares based estimator.Since the estimated bandwidths for the local least squares estimation and the local M-estimation are highly correlated in general, the minimization of ( ) CV h will be achieved in a few iterations.

Robustified GLR Tests
For simplicity, consider the following testing problem ( ) ( ) where , and where ˆl β 's are the usual M-estimators of the coefficients under 0 H , and ( ) ˆl β ⋅ 's are the local M-estimators of the coefficient functions under 1 H . Define the following testing statistic for the testing problem (7): Intuitively, large values of N T provide evidences against 0 H .The proposed statistic is basically based on the comparison of residuals between the null and the alternative.It can be regarded as a robustification for the GLR testing statistic studied in Fan et al. [25], Fan and Huang [26], and Fan and Jiang ([27] [28]).In particular, if ( ) 2 x x ρ = is employed, then N T is in the same formulas as those in the afore-mentioned papers.For the errors departing away from normality, one can expect the N T to be robust and more powerful than its local least-squares counterpart with ( ) 2 x x ρ = .For more general testing problems such as ( ) ( ) for θ ∈ Θ , which tests whether ( ) ⋅ β admits a certain parametric form, the above testing statistic can be similarly constructed if one replaces the ˆl β with the local M-estimator under 0 H ′ .

Bootstrap Estimate of Null Distribution
To implement the robustified GLR tests, one needs to obtain the null distribution of N T under 0 H .In the following, we use simulations to compute the null distribution of the test statistic for a finite sample.It can be applied to both the local least squares and the local M-estimation methods.The computational algorithm is given as follows.For an easy illustration, we first consider the situation with 1 T .Similar simulation approaches to determining the p-value of a testing statistic were given in Fan and Jiang [27] for additive models and Hui and Jiang [29] for DTARCH models.For the case that i n 's are not equal, the conditional bootstrap method is infeasible, but one can use a resampling subject bootstrap method (see for example Huang, Wu and Zhou, [11]) to replace the bootstrap method above.Specifically, let { } where β is the estimate under 0 H and ˆij ε 's are the residuals from the null model.

Robustness of the Estimation
In this section, we compare the performance of the local M-estimation with the local least squares estimation.
For the outlier-resistant function, we opt for the Huber k ψ -functions in our numerical study: The rule of thumb in Section 2.1 for the choice of k will be employed.Example 1.Consider the following model, where ( ) are respectively from ( ) in such a way that ( ) ( ) . For simplicity, we used the time points 0.
which are equally spaced for each i.The errors are iid for all i and j.We considered the following four distributions for the errors: .9 0,1 0.1 0,5 We simulated 200 samples of size 100 n = and 20 i n = from the model (9).The averages of ( ) ˆt β and the 2.5% and 97.5% sample quantiles of ( ) ˆt β among simulations were calculated at each time points.In addition, we also computed the mean absolute deviation error (MADE) of the estimators at the time points (for . The average of values of the above MADE over simulations was calculated and reported in Table 1 for the error models (A1) and (A3).The results show that the local LS estimator and the local M-estimators have similar accuracy if the error is normal, but the latter performs better than the former if the error deviates away from normality.We display in Figure 1 the estimators of the coefficient functions along with the widths of the envelopes formed by pointwise 2.5% and 97.5% quantiles of the estimators among simulations, where ( ) and the estimated envelopes' widths for ( ) 0 ˆt β are not reported for saving space.It is evident that both the local LS estimator and the local M-estimator have little biases, and the latter is much better than the former in terms of the envelopes' widths when the error deviates away from the normal distribution.
We did simulations with the error distributions in (A2) and (A4).The results are similar and omitted for saving space.imately equal type I errors.Figure 4 shows the powers of the tests based on the local LS and local M-estimators.It demonstrates that the proposed test is more powerful than that based on local LS-estimator when the error deviates away from the normal distribution.Both tests are approximately the same powerful under the normal error.

A Real Example
We here illustrate how to use the proposed method in practice.Consider the body-weight of male Wistar rats dataset in Brunner et al. ([30], Table A11).For this dataset, the objective of the experience is to assess the toxicity of a drug on the body-weight evolution of Wistar rats.A group of ten rats was given a placebo, while a second group of ten was given a high dose of the drug.For each rat in the study, its body-weight was observed once a week over a period of 22 weeks.To check if the body-weights of the two test groups differ in their evolution over time, Brunner et al. [2] compared the time curves of the mean body-weights for both groups, and evaluated the ANOVA-type statistics.Their results do not support the conjecture of different body-weight evolutions in the two groups.
We are interested in investigating the conjecture.Since      treatment group, we subtract the average body-weight in each group from the body-weight of each subject.This will make the estimators of the coefficient functions be approximately zeros at the beginning., β β ′ = β is an unknown parameter vector.This is used to test if the average body-weights of rats evolve over time.
2) ( ) ( ) ≠ .This checks if the drug affects the average body-weights of rats in study.
We used 600 B = bootstrap replicates for computing the null distributions of the testing statistics.For 01 H , the P-values are 0.0117 and 0.005 respectively for the local LS-estimation based test and the local M-estimation based test.It seems that the null hypothesis does not hold, which means that the average body-weight of rats for each group changes over time.For 02 H , the P-values are 0.9633 and 0.8833 for the tests respectively based on the local LS and local M-estimation methods.This evidences that for the dataset the drug has no effect on the average body-weights of rats.This is consistent to the result of Brunner et al. [30].

Discussion
We have introduced a robust inference method based on the local M-estimation method and the robustified GLR test.It is demonstrated that the local M-estimators are robust against outliers and error distributions, and the proposed robustified GLR test is more powerful than its counterpart (the GLR test) under certain situations with heavy tailed errors, while both of them perform well under the normal error.The proposed inference approach seems appealing in robustly modeling longitudinal data.
Our method is also applicable to other estimating methods, such as the global smoothing one in Huang et al. [11], but will not be discussed further.funds provided by the University of North Carolina at Charlotte.Correspondence should be addressed to Jiancheng Jiang at University of North Carolina at Charlotte, USA (E-mail: jjiang1@unc.edu).
and the bandwidth ĥ to calculate the value * N T of the robustified GLR testing statistic.5) Repeat steps 3 and 4 B times to obtain the bootstrap statistics * P-value of the testing statistic N T is the percentage of the bootstrap statistics { }

Figure 1 .
Figure 1.Estimated curves and widths of envelopes.Upper panel: results for the error in (A1); lower panel: results for the error in (A2).(a) and (c): the average of estimated curves; solid-true curves, dash-dotted-local LS estimator, dashed-local M-estimator.(b) and (d): pointwise widths of envelopes for ( ) 1 t β , which were formed by the pointwise 2.5% & 97.5% quantiles of the estimators among simulations; solid-local LS estimator, dashed-local M-estimator.

Figure 2 .
Figure 2.Estimated curves and envelopes.Upper panel: estimated curves for

Figure 3 .
Figure 3. Histograms and estimated probability densities of T N 's.Left panel: based on the local LS-estimator; right panel: based on the local M-estimator with Huber k ψ -function.(a) (b): results under the error in (C1); (c) (d): results under the er- ror in (C2); (e) (f): results under the error in (C3).

Figure 5 (
a) and Figure 5(b) displays the data curves for the two groups.

Figure 5 (
a) and Figure 5(b) exhibit the time-varying feature of the body-weights of rats, it seems reasonable to model the dataset via the following time-varying coefficient model:

(
of average body-weight for the control group, and ( )1 t βreflects that for the treatment group.Since in the beginning the average body-weight of the rats in the placebo group is bigger than that in the

Figure 4 .
Figure 4. Powers of T N 's.Left panel: powers under significance level 95%; right panel: powers under significance level 90%.(a) (b): results under the error in (C1); (c) (d): results under the error in (C2); (e) (f): results under the error in (C3).Solidpower based on the local LS-estimator; dash-dotted-power based on the local M-estimator with Huber's k ψ -function.

Figure 5 .
Figure 5. Data curves and the estimated coefficient functions.Left panel: (a) body-weights for the placebo group; (b) body-weights for treatment group.Right panel: (c) estimated curve for ( ) 0 t β ; (d) estimated curve for ( ) 1 t β ; solid-the

Figure 5
is not, which plausibly reflects that the drug did not affect the average body-weights of rats.Now we consider the following two hypothesis testing problems: