Robust Differentiable Functionals for the Additive Hazards Model

In this article, we present a new family of estimators for the regression parameter β in the Additive Hazards Model which represents a gain in robustness not only against outliers but also against unspecific contamination schemes. They are consistent and asymptotically normal and furthermore, and they have a nonzero breakdown point. In Survival Analysis, the Additive Hazards Model proposes a hazard function of the form ( ) ( ) ′ t t z 0 λ λ β = + , where ( ) t 0 λ is a common nonparametric baseline hazard function and z is a vector of independent variables. For this model, the seminal work of Lin and Ying (1994) develops an estimator for the regression parameter β which is asymptotically normal and highly efficient. However, a potential drawback of that classical estimator is that it is very sensitive to outliers. In an attempt to gain robustness, Álvarez and Ferrarrio (2013) introduced a family of estimators for β which were still highly efficient and asymptotically normal, but they also had bounded influence functions. Those estimators, which are developed using classical Counting Processes methodology, still retain the drawback of having a zero breakdown point.


Introduction
In Survival Analysis, a main goal is how to model a random variable * T which is nonnegative and typically continuous and represents the waiting time until some events.A common method for collection of survival-type data consists in deciding on an observation window [ ] 0,τ over which n individuals are followed.Naturally, some events may take longer to occur than the window length τ ; also, some individuals can be lost from the sample due to different reasons (such as changing hospitals in clinical studies).In those cases, instead of an event time a censoring time is observed.In that manner, at the end of the observation window, the researcher ends up with a sample of triplets ( ) , , T T ∆ = = is the indicator that the observed time is uncensored, and i Z is a vector of individual covariates.Statistical models for that type of data are the main goal of the branch of statistics called Survival Analysis, and the relevant literature is by now enormous.Some clasical textbook-long treatises were Kalbfleish and Prentice (1980) [1], Fleming and Harrington (1991) [2], Andersen et al. (1993) [3], Aalen et al. (2008) and [4], among others.
A particular type of survival models with great appeal among practitioners focuses on the so-called hazard function ( ) t λ , which intuitively measures the instantaneous risk of the occurrence of the event at any given moment in time.While the most widespread model for the hazard function was the semiparametric Multiplicative Hazards Model due to Cox (1972) [5], a popular alternative for datasets without proportionality of hazards was the Additive Hazards Model (AHM) presented by Aalen (1980) [6].With time-fixed covariates, the latter proposes that ( ) ( ) , where β is a vector of p nonnegative parameters.An estimation method for β and the nonparametric baseline function 0 λ for this model were first described in a seminal article by Lin and Ying (1994).They proposed an estimating equation for the Euclidean parameter β which was independent of ( ) 0 t λ and which had the additional benefit of yielding an estimate in closed form, in addition to being consistent and asymptotically normal.It's drawback, however, lies in the sensitivity to outliers.
Within the Cox model, the potential harmful effects of outliers were commented by Kalbfleisch and Prentice (1980, ch. 5) [1] and Bednarski (1989) [7].Robust alternatives were first introduced by Sasieni (1993aSasieni ( , 1993b) [8] [9] by essentially modifying the Cox's partial likelihood score function introducing weight functions.Along the same line, important work had been developed by Bednarski (1993) [10], who proposed estimators that were consistent and efficient not only at the model but also on small contaminated neighbourhoods.His estimators had the advantage of being Fréchét differentiable for a wide class of weight functions.
As for the Additive Hazards Model, the proposal of robust alternatives has received much less atention in the literature.In Álvarez and Ferrario (2013) [11], we introduced a family of estimators for the Euclidean parameter β in the AHM by introducing weights in Lin and Yings' estimating equation.The weight functions are carefully chosen so that the estimators remain Fisher-consistent and asymptotically normal, but they additionally represent a gain in robustness at the price of a modest loss in efficiency.The proposed family of estimators exhibits a gain in robustness against the classical LY (Lin and Yings) estimator because they have a bounded influence function.That type of robustness is qualitative in nature, and it means intuitively that an estimation method is able to tolerate a very small proportion of extreme values.To see this, recall that the Influence Function is heuristically the population version of the so called Sensitivity Curve.

(
) ( ) where ˆn β is the estimator from the pure (noncontaminated) sample and ( ) δ is the estimator in a contaminated sample where one random observation has been replaced by the triplet ( ) , , t z δ .i.e. the population contamination model for the triplet ( ) where H is the noncontaminated distribution that belongs to the additive hazards family and Q represents a point mass at its argument.For the practitioner, estimators with bounded influence functions are of interest when (s) he seeks a guard against a very small proportion of outliers.
Appart from the fact that the contamination scheme above is very specific, a further drawback of the estimators presented in Álvarez and Ferrario (2013) [11] is that they have a zero breakdown point.Heuristically this means that just a small proportion of contamination, strategically located, is sufficient to drive the estimators nonsensical.Different notions and measures of robustness and their implications are developed in many classical books, such as Maronna, Martin and Yohai (2006) [12], Huber and Ronchetti (2009) [13] and Hampel et al. (1986) [14].
In this article we propose a new family of robust estimators for the additive hazards model in a manner similar to Bednarski (1993) [10].This is, we start from Lin and Yings' estimating equation and modify it by introducing appropriate weight functions that retain consistency and assymptotic normality while improving robustness, in the sense that the resulting estimating functionals are Fréchét differentiable about small neighborhoods of the true model.That type of differentiability entails three important consequences: 1) that the proposed family of estimators has bounded influence functions; 2) that they have a strictly positive (nonzero) breakdown point; and 3) that consistency and asymptotic normality hold about small neighbourhoods of the true model for generic contamination schemes, i.e. our family of estimators resists not only the outlier-type contamination presented in (2) but also small deviations in the structure of the model itself.For instance, one could contaminate with model a which does not have additive hazards, or with a model in which T and ∆ are dependent, even conditional on Z.That makes the proposed family of estimators attractive from the practitioner point of view, as a backup against model misspecification.
The advantage of the estimators we present in this paper over previous proposals arises whenever a dataset contains outliers.When a sample is contaminated by unusual observations, the classical estimator (Ling and Yings) rapidly becomes nonsensical (in that its value drifts away towards zero or infinity).The estimators in Álvarez and Ferrario [11] on the other hand, while they resist contamination by large times or large values of the covariates, they exhibit no advantage against more involved types of contamination.Here we develop a family of estimators that resist arbitrary contamination schemes.This paper is organized as follows, in section we introduce the estimating method and we construct explicitely the Additive Hazards Family of distribution functions for survival data.In subsection we prove that our estimators are Fréchét differentiable.That entails asymptotic normality not only at true distributions in the additive hazards family, but also under contiguous alternatives.In order to assess the performance of the proposed method in small samples, section contains a small simulation study which serves two purposes: 1) it illustrates the improvement of our proposed estimators from the robustness point of view against the classical counterparts; and 2) it exhibits a non-zero breakdown point which is apparently fairly high.A simulation approach to the breakdown point is important because it is not feasible to compute it analytically.That is in part beacause the calculations involve are formidable, as they involve identifying the worst possible contaminating distribution.But more importantly, it is because the breakdown point depends on the joint distribution of the triplet ( ) , , T Z ∆ , wich is only specified semi- parametrically in the Additive Hazards Model, i.e. even if available, the closed-form expression for the theoretical breakdown point would depend on unknown quantities.We have written computing code for the method proposed in this article in the form or R-scripts, which is available from the authors upon request.All the proofs are presented as succintly as possible in the Appendix; further calculations can be found in Julieta Ferrario's PhD dissertation [15].

Robust Differentiable Estimators
be the counting process which records the occurrence of the event for individual i, this is ≤ , so that it jumps from zero to unity at the random time * i T ; and let be the so-called at risk process defined by ( ) ( ) ∧ > , which denotes that by time t neither the terminal event nor censoring has occurred for individual i, so that (s) he is still at risk.For the Additive Hazards Model, Lin and Ying (1994) [16] proposed the estimating equation where for a column vector v, we denote the matrix 2 : , and we define the process . This yields an estimator in closed form Using classical Counting Process theory, Lin and Ying prove that their ˆn β is consistent and asymptotically normal, and they provide a formula for the estimation of the asymptotic variance.
In order to propose a Fréchét differentiable alternative to the classical (Lin and Ying's) estimator we need to express the estimator as a functional of the joint empirical distribution function and we need to make explicit the structure of the Additive Hazard Family of distributions.We pursue this as follows.

Construction of the Additive Hazards Family
Event times: Let * T represent the event time, which could be unobserved when censored.Given the covariate values { } , and the density function is

Covariates:
The covariates ~Z Z F are nonnegative and nondegenerate.Censoring: Conditional on Z, censoring and event times are independent, i.e.
Observed times: Due to censoring, the observed times are We now develop the joint bivariate distribution function
Censoring indicator: Let ( ) Thus taking the derivative with respect to t we obtain We define that a cummulative joint distribution function where we introduce the process and the n  's are the empirical distributions.
In Alvarez and Ferrario (2013) [11] we illustrated that the classical estimator is very sensitive to outliers and we have shown that its influence function is unbounded.In this article we propose an alternative family of estimators which is robust not only against outliers but also against unspecific contamination, in that the defining functional is not only continous but also Fréchét differentiable.This entails a nonzero breakdown point and bounded influence curve.As a reference, the implication and uses of Fréchét differentiable statistical functionals in Asymptotic Statistics and Robust Statistics are thoroughly presented in Bednarski (1991) [17].

Fréchét Differentiability
In order to define contamination ε-neighborhoods, let H ∈  be a distribution in the additive hazards family with true parameter 0 p β + ∈  and let G be some contaminating distribution for the triplet ( ) point mass at some triplet ( ) , , t z δ this contamination scheme corresponds to what are most usually called outliers.Our formulation is more general in that an arbitrary measure G may introduce model misspecification also from a disruption of the additive hazards property, of the contitional independence of the event and the censoring times given the covariates, or any other feature of the additive hazards family.
We propose here an estimating equation by introducing weight functions in the classical formulation, i.e. , where :  is a bivariate weight function in a set  , the properties of which will be enume- rated below.Also, Naturally, in the the special case where ( ) , 1 W u z = we obtain the classical (Lin and Yings') estimator.Instead, if , W u z w u w z = and both factors are either deterministic or predictable stochastic processes, this gives the estimators proposed in Álvarez and Ferrario (2013) [11].In that article, the choice of weight function led to estimators in closed form, but more importantly, it made possible for all the properties and proofs to be developed using Counting Process Martingale Theory.We depart from that treatment in this article, as we specify the weight functions differently, i.e. we do not seek here predictability as stochastic processes, but Fréchét differentiable functionals.
Let us denote by In order to find a linear approximation for we need to choose some type of differentiability.We opt here for differentiability in the sense of Fréchét (or strong differentiability).This type of differentiability is stronger than continuity in that it implies the existence of a linear functional approximation, i.e. for any where FD is a linear functional called "Fréchét derivative".Notice that we opt here for a uniform type of differentiability over  .This is important in order to allow for instance adaptive choice of the weight function, such those based on preeliminary estimators of β .In contrast with this generality, data-dependent choices were explicitely excluded in our first proposal of estimators in [11], for that mechanism would automatically destroy the precitability of the stochastic weight processes.
In order to avoid excessive notation we will in the sequel develop the proofs without censoring.: In order that our family of estimators become Fréchét differentiable we will need the following assumptions.

A2) All the functions in *
 vanish outside some bounded set, are absolutely continuous and have joinlty bounded variation.The set *  is compact with respect to the supremum norm.Assumptions A1) and A2) ensure differentiability.The compactness assumption in A2) is needed to allow posibly adaptive choices of W based on some preeliminary estimate of β (i.e. in a data-dependent manner).
The assumption of joinly bounded variation is needed in order to obtain uniform Fréchét differentiability over  .
We seek now a linear approximation of ( )

L G W L H W L G W L H W L H W L H W
For the first difference in functionals above, the following Lemma gives a linear approximation: Lemma 1.Under assumptions A1) and A2), ( . Moreover, ( ) As for the second difference in (11) we have: where ( ) Further, the following result gives a bound of ( ) in terms of ( ) G H − : Lemma 3.Under assumptions A1) and A2) there are constants .
At this point, for further results we need to add another assumption that guarantees the existence of the inverse of ( )

namely. A3)
There is a pair of constants 1 a , 0 0 a > , so that for all W ∈  , the determinant is bounded, i.e.
( ) Thus, the consistency of the estimator in a neighborhood of  is given by the following theorem: Theorem 1.Let the family of functions  satisfy Assumptions A1) though A3).Then there exist 0 ε > and , the equation ( ) Moreover, Fréchét differentiability is asserted as follows: denote a solution of the Equation () If the class of functions  satisfy Assumptions A1) through A3) then: This implies that the Fréchét derivative of ( ) In the following theorem we investigate convergence in distribution under contiguous alternatives to some distribution in the additive hazards family H ∈  with true parameter 0 β : Theorem 3. Let n  be the empirical distribution of a sample of size n from a distribution ( ) for some constant 0 0 c > .Assume the class of functions  satisfies A1) through A3).Then for all 0 ε > and 0 δ > there exists 0 n so that for all , and W ∈  , we have δ is a point mass at ( ) : , ,  x t z = ∆ .The result above implies that asymptotic normality holds not only under the true model but also under contiguous alternatives.

Simulations
In this section, we evaluate the performance of our proposed family of estimators via simulations.Specifically, we carry out three simulation experiments choosing for simplicity a single covariate ( ) and a true additive hazard function Z χ and, in order to avoid censoring 1000 C = .In all cases, in the estimating Equation ( 8), we opt for ( ) ( ) , where M is the lower % m percentile of the z's.
In the first simulation we study the behavior or our estimator, denoted "RD" (Robust Differentiable) for increasing sample sizes.We take 50 n = , 200, 500, 1000 and 10,000.In the function ( ) W t z , we select 90 m = , 95 and 99.Table 1 compares our estimators with the classical ones, denoted "LY" (Lin and Yings').We perform 100 replicates and average out the results.We observe in all cases a very good performance of the of the robust estimators at very small price in increased standard error, which decreases as m or n increase.
In the second simulation, we do a comparison among the classical estimator (LY), the bounded-influencefunction (BIF) estimators proposed in Álvarez and Ferrario (2013) [11], and the ones proposed in this paper (RD).This is done under outlier-type contamination, where an increasing percentage of the sample was replaced by a large covariate value equal to 10.We take 100 replicates for a sample size of 200 n = . In Table 2, we show the results of this experiment.For the BIF estimators we take the weight function

W t z as z =
 where  .With that weight function, 90% of the observations were unmodified if Z had an exponential distribution (e.g.Álvarez and Ferrario 2013 [11]).For the RD estimators we opt for 90 m = .We observe that while both the Classical and the BIF estimators appear to break down rather fast, the robust estimator behaves very well.It is worth to emphasize that the type of contamination here is not the worst possible one, so that there is no reason to expect the good performance of the RD estimators to extend to other contamination schemes.
Lastly, we carry out a third simulation experiment in order to detect what the breakdown points of the RD estimators may be under a different type of model departure.As a model for the contaminating distribution, we chose point masses on the line 4 0.25 . This artificially introduces outliers in the sample as shown in Figure 1, where the red observations in the first plot are replaced by the blue-colored points shown in the second plot.This graphical illustration shows that the contaminating distribution is very different from the proposed Additive Hazards Model and it may thus have the hability to severely affect the estimates.If, instead, we have contaminated by large values of z (high leverage) or large values of t (outliers), keeping the other variable unchanged, the potential harmfull effects are greatly diminished.This is because we can see in the first (uncontaminated) plot many of such points.The simulation results of 200 replicates are shown in Table 3, for  .We observe that the breakdown point seems to be about 30% for 70 m = and at about 10% for 10 m = .This is intuitive as the constant m regulates the trimming, given our choice of the weight function.
Intuitively, the finite sample breakdown point of an estimator is the largest proportion of contaminated observations, and a method can resist before the estimates become nonsensical, which usually means that the estimate drifts away towards zero of infinity, or in general towards the boundaries of a parameter space.Equivalently, its functional version is called the asymptotic breakdown point and it measures the largest proportion of contamination.A statistical functional could tolerate before becoming nonsensical in the same sense (e.g.Maronna, Martin and Yohai 2006 [12] for formal definitions).It is noteworthy that either in its finite sample or in its asymptotic version, calculating a breakdown point requires identifying the worst possible type of contamination.This would depend on the joint distribution of the triplet ( ) , , T Z ∆ , which is only partially specified in the Additive Hazards Model.Therefore, the breakdown point cannot be calculated explicitly.Nor is it feasible to provide reasonable bounds.For that reason, it is not possible to give a numeric value for the breakdown point and its assesment via simulations becomes illustrative.An extensive investigation of the breakdown point under different distributions ( ) , , H t z δ and under different weight functions ( ) H .

∫∫ ∫∫∫
where after distributing the inner brackets 1 L , through 4 L are each of the integral terms above.By assumption A2), we can choose large enough values . The integral within the argument , where in order to simplify notation, we introduced the operator [ ] ( ) ( ) ( ) ( ) Denote also the set [ ] 1 0, . By assumption A2) this integral is bounded.i.e. for all for some finite constant b c .A similar application of integration by parts and the assumptions gives that for all u K ∈ , ( ) for some finite constant c c .Lastly, we apply the same methodology to the integrals

∫∫
Similar calculations hold for the other i L 's.Detailed calculations can be found in Juieta Ferrario's Ph.D. dissertation [15].Finally, joining all bounds together we can assert that function and z is a vector of independent variables.For this model, the seminal work of Lin and Ying (1994) develops an estimator for the regression parameter β which is asymptotically normal and highly efficient.However, a potential drawback of that classical Now we express the clasical estimator ˆn β as a (nonlinear) functional of the joint empirical distribution function of ( ) Let 0 B β ∈ , where B denotes a open bounded subset of p+  and take a member in the additive hazards family H ∈  corre- sponding to the true value of the parameter 0 β .Take further a set of functions ∞ .With the above, we define the family of weight fuctions  with which we work in this article by

Proof of Theorem 1 .
integrals are defined similarly.Following the same arguments as in Lemma 1, relying on integration by parts and Assumptions A1)-A2), we see that all the terms are ( )o G H ∞ −, which finishes the proof.Detailed calculations are shown in[15].By Lemmas 1 and 3, for all W ∈  and B β ∈ , there exists some constant 0 of B. Also, by Equation (18), if some B β ∈ , its image ( ) g B β ∈ too.By Brouwer's fixed point theorem, there exists B β * ∈ for which ( )

Table 2 .
Comparison of estimators with outliers.

Table 3 .
Comparison of estimators under model contamination.