Nelson-Aalen and Kaplan-Meier Estimators in Competing Risks

In this paper, stochastic processes developed by Aalen [1] [2] are adapted to the Nelson-Aalen and Kaplan-Meier [3] estimators in a context of competing risks. We focus only on the probability distributions of the complete (uncensored) failure times of individuals whose causes are known, which leads us to consider a partition of the individuals into subgroups, one for each cause. We then study the asymptotic properties of the resulting nonparametric estimators.


Introduction
Let us consider a lifetime data model in which the event of interest is a failure (or death) due to the $j$th cause, $j \in J = \{1, 2, \ldots, m\}$, where the non-zero integer $m$ is the number of possible causes. By convention, $j = 0$ corresponds to the state of functioning (or of life) of the observed individual. It is assumed that the observation stops when a failure (or death) occurs, but this observation may be right-censored in a non-informative way. Examples of such censoring include the occurrence of an event due to another cause, the withdrawal of the individual from the study, or the end of the study. Under right censoring, the failure times of the individuals and their causes are not known to the experimenter. A data model as described above is commonly called a "competing risks model" (or competitors) and is studied in fields such as medical control, demography, actuarial science, economics and industrial reliability. Andersen et al. [4] give an illustration and details of the mathematical techniques for competing risks in biomedical applications. For example, in the study of AIDS, the competing risks can be 1) death due to AIDS, 2) death due to tuberculosis or 3) death due to other causes, and in this case $m = 3$ (see Figure 1).

It is important to note that in most competing risks data models, the functions that characterize the probability distribution of the variable of interest and the marginal distributions are not always observable (see Tsiatis [5], Heckman and Honoré [6]). The issues to be resolved essentially concern the underlying functions for the different causes and the effects of covariates on the rate of occurrence of competing risks. One of the problems we may face is that the cause of failure of an observed individual can only be known after the autopsy, while we know nothing about the individuals censored during monitoring. In addition, the cause-specific incidence distributions do not satisfactorily describe the various marginal probabilities (failure time $\tau$ for cause $j$) in competing risks models. Assumptions of independence of the competing risks can help ensure observability in some cases, but they are not always reasonable in such models.

Related Works
The estimators of Nelson-Aalen and Kaplan-Meier [3] are generally studied in the literature following two approaches: firstly, the martingale method (Aalen [1] [2]; Andersen et al. [4]; Fleming and Harrington [7]; Prentice et al. [8]) and secondly, the law of the iterated logarithm (Breslow and Crowley [9], Földes and Rejtö [10] or Major and Rejtö [11], Földes and Rejtö [12], Gill [13], Csörgö and Horváth [14], Ying [15] and Chen and Lo [16]). Recently, applications have been made in the context of competing risks (Latouche [17]; Belot [18]). Latouche [17] states that, when planning clinical trials, the evaluation of the number of patients to be included is a critical issue because no such formula exists for the Fine and Gray [19] model. For this purpose, he computes the required number of patients in the competing risks setting for inference based on the cumulative incidence function, and then studies the properties of the Fine and Gray model when it is misspecified. Belot [18] presents data obtained from randomized clinical trials on prostate cancer patients who died from several causes.

Contributions
In this paper, the stochastic processes developed by Aalen [1] [2] are adapted to the Nelson-Aalen and Kaplan-Meier estimators [3] in a context of competing risks (e.g. Aalen and Johansen [20], Andersen et al. [4]). We focus only on the probability distributions of the complete (uncensored) failure times of individuals whose causes are known, which leads us to consider a partition of the individuals into subgroups, one for each cause. We provide a new proof of the consistency of the Nelson-Aalen estimator in the context of competing risks by using the martingale method. Under regularity assumptions on the sequence $(k_n)$ (where $n$ is the number of observed samples), we obtain an almost sure rate of convergence for the Kaplan-Meier estimator [3] which is the same as that obtained by Giné and Guillou [21], namely $O\left(\sqrt{(\log\log n)/k_n}\right)$.

The rest of the paper is organized as follows: Section 2 describes preliminary results and the notation used in the paper, and Section 3 evaluates the cause-specific conditional distribution functions. Section 4 contains the main results of the paper as well as some properties of the estimators obtained. The last section concludes the paper.

Preliminary Results
Lifetime analysis (also referred to as survival analysis) is the area of statistics that focuses on analyzing the time elapsed between a given starting point and a specific event. This endpoint is often called failure, and the corresponding length of time is called the failure time, survival time or lifetime.
Formally, a failure time is a nonnegative random variable (r.v.) $X$ that describes the length of time from a time origin until an event of interest occurs. We will suppose throughout that [ ]. The most basic quantities used to summarize and describe the time elapsed from a starting point until the occurrence of an event of interest are the distribution function and the hazard function. The cumulative distribution function at time $t$, also called the lifetime distribution or the failure distribution, is the probability that the failure time of an individual is less than or equal to the value $t$. It is given for $t \ge 0$ by:

$$F(t) = P(X \le t).$$

The function $F$ is right-continuous, nondecreasing and satisfies $F(\infty) = \lim_{t \to \infty} F(t) = 1$. We denote by $F_-$ the left-continuous function obtained from $F$ in the following way:

$$F_-(t) = \lim_{s \uparrow t} F(s) = P(X < t).$$

The distribution of $X$ may equivalently be dealt with in terms of the survival function, which is given for $t \ge 0$ by:

$$S(t) = 1 - F(t) = P(X > t).$$

The cumulative hazard function is defined for $t \ge 0$ by:

$$\Lambda(t) = \int_0^t \frac{dF(s)}{1 - F_-(s)}.$$

If $F$ admits a derivative with respect to Lebesgue measure on $\mathbb{R}$, the probability density function exists and is defined for $t \ge 0$ by:

$$f(t) = \frac{dF(t)}{dt}.$$

Heuristically, the function $f$ may be seen as the instantaneous probability of experiencing the event.

Under the same differentiability hypothesis, the hazard function exists and is defined for $t \ge 0$ by:

$$\lambda(t) = \frac{f(t)}{1 - F(t)}.$$

The quantity $\lambda(t)$ can be interpreted as the instantaneous probability that an individual dies at time $t$, conditionally on he or she having survived until that time.
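As an illustration of how these quantities relate to one another, the following minimal Python sketch evaluates $F$, the survival function, $f$, $\lambda$ and $\Lambda$ for an exponential lifetime. The exponential model, the function name `exponential_summaries` and the parameter `rate` are assumptions chosen only for this example; they are not part of the paper's setting.

```python
import math


def exponential_summaries(rate: float, t: float) -> dict:
    """Lifetime summaries at time t for X ~ Exponential(rate).

    The exponential model is an illustrative assumption, not the paper's model.
    """
    if rate <= 0 or t < 0:
        raise ValueError("rate must be positive and t nonnegative")
    survival = math.exp(-rate * t)   # S(t) = P(X > t) = 1 - F(t)
    distribution = 1.0 - survival    # F(t) = P(X <= t)
    density = rate * survival        # f(t) = dF(t)/dt
    hazard = density / survival      # lambda(t) = f(t) / (1 - F(t))
    cumulative_hazard = rate * t     # Lambda(t) = integral of lambda on [0, t]
    return {
        "distribution": distribution,
        "survival": survival,
        "density": density,
        "hazard": hazard,
        "cumulative_hazard": cumulative_hazard,
    }


if __name__ == "__main__":
    print(exponential_summaries(rate=0.5, t=2.0))
```

For a continuous $F$, the cumulative hazard coincides with $-\log(1 - F(t))$, which the sketch makes easy to check numerically; the hazard of the exponential model is constant and equal to `rate`.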
For an extensive introduction to lifetime analysis, the reader is referred e.g. to the books of Cox and Oakes [22] and Kalbfleisch and Prentice [23].
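The main difficulty in the analysis of lifetime data lies in the fact that the actual failure times of some individuals may not be observed. An observation is right-censored if it is known to be greater than a certain value while the exact time is unknown. Let $C$ be the nonnegative r.v. with distribution function $G$ that stands for the censoring time of the individual. As before, the nonnegative r.v. $X$ with distribution function $F$ denotes the failure time of the individual. If $X$ is censored, instead of $X$ we observe $C$, which gives the information that $X$ is greater than $C$. In any case, the observable r.v. consists of

$$(T, \delta) = \left(\min(X, C),\ \mathbb{1}\{X \le C\}\right),$$

where $\mathbb{1}\{\cdot\}$ denotes the indicator function. The nonnegative r.v. $T$ stands for the observed duration, which may correspond either to the event of interest ($\delta = 1$) or to a censoring time ($\delta = 0$). In what follows, it is assumed that $X$ and $C$ are independent. Consequently, the random variable $T$ has the distribution function $H$ given by

$$1 - H(t) = (1 - F(t))(1 - G(t)).$$

The following subdistribution functions of $H$ will be needed:

$$H^{(0)}(t) = P(T \le t,\ \delta = 0), \qquad H^{(1)}(t) = P(T \le t,\ \delta = 1).$$

The relation $H(t) = H^{(0)}(t) + H^{(1)}(t)$ is valid for any $t \ge 0$. The relations that connect the subdistribution functions $H^{(0)}$ and $H^{(1)}$ to the distribution functions $F$ and $G$ are given by:

$$H^{(0)}(t) = \int_0^t (1 - F(s))\, dG(s), \qquad H^{(1)}(t) = \int_0^t (1 - G_-(s))\, dF(s).$$

The cumulative hazard function of $X$ can be expressed as:

$$\Lambda(t) = \int_0^t \frac{dH^{(1)}(s)}{1 - H_-(s)}.$$

Let $(T_i, \delta_i)_{1 \le i \le n}$ be $n$ independent copies of the random vector $(T, \delta)$, and let $T_{(1)} \le \cdots \le T_{(n)}$ be the order statistics associated to the sample $T_1, \ldots, T_n$, with $\delta_{(i)}$ the indicator attached to $T_{(i)}$. We define the empirical counterparts of $H^{(0)}$, $H^{(1)}$ and $H$ by:

$$H_n^{(0)}(t) = \frac{1}{n} \sum_{i=1}^n \mathbb{1}\{T_i \le t,\ \delta_i = 0\}, \qquad H_n^{(1)}(t) = \frac{1}{n} \sum_{i=1}^n \mathbb{1}\{T_i \le t,\ \delta_i = 1\}, \qquad H_n(t) = \frac{1}{n} \sum_{i=1}^n \mathbb{1}\{T_i \le t\}.$$

The Kaplan-Meier product-limit estimator is defined for $t \ge 0$ by:

$$1 - \hat{F}_n(t) = \prod_{i:\, T_{(i)} \le t} \left(1 - \frac{\delta_{(i)}}{n - i + 1}\right).$$

The Nelson-Aalen estimator for $\Lambda$ is then defined for $t \ge 0$ by:

$$\hat{\Lambda}_n(t) = \int_0^t \frac{dH_n^{(1)}(s)}{1 - H_{n-}(s)} = \sum_{i:\, T_{(i)} \le t} \frac{\delta_{(i)}}{n - i + 1}.$$

The following relation is valid for $t \ge 0$:

$$1 - H_n(t) = (1 - \hat{F}_n(t))(1 - \hat{G}_n(t)),$$

where $\hat{G}_n$, the Kaplan-Meier estimator of $G$, is defined for $t \ge 0$ by:

$$1 - \hat{G}_n(t) = \prod_{i:\, T_{(i)} \le t} \left(1 - \frac{1 - \delta_{(i)}}{n - i + 1}\right).$$

Let $(k_n)_{n \ge 1}$ be a sequence of integers between $1$ and $n - 1$. In order to obtain asymptotic results, we suppose that the sequence $(k_n)$ satisfies the following hypotheses: ( ) for $n$ large enough, the sequence ( ) for $n$ large enough, the sequence $(k_n/n)$ is non-increasing and there exists a constant $R > 0$ such that log , is a non-increasing sequence such that: e.g. log log log , log log log log log log log , etc. is required when applying the results of Giné and Guillou [21], while Condition ( ) is required when applying the results of Csörgö [26].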
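The two estimators can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard order-statistic forms $1 - \hat{F}_n(t) = \prod_{T_{(i)} \le t} (1 - \delta_{(i)}/(n-i+1))$ and $\hat{\Lambda}_n(t) = \sum_{T_{(i)} \le t} \delta_{(i)}/(n-i+1)$; the function name `kaplan_meier_nelson_aalen` and its list-based interface are hypothetical. Ties between a failure and a censoring time are broken by ranking the failure first.

```python
from typing import List, Sequence, Tuple


def kaplan_meier_nelson_aalen(
    times: Sequence[float], events: Sequence[int]
) -> List[Tuple[float, float, float]]:
    """Return (t, survival estimate, cumulative hazard estimate) after each failure.

    times  : observed durations T_i = min(X_i, C_i)
    events : delta_i = 1 for an observed failure, 0 for a censored observation
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    n = len(times)
    # Sort by time; on ties, failures (delta = 1) are ranked ahead of censorings.
    order = sorted(range(n), key=lambda i: (times[i], -events[i]))
    survival, cumulative_hazard = 1.0, 0.0
    steps = []
    for rank, i in enumerate(order):
        at_risk = n - rank  # equals n - i + 1 for the i-th order statistic (1-based)
        if events[i] == 1:
            survival *= 1.0 - 1.0 / at_risk    # Kaplan-Meier factor
            cumulative_hazard += 1.0 / at_risk  # Nelson-Aalen increment
            steps.append((times[i], survival, cumulative_hazard))
    return steps


if __name__ == "__main__":
    for step in kaplan_meier_nelson_aalen([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1]):
        print(step)
```

Tied failure times are processed one at a time, so the output contains one entry per failure; the value at the last entry of a tied group is the estimate at that time.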
The following result formulates a law of the iterated logarithm-type (LIL-type) result on the increasing intervals mentioned above.
We have 1 : If, in addition, F is assumed continuous, then we also have: .
Proof. See Csörgö [26]; Giné and Guillou [21]. The continuity of $F$ is required to linearize the Kaplan-Meier process. Indeed, if $F$ is continuous, then Hypothesis ( ) Let ( ) ( ) , , , , , , represents the number of individuals who may fail from specific cause $j$ or be censored.
An estimator analogous to (2), computed on the subgroup $A_j$ of individuals failing from cause $j$, is given by (7) and with and The relation between the cumulative hazard rate * is given by (10) The size $Y_j(t)$ of the subgroup $A_j$ of individuals is not observable, because the cause-specific subgroups, $j \in J$, are not all accessible. Nevertheless, we can assign to each individual a probability $\alpha_{ij}$ of belonging to one of the $m$ subgroups. Thus, one can estimate the size $Y_j(t)$ by $\hat{Y}_j(t)$ given by (see e.g. Satten and Datta [27] or Datta and Satten [28]) where $\hat{\alpha}_{ij}$ is the estimator of the probability that individual $i$ of the sample belongs to subgroup $A_j$, the risk set for specific cause $j$. Thus, the final estimators for the cumulative hazard rate

Main Results
Let $T$ be a positive random variable and $C$ be a censoring variable such that Naturally, the information provided over time is regarded as a filtration, which expresses the fact that past information is contained in the current information; hence we have the natural filtration d by independence of and where $\mathcal{F}_t$ is the natural filtration (all information available at time $t$), where the notation ( ) is made possible because $K_i(t)$ is an increasing process. The expression of Thus, we have is the martingale associated with the subject at risk $i$. Hence $\Lambda_i$ is the compensator of the process $K_i$, because it is the integral of the product of two predictable processes.
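The role of the compensator can be checked numerically. The sketch below is a minimal Monte Carlo illustration under assumptions that are not in the paper: a constant hazard for the lifetime, an exponential censoring time, and the hypothetical helper names `martingale_residual` and `mean_residual`. For one subject it computes $K_i(t) - \Lambda_i(t)$, where $\Lambda_i(t)$ is the integral of the at-risk indicator times the hazard, and then averages over simulated subjects; a mean close to zero is what the martingale property predicts.

```python
import random


def martingale_residual(
    failure_time: float, censoring_time: float, hazard: float, t: float
) -> float:
    """K_i(t) - Lambda_i(t) for one subject, assuming a constant hazard."""
    observed = min(failure_time, censoring_time)
    # K_i(t): 1 if an uncensored failure has been observed by time t, else 0.
    counting = 1.0 if (failure_time <= censoring_time and failure_time <= t) else 0.0
    # Lambda_i(t) = integral over [0, t] of 1{observed >= s} * hazard ds.
    compensator = hazard * min(t, observed)
    return counting - compensator


def mean_residual(
    n: int, hazard: float, censoring_rate: float, t: float, seed: int = 0
) -> float:
    """Average residual over n simulated subjects (exponential lifetime and censoring)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        failure_time = rng.expovariate(hazard)
        censoring_time = rng.expovariate(censoring_rate)
        total += martingale_residual(failure_time, censoring_time, hazard, t)
    return total / n


if __name__ == "__main__":
    print(mean_residual(n=20000, hazard=1.0, censoring_rate=0.5, t=1.0))
```

Replacing the true hazard by a wrong value in `martingale_residual` shifts the average away from zero, which is one informal way to see why the compensator is tied to the hazard of the lifetime.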

Theorem 2. Let $T_i$ be an absolutely continuous lifetime and $C_i$ be a censoring variable with an arbitrary distribution.
Let $\lambda_i$ be the hazard function associated with $T_i$.
for $t$ such that ( ) > > 0. 4), (5) and (2) respectively. Using these notations, we can directly obtain the following preliminary result:

Proposition 2. For a given $t \ge 0$ and a given , the stochastic processes defined by is the martingale associated with the specific cause $j$.

Proof. ˆ0, for all 1, , .
where the expectation of the martingale (see Deheuvels and Einmahl [31] [32] for very fine functional laws of the iterated logarithm, valid at a point or on a compact set strictly included in the support of $H$). This result is consistent with that of Stute [33], which constitutes a compromise between the results of Breslow and Crowley [9], Földes and Rejtö [10] or Major and Rejtö [11], and those of Földes and Rejtö [12], Gill [13], Csörgö and Horváth [14], Ying [15] and Chen and Lo [16].
Following Giné and Guillou [34], we say that a non-increasing sequence ( ) where $O$ is the Landau symbol in the almost sure sense, and where $O_{P}$ is the Landau symbol in probability. Both results of the theorem above always provide a rate in probability of uniform convergence of $\hat{F}^*_{jn}$ to $F^*_j$. Firstly, under the Hypothesis ( ) Equality (14) Notice that the assumption of continuity of $F_j$ for $j = 1, \ldots, m$ ensures that $F$ is continuous according to Proposition 1. We then conclude with Theorem 1 and Lemma 1.

Conclusion
In this paper, we have adapted the stochastic processes of Aalen [1] [2] to the Nelson-Aalen and Kaplan-Meier [3] estimators in a context of competing risks. We have focused particularly on the probability distributions of the complete (uncensored) failure times of individuals whose causes are known, which leads us to consider a partition of the individuals into subgroups, one for each cause. We have also provided some asymptotic properties of the resulting nonparametric estimators.
Kaplan and Meier [3] introduced the product-limit estimator for the survival distribution function. The estimator of the cumulative hazard function is the Nelson-Aalen estimator introduced by Nelson [24] [25] and generalized by Aalen [1] [2]. If there are ties between a failure time (or several failure times) and a censoring time, then the failure time(s) is (are) ranked ahead of the censoring time(s).

random variables representing respectively the lifetimes under each of the $m$ competing risks, of index cause, where 0 corresponds to the condition of the individual observed. We notice that $\delta$ and $\xi$ are observable and that $\eta$ is observable only when $T$ is uncensored. We assume that censoring is non-informative. The joint law of $(T, \eta)$ is completely specified by the specific other than the sub-distributions of the specific cause of failure $j = 1, \ldots, m$. The cumulative hazard rate of specific-cause ( )

$\tau$ represents the time that an individual $i$ is subject to the cause $j$. If $T_i$ and $C_i$ are independent, the random variable $Z_i$ admits distribution function of the number of failures observed from cause $j$ in the time interval [ ] of individuals in the sample that survive beyond time $t$. Thus, for any estimator of the distribution function $F^*_j(t)$ of the lifetime in subgroup $A_j$ is defined by specific cause $j$ and the corresponding distribution function

$\tau$ is the time that an individual $i$ is subject to the cause $j$. For a given $t \ge 0$ and an individual $i$ with $i = 1, \ldots, n$, the counting process is defined by: Therefore, if an individual $i$ undergoes the event before time , which indicates whether the individual $i$ is still at risk just before time $t$ (the individual has not yet undergone the event). Therefore, if ( ) by is an $\mathcal{F}_t$-martingale if and only if

Proof. See Breuils ([30], p. 25) and Fleming and Harrington ([7], p. 26).

For a given $t \ge 0$ and a given represents the difference between the number of failures due to a specific cause $j$ observed in the time interval $[0, t]$, i.e. $N_j(t)$, and the number of failures predicted by the model for the $j$th cause. This definition fulfills the Doob-Meyer decomposition. The first result of this paper concerns the consistency of the Nelson-Aalen estimator for competing risks, based on the martingale approach. To prove Theorem 4, we have drawn on results from the inference of empirical processes, given that in order to linearize the Kaplan-Meier process, it is necessary to impose a continuity condition on .
Our second LIL-type result provides almost sure and in-probability rates of convergence of $\hat{F}^*_{jn}$ to $F^*_j$. , j t Λ Hence, we arrive at the result. entails that: