Applied Mathematics
Vol.5 No.4(2014), Article ID:43843,12 pages DOI:10.4236/am.2014.54073
Nelson-Aalen and Kaplan-Meier Estimators in Competing Risks
Didier Alain Njamen-Njomen1,2, Joseph Ngatchou-Wandji3
1Department of Mathematics and Computer Sciences, Faculty of Sciences, University of Maroua, Maroua, Cameroon
2Department of Mathematics, Faculty of Sciences, University of Yaounde 1, Yaounde, Cameroon
3University of Lorraine, Lorraine, France
Email: dangaza@yahoo.fr, didiernjamen1@gmail.com, Joseph.ngatchou-wandji@univ-lorraine.fr
Copyright © 2014 by authors and Scientific Research Publishing Inc.
This work is licensed under the Creative Commons Attribution International License (CC BY).
http://creativecommons.org/licenses/by/4.0/
Received 11 December 2013; revised 11 January 2014; accepted 18 January 2014
ABSTRACT
In this paper, the stochastic processes developed by Aalen [1] [2] are adapted to the Nelson-Aalen and Kaplan-Meier [3] estimators in a context of competing risks. We focus only on the probability distributions of the complete failure times of individuals whose causes of failure are known, which leads us to partition the individuals into subgroups, one for each cause. We then study the asymptotic properties of the nonparametric estimators obtained.
Keywords: Censored Data; Counting Process; Competing Risks; Nonparametric Estimators; Cumulative Incidence Function; Cause-Specific Hazard Function; Conditional Distribution Function
1. Introduction
Let us consider a lifetime data model in which the event of interest is a failure (or death) due to a cause $j \in \{1, \ldots, m\}$, where the non-zero integer $m$ is the number of possible causes. By convention, $j = 0$ corresponds to the state of functioning (or of life) of the observed individual. It is assumed that the observation is stopped when a failure (or death) occurs, but this observation may be right-censored in a non-informative way. Examples of such censoring include a failure due to another cause, the withdrawal of the individual from the study, or the end of the study. Under right censoring, the failure times of the censored individuals and their causes are not known to the experimenter. A data model as described above is commonly called a “competing risks model” and is studied in fields such as medical monitoring, demography, actuarial science, economics and industrial reliability. Andersen et al. [4] give an illustration and details of the mathematical techniques for competing risks in biomedical applications. For example, in the study of AIDS, the competing risks can be 1) death due to AIDS, 2) death due to tuberculosis or 3) death due to other causes, in which case $m = 3$ (see Figure 1).
It is important to note that in most competing risks data models, the functions that characterize the probability distribution of the variable of interest and the marginal distributions are not always identifiable from the observations (see Tsiatis [5], Heckman and Honoré [6]). The issues to be resolved include, in particular, the underlying functions for the different causes and the effects of covariates on the rate of occurrence of the competing risks. One of the problems we may face is that the cause of failure of an observed individual may only be known after an autopsy, while nothing is known about the individuals censored during follow-up. In addition, the incidence distributions (due to specific causes) do not satisfactorily describe the probabilities of the various marginal (failures case
) in competing risks models. Independence assumptions on the competing risks can help ensure identifiability in some cases, but they are not always reasonable in such models.
1.1. Related Works
The Nelson-Aalen and Kaplan-Meier [3] estimators are generally studied in the literature following two approaches: firstly, the martingale method (Aalen [1] [2]; Andersen et al. [4]; Fleming and Harrington [7]; Prentice et al. [8]) and secondly, the law of the iterated logarithm (Breslow and Crowley [9], Földes and Rejtö [10] or Major and Rejtö [11], Földes and Rejtö [12], Gill [13], Csörgö and Horváth [14], Ying [15] and Chen and Lo [16]). Recently, applications have been made in the context of competing risks (Latouche [17]; Belot [18]). Latouche [17] states that, when planning clinical trials, evaluating the number of patients to be included is a critical issue because no such formula exists for the Fine and Gray [19] model. For this purpose, he computes the number of patients in the competing risks setting for inference based on the cumulative incidence function, and then studies the properties of the Fine and Gray model when it is misspecified. Belot [18] presents data obtained from randomized clinical trials on prostate cancer patients who died from several causes.
1.2. Contributions
In this paper, the stochastic processes developed by Aalen [1] [2] are adapted to the Nelson-Aalen and Kaplan-Meier estimators [3] in a context of competing risks (e.g. Aalen and Johansen [20], Andersen et al. [4]). We focus only on the probability distributions of the complete failure times of individuals whose causes of failure are known, which leads us to consider a partition of the individuals into subgroups, one for each cause. We provide a new proof of the consistency of the Nelson-Aalen estimator in the context of competing risks by using the martingale method. Under regularity assumptions on the sequence (
is a sequence of integers such that
and
is the number of observations) we obtain an almost sure rate of convergence for the Kaplan-Meier estimator [3] which is the same as that obtained by Giné and Guillou [21], namely
The rest of the paper is organized as follows: Section 2 describes preliminary results and notations used in the paper and Section 3 evaluates the cause-specific conditional distribution functions. Section 4 contains the main results of the paper as well as some properties of the estimators obtained. The last section concludes the paper.
2. Preliminary Results
Figure 1. Example of 3 risks competing model.
Lifetime analysis (also referred to as survival analysis) is the area of statistics that focuses on analyzing the time duration between a given starting point and a specific event. This endpoint is often called failure and the corresponding length of time is called the failure time or survival time or lifetime.
Formally, a failure time is a nonnegative random variable (r.v.) that describes the length of time from a time origin until an event of interest occurs. We will suppose throughout that
The most basic quantities used to summarize and describe the time elapsed from a starting point until the occurrence of an event of interest are the distribution function and the hazard function. The cumulative distribution function $F$ at time $t$, also called lifetime distribution or failure distribution, is the probability that the failure time $T$ of an individual is less than or equal to the value $t$. It is given for $t \ge 0$ by:
$$F(t) = P(T \le t).$$
The function is right-continuous, nondecreasing and satisfies
and
We denote by $F_-$ the left-continuous function obtained from $F$ in the following way:
$$F_-(t) = \lim_{s \uparrow t} F(s) = P(T < t).$$
The distribution of $T$ may equivalently be dealt with in terms of the survival function $S$, which is given, for $t \ge 0$, by:
$$S(t) = 1 - F(t) = P(T > t).$$
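The cumulative hazard function $\Lambda$ is defined for $t \ge 0$ by:
$$\Lambda(t) = \int_{[0,t]} \frac{dF(s)}{1 - F_-(s)}.$$
When $F$ is continuous, the relation $\Lambda(t) = -\log(1 - F(t))$ is valid for all $t \ge 0$ such that $F(t) < 1$. We can then call $\Lambda$ the (negative) log-survival function.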
If $F$ admits a derivative with respect to Lebesgue measure on $[0, \infty)$, the probability density function $f$ exists and is defined for $t \ge 0$ by:
$$f(t) = \frac{dF(t)}{dt}.$$
Heuristically, the function $f$ may be seen as the instantaneous probability of experiencing the event.
With the same hypothesis of differentiability, the hazard function $\lambda$ exists and is defined for $t \ge 0$ such that $F(t) < 1$ by:
$$\lambda(t) = \frac{d\Lambda(t)}{dt} = \frac{f(t)}{1 - F(t)}.$$
The quantity $\lambda(t)\,dt$ can be interpreted as the instantaneous probability that an individual dies at time $t$, conditionally on he or she having survived until that time.
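As a numerical illustration of the relations between the distribution function, the hazard function and the cumulative hazard function, the following Python sketch checks, for a Weibull lifetime, that integrating the hazard $f/(1-F)$ over $[0, t]$ reproduces $-\log(1 - F(t))$. The Weibull model, its parameter values and all function names are assumptions chosen for this example; they do not come from the paper.

```python
import math

def weibull_cdf(t, shape, scale):
    """Distribution function F(t) = 1 - exp(-(t / scale) ** shape), t >= 0."""
    return 1.0 - math.exp(-((t / scale) ** shape))

def weibull_pdf(t, shape, scale):
    """Density f(t) = dF(t)/dt of the same Weibull distribution."""
    return (shape / scale) * (t / scale) ** (shape - 1) * math.exp(-((t / scale) ** shape))

def cumulative_hazard_numeric(t, shape, scale, steps=20000):
    """Midpoint-rule approximation of the integral of f(s) / (1 - F(s)) over [0, t]."""
    h = t / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * h  # midpoint of the k-th sub-interval
        total += weibull_pdf(s, shape, scale) / (1.0 - weibull_cdf(s, shape, scale)) * h
    return total

# Illustrative parameter values (assumptions).
shape, scale, t = 1.5, 2.0, 3.0
lam_numeric = cumulative_hazard_numeric(t, shape, scale)
lam_closed = -math.log(1.0 - weibull_cdf(t, shape, scale))
print(lam_numeric, lam_closed, (t / scale) ** shape)
```

For the Weibull distribution the cumulative hazard has the closed form $(t/\text{scale})^{\text{shape}}$, so the three printed values should agree up to the quadrature error.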
For an extensive introduction to lifetime analysis, the reader is referred e.g. to the books of Cox and Oakes [22] and Kalbfleisch and Prentice [23] .
The main difficulty in the analysis of lifetime data lies in the fact that the actual failure times of some individuals may not be observed. An observation is right-censored if it is known to be greater than a certain value but its exact value is unknown. Let $C$ be the nonnegative r.v. with distribution function $G$ that stands for the censoring time of the individual. As before, the nonnegative r.v. $T$ with distribution function $F$ denotes the failure time of the individual. If $T$ is censored, instead of $T$ we observe $C$, which gives the information that $T$ is greater than $C$. In any case, the observable r.v. consists of $(Z, \delta) = (\min(T, C), \mathbb{1}\{T \le C\})$, where $\mathbb{1}\{\cdot\}$ denotes the indicator function. The nonnegative r.v. $Z$ stands for the observed duration of time, which may correspond either to the event of interest ($\delta = 1$) or to a censoring time ($\delta = 0$).
In what follows, it is assumed that $T$ and $C$ are independent. Consequently, the random variable $Z$ has the distribution function $H$ given by
$$1 - H(t) = (1 - F(t))(1 - G(t)).$$
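The following subdistribution functions of $Z$ will be needed:
$$H^{(0)}(t) = P(Z \le t, \delta = 0)$$
and
$$H^{(1)}(t) = P(Z \le t, \delta = 1).$$
The relation $H(t) = H^{(0)}(t) + H^{(1)}(t)$ is valid for any $t \ge 0$. The relations that connect the subdistribution functions $H^{(0)}$ and $H^{(1)}$ to the distribution functions $F$ and $G$ are given by:
$$H^{(0)}(t) = \int_{[0,t]} (1 - F(s))\, dG(s)$$
and
$$H^{(1)}(t) = \int_{[0,t]} (1 - G_-(s))\, dF(s).$$
The cumulative hazard function of $T$ can be expressed as:
$$\Lambda(t) = \int_{[0,t]} \frac{dH^{(1)}(s)}{1 - H_-(s)}.$$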
Kaplan and Meier [3] introduced the product-limit estimator for the survival distribution function. The estimator of the cumulative hazard function is the Nelson-Aalen estimator introduced by Nelson [24] [25] and generalized by Aalen [1] [2] .
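For concreteness, a minimal sketch of both estimators in their usual textbook form is given below, assuming a sample of observed durations `times` with failure indicators `deltas` (1 for an observed failure, 0 for a censored observation). The function name and the toy data are hypothetical. Tied observations are grouped by time, so that the failures at a given time are counted while the observations censored at that same time are still in the risk set.

```python
def km_and_na(times, deltas):
    """Kaplan-Meier survival and Nelson-Aalen cumulative hazard at each failure time.

    times  : observed durations (failure or censoring time, whichever comes first)
    deltas : 1 if the duration is an observed failure, 0 if it is censored
    Returns a list of tuples (t, survival_estimate, cumulative_hazard_estimate).
    """
    # Sort by time; at equal times, failures (delta = 1) come before censored ones.
    data = sorted(zip(times, deltas), key=lambda p: (p[0], -p[1]))
    n = len(data)
    surv, cumhaz = 1.0, 0.0
    out = []
    i = 0
    while i < n:
        t = data[i][0]
        at_risk = n - i          # number of observations with duration >= t
        d = 0                    # number of failures exactly at time t
        j = i
        while j < n and data[j][0] == t:
            d += data[j][1]
            j += 1
        if d > 0:
            surv *= 1.0 - d / at_risk   # product-limit update
            cumhaz += d / at_risk       # Nelson-Aalen increment
            out.append((t, surv, cumhaz))
        i = j
    return out

# Toy data (hypothetical): six individuals, two of them censored.
times = [2, 3, 3, 5, 7, 8]
deltas = [1, 1, 0, 1, 0, 1]
estimates = km_and_na(times, deltas)
for t, s, a in estimates:
    print(t, round(s, 4), round(a, 4))
```

On this toy sample the survival estimate drops only at the four failure times 2, 3, 5 and 8, and reaches zero at the last one because the largest observation is a failure.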
Let $(Z_i, \delta_i)$ for $1 \le i \le n$ be $n$ independent copies of the random vector $(Z, \delta)$. Let $Z_{1,n} \le \cdots \le Z_{n,n}$ be the order statistics associated to the sample $Z_1, \ldots, Z_n$. If there are ties between a failure time (or several failure times) and a censoring time, then the failure time(s) is (are) ranked ahead of the censoring time(s).
We define the empirical counterparts of
and
by:
The Kaplan-Meier product-limit estimator is defined for by:
The Nelson-Aalen estimator for is then defined for
by:
The following relations are valid for
where the Kaplan-Meier estimator of
, is defined for
by:
Let be a sequence of integers between
and
In order to obtain asymptotic results, we suppose that the sequence
satisfies the following hypothesis:
for
large enough, the sequence
is non-increasing and
for
large enough, the sequence
is non-increasing and there exists a constant
such that
with
is a non-increasing sequence such that:
Condition is required when applying the results of Giné and Guillou [21] while Condition
is required when applying the results of Csörgö [26].
The following result states a law of the iterated logarithm-type (LIL-type) result on the increasing intervals mentioned above.
Theorem 1 (Csörgö [26] ; Giné and Guillou [21] ) Let be a sequence of integers such that
and, for the almost sure results, satisfying
We have (see Note 1):
If, in addition, is assumed continuous, then we also have:
Proof. See Csörgö [26] ; Giné and Guillou [21] .
The continuity of is required to linearize the Kaplan-Meier process. Indeed, if
is continuous, then
can be approximated by
on the random interval
Precisely, we have the following result.
Proposition 1 (Giné and Guillou [21] ) Let be a sequence of integers satisfying
and Hypothesis
. If
is continuous, then
Proof. See Giné and Guillou [21] .
3. Evaluation of the Cause-Specific Conditional Distribution Functions
Let be a continuous random variables representing respectively the lifetimes in each of the
risks competing,
be the set of index cause, where 0 corresponds to the condition of the individual observed,
the random variable of the event of interest and
the random variable case, where
if
for all
is the distribution function of
the survival function such that
the random variable C of the event censoring right,
and for technical reasons,
such that
if (
and
) and
if
.
We notice that and
are observable and
is so only for
uncensored.
We assume that censoring is non-informative. The joint law is completely specified by the cause-specific incidence distributions
defined by
(1)
which are none other than the sub-distributions of the specific cause of failure
The cumulative hazard rate of specific-cause corresponding to
is given by
(2)
Let be n-sample of observable triplet
where
and
, with
and where
represent the time that an individual
is subject to the cause
If
and
are independent, the random variable
admits distribution function
defined by
Then the Nelson-Aalen estimator of
is given for
by (see e.g. Andersen et al. [4])
(3)
with
and where
(4)
is the counting of the number of failures observed in case of the time interval
and
(5)
is the number of individuals in the sample observation that survive beyond time Thus, for any
(6)
represents the number of individuals at risk, that is, those who may still fail from the specific cause or be censored.
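The counting quantities described above can be sketched as follows, under the assumption that the cause-specific Nelson-Aalen estimator takes its usual form: the sum, over the observed failure times $s \le t$ of the cause considered, of the number of failures from that cause at $s$ divided by the number of individuals still at risk at $s$. The coding of censored observations by cause 0, the function name and the toy data are choices made for this illustration only.

```python
def cause_specific_na(times, causes, cause, t):
    """Nelson-Aalen estimate at time t of the cumulative hazard of one cause.

    times  : observed durations
    causes : observed cause of failure for each duration, 0 meaning censored
    cause  : the cause of interest (a positive integer)
    """
    # Distinct times at which a failure from the cause of interest is observed.
    event_times = sorted({z for z, c in zip(times, causes) if c == cause and z <= t})
    total = 0.0
    for s in event_times:
        failures = sum(1 for z, c in zip(times, causes) if z == s and c == cause)
        at_risk = sum(1 for z in times if z >= s)   # individuals not yet failed or censored
        total += failures / at_risk
    return total

# Toy data (hypothetical): two causes of failure, one censored observation.
times = [1, 2, 2, 4, 5, 6]
causes = [1, 2, 0, 1, 2, 1]
na_cause_1 = cause_specific_na(times, causes, 1, 5)
na_cause_2 = cause_specific_na(times, causes, 2, 5)
print(na_cause_1, na_cause_2, na_cause_1 + na_cause_2)
```

Because every observed failure belongs to exactly one cause, the cause-specific estimates add up to the all-cause Nelson-Aalen estimate computed from the same risk sets.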
An estimator analogous to (2), restricted to the subgroup $A_j$ of individuals failing from cause $j$, is given by
(7)
and with and
The relation between the cumulative hazard rate and survival
in the subgroup $A_j$ is given by (see Note 2)
(8)
A nonparametric estimator of the distribution function of time life in subgroups
is defined by
(9)
is given by
(10)
The size of the subgroup
individuals is not observable due to the inaccessibility of all subgroups of specific causes
Nevertheless, we can assign a probability
to each of the individuals belonging to one of the
subgroups. Thus, one can estimate the size
by
given by (see e.g. Satten and Datta [27] or Datta and Satten [28])
where
is the estimator of the probability that individual number
in the sample belongs to the subgroup
, the risk set of the specific cause
.
due to the specific cause
and the corresponding distribution function
have the respective expressions
(11)
and for
(12)
4. Main Results
Let be a positive random variable and
be a censoring variable such that
and
In this model of random censorship, for a sample
subject to a specific causes
we can observe the couple
where
and
with
and where
is the time that an individual
is subject to the cause
For a given and an individual
with
the counting process is defined by:
Therefore, if an individual undergoes event before time
then
otherwise
We can also define the counting process
The information accumulated over time is naturally described by a filtration, which expresses the fact that past information is contained in the current information; hence we have the natural filtration where
For
and for
we have
If denotes the left limit at
of
we have
since, the quantity takes only the values 0 and 1.
For a given we define the function
which indicates whether the individual is still at risk just before time
(the individual has not yet undergone the event). Therefore:
• if
then,
and
• if then,
where is the natural filtration (all information available at time
), where the notation
refers to formal writing of the stochastic integral
a notation made possible because is an increasing process. The expression of
in terms of the counting process
is given by
Thus, we have
The stochastic process defined for and
by
is the martingale associated with the subject at risk Thereafter
is the compensating process
because it is the integral of the product of two predictable processes.
Theorem 2 Let be an absolutely continuous lifetime and
be a censoring variable with an arbitrary distribution
Let
be the hazard function associated with
Set
and
.
For the process defined by
is a martingale if and only if
for such that
Proof. See Breuils ([30] , p. 25) and Fleming and Harrington ([7] , p. 26).
For a given and a given
, the expressions of
,
and
are those of formulas (4), (5) and (2) respectively. Using these notations, we can directly obtain the following preliminary result:
Proposition 2 For a given and a given
, the stochastic process defined by
(13)
is the martingale associated with the specific cause
Proof.
The martingale represents the difference between the number of failures due to a specific cause
observed in the time interval
, i.e.
, and the number of failures predicted by the model for the
cause. This definition is consistent with the Doob-Meyer decomposition.
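The centering property of this martingale can be checked by simulation. The sketch below relies on a hypothetical model that is not taken from the paper: two independent exponential latent failure times (so that each cause-specific hazard is constant) and an independent exponential censoring time. Under these assumptions, the predicted number of failures at time $t$ is the constant hazard multiplied by the total time at risk accumulated up to $t$. All names and parameter values are illustrative.

```python
import random

def martingale_value(n, rates, censor_rate, cause, t, rng):
    """One simulated value of (observed failures from `cause` in [0, t])
    minus (failures predicted by the constant cause-specific hazard)."""
    observed = 0      # number of failures from `cause` observed in [0, t]
    exposure = 0.0    # total time at risk in [0, t], i.e. sum of min(Z_i, t)
    for _ in range(n):
        latent = [rng.expovariate(r) for r in rates]   # one latent time per cause
        t_fail = min(latent)
        j = latent.index(t_fail) + 1                   # cause of the failure
        c = rng.expovariate(censor_rate)               # censoring time
        z = min(t_fail, c)                             # observed duration
        if t_fail <= c and j == cause and z <= t:
            observed += 1
        exposure += min(z, t)
    return observed - rates[cause - 1] * exposure

rng = random.Random(0)   # fixed seed for reproducibility
n, reps = 200, 1000
draws = [martingale_value(n, [0.5, 1.0], 0.3, cause=1, t=1.0, rng=rng)
         for _ in range(reps)]
mean_per_subject = sum(draws) / (reps * n)
print(mean_per_subject)
```

The printed average should be close to zero, which is the Monte Carlo counterpart of the martingale having zero expectation.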
The first result of this paper concerns the consistency of the Nelson-Aalen estimator for competing risks, based on the martingale approach.
Theorem 3 For such that
we have
Proof.
where the expectation of the martingale (specific for
cause) is equal to zero and where
Indeed,
Hence, we arrive at the result.
Using the fact that
we have:
It follows that is an asymptotically unbiased estimator of
Hence, we arrive at the result.
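The consistency and asymptotic unbiasedness discussed above can be illustrated numerically. The following sketch is a simulation under assumed conditions, not the authors' procedure: two independent exponential latent failure times with rates 0.5 and 1.0, independent exponential censoring, and the usual cause-specific Nelson-Aalen form. In this hypothetical model the cumulative hazard of cause 1 at time $t$ equals $0.5\,t$, and the mean absolute error of the estimator is expected to shrink as the sample size grows.

```python
import random

def simulate(n, rates, censor_rate, rng):
    """Draw n pairs (observed duration, observed cause); cause 0 codes censoring."""
    sample = []
    for _ in range(n):
        latent = [rng.expovariate(r) for r in rates]
        t_fail = min(latent)
        c = rng.expovariate(censor_rate)
        cause = latent.index(t_fail) + 1 if t_fail <= c else 0
        sample.append((min(t_fail, c), cause))
    return sample

def nelson_aalen(sample, cause, t):
    """Sum of 1 / (number at risk) over observed failures from `cause` up to time t.
    Durations are continuous here, so ties have probability zero."""
    ordered = sorted(sample)
    n = len(ordered)
    total = 0.0
    for rank, (z, j) in enumerate(ordered):
        if z > t:
            break
        if j == cause:
            total += 1.0 / (n - rank)   # n - rank individuals are still at risk
    return total

rng = random.Random(1)   # fixed seed for reproducibility
rates, censor_rate, t, reps = [0.5, 1.0], 0.3, 1.0, 100
errors = {}
for n in (50, 500, 2000):
    estimates = [nelson_aalen(simulate(n, rates, censor_rate, rng), 1, t)
                 for _ in range(reps)]
    errors[n] = sum(abs(e - rates[0] * t) for e in estimates) / reps
print(errors)
```

The dictionary maps each sample size to the Monte Carlo mean absolute error at $t = 1$; the values depend on the seed, but the decreasing trend is what the consistency result predicts.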
Our second LIL-type result provides almost sure and in probability rates of convergence of to
for
uniformly over the random increasing intervals
. (See Deheuvels and Einmahl [31] [32] for very fine functional laws of the iterated logarithm, valid at a point or on a compact set strictly included in the support of $H$.) This result is consistent with that of Stute [33], which constitutes a compromise between the results of Breslow and Crowley [9], Földes and Rejtö [10] or Major and Rejtö [11], and those of Földes and Rejtö [12], Gill [13], Csörgö and Horváth [14], Ying [15] and Chen and Lo [16].
Following Giné and Guillou [21], we say that a non-increasing sequence of numbers is regular if there exists a constant
such that for all
We denote by
the following hypothesis:
for
large enough, the sequence
is regular non-increasing and there exists a constant
such that
with
is a non-increasing sequence such that
Theorem 4 Let be a sequence of integers such that
for all
and which satisfies hypothesis
for the almost-sure part. For all
we assume that
is always continuous. Then,
where is the Landau symbol in the almost sure sense, and
where is the Landau symbol in probability.
Both results of Theorem 4 provide rates of uniform convergence of to
for all
over random increasing intervals
To prove Theorem 4, we draw on results from empirical process theory, since linearizing the Kaplan-Meier process requires a continuity condition on the underlying distribution function. Firstly, under Hypothesis
we have the following result:
Lemma 1 Let be a sequence of integers such that
and, for the almost-sure results, such that
is satisfied. The rate of convergence of
to
is given by
Proof. The proof of this result follows straightforwardly from the proof of the first part of Theorem 1 concerning the supremum of
Proof of Theorem 4. The following decomposition is obtained for by means of integration by parts:
(14)
Equality (14) entails that:
Notice that the assumption of continuity of for
ensures that
is continuous according to Proposition 1. We then conclude with Theorem 1 and Lemma 1.
5. Conclusion
In this paper, we have adapted the stochastic processes of Aalen [1] [2] to the Nelson-Aalen and Kaplan-Meier [3] estimators in a context of competing risks. We have focused on the probability distributions of the complete failure times of individuals whose causes of failure are known, which led us to partition the individuals into subgroups, one for each cause. We have also provided some asymptotic properties of the nonparametric estimators obtained.
Acknowledgements
I would like to thank Prof. Nicolas Gabriel ANDJIGA, Prof. Celestin NEMBUA CHAMENI and Prof. Eugene Kouassi for their support and advice. I would also like to especially thank Prof. Kossi Essona GNEYOU for his collaboration and cooperation during the preparation of this paper.
References
[1] Aalen, O.O. (1978) Nonparametric Estimation of Partial Transition Probabilities in Multiple Decrement Models. The Annals of Statistics, 6, 534-545. http://dx.doi.org/10.1214/aos/1176344198
[2] Aalen, O.O. (1978) Nonparametric Inference for a Family of Counting Processes. The Annals of Statistics, 6, 701-726. http://dx.doi.org/10.1214/aos/1176344247
[3] Kaplan, E.L. and Meier, P. (1958) Nonparametric Estimation from Incomplete Observations. Journal of the American Statistical Association, 53, 457-481. http://dx.doi.org/10.1080/01621459.1958.10501452
[4] Andersen, P.K., Borgan, Ø., Gill, R.D. and Keiding, N. (1993) Statistical Models Based on Counting Processes. Springer Series in Statistics, Springer-Verlag, New York. http://dx.doi.org/10.1007/978-1-4612-4348-9
[5] Tsiatis, A. (1975) A Nonidentifiability Aspect of the Problem of Competing Risks. Proceedings of the National Academy of Sciences of the United States of America, 72, 20-22. http://dx.doi.org/10.1073/pnas.72.1.20
[6] Heckman, J. and Honoré, B. (1989) The Identifiability of the Competing Risks Models. Biometrika, 76, 325-330. http://dx.doi.org/10.1093/biomet/76.2.325
[7] Fleming, T. and Harrington, D. (1990) Counting Processes and Survival Analysis. John Wiley & Sons, Inc, Hoboken.
[8] Prentice, R.L., Kalbfleisch, J.D., Peterson, A.V., Flournoy, N., Farewell, V.T. and Breslow, N.E. (1978) The Analysis of Failure Times in the Presence of Competing Risks. Biometrics, 34, 541-554. http://dx.doi.org/10.2307/2530374
[9] Breslow, N. and Crowley, J. (1974) A Large Sample Study of the Life Table and Product-Limit Estimates under Random Censorship. The Annals of Statistics, 2, 437-453. http://dx.doi.org/10.1214/aos/1176342705
[10] Földes, A. and Rejtö, L. (1981) Strong Uniform Consistency for Nonparametric Survival Curve Estimators from Randomly Censored Data. The Annals of Statistics, 9, 122-129. http://dx.doi.org/10.1214/aos/1176345337
[11] Major, P. and Rejtö, L. (1988) Strong Embedding of the Estimator of the Distribution Function under Random Censorship. The Annals of Statistics, 16, 1113-1132. http://dx.doi.org/10.1214/aos/1176350949
[12] Földes, A. and Rejtö, L. (1981) A LIL-Type Result for the Product-Limit Estimator. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 56, 75-86.
[13] Gill, R. (1983) Large Sample Behavior of the Product-Limit Estimator on the Whole Line. The Annals of Statistics, 11, 49-58. http://dx.doi.org/10.1214/aos/1176346055
[14] Csörgö, S. and Horváth, L. (1981) On the Koziol-Green Model for Random Censorship. Biometrika, 68, 391-401.
[15] Ying, Z. (1989) A Note on the Asymptotic Properties of the Product-Limit Estimator on the Whole Line. Statistics & Probability Letters, 7, 311-314. http://dx.doi.org/10.1016/0167-7152(89)90113-2
[16] Chen, K. and Lo, S.-H. (1997) On the Rate of Uniform Convergence of the Product-Limit Estimator: Strong and Weak Laws. The Annals of Statistics, 25, 1050-1087. http://dx.doi.org/10.1214/aos/1069362738
[17] Latouche, A. (2004) Modèles de Régression en Présence de Compétition. Thèse de Doctorat, Université de Paris, Paris.
[18] Belot, A. (2009) Modélisation Flexible des Données de Survie en Présence de Risques Concurrents et Apports de la Méthode du Taux en Excès. Thèse de Doctorat, Université de la Méditerranée, Marseille.
[19] Fine, J.P. and Gray, R.J. (1999) A Proportional Hazards Model for the Subdistribution of a Competing Risk. Journal of the American Statistical Association, 94, 496-509. http://dx.doi.org/10.1080/01621459.1999.10474144
[20] Aalen, O.O. and Johansen, S. (1978) An Empirical Transition Matrix for Non-Homogeneous Markov Chains Based on Censored Observations. Scandinavian Journal of Statistics, 5, 141-150.
[21] Giné, E. and Guillou, A. (1999) Laws of the Iterated Logarithm for Censored Data. The Annals of Probability, 27, 2042-2067. http://dx.doi.org/10.1214/aop/1022874828
[22] Cox, D. and Oakes, D. (1984) Analysis of Survival Data. Chapman and Hall, London.
[23] Kalbfleisch, J. and Prentice, R. (1980) The Statistical Analysis of Failure Time Data. John Wiley, New York.
[24] Nelson, W. (1969) Hazard Plotting for Incomplete Observations. Journal of Quality Technology, 1, 27-52.
[25] Nelson, W. (1972) A Short Life Test for Comparing a Sample with Previous Accelerated Test Results. Technometrics, 14, 175-185. http://dx.doi.org/10.1080/00401706.1972.10488894
[26] Csörgö, S. (1996) Universal Gaussian Approximations under Random Censorship. The Annals of Statistics, 24, 2744-2778. http://dx.doi.org/10.1214/aos/1032181178
[27] Satten, G.A. and Datta, S. (1999) Kaplan-Meier Representation of Competing Risk Estimates. Statistics & Probability Letters, 42, 299-304. http://dx.doi.org/10.1016/S0167-7152(98)00220-X
[28] Datta, S. and Satten, G.A. (2000) Estimating Future Stage Entry and Occupation Probabilities in a Multistage Model Based on Randomly Right-Censored Data. Statistics & Probability Letters, 50, 89-95. http://dx.doi.org/10.1016/S0167-7152(00)00086-9
[29] Gill, R. and Johansen, S. (1990) A Survey of Product-Integration with a View toward Application in Survival Analysis. The Annals of Statistics, 18, 1501-1555. http://dx.doi.org/10.1214/aos/1176347865
[30] Breuils, C. (2003) Analyse de Durées de Vie: Analyse Séquentielle du Modèle des Risques Proportionnels et Tests d’Homogénéité. Thèse de Doctorat, Université de Technologie de Compiègne, Compiègne.
[31] Deheuvels, P. and Einmahl, J. (1996) On the Strong Limiting Behavior of Local Functionals of Empirical Processes Based upon Censored Data. The Annals of Probability, 24, 504-525. http://dx.doi.org/10.1214/aop/1042644729
[32] Deheuvels, P. and Einmahl, J. (2000) Functional Limit Laws for the Increments of Kaplan-Meier Product-Limit Processes and Applications. The Annals of Probability, 28, 1301-1335. http://dx.doi.org/10.1214/aop/1019160336
[33] Stute, W. (1994) Strong and Weak Representations of Cumulative Hazard Function and Kaplan-Meier Estimators on Increasing Sets. Journal of Statistical Planning and Inference, 42, 315-329. http://dx.doi.org/10.1016/0378-3758(94)00032-8
NOTES
1 is the Landau symbol in the almost sure sense and
is the Landau symbol in probability.
2 denotes the product integral (see Gill and Johansen [29]).