Statistical Diagnosis for Random Right Censored Data Based on Kaplan-Meier Product Limit Estimate

In this work, we consider statistical diagnostics for randomly right-censored data based on the K-M product limit estimator. From the definition of the K-M product limit estimator, we obtain a formula relating the estimators before and after case deletion. By analogy with the complete-data case, we define a likelihood displacement and a likelihood ratio statistic. A real data application shows that the proposed procedure is valid.


Introduction
Statistical diagnosis, a relatively new branch of statistics, emerged in the mid-1970s.
Over the past 40 years, most scholars have studied how to describe data with a convenient and effective statistical model. For example, diagnosis and influence analysis for the linear regression model have been fully developed (R. D. Cook and S. Weisberg [1]; Bocheng Wei, Guobin Lu & Jianqing Shi [2]). The varying coefficient model is a useful extension of the classical linear model. For the varying coefficient model, especially for B-spline estimation of the parameters, some results on diagnosis and influence analysis are available (Cai, Z., Fan, J., Li, R. [3]; Fan, J., Zhang, W. [4]). However, all the above results are obtained for the uncensored case. In many applications, some of the responses and/or covariates may not be observed, but are censored. For censored data, the usual statistical techniques for complete data are not readily applicable. Because such models require too many assumptions, information is easily lost.
As is well known, the distribution function of a random variable X contains all of the probabilistic information about X. Hence this paper uses the non-parametric maximum likelihood estimate (NPMLE) [5] of the distribution function in the analysis that follows.
The rest of the paper is organized as follows. Right censoring and the K-M product limit estimator are introduced in Section 2. Outlier diagnosis and influence analysis are presented in Section 3. An example is given to illustrate our results in Section 4.

Right Censoring and Kaplan-Meier Product Limit Estimator
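Here, $F$ is the distribution of real-valued random variables $X_1, \dots, X_n$, which are subject to right censoring by $Y_1, \dots, Y_n$: we observe $Z_i = \min(X_i, Y_i)$ and $\delta_i$, where $\delta_i = 1$ if $X_i \le Y_i$ and $\delta_i = 0$ otherwise. For example, $X_i$ could be survival time after an operation, with $Y_i$ the time from the operation to the end of the study. The idea of the K-M product limit estimator is based on conditional probability. Let $T$ denote the survival time and let $t_0 < t_1 < \cdots < t_k$ denote the start of the study and the ordered distinct death times. Then

$$
\begin{aligned}
1 - F(t_k) = S(t_k) = P(T > t_k)
&= P(T > t_k \mid T > t_{k-1})\, P(T > t_{k-1} \mid T > t_{k-2})\, P(T > t_{k-2}) \\
&= P(T > t_k \mid T > t_{k-1}) \cdots P(T > t_1 \mid T > t_0)\, P(T > t_0).
\end{aligned}
$$

We assume that at the start of the study all subjects were alive, so $P(T > t_0) = 1$, and each conditional probability $P(T > t_i \mid T > t_{i-1})$ is estimated by $(r_i - d_i)/r_i$, where $r_i$ is the number of subjects at risk in the study at time $t_i$, and $d_i$ is the number of subjects dying at time $t_i$. The Kaplan-Meier estimator of the CDF is

$$
\hat{F}(t) = 1 - \prod_{t_i \le t} \frac{r_i - d_i}{r_i}.
$$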
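The product limit construction can be illustrated numerically. The following Python sketch is only an illustration under stated assumptions, not the authors' implementation: the function name `kaplan_meier` and the toy sample are hypothetical, the inputs are the observed times $Z_i$ and the indicators $\delta_i$ (1 for an observed death, 0 for a censored time), and a censored time tied with a death time is assumed to be still at risk at that time.

```python
def kaplan_meier(z, delta):
    """Return rows (t_i, r_i, d_i, F_hat(t_i)) at the distinct death times.

    z     : observed times Z_i
    delta : 1 if Z_i is an observed death, 0 if it is a censored time
    """
    death_times = sorted({zi for zi, di in zip(z, delta) if di == 1})
    rows, surv = [], 1.0
    for t in death_times:
        r = sum(1 for zi in z if zi >= t)  # subjects at risk at time t
        d = sum(1 for zi, di in zip(z, delta) if zi == t and di == 1)  # deaths at t
        surv *= (r - d) / r                # product limit step
        rows.append((t, r, d, 1.0 - surv))
    return rows


# Hypothetical toy sample (not the data analysed in this paper).
z = [2, 3, 3, 5, 7, 8]
delta = [1, 1, 0, 1, 0, 1]

rows = kaplan_meier(z, delta)
for t, r, d, f_hat in rows:
    print(f"t={t}  at risk={r}  deaths={d}  F_hat={f_hat:.4f}")
```

The estimate only jumps at observed death times; censored observations enter through the risk counts $r_i$.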

Statistical Diagnostic
For complete data, diagnostic measures for outliers include case deletion and mean shift, and influence statistics include Cook's distance, the W-K statistic, the covariance ratio statistic, the AP statistic, the likelihood distance, and so on. Similarly, we derive several diagnostic measures for right-censored data.

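Let $\hat{F}_{(i)}$ be the K-M product limit estimator of the distribution function after deleting the $i$-th case. Then the following lemma relates $\hat{F}_{(i)}$ to $\hat{F}$.

Proof: By the definition of the K-M product limit estimator, both $1 - \hat{F}(t)$ and $1 - \hat{F}_{(i)}(t)$ are products over the death times $t_j \le t$ of factors of the form $(r_j - d_j)/r_j$, computed with and without the $i$-th case, respectively. Comparing the two products gives the result.

From the lemma, we can construct the relation formula between the estimators, which is the foundation of the following discussion.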
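The effect of case deletion on the estimator can be checked numerically by recomputing the K-M estimate without the $i$-th observation. The Python sketch below is a brute-force illustration under assumptions, not the closed-form relation of the lemma: the helper names `km_cdf` and `km_cdf_deleted`, the evaluation time `t0`, and the toy sample are all hypothetical.

```python
def km_cdf(z, delta, t):
    """K-M estimate of F(t) from observed times z and death indicators delta."""
    surv = 1.0
    for tj in sorted({zi for zi, di in zip(z, delta) if di == 1 and zi <= t}):
        r = sum(1 for zi in z if zi >= tj)  # at risk at tj
        d = sum(1 for zi, di in zip(z, delta) if zi == tj and di == 1)  # deaths at tj
        surv *= (r - d) / r
    return 1.0 - surv


def km_cdf_deleted(z, delta, i, t):
    """K-M estimate of F(t) after deleting the i-th case (0-based index)."""
    z, delta = list(z), list(delta)
    return km_cdf(z[:i] + z[i + 1:], delta[:i] + delta[i + 1:], t)


# Hypothetical toy sample (not the data analysed in this paper).
z = [2, 3, 3, 5, 7, 8]
delta = [1, 1, 0, 1, 0, 1]
t0 = 5

full = km_cdf(z, delta, t0)
shifts = [km_cdf_deleted(z, delta, i, t0) - full for i in range(len(z))]
for i, s in enumerate(shifts):
    print(f"case {i}: change in F_hat({t0}) = {s:+.4f}")
```

Deleting a case with $Z_i \le t$ changes the risk counts at every death time up to $Z_i$, which is why the factors before and after deletion must be compared one by one.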

Likelihood Displacement
The likelihood function $L(F)$ can be computed from the $Z_i$ and $\delta_i$ without knowing the $Y_i$ for uncensored $X_i$. Likelihood displacement, proposed by Cook and Weisberg in 1982, measures influence from the viewpoint of data fitting. Considering the influence of deleting the $i$-th case, the likelihood displacement $LD_i$ is obtained by comparing the likelihood of $\hat{F}$ with that of the case-deleted estimator $\hat{F}_{(i)}$.
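A case-deletion comparison of likelihoods can be sketched as follows. This Python example rests on three assumptions that are not taken from the text and may differ from the authors' definitions: the likelihood is the usual nonparametric form $L(F) = \prod_i [F(Z_i) - F(Z_i-)]^{\delta_i} [1 - F(Z_i)]^{1 - \delta_i}$, the displacement takes the Cook-Weisberg form $LD_i = 2[\log L(\hat{F}) - \log L(\hat{F}_{(i)})]$, and $L(\hat{F}_{(i)})$ is evaluated on the sample without the $i$-th case. The function names and the toy sample are hypothetical.

```python
import math


def km_log_likelihood(z, delta):
    """Log-likelihood of the K-M estimate on the sample (z, delta).

    Assumed form: deaths contribute the jump of F_hat at Z_i,
    censored cases contribute 1 - F_hat(Z_i).
    """
    death_times = sorted({zi for zi, di in zip(z, delta) if di == 1})
    jump_at, surv_at, surv = {}, {}, 1.0
    for t in death_times:
        r = sum(1 for zi in z if zi >= t)
        d = sum(1 for zi, di in zip(z, delta) if zi == t and di == 1)
        jump_at[t] = surv * d / r  # F_hat(t) - F_hat(t-)
        surv *= (r - d) / r
        surv_at[t] = surv          # 1 - F_hat(t)
    loglik = 0.0
    for zi, di in zip(z, delta):
        if di == 1:
            loglik += math.log(jump_at[zi])
        else:
            earlier = [t for t in death_times if t <= zi]
            # No death at or before zi means 1 - F_hat(zi) = 1, contributing log(1) = 0.
            loglik += math.log(surv_at[earlier[-1]]) if earlier else 0.0
    return loglik


def likelihood_displacements(z, delta):
    """Assumed LD_i = 2 * [log L(F_hat) - log L(F_hat_(i))] for every case i."""
    z, delta = list(z), list(delta)
    full = km_log_likelihood(z, delta)
    return [
        2.0 * (full - km_log_likelihood(z[:i] + z[i + 1:], delta[:i] + delta[i + 1:]))
        for i in range(len(z))
    ]


# Hypothetical toy sample (not the data analysed in this paper).
z = [2, 3, 3, 5, 7, 8]
delta = [1, 1, 0, 1, 0, 1]
ld = likelihood_displacements(z, delta)
for i, value in enumerate(ld):
    print(f"case {i}: LD = {value:.4f}")
```

Under this convention the values need not be nonnegative, so cases are compared by how far their value lies from the bulk of the others.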

Likelihood Ratio Statistic
For complete data, a likelihood ratio statistic is available. Similarly, we define the likelihood ratio statistic $R_i$ for censored data based on the K-M product limit estimator.

Numerical Studies
(Malignant Tumour Data) In this section, we consider an example to illustrate the above results. The data come from a clinical research trial (see Andersen [6]). There are 205 cancer patients who were treated at Odense University Hospital and followed until the end of 1977. The survival times of some individuals were censored because of death from other causes or the end of the trial. Figure 1 and Figure 2 show that the first, second, third, fourth and fifth observations are outliers. Indeed, this result is similar to that of Wang Shuling et al. [8].

Figure 1. The values of $LD_i$.

Figure 2. The values of $R_i$.

$L(\hat{F}_{(i)})$ is the likelihood function after deleting the $i$-th case. The results for $LD_i$ and $R_i$ are as follows.

Table 1. The original data and the values of $L(\hat{F}_{(i)})$.