Local Influence Analysis of Varying-Coefficient Model with Random Right Censorship
1. Introduction
Local influence analysis was proposed from the viewpoint of differential geometry [1]. Over nearly thirty years, diagnostics and influence analysis for the linear regression model have been fully developed (Ref. [2,3]). The varying-coefficient model is a useful extension of the classical linear model and has been widely applied in statistical modelling; see, for example, Ref. [1,4-6]. However, all of the above results were obtained for uncensored data. In many applications, some of the responses and/or covariates are not fully observed but are censored. For censored data, the usual statistical techniques for complete data are not readily applicable. When the response is censored, the relationship between the response and the covariates has been widely studied in the literature [7-10].
So far, the local influence analysis of the varying-coefficient model with random right censorship has not appeared in the literature; this paper attempts to study it. The paper is organized as follows. Local influence is introduced in Section 2, the model and the estimators are introduced in Section 3, the statistical diagnostics are given in Section 4, and an example illustrating our results is given in Section 5.
2. Local Influence
Ref. [2,3] discuss the method of local influence analysis. Let
be an unknown k-dimensional parameter whose domain is an open subset of the Euclidean space
.
is an objective function (for example, a likelihood function or a penalized log-likelihood function).
is an n-vector of perturbation factors, for example case weights or small shifts. Let
be the perturbed model, whose objective function is
.
is the estimate obtained from
. Suppose that
satisfies
and
, where
has continuous second-order partial derivatives and
is a function of
. Geometrically,
denotes the n-dimensional surface
(1)
This surface is called the influence graph, which varies with
. The rate of change of the influence graph at
reflects the sensitivity of the model, where
corresponds to the unperturbed model. This method is called local influence analysis. Cook proposed using the influence curvature to measure the change of the influence graph near
.
Ref. [2,3] pointed out that the influence curvature of
is given by
(2)
where
is the second derivative of
with respect to
, and
(3)
D and
are
matrices, where
.
The influence matrix is given by
(4)
Formula (2) shows that the maximal influence curvature
, where
is the eigenvalue of
with the largest absolute value, and
is the corresponding eigenvector, which is called the direction of maximal influence curvature. Ref. [5] pointed out that the diagonal elements of the influence matrix are also important diagnostic statistics.
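The quantities of this section can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: it assumes Cook's standard normal-curvature form C_d = 2|d' Delta' Lddot^{-1} Delta d| for a unit direction d, with influence matrix F = Delta' Lddot^{-1} Delta, where Delta is the k x n matrix of mixed second derivatives of the perturbed objective and Lddot is the k x k Hessian of the unperturbed objective. Sign and scaling conventions differ between references, and the function names `local_influence` and `curvature` are hypothetical.

```python
import numpy as np

def local_influence(delta, l_ddot):
    """Return (F, c_max, d_max) under the assumed form F = delta' l_ddot^{-1} delta.

    delta  : (k, n) mixed second derivatives at the unperturbed point (assumed input)
    l_ddot : (k, k) symmetric Hessian of the unperturbed objective (assumed input)
    """
    f_mat = delta.T @ np.linalg.solve(l_ddot, delta)   # n x n influence matrix
    f_mat = (f_mat + f_mat.T) / 2.0                    # symmetrize against rounding error
    eigvals, eigvecs = np.linalg.eigh(f_mat)
    idx = np.argmax(np.abs(eigvals))                   # eigenvalue of largest absolute value
    c_max = 2.0 * abs(eigvals[idx])                    # maximal curvature
    d_max = eigvecs[:, idx]                            # unit direction of maximal curvature
    return f_mat, c_max, d_max

def curvature(f_mat, d):
    """Curvature 2|d' F d| in the direction d, after normalizing d to unit length."""
    d = np.asarray(d, dtype=float)
    d = d / np.linalg.norm(d)
    return 2.0 * abs(d @ f_mat @ d)
```

Large components of `d_max`, or large diagonal entries of `f_mat`, point to the observations to which the fit is most sensitive under the chosen perturbation.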
3. The Model and Estimators
Let Y be the response variable and
be its associated covariates. The varying-coefficient regression model assumes the following structure:
(5)
where
is of dimension
and
is a p-dimensional vector of unknown coefficient functions.
is a stochastic error with
.
Consider the model (5), where Y is the survival time. Let C be the censoring time associated with the survival time Y. Assume that Y and C are conditionally independent given the associated covariates
. Denote
and
, where
is the indicator function. The observations are
, which are random samples from
, where
. Thus, instead of observing
, we observe the pairs
, where
and
. Observations on
for which
are uncensored, and observations on
for which
are censored. Model (5) is then called the varying-coefficient regression model with random right censorship. Let
be the distribution function of
, let G be the common distribution function of
, and
. Note that
and
.
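The observation scheme can be illustrated with a small sketch. It assumes the usual random right censorship convention, in which the observed time is the minimum of the survival time Y and the censoring time C and the indicator equals 1 when Y is observed (Y <= C). The function name `censor` and the array names are hypothetical.

```python
import numpy as np

def censor(y, c):
    """Return observed times z = min(y, c) and indicators delta = 1{y <= c}."""
    y = np.asarray(y, dtype=float)
    c = np.asarray(c, dtype=float)
    z = np.minimum(y, c)           # what is actually observed
    delta = (y <= c).astype(int)   # 1 = uncensored, 0 = censored
    return z, delta

# Hypothetical latent survival and censoring times, for illustration only.
rng = np.random.default_rng(0)
y_latent = rng.exponential(scale=2.0, size=8)
c_latent = rng.exponential(scale=3.0, size=8)
z_obs, delta_obs = censor(y_latent, c_latent)
```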
Lemma
,
.
Proof. Since
![](https://www.scirp.org/html/16-7401195\9cbfd008-a12d-448d-859f-3e4e2ffc9ab2.jpg)
and
![](https://www.scirp.org/html/16-7401195\d923095a-785d-4e02-b716-ee24472c8aef.jpg)
thus
,
.
Now suppose that
follows the model
(6)
where
are i.i.d. and
,
. In practice, we replace
with
, which is the Kaplan-Meier product-limit estimator of
(Ref. [11]). The expression of
is given as follows:
(7)
where ![](https://www.scirp.org/html/16-7401195\832a64a2-cf36-40d0-922f-19d95d15d469.jpg)
.
Let
; model (5) is then transformed into the following varying-coefficient regression model
(8)
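The transformation step can be sketched as follows. The sketch assumes one common choice, the synthetic response Y* = delta * Z / (1 - G_hat(Z)), with G_hat the Kaplan-Meier product-limit estimate of the censoring distribution, in which censored observations (delta = 0) play the role of events. The exact transformation used here may differ; the function names are hypothetical and ties among the observed times are assumed absent.

```python
import numpy as np

def km_censoring_survival(z, delta):
    """Product-limit estimate of 1 - G at each observed z_i (assumes no ties)."""
    z = np.asarray(z, dtype=float)
    delta = np.asarray(delta)
    n = len(z)
    order = np.argsort(z, kind="stable")
    d_sorted = delta[order]
    ranks = np.arange(1, n + 1)
    # A censored point contributes the factor (n - i) / (n - i + 1); others contribute 1.
    factors = np.where(d_sorted == 0, (n - ranks) / (n - ranks + 1.0), 1.0)
    surv_sorted = np.cumprod(factors)   # product over all Z_(j) <= Z_(i)
    surv = np.empty(n)
    surv[order] = surv_sorted           # map back to the original ordering
    return surv

def synthetic_response(z, delta):
    """Assumed transformation: Y* = delta * Z / (1 - G_hat(Z)); censored cases get 0."""
    z = np.asarray(z, dtype=float)
    delta = np.asarray(delta)
    surv = km_censoring_survival(z, delta)
    y_star = np.zeros(len(z))
    unc = delta == 1
    y_star[unc] = z[unc] / surv[unc]
    return y_star
```

Uncensored observations are inflated to compensate for the censored ones, which are set to zero, so that a complete-data estimator can be applied to the transformed responses.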
Now we want to estimate the unknown coefficient function vector based on the transformed data. For the varying-coefficient model, many estimators are available for
. Here we use the B-spline estimate
.
Let
be the knots in
,
and
be the basis functions of the m-th order B-spline, and let
be the space of m-th order B-spline functions. By Lemma 1.2 of Ref. [3], every smooth coefficient function
can be approximated by a B-spline function
. The B-spline estimator of the coefficient function
in model (8) is the solution of the following problem
(9)
For notational convenience, let
,
,
,
,
,
,
,
,
then
, and Formula (9) can be rewritten as the following minimization problem
(10)
Using the least-squares method, the estimator of
is
![](https://www.scirp.org/html/16-7401195\83c2e329-9b1b-45df-93c3-6888145deaa5.jpg)
The estimator of the l-th coefficient function
,
is
![](https://www.scirp.org/html/16-7401195\b8624075-860c-4a1e-8e92-d7b7e017e8fc.jpg)
Then, the estimator of the coefficient function
is
(11)
where
is an
identity matrix, and
denotes the Kronecker product.
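The B-spline least-squares estimator can be sketched as follows. The sketch assumes that each coefficient function is expanded in the same B-spline basis, that the i-th design row is the Kronecker product of the covariate vector with the basis evaluated at the index variable, and that a clamped knot vector is supplied by the user. All function names, the `degree` argument and the knot vector are illustrative assumptions.

```python
import numpy as np

def bspline_basis(u, knots, degree):
    """Evaluate all B-spline basis functions at u by the Cox-de Boor recursion.

    knots is the full (clamped) knot vector; the result has shape
    (len(u), len(knots) - degree - 1).
    """
    u = np.asarray(u, dtype=float)
    t = np.asarray(knots, dtype=float)
    b = np.zeros((len(u), len(t) - 1))
    for j in range(len(t) - 1):                 # degree 0: indicators of [t_j, t_{j+1})
        b[:, j] = (t[j] <= u) & (u < t[j + 1])
    last = np.max(np.nonzero(t[:-1] < t[1:])[0])
    b[u == t[-1], last] = 1.0                   # include the right end point
    for d in range(1, degree + 1):
        nb = np.zeros((len(u), len(t) - d - 1))
        for j in range(len(t) - d - 1):
            left = t[j + d] - t[j]
            right = t[j + d + 1] - t[j + 1]
            if left > 0:
                nb[:, j] += (u - t[j]) / left * b[:, j]
            if right > 0:
                nb[:, j] += (t[j + d + 1] - u) / right * b[:, j + 1]
        b = nb
    return b

def fit_varying_coefficients(u, x, y, knots, degree):
    """Least-squares spline coefficients, one row per coefficient function."""
    basis = bspline_basis(u, knots, degree)     # n x K
    n, k = basis.shape
    p = x.shape[1]
    # Row i is x_i' (kron) B(u_i)', so the coefficients are stacked function by function.
    design = (x[:, :, None] * basis[:, None, :]).reshape(n, p * k)
    gamma, *_ = np.linalg.lstsq(design, y, rcond=None)
    return gamma.reshape(p, k)

def evaluate_coefficients(u_new, gamma, knots, degree):
    """Estimated coefficient functions at u_new, shape (len(u_new), p)."""
    return bspline_basis(u_new, knots, degree) @ gamma.T
```

In the censored setting, `y` would be the transformed responses rather than the unobserved survival times.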
4. The Local Influence of the Model
4.1. Weighted Perturbation Model
Suppose that
, then the weighted perturbation model can be written as
(12)
Substituting this result into (3) yields
(13)
where
and
the second derivative of
with respect to ![](https://www.scirp.org/html/16-7401195\6206a593-c602-423f-bcda-88afe8052344.jpg)
is given by
(14)
Substituting (13) and (14) into (4), we obtain the corresponding influence matrix
(15)
Here
denotes the direction of maximal influence curvature.
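A simplified sketch of the weighted perturbation diagnostics follows. It assumes a plain least-squares objective L(gamma | w) = -sum_i w_i (y_i - D_i' gamma)^2 with weights perturbed around 1, for which Delta = 2 D' diag(e) and Lddot = -2 D'D, so that the influence matrix reduces to -2 diag(e) H diag(e) with H the hat matrix and e the residuals. This is the classical linear-model result and may differ from expressions (13) to (15) in constants; `design` stands for the spline design matrix, `y` for the transformed responses, and the function name is hypothetical.

```python
import numpy as np

def weighted_perturbation_influence(design, y):
    """Influence matrix and maximal-curvature direction under case-weight perturbation."""
    gamma, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ gamma
    delta = 2.0 * design.T * resid            # k x n: column i equals 2 * e_i * D_i
    l_ddot = -2.0 * design.T @ design         # k x k Hessian of the assumed objective
    f_mat = delta.T @ np.linalg.solve(l_ddot, delta)
    f_mat = (f_mat + f_mat.T) / 2.0           # symmetrize against rounding error
    eigvals, eigvecs = np.linalg.eigh(f_mat)
    idx = np.argmax(np.abs(eigvals))
    return f_mat, eigvecs[:, idx], resid
```

Plotting the absolute diagonal of the returned matrix and the absolute components of the returned direction against the case index gives index plots of the kind used for diagnosis.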
4.2. Response Variable Perturbation Model
Suppose that
, then the response variable perturbation model can be written as
(16)
Substituting this result into (3) yields
(17)
The second derivative of
with respect to
is given by
(18)
Substituting (17) and (18) into (4), we obtain the corresponding influence matrix
(19)
Here
denotes the direction of maximal influence curvature.
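For the response perturbation, a comparable simplified sketch assumes the additive scheme y_i + w_i in the least-squares objective L(gamma | w) = -sum_i (y_i + w_i - D_i' gamma)^2, perturbed around w = 0. Then Delta = 2 D' and Lddot = -2 D'D, so the influence matrix reduces to -2 H, with H the hat matrix. This is an assumption made for illustration and may differ from expressions (17) to (19); the function name is hypothetical.

```python
import numpy as np

def response_perturbation_influence(design):
    """Influence matrix -2 H under the assumed additive response perturbation."""
    delta = 2.0 * design.T                    # k x n mixed second derivatives
    l_ddot = -2.0 * design.T @ design         # k x k Hessian of the assumed objective
    f_mat = delta.T @ np.linalg.solve(l_ddot, delta)
    return (f_mat + f_mat.T) / 2.0            # symmetrize against rounding error
```

In this simplified setting the diagonal entries are proportional to the leverages, and the nonzero eigenvalues of the matrix coincide, so the direction of maximal curvature is not unique; the sketch therefore returns only the matrix.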
5. An Illustrative Example
(Malignant Tumour Data) We now consider an example to illustrate the above results. The data come from a clinical trial (see Ref. [4]) of 205 cancer patients who were treated at Odense University Hospital and followed until the end of 1977. The survival times of some individuals were censored because of death from other causes or the end of the trial. Ref. [11] used a linear semi-parametric model to fit these data. We used the varying-coefficient model to fit the data of 57 patients, where
denotes the thickness of the tumour,
denotes the sex (1 for male, 0 for female). Considering that there was a
![](https://www.scirp.org/html/16-7401195\2c2c17e2-79ea-4449-87c4-2dcd56179604.jpg)
Figure 1. The direction of maximal influence curvature dwj.
![](https://www.scirp.org/html/16-7401195\e9e8f6da-d04e-40af-a18a-ed7181b1ec61.jpg)
Figure 2. The diagonal value of influence matrix Fwj.
![](https://www.scirp.org/html/16-7401195\351d9f1c-8b60-4df4-8e7a-3a63de1832f7.jpg)
Figure 3. The diagonal value of influence matrix Frj.
![](https://www.scirp.org/html/16-7401195\1b0e5da8-8975-4229-83ea-133874a6e906.jpg)
Figure 4. The direction of maximal influence curvature drj.
relation between the thickness of the tumour and the sex, we supposed that there was a relation between the coefficient
and
. Hence, we used the varying-coefficient model
to analyze these data. The results are shown in Table 1 and Figures 1-4.
Figures 1 and 2 show that the first and fourth observations are outliers, and Figures 3 and 4 lead to the same conclusion. Indeed, the diagnostic effect of the diagonal values of the influence matrix agrees with that of the direction of maximal influence curvature, and this result is similar to that of Li Yali [12].