Functional Kernel Estimation of the Conditional Extreme Quantile under Random Right Censoring
1. Introduction
Estimating extreme quantiles is central to the study of rare events, which occur infrequently but can have a large impact. Extreme Value Theory (EVT) provides the tools for modeling such events, notably estimators of the tail index and of the associated extreme quantiles. The study of extreme events is attracting attention in many fields of applied statistics. In hydrology, for example, one may wish to estimate the maximum level reached by seawater along a coast over a given period, or a conditional quantile of rainfall for a given region; see [1]. In medicine, conditional quantiles have been used to assess the survival probability of patients with AIDS in different age groups; for more details see [2].
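When studying rare events, the main interest is not in estimating “central” parameters of the random variable, such as the mean, mode or median; rather, researchers seek to understand the behavior of its right tail. One of the best-known results in extreme value theory is the Fisher-Tippett-Gnedenko theorem [3] [4]. Let $Y_1, \ldots, Y_n$ be independent and identically distributed random variables with distribution function F. Suppose there exist sequences of constants $a_n > 0$ and $b_n$ real and a non-degenerate distribution function $G$ such that, for all $y \in \mathbb{R}$,

$$\lim_{n \to \infty} P\left( \frac{\max(Y_1, \ldots, Y_n) - b_n}{a_n} \le y \right) = G(y),$$

then $G$ belongs to the type¹ of one of the following three distribution functions:

$$\Phi_\alpha(y) = \begin{cases} 0, & y \le 0, \\ \exp(-y^{-\alpha}), & y > 0, \end{cases} \quad \alpha > 0 \quad \text{(Fréchet)}$$

$$\Psi_\alpha(y) = \begin{cases} \exp(-(-y)^{\alpha}), & y \le 0, \\ 1, & y > 0, \end{cases} \quad \alpha > 0 \quad \text{(Weibull)}$$

$$\Lambda(y) = \exp(-e^{-y}) \ \text{for all } y \in \mathbb{R} \quad \text{(Gumbel)}$$

The three distribution functions $\Phi_\alpha$, $\Psi_\alpha$ and $\Lambda$ are the only possible limit laws of the normalized maximum of a sample of independent and identically distributed random variables. They are referred to as the Extreme Value Distributions (EVD). A parametrization of these three distributions into a single formula, called the Generalized Extreme Value Distribution (GEV), is given by:

$$G_\gamma(y) = \exp\left( -(1 + \gamma y)^{-1/\gamma} \right), \quad 1 + \gamma y > 0,$$

with the convention $G_0(y) = \exp(-e^{-y})$ when $\gamma = 0$. The parameter $\gamma$, the so-called extreme-value index or tail index, completely characterizes the behaviour of the tail of the distribution F. Its sign also determines the domain of attraction: Fréchet for $\gamma > 0$, Gumbel for $\gamma = 0$ and Weibull for $\gamma < 0$.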
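As a small numerical illustration of the GEV family, the sketch below evaluates the standard form $G_\gamma(y) = \exp(-(1+\gamma y)^{-1/\gamma})$, with the Gumbel law as the limit $\gamma \to 0$. The function name `gev_cdf` and the tolerance used to detect $\gamma = 0$ are illustrative choices, not part of the paper.

```python
import math


def gev_cdf(y: float, gamma: float) -> float:
    """Standard GEV distribution function G_gamma(y).

    gamma > 0: Frechet-type tail, gamma = 0: Gumbel, gamma < 0: Weibull-type.
    """
    if abs(gamma) < 1e-12:
        # Gumbel limit as gamma -> 0 (tolerance is an arbitrary choice)
        return math.exp(-math.exp(-y))
    t = 1.0 + gamma * y
    if t <= 0.0:
        # Outside the support: below the lower endpoint when gamma > 0,
        # above the upper endpoint when gamma < 0
        return 0.0 if gamma > 0 else 1.0
    return math.exp(-t ** (-1.0 / gamma))


if __name__ == "__main__":
    for g in (0.5, 0.0, -0.5):
        print(g, gev_cdf(1.0, g))
```

For small $|\gamma|$ the values are close to the Gumbel case, which reflects the continuity of the family in $\gamma$.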
The estimation of $\gamma$ is a cornerstone of many problems in extreme value analysis, such as the estimation of conditional extreme quantiles of a random variable in the presence of a covariate. When some covariate information X is available and the distribution of Y depends on X, the problem becomes that of estimating the conditional extreme-value index and conditional extreme quantiles.
In this paper, we consider this situation and focus on estimating a conditional extreme quantile of a heavy-tailed distribution when some functional covariate information $X \in \mathcal{E}$ is available, where $\mathcal{E}$ is an infinite-dimensional space equipped with a semi-metric $d$.
Many studies in the literature have addressed the estimation of conditional extreme quantiles of a random variable Y. Daouia et al. [5] introduced a kernel-type estimator of the extreme conditional quantile of a heavy-tailed distribution belonging to the Fréchet maximum domain of attraction, while [6] proposed a new procedure for estimating the conditional survival function based on a double-kernel estimator, together with a Weissman-type estimator of the conditional extreme quantile.
In practice, the variable of interest may be only partially observed. In classical applications such as the analysis of lifetime data (survival analysis, reliability theory, insurance), a typical feature is censoring. For example, in medical follow-up, the response variable Y represents the time elapsed from the entry of a patient into, say, a follow-up study until death. If, at the time the data are collected, the patient is still alive or has withdrawn from the study for some reason, the variable of interest Y is not available. Many authors have addressed this issue; see, among others, [7] [8] for more details.
Recently, many authors have studied the estimation of the extreme value index and extreme quantiles. Among them, [9] [10] [11] [12] considered the estimation of the extreme value index and extreme quantiles from censored data when no covariate information is available. In [9], the authors proposed to estimate the extreme value index using a modified version of Hill’s estimator. In [13] [14] [15], the authors proposed Bayesian estimators of the extreme value index and extreme quantiles for uncensored data. [16] [17] [18] investigated the estimation of the extreme value index and extreme quantiles from censored data without covariate information. [8] investigated the estimation of the conditional extreme value index and conditional extreme quantiles under random right censoring in the presence of a finite-dimensional covariate.
However, to the best of our knowledge, the estimation of conditional extreme quantiles of a heavy-tailed distribution with randomly right-censored data and a functional covariate has not yet been addressed, which motivated us to tackle this problem. In our methodology, we consider the kernel conditional Kaplan-Meier estimator of the conditional survival function in the presence of a functional (infinite-dimensional) covariate. We then construct Weissman-type estimators of the conditional extreme quantile under censoring and establish their asymptotic normality. Finally, the finite-sample performance of these estimators is assessed via simulations and compared with several alternative estimators.
The remainder of this paper is organized as follows. Section 2 introduces the notation and describes the framework of the study. Section 3 presents the construction of our estimator of the functional conditional extreme quantile and states its asymptotic normality; the proofs are given in Section 4. In Section 5, we assess the finite-sample behavior of our estimator via simulations. The conclusion and some perspectives are presented in Section 6.
2. Framework
In this section, we describe the nonparametric estimator of the conditional quantile based on the Kaplan-Meier estimator when the covariate is a functional (infinite-dimensional) random variable and the data are censored; for more details see [19].
Let $(X_i, Y_i)$, $i = 1, \ldots, n$, be independent copies of the random pair $(X, Y)$, where Y is a positive real random variable and X is a functional random variable taking values in $\mathcal{E}$, an infinite-dimensional space equipped with a semi-metric $d$. We assume that Y can be randomly right censored by a positive random variable C. Therefore, we observe the independent triples $(X_i, Z_i, \delta_i)$, where $Z_i = \min(Y_i, C_i)$ and $\delta_i = \mathbb{1}_{\{Y_i \le C_i\}}$ for $i = 1, \ldots, n$, where $\mathbb{1}_A$ is the indicator function of the event A. The random variable C is defined on the same probability space $(\Omega, \mathcal{A}, P)$ as Y. We assume that Y and C are independent given
, where
are independent of each other.
Let $F_Y(\cdot \mid x)$ and $F_C(\cdot \mid x)$ be the conditional cumulative distribution functions of Y and C given $X = x$, respectively, and let $\bar{F}_Y(\cdot \mid x) = 1 - F_Y(\cdot \mid x)$ and $\bar{F}_C(\cdot \mid x) = 1 - F_C(\cdot \mid x)$ be the corresponding conditional survival functions. Since we are dealing with heavy tails, we assume that the following condition is satisfied:
(A1)
(1)
and
(2)
where
are positive unknown functions of the covariate x,
are positive functions and
are continuous and ultimately decreasing to zero. From Equations (1) and (2), we can state that the conditional distribution functions of Y and C given
are in Fréchet maximal domain of attraction. Thus,
and
are taken as the conditional extreme tail index functions. Therefore, for all
,
and
are regularly
varying functions at infinity with index
and
respectively. Thus,
where for x fixed,
and
are slowly varying functions at infinity, that is, for all
,
By the conditional independence of Y and C, the conditional survival function
of Z given
is also a regularly varying function at infinity with index
as follows:
with
where
is the ultimate proportion
of uncensored observations among
; the proof of this statement is beyond the scope of this paper (see [2] [10] for more details) and
,
. In the sequel, we further assume that
belong to the Hall class of slowly varying functions.
Formally, let $(X, Y)$ be an $\mathcal{E} \times \mathbb{R}$-valued random element, where $(\mathcal{E}, d)$ is a semi-metric space with semi-metric $d$, and suppose that we now observe a sequence $(X_i, Y_i)_{i = 1, \ldots, n}$ of copies of $(X, Y)$.
In this paper, we are interested in estimating the conditional extreme quantile
of order
of the conditional survival distribution function
of Y given
.
Consider the random right-censoring model $(Z, \delta)$, where Z is the observed random variable and $\delta$ is the censoring indicator: $\delta$ equals one if $Y \le C$ and zero otherwise. We then say that Y is right censored by C. The conditional cumulative hazard function is given by:
Therefore, the estimator
of
is given by
(3)
where
and
with
is an indicator function of
associated to
and
is the Nadaraya-Watson weight [20] [21], expressed as follows:
where K is a kernel density and h is a bandwidth parameter such that $h \to 0$ as $n \to \infty$.
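To make the role of the weights concrete, the following sketch computes Nadaraya-Watson weights of the usual form $K(d(x, X_i)/h) / \sum_j K(d(x, X_j)/h)$ from precomputed semi-metric distances. The function names and the particular kernel $K(u) = 1.9 - 1.8u$ on $[0, 1]$ are assumptions made for illustration; the paper's own kernel and notation may differ.

```python
from typing import Callable, List, Sequence


def asymmetric_linear_kernel(u: float) -> float:
    """Illustrative kernel supported on [0, 1]; it integrates to one."""
    return 1.9 - 1.8 * u if 0.0 <= u <= 1.0 else 0.0


def nw_weights(
    dists: Sequence[float],
    h: float,
    kernel: Callable[[float], float] = asymmetric_linear_kernel,
) -> List[float]:
    """Nadaraya-Watson weights from distances d(x, X_i) and bandwidth h."""
    if h <= 0.0:
        raise ValueError("bandwidth h must be positive")
    raw = [kernel(d / h) for d in dists]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("no observation within distance h of x")
    return [r / total for r in raw]


if __name__ == "__main__":
    # Third curve lies outside the ball of radius h and gets zero weight
    print(nw_weights([0.0, 0.5, 2.0], h=1.0))
```

Only curves within distance h of x receive positive weight, so shrinking h localizes the estimator around x at the price of using fewer observations.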
From Equation (3), the estimator of the survival function may be expressed as
By applying a Taylor expansion of
around
where
, we obtain
We denote the conditional moderated quantile, of the order
as
of random variable Y given
, by
Therefore, a natural estimator of
is given by
Let $B(t, r)$ denote the ball centered at the point $t$ with radius $r$, for $t \in \mathcal{E}$ and $r > 0$, defined by $B(t, r) = \{ s \in \mathcal{E} : d(s, t) \le r \}$, and let h be a positive sequence tending to zero as $n \to \infty$. The moving-window approach adopted in [22] uses the responses $Y_i$ whose covariates $X_i$ belong to the ball $B(x, h)$; this gives
For
, we denote the conditional probability distribution function of Y given
by
Taking
to be the ball of center x and radius h,
can finally be rewritten as
a small ball probability of X.
3. Estimation of Conditional Extreme Quantile
We now investigate the estimation of a large conditional quantile
of order
of
for a variable Y given
defined by
with
as
. To define our estimator, we first need to define
, the functional estimator of a large conditional quantile
within the sample.
Let us consider the Kernel conditional Kaplan-Meier estimator of the conditional survival function
, for all
and
defined as follows:
This function may be rewritten as
(4)
and zero otherwise, where
denote the order statistics of
.
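The product-limit structure of a kernel conditional Kaplan-Meier estimator can be sketched as follows. This is the classical Beran form, $\prod_{Z_{(i)} \le t} \big(1 - W_{(i)}(x) / (1 - \sum_{j < i} W_{(j)}(x))\big)^{\delta_{(i)}}$, assumed here for illustration; the name `beran_survival`, the handling of ties (none assumed) and the clamping at zero are choices of this sketch rather than the paper's definition.

```python
from typing import Sequence


def beran_survival(
    t: float,
    z: Sequence[float],
    delta: Sequence[int],
    weights: Sequence[float],
) -> float:
    """Kernel conditional Kaplan-Meier (Beran-type) estimate of P(Y > t | X = x).

    z       observed times Z_i = min(Y_i, C_i), assumed distinct
    delta   censoring indicators (1 = uncensored, 0 = censored)
    weights kernel weights W_i(x), non-negative and summing to one
    """
    order = sorted(range(len(z)), key=lambda i: z[i])
    surv = 1.0
    cum_w = 0.0  # total weight of observations with a smaller Z
    for i in order:
        if z[i] > t:
            break
        remaining = 1.0 - cum_w
        if delta[i] == 1 and remaining > 0.0:
            # Only uncensored observations contribute a factor
            surv *= max(0.0, 1.0 - weights[i] / remaining)
        cum_w += weights[i]
    return surv


if __name__ == "__main__":
    z = [1.0, 2.0, 3.0, 4.0]
    delta = [1, 0, 1, 1]
    w = [0.25] * 4  # equal weights recover the ordinary Kaplan-Meier estimator
    print(beran_survival(3.0, z, delta, w))
```

With equal weights $1/n$ the function reduces to the ordinary Kaplan-Meier estimator, which gives a convenient sanity check.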
Based on the estimator in Equation (4), we propose to estimate the conditional quantile
within the sample (i.e. for fixed
) by the generalized inverse of
, that is,
where
as
. We then propose to estimate the conditional extreme quantile
by the Weissman-type estimator
(5)
The term
is an extrapolation factor that allows arbitrarily large quantiles to be estimated, and
is the estimator of the censored functional conditional extreme value index
.
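The extrapolation step can be illustrated with the generic Weissman form $\hat q(\beta) = \hat q(\alpha)\,(\alpha/\beta)^{\hat\gamma}$ for $\beta < \alpha$. The sketch below assumes this generic form and hypothetical argument names; it does not reproduce the exact estimator of Equation (5), whose inputs are the kernel conditional Kaplan-Meier quantile and the censored tail-index estimator.

```python
def weissman_quantile(q_alpha: float, alpha: float, beta: float, gamma: float) -> float:
    """Extrapolate a quantile of order alpha to a more extreme order beta < alpha.

    q_alpha quantile estimate at the intermediate order alpha (survival probability)
    gamma   estimate of the (conditional) tail index, gamma > 0
    """
    if not 0.0 < beta <= alpha < 1.0:
        raise ValueError("need 0 < beta <= alpha < 1")
    # (alpha / beta) ** gamma is the extrapolation factor
    return q_alpha * (alpha / beta) ** gamma


if __name__ == "__main__":
    # For an exact Pareto tail with survival function y ** (-1 / gamma),
    # q(alpha) = alpha ** (-gamma) and the extrapolation is exact.
    gamma, alpha, beta = 0.5, 0.1, 0.001
    print(weissman_quantile(alpha ** (-gamma), alpha, beta, gamma), beta ** (-gamma))
```

For a pure Pareto tail the two printed values coincide; for other heavy-tailed laws the formula holds only approximately, which is why the intermediate order must itself be small.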
Some regularity conditions are needed to prove our results (these conditions are adapted from [5] and [23]). We also require some Lipschitz conditions to be fulfilled.
(A2) K is a function with support $[0, 1]$ and there exist $0 < C_1 < C_2 < \infty$ such that $C_1 \le K(t) \le C_2$ for all $t \in [0, 1]$.
(A3) Consider
and, for a fixed
, the conditional quantile function
is differentiable and the function defined by
is continuous and such that
.
Hypothesis (A3) controls the behavior of the log-quantile function through its first derivative, and is a necessary and sufficient condition for the heavy-tail property.
The largest oscillation of the log-quantile function with respect to its second variable is defined for all
as
Theorem 1. Assume that (A1)-(A3) hold, let
and consider
and
be sequences such that
and
,
as
. Consider
such that
with
and
.
Let
and
as
, then
4. Proofs
4.1. Preliminary Results
Lemma 2. Let
and
be two sequences of random variables. Suppose there exists an event
such that
with
, then
implies
.
Proof of Lemma 2: See [1].
Lemma 3. Suppose (A1) and (A2) hold. Consider
, such that
as
, then
Proof of Lemma 3: See [5].
Proposition 4. Let
be the nonrandom number of observations in the slice
. Let
be a sequence satisfying
and
. If
as
, for some
. Then,
Proof of Proposition 4
The proof is similar to the proof of Theorem 1 in [24] and is therefore omitted.
4.2. Main Result
Proof of Theorem 1
Let
, then the estimator of the conditional extreme quantile is defined as
then,
Therefore
with,
Under the assumptions of Theorem 1 and applying the result of Proposition 4,
(6)
By using some notation, we see that
Using the expression in Equation (6) and the hypothesis of Theorem 1 leads to
in probability as n goes to infinity. Under the assumptions of Theorem 1,
converges in distribution to a centered Gaussian distribution with covariance matrix AV (see [25]). Finally, by Lemma 2(i) in [6], Lemma 3 and the assumptions of Theorem 1, we have
which goes to zero. This concludes the proof.
5. Simulation Studies
5.1. Simulation Design
The main purpose of this simulation study is to assess the performance of the proposed estimators. We compare the results with two simple estimation approaches based on the tail index of a heavy-tailed distribution under random right censoring. We assume that the theoretical distributions of Y given X and of C given X are known, and we simulate
replications of samples of size
of random triples
from
to construct the estimates.
The curve
is given by the following expression of a functional covariate
which is defined by
for all
with W normally distributed on
and
a random variable which follows a Bernoulli distribution with probability one half, as adapted from [26]. Figure 1 shows a sample of 300 curves representing a realisation of the functional random variable X.
The conditional distribution of Y given
is a Burr distribution with parameter
, which implies that
, with
Figure 1. A sample of curves
.
The conditional distribution of C given
is also a Burr distribution, where the parameter
is chosen to yield various values for the overall censoring percentage c
. Since
with
is the ultimate
proportion of uncensored observations among
for
then
is selected, we choose
such that
is approximately equal to
as the censoring percentage.
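The link between the tail indices and the censoring rate can be sketched numerically. The example assumes the standard relation $p = \gamma_C / (\gamma_Y + \gamma_C)$ for the ultimate proportion of uncensored observations, so that a target censoring rate $c = 1 - p$ gives $\gamma_C = \gamma_Y (1 - c) / c$. It also assumes a Burr survival function of the form $(1 + y^{\tau})^{-\lambda}$ with tail index $1/(\lambda \tau)$ and a common $\tau$ for Y and C. These parametrizations, the function names and the default values are assumptions of this sketch, and the covariate is ignored for simplicity.

```python
import random
from typing import List, Tuple


def gamma_c_for_censoring(gamma_y: float, c: float) -> float:
    """Tail index of C giving an ultimate censoring proportion c in (0, 1)."""
    if not 0.0 < c < 1.0:
        raise ValueError("censoring rate must lie strictly between 0 and 1")
    return gamma_y * (1.0 - c) / c


def rburr(rng: random.Random, tau: float, lam: float) -> float:
    """Inverse-transform draw from the survival function (1 + y**tau) ** (-lam)."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids u == 0
    return (u ** (-1.0 / lam) - 1.0) ** (1.0 / tau)


def simulate_censored(
    n: int, gamma_y: float, c: float, tau: float = 1.0, seed: int = 0
) -> Tuple[List[float], List[int]]:
    """Simulate (Z_i, delta_i) with Burr-distributed Y and C (no covariate)."""
    rng = random.Random(seed)
    gamma_c = gamma_c_for_censoring(gamma_y, c)
    lam_y = 1.0 / (gamma_y * tau)
    lam_c = 1.0 / (gamma_c * tau)
    z, delta = [], []
    for _ in range(n):
        y, cens = rburr(rng, tau, lam_y), rburr(rng, tau, lam_c)
        z.append(min(y, cens))
        delta.append(1 if y <= cens else 0)
    return z, delta


if __name__ == "__main__":
    z, delta = simulate_censored(20000, gamma_y=0.5, c=0.3, seed=1)
    print("empirical censoring rate:", 1.0 - sum(delta) / len(delta))
```

Under these assumptions (common $\tau$), the overall censoring proportion coincides with the tail proportion, so the empirical rate printed above should be close to the target c.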
In practice, some parameters must be fixed. The kernel density K is an asymmetric linear kernel defined as
, and the estimator
depends on the parameters
. The bandwidth parameter h is chosen using the cross-validation method implemented in [6].
with
is the kernel conditional Kaplan-Meier estimator presented in Equation (4) and adopted in [8], which depends on the parameter h. The aforementioned estimator is calculated on the sample
and
. Here we consider that h belongs to a regular grid
where
and
with
.
Once the bandwidth has been selected, the next step is to determine the number of threshold excesses k. Different methods have been proposed in the literature; in this paper we adopt the method used by [8], described as follows:
We start by forming successive blocks of the estimates in Equation (5) with
, for
, such that each block has size
. Finally, we compute the standard deviation of the estimates in each block; the value of k corresponding to the median of the estimates in the block with minimal standard deviation is taken as the optimal k.
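The selection rule just described can be sketched as follows. The sketch assumes non-overlapping consecutive blocks, that `estimates[k - 1]` holds the quantile estimate obtained with k excesses, and that the "median" within a block is the lower median so that it corresponds to an actual value of k. The function name `select_k` and these conventions are illustrative assumptions.

```python
import statistics
from typing import List, Tuple


def select_k(estimates: List[float], block_size: int) -> Tuple[int, float]:
    """Pick k from the most stable block of successive estimates.

    Returns (k, estimate), where k is 1-based and the estimate is the
    (lower) median of the block with the smallest standard deviation.
    """
    if block_size < 2 or len(estimates) < block_size:
        raise ValueError("need at least one full block of size >= 2")
    best_start, best_sd = 0, float("inf")
    for start in range(0, len(estimates) - block_size + 1, block_size):
        sd = statistics.pstdev(estimates[start:start + block_size])
        if sd < best_sd:
            best_start, best_sd = start, sd
    block = estimates[best_start:best_start + block_size]
    med = statistics.median_low(block)  # always an element of the block
    k = best_start + block.index(med) + 1
    return k, med


if __name__ == "__main__":
    est = [5.0, 9.0, 1.0, 3.0, 3.1, 2.9, 7.0, 2.0, 8.0]
    print(select_k(est, block_size=3))
```

The idea is that a block where the estimates barely change with k indicates a stable region of the bias-variance trade-off.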
Another point to discuss is the choice of the semi-metric, which plays a key role in the behavior of nonparametric statistics for functional data; for more details see [27]. Since the curves of
are smooth, following [6] we use the semi-metric based on derivatives to measure the distance between two curves, which is defined as follows:
(7)
where q is the degree of derivative.
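A derivative-based semi-metric of the usual form $d_q(x_1, x_2) = \big( \int (x_1^{(q)}(t) - x_2^{(q)}(t))^2 \, dt \big)^{1/2}$ can be approximated on a uniform grid as sketched below. The sketch uses finite differences and the trapezoidal rule, which is an assumption of this example; implementations in the functional data literature often rely on a B-spline expansion of the curves instead. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def finite_difference(values: Sequence[float], step: float) -> List[float]:
    """First derivative on a uniform grid: central inside, one-sided at the ends."""
    n = len(values)
    out = [0.0] * n
    out[0] = (values[1] - values[0]) / step
    out[-1] = (values[-1] - values[-2]) / step
    for i in range(1, n - 1):
        out[i] = (values[i + 1] - values[i - 1]) / (2.0 * step)
    return out


def semimetric_deriv(x1: Sequence[float], x2: Sequence[float], step: float, q: int) -> float:
    """L2 distance between the q-th derivatives of two discretized curves."""
    if len(x1) != len(x2) or len(x1) < 2:
        raise ValueError("curves must share a grid with at least two points")
    diff = [a - b for a, b in zip(x1, x2)]
    for _ in range(q):
        diff = finite_difference(diff, step)
    sq = [v * v for v in diff]
    integral = step * (sum(sq) - 0.5 * (sq[0] + sq[-1]))  # trapezoidal rule
    return math.sqrt(integral)


if __name__ == "__main__":
    grid = [i / 100 for i in range(101)]
    x1 = [t * t for t in grid]
    x2 = [t * t + 3.0 for t in grid]  # vertical shift of x1
    print(semimetric_deriv(x1, x2, 0.01, q=0), semimetric_deriv(x1, x2, 0.01, q=1))
```

The example shows why this is only a semi-metric: for $q \ge 1$ two curves that differ by a vertical shift are at distance zero although they are distinct.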
5.2. Estimation of Conditional Quantile
To check the finite-sample performance of the extreme conditional quantile estimator in Equation (5), we performed the simulation experiments described in Section 5.1. First, to evaluate the impact of the order of the derivative in the choice of semi-metric, as [27] advises for practical cases, we calculated the extreme conditional quantile for different orders of the derivative, as presented in Table 1. Second, we examined the behavior of our estimator with respect to both the censoring rate and the sample size. Finally, the accuracy of our estimators is measured by the Mean Square Error (MSE) and the Mean Absolute Error (MAE) for each scenario with 500 replications, as reported in Table 1.
To assess the performance of our estimator, we compare it with two simple estimation strategies. The first is a complete-case procedure (“CC” for short): we remove all censored observations from the simulated samples and then compute the tail index estimator proposed in [23], which does not take censoring into account. The second strategy ignores censoring: we set $\delta_i = 1$ for $i = 1, \ldots, n$, that is, for all observations, and treat the observations $Z_i$ as if they were uncensored. This strategy is called censoring-ignored (“CI” for short).
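The difference between the two naive strategies and a censoring-adapted estimator can be illustrated on the tail index in the simplest setting without covariates. The sketch below uses the Hill estimator and the well-known adaptation that divides it by the proportion of uncensored observations among the k largest values of Z. This unconditional version, the function names and the returned dictionary keys are assumptions for illustration; the paper's CC strategy relies on the functional estimator of [23].

```python
import math
from typing import Dict, Sequence


def hill(values: Sequence[float], k: int) -> float:
    """Hill estimator based on the k largest values (requires 1 <= k < n)."""
    s = sorted(values)
    n = len(s)
    if not 1 <= k < n:
        raise ValueError("need 1 <= k < len(values)")
    threshold = s[n - k - 1]
    return sum(math.log(s[n - i] / threshold) for i in range(1, k + 1)) / k


def tail_index_estimates(z: Sequence[float], delta: Sequence[int], k: int) -> Dict[str, float]:
    """Tail index under the CI and CC strategies and with a censoring adaptation."""
    ci = hill(z, k)  # CI: treat every Z_i as uncensored
    uncensored = [zi for zi, di in zip(z, delta) if di == 1]
    cc = hill(uncensored, min(k, len(uncensored) - 1))  # CC: drop censored data
    order = sorted(range(len(z)), key=lambda i: z[i])
    p_hat = sum(delta[i] for i in order[-k:]) / k  # uncensored share in the tail
    adapted = ci / p_hat if p_hat > 0 else float("nan")
    return {"CI": ci, "CC": cc, "adapted": adapted}


if __name__ == "__main__":
    print(tail_index_estimates([1, 2, 4, 8, 16], [1, 1, 1, 0, 1], k=2))
```

Since Z has a lighter tail than Y under censoring, the CI value tends to underestimate the tail index of Y, which the division by the uncensored proportion is meant to correct.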
To illustrate the asymptotic normality result for our estimators, we apply the Kolmogorov-Smirnov test, with results presented in Table 2.
The P-values of the Kolmogorov-Smirnov test are greater than 0.05, as shown in Table 2, so normality is not rejected at the 5% level. The simulation results also indicate that the quality of the normal approximation is linked to the censoring rate, the order of the derivative and the sample size. The results in Table 2 are thus consistent with our theoretical results.
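A normality check of this kind can be sketched with a one-sample Kolmogorov-Smirnov test against a fully specified normal law. The sketch below computes the statistic directly and uses the asymptotic Kolmogorov series for the P-value. The function names, the truncation at 100 terms and the shortcut for very small statistics are choices of this example; note that when the mean and variance are estimated from the same sample, this P-value is only approximate.

```python
import math
from statistics import NormalDist
from typing import Sequence


def ks_statistic(sample: Sequence[float], mu: float, sigma: float) -> float:
    """Kolmogorov-Smirnov distance between the sample and N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = dist.cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_pvalue(d: float, n: int, terms: int = 100) -> float:
    """Asymptotic P-value 2 * sum_{j>=1} (-1)^(j-1) exp(-2 j^2 n d^2)."""
    lam = math.sqrt(n) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here and its value is ~1
    total = sum(
        (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
        for j in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))


if __name__ == "__main__":
    n = 200
    sample = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    d = ks_statistic(sample, 0.0, 1.0)
    print(d, ks_pvalue(d, n))
```

A P-value above the chosen level means that normality is not rejected; it does not by itself prove that the estimator is normally distributed.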
5.3. Discussion
The performance of our estimator
defined in (5) is evaluated using Mean Squared Error (MSE) and Mean Absolute Error (MAE). We also provide the averaged value (over the N samples) of the number of threshold excesses
. The accuracy of our estimator depends on the censoring percentage and on the degree of derivative of the semi-metric
defined in Equation (7).
To demonstrate the accuracy of the proposed estimator, we compare it with the complete-case and censoring-ignored strategies described in Section 5.2.
The proposed estimator of conditional extreme quantiles performs well at low censoring rates when the sample size is large enough, as illustrated in Table 1, which reports the empirical Mean Square Error (MSE) and Mean Absolute Error (MAE) of the estimators for different sample sizes, censoring rates and orders of the derivative. Our estimator performs well when the semi-metric is based on a high-order derivative.
Regarding the choice of the semi-metric, our simulation results show that the order of the derivative plays a key role: since the functional curves are smooth, a semi-metric based on a high-order derivative performs better than one based on a low-order derivative, as illustrated in Table 1; the interested reader can see [27].
Table 1. MSE and MAE of the estimators with sample size
and
for
replications,
and
.
is the average of threshold excesses.
Table 2. Kolmogorov-Smirnov P-values for the distribution of the Weissman quantile estimator at 10%, 20%, 30% and 40% censoring rates, for sample sizes 200 and 500, with 500 replications.
According to the results in Table 1, the CI and CC estimators of
are quite biased, even when censoring is moderate. In contrast, our estimator in Equation (5) handles the estimation of the functional conditional extreme quantile under censoring satisfactorily.
The Kolmogorov-Smirnov test was performed to check the asymptotic normality of our proposed estimator. According to the results in Table 2, all P-values are greater than the 0.05 significance level, so normality is not rejected, which is consistent with the theoretical results.
6. Conclusions and Perspectives
We considered a functional kernel Weissman-type estimator of the conditional extreme quantile when some functional random covariate (i.e. valued in some infinite-dimensional space) is available and the scalar response variable is right-censored. Its asymptotic properties were established and its finite-sample performance was illustrated in a simulation study. A comparison with two simple estimation strategies was also provided.
Future work will focus on estimating the conditional extreme value of the Weibull distribution under random right censoring when the covariate is a functional random variable, and on establishing its asymptotic behavior.
Acknowledgements
The authors acknowledge an anonymous Associate Editor and an anonymous reviewer for their helpful comments that led to an improved version of this paper.
NOTES
¹Two non-degenerate distribution functions I and J are of the same type if and only if there exist $a > 0$ and $b \in \mathbb{R}$ such that $I(ay + b) = J(y)$ for all $y \in \mathbb{R}$.