Functional Kernel Estimation of the Conditional Extreme Quantile under Random Right Censoring

The estimation of conditional extreme quantiles in incomplete-data frameworks is of growing interest. Specifically, the estimation of the extreme value index under censoring has been the subject of many investigations when finite-dimensional covariate information is considered. In this paper, the estimation of the conditional extreme quantile of a heavy-tailed distribution is discussed when some functional random covariate (i.e. valued in some infinite-dimensional space) information is available and the scalar response variable is right-censored. A Weissman-type estimator of conditional extreme quantiles is proposed and its asymptotic normality is established under mild assumptions. A simulation study is conducted to assess the finite-sample behavior of the proposed estimator, and a comparison with two simple estimation strategies is provided.


Introduction
Estimating extreme quantiles is central to the study of rare events, which occur only occasionally but have a large impact. Extreme Value Theory (EVT) provides the tools for modeling such events, in particular estimators of the tail index and of the associated extreme quantiles. The study of extreme events is attracting attention in numerous fields of applied statistics. In hydrology, for example, one may wish to estimate the maximum level reached by seawater along a coast over a given period, or a conditional quantile of rainfall for a given region; see [1]. In medicine, conditional quantiles have been used to determine the survival probability of patients with AIDS in different age groups; see [2] for more details.
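When studying rare events, the main interest is not in "central" parameters of the random variable such as the mean, mode and median; rather, researchers are interested in understanding the behavior of its right tail. One of the best-known results in extreme value theory is the Fisher-Tippett-Gnedenko Theorem [3]: if the suitably normalized maximum of a sample of independent and identically distributed random variables converges in distribution to a non-degenerate limit, then this limit belongs to the type (see footnote 1) of one of the following three distribution functions:
Gumbel: $\Lambda(y) = \exp(-e^{-y})$, $y \in \mathbb{R}$;
Fréchet: $\Phi_\alpha(y) = \exp(-y^{-\alpha})$ for $y > 0$ and $\Phi_\alpha(y) = 0$ otherwise, with $\alpha > 0$;
Weibull: $\Psi_\alpha(y) = \exp(-(-y)^{\alpha})$ for $y \le 0$ and $\Psi_\alpha(y) = 1$ otherwise, with $\alpha > 0$.
The three distribution functions $\Lambda$, $\Psi_\alpha$ and $\Phi_\alpha$ are the only possible limit laws of the normalized maximum of a sample of independent and identically distributed random variables. They are referred to as the Extreme Value Distributions (EVD). A parametrization of these three distributions into a single formula, called the Generalized Extreme Value Distribution (GEV), is given by:
$$G_\gamma(y) = \exp\left(-(1 + \gamma y)^{-1/\gamma}\right), \quad 1 + \gamma y > 0,$$
where the case $\gamma = 0$ is interpreted as the limit $\exp(-e^{-y})$. The parameter $\gamma$, called the extreme-value index or tail index, characterizes the behaviour of the tail of the distribution F. Its sign also determines the domain of attraction.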
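As a small illustration of how the single GEV formula covers the three limit laws, the following sketch evaluates $G_\gamma(y) = \exp(-(1 + \gamma y)^{-1/\gamma})$ in its standard textbook form. The function name `gev_cdf` and the handling of values outside the support are choices made for this example only.

```python
import math

def gev_cdf(y, gamma):
    """Standard GEV distribution function G_gamma(y).

    gamma > 0: Frechet-type tail, gamma = 0: Gumbel, gamma < 0: Weibull-type.
    """
    if gamma == 0.0:
        # Limit case gamma -> 0 (Gumbel distribution).
        return math.exp(-math.exp(-y))
    t = 1.0 + gamma * y
    if t <= 0.0:
        # Outside the support: below the lower endpoint when gamma > 0,
        # above the upper endpoint when gamma < 0.
        return 0.0 if gamma > 0.0 else 1.0
    return math.exp(-t ** (-1.0 / gamma))
```

For $\gamma > 0$, evaluating `gev_cdf((y - 1) / gamma, gamma)` gives $\exp(-y^{-1/\gamma})$, which is the Fréchet law with $\alpha = 1/\gamma$; this is the heavy-tailed case considered in this paper.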
The estimation of $\gamma$ is a cornerstone of many problems in extreme value analysis, such as the estimation of conditional extreme quantiles of a random variable in the presence of a covariate. When some covariate information X is available to the investigator and the distribution of Y depends on X, the problem is to estimate the conditional extreme-value index and conditional extreme quantiles. We focus on the problem of estimating a conditional extreme quantile of a heavy-tailed distribution when some functional covariate information $X \in E$ is available, where E is an infinite-dimensional space equipped with a semi-metric $d(\cdot, \cdot)$. Many studies in the literature have addressed the estimation of conditional extreme quantiles of a random variable Y. Daouia et al. [5] introduced a kernel-type estimator of the extreme conditional quantile $q(\alpha_n \mid x)$ of a heavy-tailed distribution belonging to the Fréchet maximum domain of attraction, while [6] proposed a new estimation procedure for the conditional survival function. [...] under censoring, and we establish their asymptotic normality. Finally, the finite-sample performance of these estimators is assessed via simulations and compared with several alternative estimators.
1 Two non-degenerate distribution functions I and J are of the same type if and only if there exist $a > 0$ and $b \in \mathbb{R}$ such that $I(ax + b) = J(x)$ for all x.
The remainder of this paper is organized as follows. Section 2 introduces the notation and describes the framework of the study. Section 3 presents the construction of our estimator of the functional conditional extreme quantile and establishes its asymptotic normality; some proofs are given in Section 4. In Section 5, we assess the finite-sample behavior of our estimator via simulations. The conclusion and some perspectives are presented in Section 6.

Framework
In this section, we describe the behavior of the nonparametric estimator of the conditional quantile based on the Kaplan-Meier estimator when the covariate is a functional (infinite-dimensional) random variable and the data are censored; see [19] for more details.
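Let $(X_i, Y_i)$, $i = 1, \ldots, n$, be independent copies of the random pair $(X, Y)$, where Y is a positive real random variable and X is a functional random variable taking values in E, an infinite-dimensional space equipped with a semi-metric $d(\cdot, \cdot)$. We assume that Y may be randomly right-censored by a positive random variable C. Therefore, we observe the independent triples $(X_i, Z_i, \delta_i)$, $i = 1, \ldots, n$, where $Z_i = \min(Y_i, C_i)$, $\delta_i = \mathbb{1}_{\{Y_i \le C_i\}}$ and $\mathbb{1}_A$ is the indicator function of the event A. The random variable C is defined on the same probability space $(\Omega, \mathcal{A}, \mathbb{P})$ as Y. We assume that Y and C are independent given $X = x$, and that $C_1, \ldots, C_n$ are mutually independent.
Let $F(\cdot \mid x)$ and $G(\cdot \mid x)$ be the conditional cumulative distribution functions of Y and C given $X = x$, respectively, and let $\bar F(\cdot \mid x) = 1 - F(\cdot \mid x)$ and $\bar G(\cdot \mid x) = 1 - G(\cdot \mid x)$ be the corresponding conditional survival functions. Since we are dealing with heavy tails, we assume that the following conditions are satisfied:
$$\bar F(y \mid x) = y^{-1/\gamma_Y(x)} L_Y(y \mid x), \qquad (1)$$
$$\bar G(y \mid x) = y^{-1/\gamma_C(x)} L_C(y \mid x), \qquad (2)$$
where, for x fixed, $\gamma_Y(x) > 0$, $\gamma_C(x) > 0$, and $L_Y(\cdot \mid x)$ and $L_C(\cdot \mid x)$ are slowly varying functions at infinity. From (1) and (2), the conditional distribution functions of Y and C given $X = x$ are in the Fréchet maximum domain of attraction. By the conditional independence of Y and C, the conditional survival function of Z given $X = x$, $\bar H(y \mid x) = \bar F(y \mid x)\, \bar G(y \mid x)$, is also regularly varying at infinity:
$$\bar H(y \mid x) = y^{-1/\gamma(x)} L(y \mid x), \qquad \gamma(x) = \frac{\gamma_Y(x)\, \gamma_C(x)}{\gamma_Y(x) + \gamma_C(x)},$$
where $L = L_Y L_C$ and $p(x) = \gamma_C(x) / (\gamma_Y(x) + \gamma_C(x))$ is the ultimate proportion of uncensored observations among $Z_i$, $i = 1, \ldots, n$; the proof of this statement is beyond the scope of this paper (see [2] [10] for more details). In the sequel, we further assume [...].
Formally, let $d(\cdot, \cdot)$ be the semi-metric associated with the space E, and suppose that we observe a sequence of triples $(X_i, Z_i, \delta_i)$, $i = 1, \ldots, n$. In this paper, we are interested in the estimation of conditional extreme quantiles. Consider the random right-censoring model $(Z, \delta)$, where Z is the observed variable and $\delta$ is the censoring indicator: $\delta$ equals one if $Y \le C$ and zero otherwise, and we say that Y is right-censored by C. The conditional cumulative hazard function is then estimated by
$$\hat\Lambda_n(y \mid x) = \sum_{i=1}^{n} \frac{\mathbb{1}_{\{Z_i \le y\}}\, \delta_i\, W_{h,i}(x)}{\sum_{j=1}^{n} \mathbb{1}_{\{Z_j \ge Z_i\}}\, W_{h,j}(x)}, \qquad (3)$$
where $W_{h,i}(x)$ are the Nadaraya-Watson weights [20] [21], expressed as follows:
$$W_{h,i}(x) = \frac{K\left(d(x, X_i)/h\right)}{\sum_{j=1}^{n} K\left(d(x, X_j)/h\right)},$$
where K is a kernel density and h is a bandwidth parameter. From Equation (3), the estimator of the conditional survival function may be expressed as
$$1 - \hat F_n(y \mid x) = \prod_{i=1}^{n} \left[ 1 - \frac{W_{h,i}(x)}{\sum_{j=1}^{n} \mathbb{1}_{\{Z_j \ge Z_i\}}\, W_{h,j}(x)} \right]^{\mathbb{1}_{\{Z_i \le y\}}\, \delta_i}. \qquad (4)$$
We denote the conditional quantile of order $\alpha_n \to 0$ as $n \to \infty$ of Y given $X = x$ by $q(\alpha_n \mid x) = \inf\{y : \bar F(y \mid x) \le \alpha_n\}$. Therefore, a natural estimator of $q(\alpha_n \mid x)$ is $\hat q_n(\alpha_n \mid x) = \inf\{y : 1 - \hat F_n(y \mid x) \le \alpha_n\}$.
Let $B(t, r)$ be the ball centered at $t \in E$ with radius r, defined by $B(t, r) = \{x \in E : d(x, t) \le r\}$, and let h be a positive sequence tending to zero as $n \to \infty$. The moving-window approach adopted in [22] uses the responses $Y_i$ whose covariates $X_i$ belong to the ball $B(t, h)$; the proportion of such points is governed by the small ball probability of X, $\mathbb{P}(X \in B(t, h))$.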
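The construction in this section (Nadaraya-Watson weights computed from a semi-metric, then a weighted Kaplan-Meier product) can be sketched as follows. This is a minimal illustration under stated assumptions and not the implementation used in the paper: the curves are taken to be discretized on a common grid, `l2_distance` stands in for the semi-metric d, `triangle_kernel` is an arbitrary kernel supported on [0, 1], ties among the observed times are not treated specially, and all function names are hypothetical.

```python
import math

def l2_distance(x1, x2):
    """Placeholder semi-metric: Euclidean distance between discretized curves."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def triangle_kernel(u):
    """Illustrative kernel supported on [0, 1] (an assumption, not the paper's K)."""
    return 1.0 - u if 0.0 <= u <= 1.0 else 0.0

def nw_weights(x, curves, h, dist=l2_distance, kernel=triangle_kernel):
    """Weights W_i(x) = K(d(x, X_i) / h) / sum_j K(d(x, X_j) / h)."""
    raw = [kernel(dist(x, xi) / h) for xi in curves]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("no curve within distance h of x; increase the bandwidth")
    return [r / total for r in raw]

def conditional_survival(y, z, delta, weights):
    """Kernel Kaplan-Meier (Beran-type) estimate of P(Y > y | X = x)."""
    surv = 1.0
    for zi, di, wi in zip(z, delta, weights):
        # Only uncensored observations not exceeding y contribute a factor.
        if zi <= y and di == 1 and wi > 0.0:
            at_risk = sum(wj for zj, wj in zip(z, weights) if zj >= zi)
            surv *= 1.0 - wi / at_risk
    return surv

def conditional_quantile(alpha, z, delta, weights):
    """Smallest observed time whose estimated survival is <= alpha, else None."""
    for y in sorted(z):
        if conditional_survival(y, z, delta, weights) <= alpha:
            return y
    return None
```

With equal weights and no censoring, `conditional_survival` reduces to the empirical survival function, which gives a quick sanity check of the product form.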

Estimation of Conditional Extreme Quantile
We now investigate the estimation of large conditional quantiles [...]. This function may be rewritten as [...] and zero otherwise, where [...]. Some regularity conditions are needed to prove our results (these conditions are adapted from [5] and [23]). We also require some Lipschitz conditions to be fulfilled.
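For readers unfamiliar with Weissman-type extrapolation, the following sketch shows the classical idea in its simplest textbook form: a quantile of moderate order $\alpha$ is pushed out to a more extreme order $\beta \le \alpha$ using the tail index, $q(\beta) \approx q(\alpha)\,(\alpha/\beta)^{\gamma}$. The estimator studied in this paper adapts this idea to censoring and to a functional covariate; its exact form is not reproduced here, and the function name and argument checks below are assumptions made for illustration.

```python
def weissman_extrapolate(q_alpha, alpha, beta, gamma):
    """Textbook Weissman extrapolation: q(beta) ~ q(alpha) * (alpha / beta) ** gamma.

    q_alpha: estimated quantile at the intermediate order alpha
    beta:    more extreme order, with 0 < beta <= alpha < 1
    gamma:   (estimated) positive tail index
    """
    if not (0.0 < beta <= alpha < 1.0):
        raise ValueError("expected 0 < beta <= alpha < 1")
    if gamma <= 0.0:
        raise ValueError("the heavy-tailed case requires gamma > 0")
    return q_alpha * (alpha / beta) ** gamma
```

For a strict Pareto tail with survival function $y^{-1/\gamma}$, the quantile of order $\alpha$ is $\alpha^{-\gamma}$, so the extrapolation is exact in that case; for general heavy tails it holds only approximately, through the slowly varying function.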
Hypothesis (A3) controls the behavior of the first derivative of the log-quantile function; it is a necessary and sufficient condition for the heavy-tail property.
The largest oscillation of the log-quantile function with respect to its second variable is defined for all

Proof of Proposition 4
The proof is similar to the proof of Theorem 1 in [24] and is therefore omitted.

Main Result
Under the assumptions of Theorem 1 and applying the result of Proposition 4, [...]. With some notation, we see that [...]. Using the expression in Equation (6) and the hypotheses of Theorem 1, $A_{1,n} \to 0$ in probability as n goes to infinity. Under the assumptions of Theorem 1, $A_{2,n}$ converges in distribution to a centered Gaussian distribution with covariance matrix AV (see [25]). Finally, by Lemma 2(i) in [6], Lemma 3 and the assumptions of Theorem 1, we have [...], which goes to zero. This concludes the proof.

Simulation Design
In this part, the main purpose is to assess the performance of the proposed estimators. We compare the results with two simple estimation approaches based on the tail index of a heavy-tailed distribution under random right censoring, assuming that the theoretical distributions of Y given X and of C given X are known. We simulate $N = 500$ replications of samples of size $n = 200$ and $n = 500$ of the random triples $\{(X_i, Z_i, \delta_i),\ i = 1, \ldots, n\}$ to construct the estimates. The curve $X_i$ is a realisation of the functional covariate $X \in E$ defined by [...], where W is normally distributed on $[0, 1]$ and $\Omega$ is a random variable following a Bernoulli distribution with probability one half, as adapted in [26]. Figure 1 shows a sample of 300 curves representing a realisation of the functional random variable X.
In practice, some parameters have to be fixed. The kernel K is taken to be an asymmetric linear kernel defined as [...]. The estimator at x depends on the parameters k and h. The bandwidth parameter h is chosen using the cross-validation method implemented in [6].
In the cross-validation criterion, $\hat F_n^{-i}$ is the kernel conditional Kaplan-Meier estimator presented in Equation (4), as adopted in [8], which depends on the parameter h. This estimator is computed on the sample with the i-th observation removed. Once the bandwidth has been selected, the next step is to determine the number of threshold excesses k. Different methods have been proposed in the literature; in this paper we adopt the method used in [8], described as follows. We first compute the estimate in Equation (5) for $k = 1, \ldots, n - 1$ and group these estimates into successive blocks, each block having a size determined by $k_{\max}$. We then compute the standard deviation of the estimates in each block. The estimate retained is the median of the estimates in the block with minimum standard deviation, and the corresponding k is taken as the optimal k.
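The block-based choice of k described above can be sketched as follows. This is an illustrative reading of the procedure and not the code of [8]: the block size is left as a free parameter `block_size`, blocks are taken to be non-overlapping, a trailing incomplete block is discarded, and the lower median is used so that the retained estimate corresponds to an actual value of k. The function name is hypothetical.

```python
import statistics

def select_k_by_blocks(estimates, block_size):
    """Pick k from the most stable block of successive estimates.

    estimates[k - 1] is the estimate obtained with k threshold excesses.
    Returns (k_opt, estimate_at_k_opt).
    """
    if block_size < 2 or len(estimates) < block_size:
        raise ValueError("need block_size >= 2 and at least one full block")
    best = None
    # Non-overlapping successive blocks; an incomplete last block is ignored.
    for start in range(0, len(estimates) - block_size + 1, block_size):
        block = estimates[start:start + block_size]
        sd = statistics.stdev(block)
        if best is None or sd < best[0]:
            best = (sd, start, block)
    _, start, block = best
    # median_low returns an element of the block, so it maps back to one k.
    med = statistics.median_low(block)
    k_opt = start + block.index(med) + 1
    return k_opt, med
```

The rationale is the usual stability heuristic for tail estimators: a range of k over which the estimates barely change is taken as a region where bias and variance are reasonably balanced.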
Another point to discuss is the choice of the semi-metric, which plays a key role in the behavior of nonparametric statistics for functional data; see [27] for more details. Since the curves $X(t)$ are smooth, following [6] the semi-metric based on derivatives is used to measure the distance between two curves. It is defined as follows:
$$d_q(x_1, x_2) = \sqrt{\int \left( x_1^{(q)}(t) - x_2^{(q)}(t) \right)^2 dt},$$
where q is the order of the derivative.
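A derivative-based semi-metric of order q, that is, the $L^2$ distance between the q-th derivatives of two curves, can be approximated on a regular grid as sketched below. Note the substitution: derivatives are computed here by simple finite differences and the integral by the trapezoidal rule, whereas implementations in the functional data literature typically rely on a spline expansion of the curves. The function names and the regular-grid assumption are choices made for this example.

```python
import math

def finite_difference(values, step):
    """First derivative on a regular grid: central differences, one-sided at ends."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two grid points")
    out = [0.0] * n
    out[0] = (values[1] - values[0]) / step
    out[-1] = (values[-1] - values[-2]) / step
    for i in range(1, n - 1):
        out[i] = (values[i + 1] - values[i - 1]) / (2.0 * step)
    return out

def deriv_semimetric(x1, x2, step, q=1):
    """Approximate L2 distance between the q-th derivatives of two curves."""
    # Differentiation is linear, so differentiate the difference of the curves.
    diff = [a - b for a, b in zip(x1, x2)]
    for _ in range(q):
        diff = finite_difference(diff, step)
    squared = [d * d for d in diff]
    # Trapezoidal rule for the integral of the squared difference.
    integral = step * (sum(squared) - 0.5 * (squared[0] + squared[-1]))
    return math.sqrt(integral)
```

Two curves that differ only by a vertical shift are at distance zero for q = 1, which illustrates why this is a semi-metric and not a metric.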

Estimation of Conditional Quantile
To check the finite-sample performance of the extreme conditional quantile estimator in Equation (5), we performed simulation experiments, which are described in Section 5.1. First, to evaluate the impact of the order of the derivative in the choice of semi-metric, as [27] advises for practical cases, we computed the extreme conditional quantile for different orders of derivative, as presented in Table 1. Second, we examine the behavior of our estimator with respect to both the censoring rate and the sample size. Finally, the accuracy of our estimators is measured by the Mean Square Error (MSE) and the Mean Absolute Error (MAE) for each scenario over 500 replications, as reported in Table 1.
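The two accuracy measures are computed over the replications in the usual way. The short sketch below assumes that `estimates` holds one estimate per replication and that `truth` is the known theoretical quantile of the simulated model; both names are hypothetical.

```python
def mean_square_error(estimates, truth):
    """Empirical MSE: average squared deviation from the true value."""
    return sum((e - truth) ** 2 for e in estimates) / len(estimates)

def mean_absolute_error(estimates, truth):
    """Empirical MAE: average absolute deviation from the true value."""
    return sum(abs(e - truth) for e in estimates) / len(estimates)
```

The MSE penalizes occasional large errors more heavily than the MAE, so reporting both gives some indication of whether poor performance comes from a few extreme replications.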
To assess the performance of our estimator, we compare it with two simple estimation strategies. The first is a complete-case procedure ("CC" for short): we remove all censored observations from the simulated samples and then compute the tail index estimator proposed in [23], which does not take censoring into account. The second strategy ignores censoring: we set $\delta_i = 1$ for all $i = 1, \ldots, n$, that is, we treat the observations $Z_i$, $i = 1, \ldots, n$, as if they were uncensored. This strategy is called the censoring-ignored case ("CI" for short).
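In terms of data preparation, the two comparison strategies amount to the following transformations of the observed sample. This is a minimal sketch with hypothetical function names; the tail index estimator applied afterwards is not shown.

```python
def complete_case(curves, z, delta):
    """CC strategy: keep only the uncensored observations (delta_i = 1)."""
    kept = [(xi, zi) for xi, zi, di in zip(curves, z, delta) if di == 1]
    kept_curves = [xi for xi, _ in kept]
    kept_z = [zi for _, zi in kept]
    return kept_curves, kept_z, [1] * len(kept)

def censoring_ignored(curves, z, delta):
    """CI strategy: keep every observation and treat it as uncensored."""
    return list(curves), list(z), [1] * len(z)
```

The CC strategy reduces the effective sample size, while the CI strategy keeps the full sample but treats censored values, which are smaller than the unobserved responses, as if they were exact.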
To illustrate the asymptotic normality result for our estimators, we apply the Kolmogorov-Smirnov test; the results are presented in Table 2.
The P-values of the Kolmogorov-Smirnov test are greater than 0.05, as shown in Table 2, so normality is not rejected at the 5% level. The simulation results suggest that the quality of the normal approximation is linked to the censoring rate, the order of the derivative and the sample size. The results in Table 2 are thus consistent with the asymptotic normality established in our theoretical results.
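A one-sample Kolmogorov-Smirnov check against a normal law can be sketched with the standard library alone, as below. The sketch assumes the reference mean and standard deviation are specified in advance and uses the asymptotic Kolmogorov series for the P-value; when these parameters are estimated from the same sample, the resulting P-values are known to be too large. The function names and the small-argument cutoff are choices made for this example.

```python
import math
from statistics import NormalDist

def ks_statistic(sample, cdf):
    """Sup distance between the empirical distribution function and cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Compare cdf with the empirical distribution just after and before x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def ks_pvalue_asymptotic(d, n, terms=100):
    """Asymptotic P-value: 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 * j^2 * n * d^2)."""
    t = n * d * d
    if math.sqrt(t) < 0.2:
        # The series converges slowly here and the P-value is essentially 1.
        return 1.0
    total = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * t)
                for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))

def ks_normality_test(sample, mu=0.0, sigma=1.0):
    """Return (statistic, asymptotic P-value) against N(mu, sigma^2)."""
    d = ks_statistic(sample, NormalDist(mu, sigma).cdf)
    return d, ks_pvalue_asymptotic(d, len(sample))
```

In the simulation setting, `sample` would hold the standardized estimates over the replications; a P-value above 0.05 means that normality is not rejected, which is weaker than a proof of normality.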

Discussion
The performance of our estimator [...] (7). To demonstrate the accuracy of the proposed estimator, we compare it with the complete-case and censoring-ignored strategies described in Section 5.2.
The proposed estimator of conditional extreme quantiles performs well at low censoring rates when the sample size is large enough, as illustrated in Table 1, which reports the empirical Mean Square Error (MSE) and empirical Mean Absolute Error (MAE) of the estimators for different sample sizes, censoring rates and orders of derivative. Our estimator performs well when a high order of derivative is used in the semi-metric.
Regarding the choice of the semi-metric, our simulation results show that the order of the derivative plays a key role: since the functional curves are smooth, the semi-metric with a high order of derivative performs better than the one with a low order, as illustrated in Table 1; the interested reader is referred to [27].
Considering the results in Table 1, the CI and CC estimators of [...]. The Kolmogorov-Smirnov test was performed to check the asymptotic normality of our proposed estimator. According to the results in Table 2, all P-values are greater than the 0.05 significance level, so normality is not rejected, which is consistent with the theoretical results.

Conclusions and Perspectives
We considered a functional kernel Weissman-type estimator of the conditional extreme quantile when some functional random covariate (i.e. valued in some infinite-dimensional space) information is available and the scalar response variable is right-censored.
Its asymptotic properties were established and its finite-sample performance was illustrated in a simulation study. A comparison with two simple estimation strategies was also provided.
Future work will focus on the estimation of conditional extreme values of the Weibull distribution under random right censoring when the covariate is a functional random variable, and on establishing the asymptotic behavior of the resulting estimator.