Some Improvement on Convergence Rates of Kernel Density Estimator

In this paper two kernel density estimators are introduced and investigated. In order to reduce bias, we intuitively subtract an estimated bias term from the ordinary kernel density estimator. The second proposed density estimator is a geometric extrapolation of the first, bias reduced estimator. Theoretical properties such as bias, variance and mean squared error are investigated for both estimators. To observe their finite sample performance, a Monte Carlo simulation study based on small to moderately large samples is presented.


Introduction
Many efforts have been devoted to investigating the optimal performance of the kernel density estimator, since it has been the most widely used nonparametric density estimation method in recent decades. Let $f_n(x)$ denote the kernel estimator of the true density function $f(x)$. The mean squared error (MSE) and its two components, bias and variance, are normally used to quantify the accuracy of an estimator. The MSE of $f_n(x)$ decomposes into two parts:

$$\mathrm{MSE}\left(f_n(x)\right) = E\left[f_n(x) - f(x)\right]^2 = \mathrm{Var}\left(f_n(x)\right) + \left[\mathrm{Bias}\left(f_n(x)\right)\right]^2.$$

There is an extensive literature on approaches to improving the performance of kernel estimators, and reducing the bias has been the most commonly considered one. Article [1] obtained the best asymptotic convergence rate of MSE for orthogonal kernel estimators. Article [2] introduced geometric extrapolation of nonnegative kernels, while [3] discussed the order of a kernel, i.e. its number of vanishing moments, using the Fourier transform. Variable kernel estimation in [4] successfully reduced the bias by employing larger smoothing parameters in low density regions, while [5] introduced the idea of inadmissible kernels, which also results in reduced bias. On the other hand, [6] proposed an estimator based on probabilistic arguments that achieves bias reduction. Article [7] suggested a locally parametric density estimator, a semiparametric technique that effectively reduces the order of the bias. Article [8] proposed algorithms based on a quadratic polynomial and the β cumulative distribution function (c.d.f.), which accommodate possible poles at the boundaries and consequently reduce the bias there. Article [9] introduced a bias reduction method using an estimated c.d.f. via smoothed kernel transformations. Article [10] introduced a two-stage multiplicative bias corrected estimator. Article [11] developed a skewing method that reduces the bias while increasing the variance only by a moderate constant factor.

Some recent works have also obtained a smaller bias via other methods. Article [12] derived a bias reduced kernel relative to the classical kernel estimator under a Lipschitz condition. Article [13] introduced an adjusted kernel density estimator in which the kernel is adapted to the data rather than fixed; this naturally leads to an adaptive choice of the smoothing parameters, which can reduce the bias. Although variance reduction is less tractable than bias reduction, many researchers have still worked on it. Article [14] suggested an approach to reducing the variance in local linear regression using the idea of the skewing method. Article [15] also used the skewing method to reduce the bias and the variance at the same time, which in turn reduces the MSE.
Many of the above-mentioned bias reduction methods result in complex kernel density estimators. In this paper, we introduce a novel, intuitive and feasible bias reduced kernel density estimator. In Section 2, we present the bias reduced estimator and investigate its asymptotic bias, variance and MSE. A second estimator is proposed and studied in Section 3 as a geometric extrapolation of the bias reduced kernel estimator. To examine the finite sample performance of both estimators, a simulation study is carried out in Section 4. Finally, some remarks are given in Section 5.

A Bias Reduced Kernel Estimator
The kernel density estimator was first introduced in [16] and [17]. Suppose $X_1, \ldots, X_n$ is a simple random sample from the unknown density function $f$. Let $K$ be a function on the real line (the "kernel") and let $h$ be a positive value (the "bandwidth"). Then the kernel density estimator of $f$ is defined as

$$f_n(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right). \qquad (2.1)$$
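As an illustration of (2.1), here is a minimal Python sketch. It assumes a standard normal kernel, which is the kernel later used in Section 4. The function names `gauss_kernel` and `kde` are hypothetical and chosen for illustration only.

```python
import math

# Illustrative sketch of (2.1); function names are hypothetical, not from the paper.


def gauss_kernel(u):
    """Standard normal kernel K(u) = exp(-u^2 / 2) / sqrt(2 pi) (assumed kernel choice)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, sample, h, kernel=gauss_kernel):
    """Ordinary kernel density estimate (2.1): (1 / (n h)) * sum_i K((x - X_i) / h)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    if not sample:
        raise ValueError("sample must be non-empty")
    n = len(sample)
    return sum(kernel((x - xi) / h) for xi in sample) / (n * h)


if __name__ == "__main__":
    # One observation at 0 and h = 1: the estimate at 0 is K(0) = 1 / sqrt(2 pi).
    print(round(kde(0.0, [0.0], 1.0), 6))  # 0.398942
```

Because the kernel integrates to one, so does the estimate, for any sample and any $h > 0$.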
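To make the estimator meaningful, the kernel function is usually required to satisfy

$$\int K(u)\,du = 1, \qquad \int uK(u)\,du = 0, \qquad \mu_2(K) = \int u^2K(u)\,du < \infty.$$

Both [18] and [19] pointed out that if $f''$ is continuous in a neighbourhood of $x$, $h \to 0$ and $nh \to \infty$, then

$$\mathrm{Bias}\left(f_n(x)\right) = \frac{h^2}{2}\mu_2(K)f''(x) + o(h^2), \qquad (2.2)$$

$$\mathrm{Var}\left(f_n(x)\right) = \frac{f(x)}{nh}\int K^2(u)\,du + o\left(\frac{1}{nh}\right). \qquad (2.3)$$

Then from (2.2) and (2.3) we have

$$\mathrm{MSE}\left(f_n(x)\right) = \frac{h^4}{4}\mu_2^2(K)\left[f''(x)\right]^2 + \frac{f(x)}{nh}\int K^2(u)\,du + o\left(h^4 + \frac{1}{nh}\right).$$

We can easily see that the optimal bandwidth is

$$h_{\mathrm{opt}} = \left[\frac{f(x)\int K^2(u)\,du}{\mu_2^2(K)\left[f''(x)\right]^2}\right]^{1/5} n^{-1/5},$$

and then the optimal MSE is of the order $O(n^{-4/5})$. In order to reduce the bias of the ordinary kernel density estimator, we can intuitively subtract the leading bias term $\frac{h^2}{2}\mu_2(K)f''(x)$ in (2.2) from it. Since this leading term is unavailable because $f$ is unknown, we can simply use an estimate of it, i.e.

$$f_n(x) - \frac{h^2}{2}\mu_2(K)\hat f''(x),$$

where $\hat f''(x)$ is an estimate of $f''(x)$. One could use any type of estimate of the bias term. We simply replace $f$ with the kernel estimator $f_n$, since it is readily available. As a result, our proposed estimator is

$$\tilde f_n(x) = f_n(x) - \frac{h^2}{2}\mu_2(K)f_n''(x) = \frac{1}{nh}\sum_{i=1}^{n}\left[K\left(\frac{x - X_i}{h}\right) - \frac{\mu_2(K)}{2}K''\left(\frac{x - X_i}{h}\right)\right]. \qquad (2.4)$$

By construction, this new estimator should reduce the bias and thus the MSE. To see whether this is the case, we next calculate the bias and the variance of $\tilde f_n(x)$. We impose the following regularity conditions on $f$, $K$ and $h$:
1) …
2) $f$ is four times differentiable in a neighbourhood of $x$.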
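3) $h \to 0$ and $nh \to \infty$ as $n \to \infty$.

Theorem 2.1. Under 1), 2) and 3),

$$\mathrm{Bias}\left(\tilde f_n(x)\right) = -\frac{h^3}{6}f'''(x)\int u^3K(u)\,du + O(h^4),$$

$$\mathrm{Var}\left(\tilde f_n(x)\right) = \frac{f(x)}{nh}\int\left[K(u) - \frac{\mu_2(K)}{2}K''(u)\right]^2du + o\left(\frac{1}{nh}\right),$$

where $\mu_2(K) = \int u^2K(u)\,du$, and the optimal MSE is of the order $O(n^{-6/7})$.

Proof. By Taylor expansion, and after integrating by parts twice for the second expression, we have

$$E f_n(x) = \int K(u)f(x - hu)\,du = f(x) + \frac{h^2}{2}\mu_2(K)f''(x) - \frac{h^3}{6}f'''(x)\int u^3K(u)\,du + O(h^4),$$

$$E f_n''(x) = \int K(u)f''(x - hu)\,du = f''(x) + O(h^2).$$

Thus

$$E\tilde f_n(x) = E f_n(x) - \frac{h^2}{2}\mu_2(K)E f_n''(x) = f(x) - \frac{h^3}{6}f'''(x)\int u^3K(u)\,du + O(h^4),$$

which gives the bias. On the other hand, by (2.4), $\tilde f_n(x)$ is an ordinary kernel estimator with kernel $L(u) = K(u) - \frac{1}{2}\mu_2(K)K''(u)$, so its variance follows from (2.3) with $K$ replaced by $L$. Combining the two, $\mathrm{MSE}(\tilde f_n(x)) = O(h^6) + O\left(\frac{1}{nh}\right)$, which is minimized by $h \propto n^{-1/7}$, giving the optimal order $O(n^{-6/7})$.

Remark 2.1. If $K$ is symmetric, i.e. $K(u) = K(-u)$, then all the odd moments of $K$ are zero and, as a result, the bias of $\tilde f_n(x)$ is improved to the higher order $O(h^4)$. In this case, the optimal MSE is further reduced to $O(n^{-8/9})$.

Remark 2.2. From the definition of $\tilde f_n$ in (2.4), this estimator could possibly be negative at some points $x$. To make it meaningful in practice, i.e. to make it a nonnegative density estimator, one can use the following variation of the proposed bias reduced estimator:

$$\hat f_n(x) = \tilde f_n(x)\, I_{\{\tilde f_n(x) > 0\}},$$

where $I_A$ is an indicator function that takes value one on the set $A$ and zero otherwise. Note that the first term on the right-hand side of (2.4) converges to $f(x)$ in probability. The second term has expectation of order $O(h^2)$ and variance of order $O\left(\frac{1}{nh}\right)$, so it goes to zero in probability as $n \to \infty$ under 3). Thus $\tilde f_n(x)$ converges to $f(x)$ in probability. As a result, $\tilde f_n(x)$ is positive with probability tending to one at any point $x \in \Omega$ with $f(x) > 0$, where $\Omega$ is the support of $f$. Therefore $\hat f_n$ has similar performance and properties to $\tilde f_n$, especially when the sample size is large.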
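To make (2.4) and Remark 2.2 concrete, here is a minimal Python sketch. It assumes the standard normal kernel, for which $\mu_2(K) = 1$ and $K''(u) = (u^2 - 1)K(u)$. With this kernel the combined kernel $K(u) - \frac{1}{2}K''(u)$ equals $\frac{1}{2}(3 - u^2)K(u)$, the familiar fourth-order Gaussian kernel. That kernel is negative for $|u| > \sqrt{3}$, which is exactly why $\tilde f_n$ can fall below zero. The function names are hypothetical.

```python
import math

# Illustrative sketch of (2.4) and Remark 2.2; function names are hypothetical.
# Assumption: K is the standard normal kernel, so mu_2(K) = 1.


def gauss_kernel(u):
    """Standard normal kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def gauss_kernel_dd(u):
    """Second derivative of the standard normal kernel: K''(u) = (u^2 - 1) K(u)."""
    return (u * u - 1.0) * gauss_kernel(u)


MU2 = 1.0  # mu_2(K) = int u^2 K(u) du equals 1 for the standard normal kernel


def brk(x, sample, h):
    """Bias reduced estimate (2.4): f_n(x) - (h^2 / 2) * mu_2(K) * f_n''(x)."""
    n = len(sample)
    f_n = sum(gauss_kernel((x - xi) / h) for xi in sample) / (n * h)
    # f_n''(x) = (1 / (n h^3)) * sum_i K''((x - X_i) / h)
    f_n_dd = sum(gauss_kernel_dd((x - xi) / h) for xi in sample) / (n * h ** 3)
    return f_n - 0.5 * h * h * MU2 * f_n_dd


def brk_nonnegative(x, sample, h):
    """Variant from Remark 2.2: keep the estimate where it is positive, otherwise 0."""
    value = brk(x, sample, h)
    return value if value > 0.0 else 0.0


if __name__ == "__main__":
    # A single observation at 0 with h = 1: at x = 2 we get (3 - 4) / 2 * K(2) < 0.
    print(brk(2.0, [0.0], 1.0) < 0.0, brk_nonnegative(2.0, [0.0], 1.0))  # True 0.0
```

Writing out the two sums separately keeps the code close to the first form of (2.4). For another kernel, only `gauss_kernel_dd` and `MU2` need to change.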

A Geometric Extrapolated Kernel Estimator with Bias Reduction
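Geometric extrapolation was introduced into kernel density estimation by [2]. Consider the ordinary kernel density estimator with two different bandwidths $h$ and $2h$:

$$f_{n,h}(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right), \qquad f_{n,2h}(x) = \frac{1}{2nh}\sum_{i=1}^{n} K\left(\frac{x - X_i}{2h}\right).$$

Suppose the kernel function $K$ above is symmetric, so that all the odd moments of $K$ are zero. Article [2] proposed the estimator

$$\bar f_n(x) = \left\{f_{n,h}(x)\right\}^{4/3}\left\{f_{n,2h}(x)\right\}^{-1/3}, \qquad (3.1)$$

whose bias is of order $O(h^4)$ and whose optimal MSE is of order $O(n^{-8/9})$. This is a faster convergence rate than the rate $O(n^{-4/5})$ of the ordinary kernel estimator.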
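A minimal Python sketch of (3.1) follows, again assuming the standard normal kernel. The function names are hypothetical. The sketch returns 0 when either factor underflows to zero. That guard is our own assumption, not part of [2]: it is needed because Python cannot raise 0.0 to the negative power −1/3.

```python
import math

# Illustrative sketch of (3.1); function names are hypothetical.


def gauss_kernel(u):
    """Standard normal kernel K(u) (assumed kernel choice)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, sample, h):
    """Ordinary kernel estimate f_{n,h}(x) with the standard normal kernel."""
    return sum(gauss_kernel((x - xi) / h) for xi in sample) / (len(sample) * h)


def geok(x, sample, h):
    """Geometric extrapolation (3.1): f_{n,h}(x)^(4/3) * f_{n,2h}(x)^(-1/3)."""
    f_h = kde(x, sample, h)
    f_2h = kde(x, sample, 2.0 * h)
    if f_h == 0.0 or f_2h == 0.0:
        return 0.0  # assumption: far tails where the kernels underflow to 0.0
    return f_h ** (4.0 / 3.0) * f_2h ** (-1.0 / 3.0)


if __name__ == "__main__":
    # With one observation at 0 and h = 1, f_{n,2h}(0) = f_{n,h}(0) / 2,
    # so the extrapolated value is f_{n,h}(0) * 2^(1/3).
    print(geok(0.0, [0.0], 1.0), kde(0.0, [0.0], 1.0) * 2 ** (1 / 3))
```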
Instead of using the ordinary kernel estimator, we propose to use the bias reduced kernel estimator presented in Section 2 to construct the geometric extrapolated kernel (3.1). Denote the bias reduced kernel estimator (2.4) computed with bandwidths $h$ and $2h$ by $\tilde f_{n,h}(x)$ and $\tilde f_{n,2h}(x)$, respectively. Now the geometric extrapolated kernel estimator with bias reduction is proposed as

$$\tilde{\tilde f}_n(x) = \left\{\tilde f_{n,h}(x)\right\}^{8/7}\left\{\tilde f_{n,2h}(x)\right\}^{-1/7}. \qquad (3.2)$$

Since the bias reduced kernel estimator has improved bias and MSE over the ordinary kernel estimator, especially when $K$ is symmetric, we expect geometric extrapolation to achieve a further improvement.
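Combining the two previous ideas, here is a minimal Python sketch of (3.2) under the same standard-normal-kernel assumption. The factors $\tilde f_{n,h}$ and $\tilde f_{n,2h}$ can be negative, and a negative float raised to 8/7 or −1/7 is not a real number (Python returns a complex value). The sketch therefore returns 0 whenever either factor is nonpositive. This guard is our assumption, in the spirit of Remark 2.2, and is not specified in the text. The exponent `alpha` is exposed so that the symmetric-kernel variant of Remark 3.2 (alpha = 16/15) can also be tried. The names are hypothetical.

```python
import math

# Illustrative sketch of (3.2); function names and the nonpositive guard are assumptions.


def gauss_kernel(u):
    """Standard normal kernel K(u) (assumed kernel choice)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def brk(x, sample, h):
    """Bias reduced estimate (2.4) with the standard normal kernel (mu_2 = 1).

    f_n(x) - (h^2 / 2) f_n''(x) collapses to the kernel (3 - u^2) / 2 * K(u).
    """
    total = 0.0
    for xi in sample:
        u = (x - xi) / h
        total += 0.5 * (3.0 - u * u) * gauss_kernel(u)
    return total / (len(sample) * h)


def gebrk(x, sample, h, alpha=8.0 / 7.0):
    """Geometric extrapolation of the bias reduced estimator.

    Returns brk_h^alpha * brk_2h^(1 - alpha); alpha = 8/7 gives (3.2).
    """
    f_h = brk(x, sample, h)
    f_2h = brk(x, sample, 2.0 * h)
    if f_h <= 0.0 or f_2h <= 0.0:
        return 0.0  # assumption: fractional powers of nonpositive values are undefined
    return f_h ** alpha * f_2h ** (1.0 - alpha)


if __name__ == "__main__":
    data = [-0.3, 0.1, 0.4, 1.2, -1.5]  # hypothetical observations
    print(gebrk(0.0, data, 0.8), gebrk(0.0, data, 0.8, alpha=16.0 / 15.0))
```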
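Theorem 3.1. Under 1), 2) and 3), at any $x$ with $f(x) > 0$,

$$\mathrm{Bias}\left(\tilde{\tilde f}_n(x)\right) = O(h^4) + O\left(\frac{1}{nh}\right), \qquad \mathrm{Var}\left(\tilde{\tilde f}_n(x)\right) = \frac{f(x)}{nh}\int\left[\frac{8}{7}L(u) - \frac{1}{14}L\left(\frac{u}{2}\right)\right]^2du + o\left(\frac{1}{nh}\right),$$

where $L(u) = K(u) - \frac{1}{2}\mu_2(K)K''(u)$ with $\mu_2(K) = \int u^2K(u)\,du$, and the optimal MSE is of the order $O(n^{-8/9})$.

Proof. By the proof of Theorem 2.1,

$$E\tilde f_{n,h}(x) = f(x) + a_3h^3 + O(h^4), \qquad E\tilde f_{n,2h}(x) = f(x) + 8a_3h^3 + O(h^4),$$

where $a_3 = -\frac{1}{6}f'''(x)\int u^3K(u)\,du$. Here we want to construct a geometric extrapolated kernel estimator of the form $\left\{\tilde f_{n,h}(x)\right\}^{\alpha}\left\{\tilde f_{n,2h}(x)\right\}^{1-\alpha}$. Write $\tilde f_{n,h}(x) = E\tilde f_{n,h}(x) + f(x)U$ and $\tilde f_{n,2h}(x) = E\tilde f_{n,2h}(x) + f(x)V$. Here $U$ and $V$ are both of order $O_p\left((nh)^{-1/2}\right)$, have expectations zero, and have variances and covariances of order $O\left(\frac{1}{nh}\right)$. Taking logarithms and using a series expansion for the exponential function gives

$$\left\{\tilde f_{n,h}(x)\right\}^{\alpha}\left\{\tilde f_{n,2h}(x)\right\}^{1-\alpha} = f(x) + \left[\alpha + 8(1-\alpha)\right]a_3h^3 + f(x)\left[\alpha U + (1-\alpha)V\right] + O_p\left(h^4 + \frac{1}{nh}\right).$$

The $h^3$ term vanishes if and only if $\alpha + 8(1-\alpha) = 0$, i.e. $\alpha = 8/7$, which yields (3.2). Since $U$ and $V$ have mean zero, the bias is $O(h^4) + O\left(\frac{1}{nh}\right)$, and the variance is $f^2(x)\mathrm{Var}\left(\frac{8}{7}U - \frac{1}{7}V\right) + o\left(\frac{1}{nh}\right)$, which equals the expression above. The squared $O\left(\frac{1}{nh}\right)$ bias term is negligible relative to the variance. Balancing $h^8$ against $\frac{1}{nh}$ gives $h \propto n^{-1/9}$, and then the optimal MSE is $O(n^{-8/9})$.

Remark 3.1. Article [2] proposed the geometric extrapolation of the ordinary kernel estimator, which results in an optimal MSE of the order $O(n^{-8/9})$. Although here we achieve the same order of optimal MSE, we do not impose the assumption that $K$ is symmetric, while [2] does.

Remark 3.2. When $K$ is symmetric, we propose another estimator

$$\left\{\tilde f_{n,h}(x)\right\}^{16/15}\left\{\tilde f_{n,2h}(x)\right\}^{-1/15}.$$

Provided $f$ is sufficiently smooth, this estimator reduces the leading bias term to $O(h^6)$ and has an improved optimal MSE of the order $O(n^{-12/13})$.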
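The exponents in (3.1), (3.2) and Remark 3.2, and the MSE orders quoted throughout, follow from two small calculations. The Python sketch below carries them out with exact fractions. If the leading bias with bandwidth $h$ is $c h^p$, then with bandwidth $2h$ it is $2^p c h^p$, so the weights must satisfy $\alpha + 2^p(1-\alpha) = 0$. If the bias is $O(h^q)$ and the variance is $O(1/(nh))$, balancing $h^{2q}$ against $1/(nh)$ gives $h \propto n^{-1/(2q+1)}$ and an MSE of $O(n^{-2q/(2q+1)})$. The helper names are hypothetical.

```python
from fractions import Fraction

# Illustrative arithmetic for the exponents and rates; helper names are hypothetical.


def extrapolation_exponent(p):
    """alpha such that f_h^alpha * f_2h^(1 - alpha) cancels a leading bias term c * h^p.

    To leading order, log E f_2h - log f is 2^p times log E f_h - log f at order h^p,
    so we need alpha + 2^p * (1 - alpha) = 0, i.e. alpha = 2^p / (2^p - 1).
    """
    return Fraction(2 ** p, 2 ** p - 1)


def optimal_orders(q):
    """For bias O(h^q) and variance O(1/(n h)), return (a, b) with h ~ n^(-a), MSE ~ n^(-b).

    MSE ~ A h^(2q) + B / (n h) is minimised when h^(2q + 1) ~ 1 / n.
    """
    a = Fraction(1, 2 * q + 1)
    return a, 2 * q * a


if __name__ == "__main__":
    # (3.1): p = 2 -> 4/3; (3.2): p = 3 -> 8/7; Remark 3.2: p = 4 -> 16/15.
    for p in (2, 3, 4):
        alpha = extrapolation_exponent(p)
        print(p, alpha, 1 - alpha)
    # q = 2: OK; q = 3: BRK; q = 4: BRK with symmetric K, (3.1), (3.2); q = 6: Remark 3.2.
    for q in (2, 3, 4, 6):
        print(q, *optimal_orders(q))
```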

Simulation Study
In this section, we carry out a simulation study designed to demonstrate the finite sample performance of the proposed bias reduced kernel estimator (BRK) $\tilde f_n$ given in (2.4) and the proposed geometric extrapolation of the bias reduced kernel estimator (GEBRK) $\tilde{\tilde f}_n$ given in (3.2). In particular, we compare their bias and MSE with those of the ordinary kernel density estimator (OK) $f_n$ in (2.1) and the geometric extrapolation of the ordinary kernel estimator (GEOK) $\bar f_n$ in (3.1). Without loss of generality, we suppose $f$ is the standard normal density. We generate 1000 independent samples of size n = 20, 50, 100 or 200. We arbitrarily choose the points x = 0, 0.5, 1, 1.5, 2, 2.5 and 3 at which the kernel estimators are calculated and compared. Since the properties of kernel estimators do not depend much on which particular kernel is used, we choose the standard normal density as the kernel function $K$. For the bandwidth $h$, we use a bandwidth of optimal order for each individual kernel estimator. In other words, since $K$ here is symmetric, by Remarks 2.1 and 3.2 we choose $h = n^{-1/9}$ for both BRK and GEOK, and … . The bias and MSE of an estimator $\hat\theta$ of $\theta_0$ are computed as

$$\mathrm{Bias} = \frac{1}{1000}\sum_{i=1}^{1000}\hat\theta_i - \theta_0, \qquad \mathrm{MSE} = \frac{1}{1000}\sum_{i=1}^{1000}\left(\hat\theta_i - \theta_0\right)^2,$$

where $\theta_0$ is the true parameter and $\hat\theta_i$ is the estimate of $\theta$ based on the $i$-th sample. In our case, $\theta_0 = f(x)$ for fixed x = 0, 0.5, 1, 1.5, 2, 2.5 or 3. The simulation results are presented in Tables 1-7.
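The Monte Carlo design above can be sketched in Python as follows. This is a minimal illustration, not the code behind Tables 1-7. It assumes the standard normal kernel and true density, uses Python's `random.Random` with a fixed seed, and computes the empirical bias, variance and MSE over `reps` replications. Only OK and BRK are included to keep it short. The bandwidth $h = n^{-1/5}$ for OK in the usage lines is our assumption (the constant-free rate from Section 2); $h = n^{-1/9}$ for BRK follows the text. Function names are hypothetical.

```python
import math
import random

# Illustrative Monte Carlo sketch; function names and the OK bandwidth are assumptions.


def gauss_kernel(u):
    """Standard normal kernel K(u), also the true density f in this study."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def ok(x, sample, h):
    """Ordinary kernel estimate (2.1)."""
    return sum(gauss_kernel((x - xi) / h) for xi in sample) / (len(sample) * h)


def brk(x, sample, h):
    """Bias reduced estimate (2.4); for the Gaussian K it uses the kernel (3 - u^2) / 2 * K(u)."""
    total = 0.0
    for xi in sample:
        u = (x - xi) / h
        total += 0.5 * (3.0 - u * u) * gauss_kernel(u)
    return total / (len(sample) * h)


def monte_carlo(estimator, x, n, h, reps=1000, seed=0):
    """Empirical (bias, variance, MSE) of estimator(x, sample, h) when f is N(0, 1)."""
    rng = random.Random(seed)
    theta0 = gauss_kernel(x)  # true value f(x): the N(0, 1) density at x
    estimates = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        estimates.append(estimator(x, sample, h))
    mean = sum(estimates) / reps
    bias = mean - theta0
    variance = sum((e - mean) ** 2 for e in estimates) / reps
    mse = sum((e - theta0) ** 2 for e in estimates) / reps
    return bias, variance, mse


if __name__ == "__main__":
    n = 50
    print("OK ", monte_carlo(ok, 0.0, n, n ** (-1 / 5)))
    print("BRK", monte_carlo(brk, 0.0, n, n ** (-1 / 9)))
```

Because the variance is computed with divisor `reps`, the returned values satisfy MSE = variance + bias² up to rounding. This matches the decomposition in the Introduction.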
From Tables 1-7 we can see that BRK consistently has smaller bias and MSE than OK, except at x = 1. This is simply because $f''(1) = 0$ for the standard normal density, which in turn reduces the bias of OK at this point to $O(h^4)$. This is the same order as the bias of BRK. However, it is a special case that holds only at x = 1 here, and the conclusion cannot be generalized. When the two estimators with geometric extrapolation are compared, GEBRK generally has smaller bias and MSE than GEOK, especially when the sample size is large. When BRK and GEBRK are compared, GEBRK tends to have smaller variance and MSE but larger bias than BRK. In terms of bias, BRK and GEBRK perform much better than OK and GEOK, and are very competitive with each other. Geometric extrapolation reduces the variance and MSE in general, i.e. GEOK and GEBRK perform better than OK and BRK in terms of variance and MSE. In terms of MSE, GEBRK performs best, followed by GEOK. These observations are somewhat different at x = 1 because $f''(1) = 0$, as mentioned above.
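The special role of x = 1 can be checked without simulation. For the standard normal $f$ and the standard normal kernel, $E f_n(x)$ is exactly the $N(0, 1 + h^2)$ density at $x$, since it is the convolution of $f$ with the scaled kernel. The exact bias of OK is therefore available in closed form. Halving $h$ should divide this bias by about 4 where it is $O(h^2)$, and by about 16 at x = 1, where $f''(1) = 0$ makes it $O(h^4)$. A minimal Python sketch, with hypothetical function names:

```python
import math

# Illustrative check of the x = 1 effect; function names are hypothetical.
# Assumption: f is N(0, 1) and K is the standard normal kernel.


def normal_pdf(x, var=1.0):
    """Density of N(0, var) at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def ok_exact_bias(x, h):
    """Exact bias E f_n(x) - f(x) of OK with Gaussian K when f is N(0, 1).

    E f_n(x) is the N(0, 1 + h^2) density at x, independent of n.
    """
    return normal_pdf(x, 1.0 + h * h) - normal_pdf(x)


def halving_ratio(x, h):
    """Bias(h) / Bias(h / 2): close to 4 if the bias is O(h^2), close to 16 if O(h^4)."""
    return ok_exact_bias(x, h) / ok_exact_bias(x, h / 2.0)


if __name__ == "__main__":
    for x in (0.0, 0.5, 1.0, 1.5, 2.0):
        print(x, round(halving_ratio(x, 0.05), 3))
```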

Concluding Remarks
In this paper, we first propose a very intuitive and feasible kernel density estimator which reduces the bias and MSE significantly compared with the ordinary kernel density estimator. Secondly, we construct a geometric extrapolation of the bias reduced kernel estimator, which further improves the convergence rates of both bias and MSE. Our simulation study shows that for finite sample sizes both estimators perform well, and better than the ordinary kernel estimator and its geometric extrapolation.
For the bias reduced kernel density estimator presented in Section 2, part of the estimated curve may fall below zero, especially in the tails. Taking the standard normal density as an example, the estimator may give a negative value at x = 4. This is clearly unreasonable for a density estimate. Although we suggest a modified version of the estimator in Remark 2.2, further work is needed to deal with this problem.

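Note that the geometric extrapolated estimator (3.1), with powers 4/3 and −1/3, need not have integral one. In order to improve the MSE convergence order of a nonnegative kernel estimator, one has to relax the constraint of integrating to one. The powers 4/3 and −1/3 are selected to reduce the bias of the ordinary kernel estimator to $O(h^4)$. Consequently, the MSE of (3.1) attains the optimal order $O(n^{-8/9})$.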
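The relaxed integrate-to-one constraint can be checked numerically for (3.1). The function $t \mapsto t^{4/3}$ is convex and $f_{n,2h}$ is a density, so Jensen's inequality gives $\int f_{n,h}^{4/3}f_{n,2h}^{-1/3} = \int f_{n,2h}\,(f_{n,h}/f_{n,2h})^{4/3} \ge \left(\int f_{n,h}\right)^{4/3} = 1$. Equality holds only when $f_{n,h} = f_{n,2h}$. For a single observation and the Gaussian kernel, the integral equals $2^{1/3}\cdot 2/\sqrt{5} \approx 1.127$ for every $h$. The Python sketch below reproduces this. It assumes the standard normal kernel and a simple Riemann sum on a wide grid; the data values and function names are hypothetical.

```python
import math

# Illustrative numerical check; data values and function names are hypothetical.


def gauss_kernel(u):
    """Standard normal kernel K(u) (assumed kernel choice)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, sample, h):
    """Ordinary kernel estimate f_{n,h}(x)."""
    return sum(gauss_kernel((x - xi) / h) for xi in sample) / (len(sample) * h)


def geok(x, sample, h):
    """(3.1): f_{n,h}^(4/3) * f_{n,2h}^(-1/3); 0 where the kernels underflow (our guard)."""
    f_h, f_2h = kde(x, sample, h), kde(x, sample, 2.0 * h)
    if f_h == 0.0 or f_2h == 0.0:
        return 0.0
    return f_h ** (4.0 / 3.0) * f_2h ** (-1.0 / 3.0)


def riemann(func, lo, hi, step):
    """Riemann sum of func over the grid lo, lo + step, ..., hi."""
    m = int(round((hi - lo) / step))
    return sum(func(lo + i * step) for i in range(m + 1)) * step


if __name__ == "__main__":
    one_point = riemann(lambda x: geok(x, [0.0], 0.5), -10.0, 10.0, 0.01)
    print(round(one_point, 4))  # 1.1269
    data = [-1.0, 0.0, 0.5, 2.0]  # hypothetical observations
    print(riemann(lambda x: geok(x, data, 0.5), -10.0, 12.0, 0.01))
```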

Table 1. Bias, variance and MSE of different kernel density estimators evaluated at x = 0.

Table 2. Bias, variance and MSE of different kernel density estimators evaluated at x = 0.5.

Table 3. Bias, variance and MSE of different kernel density estimators evaluated at x = 1.

Table 4. Bias, variance and MSE of different kernel density estimators evaluated at x = 1.5.

Table 5. Bias, variance and MSE of different kernel density estimators evaluated at x = 2.

Table 6. Bias, variance and MSE of different kernel density estimators evaluated at x = 2.5.

Table 7. Bias, variance and MSE of different kernel density estimators evaluated at x = 3.