Nonparametric Regression Estimation with Mixed Measurement Errors

We consider the estimation of nonparametric regression models whose predictors are measured with a mixture of Berkson and classical errors. In practice, the Berkson error arises when the variable X of interest is unobservable and only a proxy of X can be measured, while inaccuracy in observing the proxy itself introduces an error of classical type. In this paper, we propose two nonparametric estimators of the regression function in the presence of either or both types of errors. We prove the asymptotic normality of our estimators and derive their rates of convergence. The finite-sample properties of the estimators are investigated through simulation studies.


Introduction
Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ denote a sequence of independent and identically distributed random vectors. In traditional nonparametric regression analysis, one is interested in the model

$$Y_i = g(X_i) + e_i, \qquad (1)$$

where $g(\cdot)$ is assumed to be a smooth, continuous but unknown function; the random errors $e_i$ are assumed to be normally and independently distributed with mean 0 and constant variance $\sigma^2$; and $i = 1, \ldots, n$. Here, the predictor $X$ is usually assumed to be directly observable without error. Both the direct-observation and error-free assumptions, however, seldom hold in epidemiologic studies. As an example of a violation of the error-free assumption, [1] considered an environmental study of the relation between mean exposure to lead up to age 10 (denoted by $X$) and intelligence quotient (IQ) among 10-year-old children (denoted by $Y$) living in the neighborhood of a lead smelter. Each child had one measurement of blood lead (denoted by $W$), made at a random time during their life. The blood lead measurement $W$ thus became an approximate measure of mean blood lead over life, $X$. If we were able to make many replicate measurements (at different random time points), their mean would be a good indicator of lifetime exposure. In other words, the measurements of $X$ are subject to error and $W$ is a perturbation of $X$. In the measurement error literature, this is known as the classical error model, and Model (1) becomes

$$Y_i = g(X_i) + e_i, \quad W_i = X_i + \varepsilon_i, \qquad (2)$$

where $(e_i, X_i, \varepsilon_i)$, $i = 1, \ldots, n$, are mutually independent and $\varepsilon$ represents the classical measurement error variable. Various methods for analyzing Model (2), such as deconvolution kernel approaches (e.g., [2] [3] [4]), a design-adaptive local polynomial estimation method (e.g., [5]), methods based on simulation and extrapolation (SIMEX) arguments (e.g., [6] [7] [8] [9]), and a Bayesian approach (e.g., [10]), have been extensively studied in the literature.
In many studies, however, it is too costly or impossible to measure the predictor $X$ exactly or directly. Instead, a proxy $W$ of $X$ is measured. As an example of a violation of the direct-observation assumption, [1] modified the aforementioned environmental study so that the children were classified into three groups by the proximity of their place of residence at age 10 (assumed known exactly) to the smelter: close, medium, far. Random blood lead samples, collected as described in the aforementioned design, were averaged for each group (denoted by $W$), and this group mean was used as a proxy for lifetime exposure for each child in the group. Here, the same approximate exposure (proxy) is used for all subjects in the same group, and true exposures, although unknown, may be assumed to vary randomly about the proxy. This is the well-known Berkson error model. In other words, the predictor $X$ is not directly observable and measurements on its surrogate $W$ are available instead. The true predictor $X$ is then a perturbation of $W$. The model of interest now becomes

$$Y_i = g(X_i) + e_i, \quad X_i = W_i + \delta_i, \qquad (3)$$

where $(e_i, W_i, \delta_i)$, $i = 1, \ldots, n$, are mutually independent. Model (3) was first considered by [11], and the estimation of linear Berkson measurement error models was discussed in [12]. Methods based on least squares estimation ([13]), minimum distance estimation ([14] [15]), regression calibration ([16]) and trigonometric functions ([17]) have been studied.
The stochastic structure of Model (3) is fundamentally different from that of Model (2). In Model (2), the measurement error is independent of $X$ but dependent on $W$, whereas in Model (3) it is independent of $W$ but dependent on $X$. This distinctive feature leads to completely different procedures for estimation and inference in the two models. In particular, nonparametric estimators that are consistent in Model (2) are no longer valid in Model (3), and vice versa. In most of the existing literature, the measurement error is supposed to be of only one of the two types. In the Berkson model (3), it is usually assumed that the observable variable $W$ is measured with perfect accuracy. However, this may not be true in some situations. In such cases, $W$ is observed through $V = W + \varepsilon$, where $\varepsilon$ is a classical measurement error. [18] presented a good discussion of the origins of mixed Berkson and classical errors in the context of radiation dosimetry. Under this mixture of measurement errors, we observe a random sample of independent pairs $(V_i, Y_i)$, $i = 1, \ldots, n$, generated by

$$Y_i = g(X_i) + e_i, \quad X_i = W_i + \delta_i, \quad V_i = W_i + \varepsilon_i, \qquad (4)$$

where $W_i$, $\delta_i$, $\varepsilon_i$ and $e_i$ are mutually independent, and the respective error densities $f_\delta$ and $f_\varepsilon$ are assumed to be known. Owing to their potentially wide applications, statistical procedures for analyzing Model (4) have received more attention recently. For instance, a regression calibration approach was proposed by [19] and [20] in a parametric context of random exposure. [21] considered a Bayesian approach for a semi-parametric regression function. [22] developed a nonparametric density estimation approach for data contaminated with a mixture of Berkson and classical errors, but did not extend it to the estimation of the regression function. [23] proposed a two-step nonparametric kernel method for estimating the regression function, but it is computationally complicated. In this paper, we propose two nonparametric estimators for the regression curve $g(\cdot)$ when the predictor $X$ is measured with classical error, Berkson error, or a combination of both.
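To make the two error mechanisms concrete, the following is a minimal simulation sketch. It assumes the mixed-error structure $X = W + \delta$ (Berkson) and $V = W + \varepsilon$ (classical) with independent normal components; the helper name `simulate_mixed_errors`, the regression function and all parameter values are illustrative assumptions, not settings taken from the paper.

```python
import numpy as np

def simulate_mixed_errors(n, g, sigma_w=1.0, sigma_delta=0.5, sigma_eps=0.3,
                          sigma_e=0.2, seed=0):
    """Draw one sample from an assumed mixed-error model (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, sigma_w, n)          # latent variable W
    delta = rng.normal(0.0, sigma_delta, n)  # Berkson error
    eps = rng.normal(0.0, sigma_eps, n)      # classical error
    e = rng.normal(0.0, sigma_e, n)          # regression error
    x = w + delta                            # true predictor, never observed
    v = w + eps                              # observed, error-prone proxy
    y = g(x) + e                             # response
    return {"V": v, "Y": y, "W": w, "X": x, "delta": delta, "eps": eps}

def corr(a, b):
    """Sample correlation between two arrays."""
    return float(np.corrcoef(a, b)[0, 1])

sample = simulate_mixed_errors(100_000, np.sin)
berkson_vs_w = corr(sample["delta"], sample["W"])   # close to zero
berkson_vs_x = corr(sample["delta"], sample["X"])   # clearly positive
classical_vs_w = corr(sample["eps"], sample["W"])   # close to zero
classical_vs_v = corr(sample["eps"], sample["V"])   # clearly positive
```

In this sketch the Berkson error is uncorrelated with $W$ but correlated with the true predictor $X$, while the classical error is uncorrelated with $W$ but correlated with the observed $V$. Only the arrays `V` and `Y` would be available to the analyst.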
The difficulty of the problem depends primarily on the relative smoothness of the error densities $f_\delta$ and $f_\varepsilon$. When $f_\delta$ is smooth enough (relative to $f_\varepsilon$), we are able to construct a nonparametric estimator that converges to the target curve at the parametric $\sqrt{n}$ rate. For a less smooth density $f_\delta$, we propose a kernel estimator that converges at rates ranging from $\sqrt{n}$ to rates that are close to the deconvolution rates.

This paper is organised as follows. In Section 2, we propose estimators for the regression curve $g(\cdot)$. In Section 3, we derive the asymptotic normality of our estimators under some regularity conditions and give their rates of convergence. Section 4 presents numerical results from simulation studies. A brief discussion is given in Section 5. All technical results and proofs are deferred to the Appendix.

Proposed Estimators
Let $(V_1, Y_1), \ldots, (V_n, Y_n)$ be a random sample from Model (4), and ( ) $V$ and $\varepsilon_i$, respectively. We have the following relationships: where ( ) ( ) ( ) As a result, we propose the following estimator for ( )

Example 1 Let the error densities $f_\delta$ and $f_\varepsilon$ in Model (4) be normal densities with mean zero and variances $\sigma_\delta^2$ and $\sigma_\varepsilon^2$, respectively. It follows that ( ) ( ) is the characteristic function of another normal random variable. By (6), the estimator of $g(x)$ can be written as where ( ) ( ) is the density of the ( ) is not integrable, and the estimators (5) and (6) cannot be calculated.

To overcome this issue, we propose an alternative approach for estimating $g(x)$. Using a kernel function $K(x)$ with a bandwidth $h$, we consider the following kernel estimator for ∑ and an estimator for ( ) where ( ) φ is the characteristic function of the kernel function $K(x)$. Proceeding as above, we get an alternative estimator of ( ) , where Therefore, when (6) is no longer valid, we propose the following estimator for ( )

Remark 1 To ensure that the proposed estimator (9) is well-behaved, we need to make the following assumption.
( ) ( ) ( ). In this case, to ensure that (A2) is valid, it is common to choose kernels that have a compactly supported characteristic function. For example, we choose the sinc kernel $K(x) = \sin(x)/(\pi x)$. From (8), we have The above two nonparametric estimators of $f_X(x)$ were given by [22];
2. When the variance of $\varepsilon$ in Model (4) is equal to 0, which is the Berkson error model, the estimator (6) becomes where ( ) is the density function of $\delta$; and
3. When the variance of $\delta$ in Model (4) is equal to 0, which is the classical error model, $\tilde g(x)$ given in (9) reduces to the estimator of [2].
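The normal-error case of Example 1 can be checked numerically. For centred normal errors, $\phi_\delta(t)/\phi_\varepsilon(t) = \exp\{-(\sigma_\delta^2 - \sigma_\varepsilon^2)t^2/2\}$, which is integrable exactly when $\sigma_\delta^2 > \sigma_\varepsilon^2$, and its inverse Fourier transform is then the $N(0, \sigma_\delta^2 - \sigma_\varepsilon^2)$ density. The sketch below verifies this and averages that density over the observed $V_j$ to estimate $f_X$. The averaging form is an assumption about how such a density estimator can be written, since the displayed formulas are not available here; the function names and parameter values are illustrative.

```python
import numpy as np

def normal_pdf(z, var):
    """Density of a centred normal distribution with variance `var`."""
    return np.exp(-0.5 * z**2 / var) / np.sqrt(2.0 * np.pi * var)

def ratio_inverse_ft(x, sigma_delta, sigma_eps, t_max=60.0, n_grid=20001):
    """Numerically invert phi_delta(t) / phi_eps(t) for centred normal errors."""
    t = np.linspace(-t_max, t_max, n_grid)
    ratio = np.exp(-0.5 * (sigma_delta**2 - sigma_eps**2) * t**2)
    # The ratio is real and even, so only the cosine part of exp(-itx) survives.
    integrand = np.cos(np.outer(x, t)) * ratio
    return integrand.sum(axis=1) * (t[1] - t[0]) / (2.0 * np.pi)

def density_estimate(x, v, sigma_delta, sigma_eps):
    """Average the inverted ratio over the observed V_j (assumed estimator form)."""
    var_gap = sigma_delta**2 - sigma_eps**2
    if var_gap <= 0:
        raise ValueError("ratio is not integrable unless sigma_delta > sigma_eps")
    return normal_pdf(x[:, None] - v[None, :], var_gap).mean(axis=1)

rng = np.random.default_rng(1)
sigma_w, sigma_delta, sigma_eps = 1.0, 0.8, 0.5   # illustrative values
x_grid = np.array([-1.0, 0.0, 1.0])
v = rng.normal(0.0, sigma_w, 50_000) + rng.normal(0.0, sigma_eps, 50_000)

kernel_numeric = ratio_inverse_ft(x_grid, sigma_delta, sigma_eps)
kernel_closed = normal_pdf(x_grid, sigma_delta**2 - sigma_eps**2)
f_hat = density_estimate(x_grid, v, sigma_delta, sigma_eps)
f_true = normal_pdf(x_grid, sigma_w**2 + sigma_delta**2)  # X = W + delta
```

No bandwidth appears in this case: the smoothing is supplied entirely by the excess variance $\sigma_\delta^2 - \sigma_\varepsilon^2$, which is consistent with the parametric rate discussed in the text.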

Theoretical Properties
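In this section, we study asymptotic properties of the estimators proposed in Section 2. In particular, the properties of the estimator $\hat g(x)$ at (6) are clear: it is easy to check that the numerator and the denominator are unbiased estimators of $g(x) f_X(x)$ and $f_X(x)$, respectively, and that $\hat g(x)$ converges at the fast parametric $\sqrt{n}$ rate. The properties of the estimator $\tilde g(x)$ at (9) require further exploration, and we derive them in what follows.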

Asymptotic Results for $\tilde g$
In this section, we investigate the large-sample properties of the estimator $\tilde g(x)$ at (9). For this purpose, we present the following regularity conditions, which are mild and can be found in [2].

$K(x)$ is a real and symmetric kernel and has finite moment of order $k$. Namely, is bounded for all $u$ and some $\eta > 0$. Let ( ) ( ) The mean squared error (MSE) of the estimator ( )

Theorem 1 (MSCE) Suppose that Conditions A and B hold. Then, for each $x$ such that where

Explicit rates of convergence of the estimator $\tilde g(x)$ can be found by examining the asymptotic behaviour of the MSE. For the bias, using the Taylor expansion of the first term on the right-hand side of Equation (11), we have Bias , The second term on the right-hand side of Equation (11) describes the variance of ( ) The asymptotic behaviour of this term is more difficult to evaluate since it depends on the tail behaviour of the ratio $\phi_\delta(t)/\phi_\varepsilon(t)$, as [14] discussed, which can be classified into the following:

1. An exponential ratio of order $\beta$ is ( ) ( ) ( ) ( )
2. A polynomial ratio of order $\beta$ is ( ) ( ) with $d_0$, $d_1 > 0$ and $\beta \le 1$.

Asymptotic Mean Squared Error (AMSE)
In this section, we study the asymptotic behaviour of the MSE where $\phi_\delta(t)/\phi_\varepsilon(t)$ behaves like an exponential or a polynomial.

Theorem 2 Suppose that Conditions A and B hold and that the first half inequality of (12) is satisfied. Assume that ( ) . Then, for each $x$ such that $\kappa$ being some positive constant and When ( ) , we obtain a slower logarithmic rate which is similar to the deconvolution rate for supersmooth error given in [2]. More precisely, the optimal bandwidth is of order ( ) ( ) and the estimator ( ) then converges at the rate of ( )

Theorem 3 Suppose Conditions A and B hold, and that ( ) under the polynomial ratio (13), for each $x$ such that $\kappa$ being some positive constant, and ( ) behaves like a polynomial ratio of order $\beta$ in the tail, the convergence rates range from $\sqrt{n}$ to the deconvolution rate for ordinary smooth error of [2]. More precisely, the optimal bandwidth is of order ( ) , and the estimator $\tilde g(x)$ then converges at the rate of $\sqrt{n}$. When $\beta < 1/2$, the optimal bandwidth is of order ( ) and the estimator ( ) converges at the rate of ( )

Asymptotic Normality
The theorem below establishes asymptotic normality in the exponential ratio case.

Theorem 4 Under the conditions of Theorem 2, and for bandwidth ( ) ( )

The next theorem establishes asymptotic normality in the polynomial ratio case.

Theorem 5 Suppose that Conditions A and B hold and that the inequality of (13) is satisfied. Assume that

The proofs of all theorems are postponed to the Appendix.

Unknown Measurement Error Distribution
When the error densities are unknown, they can be estimated from additional observations (e.g., a sample from the error densities, replicated data or external data), and these estimates can be substituted into (6) and (9) to produce the estimate of $g(x)$. For a sufficiently large sample size, the rates of convergence of the estimates remain unchanged when $\phi_\delta$ and $\phi_\varepsilon$ are replaced by their consistent estimators (e.g., [4] [17] [24]).
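As an illustration of the replicated-data route, the sketch below uses one commonly used device that is not spelled out in the text: if two replicates $V_{j1} = W_j + \varepsilon_{j1}$ and $V_{j2} = W_j + \varepsilon_{j2}$ are available, their difference has characteristic function $|\phi_\varepsilon(t)|^2$. Assuming in addition that $\varepsilon$ is symmetric with a strictly positive characteristic function, $\phi_\varepsilon$ can be estimated by a square root. The function name and all numerical settings are hypothetical.

```python
import numpy as np

def estimate_phi_eps(t, v1, v2):
    """Estimate phi_eps(t) from paired replicates (assumes symmetric errors)."""
    diff = np.asarray(v1) - np.asarray(v2)        # equals eps_1 - eps_2; W cancels
    phi_sq = np.cos(np.outer(t, diff)).mean(axis=1)  # estimates |phi_eps(t)|^2
    return np.sqrt(np.clip(phi_sq, 0.0, None))       # clip guards against small negatives

rng = np.random.default_rng(2)
n, sigma_eps = 100_000, 0.5                       # illustrative values
w = rng.normal(0.0, 1.0, n)
v1 = w + rng.normal(0.0, sigma_eps, n)            # first replicate
v2 = w + rng.normal(0.0, sigma_eps, n)            # second replicate

t = np.array([0.0, 1.0, 2.0])
phi_hat = estimate_phi_eps(t, v1, v2)
phi_true = np.exp(-0.5 * sigma_eps**2 * t**2)     # normal characteristic function
```

The estimate deteriorates for large $|t|$, where $|\phi_\varepsilon(t)|^2$ is small relative to the sampling noise, so in practice it would typically be used only over a bounded range of $t$.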

Simulation Studies
We study numerical properties of the estimators proposed in Section 2. Note that we have defined two estimators, at (6) and (9). The first exists when $\phi_\delta(t)/\phi_\varepsilon(t)$ is integrable; otherwise we use the estimator (9). We use the notations $\hat g$ and $\tilde g$ for the estimators (6) and (9), respectively. We use the notation $\hat g_I$ for the estimator that ignores the errors, that is, the classical Nadaraya-Watson estimator of $g$ that treats $(V_1, Y_1), \ldots, (V_n, Y_n)$ as direct data. In addition, we use $\hat g_C$ for the estimator of [23].
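The error-ignoring benchmark can be sketched as a Nadaraya-Watson estimator applied directly to the observed pairs. The Gaussian kernel and the function name below are assumptions made for illustration; the kernel actually used in the simulations is not recoverable from the text.

```python
import numpy as np

def nadaraya_watson(x, v, y, h):
    """Nadaraya-Watson estimate of E[Y | V = x] with a Gaussian kernel (assumed)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    u = (x[:, None] - v[None, :]) / h      # scaled distances, one row per x
    weights = np.exp(-0.5 * u**2)          # unnormalised Gaussian kernel weights
    return (weights * y).sum(axis=1) / weights.sum(axis=1)

# Error-free check on a symmetric design: the local average at 0 is exact.
v_demo = np.linspace(-1.0, 1.0, 201)
y_demo = 2.0 * v_demo + 1.0
fit_at_zero = nadaraya_watson(0.0, v_demo, y_demo, h=0.1)[0]
```

When $V$ differs from $X$ because of measurement error, this estimator targets $E(Y \mid V = x)$ rather than $g(x)$, which is why it serves only as a baseline.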
We apply the various estimators introduced above to some simulated examples (see [23]): ( ) ( ) , as follows. We generate a random sample $\delta_1, \ldots, \delta_n$ where the errors $e_i$ are normally distributed with zero mean and variance ( ) , where the subscript $-j$ means that the estimator was constructed without using the $j$th observation.
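Leave-one-out bandwidth selection of this kind can be sketched as follows. The criterion used here, the average of $\{Y_j - \hat g_{-j}(V_j)\}^2$ for a Gaussian-kernel Nadaraya-Watson fit, is an assumed standard form, because the exact CV criterion is not recoverable from the text; the function names, the bandwidth grid and the data-generating settings are illustrative.

```python
import numpy as np

def loo_cv_score(v, y, h):
    """Leave-one-out CV score for a Gaussian-kernel Nadaraya-Watson fit (assumed form)."""
    u = (v[:, None] - v[None, :]) / h
    w = np.exp(-0.5 * u**2)
    np.fill_diagonal(w, 0.0)               # drop the j-th observation from its own fit
    denom = w.sum(axis=1)
    if np.any(denom <= 0.0):
        return float("inf")                # some point has no neighbours at this h
    fitted = (w @ y) / denom
    return float(np.mean((y - fitted) ** 2))

def select_bandwidth(v, y, grid):
    """Return the bandwidth in `grid` with the smallest CV score, plus all scores."""
    scores = [loo_cv_score(v, y, h) for h in grid]
    return float(grid[int(np.argmin(scores))]), scores

rng = np.random.default_rng(3)
v_obs = rng.normal(0.0, 1.0, 300)
y_obs = np.sin(2.0 * v_obs) + rng.normal(0.0, 0.2, 300)
h_grid = np.array([0.05, 0.1, 0.2, 0.4, 0.8, 3.0])
best_h, cv_scores = select_bandwidth(v_obs, y_obs, h_grid)
```

Returning an infinite score when a point has no neighbours keeps very small bandwidths from being selected through division by zero.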
We report the Integrated Squared Error, is the estimator considered. In all graphs, to illustrate the performance of an estimator, we show the estimated curves corresponding to the first (Q1), second (Q2) and third (Q3) quartiles of the ordered ISEs. The target curve is always represented by a solid curve. In the tables we provide the average values, denoted by MISE, of the 200 calculated ISEs.

Figure 1 and Table 1 illustrate the way in which the estimator improves as the sample size increases. We compare, for various sample sizes, the results obtained for estimating curve (a) when

For any nonparametric regression method, the quality of the estimator also depends on the discrepancy of the observed sample. That is, for any given family of densities $f_W$, $f_\delta$ and $f_\varepsilon$, and any given noise-to-signal ratios ( ) σ σ σ σ , the performance of the estimator depends on the variances of $W$, $\delta$ and $\varepsilon$. Here, we compare the results obtained from estimating curve (c) for different values of $(\sigma_W, \sigma_\delta, \sigma_\varepsilon)$. As expected, Figure 2 shows that the best performance usually occurs for smaller error variances (e.g., $(\sigma_W, \sigma_\delta, \sigma_\varepsilon) = (0.5, 0.05, 0.15)$). It is noteworthy that the effect of the variances on the performance of the estimator is obvious in Model (4).
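The summary statistics described above can be sketched as follows, assuming the ISE is the integral of the squared difference between an estimated curve and the target over an evaluation grid, approximated by the trapezoid rule. The integration range, the grid and the function names are assumptions, since the defining formula is not recoverable from the text.

```python
import numpy as np

def integrated_squared_error(g_hat, g, grid):
    """Trapezoid-rule approximation of the integral of (g_hat - g)^2 over `grid`."""
    sq = (g_hat(grid) - g(grid)) ** 2
    return float(np.sum(0.5 * (sq[1:] + sq[:-1]) * np.diff(grid)))

def summarise_ises(ises):
    """Return MISE and the replicate indices at the quartiles of the ordered ISEs."""
    ises = np.asarray(ises, dtype=float)
    order = np.argsort(ises)                        # replicate indices, best to worst
    ranks = [int(np.ceil(q * len(ises))) - 1 for q in (0.25, 0.5, 0.75)]
    q1, q2, q3 = (int(order[r]) for r in ranks)
    return {"MISE": float(ises.mean()), "Q1": q1, "Q2": q2, "Q3": q3}

# Toy check: a curve shifted by 1 on [0, 1] has ISE equal to 1.
grid = np.linspace(0.0, 1.0, 101)
toy_ise = integrated_squared_error(lambda x: x + 1.0, lambda x: x, grid)

# 200 made-up ISE values standing in for 200 simulated data sets.
summary = summarise_ises(np.arange(200, 0, -1))
```

The indices `Q1`, `Q2` and `Q3` identify which simulated data sets to plot, so that the displayed curves represent good, typical and poor fits rather than a hand-picked sample.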
Finally, we compare $\hat g$ (or $\tilde g$), $\hat g_I$ and $\hat g_C$. Figure 3 shows the boxplots of the quantities $\log(\mathrm{ISE}_O/\mathrm{ISE}_I)$ and $\log(\mathrm{ISE}_O/\mathrm{ISE}_C)$, where $\mathrm{ISE}_O$ is the ISE of our proposed estimator, $\mathrm{ISE}_I$ is the ISE of the estimator that ignores the errors, and $\mathrm{ISE}_C$ is the ISE of the estimator of [23]. Each boxplot is constructed from 200 samples. Here, in panel The further a boxplot lies below the horizontal zero line, the better our method performs compared with the other two estimators. In the same situation, Table 2 and Table 3 report the average integrated squared error (MISE) for estimating curves (b) and (c), respectively. As expected, our proposed estimator substantially outperforms the estimator that completely ignores the measurement errors. Our results show that our proposed estimator usually works better than the estimator proposed by [23] for estimating curves (a) and (b). It is noteworthy that the estimator proposed by [23] may perform better than our proposed estimator when curve (c) with

Discussion
In this paper, we propose a new method for estimating nonparametric regression models whose predictors are measured with a mixture of Berkson and classical errors. The method is based on the relative smoothness of $\phi_\delta$ and $\phi_\varepsilon$. When $\phi_\delta$ is smooth enough (relative to $\phi_\varepsilon$), we propose a nonparametric estimator (6) that converges to the target curve at the parametric $\sqrt{n}$ rate. For a less smooth function $\phi_\delta$, we propose a kernel estimator (9) that converges at rates ranging from $\sqrt{n}$ to rates that are close to the deconvolution rates. Numerical results show that the new estimators are promising in terms of correcting the bias arising from errors-in-variables. They generally perform better than the approach proposed by [23]. The methodology can be readily extended to the prediction problem of nonparametric errors-in-variables regression (see, e.g., [16]). Extending our method to the problems considered in [5] is of interest for future research.

Proofs of the
We use the same model as in Example 1 with

x at (6) are clear. It is easy to check that the numerator and the denominator are both unbiased estimators of $g(x) f_X(x)$ and $f_X(x)$, respectively, and that $\hat g(x)$ converges at the fast parametric $\sqrt{n}$ rate. Properties of the estimator $\tilde g(x)$ at (9) need to be explored further and, in what follows, we derive them.

h → as $n \to \infty$, for each $x$ such that Bias $\tilde g(x)$ is the same as given in Theorem 4 and $\operatorname{var}\{\tilde g(x)\}$ is equal to the second term on the right-hand side of Equation (11).
each of the above regression functions, we generate 200 data sets of $n$ randomly sampled vectors ( ). In our simulations we consider sample sizes $n = 50, 100, 250$, and in each case we generate 200 samples from the distribution of the random vector $(V, Y)$. Except if stated otherwise, we adopt the second order kernel $K$ corresponding to ( ) ( ) [ ] ( ) necessary to calculate $\tilde g$ and $\hat g_C$. For the bandwidth $h$, which is needed to calculate $\tilde g$, $\hat g_I$ and $\hat g_C$, we select the value of $h$ that minimises the cross-validation (CV) criterion,

Figure 2. Estimation of function (c) for samples of size $n = 250$, when

Figure 3. Boxplots of the quantities $\log(\mathrm{ISE}_O/\mathrm{ISE}_I)$ (row 1) and $\log(\mathrm{ISE}_O/\mathrm{ISE}_C)$ (row 2) for estimating regression curve (a) when $W \sim \mathrm{Normal}(0, 2)$ and $n = 250$, for various error

, and below, $C$ denotes a generic positive and finite constant. Proof. It follows from (A2) of Condition A that

∫
The proof for the other result is similar and requires Parseval's Theorem. From (14) and Lemma 1, we have

Table 3. MISE for estimation of curve (c) when

$\phi_\varepsilon$), we propose a nonparametric estimator (6) that converges to the target curve at the parametric $\sqrt{n}$ rate. For less smooth function $\phi_\delta$,