Estimation of Nonparametric Regression Models with Measurement Error Using Validation Data
1. Introduction
Consider the following nonparametric regression model of a scalar response $Y$ on the covariates $(X, Z)$:

$$Y = g(X, Z) + \varepsilon, \tag{1}$$

where $g$ is an unknown function and $\varepsilon$ is a noise variable with $E(\varepsilon \mid X, Z) = 0$ and $E(\varepsilon^2) < \infty$. It is not uncommon that $Z$ is measured exactly but $X$ is measured with error, so that only its surrogate variable $W$ can be observed. Throughout we assume

$$E(Y \mid X, Z, W) = E(Y \mid X, Z), \tag{2}$$

which is always satisfied if, for example, $W$ is a function of $X$ and some independent noise (see [1]).
The relationship between the true variable and the surrogate variable can be rather complicated, and misspecification of this relationship may lead to a serious misinterpretation of the data. A common solution is to use validation data to recover the missing information. To be specific, one observes independent replicates $(Y_i, W_i, Z_i)$, $i = 1, \ldots, N$, of $(Y, W, Z)$ rather than $(Y, X, Z)$, where the relationship between $X$ and $W$ may or may not be specified. If not, the missing information for the statistical inference will be taken from a sample $(X_j, W_j, Z_j)$, $j = N+1, \ldots, N+n$, of so-called validation data independent of the primary (surrogate) sample. We aim at estimating the unknown function $g$ by using the surrogate data $\{(Y_i, W_i, Z_i)\}_{i=1}^{N}$ and the validation data $\{(X_j, W_j, Z_j)\}_{j=N+1}^{N+n}$.
Recently, statistical inference based on surrogate data and a validation sample has attracted considerable attention (see [2] - [13]), and the referenced authors developed suitable methods for different models. However, most of these works are concerned with parametric or semi-parametric relationships between covariates and responses, and their approaches are difficult to generalize to the nonparametric regression model. [14] and [15] proposed two nonparametric estimators for nonparametric regression models with measurement error using validation data, but their methods are not applicable to our problem, since [14] assumes that the response rather than the covariate is measured with error, and the method proposed by [15] applies to a one-dimensional explanatory variable only.
This article is organized as follows. In Section 2 we propose a regularization-based method. Under general regularity conditions, we give the convergence rate of our estimator in Section 3. Section 4 provides some numerical results from simulation studies, and proofs of the theorems are presented in the Appendix.
2. Description of the Estimator
Recall model (1) and the assumptions below it. We assume that $X$, $W$ and $Z$ are all real-valued random variables. The extension to random vectors complicates the notation but does not affect the main ideas and results. Without loss of generality, let the supports of $X$, $W$ and $Z$ all be contained in $[0,1]$ (otherwise, one can carry out monotone transformations of $X$, $W$ and $Z$).
Let $f_{XW|Z}(x, w \mid z)$ and $f_{W|Z}(w \mid z)$ denote respectively the joint density of $(X, W)$ and the marginal density of $W$, both conditional on $Z = z$. Then, according to (2), we have

$$E(Y \mid W = w, Z = z)\, f_{W|Z}(w \mid z) = \int_0^1 g(x, z)\, f_{XW|Z}(x, w \mid z)\, dx. \tag{3}$$
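One way to see how (2) leads to (3) is by iterated expectations:

$$E(Y \mid W = w, Z = z) = E\{ E(Y \mid X, Z, W) \mid W = w, Z = z \} = E\{ g(X, Z) \mid W = w, Z = z \} = \int_0^1 g(x, z)\, \frac{f_{XW|Z}(x, w \mid z)}{f_{W|Z}(w \mid z)}\, dx,$$

where the second equality uses (2) together with $E(\varepsilon \mid X, Z) = 0$; multiplying both sides by $f_{W|Z}(w \mid z)$ gives (3).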
Let

$$m_z(w) = E(Y \mid W = w, Z = z)\, f_{W|Z}(w \mid z) \quad \text{and} \quad g_z(x) = g(x, z).$$

Define the operator $T_z$ as

$$(T_z h)(w) = \int_0^1 h(x)\, f_{XW|Z}(x, w \mid z)\, dx,$$

where $h$ is any function on $[0,1]$, so that Equation (3) is equivalent to the operator equation

$$T_z g_z = m_z. \tag{4}$$
According to Equation (4), the function $g$ is the solution of a Fredholm integral equation of the first kind. This inverse problem is known to be ill-posed and calls for a regularization method. A variety of regularization schemes are available in the literature (see e.g. [16]), but we focus in this paper on the Tikhonov regularized solution:

$$g_z^{\alpha} = \arg\min_{h \in L^2[0,1]} \left\{ \| T_z h - m_z \|^2 + \alpha \| h \|^2 \right\}, \tag{5}$$

where $\alpha > 0$ in the penalization term $\alpha \| h \|^2$ is the regularization parameter.
We define the adjoint operator $T_z^{*}$ of $T_z$ by

$$(T_z^{*} \psi)(x) = \int_0^1 \psi(w)\, f_{XW|Z}(x, w \mid z)\, dw,$$

where $\psi \in L^2[0,1]$. Then the regularized solution (5) can equivalently be written as

$$g_z^{\alpha} = (\alpha I + T_z^{*} T_z)^{-1} T_z^{*} m_z. \tag{6}$$
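The equivalence of (5) and (6) follows from the first-order condition of the quadratic objective in (5): setting the derivative with respect to $h$ to zero gives

$$T_z^{*}(T_z h - m_z) + \alpha h = 0 \quad \Longleftrightarrow \quad (\alpha I + T_z^{*} T_z)\, h = T_z^{*} m_z \quad \Longleftrightarrow \quad h = (\alpha I + T_z^{*} T_z)^{-1} T_z^{*} m_z.$$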
To obtain an estimator of $g_z^{\alpha}$, we combine the orthogonal series method and the kernel method. Under the regularity conditions of Section 3, for each $z$, $f_{XW|Z}(x, w \mid z)$ and $m_z(w)$ may be approximated by truncated orthogonal series,

$$f_{XW|Z}(x, w \mid z) \approx \sum_{k=1}^{K} \sum_{l=1}^{K} a_{kl}(z)\, \phi_k(x)\, \phi_l(w) \quad \text{and} \quad m_z(w) \approx \sum_{l=1}^{K} b_l(z)\, \phi_l(w),$$

where

$$a_{kl}(z) = E\{ \phi_k(X)\, \phi_l(W) \mid Z = z \} \quad \text{and} \quad b_l(z) = E\{ Y \phi_l(W) \mid Z = z \}.$$

Here, $\{\phi_k\}_{k \geq 1}$ is an orthonormal basis of $L^2[0,1]$, which may be trigonometric, polynomial, spline, wavelet, and so on. A discussion of different bases and their properties can be found in the literature (see e.g. [17], [18]). To be specific, here and in what follows we consider the normalized Legendre polynomials on $[0,1]$, which can be obtained through Rodrigues' formula

$$\phi_{k+1}(x) = \frac{\sqrt{2k+1}}{k!}\, \frac{d^{k}}{dx^{k}}\left\{ (x^2 - x)^{k} \right\}, \qquad k = 0, 1, 2, \ldots \tag{7}$$
The integer $K$ is the truncation point, which serves as the main smoothing parameter of the approximating series.
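As a quick illustration, the basis in (7) coincides with $\phi_{k+1}(x) = \sqrt{2k+1}\, P_k(2x - 1)$, where $P_k$ is the standard Legendre polynomial on $[-1,1]$; the following is a minimal NumPy sketch (the function name is ours, chosen for illustration):

```python
import numpy as np
from numpy.polynomial import legendre as leg

def phi(k, x):
    """Normalized shifted Legendre polynomial phi_{k+1} on [0, 1].

    phi_{k+1}(x) = sqrt(2k+1) * P_k(2x - 1), which matches Rodrigues' formula (7).
    """
    c = np.zeros(k + 1)
    c[k] = 1.0                              # coefficient vector selecting P_k
    return np.sqrt(2 * k + 1) * leg.legval(2 * np.asarray(x, dtype=float) - 1, c)

# Numerical orthonormality check on [0, 1] via the trapezoidal rule.
x = np.linspace(0.0, 1.0, 10001)
for j in range(4):
    for k in range(4):
        inner = np.trapz(phi(j, x) * phi(k, x), x)
        assert abs(inner - (1.0 if j == k else 0.0)) < 1e-3
```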
Let $\mathcal{K}_h(\cdot) = h^{-1} \mathcal{K}(\cdot / h)$, where $\mathcal{K}(\cdot)$ is a kernel function and $h$ is a bandwidth. We consider the following estimators,

$$\hat{a}_{kl}(z) = \frac{\sum_{j=N+1}^{N+n} \phi_k(X_j)\, \phi_l(W_j)\, \mathcal{K}_{h_1}(Z_j - z)}{\sum_{j=N+1}^{N+n} \mathcal{K}_{h_1}(Z_j - z)} \quad \text{and} \quad \hat{b}_l(z) = \frac{\sum_{i=1}^{N} Y_i\, \phi_l(W_i)\, \mathcal{K}_{h_2}(Z_i - z)}{\sum_{i=1}^{N} \mathcal{K}_{h_2}(Z_i - z)},$$

where $h_1$ and $h_2$ are bandwidths for the validation and primary samples, respectively. Then, for each $z$, we have

$$\hat{f}_{XW|Z}(x, w \mid z) = \sum_{k=1}^{K} \sum_{l=1}^{K} \hat{a}_{kl}(z)\, \phi_k(x)\, \phi_l(w) \quad \text{and} \quad \hat{m}_z(w) = \sum_{l=1}^{K} \hat{b}_l(z)\, \phi_l(w).$$

The operators $T_z$ and $T_z^{*}$ can then be estimated by

$$(\hat{T}_z h)(w) = \int_0^1 h(x)\, \hat{f}_{XW|Z}(x, w \mid z)\, dx \quad \text{and} \quad (\hat{T}_z^{*} \psi)(x) = \int_0^1 \psi(w)\, \hat{f}_{XW|Z}(x, w \mid z)\, dw.$$

Hence, for each $z$, the estimator of $g(\cdot, z)$ is

$$\hat{g}_z^{\alpha} = (\alpha I + \hat{T}_z^{*} \hat{T}_z)^{-1} \hat{T}_z^{*} \hat{m}_z. \tag{8}$$
Remark 2.1. Let $A_z$ be the $K \times K$ matrix whose $(k, l)$ element is $\hat{a}_{kl}(z)$, and let $b_z = (\hat{b}_1(z), \ldots, \hat{b}_K(z))^{\top}$. Then estimator (8) has the following form

$$\hat{g}_z^{\alpha}(x) = \phi(x)^{\top} (\alpha I_K + A_z A_z^{\top})^{-1} A_z b_z, \tag{9}$$

where $\phi(x)$ is given by $\phi(x) = (\phi_1(x), \ldots, \phi_K(x))^{\top}$.
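To make the construction concrete, the following is a minimal numerical sketch of estimator (9), assuming Nadaraya-Watson-type kernel weights for $\hat{a}_{kl}(z)$ and $\hat{b}_l(z)$ and a Gaussian kernel; the function and variable names are illustrative choices rather than fixed notation.

```python
import numpy as np
from numpy.polynomial import legendre as leg

def phi_matrix(x, K):
    """Matrix with columns phi_1(x), ..., phi_K(x): normalized Legendre basis on [0,1]."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.empty((x.size, K))
    for k in range(K):
        c = np.zeros(k + 1)
        c[k] = 1.0
        out[:, k] = np.sqrt(2 * k + 1) * leg.legval(2 * x - 1, c)
    return out

def nw_weights(z_obs, z, h):
    """Nadaraya-Watson weights at z with a Gaussian kernel and bandwidth h."""
    u = (np.asarray(z_obs, dtype=float) - z) / h
    w = np.exp(-0.5 * u ** 2)
    return w / w.sum()

def g_hat(x, z, alpha, K, h1, h2, primary, validation):
    """Tikhonov-regularized series estimator (9) evaluated at (x, z).

    primary    = (Y, W, Z): surrogate sample of size N (X unobserved).
    validation = (X, W, Z): validation sample of size n (Y unobserved).
    """
    Y, W_p, Z_p = (np.asarray(a, dtype=float) for a in primary)
    X_v, W_v, Z_v = (np.asarray(a, dtype=float) for a in validation)

    # A_z: K x K matrix of a_hat_{kl}(z), built from the validation sample.
    wv = nw_weights(Z_v, z, h1)
    A = phi_matrix(X_v, K).T @ (wv[:, None] * phi_matrix(W_v, K))

    # b_z: K-vector of b_hat_l(z), built from the primary (surrogate) sample.
    wp = nw_weights(Z_p, z, h2)
    b = phi_matrix(W_p, K).T @ (wp * Y)

    # g_hat_z^alpha(x) = phi(x)' (alpha I_K + A_z A_z')^{-1} A_z b_z, as in (9).
    coef = np.linalg.solve(alpha * np.eye(K) + A @ A.T, A @ b)
    return phi_matrix(x, K) @ coef
```

Note that, thanks to the orthonormality of the basis, the integrals defining $\hat{T}_z$ and $\hat{T}_z^{*}$ never need to be evaluated explicitly: everything reduces to the $K \times K$ linear system above.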
3. Theoretical Properties
In this section, we study the statistical properties of the estimator under the following assumptions:
(A1) (i) The support of $(X, W)$ is contained in $[0,1]^2$; (ii) Conditioning on $Z = z$, the joint density $f_{XW|Z}(\cdot, \cdot \mid z)$ of $(X, W)$ is square integrable w.r.t. the Lebesgue measure on $[0,1]^2$.
(A2) (i) The $r$-order partial or mixed partial derivatives of $f_{XW|Z}(x, w \mid z)$ with respect to $(x, w)$, and the $r$-order partial derivative of $f_{XW|Z}(x, w \mid z)$ with respect to $z$, are both continuous in $(x, w) \in [0,1]^2$ and for each $z$; (ii) The $s$-order partial derivative of $m_z(w)$ with respect to $z$ is continuous in $w \in [0,1]$ and for each $z$.

(A3) $E(Y^2 \mid W = w, Z = z)$ is uniformly bounded in $(w, z)$.

(A4) The kernel function $\mathcal{K}(\cdot)$ is a symmetric, twice continuously differentiable function on $\mathbb{R}$, and $\int u^{j} \mathcal{K}(u)\, du = 0$ for $1 \leq j \leq s-1$ and $\int |u|^{s} \mathcal{K}(u)\, du = C_{\mathcal{K}}$, with $C_{\mathcal{K}}$ being some finite constant.
Assumption (A1) is a sufficient condition for $T_z$ to be a Hilbert-Schmidt operator and therefore to be compact (see [19], Theorem 2.34). As a result of compactness, $T_z$ admits a singular value decomposition. For each $z$, let $\{\lambda_{k,z}\}_{k \geq 1}$ be the sequence of singular values of $T_z$; then there exist two orthonormal sequences $\{\varphi_{k,z}\}_{k \geq 1}$ and $\{\psi_{k,z}\}_{k \geq 1}$ such that

$$T_z \varphi_{k,z} = \lambda_{k,z}\, \psi_{k,z} \quad \text{and} \quad T_z^{*} \psi_{k,z} = \lambda_{k,z}\, \varphi_{k,z}, \qquad k = 1, 2, \ldots$$
Note that the regularization bias is

$$g_z^{\alpha} - g_z = \left[ (\alpha I + T_z^{*} T_z)^{-1} T_z^{*} T_z - I \right] g_z = -\alpha\, (\alpha I + T_z^{*} T_z)^{-1} g_z.$$

In order to control the speed of convergence to zero of the regularization bias $\| g_z^{\alpha} - g_z \|$, we introduce the following regularity space $\Phi_z^{\beta}$ for $\beta > 0$:

$$\Phi_z^{\beta} = \left\{ h \in L^2[0,1] : \sum_{k=1}^{\infty} \frac{\langle h, \varphi_{k,z} \rangle^2}{\lambda_{k,z}^{2\beta}} < \infty \right\}.$$
We then obtain the following result by applying Proposition 3.11 in Carrasco et al. (2007).

Proposition 3.1. Suppose Assumption (A1) holds. For each $z$, if $g_z \in \Phi_z^{\beta}$, then we have $\| g_z^{\alpha} - g_z \|^2 = O\left( \alpha^{\beta \wedge 2} \right)$, where $\beta \wedge 2 = \min(\beta, 2)$.
Therefore, when the regularization parameter $\alpha$ is pushed towards zero, the smoother the function $g$ of interest is (i.e. $g_z \in \Phi_z^{\beta}$ for larger $\beta$), the faster the rate of convergence to zero of the regularization bias will be.
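For $0 < \beta \leq 2$, the mechanism behind Proposition 3.1 can be seen directly from the singular value decomposition; a brief sketch of the bound is

$$\| g_z^{\alpha} - g_z \|^2 = \sum_{k=1}^{\infty} \left( \frac{\alpha}{\alpha + \lambda_{k,z}^2} \right)^{2} \langle g_z, \varphi_{k,z} \rangle^2 \leq \sup_{\lambda > 0} \left( \frac{\alpha\, \lambda^{\beta}}{\alpha + \lambda^{2}} \right)^{2} \sum_{k=1}^{\infty} \frac{\langle g_z, \varphi_{k,z} \rangle^2}{\lambda_{k,z}^{2\beta}} = O\!\left( \alpha^{\beta} \right),$$

since $\sup_{\lambda > 0} \alpha \lambda^{\beta} / (\alpha + \lambda^{2}) \leq C\, \alpha^{\beta/2}$ for some constant $C$ when $0 < \beta \leq 2$ and the series on the right is finite because $g_z \in \Phi_z^{\beta}$.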
We then obtain the following convergence rate of $\hat{g}_z^{\alpha}$.

Theorem 3.1. Suppose Assumptions (A1)-(A4) are satisfied. Then, for each $z$, if $g_z \in \Phi_z^{\beta}$, the squared error $\| \hat{g}_z^{\alpha} - g_z \|^2$ is bounded, in probability, by the sum of a variance-type term, which reflects the estimation errors of $\hat{T}_z$ and $\hat{m}_z$ (and therefore depends on $K$, $h_1$, $h_2$, $n$ and $N$, and grows as $\alpha \to 0$), and the regularization bias term of order $\alpha^{\beta \wedge 2}$ from Proposition 3.1. In particular, if $K$, $h_1$, $h_2$ and $\alpha$ are chosen, as functions of the sample sizes, so as to balance these two terms, then, for each $z$ and under a mild condition relating the validation sample size $n$ and the primary sample size $N$, $\hat{g}_z^{\alpha}$ is consistent and attains a polynomial rate of convergence.
The proofs of all the results are reported in the Appendix.
4. Simulation Studies
In this section, we briefly illustrate the finite-sample performance of the estimator discussed above. We compare our estimator with the standard Nadaraya-Watson estimator (denoted by $\hat{g}_{NW}$) based on the primary dataset $\{(Y_i, W_i, Z_i)\}_{i=1}^{N}$, which simply treats the surrogate $W$ as if it were $X$. In fact, the Nadaraya-Watson estimator computed from the true observations $\{(Y_i, X_i, Z_i)\}_{i=1}^{N}$ would be a gold standard in the simulation study, even though it is practically unachievable due to measurement errors. Moreover, the performance of each estimator $\hat{g}$ is assessed by using the square root of average square errors (RASE),

$$\mathrm{RASE} = \left\{ \frac{1}{n_{\mathrm{grid}}} \sum_{s=1}^{n_{\mathrm{grid}}} \left[ \hat{g}(x_s, z_s) - g(x_s, z_s) \right]^2 \right\}^{1/2},$$

where $\{(x_s, z_s)\}_{s=1}^{n_{\mathrm{grid}}}$ are grid points at which $\hat{g}$ is evaluated.
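A small helper for the RASE criterion (illustrative; the grid values of the estimator and of the true curve are assumed to be available in the simulation):

```python
import numpy as np

def rase(g_hat_vals, g_true_vals):
    """Square root of the average squared errors over the evaluation grid."""
    diff = np.asarray(g_hat_vals, dtype=float) - np.asarray(g_true_vals, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```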
We considered model (1) with a fixed regression function and noise distribution. The covariates were generated with the correlation coefficient between $X$ and $Z$ being 0.6, and the surrogate $W$ was generated from $X$ by adding independent measurement error. Results for three measurement-error settings are reported. Simulations were run with different validation and primary data sizes $(n, N)$, with the two sample sizes linked through fixed ratios of $n$ to $N$. We generated 500 datasets for each sample size combination $(n, N)$.
To calculate $\hat{g}_z^{\alpha}$, we used the normalized Legendre polynomials as the basis and the standard normal kernel (denoted by $\mathcal{K}$). For $\hat{g}_{NW}$, we used a product kernel $\mathcal{K}(u_1)\mathcal{K}(u_2)$, and the bandwidth was selected by the generalized cross-validation (GCV) approach. For our estimator $\hat{g}_z^{\alpha}$, we used the cross-validation approach to choose the four parameters $\alpha$, $K$, $h_1$ and $h_2$. For this purpose, $(K, h_1)$, $h_2$ and $\alpha$ are selected separately as follows.
We first define leave-one-out versions of the estimators of Section 2, obtained by deleting one observation at a time. We adopt the cross-validation (CV) approach to select $(K, h_1)$ by minimizing the corresponding leave-one-out criterion over the validation sample, where the subscript $(-j)$ denotes the estimator being constructed without using the $j$th observation. Similarly, we get $\hat{h}_2$ from the primary sample. After obtaining $(\hat{K}, \hat{h}_1)$ and $\hat{h}_2$, we then select $\alpha$ by minimizing the leave-one-out prediction error of the resulting estimator over the primary sample, where the subscript $(-i)$ denotes the estimator being constructed without using the $i$th observation $(Y_i, W_i, Z_i)$.
We compute the RASE at $n_{\mathrm{grid}}$ grid points of $(x, z)$.

Table 1. RASE comparison for the estimators $\hat{g}_z^{\alpha}$ and $\hat{g}_{NW}$.

Table 1 presents
the RASE for estimating the regression curve $g$ under the three measurement-error settings and for various sample sizes. It is obvious that our proposed estimator $\hat{g}_z^{\alpha}$ has much smaller RASE than $\hat{g}_{NW}$. As expected, our proposed estimating method produces more accurate estimates than the Nadaraya-Watson estimator. Moreover, there is a drastic improvement in accuracy from using our estimator over the Nadaraya-Watson estimator, and this improvement increases with the severity of the measurement error.
Acknowledgments
This work was supported by grant GJJ160927 and by the Natural Science Foundation of Jiangxi Province of China under grant number 20142BAB211018.
Appendix
Proof of Theorem 3.1:
Lemma 6.1. Suppose Assumptions (A1), (A2)(i) and (A4) hold. Then, for each $z$, the estimation errors $\|\hat{T}_z - T_z\|$ and $\|\hat{T}_z^{*} - T_z^{*}\|$ converge to zero in probability at a rate governed by the truncation point $K$, the bandwidth $h_1$ and the validation sample size $n$.

Proof of Lemma 6.1. Fix $z$. By Assumptions (A2)(i) and (A4), a standard expansion of the kernel weights controls the smoothing bias of $\hat{a}_{kl}(z)$ in the $z$-direction. Note that $\{\phi_k\}_{k \geq 1}$ are orthonormal and complete basis functions on $L^2[0,1]$. Under Assumption (A2)(i), for each $z$ the coefficients $a_{kl}(z)$ are square summable; then, using the Cauchy-Schwarz inequality, each coefficient is bounded in absolute value for each pair $(k, l)$. Hence we obtain a bound on the bias term. Moreover, for each $z$ the variance term is controlled, where we have used the fact that the basis functions are uniformly bounded on $[0,1]$ for each fixed index. Combining the two bounds yields the rate for the coefficient estimates $\hat{a}_{kl}(z)$, which we refer to as (10).

By the triangle inequality and the Jensen inequality, the operator error is bounded by the coefficient estimation error plus the series truncation error. Under Assumption (A2)(i), the truncation error can be controlled through the smoothness order $r$ (see Lemma A1 of [20]). By construction of the estimator, the stated rate then follows, where the last step is due to (10). The desired result follows immediately.
Proof of Theorem 3.1. Decompose $\hat{g}_z^{\alpha} - g_z$ into the regularization bias $g_z^{\alpha} - g_z$ and the estimation error $\hat{g}_z^{\alpha} - g_z^{\alpha}$, and split the latter into terms involving $\hat{T}_z - T_z$, $\hat{T}_z^{*} - T_z^{*}$ and $\hat{m}_z - T_z g_z$. It follows from Lemma 6.1 that the terms driven by $\hat{T}_z - T_z$ or $\hat{T}_z^{*} - T_z^{*}$ are of the order established there. Under Assumption (A1), $T_z$ is a bounded operator, and the norms of $(\alpha I + T_z^{*} T_z)^{-1}$ and $(\alpha I + \hat{T}_z^{*} \hat{T}_z)^{-1}$ are bounded by $\alpha^{-1}$. Moreover, the regularization bias is controlled by Proposition 3.1. The main task that remains is to establish the order of the term involving $\hat{m}_z - T_z g_z$. By the triangle inequality and the Jensen inequality, this term is bounded by a kernel-smoothing error and a series truncation error; similar to the proof of Lemma 6.1, under Assumptions (A2)(ii), (A3) and (A4), it is easy to show that this error converges at the corresponding rate. Then, according to Lemma 6.1, the overall bound of Theorem 3.1 follows. Choosing $K$, $h_1$, $h_2$ and $\alpha$ to balance the resulting terms, and under the stated condition relating $n$ and $N$, combining all these results completes the proof.