A Multiplicative Bias Correction for Nonparametric Approach and the Two Sample Problem in Sample Survey

Let two separate surveys collect related information on a single population U. Consider the situation where we want to best combine data from the two surveys to yield a single set of estimates of a population quantity (population parameter) of interest. This article presents a multiplicative bias reduction estimator for nonparametric regression applied to the two sample problem in sample surveys. The approach consists of applying a multiplicative bias correction to an estimator. The multiplicative bias correction method, proposed by Linton & Nielsen (1994), ensures a positive estimate and reduces the bias of the estimate with a negligible increase in variance. Applying this method to the two sample problem, we found through the study of its asymptotic properties that the resulting estimator is asymptotically unbiased and statistically consistent. Furthermore, an empirical study was carried out to compare the performance of the developed estimator with existing ones.


Introduction
Sometimes two separate surveys gather related information on a variable of interest in a population U, perhaps with distinct designs and modes of sampling. How to combine the data from the two surveys then becomes an important question.
Take, for example, the students of the Sub-regional Institute of Statistics and Applied Economics (ISSEA) and those of the polytechnic institute, both collecting data, in different ways and with different emphases, on unemployment in [...]. This problem has been approached in different ways. One approach obtains estimates from the two surveys separately and combines them using the inverses of the estimated variances as weights, as seen in [1]. [2] went further by using the empirical likelihood method to combine information from multiple surveys. Another option consists of pooling the two data sets into a single data set, taking into account the weights on individual sample units.
Some of these methods are developed in [3], including the pseudolikelihood, the missing information principle and the iterated post-stratified estimator.
After simulations on two different populations, it was concluded that in neither population did the design-based ways of combining data yield the best results. The iterated post-stratified estimator appears to be a very promising nonparametric way to combine data from two sources.
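The inverse-variance weighting mentioned above can be sketched as follows. This is a minimal illustration, not the procedure of [1]: the function name and the numbers are hypothetical, and the combined variance formula assumes the two survey estimates are independent.

```python
def combine_inverse_variance(est_a, var_a, est_b, var_b):
    """Combine two estimates of the same quantity, weighting each by 1/variance.

    Assumption: the two estimates are independent, so the variance of the
    combined estimate is 1 / (1/var_a + 1/var_b).
    """
    w_a = 1.0 / var_a
    w_b = 1.0 / var_b
    combined = (w_a * est_a + w_b * est_b) / (w_a + w_b)
    combined_var = 1.0 / (w_a + w_b)
    return combined, combined_var


# Hypothetical survey results: the more precise survey (variance 1.0)
# pulls the combined estimate towards its own value.
est, var = combine_inverse_variance(10.0, 4.0, 14.0, 1.0)
```

The combined estimate always lies between the two inputs and is never less precise than the better of the two, which is why this is a natural baseline for combining surveys.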
Recently, [4] used nonparametric regression, which is the model-based sampler's method of choice when there is serious doubt about the suitability of a linear or other simple parametric model for the survey data at hand. Nonparametric regression supersedes the need for design weights and standard design-based weights. Recognizing this is especially helpful in sampling situations where design weights are missing or questionable.
That study made use of kernel smoothers, in particular the Nadaraya-Watson smoother. However, estimators based on Nadaraya-Watson smoothing weights are normally biased in small samples and at boundary points.
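The boundary bias can be seen in a small numerical sketch. The Gaussian kernel, the bandwidth of 0.1 and the noise-free linear data are assumptions chosen for illustration; they are not taken from [4].

```python
import numpy as np


def nadaraya_watson(x_eval, x, y, h):
    """Nadaraya-Watson estimate at each point of x_eval (Gaussian kernel, bandwidth h)."""
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    u = (x_eval[:, None] - x[None, :]) / h
    k = np.exp(-0.5 * u**2)  # the kernel's normalising constant cancels in the ratio
    return (k @ y) / k.sum(axis=1)


# Noise-free linear regression function on [0, 1], so any error is pure bias.
x = np.linspace(0.0, 1.0, 101)
y = 1.0 + 2.0 * x
h = 0.1

fit_boundary = nadaraya_watson(0.0, x, y, h)[0]  # true value is 1.0
fit_interior = nadaraya_watson(0.5, x, y, h)[0]  # true value is 2.0
```

At the left boundary all neighbouring observations lie to the right of the evaluation point, so the local average overshoots the true value, while at the centre the symmetric neighbourhood averages out exactly for a linear function.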
Alternative techniques for reducing the bias exist; for a detailed review see [5]-[11]. These methods improve the performance of nonparametric regression at points of large curvature. In this paper, we consider a multiplicative bias correction approach to nonparametric regression in order to obtain an estimate with a smaller bias than existing ones.

Outline of the Paper
The remaining part of this paper is organized as follows. In Section 2, a multiplicative bias corrected estimator $\hat{T}_{MBC}$ for the finite population total is proposed. In Section 3, the asymptotic properties of the proposed estimator are derived. In Section 4, an empirical study of the derived properties is presented. In Section 5, we give a conclusion to the paper.

Proposed Estimator
Consider a finite population $U = \{1, 2, \ldots, N\}$ and let $y_1, y_2, \ldots, y_n$ represent the combined random sample drawn from the population using different sampling designs. The population total can be written as

$$T = \sum_{i \in s} y_i + \sum_{i \in r} y_i,$$

where $s$ refers to the sample and $r$ refers to the nonsampled part of the population. Since the values of the sampled part are known, estimating the finite population total is equivalent to predicting the total of the nonsampled part of the population.
To do this, the multiplicative bias corrected technique is employed. The proposed estimator of the population total, $\hat{T}_{MBC}$, is then defined in terms of $\pi_i$, the inclusion probability, and $\hat{h}(x_i)$, the multiplicative bias corrected estimator of the regression function at $x_i$.
The principal objective of the multiplicative bias corrected technique is to correct the shortcomings of the kernel smoother, namely its bias at the boundaries. Given a pilot smoother of the regression function, the ratio of each observed response to the pilot fit at that observation is a noisy estimate of the inverse relative estimation error of the smoother. Smoothing these ratios gives a better estimate of the inverse of the relative estimation error at each observation, which can therefore be used as a multiplicative correction of the pilot smoother.
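The construction just described can be written out as below. This is a sketch in assumed notation (the symbols $\tilde{m}$, $V_j$, $\hat{\alpha}$, $\hat{m}$ and the generic smoothing weights $w_j(x; h)$ are not taken from this paper); it follows the usual presentation of multiplicative bias correction, in which $\hat{m}$ plays the role of the multiplicative bias corrected estimator of the regression function.

```latex
\begin{align*}
\tilde{m}(x) &= \sum_{j=1}^{n} w_j(x; h)\, Y_j
  && \text{(pilot smoother)} \\
V_j &= \frac{Y_j}{\tilde{m}(X_j)}
  && \text{(noisy inverse relative error at observation } j\text{)} \\
\hat{\alpha}(x) &= \sum_{j=1}^{n} w_j(x; h)\, V_j
  && \text{(smoothed correction factor)} \\
\hat{m}(x) &= \hat{\alpha}(x)\, \tilde{m}(x)
  && \text{(bias corrected smoother)}
\end{align*}
```

Because the correction enters as a product of two weighted averages of positive quantities, the corrected fit stays positive whenever the responses are positive.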
We use the same weighting scheme, where $h$ is the bandwidth, $K$ is a probability density function symmetric about zero, and $n$ is the sample size.
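A minimal numerical sketch of the multiplicative bias correction is given below. It assumes Nadaraya-Watson weights with a Gaussian kernel for $K$ and the same bandwidth in the pilot and correction steps; the function names and the test data are hypothetical and are not taken from the paper.

```python
import numpy as np


def nw_weights(x_eval, x, h):
    """Row-normalised Gaussian kernel weights for each evaluation point."""
    u = (np.atleast_1d(x_eval)[:, None] - x[None, :]) / h
    k = np.exp(-0.5 * u**2)  # the kernel's normalising constant cancels in the ratio
    return k / k.sum(axis=1, keepdims=True)


def mbc_smoother(x_eval, x, y, h):
    """Return (corrected fit, pilot fit) at x_eval; y is assumed strictly positive."""
    pilot_at_data = nw_weights(x, x, h) @ y  # pilot smoother at the observations
    ratios = y / pilot_at_data               # noisy inverse relative errors
    w_eval = nw_weights(x_eval, x, h)
    pilot_at_eval = w_eval @ y
    correction = w_eval @ ratios             # smoothed ratios
    return correction * pilot_at_eval, pilot_at_eval


# Noise-free linear data on [0, 1]; the true value at the boundary x = 0 is 1.0.
x = np.linspace(0.0, 1.0, 101)
y = 1.0 + 2.0 * x
corrected, pilot = mbc_smoother(0.0, x, y, h=0.1)
```

In this example the pilot fit overshoots at the boundary, the ratios near the boundary fall below one, and multiplying by their smoothed value pulls the fit back towards the true value.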
• ndr0: implements a rule-of-thumb for choosing the bandwidth of a Gaussian kernel density estimator.
• ndr: a more common variation of that rule, given by Scott (1992).
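The two rules of thumb can be sketched as follows. The wording of the descriptions appears to correspond to the `bw.nrd0` and `bw.nrd` selectors in R, and the constants 0.9 and 1.06 below are the ones those selectors use; the Python function itself is a hypothetical helper, and it omits the fallback R applies when the spread estimate is zero.

```python
import numpy as np


def rule_of_thumb_bandwidth(x, factor=0.9):
    """Bandwidth h = factor * min(sd, IQR / 1.34) * n**(-1/5).

    factor=0.9 gives the basic rule of thumb; factor=1.06 gives Scott's variation.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    sd = x.std(ddof=1)                       # sample standard deviation
    q75, q25 = np.percentile(x, [75, 25])    # linear interpolation between order statistics
    spread = min(sd, (q75 - q25) / 1.34)
    return factor * spread * n ** (-0.2)


values = np.arange(1, 101)  # hypothetical auxiliary values 1, 2, ..., 100
h_basic = rule_of_thumb_bandwidth(values, factor=0.9)
h_scott = rule_of_thumb_bandwidth(values, factor=1.06)
```

Both rules scale a robust spread estimate by $n^{-1/5}$, so they differ only by the constant factor and Scott's variation always returns the larger bandwidth.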

Assumptions
The following assumptions are made in the estimation of $\hat{h}(x_i)$.
• The regression function is bounded and strictly positive.
• The regression function is twice continuously differentiable everywhere.
• The error term $\varepsilon$ has finite fourth moments and a distribution symmetric about zero.
• The bandwidth $h$ is such that $h \to 0$ and $nh \to \infty$ as $n \to \infty$.

Asymptotic Unbiasedness of the Proposed Estimator
We want to show that the proposed estimator is asymptotically unbiased for the finite population total. The expected value of the proposed estimator is obtained by analysing the individual terms of the stochastic approximation of $\hat{h}(x)$. Let us then establish the stochastic approximation of $\hat{h}(x)$, as shown by Hengartner (2009).
From (8), a series expansion yields an approximation of the quantity $R$.
Replacing both $Y_j$ and $R_j$ in (16) and using the assumption $nh \to \infty$, the remainder term converges to zero in probability. To solve Equation (16), the remaining expression is reduced by considering a truncated Taylor series; substituting its first two terms in (18) gives the result.

Asymptotic Variance of the Proposed Estimator
The variance of the proposed estimator of the finite population total is derived as follows. Using the assumption $nh \to \infty$, the remainder terms converge to zero in probability, so that Equation (22) simplifies. Truncating the binomial expansion at the first term then yields the asymptotic variance. The resulting expression implies that $\hat{T}_{MBC}$ is more efficient than the usual nonparametric regression estimator proposed by Dorfman (1992).

Asymptotic Mean Square Error
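The asymptotic mean square error of the estimator $\hat{T}_{MBC}$ is given by the sum of its asymptotic variance and the square of its asymptotic bias. As $n \to \infty$ with $nh \to \infty$, $\mathrm{MSE}(\hat{T}_{MBC})$ tends to 0, indicating that the proposed estimator is statistically consistent.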
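For reference, the standard decomposition behind this argument is written out below, with $T$ denoting the finite population total being predicted. It is a general identity rather than a result specific to this paper: consistency follows once both the variance term and the bias term vanish asymptotically.

```latex
\begin{align*}
\mathrm{MSE}\big(\hat{T}_{MBC}\big)
  &= E\Big[\big(\hat{T}_{MBC} - T\big)^2\Big] \\
  &= \mathrm{Var}\big(\hat{T}_{MBC} - T\big)
     + \Big[E\big(\hat{T}_{MBC} - T\big)\Big]^2 .
\end{align*}
```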

Population
In this section, the theory developed in the previous sections is tested using a set of simulation studies, with a mix of survey designs and various approaches to selecting the best bandwidths. We employ a population U of countries in the world, of size N = 188, with auxiliary variable x = gross national income (GNI) and variable of interest y = human development index (HDI). The quantity of interest is the population total of the HDI. Each run draws a pair of samples with a combined size of 32. The total experiment consists of 500 runs of pairs of samples. Table 1 gives the estimators considered.
For an estimator T we considered three measures of relative success across the 500 runs.
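The sketch below shows how two such measures, the bias and the root mean square error referred to in the results, could be computed across simulation runs. The function name and the numbers are hypothetical, and the exact definitions used in the study may differ (for example, by being expressed relative to the true total).

```python
import numpy as np


def performance_measures(estimates, true_total):
    """Bias and root mean square error of an estimator across simulation runs."""
    errors = np.asarray(estimates, dtype=float) - true_total
    return {
        "bias": errors.mean(),
        "rmse": np.sqrt(np.mean(errors**2)),
    }


# Hypothetical estimates from three runs of an experiment with true total 100.
measures = performance_measures([98.0, 102.0, 103.0], 100.0)
```

In the study itself the list of estimates would hold one value per run, so 500 values for each estimator and bandwidth selector.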

Results
Results obtained are tabulated in Table 2.
From the results obtained, we observe that the unbiased cross validation approach is a viable means of selecting the bandwidth, as it gives the lowest bias and root mean square error across all the estimators. The proposed estimator for the two sample problem gives better estimates of the population total than the estimators proposed by [12] and [4].

Conclusion
The aim of this study was to develop an estimator with the lowest bias for the finite population total using the multiplicative bias corrected approach to nonparametric regression. The study reveals that the proposed estimator is more efficient than the modified nonparametric estimator (NPT). With a suitable bandwidth selection (ucv), the proposed estimator has the smallest bias and root mean square error values. It has therefore proven to be effective in resolving the boundary bias problem associated with the existing nonparametric smoothers.