In this paper, auxiliary information is used to determine an estimator of finite population total using nonparametric regression under stratified random sampling. To achieve this, a model-based approach is adopted by making use of the local polynomial regression estimation to predict the nonsampled values of the survey variable y. The performance of the proposed estimator is investigated against some design-based and model-based regression estimators. The simulation experiments show that the resulting estimator exhibits good properties. Generally, good confidence intervals are seen for the nonparametric regression estimators, and use of the proposed estimator leads to relatively smaller values of RE compared to other estimators.

The main objective of a sample survey is to obtain information about a population and then use that information to make inferences about population quantities. The information most often sought consists of aggregate values of various population characteristics, such as the total number of units or the proportion of units having certain attributes. It can be collected either by sampling or by a census. One approach to using auxiliary information in the construction of estimators is to assume a working model that describes the relationship between the survey variable and the auxiliary variable. Estimators are then derived from this model and are sought to have good efficiency when the model is true. In most cases, a linear model is assumed. Generalized regression estimators by [

A variety of approaches exist for constructing more efficient estimators of the population total or mean, including model-based and design-based methods. The model-based approach in sample surveys relies on superpopulation models, which assume that the population under study is a realization of a random variable having a superpopulation model

In this paper, auxiliary information is used to determine an estimator of finite population total using nonparametric regression under stratified random sampling. To achieve this, a model-based approach is adopted by making use of the local polynomial regression estimation to predict the nonsampled values of the survey variable y. Stratified estimators for finite population total

Consider a population consisting of N units. Suppose this population is divided into H disjoint strata, each of size

Let

From each stratum, a simple random sample of size

Let

The population total is defined as

which can be rewritten as

where

Once the sample has been observed, the problem of estimating Y becomes the problem of predicting the sum of the nonsampled values of y.

The first component in Equation (1) is known, while the second requires prediction, which is the focus of this paper. Local polynomial regression will be used to predict the unknown nonsampled values of the survey variable y.
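The split of the total into a known part and a part to be predicted can be written out explicitly. The notation below is assumed for illustration only: $y_{hj}$ is the survey value of unit $j$ in stratum $h$, $N_h$ is the size of stratum $h$, and $s_h$ is the set of sampled units in stratum $h$.

```latex
\begin{align*}
Y &= \sum_{h=1}^{H} \sum_{j=1}^{N_h} y_{hj} \\
  &= \sum_{h=1}^{H} \sum_{j \in s_h} y_{hj}
     \;+\; \sum_{h=1}^{H} \sum_{j \notin s_h} y_{hj}.
\end{align*}
```

Under this notation, the first double sum is observed once the sample is drawn, and the second is what a model-based predictor has to supply.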

Suppose the distribution generating

where

Then it follows that

where

In practice, the values of

Then a model-based local polynomial regression estimator of the nonsampled

where

holds as long as

Now denoting the estimator for the finite population total by

and the estimator for the finite population total is

with

In this section, a study is carried out on various properties of estimator (8), which may be important in practice. In doing so, the following assumptions are made:

1) The regression function

2) The marginal density,

3) The conditional variance

4) The kernel density function

following:

for

These conditions on

Now consider the difference:

and taking expectation yields

since

i.e.

which is the bias associated with

Approximating

Letting

and applying expectations then

Theorem 3 of [

So that

It implies that

The estimator (8) has the MSE

which can be decomposed as

Theorem 1 of [

Observe that Equation (24) tends to zero if

This shows that

In this section, a study is carried out on the practical performance of several estimators (see

The first estimator is design-based, the second one is parametric and model-based while the last two are nonparametric and model-based.
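As a point of reference for the design-based estimator, the following sketch computes a Horvitz-Thompson total under stratified simple random sampling, where each sampled unit in stratum h has inclusion probability n_h / N_h and is therefore weighted by N_h / n_h. The function name and the toy data are hypothetical and are not taken from the study.

```python
import numpy as np

def stratified_ht_total(samples, stratum_sizes):
    """Horvitz-Thompson total under stratified simple random sampling.

    samples: list of 1-D arrays, one per stratum, with the sampled y values.
    stratum_sizes: list of stratum population sizes N_h.
    """
    total = 0.0
    for y_h, N_h in zip(samples, stratum_sizes):
        y_h = np.asarray(y_h, dtype=float)
        # Weight each sampled value by the inverse inclusion probability N_h / n_h.
        total += N_h / y_h.size * y_h.sum()
    return total

# Toy data (hypothetical): two strata of sizes 10 and 20.
samples = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 6.0])]
estimate = stratified_ht_total(samples, [10, 20])
```

The estimator uses no auxiliary information, which is why it serves as the baseline against which the regression estimators are compared.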

The working model is taken to be

with

The errors are assumed to be independent and identically distributed (i.i.d.) normal random variables having mean 0 and standard deviation

Estimator | Reference |
---|---|
Horvitz-Thompson | [ |
Linear regression | [ |
Mixed Ratio | [ |
Local polynomial with | Equation (8) |

population values

Epanechnikov kernel,

is used for kernel smoothing on each of the populations. In each case, bandwidth values

The data simulation, the estimators, and all computations were carried out in R on a desktop computer.
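To make the smoothing step concrete, the sketch below shows one way a model-based local polynomial predictor of a total could be assembled with an Epanechnikov kernel. It is an illustration under stated assumptions rather than the study's R code: the fit is done separately within each stratum, the polynomial degree defaults to 1 (local linear), and all function names and the toy data are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def local_poly_predict(x_s, y_s, x0, b, degree=1):
    """Local polynomial fit at x0 from sampled pairs (x_s, y_s), bandwidth b."""
    w = epanechnikov((x_s - x0) / b)
    # Design matrix in powers of (x - x0); the intercept is the fit at x0.
    X = np.vander(x_s - x0, N=degree + 1, increasing=True)
    sw = np.sqrt(w)
    # Weighted least squares via least squares on sqrt-weighted rows.
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y_s * sw, rcond=None)
    return beta[0]

def stratum_total(x_s, y_s, x_ns, b, degree=1):
    """Observed sample sum plus predictions for the nonsampled units."""
    preds = [local_poly_predict(x_s, y_s, x0, b, degree) for x0 in x_ns]
    return float(y_s.sum() + np.sum(preds))

def population_total(strata, b, degree=1):
    """Sum of stratum totals; strata is a list of (x_s, y_s, x_ns) tuples."""
    return sum(stratum_total(xs, ys, xns, b, degree) for xs, ys, xns in strata)

# Toy stratum (hypothetical): an exactly linear relationship y = 2 + 3x.
x_s = np.linspace(0.0, 1.0, 21)
y_s = 2.0 + 3.0 * x_s
x_ns = np.array([0.25, 0.55])
total = population_total([(x_s, y_s, x_ns)], b=0.4)
```

With exactly linear data a local linear fit reproduces the line, so the toy total equals the sum of all true values. Smaller bandwidths use fewer neighbours per prediction, which is the trade-off explored by varying b in the simulations.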

To analyze the performance of the proposed estimator against some specified estimators, relative absolute bias (RAB) is computed as

and the relative efficiency (RE) with respect to the Horvitz-Thompson (HT) estimator is computed as

The relative efficiency (RE) is meant to examine the robustness of the various estimators against the proposed estimator.
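The two performance measures can be sketched in code as follows. The definitions used here are assumptions made for illustration: RAB is taken as the average over simulation replicates of |estimate - true total| / true total, and RE as the mean squared error of an estimator divided by that of the HT estimator, which is consistent with HT having RE equal to 1 in the reported results. Function names and data are hypothetical.

```python
import numpy as np

def relative_absolute_bias(estimates, true_total):
    """Assumed form: mean over replicates of |estimate - truth| / truth."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.mean(np.abs(estimates - true_total) / true_total))

def mse(estimates, true_total):
    """Mean squared error of replicate estimates around the true total."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.mean((estimates - true_total) ** 2))

def relative_efficiency(estimates, ht_estimates, true_total):
    """Assumed form: MSE of the estimator divided by MSE of the HT estimator."""
    return mse(estimates, true_total) / mse(ht_estimates, true_total)

# Toy replicates (hypothetical), true total 100.
ht = [90.0, 110.0, 100.0, 100.0]
est = [95.0, 105.0, 100.0, 100.0]
rab = relative_absolute_bias(est, 100.0)
re = relative_efficiency(est, ht, 100.0)
```

Under these assumed definitions, an RE below 1 means the estimator has a smaller mean squared error than HT, and an RE above 1 means it has a larger one.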

The confidence intervals (CI) and the average lengths (AL) of the confidence intervals of various estimators are also computed as follows:

where

The results of this simulation study are summarized in

Estimator | Formulae |
---|---|
Horvitz-Thompson, | |
Linear regression estimator, | |
Mixed Ratio Estimator, | |
Proposed Model-based Local polynomial with | |

The confidence intervals and the average lengths of the intervals are also computed for each case. A shorter interval is preferable, provided it still covers the true population total, because it locates the total within a narrower range and the results are therefore more precise.

The estimators

In most scenarios,

When the model is completely misspecified as in the Sine and Jump populations, a greater efficiency can be achieved by the nonparametric regression estimators. This can be seen in

When the underlying superpopulation model is completely unknown, a reasonable choice for finite population total estimation would be the nonparametric estimators such as

In this study,

Despite

Relative absolute bias (RAB) and relative efficiency (RE) by population and bandwidth b. HT = Horvitz-Thompson, LR = linear regression, MR = mixed ratio, LP = proposed local polynomial.

Population | b | HT RAB | HT RE | LR RAB | LR RE | MR RAB | MR RE | LP RAB | LP RE |
---|---|---|---|---|---|---|---|---|---|
Linear | 0.3465724 | 0.03212401 | 1 | 0.005778929 | 0.03155733 | 0.03321496 | 1.067811 | 0.03201888 | 0.9959899 |
Linear | 0.4 | 0.03212401 | 1 | 0.005778929 | 0.03155733 | 0.0335352 | 1.089573 | 0.0320533 | 0.9965037 |
Linear | 1 | 0.03212401 | 1 | 0.005778929 | 0.03155733 | 0.03434122 | 1.144951 | 0.03210449 | 0.9991698 |
Linear | 2 | 0.03212401 | 1 | 0.005778929 | 0.03155733 | 0.03272264 | 1.037753 | 0.03212023 | 0.9997907 |
Sine | 0.3465724 | 0.01855193 | 1 | 0.03836453 | 4.286723 | 0.02072086 | 1.243534 | 0.01657321 | 0.7990398 |
Sine | 0.4 | 0.01855193 | 1 | 0.03836453 | 4.286723 | 0.02082649 | 1.255919 | 0.01685303 | 0.826246 |
Sine | 1 | 0.01855193 | 1 | 0.03836453 | 4.286723 | 0.0201947 | 1.183826 | 0.01810882 | 0.9576443 |
Sine | 2 | 0.01855193 | 1 | 0.03836453 | 4.286723 | 0.01895357 | 1.043951 | 0.0184607 | 0.9908383 |
Bump | 0.3465724 | 0.03109618 | 1 | 0.01449569 | 0.2130984 | 0.03243536 | 1.085912 | 0.03100986 | 0.9935966 |
Bump | 0.4 | 0.03109618 | 1 | 0.01449569 | 0.2130984 | 0.03289121 | 1.116063 | 0.03319303 | 1.123072 |
Bump | 1 | 0.03109618 | 1 | 0.01449569 | 0.2130984 | 0.03357809 | 1.165075 | 0.0321397 | 1.061732 |
Bump | 2 | 0.03109618 | 1 | 0.01449569 | 0.2130984 | 0.03165829 | 1.036739 | 0.03106365 | 0.9988702 |
Jump | 0.3465724 | 0.004845022 | 1 | 0.02483609 | 26.07389 | 0.005616896 | 1.353566 | 0.007676967 | 2.274792 |
Jump | 0.4 | 0.004845022 | 1 | 0.02483609 | 26.07389 | 0.0056205 | 1.35023 | 0.007750974 | 2.329744 |
Jump | 1 | 0.004845022 | 1 | 0.02483609 | 26.07389 | 0.005181882 | 1.155266 | 0.005505162 | 1.259671 |
Jump | 2 | 0.004845022 | 1 | 0.02483609 | 26.07389 | 0.004852543 | 1.006773 | 0.004872778 | 1.006966 |

Estimated totals by population and bandwidth b, with the true population total for comparison.

Population | b | HT | LR | MR | LP | Population Total |
---|---|---|---|---|---|---|
Linear | 0.3465724 | 1941.427 | 1943.161 | 1939.52 | 1941.248 | 1943.052 |
Linear | 0.4 | 1941.427 | 1943.161 | 1938.807 | 1941.167 | 1943.052 |
Linear | 1 | 1941.427 | 1943.161 | 1937.391 | 1941.419 | 1943.052 |
Linear | 2 | 1941.427 | 1943.161 | 1940.336 | 1941.424 | 1943.052 |
Sine | 0.3465724 | 4071.066 | 4114.031 | 4080.316 | 4056.493 | 4071.383 |
Sine | 0.4 | 4071.066 | 4114.031 | 4081.685 | 4054.513 | 4071.383 |
Sine | 1 | 4071.066 | 4114.031 | 4079.156 | 4066.007 | 4071.383 |
Sine | 2 | 4071.066 | 4114.031 | 4073.04 | 4070.166 | 4071.383 |
Bump | 0.3465724 | 2186.49 | 2192.769 | 2188.266 | 2172.2 | 2187.923 |
Bump | 0.4 | 2186.49 | 2192.769 | 2195.394 | 2151.329 | 2187.923 |
Bump | 1 | 2186.49 | 2192.769 | 2200.689 | 2161.91 | 2187.923 |
Bump | 2 | 2186.49 | 2192.769 | 2189.318 | 2182.232 | 2187.923 |
Jump | 0.3465724 | 3299.185 | 3321.699 | 3288.857 | 3322.128 | 3300.252 |
Jump | 0.4 | 3299.185 | 3321.699 | 3288.415 | 3322.202 | 3300.252 |
Jump | 1 | 3299.185 | 3321.699 | 3291.326 | 3309.116 | 3300.252 |
Jump | 2 | 3299.185 | 3321.699 | 3297.485 | 3300.881 | 3300.252 |

Confidence limits and average interval lengths by population and bandwidth b. HT = Horvitz-Thompson, LR = linear regression, MR = mixed ratio, LP = proposed local polynomial; LCL and UCL are the lower and upper confidence limits, and AL is the average length.

Population | b | HT LCL | HT UCL | HT AL | LR LCL | LR UCL | LR AL | MR LCL | MR UCL | MR AL | LP LCL | LP UCL | LP AL | Population Total |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Linear | 0.3465724 | 1905.431 | 1977.423 | 71.992 | 1919.139 | 1967.183 | 48.044 | 1934.86 | 1944.18 | 9.32 | 1936.249 | 1946.247 | 9.998 | 1943.052 |
Linear | 0.4 | 1905.431 | 1977.423 | 71.992 | 1919.139 | 1967.183 | 48.044 | 1934.250 | 1943.364 | 9.114 | 1936.169 | 1946.165 | 9.996 | 1943.052 |
Linear | 1 | 1905.431 | 1977.423 | 71.992 | 1919.139 | 1967.183 | 48.044 | 1933.711 | 1941.071 | 7.360 | 1936.418 | 1946.420 | 10.002 | 1943.052 |
Linear | 2 | 1905.431 | 1977.423 | 71.992 | 1919.139 | 1967.183 | 48.044 | 1936.733 | 1943.938 | 7.206 | 1936.424 | 1946.424 | 9.999 | 1943.052 |
Sine | 0.3465724 | 4026.580 | 4115.552 | 88.973 | 4044.296 | 4183.766 | 139.470 | 4074.654 | 4085.978 | 11.324 | 4050.937 | 4062.049 | 11.113 | 4071.383 |
Sine | 0.4 | 4026.580 | 4115.552 | 88.973 | 4044.296 | 4183.766 | 139.470 | 4076.156 | 4087.213 | 11.057 | 4049.014 | 4060.012 | 10.998 | 4071.383 |
Sine | 1 | 4026.580 | 4115.552 | 88.973 | 4044.296 | 4183.766 | 139.470 | 4074.650 | 4083.661 | 9.012 | 4060.254 | 4071.760 | 11.506 | 4071.383 |
Sine | 2 | 4026.580 | 4115.552 | 88.973 | 4044.296 | 4183.766 | 139.470 | 4068.589 | 4077.491 | 8.902 | 4064.498 | 4075.834 | 11.336 | 4071.383 |
Bump | 0.3465724 | 2146.545 | 2226.434 | 79.889 | 2156.490 | 2229.048 | 72.558 | 2183.234 | 2193.299 | 10.065 | 2166.839 | 2177.560 | 10.721 | 2187.923 |
Bump | 0.4 | 2146.545 | 2226.434 | 79.889 | 2156.490 | 2229.048 | 72.558 | 2190.473 | 2200.315 | 9.842 | 2145.980 | 2156.678 | 10.698 | 2187.923 |
Bump | 1 | 2146.545 | 2226.434 | 79.889 | 2156.490 | 2229.048 | 72.558 | 2196.621 | 2204.758 | 8.137 | 2156.582 | 2167.238 | 10.656 | 2187.923 |
Bump | 2 | 2146.545 | 2226.434 | 79.889 | 2156.490 | 2229.048 | 72.558 | 2185.320 | 2193.315 | 7.995 | 2176.909 | 2187.554 | 10.645 | 2187.923 |
Jump | 0.3465724 | 3290.027 | 3308.344 | 18.317 | 3127.463 | 3515.934 | 388.471 | 3287.902 | 3289.813 | 1.912 | 3321.078 | 3323.179 | 2.101 | 3300.252 |
Jump | 0.4 | 3290.027 | 3308.344 | 18.317 | 3127.463 | 3515.934 | 388.471 | 3287.47 | 3289.36 | 1.89 | 3321.172 | 3323.232 | 2.060 | 3300.252 |
Jump | 1 | 3290.027 | 3308.344 | 18.317 | 3127.463 | 3515.934 | 388.471 | 3290.409 | 3292.244 | 1.835 | 3308.167 | 3310.065 | 1.898 | 3300.252 |
Jump | 2 | 3290.027 | 3308.344 | 18.317 | 3127.463 | 3515.934 | 388.471 | 3296.569 | 3298.401 | 1.832 | 3299.932 | 3301.829 | 1.897 | 3300.252 |

Additionally, a keen look at the estimated totals in

In this study, performance of the proposed estimator has been investigated against some design-based and model-based regression estimators. The RE values of the proposed estimator are in general close to one. It has been shown that for whichever bandwidth considered,

Special thanks to the African Union (AU) for the funding that saw the success of this research.

Syengo, C.K., Pyeye, S., Orwa, G.O. and Odhiambo, R.O. (2016) Local Polynomial Regression Estimator of the Finite Population Total under Stratified Random Sampling: A Model-Based Approach. Open Journal of Statistics, 6, 1085-1097. http://dx.doi.org/10.4236/ojs.2016.66088