A New Estimator Using Auxiliary Information in Stratified Adaptive Cluster Sampling
1. Introduction
Adaptive cluster sampling, proposed by Thompson [1], is an efficient method for sampling rare and hidden clustered populations. In adaptive cluster sampling, an initial sample of units is selected by simple random sampling. If the value of the variable of interest from a sampled unit satisfies a pre-specified condition $C$, that is $y_i \geq c$, then the unit’s neighborhood is also added to the sample. If any of the units that are “adaptively” added also satisfy the condition $C$, then their neighborhoods are also added to the sample. This process continues until no more units that satisfy the condition are found. The set of all units selected and all neighboring units that satisfy the condition is called a network. Adaptively added units that do not satisfy the condition are called edge units. A network and its associated edge units are called a cluster. If a unit selected in the initial sample does not satisfy the condition $C$, then its network consists of that unit alone. A neighborhood must be defined such that if unit $i$ is in the neighborhood of unit $j$, then unit $j$ is in the neighborhood of unit $i$. In this paper, the neighborhood of a unit is defined as the four spatially adjacent units, that is, those to the left, right, top and bottom of that unit, as shown in Figure 1.
Figure 1 illustrates an example of a network. The unit marked with a star is the initial unit selected. The condition for adaptively adding units is a value greater than or equal to 1. The units to the left, right, top and bottom of a unit make up its neighborhood. The units with gray shading form a single network. The units in bold numbers are the edge units of the network. The network and its edge units make up a cluster.
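The adaptive procedure described above can be sketched as a breadth-first search over a grid. This is a minimal illustration, not the authors' program: the function name `grow_cluster`, the list-of-lists grid layout and the example values are all assumptions made here, and the condition is taken to be "value greater than or equal to c" with the four-unit neighborhood.

```python
from collections import deque

def grow_cluster(grid, start, c=1):
    """Return (network, edge_units) for the cluster grown from `start`.

    grid  : list of rows holding the y-values (hypothetical layout)
    start : (row, col) of the initially selected unit
    c     : a unit satisfies the condition when its value is >= c
    """
    rows, cols = len(grid), len(grid[0])
    # A unit that fails the condition forms a network of size one, no edges.
    if grid[start[0]][start[1]] < c:
        return {start}, set()
    network, edges = {start}, set()
    queue = deque([start])
    while queue:
        r, k = queue.popleft()
        # Four spatially adjacent units: top, bottom, left, right.
        for dr, dk in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nk = r + dr, k + dk
            if not (0 <= nr < rows and 0 <= nk < cols) or (nr, nk) in network:
                continue
            if grid[nr][nk] >= c:
                network.add((nr, nk))   # satisfies the condition: keep expanding
                queue.append((nr, nk))
            else:
                edges.add((nr, nk))     # observed but not expanded: edge unit
    return network, edges

example_grid = [
    [0, 0, 0, 0],
    [0, 2, 3, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
network, edges = grow_cluster(example_grid, (1, 1), c=1)
```

In this toy grid the three non-zero units form one network, and every zero-valued unit adjacent to them is an edge unit; starting from a zero-valued unit returns that unit alone.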
Adaptive cluster sampling can also be applied within stratified random sampling. In stratified adaptive cluster sampling, an initial stratified sample is selected from the population, and whenever the variable of interest for any unit is observed to satisfy the condition, the neighborhood of that unit is added to the sample. Sometimes other variables are related to the variable of interest y, and these provide additional information for estimating the population mean. The use of an auxiliary variable is a common method to improve the precision of estimates of a population mean. In this paper, we study the estimator of the population mean in stratified adaptive cluster sampling using an auxiliary variable. Some comparisons are made using a simulation.
![](https://www.scirp.org/html/8-1240213\42ccac75-6741-4717-9323-9bdeaa345112.jpg)
Figure 1. An example of a network where a unit's neighborhood is defined as the four spatially adjacent units.
2. Stratified Adaptive Cluster Sampling
For stratified adaptive cluster sampling, the population consists of $N$ units partitioned into $L$ strata based on prior information about which units are similar, and it is assumed that there is no crossover of networks between strata. Stratum $h$ consists of $N_h$ units, $h = 1, \dots, L$. The population mean of the variable of interest in stratum h is $\mu_h = \frac{1}{N_h}\sum_{i=1}^{N_h} y_{hi}$. An initial sample of $n_h$ units is selected from stratum $h$ by simple random sampling without replacement, and for each selected unit that satisfies the condition, the unit’s neighborhood is added to the sample.
Define
![](https://www.scirp.org/html/8-1240213\617a438f-bfdb-42f0-8a40-4a69c83681b3.jpg)
which is the average of the y-values of the network to which unit $i$ in stratum $h$ belongs. Here $A_{hi}$ is the network that includes unit i in stratum h and $m_{hi}$ is the size of the network that includes unit $i$ in stratum $h$. The estimator of the population mean based on the Hansen-Hurwitz estimator (Thompson and Seber [2]) is
(1)
where
![](https://www.scirp.org/html/8-1240213\7b31165b-502c-465c-821a-368eab3e5556.jpg)
The variance of
is
(2)
where
![](https://www.scirp.org/html/8-1240213\8ae39343-1fc8-45ce-a0ce-b12ac76fad57.jpg)
and
.
The estimate of
is
(3)
where
.
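As a rough illustration of this section, the sketch below computes a stratified Hansen-Hurwitz type estimate and a variance estimate from the network means of the initially sampled units. It assumes the usual stratified form when no network crosses a stratum boundary: the estimate is the sum over strata of N_h times the mean of the sampled network means, divided by N, and the variance estimate is the sum of N_h (N_h - n_h) s_h^2 / n_h, divided by N^2. The function name, the input layout and the numbers are hypothetical, and the formulas are this standard form rather than a transcription of Equations (1) to (3).

```python
from statistics import mean, variance

def stratified_hh(strata):
    """Stratified Hansen-Hurwitz type estimate of the population mean.

    strata : list of (N_h, w_h) pairs, where N_h is the stratum size and
             w_h is the list of network means for the n_h initially sampled
             units of that stratum (n_h >= 2 so s_h^2 is defined).
    Returns (estimate, estimated_variance).
    """
    N = sum(N_h for N_h, _ in strata)
    # Weighted mean of the per-stratum averages of the network means.
    estimate = sum(N_h * mean(w) for N_h, w in strata) / N
    # statistics.variance uses the n_h - 1 divisor (sample variance).
    est_var = sum(
        N_h * (N_h - len(w)) * variance(w) / len(w) for N_h, w in strata
    ) / N ** 2
    return estimate, est_var

# Two hypothetical strata of 100 units each, four initial units per stratum.
est, est_var = stratified_hh([(100, [0.0, 0.0, 2.0, 2.0]),
                              (100, [1.0, 1.0, 1.0, 1.0])])
```

Each network mean is entered once per initially sampled unit, so a network hit by two initial units contributes twice, which is what distinguishes this estimator from a simple mean over distinct networks.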
3. Proposed Estimators
Following Abu-Dayyeh, Ahmed, Ahmed and Muttlak [3], the estimator of the population mean in stratified adaptive cluster sampling using two auxiliary variables (x, z) is
(4)
α1 = 0 and α2 = 0 is called mean per unit, α1 = −1 and
is called multivariate ratio estimator,
and
is called multivariate ratio estimator, α1 = 1 and
is called multivariate product estimator,
and
is called ratio estimator using
,
and
is called ratio estimator using
,
and
is called product estimator using
and
and
is called product estimator using
.
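The way the exponents select a named estimator can be sketched in code. The functional form used below, the y-estimate multiplied by powers of the ratios of the x- and z-estimates to their known population means, is an assumption based on the usual ratio and product family; it is not a transcription of Equation (4), and the function name, argument names and numbers are hypothetical.

```python
def aux_estimator(w_y, w_x, w_z, mu_x, mu_z, alpha1, alpha2):
    """Assumed two-auxiliary-variable estimator of the population mean of y.

    w_y, w_x, w_z : stratified estimates of the means of y, x and z
    mu_x, mu_z    : known population means of the auxiliary variables
    alpha1, alpha2: exponents applied to the x-ratio and the z-ratio
    """
    return w_y * (w_x / mu_x) ** alpha1 * (w_z / mu_z) ** alpha2

# Hypothetical estimates and known auxiliary means.
args = dict(w_y=2.0, w_x=4.0, w_z=5.0, mu_x=8.0, mu_z=10.0)

mean_per_unit = aux_estimator(**args, alpha1=0, alpha2=0)    # w_y itself
ratio_x       = aux_estimator(**args, alpha1=-1, alpha2=0)   # w_y * mu_x / w_x
product_x     = aux_estimator(**args, alpha1=1, alpha2=0)    # w_y * w_x / mu_x
ratio_xz      = aux_estimator(**args, alpha1=-1, alpha2=-1)  # ratio in both
product_xz    = aux_estimator(**args, alpha1=1, alpha2=1)    # product in both
```

Under this form a negative exponent divides by the sample-to-population ratio (ratio type) and a positive exponent multiplies by it (product type). A negative exponent fails when the corresponding sample estimate is zero, which can occur with rare populations.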
Let
, ![](https://www.scirp.org/html/8-1240213\b60fa989-e90e-4e56-bc47-073ff471b75a.jpg)
and
![](https://www.scirp.org/html/8-1240213\4cbdff9d-5677-4f33-8d47-456ad33b9968.jpg)
So
![](https://www.scirp.org/html/8-1240213\da517653-56a4-4efc-a514-e65930891138.jpg)
![](https://www.scirp.org/html/8-1240213\56620aad-7f20-4710-ac48-58a1f22fe085.jpg)
,
![](https://www.scirp.org/html/8-1240213\cddc130c-ebe9-47c6-8432-ecfb0ba5a80e.jpg)
Thus
, ![](https://www.scirp.org/html/8-1240213\28c00eb7-33b2-4a1c-a312-7ed0f3097612.jpg)
,
,
![](https://www.scirp.org/html/8-1240213\3ec939af-2c0d-4d54-b0b3-cf96fdf71ccf.jpg)
![](https://www.scirp.org/html/8-1240213\1e0514cc-26db-4f80-8301-6a2a614bab3e.jpg)
and
.
So,
![](https://www.scirp.org/html/8-1240213\95f47d45-5d9f-4778-8eb7-98b84acba055.jpg)
![](https://www.scirp.org/html/8-1240213\69377433-3378-49f8-88ea-1f7abea2d671.jpg)
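To find the values of $\alpha_1$ and $\alpha_2$ that minimize the mean square error of the estimator, take the partial derivatives of the mean square error with respect to $\alpha_1$ and $\alpha_2$ and set them equal to zero.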
![](https://www.scirp.org/html/8-1240213\a6d882a1-1dee-4015-af8e-53f9ea99479e.jpg)
and
![](https://www.scirp.org/html/8-1240213\19f0e35c-5354-4494-8761-469c9e41297d.jpg)
So the optimum values of $\alpha_1$ and $\alpha_2$ are
![](https://www.scirp.org/html/8-1240213\a35f3c61-495e-40ae-857c-3abbcec55b7b.jpg)
and
![](https://www.scirp.org/html/8-1240213\1833bae9-f6e4-4ff0-a5a0-3170bb797e26.jpg)
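For orientation, the minimization can be written out in a generic first-order form. The notation below is assumed for this sketch and may differ from the paper's: $e_0$, $e_1$, $e_2$ are the relative errors of the estimates of the means of $y$, $x$ and $z$, $V_{ab} = E(e_a e_b)$, and the estimator is taken to be of the power form $\hat{\mu}_y (\hat{\mu}_x/\mu_x)^{\alpha_1} (\hat{\mu}_z/\mu_z)^{\alpha_2}$.

```latex
% Assumed notation: e_0, e_1, e_2 are relative errors, V_{ab} = E(e_a e_b).
\begin{align*}
\hat{\mu} - \mu_y
  &\approx \mu_y \left( e_0 + \alpha_1 e_1 + \alpha_2 e_2 \right), \\
\mathrm{MSE}(\hat{\mu})
  &\approx \mu_y^2 \left( V_{00} + \alpha_1^2 V_{11} + \alpha_2^2 V_{22}
     + 2\alpha_1 V_{01} + 2\alpha_2 V_{02} + 2\alpha_1 \alpha_2 V_{12} \right), \\
\frac{\partial\,\mathrm{MSE}}{\partial \alpha_1} = 0
  &\;\Rightarrow\; \alpha_1 V_{11} + \alpha_2 V_{12} + V_{01} = 0, \\
\frac{\partial\,\mathrm{MSE}}{\partial \alpha_2} = 0
  &\;\Rightarrow\; \alpha_1 V_{12} + \alpha_2 V_{22} + V_{02} = 0, \\
\alpha_1^{*}
  &= \frac{V_{02} V_{12} - V_{01} V_{22}}{V_{11} V_{22} - V_{12}^2}, \qquad
\alpha_2^{*}
   = \frac{V_{01} V_{12} - V_{02} V_{11}}{V_{11} V_{22} - V_{12}^2}.
\end{align*}
```

The solution requires $V_{11} V_{22} - V_{12}^2 \neq 0$, that is, the two auxiliary estimates must not be perfectly correlated. In practice the $V_{ab}$ are unknown and are replaced by sample estimates.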
The estimate of
is
![](https://www.scirp.org/html/8-1240213\de4d400b-0e6c-4fcf-92c6-859ba426f495.jpg)
and the estimate of
is
![](https://www.scirp.org/html/8-1240213\92ace0d8-9321-4832-85bf-681fb0b63fc9.jpg)
where
,
,
,
![](https://www.scirp.org/html/8-1240213\92734799-4933-4824-941e-424229bee351.jpg)
and
.
4. Simulation Study
In this section, the simulated x-values, z-values and y-values from Chutiman and Kumphon [4] are studied. The data are partitioned into 4 strata, each of size 20 × 5 = 100 units. The populations are shown in Figures 2-4. An initial sample of units is selected by simple random sampling without replacement. The y-values are observed to determine the sample networks, and the x-values and z-values are then obtained for each sample network. The condition for adding units to the sample is defined by
.
For each estimator, 5000 iterations were performed to obtain an accurate estimate. Initial SRS sizes of 5, 10, 15, 20 and 30 were used. The estimated mean square error of the estimated mean is
where
is the value for the relevant estimator for sample
.
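A Monte Carlo estimate of the mean square error of this kind can be sketched as the average squared deviation of the simulated estimates from the true population mean. The divisor (the number of iterations) and the names below are assumptions, and the toy population uses plain simple random sampling without replacement rather than the adaptive design, purely to keep the example short.

```python
import random

def estimated_mse(estimates, true_mean):
    """Average squared deviation of simulated estimates from the true mean."""
    return sum((e - true_mean) ** 2 for e in estimates) / len(estimates)

# Toy population (hypothetical): 90 empty units and 10 units with value 5.
random.seed(0)
population = [0] * 90 + [5] * 10
true_mean = sum(population) / len(population)

# 5000 iterations of the sample mean under SRS without replacement, n = 10.
estimates = [sum(random.sample(population, 10)) / 10 for _ in range(5000)]
mse = estimated_mse(estimates, true_mean)
```

For a biased estimator such as a ratio estimator, this quantity includes the squared bias as well as the variance, which is why it is the natural basis for comparing the estimators.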
The estimated mean square errors of the estimators
are shown in Table 1, where
and
is called
of mean per unit,
and
is called
of multivariate ratio estimator,
and
is called
of multivariate ratio estimator,
and
is called
of multivariate product estimator,
and
is called
of ratio estimator using
,
and
is called
of ratio estimator using
,
and
is called
of product estimator using
and
and
is called
of product estimator using
.
5. Conclusion
Stratified adaptive cluster sampling is an efficient method for sampling rare and hidden clustered populations. The numerical study showed that if the variable of interest $y$ and the auxiliary variables $x$ and $z$ have high positive correlation, then the estimated mean square error of the ratio estimators is less than that of the product estimators. The estimators that use only one auxiliary variable performed better than the estimators that use two auxiliary variables.
![](https://www.scirp.org/html/8-1240213\d2b530ab-11ba-4dff-9ff2-d99247a18290.jpg)
Table 1. The estimated mean square error of the estimators.
6. Acknowledgements
This research was supported by the Faculty of Science, Mahasarakham University, Thailand. We would also like to profoundly thank Mr. Paveen Chutiman for his programming advice.