Why Well Spread Probability Samples Are Balanced

When sampling from a finite population there is often auxiliary information available at the unit level. Such information can be used to improve the estimation of the target parameter. We show that probability samples that are well spread in the auxiliary space are balanced, or approximately balanced, on the auxiliary variables. A consequence of this balancing effect is that the Horvitz-Thompson estimator will be a very good estimator for any target variable that can be well approximated by a Lipschitz continuous function of the auxiliary variables. Hence we give a theoretical motivation for the use of well spread probability samples. Our conclusions imply that well spread samples, combined with the Horvitz-Thompson estimator, are a good strategy in a variety of situations.


Introduction
In many fields there has been great interest in selecting samples that are well spread or spatially balanced. Such samples are considered to produce good estimates for target variables that exhibit spatial trends, see e.g. [1,2]. The focus of this paper is to explain the connection between a well spread sample and a balanced sample. Roughly speaking, a sample is well spread if the number of selected units is close to what is expected on average in every part of the auxiliary space. A sample is balanced on a variable if the Horvitz-Thompson (HT) estimator of the total of that variable agrees exactly with the known population total of the variable. With a short analysis, this paper clarifies why a well spread sample is approximately balanced. We also explain that, if the sample is well spread, the variance of commonly used estimators is usually low.
It is well known that samples that are balanced or approximately balanced on the auxiliary variables may be selected by using the cube method, see e.g. [3]. Sampling methods for selection of well spread samples in a general auxiliary space, by utilizing a distance function, are more recent and less well known than the cube method. Two such methods are the local pivotal method (LPM) and spatially correlated Poisson sampling (SCPS). The LPM design, based on the pivotal method [4], was first introduced in [5]. The other method, SCPS, was first introduced in [6] and is a special case of the method described in [7].
In many areas, such as forest inventories, environmental studies, and even in official statistics, different forms of stratification are commonly used to obtain samples that are well spread geographically or with respect to other available information. Often, stratification is used as a variance reduction technique without particular interest in the different strata. Constructing a stratified sampling design is often not straightforward, especially if several mixed auxiliary variables are available. It is not uncommon that statisticians try to stratify using several variables, but crossing all strata of all variables usually results in cells that are too small. In such situations it may be preferable and less complicated to define a distance measure in the auxiliary space, and then use a sampling method that in general avoids selection of nearby units, thus forcing the sample to be well spread.
In Section 2, a theoretical motivation for the balancing effect of well spread samples is given. In Section 3, we give arguments indicating that using well spread samples provides a small anticipated variance for the HT-estimator under a very general super-population model. Some sampling methods for selecting well spread samples are briefly discussed in Section 4. Final comments are provided in Section 5.

Consider a finite population U of N units from which a sample s is selected in order to estimate some characteristics of U. It is assumed that we have access to auxiliary information at the unit level, i.e. the values $x_i = (x_{i1}, \ldots, x_{iq})$ of q auxiliary variables are known for each unit $i \in U$. We also assume that it is possible to calculate the distance $d(i,j)$ between two units i and j in the auxiliary space. Usually the totals $Y = \sum_{i \in U} y_i$ of one or more target variables are the parameters we wish to estimate. It is assumed that each population unit i is included in the sample with a known probability $\pi_i$, $0 < \pi_i \le 1$, with $\sum_{i \in U} \pi_i = n$, where n is the sample size. In this case the unbiased and commonly used HT-estimator [8] of Y is
$$\hat{Y} = \sum_{i \in s} \frac{y_i}{\pi_i}.$$
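As a small illustration of the HT-estimator, the following Python sketch computes the estimate as the sum of the sampled y-values divided by their inclusion probabilities. The function name `ht_estimate` and the toy population are hypothetical and chosen only for illustration; they do not come from the paper.

```python
def ht_estimate(y_sample, pi_sample):
    """Horvitz-Thompson estimate of a total: sum of y_i / pi_i over the sample.

    Hypothetical helper for illustration only.
    """
    if len(y_sample) != len(pi_sample):
        raise ValueError("y_sample and pi_sample must have the same length")
    if any(not (0.0 < p <= 1.0) for p in pi_sample):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return sum(y / p for y, p in zip(y_sample, pi_sample))


# Toy population of N = 6 units with equal inclusion probabilities n/N = 3/6.
y = [3.0, 5.0, 2.0, 8.0, 4.0, 6.0]
pi = [0.5] * 6

# An arbitrary sample of n = 3 units (indices into the population).
sample = [0, 3, 5]
estimate = ht_estimate([y[i] for i in sample], [pi[i] for i in sample])
true_total = sum(y)
```

For this particular sample the estimate differs from the true total; unbiasedness is a statement about the average over all possible samples under the design, not about a single sample.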
We are now ready to formalize what a well spread sample is. As suggested in [2], we use Voronoi polytopes to measure how well spread a sample is. The Voronoi polytope $p_i$, for $i \in s$, includes all population units j satisfying $d(i,j) \le d(k,j)$ for all sample units $k \in s$. Let $n_i$ denote the number of units in $p_i$, with the correction that if a unit j is included in $m_j$ polytopes, then j is counted as $1/m_j$ in each of them. Next, let $v_i$ be the sum of the inclusion probabilities in $p_i$. Again, if a unit j is included in $m_j$ polytopes, then its inclusion probability is divided equally, $\pi_j/m_j$ to each of the $m_j$ polytopes. Hence, $\sum_{i \in s} n_i = N$ and $\sum_{i \in s} v_i = \sum_{j \in U} \pi_j = n$. This leads to the following definition of a well spread sample.
Definition 1 A sample is said to be well spread (or spatially balanced) with respect to the inclusion probabilities if each $v_i$ is equal to or close to 1.
As a measure of how well spread a sample is, we may use
$$B = \frac{1}{n} \sum_{i \in s} (v_i - 1)^2, \qquad (1)$$
see e.g. [2]. A small value of B indicates a very well spread sample. The mean of B over repeated samples is an indicator of how well spread the samples produced by a design are. We next define a balanced and an approximately balanced sample.

Definition 2 We say that a sample s is balanced on the auxiliary x-variables if
$$\sum_{i \in s} \frac{x_i}{\pi_i} = \sum_{i \in U} x_i.$$
Moreover, a sample is said to be approximately balanced if this equality holds approximately.
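The Voronoi sums and the spatial balance measure can be computed directly from their definitions. The sketch below is a minimal illustration under two assumptions: the distance is Euclidean, and the balance measure is taken to be the mean of the squared deviations of the Voronoi sums from 1, the form used in [2]. The function names `voronoi_sums` and `balance_measure` are hypothetical.

```python
import math


def voronoi_sums(x, pi, sample):
    """Return {i: v_i} for the sampled units i.

    v_i is the total inclusion probability of the population units whose
    nearest sampled unit is i. A unit that is equally close to several
    sampled units has its probability split equally between them.
    Euclidean distance is an assumption of this sketch.
    """
    v = {i: 0.0 for i in sample}
    for j in range(len(x)):
        dist = {i: math.dist(x[j], x[i]) for i in sample}
        d_min = min(dist.values())
        nearest = [i for i in sample if math.isclose(dist[i], d_min, abs_tol=1e-12)]
        for i in nearest:
            v[i] += pi[j] / len(nearest)
    return v


def balance_measure(v):
    """Mean squared deviation of the Voronoi sums from 1."""
    return sum((v_i - 1.0) ** 2 for v_i in v.values()) / len(v)


# Four units on a line, equal inclusion probabilities, sample size n = 2.
x = [(0.0,), (1.0,), (2.0,), (3.0,)]
pi = [0.5, 0.5, 0.5, 0.5]

b_spread = balance_measure(voronoi_sums(x, pi, [1, 2]))     # spread-out sample
b_clustered = balance_measure(voronoi_sums(x, pi, [0, 1]))  # clustered sample
```

In this toy case the spread-out sample has both Voronoi sums equal to 1, while the clustered sample has sums 0.5 and 1.5 and therefore a larger balance measure.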
In order to show that well spread samples are balanced, we start by making three quite strong assumptions on the sample and the population. Later we will relax these assumptions a bit and then show that a larger class of well spread samples are approximately balanced. We first assume the following.
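(A.0) $v_i = 1$ for all $i \in s$.
(A.1) For each polytope $p_i$, the inclusion probabilities are equal, $\pi_j = \pi_i$ for all $j \in p_i$.
(A.2) In each polytope $p_i$, we have $x_j = x_i$ for all $j \in p_i$.

Note that the inclusion probabilities and the x-values are allowed to vary between polytopes. Under the three assumptions it follows that $n_i \pi_i = v_i = 1$ and hence, for any function f,
$$\sum_{j \in U} f(x_j) = \sum_{i \in s} n_i f(x_i) = \sum_{i \in s} \frac{f(x_i)}{\pi_i}. \qquad (2)$$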
Thus the sample is balanced on any function $f(x)$ and, in particular, it is balanced on the auxiliary variables x. The next step consists of introducing the following three new and less restrictive assumptions.

(A.1') For each polytope $p_i$, the inclusion probabilities satisfy [...] for all $j \in p_i$ and some [...] $\ge 0$.
In each polytope $p_i$, we have [...] for all $j \in p_i$ and some [...] $\ge 0$.
The target is a Lipschitz continuous function of the auxiliary variables, i.e. the difference $|f(x_i) - f(x_j)|$ is bounded by a constant multiple of the distance between $x_i$ and $x_j$, for all $i, j \in U$ and some constant $\ge 0$.
Remark 1 Concerning the validity of assumption (A.1'), the inclusion probabilities are (if unequal) supposed to be derived from the auxiliary x-variables, perhaps chosen proportional to one of the x-variables, so they should not vary much within a polytope. Remember that the polytopes are constructed by grouping together units with similar x-values.
We are now ready to state and prove our main result.

Theorem 1 Let s be a well spread sample satisfying [...] for all i and for some [...]. Assume also that [...]

[...] we obtain exact balance on the target since [...], then we get a balanced sample, see Definition 2. Besides telling us that the sample will be approximately balanced on [...] and [...] when [...] are small, Theorem 1 also gives bounds for the target parameter [...]. We can however do better than the bounds in Theorem 1. For instance, we have [...], but these bounds are constructed by applying a worst case scenario within each polytope, so we cannot expect the bounds to be very good.
Proof of Theorem 1. By assumption [...] and since [...]. The inequalities in (3) give [...]. Moreover, from assumptions [...] and [...]. The proof is complete.

Variance under a General Model
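It is interesting to see how well spread samples perform under a general super-population model. Following [9], but here with a possibly non-linear model, we assume
$$y_i = f(x_i) + \varepsilon_i, \qquad (6)$$
with $E_M(\varepsilon_i) = 0$, $V_M(\varepsilon_i) = \sigma_i^2$ and $\mathrm{cov}_M(\varepsilon_i, \varepsilon_j) = \sigma_i \sigma_j \rho_{ij}$, where $E_M$, $V_M$ and $\mathrm{cov}_M$ are the expectation, variance and covariance under the model. The correlations $\rho_{ij}$ are supposed to be decreasing functions of the distance between the units i and j.

With some routine calculations, the anticipated variance of the HT-estimator under model (6) can be shown to be
$$E_p E_M (\hat{Y} - Y)^2 = E_p \Big( \sum_{i \in s} \frac{f(x_i)}{\pi_i} - \sum_{i \in U} f(x_i) \Big)^2 + \sum_{i \in U} \sum_{j \in U} \sigma_i \sigma_j \rho_{ij} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j}, \qquad (7)$$
where $\pi_{ij}$ is the joint inclusion probability of units i and j (with $\pi_{ii} = \pi_i$) and $E_p$ is the expectation under the design. Now, if we study expression (7), it becomes evident that we want the samples to be as balanced as possible on $f(x)$. We also want to make sure that $\pi_{ij}$ is small whenever $\rho_{ij}$ is large in order to minimize the term
$$\sum_{i \in U} \sum_{j \in U} \sigma_i \sigma_j \rho_{ij} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j}. \qquad (8)$$

Copyright © 2013 SciRes. OJS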
If the samples are selected to be well spread (i.e. small joint inclusion probabilities for nearby units), then both terms in (7) become small. However, if the model standard deviation $\sigma_i$ is known, it is possible to also choose the inclusion probabilities to minimize (8) further. The diagonal term of (8) is dominant, i.e.
$$\sum_{i \in U} \sum_{j \in U} \sigma_i \sigma_j \rho_{ij} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \approx \sum_{i \in U} \sigma_i^2 \frac{1 - \pi_i}{\pi_i}. \qquad (9)$$
With the constraint of fixed sample size, $\sum_{i \in U} \pi_i = n$, and by using a Lagrangian function, it follows that the minimum in $\pi_i$ of the right hand side of (9) is attained for
$$\pi_i = \frac{n \sigma_i}{\sum_{j \in U} \sigma_j}. \qquad (10)$$
As a result, a very efficient sampling design under this general model is to select samples that are well spread in the x-space with inclusion probabilities given by (10). The requirements needed in order for the samples to be approximately balanced on $f(x)$ are then fulfilled. The inclusion probabilities will not vary much within the polytopes since $\sigma_i$ is supposed to be a Lipschitz continuous function of $x_i$. Hence, with this strategy, the anticipated variance of the HT-estimator becomes small.
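The Lagrangian step can be checked numerically. The sketch below assumes that the quantity being minimized is the diagonal term, i.e. the sum over the population of $\sigma_i^2 (1 - \pi_i)/\pi_i$, and that the minimizer under a fixed sample size is proportional to $\sigma_i$. The function names and the toy values of sigma are hypothetical, and the sketch simply raises an error if the proportional choice would give a probability above 1.

```python
def optimal_inclusion_probabilities(sigma, n):
    """Inclusion probabilities proportional to sigma_i, summing to n.

    Assumes n * sigma_i <= sum(sigma) for every unit; otherwise the
    proportional solution is not a valid probability vector.
    """
    total = sum(sigma)
    pi = [n * s / total for s in sigma]
    if max(pi) > 1.0:
        raise ValueError("proportional solution gives a probability above 1")
    return pi


def diagonal_term(sigma, pi):
    """Sum of sigma_i^2 * (1 - pi_i) / pi_i over the population."""
    return sum(s * s * (1.0 - p) / p for s, p in zip(sigma, pi))


# Hypothetical model standard deviations for a population of four units.
sigma = [1.0, 2.0, 3.0, 4.0]
n = 2

pi_opt = optimal_inclusion_probabilities(sigma, n)  # [0.2, 0.4, 0.6, 0.8]
pi_equal = [n / len(sigma)] * len(sigma)            # [0.5, 0.5, 0.5, 0.5]

term_opt = diagonal_term(sigma, pi_opt)
term_equal = diagonal_term(sigma, pi_equal)
```

With these toy values the diagonal term is smaller for the proportional probabilities than for equal probabilities, as the Lagrangian argument predicts.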


It is not possible to balance the sample directly on $f(x)$ since the function f is obviously not known in advance. Probably, the best we can do in practice is to make sure the samples are well spread in x to have a balancing effect on the unknown function $f(x)$, and hence also have small $\pi_{ij}$ when $\rho_{ij}$ may be large.
If we use well spread probability samples together with the HT-estimator, the estimator will be very efficient (i.e. have a small variance) if the population is close to a realization of the model (6).Note also that the approach is purely design based, and the estimator maintains design unbiasedness and design consistency even if the model is false.
Example 2, given in the next section, supports the above statements. In particular, the example compares different sampling methods with respect to variance and spatial balance. It is clear that methods producing well spread samples achieve better balance and hence smaller variance.

Some Methods for Selecting Well Spread Samples
Besides spatial stratification, one of the first more novel designs for selecting a well spread sample is called generalized random tessellation stratified (GRTS) sampling, and was introduced in [2]. The GRTS design uses a specific random mapping to map two (or more) dimensions to one dimension. Basically, the units are re-ordered into a list, and units close in the list tend to also be close in the auxiliary space. Then a systematic πps sample is selected from the list, making sure the sample becomes well spread in the list and hence also in the auxiliary space. A drawback of GRTS is that a lot of information is lost in the mapping, especially if the space has many dimensions (i.e. many auxiliary variables). However, for two dimensions, GRTS produces rather well spread samples.
Another idea is to map several dimensions to one by use of space-filling curves, and one such design was presented and evaluated in [10]. However, we believe that mapping several dimensions to one is not the best way to achieve a well spread sample. Too much information is lost in such a mapping.
A more recent idea to achieve well spread samples is to first define a distance measure in the auxiliary space. To do so, let $x = (x_1, \ldots, x_q)$ be all available auxiliary variables, where $(x_1, \ldots, x_p)$ correspond to the quantitative variables and $(x_{p+1}, \ldots, x_q)$ to the qualitative variables. To measure the distance between units i and j in this q-dimensional space, [11] propose the following definition of distance: [...], where $\tilde{x}_k$ is the standardized version of $x_k$. By standardizing, the auxiliary variables are approximately of equal importance. However, the above distance function is just an example and in a particular situation some other distance function may be more appropriate.

Given the distance measure, the design should create a negative correlation of the inclusion indicators for close units, so that two close units seldom appear in the sample together. Such a design is not necessarily complicated. For instance, the local pivotal method (LPM) introduced in [5] is quite simple. The LPM is based on the pivotal method [4]. The main idea in LPM is to make similar units (i.e. nearby units) compete with each other for inclusion in the sample. The LPM successively updates the prescribed vector of inclusion probabilities to become a vector with zeros and ones, where the ones indicate inclusion in the sample. In one step of LPM, two close units i and j with $0 < \pi_i, \pi_j < 1$ compete. The winner takes as much probability mass as possible from the other unit. Hence, the winner receives the new probability $a = \min(1, \pi_i + \pi_j)$ and the loser gets the new probability $b = \pi_i + \pi_j - a$. Thus, if $\pi_i + \pi_j \ge 1$, then a = 1 and the winning unit will definitely be in the sample. If $\pi_i + \pi_j \le 1$, then the loser will definitely not be in the sample (since b = 0). The competition is decided randomly according to
$$(\pi_i', \pi_j') = \begin{cases} (a, b) & \text{with probability } (a - \pi_j)/(a - b), \\ (b, a) & \text{with probability } (a - \pi_i)/(a - b). \end{cases}$$
Now, replace $(\pi_i, \pi_j)$ with $(\pi_i', \pi_j')$. The final outcome is decided for at least one unit in each update, and thus the procedure has at most N steps. In each update, unit i is chosen randomly (with equal probabilities among the units with $0 < \pi_i < 1$) and then its nearest neighbor (among the units with $0 < \pi_j < 1$) is chosen.
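The LPM steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the reference implementation of [5]. It assumes Euclidean distance, resolves nearest-neighbor ties by taking the first candidate, and uses the standard pivotal update in which the winner receives min(1, pi_i + pi_j), the loser receives the remainder, and the winning probabilities are chosen so that each unit keeps its prescribed inclusion probability in expectation. The function name `local_pivotal_method` and the tolerance `eps` are hypothetical.

```python
import math
import random


def local_pivotal_method(x, pi, rng=None, eps=1e-9):
    """Select a sample with a simple local pivotal method sketch.

    x   : list of coordinate tuples, one per population unit
    pi  : prescribed inclusion probabilities
    rng : optional random.Random instance for reproducibility
    Returns the sorted list of selected unit indices.
    """
    rng = rng or random.Random()
    p = list(pi)
    # Units whose outcome is still undecided (probability strictly in (0, 1)).
    active = [k for k, pk in enumerate(p) if eps < pk < 1.0 - eps]
    while len(active) > 1:
        i = rng.choice(active)
        # Nearest undecided neighbor of i (ties broken by list order).
        j = min((k for k in active if k != i),
                key=lambda k: math.dist(x[i], x[k]))
        total = p[i] + p[j]
        a = min(1.0, total)  # new probability of the winner
        b = total - a        # new probability of the loser
        # Unit i wins with probability (a - p_j) / (a - b).
        if rng.random() < (a - p[j]) / (a - b):
            p[i], p[j] = a, b
        else:
            p[i], p[j] = b, a
        active = [k for k in active if eps < p[k] < 1.0 - eps]
    if active:
        # Only possible when the probabilities do not sum to an integer.
        k = active[0]
        p[k] = 1.0 if rng.random() < p[k] else 0.0
    return sorted(k for k, pk in enumerate(p) if pk > 1.0 - eps)


# 16 units on a 4 x 4 grid, equal inclusion probabilities, sample size 4.
x = [(float(r), float(c)) for r in range(4) for c in range(4)]
pi = [0.25] * 16
sample = local_pivotal_method(x, pi, rng=random.Random(2013))
```

Each update decides the outcome of at least one unit, so the loop ends after at most N iterations. Since the total probability mass is preserved in every update, the sample size equals the sum of the prescribed probabilities whenever that sum is an integer.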
Another method, spatially correlated Poisson sampling (SCPS), was first described in [6] and is a special case of the method introduced in [7]. The SCPS algorithm is a bit more complicated than LPM, but is based on the same idea. Weights are used to create a negative correlation between the inclusion indicators of nearby units, forcing the sample to be well spread. For more on the methods discussed above for selecting well spread samples, we refer the reader to the previously mentioned papers.
The two designs, LPM and SCPS, were used in [11] to obtain well spread probability samples. The fact that LPM and SCPS produce well spread samples has been justified by both theoretical results and simulation results in the previously mentioned papers. Variance estimators for the HT-estimator under well spread samples were suggested in the papers [11,12]. To our knowledge, LPM and SCPS are the designs that in general produce the lowest mean value of the balance measure (1) in a general auxiliary space with prescribed inclusion probabilities.

When it comes to efficiency of the HT-estimator for well spread samples, we can also make heuristic arguments that such samples produce a low variance of the HT-estimator. When the sample size is fixed, the variance of $\hat{X} = \sum_{i \in s} x_i/\pi_i$ can be written as
$$V(\hat{X}) = -\frac{1}{2} \sum_{i \in U} \sum_{j \in U} (\pi_{ij} - \pi_i \pi_j) \Big( \frac{x_i}{\pi_i} - \frac{x_j}{\pi_j} \Big)^2. \qquad (11)$$
A property that e.g. the LPM and SCPS designs have is that $\pi_{ij}$ is small (minimum or close to minimum) when $d(i,j)$ is small and is large (close to [...]) when $d(i,j)$ is large. [...] As a result the variance (11) becomes small.

For well spread samples, the balancing property can only be shown to hold exactly in very specific situations, i.e. under assumptions (A.0)-(A.2), see (2). For a categorical auxiliary variable, the sample will be balanced if the design produces stratification with fixed sample size for each category. A simple example follows.
Example 1. Let U be a population of males $U_m$ and females $U_f$. Let x be the only auxiliary variable and let $x_i = 0$ if male and $x_i = 1$ if female. Also, let $\pi_i = n_m/N_m$ for $i \in U_m$ and $\pi_i = n_f/N_f$ for $i \in U_f$ be the inclusion probabilities, where $n_m$ and $n_f$ are integers. In this special case, we have that e.g. the LPM and SCPS automatically produce stratification with fixed sample sizes. Hence we have
$$\sum_{i \in s} \frac{x_i}{\pi_i} = \sum_{i \in s_m} 0 \cdot \frac{N_m}{n_m} + \sum_{i \in s_f} \frac{N_f}{n_f} = N_f = \sum_{i \in U} x_i,$$
where $s_m$ and $s_f$ are the sampled males and females respectively.

Example 2. We compare the different sampling methods LPM, SCPS, GRTS and simple random sampling (SRS) using a model satisfying (6). In particular, the population is generated from [...]. The population size is N = 200 and the x-values are generated from a uniform distribution on the unit square. Using Euclidean distance, the covariance function for $\varepsilon$ is defined as [...], which is a simple [...] fields [13]. The $\varepsilon$-values are generated in two steps. First random independent and identically distributed data [...]

Final Comments
It has been shown that in general there is a significant balancing effect for well spread samples. Usually, well spread samples are not as balanced on the auxiliary x-variables as samples selected by the cube method, but nearly so if the sample size is not too small. However, for target variables that are non-linear in x, well spread samples are likely to be more balanced on the target variables than samples selected by the cube method. In that way, well spread samples are good for more general situations. Hopefully, the fact that a significant balancing effect has been shown will increase the interest in using well spread probability samples when auxiliary x-variables are available.
It is also possible to combine the cube method with an idea similar to that used in the LPM, to obtain a local cube method. Then samples that are both well spread (spatially balanced) and balanced on the auxiliary variables can be selected. Such a method was developed in [9].
In [1,14], properties of spatial total estimators are studied under a tessellation stratified design in a continuous universe. With assumptions on the target function similar to those used in this paper, they show that the convergence rate of the variance of the total estimator is [...] for such a design. Even though our setting is different and does not imply a strict stratification, this indicates that spreading the sample locations well probably gives a small variance when there are spatial trends.
In the setting of Voronoi polytopes used in this paper, we may consider the nearest neighbor estimator (NN-estimator) in place of the HT-estimator. The NN-estimator of Y is, if $n_i$ is the number of units in polytope $p_i$,
$$\hat{Y}_{NN} = \sum_{i \in s} n_i y_i.$$
If $n_i = 1/\pi_i$ for all $i \in s$, the NN-estimator is equal to the HT-estimator. This implies that the NN-estimator will be approximately design unbiased for well spread samples. Moreover, the NN-estimator can probably adjust for some minor spatial imbalance in the sample by using the realized polytope sizes $n_i$ instead of $1/\pi_i$, which can be viewed as the estimated polytope sizes. The possible benefit of using the NN-estimator in place of the HT-estimator will be investigated in a future paper.
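The relation between the two estimators can be illustrated with a short Python sketch. It assumes that the NN-estimator weights each sampled y-value by the realized size of its Voronoi polytope, with Euclidean distance and ties split equally. The names `polytope_sizes` and `nn_estimate` and the toy data are hypothetical.

```python
import math


def polytope_sizes(x, sample):
    """Return {i: n_i}, the number of population units nearest to each
    sampled unit i. A unit tied between several sampled units is split
    equally between them. Euclidean distance is an assumption."""
    sizes = {i: 0.0 for i in sample}
    for j in range(len(x)):
        dist = {i: math.dist(x[j], x[i]) for i in sample}
        d_min = min(dist.values())
        nearest = [i for i in sample if math.isclose(dist[i], d_min, abs_tol=1e-12)]
        for i in nearest:
            sizes[i] += 1.0 / len(nearest)
    return sizes


def nn_estimate(x, y, sample):
    """Nearest neighbor estimate of the total: sum of n_i * y_i over the sample."""
    sizes = polytope_sizes(x, sample)
    return sum(sizes[i] * y[i] for i in sample)


# Four units on a line with equal inclusion probabilities 0.5 (n = 2).
x = [(0.0,), (1.0,), (2.0,), (3.0,)]
y = [1.0, 2.0, 3.0, 4.0]
pi = [0.5] * 4

well_spread = [1, 2]
nn_spread = nn_estimate(x, y, well_spread)
ht_spread = sum(y[i] / pi[i] for i in well_spread)

clustered = [0, 1]
nn_clustered = nn_estimate(x, y, clustered)
ht_clustered = sum(y[i] / pi[i] for i in clustered)
```

For the well spread sample every polytope contains exactly 1/pi_i units, so the two estimates coincide. For the clustered sample the realized polytope sizes differ from 1/pi_i and the estimates differ.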

Table 1. Results for Example 2. Empirical variance V of [...]