A Note on the Precision of Stratified Systematic Sampling

Conflicting views have greeted the use of systematic sampling for sample selection and estimation in stratified sampling, in terms of the precision of the population mean based on the inherent characteristics of the population. These conflicting views were analyzed using Cochran's data (1977, p. 211) [1]. When the population units are ordered, the variance of systematic sampling over all possible systematic samples provides equal, non-negative and most precise estimates for all the variance functions considered, i.e. $V_1(\bar{y}_{sy}) = V_2(\bar{y}_{sy}) = V_3(\bar{y}_{sy})$, unlike when a single systematic sample is used or when the variance of simple random sampling is used to estimate selected systematic samples.


Introduction
Cochran (1977) [1] describes systematic sampling thus: suppose that the N units in the population are numbered 1 to N in some order. To select a sample of n units, we take a unit at random from the first k units and every kth unit thereafter. The selection of the first unit determines the whole sample. This is called an every kth systematic sample.
Murthy (1967) [2] states that systematic sampling is operationally more convenient and saves time while ensuring equal probability of inclusion of each unit in the sample. He describes the technique of systematic sampling as selecting every kth unit starting with the unit corresponding to a number r chosen at random from 1 to k, where k is taken as the integer nearest to N/n. The random number r chosen from 1 to k is known as the random start and the constant k is termed the sampling interval.
A sample selected by this procedure is termed a systematic sample with a random start r. Therefore, the value of r determines the whole sample. In other words, this procedure amounts to selecting with equal probability one of the k possible groups of units (samples) into which the population can be divided in a systematic manner.
The same view was expressed by Raj and Chandhok (1998) [3]. They described systematic sampling as a more convenient method of sample selection when the units are serially numbered from 1 to N, with the assumption that N = nk, where n is the desired sample size and k is an integer. A number is taken at random from the numbers 1 to k (using a table of random numbers or a random number generator). Suppose the random number is i; then the sample contains the n units with serial numbers $i, i + k, i + 2k, \ldots, i + (n-1)k$. Thus, the sample consists of the first unit selected at random and every kth unit thereafter. It is therefore called a 1-in-k systematic sample.
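The selection rule just described can be sketched in a few lines of Python. This is only an illustration under the stated assumption N = nk; the function name `linear_systematic_sample` and the `rng` parameter are hypothetical and do not come from any of the cited texts.

```python
import random

def linear_systematic_sample(N, n, rng=random):
    """Return the 1-based serial numbers of a 1-in-k systematic sample.

    Illustrative sketch: assumes N = n * k for an integer k, as in the text.
    """
    if N % n != 0:
        raise ValueError("this sketch assumes N = n * k")
    k = N // n                    # sampling interval
    i = rng.randint(1, k)         # random start, 1..k inclusive
    # Units i, i + k, i + 2k, ..., i + (n - 1)k
    return [i + j * k for j in range(n)]

# Example: N = 40 units, n = 4, so k = 10.
print(linear_systematic_sample(40, 4, rng=random.Random(1)))
```

Passing a seeded `random.Random` instance makes the draw reproducible; the random start alone fixes the whole sample.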
Early studies on the development of the theory of systematic sampling are reported by Murthy (1967, p. 134) [2], while Cochran (1977) [1] reported that Madow (1953) [4] had carried systematic sampling to its logical conclusion with his recommendation that a systematic sample be chosen at or near the center of the interval; i.e., instead of starting the sequence with a random number chosen between 1 and k, we take the starting number as $(k+1)/2$ if k is odd and either $k/2$ or $(k+2)/2$ if k is even.
Reference [5] investigated the efficiency of single and multiple random start systematic sampling in populations exhibiting different characteristics and reported that when the population was in random order $V(\bar{y}_{S}) = V(\bar{y}_{M})$, with a different relation between $V(\bar{y}_{S})$ and $V(\bar{y}_{M})$ for a population with linear trend, while in a periodic population equality results when $M = S$. He, however, concluded that, with an exponential correlogram, a single random start $(S)$ was more precise than multiple random starts $(M)$.
Murthy (1967) [2], Cochran (1977) [1], Raj and Chandhok (1998) [3] and Okafor (2002) [6] have all mentioned that systematic sampling can also be viewed in relation to cluster sampling. They explained that in a population with N = nk, the population can be divided into k large sampling units, each containing n of the original units. The operation of choosing a randomly located systematic sample is just the operation of choosing one of these large sampling units at random. Thus, systematic sampling amounts to selecting a simple random sample of one cluster unit from a population of k cluster units, each with probability $1/k$.
Thus, for a population of units $Y_1, Y_2, \ldots, Y_N$ divided into k possible clusters, the k possible samples with their means are as shown in Table 1 below.
Considering all the k possible samples, the expected value of the sample mean $\bar{y}_{sy}$ is obtained thus:
$$E(\bar{y}_{sy}) = \frac{1}{k}\sum_{i=1}^{k}\bar{y}_{i} = \frac{1}{nk}\sum_{i=1}^{k}\sum_{j=1}^{n} y_{ij} = \bar{Y} \qquad (1)$$
showing that when N = nk, $\bar{y}_{sy}$ is unbiased for $\bar{Y}$. It should also be noted that systematic sampling has no repetition of sampling units and is therefore related to simple random sampling without replacement (SRSWOR). The above is the systematic sampling applicable when N = nk. In practice, it is common to encounter situations in which $N \neq nk$, and various suggestions have been made on how to handle such situations.
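The unbiasedness of the systematic sample mean when N = nk can be checked numerically by listing all k possible samples. The sketch below uses made-up values for the population; `all_systematic_samples` is a hypothetical helper name.

```python
from statistics import mean

def all_systematic_samples(y, n):
    """Split y (length N = n * k) into its k possible systematic samples."""
    N = len(y)
    if N % n != 0:
        raise ValueError("this sketch assumes N = n * k")
    k = N // n
    # The sample with 0-based start i holds units i, i + k, i + 2k, ...
    return [y[i::k] for i in range(k)]

# Made-up ordered population with N = 12, n = 3, k = 4.
y = [3, 7, 8, 12, 15, 16, 20, 25, 27, 31, 36, 40]
samples = all_systematic_samples(y, n=3)
sample_means = [mean(s) for s in samples]

# Each sample is drawn with probability 1/k, so the expectation of the
# sample mean is the plain average of the k sample means.
print(mean(sample_means), mean(y))
```

The two printed values agree, which is the statement that the sample mean is unbiased for the population mean.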

Approaches When N ≠ nk

Circular Systematic Sampling (CSS)
Lahiri (1952) [7] suggests Circular Systematic Sampling (CSS), which consists of taking a random number from 1 to N and selecting the unit corresponding to this random start and every kth unit thereafter in a cyclical manner until a sample of n units is obtained, k being the nearest integer to N/n. That is, if r is a random number selected from 1 to N, the sample consists of the units corresponding to the numbers
$$r + jk \ \text{ if } r + jk \le N, \qquad r + jk - N \ \text{ if } r + jk > N, \qquad j = 0, 1, \ldots, (n-1).$$
It follows, therefore, that the usual procedure of selecting a random start r from 1 to k and including in the sample the units corresponding to $r + jk \le N$ for $j = 0, 1, \ldots, (n-1)$, as reflected above, may be termed Linear Systematic Sampling (LSS).
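A minimal sketch of the circular selection rule follows. Two details are assumptions rather than part of the source: halves are rounded up when taking the nearest integer to N/n, and the wrap-around is written with a modulo operation. The function name is hypothetical.

```python
import random

def circular_systematic_sample(N, n, rng=random):
    """Return 1-based unit numbers of a circular systematic sample."""
    # Nearest integer to N/n using integer arithmetic; halves round up
    # (an assumption, the text only says "nearest integer").
    k = max(1, (2 * N + n) // (2 * n))
    r = rng.randint(1, N)              # random start anywhere in 1..N
    # r + j*k, wrapping past N back to the start of the list.
    return [(r + j * k - 1) % N + 1 for j in range(n)]

# Example: N = 11, n = 4 gives k = 3.
print(circular_systematic_sample(11, 4, rng=random.Random(2)))
```

Because the start is drawn from 1 to N rather than 1 to k, the sample always contains exactly n unit numbers, whatever the remainder of N divided by n.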

Murthy's Approach
Murthy (1967) [2] suggested that when $N \neq nk$, i.e., when the N population units cannot be divided into k clusters of equal size, the interval k be chosen as the nearest integer to N/n, resulting in a sample size n′ which may not necessarily equal n, the required sample size. He stated further that if $N \neq nk$, and q and r′ are respectively the quotient and remainder obtained on dividing N by n, then N can be written as $N = nq + r'$, and the sampling interval k, being the integer nearest to N/n, is taken as either q or q + 1 according to the size of r′ relative to n/2. The number of units n′ that can be expected in the sample is then either $[N/k]$ or $[N/k] + 1$, where $[\cdot]$ denotes the integer part. This approach is suitable in situations in which the sample size n is not fixed or predetermined and the sampler is free to adjust the sample size. Murthy's approach to handling $N \neq nk$ is therefore not suitable for a fixed sample size, or when stratum sample sizes are determined using the standard procedures for allocating samples to the strata.
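The variable sample size under this approach is easy to see by counting, for each random start, how many units fall within 1 to N. The sketch below assumes halves are rounded up when taking the nearest integer to N/n; `lss_sample_sizes` is a hypothetical name.

```python
def lss_sample_sizes(N, n):
    """Return (k, {start: realised sample size}) for linear systematic
    sampling with k taken as the nearest integer to N / n."""
    # Halves round up here; the rounding rule at exact halves is an assumption.
    k = max(1, (2 * N + n) // (2 * n))
    sizes = {}
    for r in range(1, k + 1):
        # Units r, r + k, r + 2k, ... not exceeding N
        sizes[r] = len(range(r, N + 1, k))
    return k, sizes

# Example: N = 23, n = 5 gives k = 5.
k, sizes = lss_sample_sizes(23, 5)
print(k, sizes)
```

With N = 23 and a target of n = 5, starts 1 to 3 give five units while starts 4 and 5 give only four, which is why the approach does not suit a fixed sample size.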

Fractional Interval Approach
Another approach when $N \neq nk$ is the use of a fractional interval, reported by Murthy (1967) [2]. This approach calls for taking $k = N/n$ without rounding it off to the nearest integer; i.e., the ith unit is selected in the sample if $i - 1 < r + jk \le i$ for any $j = 0, 1, \ldots, (n-1)$. It is equivalent to associating different numbers with each unit, such that the first unit gets the numbers 1 to n, the second gets the numbers $n + 1$ to $2n$, and so on, and then selecting the units corresponding to an LSS sample of n numbers selected from 1 to Nn with N as the sampling interval. This approach involves a long process of iteration to satisfy the inequality; hence it is time-consuming.
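The equivalent integer formulation (numbers 1 to Nn, interval N) can be sketched directly, which avoids working with the fractional interval itself. The function name and the explicit start argument `R` are hypothetical; in practice `R` would be drawn at random from 1 to N.

```python
def fractional_interval_sample(N, n, R):
    """Units selected by the fractional-interval method for start R in 1..N.

    Unit i owns the numbers (i - 1) * n + 1 .. i * n; an LSS sample of the
    numbers R, R + N, ..., R + (n - 1) * N is mapped back to units.
    """
    if not 1 <= R <= N:
        raise ValueError("R must lie between 1 and N")
    # ceil(m / n) in integer arithmetic gives the unit owning number m.
    return [(R + j * N + n - 1) // n for j in range(n)]

# Example: N = 23, n = 5.
print(fractional_interval_sample(23, 5, R=7))
```

Over the N possible starts every unit is selected exactly n times, so each unit has inclusion probability n/N and the sample size is always n.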

New Partially Systematic Sampling (NPSS)
Leu and Tsui (1996) [8] developed the New Partially Systematic Sampling (NPSS) in order to derive an unbiased estimator of the variance of systematic sampling, $V(\bar{y}_{sy})$. The population size N need not be a multiple of the sample size n; therefore, it is a suitable procedure when $N \neq nk$. The procedure entails selecting an SRS of size a and the remaining sample of size $(n - a)$ systematically; these samples are combined to derive an unbiased estimate of $V(\bar{y}_{sy})$. Thus, NPSS combines SRS with a systematic sample to obtain its estimates, thereby deviating from the objective of this study, as we intend to observe performance when systematic sampling is employed as the scheme of choice within strata and not when SRS is combined with systematic samples.

Remainder Linear Systematic Sampling (RLSS)
Also reviewed in this section is Remainder Linear Systematic Sampling (RLSS), due to Chang and Huang (2000) [9]. This procedure is a modification of LSS. It was developed for situations in which $N \neq nk$, and depends only on the remainder. It involves dividing the population into two strata; the sampling interval k is taken as the integer part of N/n, so that $N = nk + r$, where r is the number of remaining population units, $0 \le r < n$, and N, n, k and r are integers. When the remainder is zero, $r = 0$, the procedure reduces to LSS. The procedure for RLSS is:
a) Divide the population units into two strata, with the first stratum containing the front $(n - r)k$ units and the second stratum housing the remaining $r(k + 1)$ units.
b) From stratum I, select a random start $k_1$ from the interval 1 to k and every kth unit thereafter, from the $(n - r)$ groups of units forming stratum I. The samples from stratum I, contained in a sample space $S'$, are the units numbered $k_1, k_1 + k, \ldots, k_1 + (n - r - 1)k$.
c) From stratum II, select a random start $k_2$ from the interval 1 to $(k + 1)$ and every $(k + 1)$th unit thereafter, from the r groups forming the second stratum. The samples from stratum II, contained in the sample space $S''$, are the units numbered $(n - r)k + k_2, (n - r)k + k_2 + (k + 1), \ldots, (n - r)k + k_2 + (r - 1)(k + 1)$.
The sample of size n is the combination of the $S'$ and $S''$ units.
Therefore, in stratified systematic sampling when $N_h \neq n_h k_h$, the competing methods are CSS, NPSS, and RLSS. Due to its greater efficiency over CSS and NPSS as reported by Chang and Huang (2000) [9], RLSS was used by Kareem et al. (2015) [10] in strata where $N_h \neq n_h k_h$. The mean and variance of RLSS are as given below (see relations 2.2 and 2.3, p. 251 of Chang and Huang (2000) [9]).
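The RLSS selection steps can be sketched as follows. This covers only the drawing of unit numbers, not the estimators of Chang and Huang; the function name `rlss_sample` and the `rng` parameter are hypothetical, and the sketch assumes n ≤ N.

```python
import random

def rlss_sample(N, n, rng=random):
    """Return 1-based unit numbers of a remainder linear systematic sample."""
    k, r = divmod(N, n)                 # N = n * k + r with 0 <= r < n
    if k < 1:
        raise ValueError("this sketch assumes n <= N")
    # Stratum I: the first (n - r) * k units, sampled with interval k.
    k1 = rng.randint(1, k)
    first = [k1 + j * k for j in range(n - r)]
    if r == 0:
        return first                    # reduces to ordinary LSS
    # Stratum II: the remaining r * (k + 1) units, sampled with interval k + 1.
    offset = (n - r) * k
    k2 = rng.randint(1, k + 1)
    second = [offset + k2 + j * (k + 1) for j in range(r)]
    return first + second

# Example: N = 23, n = 5 gives k = 4, r = 3 (strata of 8 and 15 units).
print(rlss_sample(23, 5, rng=random.Random(3)))
```

The two strata sizes add up to N, and the sample always contains n − r units from the first stratum and r from the second, so the sample size is fixed at n.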

Estimation Procedures in Systematic Sampling
Estimation of the population mean of a systematic sample over all possible samples is as given by relation (1). For the variance of the population mean, Murthy (1967) [2], while assuming N = nk for a sample of size n and sampling interval k, states that there are k possible samples; let $\bar{y}_i$ be the sample mean of the ith possible sample $(i = 1, 2, \ldots, k)$. The sampling variance of the systematic sample mean is given as:
$$V(\bar{y}_{sy}) = \frac{1}{k}\sum_{i=1}^{k}(\bar{y}_i - \bar{Y})^2,$$
where $\bar{y}_i = \frac{1}{n}\sum_{j=1}^{n} y_{ij}$, $\sum_{j=1}^{n} y_{ij}$ is the sum of the systematic sample in the ith group, and $y_{ij}$ is the jth variate of the ith systematic sample.
Note that $\sigma^2 = \sigma_b^2 + \sigma_w^2$, the sum of the between-sample and within-sample variances. Other expressions for the estimation of the variance of the mean of systematic samples by various authors are reported by Murthy (1967, Section 5.8, pp. 153-155) [2] and Cochran (1977, pp. 213-226) [1]. Cochran, however, remarked that there is no dearth of formulae for the estimated variance, but all appear to have limited applicability.
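The variance over all possible samples and the split of the population variance into between-sample and within-sample parts can be verified numerically. The sketch below uses made-up data and divisor N for all three quantities; the function name is hypothetical.

```python
def variance_decomposition(y, n):
    """Return (sigma2, sigma2_b, sigma2_w) for a population y with N = n * k.

    sigma2_b is the variance of the k systematic sample means, i.e. the
    sampling variance of the systematic sample mean.
    """
    N = len(y)
    if N % n != 0:
        raise ValueError("this sketch assumes N = n * k")
    k = N // n
    Ybar = sum(y) / N
    samples = [y[i::k] for i in range(k)]
    means = [sum(s) / n for s in samples]
    sigma2 = sum((v - Ybar) ** 2 for v in y) / N
    sigma2_b = sum((m - Ybar) ** 2 for m in means) / k
    sigma2_w = sum((v - m) ** 2
                   for s, m in zip(samples, means) for v in s) / N
    return sigma2, sigma2_b, sigma2_w

y = [3, 7, 8, 12, 15, 16, 20, 25, 27, 31, 36, 40]   # made-up values
total, between, within = variance_decomposition(y, n=3)
print(total, between + within)
```

A large within-sample component leaves little for the between-sample component, which is why heterogeneous systematic samples give a precise mean.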
On the efficiency of systematic sampling in relation to other sampling schemes, the literature agrees that the efficiency of systematic sampling is strongly anchored on the arrangement of the population units. Cochran (1977) [1] stated that it greatly depends on the properties of the population. For some populations, systematic sampling is extremely precise, and for others, SRS is more precise than systematic sampling, even when the sample size n is increased. Thus, it is difficult to give general advice about the situations in which systematic sampling is to be recommended. However, knowledge of the population structure is necessary for its most effective use.
The same view was expressed by Murthy (1967) [2]: a good arrangement of the population units may yield a better estimate while a bad arrangement may lead to an inefficient estimate. He therefore warned that one has to be careful with the use of systematic sampling and should first ensure that the existing arrangement does not lead to inefficient estimates before using it. One way suggested is to ensure that the units are arranged in either increasing or decreasing order, and this directly suits our investigation in this study, since the application of methods of strata construction requires that the population units be arranged in order of magnitude to avoid overlapping of units. Cochran (1977) [1] stated that several formulae had been developed for $V(\bar{y}_{sy})$. Three such formulae given by Cochran, under the assumption that N = nk, which can be applied to any kind of cluster sampling in which the clusters contain n elements and the sample consists of one cluster, are stated below.
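1) The first variance of the mean of a systematic sample given by Cochran is:
$$V_1(\bar{y}_{sy}) = \frac{N-1}{N}S^2 - \frac{k(n-1)}{N}S^2_{wsy} \qquad (5)$$
where
$$S^2 = \frac{1}{N-1}\sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar{Y})^2, \qquad S^2_{wsy} = \frac{1}{k(n-1)}\sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{i.})^2.$$
This can further be expressed as
$$S^2_{wsy} = \frac{1}{k}\sum_{i=1}^{k}\left[\frac{1}{n-1}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{i.})^2\right] \qquad (6)$$
which is the weighted variance over all possible systematic samples generated by random start $(i = 1, 2, \ldots, k)$.
2) The second one is given as
$$V_2(\bar{y}_{sy}) = \frac{S^2}{n}\left(\frac{N-1}{N}\right)\left[1 + (n-1)\rho_w\right]$$
where $\rho_w$ is the correlation coefficient between pairs of units that are in the same systematic sample (other references refer to it as the intra-class correlation coefficient and denote it by $\rho_c$):
$$\rho_w = \frac{E(y_{ij} - \bar{Y})(y_{iu} - \bar{Y})}{E(y_{ij} - \bar{Y})^2} = \frac{2}{(n-1)(N-1)S^2}\sum_{i=1}^{k}\sum_{j<u}(y_{ij} - \bar{Y})(y_{iu} - \bar{Y})$$
where the numerator is averaged over all $kn(n-1)/2$ distinct pairs. The two expressions of $V(\bar{y}_{sy})$ above are expressed in terms of $S^2$; hence they relate to the variance of SRS.
3) The third is expressed in terms of the variance of a stratified random sample in which the strata are composed of the first k units, the second k units and so on.
The subscript j in $y_{ij}$ denotes the stratum and the stratum mean is written as $\bar{y}_{.j}$.
$$V_3(\bar{y}_{sy}) = \frac{S^2_{wst}}{n}\left(\frac{N-n}{N}\right)\left[1 + (n-1)\rho_{wst}\right] \qquad (9)$$
where
$$S^2_{wst} = \frac{1}{n(k-1)}\sum_{j=1}^{n}\sum_{i=1}^{k}(y_{ij} - \bar{y}_{.j})^2.$$
This is the variance among units that lie in the same stratum. The divisor $n(k-1)$ is used because each of the n strata contributes $(k-1)$ degrees of freedom. Further,
$$\rho_{wst} = \frac{E(y_{ij} - \bar{y}_{.j})(y_{iu} - \bar{y}_{.u})}{E(y_{ij} - \bar{y}_{.j})^2} = \frac{2}{n(n-1)(k-1)S^2_{wst}}\sum_{i=1}^{k}\sum_{j<u}(y_{ij} - \bar{y}_{.j})(y_{iu} - \bar{y}_{.u}).$$
This quantity is the correlation between the deviations from the stratum means of pairs of units that are in the same systematic sample.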
It follows from relation (9) above that a systematic sample has the same precision as a stratified random sample with one unit per stratum if $\rho_{wst} = 0$, in which case relation (9) reduces to
$$V(\bar{y}_{sy}) = \frac{S^2_{wst}}{n}\left(\frac{N-n}{N}\right),$$
where $S^2_{wst}$ is the variance among units that lie in the same stratum. Thus, we have examined systematic sampling in terms of procedure and estimation process. Our concern, however, is taking a systematic sample of fixed size n from each stratum for estimation purposes.
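Since the three formulae are algebraic rearrangements of the same quantity when N = nk, they can be compared numerically against the direct variance of the k sample means. The sketch below uses the standard textbook forms of Cochran's three expressions with made-up data; all function and variable names are hypothetical, and it assumes n ≥ 2, k ≥ 2 and non-constant data so that no divisor is zero.

```python
from itertools import combinations

def cochran_variances(y, n):
    """Return (direct, V1, V2, V3) for a population list y with N = n * k."""
    N = len(y)
    if N % n != 0 or n < 2 or N // n < 2:
        raise ValueError("this sketch assumes N = n * k with n, k >= 2")
    k = N // n
    Ybar = sum(y) / N
    # rows[i][j] = y_ij, the j-th unit of the i-th systematic sample.
    rows = [y[i::k] for i in range(k)]
    row_means = [sum(r) / n for r in rows]
    # Stratum j holds units j*k .. j*k + k - 1, i.e. column j of rows.
    col_means = [sum(r[j] for r in rows) / k for j in range(n)]
    pairs = list(combinations(range(n), 2))      # all j < u

    direct = sum((m - Ybar) ** 2 for m in row_means) / k

    S2 = sum((v - Ybar) ** 2 for v in y) / (N - 1)
    S2_wsy = sum((v - m) ** 2
                 for r, m in zip(rows, row_means) for v in r) / (k * (n - 1))
    V1 = (N - 1) / N * S2 - k * (n - 1) / N * S2_wsy

    cross = sum((r[j] - Ybar) * (r[u] - Ybar) for r in rows for j, u in pairs)
    rho_w = 2 * cross / ((n - 1) * (N - 1) * S2)
    V2 = S2 / n * (N - 1) / N * (1 + (n - 1) * rho_w)

    S2_wst = sum((r[j] - col_means[j]) ** 2
                 for r in rows for j in range(n)) / (n * (k - 1))
    cross_st = sum((r[j] - col_means[j]) * (r[u] - col_means[u])
                   for r in rows for j, u in pairs)
    rho_wst = 2 * cross_st / (n * (n - 1) * (k - 1) * S2_wst)
    V3 = S2_wst / n * (N - n) / N * (1 + (n - 1) * rho_wst)
    return direct, V1, V2, V3

y = [3, 7, 8, 12, 15, 16, 20, 25, 27, 31, 36, 40]   # made-up values
print(cochran_variances(y, n=3))
```

The four returned values agree up to floating-point rounding, which is the equality of the three variance functions over all possible systematic samples.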

Estimation in Stratified Systematic Sampling
Much has been said in Section 2 on the significance of the arrangement of the population units for the precision of $V(\bar{y}_{sy})$, while Cochran (1977, p. 208) [1] has given a corollary that the mean of a systematic sample will be more precise than that of SRS if and only if $S^2_{wsy} > S^2$, where $S^2_{wsy}$ is the weighted variance of all possible systematic samples as defined by relation (6) above and $S^2$ is the population variance.
Notations
Cochran (1977, p. 91) [1] has stated that the expressions for the mean and variance of stratified sampling apply generally to all classes of stratified sampling and are not restricted to stratified random sampling. Therefore, all notations in Cochran (1977, p. 90) [1] are also valid for stratified systematic sampling.
The subscript h denotes the stratum and i the unit within the stratum.The subscript "sy" in this section denotes systematic sample.
$\bar{y}_{syh}$ is the mean of the systematic sample in stratum h, equivalent to relation (1).
$\bar{y}_{stsy} = \sum_{h} W_h \bar{y}_{syh}$, with $W_h = N_h/N$, is the population mean of the stratified systematic sample.
$$V(\bar{y}_{syh}) = \frac{N_h - 1}{N_h}S_h^2 - \frac{k_h(n_h - 1)}{N_h}S^2_{wsyh} \qquad (14)$$
is the variance of the systematic sample in stratum h when $N_h = n_h k_h$. That is, $V_1(\bar{y}_{sy})$, the variance of systematic samples given by Cochran in relation (5) above when $N_h = n_h k_h$, is adopted for our sample estimation and modified for the stratified systematic samples as shown in relation (14) above. However, it should be noted that each of $V_1(\bar{y}_{sy})$, $V_2(\bar{y}_{sy})$ and $V_3(\bar{y}_{sy})$ would yield the same estimates when $N_h = n_h k_h$.
$V(\bar{y}_{stsy}) = \sum_{h} W_h^2 V(\bar{y}_{syh})$ is the variance of the population mean of stratified systematic samples.
$\mathrm{MSE}(\bar{y}_{stsy})$ is the MSE of the population mean of stratified systematic samples.
The mean and the variance of RLSS are given below (see relations 2.2 and 2.3, p. 251 of Chang and Huang (2000) [9]).
To suit our applications, expressions (17) and (18) are modified as follows: ) ( ) It should be noted that expressions ( ) ) ( ) V y .
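For strata in which the stratum size is an exact multiple of the stratum sample size, the variance of the stratified systematic mean can be sketched by combining the per-stratum variances over all possible samples. The sketch assumes the usual stratified estimator with weights W_h = N_h/N and independent selection across strata; these assumptions, the made-up data and the function names are illustrative only and are not taken from Chang and Huang's expressions.

```python
def systematic_variance(y, n):
    """Variance of the systematic sample mean over all k possible samples."""
    N = len(y)
    if N % n != 0:
        raise ValueError("this sketch assumes N_h = n_h * k_h")
    k = N // n
    Ybar = sum(y) / N
    means = [sum(y[i::k]) / n for i in range(k)]
    return sum((m - Ybar) ** 2 for m in means) / k

def stratified_systematic_variance(strata, sizes):
    """Sum of W_h^2 * V_h over strata, assuming independent draws per stratum."""
    N = sum(len(s) for s in strata)
    return sum((len(s) / N) ** 2 * systematic_variance(s, n_h)
               for s, n_h in zip(strata, sizes))

# Two made-up strata, each arranged in increasing order of magnitude.
strata = [[1, 2, 4, 5, 7, 9], [12, 15, 19, 24, 30, 37, 45, 54]]
print(stratified_systematic_variance(strata, sizes=[2, 2]))
```

Enumerating every combination of random starts across the two strata and computing the variance of the resulting stratified means gives the same figure, since the strata are sampled independently.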

Empirical Investigation
Systematic samples are easy to draw and to execute but may not be simple in terms of estimation, as there are competing estimators. This drew our attention to an empirical investigation to ensure the right choice of estimator in the face of conflicting reports. Murthy (1967, Section 5.8, p. 153) [2] stated that "it is not possible to estimate unbiasedly the variance of the population mean and total on the basis of a single sample, but it is possible to build up some biased but useful variance estimators on the basis of systematic samples".
The results are shown in Table 2 and Table 3. Therefore, when systematic sampling is the design of choice within strata, estimates over all possible systematic samples should be used and the sampling units arranged in order of magnitude within each stratum. Kareem et al. (2015) [10] used this procedure and reported higher efficiency of systematic sampling within strata over the popularly used SRS. It is hereby recommended that $V(\bar{y}_{sy})$ given by Cochran (1977) [1] be used for estimation purposes when $N_h = n_h k_h$, and that of Chang and Huang (2000) [9] when $N_h \neq n_h k_h$, when systematic sampling is employed within strata.

$\sigma^2 = \frac{1}{nk}\sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar{Y})^2$ is the population variance of SRS and can be written as the sum of $\sigma_b^2$ and $\sigma_w^2$, which are the between and the within sample variances, respectively.

Table 1. Composition of the k systematic samples (clusters), such that N = nk.

Table 3. Variance of single systematic samples using Cochran's data when sampling units are arranged in order of magnitude.

In Table 3, $g_5$ indicates the center for systematic sample estimates when Madow's procedure is used, while the subscript i = 1, ..., k = 10 is the random start in the interval 1 to 10.