Multivariate Ratio Estimator of the Population Total under Stratified Random Sampling

Olkin [1] proposed a ratio estimator that uses p auxiliary variables under simple random sampling. Simple random sampling, however, often gives relatively low precision, since its variance is typically larger than that of stratified random sampling. We extend Olkin's estimator to stratified random sampling and consider the case where the strata have varying weights. We propose a multivariate ratio estimator of the population total in the presence of two auxiliary variables under stratified random sampling with L strata. In an empirical study based on simulations in the R statistical software, the proposed estimator had a smaller bias than Olkin's estimator.


Introduction
Auxiliary variables have been used to increase the precision of estimators, especially ratio and regression estimators [2]. This is particularly so in complex surveys, and more so in situations where some information on the survey variable might be missing [3].
These classical methods of estimation are based on direct estimators, i.e., those that use the response variable y together with information provided by an auxiliary variable x that is highly correlated with the main variable [4].

Review of Multivariate Ratio Estimators
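Olkin [1] proposed a multivariate generalization of the ratio estimator. Olkin's estimator of the population total, denoted by $\hat{Y}_{MR}$, is defined as

$$\hat{Y}_{MR} = \sum_{i=1}^{p} W_i \hat{Y}_{R_i},$$

which in other contexts can also be written as

$$\hat{Y}_{MR} = \sum_{i=1}^{p} W_i \frac{\bar{y}}{\bar{x}_i} X_i,$$

where $\hat{Y}_{R_i} = (\bar{y}/\bar{x}_i)X_i$ is the component of the population total ratio estimate associated with the i-th auxiliary variable $x_i$, $X_i$ is the population total of $x_i$, and the $W_i$ are the weights that maximize the precision of $\hat{Y}_{MR}$, subject to the linear constraint $W_1 + W_2 + \cdots + W_p = 1$. This estimate of the population total will be accurate if the relationship between y and each $x_i$ is a straight line through the origin. The population totals $X_i$ of the auxiliary variables must be known explicitly.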
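As a rough illustration, Olkin's estimator can be sketched in Python. This assumes each component uses the sample means $\bar{y}$ and $\bar{x}_i$ and that the weights $W_i$ are already known and passed in; choosing them to maximize precision is not shown. The names `ratio_component` and `olkin_total` are hypothetical helpers, not from the source.

```python
from statistics import fmean
from typing import Sequence


def ratio_component(y: Sequence[float], x: Sequence[float], x_total: float) -> float:
    """Ratio estimate of the population total of y from one auxiliary variable:
    (ybar / xbar) * X, where X is the known population total of x."""
    return fmean(y) / fmean(x) * x_total


def olkin_total(
    y: Sequence[float],
    xs: Sequence[Sequence[float]],
    x_totals: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Multivariate ratio estimate sum_i W_i * (ybar / xbar_i) * X_i, with sum_i W_i = 1."""
    if not (len(xs) == len(x_totals) == len(weights)):
        raise ValueError("need one sample, one total and one weight per auxiliary variable")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * ratio_component(y, x, X) for w, x, X in zip(weights, xs, x_totals))


# Usage: two auxiliary variables with (made-up) known population totals 100 and 150.
y = [2.0, 4.0, 6.0]
x1 = [1.0, 2.0, 3.0]
x2 = [2.0, 2.0, 2.0]
print(olkin_total(y, [x1, x2], [100.0, 150.0], [0.5, 0.5]))  # 250.0
```

With p = 1 and a single weight of 1, the function reduces to the classical ratio estimator of the total.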

The Proposed Estimator
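Consider a population divided into L disjoint strata, from each of which a sample of units is drawn. For each sampled unit, the survey variable $y_{hi}$ (the measurement on the i-th unit in stratum h) is recorded together with two auxiliary variables, $x_{1hi}$ and $x_{2hi}$, whose stratum totals $X_{1h}$ and $X_{2h}$ are known. The proposed estimator of the population total is

$$\hat{Y}_{MRst} = \sum_{h=1}^{L}\left(W_{1h}\hat{Y}_{R_{1h}} + W_{2h}\hat{Y}_{R_{2h}}\right), \qquad W_{1h} + W_{2h} = 1,$$

where the individual components are defined as follows:

$$\hat{Y}_{R_{1h}} = \frac{\bar{y}_h}{\bar{x}_{1h}}X_{1h}, \qquad \hat{Y}_{R_{2h}} = \frac{\bar{y}_h}{\bar{x}_{2h}}X_{2h},$$

with $\bar{y}_h$, $\bar{x}_{1h}$ and $\bar{x}_{2h}$ the sample means in stratum h. This can further be represented in a single equation as follows:

$$\hat{Y}_{MRst} = \sum_{h=1}^{L}\sum_{i=1}^{2} W_{ih}\frac{\bar{y}_h}{\bar{x}_{ih}}X_{ih},$$

where h = 1, 2, ..., L are the various strata.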
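A minimal sketch of the proposed estimator, assuming the component ratio estimates use the stratum sample means and that each stratum's weight $W_{1h}$ is supplied, with $W_{2h} = 1 - W_{1h}$. The `StratumSample` container and `stratified_mr_total` function are hypothetical names for illustration, and the numbers in the usage lines are made up.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import List


@dataclass
class StratumSample:
    """Sample and known totals for one stratum h (field names are illustrative)."""
    y: List[float]   # sampled values of the survey variable y_hi
    x1: List[float]  # sampled values of the first auxiliary variable
    x2: List[float]  # sampled values of the second auxiliary variable
    X1: float        # known stratum total X_1h of x1
    X2: float        # known stratum total X_2h of x2
    w1: float        # weight W_1h; W_2h is taken as 1 - W_1h


def stratified_mr_total(strata: List[StratumSample]) -> float:
    """Sum over strata of W_1h*(ybar_h/x1bar_h)*X_1h + W_2h*(ybar_h/x2bar_h)*X_2h."""
    total = 0.0
    for s in strata:
        ybar = fmean(s.y)
        r1 = ybar / fmean(s.x1) * s.X1  # component ratio estimate based on x1
        r2 = ybar / fmean(s.x2) * s.X2  # component ratio estimate based on x2
        total += s.w1 * r1 + (1.0 - s.w1) * r2
    return total


strata = [
    StratumSample(y=[2.0, 4.0], x1=[1.0, 2.0], x2=[2.0, 2.0], X1=30.0, X2=40.0, w1=0.5),
    StratumSample(y=[10.0, 10.0], x1=[5.0, 5.0], x2=[4.0, 4.0], X1=50.0, X2=20.0, w1=0.25),
]
print(stratified_mr_total(strata))  # 122.5
```

Because the weights are kept per stratum, each stratum can lean on whichever auxiliary variable tracks y better within it.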

Variance of the Proposed Estimator
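To compute the weights, the general Equation (2.4) for stratum h, $\hat{Y}_h = W_{1h}\hat{Y}_{R_{1h}} + W_{2h}\hat{Y}_{R_{2h}}$, is used; it covers every stratum by changing the value of h. Subtracting the true stratum total $Y_h$ from both sides of Equation (2.4) yields

$$\hat{Y}_h - Y_h = W_{1h}\hat{Y}_{R_{1h}} + W_{2h}\hat{Y}_{R_{2h}} - Y_h. \quad (2.5)$$

Since the weights in each stratum sum to one, $W_{1h} + W_{2h} = 1$, which implies

$$Y_h = W_{1h}Y_h + W_{2h}Y_h. \quad (2.6)$$

Substituting Equation (2.6) into the right-hand side of Equation (2.5) and collecting like terms with respect to the weights yields

$$\hat{Y}_h - Y_h = W_{1h}\left(\hat{Y}_{R_{1h}} - Y_h\right) + W_{2h}\left(\hat{Y}_{R_{2h}} - Y_h\right). \quad (2.7)$$

Squaring both sides and taking expectations, assuming negligible bias, Equation (2.7) leads to

$$V(\hat{Y}_h) = W_{1h}^2 V_{11h} + W_{2h}^2 V_{22h} + 2W_{1h}W_{2h}V_{12h}, \quad (2.8)$$

where $V_{ijh} = E\left[(\hat{Y}_{R_{ih}} - Y_h)(\hat{Y}_{R_{jh}} - Y_h)\right]$ for i, j = 1, 2. In matrix notation, Equation (2.8) can be written as

$$V(\hat{Y}_h) = \mathbf{W}_h^{\top}\mathbf{V}_h\mathbf{W}_h, \quad (2.9)$$

where $\mathbf{W}_h = (W_{1h}, W_{2h})^{\top}$ and $\mathbf{V}_h$ is the 2 × 2 matrix with entries $V_{ijh}$. We then find the weights $W_{1h}$ and $W_{2h}$ that minimize this variance subject to $W_{1h} + W_{2h} = 1$. To do so, we form a function that combines the variance and the linear constraint:

$$\phi = W_{1h}^2 V_{11h} + W_{2h}^2 V_{22h} + 2W_{1h}W_{2h}V_{12h} - \lambda\left(W_{1h} + W_{2h} - 1\right), \quad (2.10)$$

with $\lambda$ being the Lagrange multiplier.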
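Before solving the Lagrangian analytically, the objective can be checked numerically. The sketch below substitutes $W_{2h} = 1 - W_{1h}$ into (2.8) and scans $W_{1h}$ over a grid, using hypothetical values $V_{11h} = 4$, $V_{22h} = 9$, $V_{12h} = 3$. This brute-force grid search is a swapped-in numerical check, not the method used in the text, and the search interval [-2, 3] is an arbitrary assumption that allows negative weights (the constraint does not rule them out). Since (2.8) is then a quadratic in $W_{1h}$, its minimizer is $(V_{22h} - V_{12h})/(V_{11h} + V_{22h} - 2V_{12h})$, which is 6/7 for these values.

```python
def combined_variance(w1: float, v11: float, v22: float, v12: float) -> float:
    """Equation (2.8) with W_2h = 1 - W_1h substituted from the constraint."""
    w2 = 1.0 - w1
    return w1 * w1 * v11 + w2 * w2 * v22 + 2.0 * w1 * w2 * v12


def grid_minimise(v11: float, v22: float, v12: float,
                  lo: float = -2.0, hi: float = 3.0, steps: int = 50001):
    """Brute-force search for the W_1h on [lo, hi] that minimises (2.8)."""
    best_w1 = lo
    best_v = combined_variance(lo, v11, v22, v12)
    for k in range(1, steps):
        w1 = lo + (hi - lo) * k / (steps - 1)
        v = combined_variance(w1, v11, v22, v12)
        if v < best_v:
            best_w1, best_v = w1, v
    return best_w1, best_v


# Hypothetical (co)variances of the two component estimators in one stratum.
w1, v = grid_minimise(v11=4.0, v22=9.0, v12=3.0)
print(round(w1, 3), round(v, 3))
```

The minimum variance found is below both $V_{11h}$ and $V_{22h}$, which is the point of combining the two ratio estimates.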
To minimize this function with respect to the weights $W_{1h}$ and $W_{2h}$, we differentiate $\phi$ partially with respect to each weight in turn:

$$\frac{\partial \phi}{\partial W_{1h}} = 2W_{1h}V_{11h} + 2W_{2h}V_{12h} - \lambda, \quad (2.11)$$

$$\frac{\partial \phi}{\partial W_{2h}} = 2W_{2h}V_{22h} + 2W_{1h}V_{12h} - \lambda. \quad (2.12)$$

For optimization, we set each of the partial derivatives (2.11) and (2.12) to zero. This yields

$$2W_{1h}V_{11h} + 2W_{2h}V_{12h} = \lambda, \quad (2.13)$$

$$2W_{2h}V_{22h} + 2W_{1h}V_{12h} = \lambda. \quad (2.14)$$

Equating (2.13) and (2.14), the common factor 2 cancels out, and collecting like terms with respect to the weights yields

$$W_{1h}\left(V_{11h} - V_{12h}\right) = W_{2h}\left(V_{22h} - V_{12h}\right).$$

Substituting $W_{2h} = 1 - W_{1h}$ from the constraint and making $W_{1h}$ the subject of the formula gives

$$W_{1h} = \frac{V_{22h} - V_{12h}}{\left(V_{11h} - V_{12h}\right) + \left(V_{22h} - V_{12h}\right)}, \quad (2.15)$$

and opening the brackets in the denominator yields

$$W_{1h} = \frac{V_{22h} - V_{12h}}{V_{11h} + V_{22h} - 2V_{12h}}. \quad (2.16)$$

To get the value of the weight $W_{2h}$, we use the linear constraint, which may be written as

$$W_{2h} = 1 - W_{1h}. \quad (2.17)$$

Equations (2.16) and (2.17) give the weights that minimize the variance of $\hat{Y}_h$ in each stratum. These weights can now be substituted into the proposed model to obtain the estimate of the population total.

Empirical Study
An empirical study was carried out to estimate the total of a simulated population and to compare the performance of the proposed model with that of Olkin [1].

Description of the Study Population
We simulated a population $(y_i, x_{1i}, x_{2i})$ with 10 strata, each stratum differing from the others. The difference was achieved by using different error terms $\varepsilon_i$ when generating $y_i$ as

$$y_i = \alpha_i x_{1i} + \beta_i x_{2i} + \varepsilon_i.$$

The coefficients $\alpha_i$ and $\beta_i$ are randomly generated from a uniform distribution, while $\varepsilon_i$, $x_{1i}$ and $x_{2i}$ are randomly generated from normal distributions with different parameters.

Computational Procedure
A sample of size 300 was selected at random, index-wise, from each of the ten strata of the simulated population; that is, if index i is selected, the sample element is $(y_i, x_{1i}, x_{2i})$. The selected sample was used in the proposed model to estimate the population total. The ten strata were then merged into one large stratum, an index-wise sample of size 1000 was selected, and the population total was estimated with Olkin's model. This procedure was repeated for 1000 samples, and the population totals from each model were recorded.

The population total estimates from the two methods were compared with the true (simulated) population total, which is 28,235,645. Table 1 summarizes the statistics corresponding to each estimator. Figures 1 and 2 plot the population total estimates from the proposed model and from Olkin's model, respectively, over the 1000 simulations. To show the difference in variability between the two methods, the two plots are combined on a common scale in Figure 3.

Conclusions
From the summary in Table 1, the proposed estimator gives a total with a much smaller bias than Olkin's estimator. The proposed model also has a smaller root mean square error (RMSE) than Olkin's estimator. The combined graph in Figure 3 likewise shows that the population total estimates from Olkin's model are more variable than those from the proposed model.
The limiting condition for using this estimator is the requirement that a linear relationship through the origin exists between the variable of interest, y, and the auxiliary variables.
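The simulation and computational procedure described above can be sketched in Python (the source used R and does not report its code or population parameters, so this is not a reproduction of Table 1). Everything about the population is a hypothetical assumption: 2000 units per stratum, $\alpha_h, \beta_h \sim U(0.5, 2)$ drawn per stratum rather than per unit, and the normal means and standard deviations for $x_1$, $x_2$ and $\varepsilon$. The unknown $V_{ijh}$ are replaced by sample covariances of the linearized residuals $d_i = y - r_i x_i$ with $r_i = \bar{y}/\bar{x}_i$; this is one common way to estimate them, and the common scaling factor cancels in (2.16). Bias is the mean error of the replicate totals and RMSE the root of their mean squared error about the true total.

```python
import math
import random
from statistics import fmean


def make_population(n_strata=10, size=2000, seed=1):
    """Hypothetical population: y = alpha_h*x1 + beta_h*x2 + eps, error SD differing by stratum."""
    rng = random.Random(seed)
    strata = []
    for h in range(n_strata):
        alpha, beta = rng.uniform(0.5, 2.0), rng.uniform(0.5, 2.0)
        sd_eps = 2.0 + 3.0 * h  # different error terms make the strata differ
        units = []
        for _ in range(size):
            x1 = rng.gauss(50 + 5 * h, 5)
            x2 = rng.gauss(30 + 3 * h, 4)
            units.append((alpha * x1 + beta * x2 + rng.gauss(0, sd_eps), x1, x2))
        strata.append(units)
    return strata


def sample_cov(a, b):
    ma, mb = fmean(a), fmean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def mr_total(sample, X1, X2):
    """Two-variable ratio estimate of a total, weights from (2.16)-(2.17) with estimated V_ij."""
    y = [u[0] for u in sample]
    x1 = [u[1] for u in sample]
    x2 = [u[2] for u in sample]
    ybar = fmean(y)
    r1, r2 = ybar / fmean(x1), ybar / fmean(x2)
    # Linearised residuals d_i = y - r_i * x_i; V_ij is proportional to cov(d_i, d_j).
    d1 = [a - r1 * b for a, b in zip(y, x1)]
    d2 = [a - r2 * b for a, b in zip(y, x2)]
    v11, v22, v12 = sample_cov(d1, d1), sample_cov(d2, d2), sample_cov(d1, d2)
    denom = v11 + v22 - 2.0 * v12  # sample variance of d1 - d2, never negative
    w1 = (v22 - v12) / denom if denom > 0 else 0.5
    return w1 * r1 * X1 + (1.0 - w1) * r2 * X2


def summarise(estimates, true_total):
    bias = fmean(estimates) - true_total
    rmse = math.sqrt(fmean([(e - true_total) ** 2 for e in estimates]))
    return bias, rmse


def run_study(reps=1000, n_h=300, n_pooled=1000, seed=2):
    pop = make_population()
    true_total = sum(u[0] for s in pop for u in s)
    stratum_totals = [(sum(u[1] for u in s), sum(u[2] for u in s)) for s in pop]
    X1 = sum(t[0] for t in stratum_totals)
    X2 = sum(t[1] for t in stratum_totals)
    pooled = [u for s in pop for u in s]  # strata merged into one for Olkin's estimator
    rng = random.Random(seed)
    proposed, olkin = [], []
    for _ in range(reps):
        proposed.append(sum(mr_total(rng.sample(s, n_h), X1h, X2h)
                            for s, (X1h, X2h) in zip(pop, stratum_totals)))
        olkin.append(mr_total(rng.sample(pooled, n_pooled), X1, X2))
    return true_total, summarise(proposed, true_total), summarise(olkin, true_total)


if __name__ == "__main__":
    truth, (b_p, r_p), (b_o, r_o) = run_study(reps=100)  # fewer reps than the paper, for speed
    print(f"true total {truth:.0f}")
    print(f"proposed: bias {b_p:.1f}, RMSE {r_p:.1f}")
    print(f"Olkin:    bias {b_o:.1f}, RMSE {r_o:.1f}")
```

The default sample sizes (300 per stratum, 1000 pooled, 1000 replicates) follow the text; the numbers printed depend entirely on the assumed population.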

Figure 1. Plot of the population totals with the proposed model for the 1000 samples.

Figure 2. Plot of the population totals without stratification for the 1000 samples.

Figure 3. Figures 1 and 2 plotted on a common scale.