Open Journal of Statistics, 2012, 2, 300-304
http://dx.doi.org/10.4236/ojs.2012.23036 Published Online July 2012 (http://www.SciRP.org/journal/ojs)
Multivariate Ratio Estimator of the Population Total
under Stratified Random Sampling
Oscar O. Ngesa1, George O. Orwa2, Romanus O. Otieno2, Henry M. Murray2
1Ministry of State for Planning, National Development and V2030, Nairobi, Kenya
2Department of Statistics and Actuarial Science, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
Email: oscanges@yahoo.com
Received March 7, 2012; revised April 10, 2012; accepted April 30, 2012
ABSTRACT
Olkin [1] proposed a ratio estimator considering p auxiliary variables under simple random sampling. As is expected,
Simple Random Sampling comes with relatively low levels of precision, especially with regard to the fact that its
variance is greatest amongst all the sampling schemes. We extend this to stratified random sampling and we consider a case
where the strata have varying weights. We have proposed a Multivariate Ratio Estimator for the population mean in the
presence of two auxiliary variables under Stratified Random Sampling with L strata. Based on an empirical study with
simulations in R statistical software, the proposed estimator was found to have a smaller bias as compared to Olkin’s
estimator.
Keywords: Ratio Estimator; Stratification; Auxiliary Variables; Lagrange’s Multiplier
1. Introduction
Auxiliary variables have been used to increase the precision
of estimators, especially in regression and ratio estimators
[2]. This is particularly so in cases of complex surveys,
more so in situations where some information on the
survey variable might be missing [3].
These classical methods of estimation are based on direct
estimators, i.e., those which use the response variable, y,
and information provided by an auxiliary variable,
x, highly correlated with the main variable [4].
2. Review of Multivariate Ratio Estimators
Olkin [1] proposed a multivariate generalization of the
ratio estimator. Olkin proposed an estimator for the
population total, denoted by $\hat{Y}_{MR}$, and defined as

$$\hat{Y}_{MR} = W_1 \frac{\bar{y}}{\bar{x}_1} X_1 + W_2 \frac{\bar{y}}{\bar{x}_2} X_2 + \cdots + W_p \frac{\bar{y}}{\bar{x}_p} X_p \tag{2.1}$$

which in other contexts can also be written as

$$\hat{Y}_{MR} = W_1 \hat{Y}_{R1} + W_2 \hat{Y}_{R2} + \cdots + W_p \hat{Y}_{Rp} \tag{2.2}$$

where $\hat{Y}_{Ri} = \frac{\bar{y}}{\bar{x}_i} X_i$ is the component of the population
total ratio estimate affiliated to the $i$th auxiliary variable, and
the $W_i$ are the weights which maximize the precision of
$\hat{Y}_{MR}$, subject to the linear constraint $W_1 + W_2 + \cdots + W_p = 1$.
This estimate of the population total will also be accurate if
the regression line of $Y$ on $X_1, X_2, \ldots, X_p$ is a straight
line going through the origin. The population totals $X_i$ for
the auxiliary variables must be explicitly known.
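As an illustration of Equation (2.2), the following minimal Python sketch combines the p single-variable ratio estimates with a given set of weights. It assumes the ratios are formed from sample means and that the weights are supplied by the caller; the function name `olkin_total` and its arguments are hypothetical and do not come from Olkin [1].

```python
from statistics import fmean


def olkin_total(y, xs, X_totals, weights):
    """Weighted multivariate ratio estimate of the population total.

    y        : sample values of the survey variable
    xs       : list of p lists, the sample values of each auxiliary variable
    X_totals : known population totals X_1, ..., X_p
    weights  : W_1, ..., W_p, which must sum to 1 (the linear constraint)
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    y_bar = fmean(y)
    # Each component is a classical ratio estimate (y_bar / x_bar_i) * X_i.
    components = [y_bar / fmean(x) * X for x, X in zip(xs, X_totals)]
    return sum(w * c for w, c in zip(weights, components))
```

When y is exactly proportional to every auxiliary variable, each component returns the same total, so any weights summing to one give the same estimate.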
3. The Proposed Estimator
Consider a population which has been divided into $L$
disjoint strata. Sample elements are drawn from each
stratum, and when the measurement $y_{hi}$ is taken, that is,
the measurement for the $i$th unit in the $h$th
stratum, two auxiliary variables, say $x_{1hi}$ and $x_{2hi}$,
are also measured for that $i$th unit. Let $\hat{Y}_{MRE}$ denote
the proposed multivariate estimator under the stratified
random sampling scheme for the population total.
$\hat{Y}_{MRE}$ is therefore defined as

$$\hat{Y}_{MRE} = \sum_{h=1}^{L} \hat{Y}_{MRh} \tag{2.3}$$

where the individual components are defined as follows:

$$\hat{Y}_{MR1} = W_{11} \hat{Y}_{R11} + W_{21} \hat{Y}_{R21} \quad \text{for the 1st stratum,}$$

$$\hat{Y}_{MR2} = W_{12} \hat{Y}_{R12} + W_{22} \hat{Y}_{R22} \quad \text{for the 2nd stratum,}$$

$$\vdots$$

$$\hat{Y}_{MRL} = W_{1L} \hat{Y}_{R1L} + W_{2L} \hat{Y}_{R2L} \quad \text{for the } L\text{th stratum.}$$

This can further be represented in a single equation as
follows:

$$\hat{Y}_{MRh} = W_{1h} \hat{Y}_{R1h} + W_{2h} \hat{Y}_{R2h} \tag{2.4}$$

where $h = 1, 2, \ldots, L$ are the various strata.
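Equations (2.3) and (2.4) can be sketched in Python as below. This is an illustrative sketch only: it assumes the ratios use sample means, that the stratum totals of both auxiliary variables are known, and that the weights of each stratum are already available. The names `stratum_total` and `stratified_total` are hypothetical.

```python
from statistics import fmean


def stratum_total(y, x1, x2, X1, X2, w1, w2):
    """Equation (2.4): weighted sum of the two ratio estimates in one stratum."""
    y_bar = fmean(y)
    ratio_est_1 = y_bar / fmean(x1) * X1  # ratio estimate using x1
    ratio_est_2 = y_bar / fmean(x2) * X2  # ratio estimate using x2
    return w1 * ratio_est_1 + w2 * ratio_est_2


def stratified_total(strata):
    """Equation (2.3): sum of the stratum estimates over all L strata.

    strata is a list of dicts whose keys match stratum_total's arguments.
    """
    return sum(stratum_total(**s) for s in strata)
```

Each stratum carries its own pair of weights, which is what allows the weights to vary from stratum to stratum.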
Copyright © 2012 SciRes. OJS
4. Variance of the Proposed Estimator
To compute the values of the weights, the general Equation
(2.4) is used, and this will cater for each stratum by
just changing the value of $h$ in the respective strata.
Subtracting $Y_h$ from the right hand side and the left hand side of
Equation (2.4) yields

$$\hat{Y}_{MRh} - Y_h = W_{1h} \hat{Y}_{R1h} + W_{2h} \hat{Y}_{R2h} - Y_h \tag{2.5}$$

But it is known that the sum of the weights in each
stratum is 1, so $W_{1h} + W_{2h} = 1$. This implies that

$$Y_h = (W_{1h} + W_{2h}) Y_h \tag{2.6}$$

Substituting Equation (2.6) into the right hand side of
Equation (2.5) yields

$$\hat{Y}_{MRh} - Y_h = W_{1h} \hat{Y}_{R1h} + W_{2h} \hat{Y}_{R2h} - (W_{1h} + W_{2h}) Y_h$$

$$\hat{Y}_{MRh} - Y_h = W_{1h} \hat{Y}_{R1h} + W_{2h} \hat{Y}_{R2h} - W_{1h} Y_h - W_{2h} Y_h$$

Collecting the like terms with respect to the weights yields

$$\hat{Y}_{MRh} - Y_h = W_{1h} (\hat{Y}_{R1h} - Y_h) + W_{2h} (\hat{Y}_{R2h} - Y_h) \tag{2.7}$$

Squaring each side and taking the expectation on either
side, assuming negligible bias, Equation (2.7) leads to

$$V(\hat{Y}_{MRh}) = W_{1h}^2 V(\hat{Y}_{R1h}) + 2 W_{1h} W_{2h} \mathrm{Cov}(\hat{Y}_{R1h}, \hat{Y}_{R2h}) + W_{2h}^2 V(\hat{Y}_{R2h}) \tag{2.8}$$

Equation (2.8) can be written in notation as follows:

$$V(\hat{Y}_{MRh}) = W_{1h}^2 V_{11h} + 2 W_{1h} W_{2h} V_{12h} + W_{2h}^2 V_{22h} \tag{2.9}$$

where $V_{11h} = \mathrm{Variance}(\hat{Y}_{R1h})$, $V_{22h} = \mathrm{Variance}(\hat{Y}_{R2h})$,
and $V_{12h} = \mathrm{Covariance}(\hat{Y}_{R1h}, \hat{Y}_{R2h})$.

We then proceed to find the values of the weights
$W_{1h}$ and $W_{2h}$ that minimize the variance $V(\hat{Y}_{MRh})$ subject
to the linear constraint $W_{1h} + W_{2h} = 1$.
To achieve this, we form a function which has the
variance and the linear constraint mentioned above:

$$\phi = V(\hat{Y}_{MRh}) + \lambda (W_{1h} + W_{2h} - 1) \tag{2.10}$$

with $\lambda$ being the Lagrange’s Multiplier.
From Equation (2.9),

$$V(\hat{Y}_{MRh}) = W_{1h}^2 V_{11h} + 2 W_{1h} W_{2h} V_{12h} + W_{2h}^2 V_{22h}$$

and replacing this into Equation (2.10) yields

$$\phi = W_{1h}^2 V_{11h} + 2 W_{1h} W_{2h} V_{12h} + W_{2h}^2 V_{22h} + \lambda (W_{1h} + W_{2h} - 1)$$

To minimize this function with respect to the weights
$W_{1h}$ and $W_{2h}$, we differentiate the function partially
with respect to these weights, one at a time:

$$\frac{\partial \phi}{\partial W_{1h}} = 2 W_{1h} V_{11h} + 2 W_{2h} V_{12h} + \lambda \tag{2.11}$$

$$\frac{\partial \phi}{\partial W_{2h}} = 2 W_{1h} V_{12h} + 2 W_{2h} V_{22h} + \lambda \tag{2.12}$$

For optimization, we equate the partial derivatives in
Equations (2.11) and (2.12) each to zero. This yields

$$2 W_{1h} V_{11h} + 2 W_{2h} V_{12h} = -\lambda \tag{2.13}$$

$$2 W_{1h} V_{12h} + 2 W_{2h} V_{22h} = -\lambda \tag{2.14}$$

It follows that the left hand sides of Equations (2.13) and (2.14) are equal,
so

$$2 W_{1h} V_{11h} + 2 W_{2h} V_{12h} = 2 W_{1h} V_{12h} + 2 W_{2h} V_{22h}$$

The 2 is common and can be cancelled out. We proceed
to collect like terms with respect to the weights, and
this yields

$$W_{1h} (V_{11h} - V_{12h}) = W_{2h} (V_{22h} - V_{12h}) \tag{2.15}$$

It is known that $W_{1h} + W_{2h} = 1$, hence $W_{2h} = 1 - W_{1h}$.
From this, Equation (2.15) will reduce to

$$W_{1h} (V_{11h} - V_{12h}) = (1 - W_{1h}) (V_{22h} - V_{12h})$$

and

$$W_{1h} (V_{11h} - V_{12h} + V_{22h} - V_{12h}) = V_{22h} - V_{12h}$$

Then it follows, by making $W_{1h}$ the subject of the
formula,

$$W_{1h} = \frac{V_{22h} - V_{12h}}{(V_{11h} - V_{12h}) + (V_{22h} - V_{12h})}$$

Opening the brackets in the denominator yields

$$W_{1h} = \frac{V_{22h} - V_{12h}}{V_{11h} - 2 V_{12h} + V_{22h}} \tag{2.16}$$

To get the value of the weight $W_{2h}$, we use the linear
constraint $W_{2h} = 1 - W_{1h}$:

$$W_{2h} = 1 - \frac{V_{22h} - V_{12h}}{V_{11h} - 2 V_{12h} + V_{22h}}$$

which may be written as

$$W_{2h} = \frac{V_{11h} - 2 V_{12h} + V_{22h} - V_{22h} + V_{12h}}{V_{11h} - 2 V_{12h} + V_{22h}}$$

$$W_{2h} = \frac{V_{11h} - V_{12h}}{V_{11h} - 2 V_{12h} + V_{22h}} \tag{2.17}$$

Equations (2.16) and (2.17) give the weights that minimize
the variance $V(\hat{Y}_{MRh})$ for stratum $h$.
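As a numerical check on Equations (2.16) and (2.17), the short Python sketch below computes the two weights from given values of $V_{11h}$, $V_{12h}$ and $V_{22h}$ and evaluates the variance in Equation (2.9). The function names and the example values are hypothetical and serve only as illustration.

```python
def optimal_weights(v11, v12, v22):
    """Weights from Equations (2.16) and (2.17) for one stratum."""
    denom = v11 - 2.0 * v12 + v22  # equals the variance of the difference of the two estimates
    if denom <= 0.0:
        raise ValueError("V11 - 2*V12 + V22 must be positive")
    w1 = (v22 - v12) / denom
    return w1, 1.0 - w1


def combined_variance(w1, w2, v11, v12, v22):
    """Variance of the weighted estimator, Equation (2.9)."""
    return w1 * w1 * v11 + 2.0 * w1 * w2 * v12 + w2 * w2 * v22
```

For example, with $V_{11h} = 4$, $V_{12h} = 1$ and $V_{22h} = 2$ the weights are 0.25 and 0.75 and the variance is 1.75, whereas nearby weights such as 0.3 and 0.7 give 1.76.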
These weights can now be substituted in the proposed
model to get the population total.
5. Empirical Study
An empirical study was carried out to estimate the population
total of a simulated population and compare the
performance of the proposed model to that of Olkin [1].
6. Description of the Study Population
In this section we simulated a population ($y_i$, $x_{1i}$ and $x_{2i}$),
which has 10 strata, $i = 1, 2, \ldots, 10$, in which each stratum
differs from the others. This difference was achieved by using
different error terms $e_i$ while generating the $y_i$ using
$y_i = a_i x_{1i} + b_i x_{2i} + e_i$. The coefficients $a_i$ and $b_i$ are
randomly generated from a uniform distribution, while
$x_{1i}$, $x_{2i}$ and $e_i$ are randomly generated from normal
distributions with different parameters.
7. Computational Procedure
A sample of size 300 was selected randomly from the
simulated population index-wise, that is, if index $i$ is
selected then the sample elements will be $y_i$, $x_{1i}$ and $x_{2i}$.
This was repeated for all the ten strata, and the selected
sample was used in the proposed model to estimate the
population total. The ten strata were again joined together
to form one huge stratum, an index-wise sample of size 1000
was selected and then, using Olkin’s model, the population
total was estimated. The procedure above was repeated
for 1000 samples and the population totals using
each model were recorded.
8. Simulation Results
The population total estimates of the two methods were
compared to that of the true population (simulated) total.
The True population total is 28,235,645. Table 1 summarizes
the statistics corresponding to each estimator.
Table 1. Summary statistics for each method.
Min. 1st Qrt Median 3rd Qrt Max Mean Bias RMSE
Proposed Method 2,821,006 2,823,185 2,823,565 2,823,987 2,825,123 2,823,579 144.53
Olkin’s Method 2,746,765 2,805,085 2,822,892 2,840,866 2,903,358 2,822,799 7659.34
Figures 1 and 2 show the plotted values of the population
total estimates of the proposed model and Olkin’s model,
respectively, repeated for 1000 simulations each.
Figure 1. Plot of the population totals with proposed model for the 1000 samples.
Figure 2. Plot of the population totals without stratification for the 1000 samples.
In order to show the difference in variability between
the two methods, the two plots above are now combined
into one graph using a common scale in Figure 3.
Figure 3. Figures 1 and 2 plotted on a common scale.
9. Conclusions
From the summary table above, it can be seen that the
proposed estimator gives a total with a very small bias as
compared to Olkin’s. Also, the proposed model can
be seen to have a small Root Mean Square Error (RMSE)
as compared to Olkin’s estimator.
The combined graph also shows that the population
total estimate is more variable in Olkin’s as compared to
the proposed model.
The limiting condition to allow the use of this estimator
is the requirement of the existence of a linear relationship
through the origin between the variable of interest, y, and
the auxiliary variables.
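The simulation design described in Sections 6 and 7 can be sketched roughly as follows. The study itself was run in R; this Python version is an independent illustration, not the authors' code. The parameter ranges, the smaller default population and sample sizes, and the use of ratio-fit residuals as stand-ins for $V_{11h}$, $V_{12h}$ and $V_{22h}$ are all assumptions made for this sketch, and the function names are hypothetical.

```python
import random
from statistics import fmean


def make_stratum(rng, size, x_mean, e_sd):
    """Rows (y, x1, x2) with y = a*x1 + b*x2 + e, a line through the origin."""
    a, b = rng.uniform(1.0, 2.0), rng.uniform(1.0, 2.0)  # assumed ranges
    rows = []
    for _ in range(size):
        x1, x2 = rng.gauss(x_mean, 1.0), rng.gauss(x_mean, 1.0)
        rows.append((a * x1 + b * x2 + rng.gauss(0.0, e_sd), x1, x2))
    return rows


def two_variable_estimate(sample, X1, X2):
    """W1*R1 + W2*R2 with weights of the form in Equations (2.16) and (2.17)."""
    y_bar = fmean([r[0] for r in sample])
    ratio1 = y_bar / fmean([r[1] for r in sample])
    ratio2 = y_bar / fmean([r[2] for r in sample])
    # Residuals of each ratio fit; their second moments stand in for
    # V11h, V12h, V22h up to a common factor, which cancels in the weights.
    d1 = [r[0] - ratio1 * r[1] for r in sample]
    d2 = [r[0] - ratio2 * r[2] for r in sample]
    v11 = sum(u * u for u in d1)
    v22 = sum(u * u for u in d2)
    v12 = sum(u * v for u, v in zip(d1, d2))
    w1 = (v22 - v12) / (v11 - 2.0 * v12 + v22)
    return w1 * ratio1 * X1 + (1.0 - w1) * ratio2 * X2


def simulate(n_strata=10, stratum_size=500, n_h=30, n_pooled=100,
             reps=200, seed=1):
    """Return (true total, (bias, rmse) stratified, (bias, rmse) pooled)."""
    rng = random.Random(seed)
    # Strata differ in the mean of x and in the spread of the error term.
    strata = [make_stratum(rng, stratum_size, 10.0 + 5.0 * h, 1.0 + h)
              for h in range(n_strata)]
    pooled = [row for s in strata for row in s]
    true_total = sum(row[0] for row in pooled)

    def x_totals(rows):
        return sum(r[1] for r in rows), sum(r[2] for r in rows)

    stratum_X = [x_totals(s) for s in strata]
    pooled_X = x_totals(pooled)
    proposed, olkin = [], []
    for _ in range(reps):
        # Stratified estimator: one sample and one pair of weights per stratum.
        proposed.append(sum(two_variable_estimate(rng.sample(s, n_h), *X)
                            for s, X in zip(strata, stratum_X)))
        # Unstratified estimator: one sample from the pooled population.
        olkin.append(two_variable_estimate(rng.sample(pooled, n_pooled),
                                           *pooled_X))

    def summary(estimates):
        bias = fmean(estimates) - true_total
        rmse = fmean([(e - true_total) ** 2 for e in estimates]) ** 0.5
        return bias, rmse

    return true_total, summary(proposed), summary(olkin)
```

Calling `simulate()` returns the simulated true total together with the bias and RMSE of each estimator, which can then be compared in the same way as in Table 1. The numbers will not reproduce the paper's values, since the population parameters here are assumed.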
REFERENCES
[1] I. Olkin, “Multivariate Ratio Estimation for Finite Populations,” Biometrika, Vol. 45, No. 1-2, 1956, pp. 154-165.
[2] W. G. Cochran, “Sampling Techniques,” 3rd Edition, Wiley, New York, 1977.
[3] L. Y. Deng and R. S. Chikura, “On the Ratio and Regression Estimation in Finite Population Sampling,” American Statistician, Vol. 44, No. 4, 1990, pp. 282-284.
[4] P. V. Sukhatme and B. V. Sukhatme, “Sampling Theories of Survey with Applications,” Iowa State University Press, Ames, 1970.