Estimation of Population Variance Using the Coefficient of Kurtosis and Median of an Auxiliary Variable under Simple Random Sampling
1. Introduction
It is well known that appropriate use of auxiliary information in probability sampling designs yields a considerable reduction in the variance of estimators of population parameters, namely the population mean, median, variance, regression coefficient and correlation coefficient. [1] was the first to show the contribution of known auxiliary information in improving the efficiency of the estimator of the population mean in survey sampling.
Sample surveys now touch almost every field of scientific study, including demography, education, energy, transportation, health care, economics, forestry, sociology and politics. It is no exaggeration to say that much of the data that are statistically analyzed are collected in surveys. As the use of surveys increases, so does the need for more effective methods of analyzing and interpreting the resulting data. A measure of precision is a prime requirement of a good survey and now appears in most analyses, so it needs to be obtained for almost every estimate derived from the survey data.
We frequently encounter surveys in which an auxiliary variable x is cheaper (in time and money) to observe than the study variable y. Using auxiliary information can increase the precision of an estimator when the study variable y is highly correlated with the auxiliary variable x. Such situations occur in practice, for example with the number of trees in an orchard (x) and the yield of fruit (y).
The most common and widely used measure of precision is the variance of the survey estimator. In practice population variances are usually unknown and must be estimated from the survey data themselves. In this study we are interested in estimating the population variance using known auxiliary information under the simple random sampling without replacement (SRSWOR) scheme. When the auxiliary variable is well correlated with the study variable, ratio, product and regression estimators generally give more precise results than estimators based on the simple random sample alone.
Consider a finite population $U = \{U_1, U_2, \ldots, U_N\}$ of N distinct identifiable units. Let Y be our study variable and X be its corresponding auxiliary variable. Suppose we take a random sample of size n from this bivariate population $(X, Y)$, that is $(x_i, y_i)$ for $i = 1, 2, \ldots, n$, using simple random sampling without replacement (SRSWOR). Let $\bar{Y}$ and $\bar{X}$ be the population means of the study and auxiliary variables respectively, and let their corresponding sample means be $\bar{y}$ and $\bar{x}$.
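The sampling scheme just described can be sketched in a few lines. This is a minimal illustration, assuming NumPy is available; the function name `srswor_sample` and the eight-unit population are hypothetical and are not taken from the paper.

```python
import numpy as np

def srswor_sample(x, y, n, seed=0):
    """Draw n paired units (x_i, y_i) without replacement from a finite population."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    # replace=False means each population unit can be selected at most once
    idx = rng.choice(x.size, size=n, replace=False)
    return x[idx], y[idx]

# Hypothetical population of N = 8 units (illustrative values only)
X = [10, 12, 15, 18, 20, 22, 25, 30]
Y = [21, 25, 29, 37, 41, 43, 52, 61]

xs, ys = srswor_sample(X, Y, n=4)
x_bar, y_bar = xs.mean(), ys.mean()  # sample means of x and y
```

The same index vector is applied to both variables so that each sampled x stays paired with its own y.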
This study focuses on improving the efficiency in the estimation of the population variance
$S_y^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)^2$ (1)
using the coefficient of kurtosis and median.
We define the following notations that we will use throughout the article. For the population observations we have;
,
,
,
,
.
Also we define the following from the sample observations:
,
,
,
,
.
In general, we define the following parameters:
(2)
(3)
Thus we note the following;
,
, and
;
,
such that;
$C_y$ is the coefficient of variation for the study variable y,
$C_x$ is the coefficient of variation for the auxiliary variable x,
$\rho$ is the coefficient of correlation between x and y,
$\beta_{2(y)}$ is the coefficient of kurtosis for the study variable,
$\beta_{2(x)}$ is the coefficient of kurtosis for the auxiliary variable and
$M_d$ is the population median of the auxiliary variable.
Many authors have developed more precise estimators by employing prior knowledge of certain population parameter(s). [2], for example, attempted to use the coefficient of variation of the study variable, but this proved inadequate because in practice this parameter is unknown. Motivated by the work of [2], [3] [4] and [5] instead used the known coefficient of variation of the auxiliary variable to estimate the population mean of the study variable. Reasoning along the same lines, [6] used the prior value of the coefficient of kurtosis of an auxiliary variable in estimating the population variance of the study variable y.
Kurtosis is seldom reported or used in research articles, even though virtually every statistical package provides a measure of it. This may be because kurtosis is not well understood, or because its importance in various aspects of statistical analysis has not been fully explored. Kurtosis can be expressed as
$\operatorname{Kurt}(X) = E\left[\left(\frac{X - \mu}{\sigma}\right)^4\right] = \frac{\mu_4}{\sigma^4}$ (4)
where
$E$ is the expectation operator,
$\mu$ is the mean,
$\mu_4$ is the fourth moment about the mean and
$\sigma$ is the standard deviation.
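To make the definition concrete, the ratio of the fourth central moment to the squared variance can be computed directly for a finite set of values. This is a minimal sketch assuming NumPy; the function name `kurtosis_coefficient` is hypothetical, and population (divisor N) moments are used.

```python
import numpy as np

def kurtosis_coefficient(values):
    """Coefficient of kurtosis: fourth central moment divided by the squared variance."""
    v = np.asarray(values, dtype=float)
    mu = v.mean()                    # the mean
    mu4 = np.mean((v - mu) ** 4)     # fourth moment about the mean
    sigma2 = np.mean((v - mu) ** 2)  # variance (divisor N)
    return mu4 / sigma2 ** 2

beta2 = kurtosis_coefficient([1, 2, 3, 4, 5])  # 6.8 / 2**2 = 1.7
```

Note that this is the raw coefficient, not the "excess" kurtosis (which subtracts 3) that some statistical packages report by default.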
The median, being the middle value of a distribution when the values are arranged in ascending or descending order, has the advantage of being less affected by outliers and skewness, and is therefore preferred to the mean when the distribution is not symmetrical. We can therefore utilize the median and the coefficient of kurtosis of the auxiliary variable to derive a more precise ratio type population variance estimator.
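The robustness of the median can be seen in a small example. The numbers below are hypothetical and serve only to show how a single outlier moves the mean but not the median.

```python
import statistics

clean = [12, 14, 15, 16, 18]
contaminated = clean[:-1] + [180]  # replace the largest value with an outlier

mean_clean = statistics.mean(clean)                      # 15
median_clean = statistics.median(clean)                  # 15
mean_contaminated = statistics.mean(contaminated)        # 47.4
median_contaminated = statistics.median(contaminated)    # 15
```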
2. Existing Population Variance Estimators
In this section we review some finite population variance estimators from the literature that help in the construction of the proposed estimator. When auxiliary information is not available, the usual unbiased estimator of the population variance is
$s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2$ (5)
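The unbiasedness of the sample variance under SRSWOR can be checked by brute force on a small population. The sketch below assumes that the population variance is defined with the N − 1 divisor and uses a hypothetical five-unit population; it enumerates every possible sample rather than simulating.

```python
from itertools import combinations
from statistics import mean, variance

def srswor_expected_sample_variance(population, n):
    """Average s^2 (divisor n - 1) over every possible SRSWOR sample of size n."""
    samples = combinations(population, n)  # all C(N, n) equally likely samples
    return mean(variance(sample) for sample in samples)

# Hypothetical population of N = 5 units (illustrative values only)
Y = [3.0, 7.0, 8.0, 12.0, 15.0]
S2_y = variance(Y)  # population variance with divisor N - 1
expected_s2 = srswor_expected_sample_variance(Y, n=3)
```

Because all samples are equally likely under SRSWOR, the plain average over the enumeration is the exact expectation, and it agrees with `S2_y` up to floating-point rounding.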
The bias and MSE of
(6)
(7)
where
Estimation of the population variance $S_y^2$ using auxiliary information was considered by [7], who proposed the ratio type population variance estimator given by
$\hat{S}_R^2 = s_y^2\,\frac{S_x^2}{s_x^2}$ (8)
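A ratio type variance estimator of this kind is straightforward to compute. The sketch below assumes the standard form attributed to Isaki, that is the sample variance of y scaled by the ratio of the known population variance of x to the sample variance of x; the function name and the data are hypothetical.

```python
import numpy as np

def isaki_ratio_variance(y_sample, x_sample, S2_x):
    """Ratio type estimate of S_y^2: s_y^2 * (S_x^2 / s_x^2), with S_x^2 known."""
    s2_y = np.var(y_sample, ddof=1)  # sample variance of the study variable
    s2_x = np.var(x_sample, ddof=1)  # sample variance of the auxiliary variable
    return s2_y * S2_x / s2_x

# Hypothetical population in which y is an exact linear function of x
X_pop = np.array([4.0, 7.0, 9.0, 12.0, 15.0, 20.0])
Y_pop = 3.0 + 2.0 * X_pop
idx = [0, 2, 5]  # any sample of distinct units

estimate = isaki_ratio_variance(Y_pop[idx], X_pop[idx], np.var(X_pop, ddof=1))
true_S2_y = np.var(Y_pop, ddof=1)
```

In this extreme case of perfect linear dependence the estimate reproduces the population variance of y for every sample, which illustrates why a strongly correlated auxiliary variable improves precision.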
The bias and Mean Squared Error of Isaki’s estimator,
(9)
(10)
where
[6] initiated the use of the coefficient of kurtosis in estimating the population variance of a study variable y. Later, the coefficient of kurtosis was used by [3] [5] [8] in estimating the population mean.
[9] using the known information on both
and
suggested modified ratio type population variance estimator for
as
(11)
The estimator,
bias and MSE obtained as
(12)
(13)
where
[10] suggested four modified ratio type variance estimators using known values of
and
,
(14)
(15)
(16)
(17)
The biases and MSE of their estimators,
(18)
(19)
(20)
(21)
(22)
(23)
(24)
(25)
where;
;
;
;
.
[11], utilizing the population median $M_d$ of the auxiliary variable, proposed a modified ratio type population variance estimator as
(26)
The bias and MSE of their estimator
,
(27)
(28)
where,
.
[12], using the known quartiles (upper and lower quartiles $Q_3$ and $Q_1$ respectively) of the auxiliary variable x, suggested
(29)
(30)
The biases and MSE of their estimators
and
as follows
(31)
(32)
(33)
(34)
where
and
. Motivated by [10] and [11], [13] considered the estimation of the finite population variance using the known coefficient of variation and median of an auxiliary variable, and proposed the estimator
(35)
The bias and MSE obtained to be,
(36)
(37)
where
.
3. Proposed Estimator
Motivated by the works of [14] [9] [15] [13] [10] and [16] on improving the performance of the population variance estimator of the study variable using known population parameters of an auxiliary variable, we propose the following modified ratio type population variance estimator, which uses known values of the population coefficient of kurtosis $\beta_{2(x)}$ and median $M_d$ of an auxiliary variable.
(38)
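To calculate the bias and the MSE of the proposed estimator, we let
$e_0 = \dfrac{s_y^2 - S_y^2}{S_y^2}$ and $e_1 = \dfrac{s_x^2 - S_x^2}{S_x^2}$,
or $s_y^2 = S_y^2(1 + e_0)$ and $s_x^2 = S_x^2(1 + e_1)$,
so that $E(e_0) = E(e_1) = 0$ and, to the first degree of approximation,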
(39)
(40)
(41)
The expectations are obtained following the works of [17] [18] [19] and [20] .
Now expressing the proposed estimator in terms of $e_0$ and $e_1$ we have
(42)
where
, we assume that
so that
is expandable.
Expanding the right hand side of (42) and multiplying out we have
(43)
Neglecting terms in $e_0$ and $e_1$ of power greater than two, we have
(44)
Taking the expectation on both sides of (44)
(45)
we have our bias
(46)
Squaring both sides of (44) and neglecting terms in $e_0$ and $e_1$ of power greater than two, we have
(47)
Taking the expectation on both sides of (47)
(48)
We get the proposed estimator's Mean Squared Error as
(49)
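As an illustration of the expansion technique used in this section, the same steps can be written out for the simpler ratio estimator of [7] rather than for the proposed estimator. The sketch assumes the standard form $\hat{S}_R^2 = s_y^2 S_x^2 / s_x^2$ and the usual relative errors $e_0 = (s_y^2 - S_y^2)/S_y^2$ and $e_1 = (s_x^2 - S_x^2)/S_x^2$ with $E(e_0) = E(e_1) = 0$ and $|e_1| < 1$; terms of power greater than two are neglected.

```latex
\begin{align*}
\hat{S}_R^2 &= s_y^2\,\frac{S_x^2}{s_x^2}
             = S_y^2\,(1 + e_0)(1 + e_1)^{-1} \\
            &\approx S_y^2\left(1 + e_0 - e_1 + e_1^2 - e_0 e_1\right), \\
\operatorname{Bias}\left(\hat{S}_R^2\right)
            &\approx S_y^2\left[E\left(e_1^2\right) - E\left(e_0 e_1\right)\right], \\
\operatorname{MSE}\left(\hat{S}_R^2\right)
            &\approx S_y^4\,E\left[\left(e_0 - e_1\right)^2\right]
             = S_y^4\left[E\left(e_0^2\right) + E\left(e_1^2\right) - 2E\left(e_0 e_1\right)\right].
\end{align*}
```

Substituting the first-order expectations of $e_0^2$, $e_1^2$ and $e_0 e_1$ then gives the bias and MSE in terms of the population parameters.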
4. Theoretical Comparison
The theoretical conditions under which the proposed modified ratio type estimator is more efficient than the other existing estimators are obtained from the MSEs of the existing estimators, given to the first degree of approximation in general as
(50)
Using Equations (49) and (50), we have that
,
if
(51)
5. Empirical Studies
Using the data from Population I (source: [21], p. 228) and Population II (source: [22]), we compare the performance of the proposed estimator under the simple random sampling without replacement (SRSWOR) scheme with that of the sample variance and the existing estimators. We apply the proposed and existing estimators to these data sets; the summary statistics are given below:
Population I
X = Fixed capital
Y = Output of 80 factories
,
,
,
,
,
,
,
,
,
,
,
,
,
,
Population II
X = Acreage under wheat crop in 1973
Y = Acreage under wheat crop in 1974
,
,
,
,
,
,
,
,
,
,
,
,
,
.
Using the above summary values we obtain the results in Table 1. From the Mean Squared Errors in the table it is clear that our proposed modified ratio type population variance estimator has the least Mean Squared Error (MSE).
Table 1. Bias and Mean Squared Errors (MSE).
The efficiency of our proposed estimator is examined numerically by its Percentage Relative Efficiency (PRE) in comparison with those of the existing estimators, using real populations from [21], p. 228 and [22].
We have computed the PRE(s) of the estimators
using the formulae
(52)
(53)
(54)
Then PRE for our proposed estimator is subsequently,
(55)
(56)
(57)
Using formulae (54) and (57) we compute the Percent Relative Efficiencies and tabulate the results in Table 2.
Table 2. Percent Relative Efficiencies (PRE).
Percentage Relative Efficiency is used to measure the efficiency of one estimator relative to another. From the findings summarized in Table 2 it is clear that our proposed estimator performed best, that is, it has the highest PRE among all the estimators considered. This implies that, for populations like these, the proposed estimator can be applied in practical situations to obtain more efficient results than the traditional and other existing population variance estimators.
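A PRE computation of this kind can be sketched as follows. The sketch assumes the conventional definition, the MSE of a reference estimator divided by the MSE of the estimator of interest, times 100; the function name and the MSE values are hypothetical and are not the paper's results.

```python
def percent_relative_efficiency(mse_reference, mse_estimator):
    """PRE of an estimator relative to a reference estimator, in percent."""
    if mse_estimator <= 0:
        raise ValueError("mse_estimator must be positive")
    return 100.0 * mse_reference / mse_estimator

# Hypothetical MSE values (illustrative only)
pre_new = percent_relative_efficiency(mse_reference=200.0, mse_estimator=50.0)   # 400.0
pre_same = percent_relative_efficiency(mse_reference=50.0, mse_estimator=50.0)   # 100.0
```

A PRE above 100 indicates that the estimator has a smaller MSE than the reference, so larger values correspond to more efficient estimators.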
6. Conclusions
In this study we have proposed a modified ratio type population variance estimator that uses two known population parameters of the auxiliary variable x: the coefficient of kurtosis and the median.
We have analyzed the performance of the proposed estimator against the usual unbiased variance estimator and existing estimators on two natural populations by comparing their PREs.
Based on the results of our study, the proposed estimator performs better than the other existing estimators, having the highest Percentage Relative Efficiency, and hence can be applied in practice where the population parameters of the auxiliary variable are known. We also suggest that the proposed estimator could be improved further by retaining more than two terms of the Taylor series.
Acknowledgements
We are grateful to the authors for their numerous and valuable contributions to this work.
Conflicts of Interest
The authors declare that there is no conflict of interest in the publication of this paper.