Improved Estimation of Rare Sensitive Attribute in a Stratified Sampling Using Poisson Distribution

In this study, we propose a two-stage randomized response model. Improved unbiased estimators of the mean number of persons possessing a rare sensitive attribute are proposed for two situations: when the parameter of the rare unrelated attribute is known and when it is unknown. The proposed estimators are evaluated through a relative efficiency comparison. It is shown that the proposed estimators are more efficient than existing estimators when the parameter of the rare unrelated attribute is known, and, in the unknown case, depending on the probability of selecting a question.


Introduction
Collecting data through direct questioning on rare sensitive issues, such as extramarital affairs, family disturbances, or declaring a religious affiliation under conditions of extremism, is a far-reaching problem. Warner [1] introduced the randomized response procedure to obtain trustworthy data for estimating $\pi$, the proportion of respondents in the population belonging to the sensitive group. Greenberg et al. [2] suggested an unrelated question randomized response model. In this model each individual selected in the sample was asked to reply "yes" or "no" to one of two statements, (a) "Do you belong to Group A?" and (b) "Do you belong to Group Y?", with respective probabilities $P$ and $(1-P)$. The probability of a "yes" response, $\theta_0$, defined by them is $\theta_0 = P\pi_A + (1-P)\pi_Y$, where $\pi_A$ and $\pi_Y$ are the proportions of the population belonging to groups A and Y, respectively.
Mangat and Singh [3] proposed a two-stage randomized response procedure that requires two randomization devices. The randomization device $R_1$ consists of two statements, namely (a) "I belong to the sensitive group" and (b) "Go to randomization device $R_2$", with probabilities $T$ and $(1-T)$, respectively. The randomization device $R_2$ uses the two statements (a) "I belong to the sensitive group" and (b) "I do not belong to the sensitive group" with known probabilities $P$ and $(1-P)$, respectively. Then $\theta_0$, the probability of a "yes" response, is
$$\theta_0 = T\pi + (1-T)\left[P\pi + (1-P)(1-\pi)\right].$$
Later on, various modifications have been made to improve the methodology for collecting such information. Among them are Lee et al. [4], Chaudhuri and Mukerjee [5], Mahmood et al. [6], Land et al. [7], and Bhargava and Singh [8].
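As a minimal sketch, the two "yes"-probability formulas of the earlier models can be evaluated directly. These are the unrelated question model, $\theta_0 = P\pi_A + (1-P)\pi_Y$, and the Mangat and Singh two-stage model, $\theta_0 = T\pi + (1-T)[P\pi + (1-P)(1-\pi)]$. The function names and the numerical values in the usage example are illustrative assumptions, not taken from the paper.

```python
def greenberg_yes_probability(pi_a: float, pi_y: float, p: float) -> float:
    """Unrelated question model of Greenberg et al.:
    theta_0 = P * pi_A + (1 - P) * pi_Y."""
    return p * pi_a + (1 - p) * pi_y


def mangat_singh_yes_probability(pi: float, t: float, p: float) -> float:
    """Two-stage model of Mangat and Singh:
    theta_0 = T * pi + (1 - T) * [P * pi + (1 - P) * (1 - pi)]."""
    return t * pi + (1 - t) * (p * pi + (1 - p) * (1 - pi))


if __name__ == "__main__":
    # Hypothetical proportions and design probabilities, for illustration only.
    print(greenberg_yes_probability(pi_a=0.02, pi_y=0.05, p=0.7))
    print(mangat_singh_yes_probability(pi=0.02, t=0.3, p=0.7))
```

With $T = 0$ the Mangat and Singh probability reduces to Warner's design, $P\pi + (1-P)(1-\pi)$. With $T = 1$, every respondent answers the sensitive statement directly.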
Land et al. [7] proposed estimators of the mean number of persons possessing a rare sensitive attribute, using the unrelated question randomized response model together with the Poisson distribution. Recently, Lee et al. [4] extended the study of Land et al. [7] to stratified sampling. They proposed estimators for the cases in which the parameter of the rare unrelated attribute is known and unknown.
In this study, we propose improved estimators of the mean number of persons possessing a rare sensitive attribute, together with their variances, based on stratified sampling and the Poisson distribution. The estimators are proposed both when the parameter of the rare unrelated attribute is known and when it is unknown. They are evaluated through their relative efficiency, comparing their variances with those of the estimators reported in Lee et al. [4].

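Improved Estimation of a Rare Sensitive Attribute in Stratified Sampling-Known Rare Unrelated Attributes
Consider a population of $N$ individuals divided into $L$ subpopulations (strata) of sizes $N_1, N_2, \ldots, N_L$, so that $\sum_{h=1}^{L} N_h = N$. The subpopulations are disjoint and together comprise the whole population. In stratum $h$, $n_h$ respondents are selected by simple random sampling with replacement (SRSWR) and asked to use a pair of randomization devices $R_1$ and $R_2$, with probabilities $T_h$ and $P_h$, respectively. $R_1$ presents the statement "I possess rare sensitive attribute A" with probability $T_h$ and directs the respondent to $R_2$ with probability $(1-T_h)$. $R_2$ presents the statements "I possess rare sensitive attribute A" and "I possess rare unrelated attribute Y" with probabilities $P_h$ and $(1-P_h)$. With these randomization devices, the probability of a "yes" response in stratum $h$ is
$$\theta_h = T_h\pi_{hA} + (1-T_h)\left[P_h\pi_{hA} + (1-P_h)\pi_{hY}\right],$$
where $\pi_{hA}$ and $\pi_{hY}$ are the population proportions of individuals possessing the rare sensitive and the rare unrelated attributes in the $h$th stratum, respectively. Here $\pi_{hY}$ is assumed to be known. Since A and Y are very rare attributes, let $y_{h1}, y_{h2}, \ldots, y_{hn_h}$ be a random sample of size $n_h$ in stratum $h$ from a Poisson distribution with parameter
$$\lambda_{h0} = n_h\theta_h = \left[T_h + (1-T_h)P_h\right]\lambda_{hA} + (1-T_h)(1-P_h)\lambda_{hY},$$
where $\lambda_{hA} = n_h\pi_{hA}$ and $\lambda_{hY} = n_h\pi_{hY}$.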
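Then the maximum likelihood estimator of $\lambda_{hA}$, the mean number of persons who have the rare sensitive attribute in stratum $h$, is
$$\hat\lambda_{hA} = \frac{\bar y_h - (1-T_h)(1-P_h)\lambda_{hY}}{T_h + (1-T_h)P_h}, \qquad \bar y_h = \frac{1}{n_h}\sum_{i=1}^{n_h} y_{hi}.$$
Here $T_h$ and $P_h$ are the probabilities used by the two randomization devices, and $\lambda_{hY} = n_h\pi_{hY}$ is the (known) mean number of persons who have the rare unrelated attribute in stratum $h$. The parameter $\lambda_A$ is the mean number of persons possessing the rare sensitive attribute A in the population of size $N$. Its estimator $\hat\lambda_A$ is given by
$$\hat\lambda_A = \sum_{h=1}^{L} W_h\hat\lambda_{hA} = \sum_{h=1}^{L} W_h\,\frac{\bar y_h - (1-T_h)(1-P_h)\lambda_{hY}}{T_h + (1-T_h)P_h},$$
where $W_h = N_h/N$. The variance of the estimator $\hat\lambda_{hA}$ in each stratum is
$$V(\hat\lambda_{hA}) = \frac{\lambda_{h0}}{n_h\left[T_h + (1-T_h)P_h\right]^2}.$$
Thus, since samples are drawn independently in different strata, the variance of the estimator $\hat\lambda_A$ is
$$V(\hat\lambda_A) = \sum_{h=1}^{L} W_h^2\,\frac{\lambda_{h0}}{n_h\left[T_h + (1-T_h)P_h\right]^2}.$$
THEOREM 1. $\hat\lambda_A$ is an unbiased estimator of $\lambda_A$.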
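Proof. Since $E(\bar y_h) = \lambda_{h0} = [T_h + (1-T_h)P_h]\lambda_{hA} + (1-T_h)(1-P_h)\lambda_{hY}$, each $\hat\lambda_{hA}$ is unbiased for $\lambda_{hA}$. Hence $E(\hat\lambda_A) = \sum_{h=1}^{L} W_h\lambda_{hA} = \lambda_A$.
Now, we consider the proportional and optimum allocations of the total sample size $n$ to the different strata. Proportional allocation defines the sample size in each stratum according to the stratum size, $n_h = nN_h/N = nW_h$ with $W_h = N_h/N$. Under proportional allocation, the variance of $\hat\lambda_A$ is
$$V_P(\hat\lambda_A) = \frac{1}{n}\sum_{h=1}^{L} W_h\,\frac{\lambda_{h0}}{\left[T_h + (1-T_h)P_h\right]^2}.$$
Optimum allocation, in contrast, defines the sample sizes so as to minimize the variance for a given cost, or to minimize the cost for a specified variance. The stratum sample size $n_h$ is then proportional to the standard deviation $S_h$ of the variable in that stratum. In stratified sampling, let the cost function be $C = c_0 + \sum_{h=1}^{L} c_h n_h$, where $c_0$ is the fixed cost and $c_h$ is the cost per respondent in stratum $h$. Within each stratum the cost is proportional to the sample size, but $c_h$ may vary from stratum to stratum. For a fixed cost, the Cauchy-Schwarz inequality gives the sample size that minimizes $V(\hat\lambda_A)$ as
$$n_h = (C - c_0)\,\frac{W_h S_h/\sqrt{c_h}}{\sum_{h=1}^{L} W_h S_h\sqrt{c_h}}, \qquad S_h = \frac{\sqrt{\lambda_{h0}}}{T_h + (1-T_h)P_h}.$$
So the minimum variance of the estimator for the specified cost $C$ under optimum allocation is
$$V_{\min}(\hat\lambda_A) = \frac{\left(\sum_{h=1}^{L} W_h S_h\sqrt{c_h}\right)^2}{C - c_0}.$$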
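The known-case estimator and the two allocations can be sketched as follows. This is a minimal illustration under several assumptions. The two-stage device is assumed to give $\lambda_{h0} = [T_h + (1-T_h)P_h]\lambda_{hA} + (1-T_h)(1-P_h)\lambda_{hY}$. The per-unit variance $S_h^2 = \lambda_{h0}/[T_h + (1-T_h)P_h]^2$ is treated as not depending on $n_h$. All function names and numbers are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def device_coefficients(t: float, p: float) -> Tuple[float, float]:
    """Assumed two-stage device: lambda_h0 = a * lambda_hA + b * lambda_hY,
    with a = T + (1 - T) P and b = (1 - T)(1 - P)."""
    return t + (1 - t) * p, (1 - t) * (1 - p)


def estimate_lambda_ha(ybar: float, lam_y: float, t: float, p: float) -> float:
    """Stratum estimator when lambda_hY is known: (ybar_h - b * lambda_hY) / a."""
    a, b = device_coefficients(t, p)
    return (ybar - b * lam_y) / a


def unit_variance(lam_a: float, lam_y: float, t: float, p: float) -> float:
    """S_h^2 = lambda_h0 / a^2, so that V(lambda_hA hat) = S_h^2 / n_h."""
    a, b = device_coefficients(t, p)
    return (a * lam_a + b * lam_y) / a ** 2


def stratified_variance(weights: Sequence[float], s2: Sequence[float],
                        sizes: Sequence[float]) -> float:
    """V(lambda_A hat) = sum_h W_h^2 * S_h^2 / n_h."""
    return sum(w * w * s / n for w, s, n in zip(weights, s2, sizes))


def proportional_allocation(n: int, weights: Sequence[float]) -> List[float]:
    """n_h = n * W_h."""
    return [n * w for w in weights]


def optimum_allocation(total_cost: float, fixed_cost: float,
                       weights: Sequence[float], s2: Sequence[float],
                       unit_costs: Sequence[float]) -> Tuple[List[float], float]:
    """Minimise sum_h W_h^2 S_h^2 / n_h subject to c_0 + sum_h c_h n_h = C.
    Cauchy-Schwarz gives n_h proportional to W_h S_h / sqrt(c_h);
    returns the stratum sizes and the minimum variance."""
    budget = total_cost - fixed_cost
    sds = [math.sqrt(s) for s in s2]
    denom = sum(w * s * math.sqrt(c) for w, s, c in zip(weights, sds, unit_costs))
    sizes = [budget * w * s / (math.sqrt(c) * denom)
             for w, s, c in zip(weights, sds, unit_costs)]
    return sizes, denom ** 2 / budget


if __name__ == "__main__":
    weights = [0.4, 0.6]
    # Hypothetical per-stratum (lambda_hA, lambda_hY, T_h, P_h).
    params = [(0.5, 1.5, 0.3, 0.7), (1.5, 0.5, 0.3, 0.7)]
    s2 = [unit_variance(*prm) for prm in params]
    print(stratified_variance(weights, s2, proportional_allocation(10000, weights)))
    sizes, v_min = optimum_allocation(5000.0, 500.0, weights, s2, [0.5, 0.4])
    print(sizes, v_min)
```

Returning the minimum variance together with the sizes makes it easy to compare the optimum against any other allocation with the same cost.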

Improved Estimation of a Rare Sensitive Attribute in Stratified Sampling-Unknown Rare Unrelated Attributes
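In this section, estimators of the mean number of persons possessing the rare sensitive attribute are proposed under the assumption that the stratum sizes are known but $\lambda_{hY} = n_h\pi_{hY}$, the mean of the rare unrelated attribute, is unknown. In this case, each respondent selected from stratum $h$ is asked to use the pair of randomization devices twice in sequence. In the $h$th stratum, the $n_h$ respondents first use the randomization devices $R_1$ and $R_2$ with probabilities $T_{h1}, (1-T_{h1})$ and $P_{h1}, (1-P_{h1})$, respectively. After using the first pair of randomization devices, each respondent is asked to use the same pair of devices again with probabilities $T_{h2}, (1-T_{h2})$ and $P_{h2}, (1-P_{h2})$, respectively. The probabilities of a "yes" response for the first and second use of the pair of randomization devices are, respectively,
$$\theta_{h1} = T_{h1}\pi_{hA} + (1-T_{h1})\left[P_{h1}\pi_{hA} + (1-P_{h1})\pi_{hY}\right], \tag{12}$$
$$\theta_{h2} = T_{h2}\pi_{hA} + (1-T_{h2})\left[P_{h2}\pi_{hA} + (1-P_{h2})\pi_{hY}\right], \tag{13}$$
where $\pi_{hA}$ and $\pi_{hY}$ are the population proportions of the rare sensitive and rare unrelated attributes in stratum $h$. As $n_h$ is large and $(\pi_{hA}, \pi_{hY}) \to 0$, the responses follow Poisson distributions with parameters
$$\lambda_{hj} = n_h\theta_{hj} = a_{hj}\lambda_{hA} + b_{hj}\lambda_{hY}, \qquad j = 1, 2, \tag{14}$$
where $a_{hj} = T_{hj} + (1-T_{hj})P_{hj}$ and $b_{hj} = (1-T_{hj})(1-P_{hj})$. Let $(x_{hi}, y_{hi})$ be the pair of responses from the $i$th respondent selected in the $h$th stratum. Following Equations (12) and (13), the sample means of the two sets of responses satisfy
$$E(\bar x_h) = E\Big(\frac{1}{n_h}\sum_{i=1}^{n_h} x_{hi}\Big) = \lambda_{h1} \tag{15}$$
and
$$E(\bar y_h) = E\Big(\frac{1}{n_h}\sum_{i=1}^{n_h} y_{hi}\Big) = \lambda_{h2}. \tag{16}$$
Replacing the expectations in (15) and (16) by the sample means and solving, we get the estimators of $\lambda_{hA}$ and $\lambda_{hY}$ as
$$\hat\lambda_{hA} = \frac{b_{h2}\bar x_h - b_{h1}\bar y_h}{a_{h1}b_{h2} - a_{h2}b_{h1}} \tag{17}$$
and
$$\hat\lambda_{hY} = \frac{a_{h1}\bar y_h - a_{h2}\bar x_h}{a_{h1}b_{h2} - a_{h2}b_{h1}}, \tag{18}$$
provided $a_{h1}b_{h2} \neq a_{h2}b_{h1}$. Putting (12), (13) and (14) in (19), we get (20). The stratified estimators of $\lambda_A$ and $\lambda_Y$ are defined as
$$\hat\lambda_A = \sum_{h=1}^{L} W_h\hat\lambda_{hA} \quad \text{and} \quad \hat\lambda_Y = \sum_{h=1}^{L} W_h\hat\lambda_{hY}.$$
THEOREM 3. $\hat\lambda_A$ is an unbiased estimator of $\lambda_A$.
Proof.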
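We have $E(\hat\lambda_A) = \sum_{h=1}^{L} W_h E(\hat\lambda_{hA})$. Because $\hat\lambda_{hA}$ in (17) is linear in $\bar x_h$ and $\bar y_h$, $E(\hat\lambda_{hA})$ follows from (17) on replacing $\bar x_h$ and $\bar y_h$ by their expectations $\lambda_{h1}$ and $\lambda_{h2}$ given in (15) and (16). Putting the values of $\lambda_{h1}$ and $\lambda_{h2}$ in Equation (22), we get $E(\hat\lambda_A) = \sum_{h=1}^{L} W_h\lambda_{hA} = \lambda_A$.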
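The two-equation solve behind the unknown-case estimators can be sketched as below. The sketch assumes that each use of the device pair has the same two-stage structure as in the known case. Then $\lambda_{hj} = a_{hj}\lambda_{hA} + b_{hj}\lambda_{hY}$, with $a_{hj} = T_{hj} + (1-T_{hj})P_{hj}$ and $b_{hj} = (1-T_{hj})(1-P_{hj})$. The function names, design probabilities and sample means are hypothetical.

```python
from typing import Tuple


def coefficients(t: float, p: float) -> Tuple[float, float]:
    """a = T + (1 - T) P and b = (1 - T)(1 - P); note that a + b = 1."""
    return t + (1 - t) * p, (1 - t) * (1 - p)


def estimate_unknown_case(xbar: float, ybar: float,
                          first: Tuple[float, float],
                          second: Tuple[float, float]) -> Tuple[float, float]:
    """Solve xbar = a1*lam_A + b1*lam_Y and ybar = a2*lam_A + b2*lam_Y.

    `first` and `second` are the (T, P) pairs of the two uses of the devices.
    Returns (lambda_hA hat, lambda_hY hat).
    """
    a1, b1 = coefficients(*first)
    a2, b2 = coefficients(*second)
    det = a1 * b2 - a2 * b1  # equals a1 - a2 because a + b = 1
    if abs(det) < 1e-12:
        raise ValueError("the two uses must differ in T + (1 - T) P")
    lam_a = (b2 * xbar - b1 * ybar) / det
    lam_y = (a1 * ybar - a2 * xbar) / det
    return lam_a, lam_y


if __name__ == "__main__":
    # Hypothetical design and sample means for one stratum.
    print(estimate_unknown_case(0.78, 1.06, first=(0.3, 0.6), second=(0.2, 0.3)))
```

The determinant check reflects a design requirement under this assumed structure. Since $a_{hj} + b_{hj} = 1$, the system is solvable only when $a_{h1} \neq a_{h2}$. In other words, the two uses of the devices must present the sensitive statement with different overall probabilities.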
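THEOREM 4. The variance of $\hat\lambda_A$ is given by
$$V(\hat\lambda_A) = \sum_{h=1}^{L} W_h^2\,V(\hat\lambda_{hA}),$$
where $V(\hat\lambda_{hA})$ is the variance of the stratum estimator $\hat\lambda_{hA}$.
Proof. The samples are drawn independently in different strata. On putting (20) in (24), we have the theorem.
Corollary 1: An unbiased estimator of the variance of $\hat\lambda_A$ can be constructed, and its unbiasedness can be proved easily.
THEOREM 5. $\hat\lambda_Y$ is an unbiased estimator of $\lambda_Y$.
Proof. From (18), taking expectations and using (15) and (16), we have $E(\hat\lambda_Y) = \sum_{h=1}^{L} W_h E(\hat\lambda_{hY}) = \sum_{h=1}^{L} W_h\lambda_{hY} = \lambda_Y$.
Corollary 2: An unbiased estimator of $V(\hat\lambda_Y)$ can be obtained in the same way.
Write $S_{hA}^2 = n_h V(\hat\lambda_{hA})$. Under proportional allocation of the sample size ($n_h = nW_h$), the variance of $\hat\lambda_A$ is
$$V_P(\hat\lambda_A) = \frac{1}{n}\sum_{h=1}^{L} W_h S_{hA}^2.$$
In optimum allocation, the sample size in stratum $h$ is
$$n_h = (C - c_0)\,\frac{W_h S_{hA}/\sqrt{c_h}}{\sum_{h=1}^{L} W_h S_{hA}\sqrt{c_h}},$$
and the variance of $\hat\lambda_A$ is
$$V_{\min}(\hat\lambda_A) = \frac{\left(\sum_{h=1}^{L} W_h S_{hA}\sqrt{c_h}\right)^2}{C - c_0}.$$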
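Theorems 3 and 5 can be checked numerically with a small Monte Carlo sketch. It makes three assumptions. First, the two responses of a respondent are treated as independent Poisson draws with means $\lambda_{h1}$ and $\lambda_{h2}$; unbiasedness does not depend on this, although the variance in Theorem 4 does. Second, it uses the same hypothetical structure $\lambda_{hj} = a_{hj}\lambda_{hA} + b_{hj}\lambda_{hY}$. Third, it draws Poisson variates with Knuth's multiplication method from the standard library. All parameter values are illustrative.

```python
import math
import random
from typing import Tuple


def poisson_draw(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for small lam."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def coefficients(t: float, p: float) -> Tuple[float, float]:
    """a = T + (1 - T) P and b = (1 - T)(1 - P) for one use of the devices."""
    return t + (1 - t) * p, (1 - t) * (1 - p)


def simulate_stratum(lam_a, lam_y, first, second, n_h, rng):
    """Draw one sample of size n_h; return (lambda_hA hat, lambda_hY hat)."""
    a1, b1 = coefficients(*first)
    a2, b2 = coefficients(*second)
    lam1 = a1 * lam_a + b1 * lam_y
    lam2 = a2 * lam_a + b2 * lam_y
    xbar = sum(poisson_draw(lam1, rng) for _ in range(n_h)) / n_h
    ybar = sum(poisson_draw(lam2, rng) for _ in range(n_h)) / n_h
    det = a1 * b2 - a2 * b1
    return (b2 * xbar - b1 * ybar) / det, (a1 * ybar - a2 * xbar) / det


def monte_carlo(strata, weights, reps=200, seed=2024):
    """Average the stratified estimators sum_h W_h * lambda_h hat over reps samples."""
    rng = random.Random(seed)
    total_a = total_y = 0.0
    for _ in range(reps):
        for w, (lam_a, lam_y, first, second, n_h) in zip(weights, strata):
            est_a, est_y = simulate_stratum(lam_a, lam_y, first, second, n_h, rng)
            total_a += w * est_a
            total_y += w * est_y
    return total_a / reps, total_y / reps


if __name__ == "__main__":
    weights = [0.4, 0.6]
    # Hypothetical (lambda_hA, lambda_hY, (T_h1, P_h1), (T_h2, P_h2), n_h) per stratum.
    strata = [(0.5, 1.5, (0.3, 0.6), (0.2, 0.3), 400),
              (1.5, 0.5, (0.3, 0.6), (0.2, 0.3), 600)]
    est_a, est_y = monte_carlo(strata, weights)
    print(est_a, 0.4 * 0.5 + 0.6 * 1.5)  # Monte Carlo mean vs lambda_A
    print(est_y, 0.4 * 1.5 + 0.6 * 0.5)  # Monte Carlo mean vs lambda_Y
```

The averages should lie close to $\lambda_A = \sum_h W_h\lambda_{hA}$ and $\lambda_Y = \sum_h W_h\lambda_{hY}$, and they move closer as `reps` grows.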

Relative Efficiency
Lee et al. [4] derived the variance of $\hat\lambda_A$ for the rare sensitive attribute based on the Poisson distribution, both when the rare unrelated attribute is known and when it is unknown. To compare the proposed estimator with theirs, the relative efficiency is defined as the ratio of the variance of the Lee et al. [4] estimator to the variance of the proposed estimator. Values greater than 1 therefore indicate that the proposed estimator is more efficient. Large samples are required to estimate the means of rare sensitive attributes, so we study the relative efficiency on a large hypothetical population. We set $n = 10000$, with two strata having $n_1 = 4000$ and $n_2 = 6000$. We choose values of the parameters $(\lambda_A, \lambda_Y)$ and let $P_{12}$ range from 0.3 to 0.7 and $P_{11}$ from 0.6 to 0.9, for given values of the stratum weight $W_1$.

Relative Efficiency When Rare Unrelated Attribute Is Known
Let $V_1(\hat\lambda_A)$ be the variance of the proposed estimator $\hat\lambda_A$ of the rare sensitive attribute when the parameter of the rare unrelated attribute is known. The relative efficiency of the proposed estimator with respect to the estimator of Lee et al. [4] is given by Equation (29). From Equation (29), it is evident that the relative efficiency of the proposed estimator is free of the sample size $n$. We set the design probabilities as $P_{11} = P_{21}$ and $P_{12} = P_{22}$. Table 1 gives the relative efficiencies for $(\lambda_A, \lambda_Y) = (0.5, 1.5)$, $(1.5, 0.5)$, $(1.5, 1.5)$ and $(0.5, 0.5)$. In these cases $P_{12}$ varies from 0.3 to 0.7, $P_{11}$ varies from 0.6 to 0.9, and $W_1 = 0.4$ and $0.6$. The proposed estimator has a relative efficiency greater than 1 in all cases, and is therefore always better than the estimator of Lee et al. [4]. Figure 1 confirms this.
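The claim that the relative efficiency exceeds 1 and does not depend on $n$ can be explored with the sketch below. It rests on several assumptions:
- The Lee et al. [4] known-case estimator uses a single unrelated-question device with variance $\sum_h W_h^2[P_h\lambda_{hA} + (1-P_h)\lambda_{hY}]/(n_h P_h^2)$.
- The proposed estimator uses the two-stage device with the same $P_h$ plus an extra probability $T_h$.
- Allocation is proportional.
- The parameters are the same in both strata.

How $T_h$ and $P_h$ map onto the paper's $P_{11}$ and $P_{12}$ is not recoverable here, so the grid values are illustrative only.

```python
from typing import Sequence


def stratified_variance(weights: Sequence[float], n: int,
                        lam_a: Sequence[float], lam_y: Sequence[float],
                        a: float, b: float) -> float:
    """sum_h W_h^2 (a*lam_hA + b*lam_hY) / (n_h a^2) with n_h = n * W_h."""
    return sum(w * w * (a * la + b * ly) / (n * w * a * a)
               for w, la, ly in zip(weights, lam_a, lam_y))


def relative_efficiency_known(weights, n, lam_a, lam_y, t, p) -> float:
    """RE = V(Lee et al. estimator) / V(proposed estimator), under the stated assumptions."""
    v_lee = stratified_variance(weights, n, lam_a, lam_y, a=p, b=1 - p)
    v_new = stratified_variance(weights, n, lam_a, lam_y,
                                a=t + (1 - t) * p, b=(1 - t) * (1 - p))
    return v_lee / v_new


if __name__ == "__main__":
    weights = [0.4, 0.6]
    for lam_a_value, lam_y_value in [(0.5, 1.5), (1.5, 0.5), (1.5, 1.5), (0.5, 0.5)]:
        lam_a = [lam_a_value] * 2  # same parameters in both strata (assumption)
        lam_y = [lam_y_value] * 2
        for t in (0.3, 0.5, 0.7):
            for p in (0.6, 0.75, 0.9):
                eff = relative_efficiency_known(weights, 10000, lam_a, lam_y, t, p)
                print(f"lam=({lam_a_value}, {lam_y_value}) T={t} P={p} RE={eff:.3f}")
```

With proportional allocation every stratum variance scales as $1/n$, so $n$ cancels in the ratio. Under these assumptions the ratio is also strictly greater than 1 whenever $T_h > 0$ and $P_h < 1$. This follows because $T_h + (1-T_h)P_h > P_h$ and $(1-T_h)(1-P_h) < 1-P_h$.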

Relative Efficiency When Rare Unrelated Attribute Is Unknown
Let $V_2(\hat\lambda_A)$ be the variance of the proposed estimator $\hat\lambda_A$ of the rare sensitive attribute when the parameter of the rare unrelated attribute is unknown. The relative efficiency of the proposed estimator is defined, as before, as the ratio of the variance of the Lee et al. [4] estimator to $V_2(\hat\lambda_A)$, and it is again free of the sample size $n$. For the analysis, $(\lambda_A, \lambda_Y)$ is set to $(0.5, 1.5)$, $(1.5, 0.5)$ and $(1.5, 1.5)$. The design probabilities and weights are fixed as $P_{11} = 0.6$, $T_{11} = 0.3, 0.4$, $T_{12} = 0.2, 0.3, 0.4, 0.5$ and $W_1 = 0.4, 0.5$. The results show that as $W_1$ increases, the relative efficiency of the proposed estimator increases.

Conclusion
In this study, a two-stage randomized response model is proposed, with improved estimators of the mean number of persons possessing a rare sensitive attribute and of their variances, based on stratified sampling and the Poisson distribution. The proposed method is shown to be more efficient than the existing randomized response model when the parameter of the rare unrelated attribute is known, and, in the unknown case, depending on the probability of selecting a question. In future work, more sensitive information could be obtained from respondents by applying the proposed model with stratified double sampling.

The randomization device $R_2$ consists of two statements: (i) "I possess rare sensitive attribute A" and (ii) "I possess rare unrelated attribute Y".

Figure 1. Relative efficiency (RE) of the proposed model with respect to Lee et al. [4] for $W_1 = 0.4$ and $P_{12} = 0.3$ to $0.8$.

Figure 2. Relative efficiency (RE) of the proposed model with respect to Lee et al. [4] for the indicated values.

Table 1. Relative efficiency of the proposed estimator with respect to Lee et al. (2013).