Group testing pools a number of units together and performs a single test on the resulting group. It is an appealing option when few individual units are thought to be infected, because it reduces testing costs compared with testing each unit individually. Group testing aims either to identify the positive groups among all the groups tested or to estimate the proportion of positives (p) in a population. Interval estimation methods for proportions in group testing with unequal group sizes, adjusted for overdispersion, have been examined. Recent improvements in statistical methods allow the construction of highly accurate confidence intervals (CIs). The aim here is to apply group testing for estimation and to generate highly accurate bootstrap CIs for the proportion of defective or positive units. This study compares several established methods of constructing CIs for a binomial proportion after adjusting for overdispersion in group testing with groups of unequal sizes. Bootstrap resampling was applied to data simulated from a binomial distribution, and confidence intervals with high coverage probabilities were produced. The data were assumed to be overdispersed, independent between groups but correlated within groups. Interval estimation methods based on the Wald, logit and complementary log-log (CLL) functions were considered. The main criterion in the comparisons is the coverage probability attained by nominal 95% CIs, though interval width is also considered. Bootstrapping produced CIs with high coverage probabilities for each of the three interval methods.

Group testing originated with [

Group tests save resources since many units are tested without testing them individually. One advantage of group testing as a method of estimation is that it is time efficient. In most of the studies carried out, the experimental unit or group is a litter. Littermates tend to respond more alike than animals from different litters, the “litter effect”. This litter effect is also known as extra-dispersion (over- or under-dispersion) or intra-litter correlation. The litters may be of equal or unequal sizes. The concern here is with methods of establishing a confidence interval for the proportion of defective units p in unequal groups, with adjustment for overdispersion, using the bootstrap technique.

Overdispersion is the phenomenon of having greater variability than predicted by the random component of the model; this is common in the modeling of binomial distribution for group testing [

Maximum likelihood estimation gives a unified, well-defined approach to estimation in the binomial distribution and many other problems. The maximum likelihood estimator has been studied and endorsed as an approach for using the proposed extended Beta-Binomial (BB) model to analyze over/under-dispersed proportions [

Studies on point estimation, in terms of bias and efficiency, and on testing for the presence of overdispersion in both count and proportion data have been done [

The bootstrap method was introduced by [

To understand the bootstrap, suppose it were possible to draw repeated samples of the same size from the population of interest a large number of times. The collection of values of a particular statistic across these repeated samples would then give a fairly good idea of its sampling distribution. In practice this is not feasible: it would be too expensive and would defeat the purpose of a sample study, which is to gather information cheaply and in a timely fashion. The idea behind the bootstrap is to use the sample data at hand as a “surrogate population” for approximating the sampling distribution of a statistic, that is, to resample (with replacement) from the sample data and create a large number of “phantom samples” known as bootstrap samples. These samples can be used to obtain improved estimates of the unknown parameter(s) of a probability model.
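The resampling idea described above can be illustrated with a minimal sketch. It is not the procedure used in this study: the data, the helper names (`bootstrap_distribution`, `percentile_interval`, `proportion`) and the choice of a simple percentile interval are all assumptions made for illustration.

```python
import random

def proportion(values):
    """Sample proportion of positives in a 0/1 list."""
    return sum(values) / len(values)

def bootstrap_distribution(sample, statistic, n_boot=2000, seed=0):
    """Resample `sample` with replacement n_boot times and return
    the value of `statistic` on each bootstrap ("phantom") sample."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistic(rng.choices(sample, k=n)) for _ in range(n_boot)]

def percentile_interval(values, level=0.95):
    """Simple percentile interval from a list of bootstrap replicates."""
    ordered = sorted(values)
    alpha = (1.0 - level) / 2.0
    lower = ordered[int(alpha * (len(ordered) - 1))]
    upper = ordered[int((1.0 - alpha) * (len(ordered) - 1))]
    return lower, upper

# Hypothetical data: 1 = positive unit, 0 = negative unit (6 positives in 100).
sample = [1] * 6 + [0] * 94
replicates = bootstrap_distribution(sample, proportion)
low, high = percentile_interval(replicates)
print(f"estimate = {proportion(sample):.3f}, 95% bootstrap CI = ({low:.3f}, {high:.3f})")
```

The sorted replicates stand in for the sampling distribution of the statistic, so any summary of that distribution (standard error, quantiles) can be read off the same list.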

Recently, work has been done on the construction of bootstrap confidence intervals for the overdispersion parameter in equal proportions in the Beta-Binomial model using the maximum likelihood estimator, the method of moments estimator and quasi-likelihood [

Overdispersion causes one to underestimate the variance of parameter estimates. A quasi-likelihood approach can be employed to correct for the overdispersion phenomenon which occurs with binary data [

Suppose for

When unequal group sizes and overdispersion occur together, the variance of the quasi-score function of p is

The maximum quasi-likelihood estimate of p denoted as

where

And

Under the usual limiting conditions on the

Then the

where

Asymptotic-likelihood methods to construct the

Let

tiable, then the information that Y contains about

For the logit transformation

Thus the information of logit (p) is

After adjusting for overdispersion, the

where

Hepworth also presented the CLL parameter transformations.

Suppose

And

The information of

Then the

where

Considerable attention has been devoted to problems involving the group size, k: what value should be chosen for k, the number of units in each group? If k is too large, π is close to 1 and all groups are likely to test positive; if k is too small, π is close to 0 [
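The effect of group size can be seen numerically in a short sketch. It reads π as the probability that a group tests positive, which is an assumption here since the definition is not shown in this section, and it further assumes independent units and a perfect test, so that π = 1 − (1 − p)^k. The function name `prob_group_positive` is hypothetical.

```python
def prob_group_positive(p, k):
    """Probability that a pooled group of k units tests positive,
    assuming independent units with prevalence p and an error-free test."""
    return 1.0 - (1.0 - p) ** k

# Illustrative prevalence; the group sizes are arbitrary choices.
p = 0.02
for k in (1, 5, 10, 50, 100, 500):
    print(f"k = {k:4d}  pi = {prob_group_positive(p, k):.4f}")
```

With very large groups almost every group is positive and with very small groups almost none are, so in both extremes the group results carry little information about p.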

Each interval was examined for its width and coverage probability. Confidence intervals based on MLE ordering and Sterne’s technique for a dilution assay with 64 outcomes were compared [

Coverage probability gives the proportion of the time that the interval contains p. Bootstrapping enabled us to estimate the coverage of the CIs with respect to the number of hypothetical repetitions of the entire procedure. The nominal coverage is set at 0.95. The actual coverage of the three methods is compared with the nominal coverage; coverage above the nominal level indicates a conservative interval and is preferred.
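As a rough illustration of how coverage probability can be estimated by simulation, the sketch below uses a deliberately simplified setting: equal group sizes, no overdispersion adjustment, and standard delta-method versions of the Wald, logit and CLL intervals. These simplifications, the function names and the parameter values are assumptions for illustration and do not reproduce the adjusted intervals of this study.

```python
import math
import random

Z = 1.959964  # standard normal quantile for a nominal 95% interval

def estimate(t, n, k):
    """MLE of p and its delta-method standard error, given t positive
    groups out of n groups of equal size k (perfect test assumed)."""
    pi_hat = t / n
    p_hat = 1.0 - (1.0 - pi_hat) ** (1.0 / k)
    var = pi_hat * (1.0 - p_hat) ** (2 - k) / (n * k * k)
    return p_hat, math.sqrt(var)

def intervals(p_hat, se):
    """Wald, logit and complementary log-log intervals for p."""
    wald = (p_hat - Z * se, p_hat + Z * se)
    # Logit scale: theta = log(p / (1 - p)), se scaled by d(theta)/dp.
    theta = math.log(p_hat / (1.0 - p_hat))
    se_l = se / (p_hat * (1.0 - p_hat))
    expit = lambda x: 1.0 / (1.0 + math.exp(-x))
    logit = (expit(theta - Z * se_l), expit(theta + Z * se_l))
    # CLL scale: theta = log(-log(1 - p)).
    g = -math.log(1.0 - p_hat)
    theta = math.log(g)
    se_c = se / ((1.0 - p_hat) * g)
    inv = lambda x: 1.0 - math.exp(-math.exp(x))
    cll = (inv(theta - Z * se_c), inv(theta + Z * se_c))
    return {"Wald": wald, "Logit": logit, "CLL": cll}

def coverage(p, n_groups, k, reps=2000, seed=1):
    """Fraction of simulated data sets whose interval contains the true p."""
    rng = random.Random(seed)
    pi = 1.0 - (1.0 - p) ** k
    hits = {"Wald": 0, "Logit": 0, "CLL": 0}
    for _ in range(reps):
        t = sum(rng.random() < pi for _ in range(n_groups))
        if t == 0 or t == n_groups:
            continue  # degenerate sample: counted as a miss for every method
        p_hat, se = estimate(t, n_groups, k)
        for name, (lo, hi) in intervals(p_hat, se).items():
            if lo <= p <= hi:
                hits[name] += 1
    return {name: h / reps for name, h in hits.items()}

result = coverage(p=0.02, n_groups=200, k=10)
print(result)
```

Each estimated coverage can then be compared with the nominal 0.95; values above it indicate a conservative interval.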

First, the positive groups, the Y_{i}’s, were bootstrapped and the estimate of p was then computed. Coverage probabilities of the three interval methods for the estimate of p were obtained for 500 bootstrap simulations, and the program was run repeatedly in order to see how the coverage probabilities of the three methods vary.

The nominal coverage probability is set at 0.95.

This is then compared with estimating p from the same data, that is, the Y_{i}’s, without bootstrapping, with the program again run repeatedly to calculate the coverage probability.

The number of bootstrap simulations, B, was varied for the same combination of p,

Next, p was varied for the same N_{i}, k_{i} and the same number of bootstrap simulations (500), and coverage probabilities were obtained.

Lastly, interval widths were calculated for the three interval methods with bootstrapping and compared with the widths of the intervals obtained from each of the three methods without bootstrapping, in order to investigate whether bootstrapping gives more precise intervals.
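The simulation steps above could be sketched along the following lines. This is only an illustration under stated assumptions: it uses the ordinary maximum likelihood estimate of p rather than the quasi-likelihood estimate adjusted for overdispersion, a parametric bootstrap with a percentile interval, and invented values for N_{i} and k_{i}. The function names are hypothetical.

```python
import random

def score(p, N, k, Y):
    """Derivative of the binomial log-likelihood in p, where Y[i] of the
    N[i] groups of size k[i] tested positive (perfect test assumed)."""
    q = 1.0 - p
    total = 0.0
    for n_i, k_i, y_i in zip(N, k, Y):
        total += y_i * k_i * q ** (k_i - 1) / (1.0 - q ** k_i)
        total -= (n_i - y_i) * k_i / q
    return total

def mle_unequal_groups(N, k, Y, tol=1e-10):
    """Solve score(p) = 0 by bisection; the score is decreasing in p."""
    lo, hi = 1e-9, 1.0 - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, N, k, Y) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def parametric_bootstrap_ci(N, k, Y, n_boot=500, seed=0):
    """Percentile interval from data simulated at the fitted p."""
    rng = random.Random(seed)
    p_hat = mle_unequal_groups(N, k, Y)
    reps = []
    for _ in range(n_boot):
        y_star = [
            sum(rng.random() < 1.0 - (1.0 - p_hat) ** k_i for _ in range(n_i))
            for n_i, k_i in zip(N, k)
        ]
        reps.append(mle_unequal_groups(N, k, y_star))
    reps.sort()
    return p_hat, reps[int(0.025 * (n_boot - 1))], reps[int(0.975 * (n_boot - 1))]

# Hypothetical design: N and k are invented for illustration only.
N = [20, 20, 20, 20, 20]   # number of groups tested at each size
k = [5, 6, 8, 10, 12]      # unequal group sizes
Y = [2, 2, 4, 3, 3]        # number of positive groups at each size
p_hat, lower, upper = parametric_bootstrap_ci(N, k, Y)
print(f"p_hat = {p_hat:.5f}, CI = ({lower:.5f}, {upper:.5f}), width = {upper - lower:.5f}")
```

Interval width is simply the upper limit minus the lower limit, so the same output supports a width comparison between bootstrapped and non-bootstrapped intervals.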

Group testing was primarily used to inspect individual members from a large population [

In

p | Wald | CLL | Logit |
---|---|---|---|
0.02 | 0.938 | 0.9461 | 0.946 |
0.02 | 0.954 | 0.948 | 0.948 |
0.02 | 0.942 | 0.9441 | 0.943 |
0.02 | 0.938 | 0.952 | 0.952 |
0.02 | 0.95 | 0.958 | 0.958 |
0.02 | 0.932 | 0.9481 | 0.948 |

p | Wald | CLL | Logit |
---|---|---|---|
0.02 | 0.812 | 0.862 | 0.862 |
0.02 | 0.822 | 0.873 | 0.87 |
0.02 | 0.832 | 0.912 | 0.91 |
0.02 | 0.81 | 0.852 | 0.85 |
0.02 | 0.802 | 0.844 | 0.844 |
0.02 | 0.804 | 0.812 | 0.81 |

Number of simulations (B) | Wald | CLL | Logit |
---|---|---|---|
200 | 0.937 | 0.945 | 0.94 |
400 | 0.936 | 0.9486 | 0.948 |
600 | 0.9283 | 0.9417 | 0.9416 |
800 | 0.935 | 0.9512 | 0.951 |
1000 | 0.932 | 0.945 | 0.945 |
1500 | 0.94 | 0.951 | 0.952 |
2000 | 0.944 | 0.9556 | 0.9556 |
2500 | 0.9525 | 0.9565 | 0.956 |
3000 | 0.951 | 0.9562 | 0.9561 |
3500 | 0.93514 | 0.95457 | 0.9544 |
4000 | 0.93425 | 0.94525 | 0.9451 |
5000 | 0.9416 | 0.9556 | 0.9554 |
7000 | 0.9405 | 0.9521 | 0.952 |
10,000 | 0.9512 | 0.967 | 0.967 |

As the number of bootstrap simulations increased, all three methods became more precise, that is, their coverage probabilities hovered slightly above and below the nominal coverage probability.

Group testing works best when the prevalence rate or the proportion is sufficiently small. According to

From

According to

Probability (p) | Wald coverage | CLL coverage | Logit coverage |
---|---|---|---|
0.01 | 0.904 | 0.922 | 0.922 |
0.03 | 0.92 | 0.934 | 0.93 |
0.04 | 0.982 | 0.986 | 0.984 |
0.06 | 0.988 | 0.99 | 0.99 |
0.08 | 0.99 | 0.992 | 0.99 |
0.1 | 0.988 | 0.992 | 0.99 |
0.2 | 0.842 | 0.857 | 0.857 |

Interval method, Y = (2,2,4,3,3) | Upper confidence limit | Lower confidence limit | Interval width |
---|---|---|---|
Wald | 0.02654614 | 0.01587489 | 0.01067125 |
CLL | 0.02726701 | 0.01648786 | 0.01077915 |
Logit | 0.02725691 | 0.01648266 | 0.01077426 |

Interval method, Y = (2,2,4,3,3) | Upper confidence limit | Lower confidence limit | Interval width |
---|---|---|---|
Wald | 0.02673085 | 0.01612005 | 0.0106108 |
CLL | 0.02743541 | 0.01672070 | 0.01071471 |
Logit | 0.02742548 | 0.01671553 | 0.01070995 |

Bootstrap methods produce good confidence intervals that improve by an order of magnitude upon the accuracy of standard intervals [

Olivia Wanjeri Mwangi, Ali Islam, Orawo Luke (2015) Bootstrap Confidence Intervals for Bootstrap Confidence Intervals for Adjusted for Overdispersion. Open Journal of Statistics, 05, 502-510. doi: 10.4236/ojs.2015.56052