Re-Testing in Batch Testing Model Based on Quality Control Process for Proportion Estimation

Abstract

The quality of products manufactured or procured by organizations is an important aspect of their survival in the global market. The quality control processes put in place by organizations can be resource-intensive, but substantial savings can be realized by using acceptance sampling in conjunction with batch testing. This paper considers the batch testing model based on the quality control process where batches that test positive are re-tested. The results show that re-testing greatly improves efficiency over one-stage batch testing based on quality control. This is observed using the Asymptotic Relative Efficiency (ARE): for the values of p considered, the computed ARE > 1, implying that our estimator has a smaller variance than the one-stage batch testing estimator. The model was also found to be more efficient than classical two-stage batch testing for relatively high values of the proportion.

Share and Cite:

Wanyonyi, R. , Mwangi, O. and Mwangi, C. (2021) Re-Testing in Batch Testing Model Based on Quality Control Process for Proportion Estimation. Open Journal of Statistics, 11, 123-136. doi: 10.4236/ojs.2021.111007.

1. Introduction

Batch testing, also known as group testing or pool screening, is a method that combines individual items into several batches of a certain size [1]. The batches are then tested instead of the individual items. If a batch tests positive, it is assumed that at least one item in the batch is positive; otherwise it is assumed that all items in the batch are negative. Batch testing provides a cost-effective method of screening a population for a given trait when the proportion of the trait in the population is low and the tests involved are expensive and sensitive.

Batch testing has two main objectives, classification and estimation, and received early major contributions from [2] and [3]. It first appeared in the statistics literature in the context of screening blood samples obtained from military inductees for the presence of a disease [2]. However, this method is currently applied in different fields such as the safety of blood products [4], disease screening in humans [5], animals [6] or plants [7], genetics [8] and the purity of seeds [9].

In implementing quality control procedures, a sample of items is taken from the produced or procured lot and pooled into batches. Each batch is tested with the appropriate test and rejected if the number of items with the trait in the batch exceeds a predetermined cut-off value, l. Otherwise, the batch is accepted [10]. Ideally, the tests are assumed to be perfect, but occasionally they are prone to misclassification errors due to dilution [11], blockers [5] or a non-representative sample [12]. Misclassification is, however, usually kept to a minimum. Depending on the costs and benefits, an organization concerned with the production or procurement of the items may decide to re-test batches that initially test positive. Re-testing enables an organization to reduce misclassification and also facilitates the investigation of the dilution effect [13] and [14]. It is against this background that we develop this article, which is arranged as follows. Section 2 gives the literature review, outlining the estimator of a proportion obtained using batch testing in quality control with perfect and imperfect tests. Section 3 presents the proposed batch testing model in quality control with re-testing, and in Section 4 the proposed model is compared with the models proposed by [15] and [16]. Lastly, Section 5 gives a brief discussion and conclusion.

2. Literature Review

The quality of products procured from or released to the market is a major concern to most, if not all, organizations. Thus, many organizations set up quality control processes to ensure that the quality standards of their products are met. However, quality control processes are sometimes cost-intensive, destructive and/or slow to yield results. To mitigate these issues, acceptance sampling is usually employed, where a lot of items is either accepted or rejected without inspecting every item in it. A lot is classified as positive if the number of its items with a trait of interest is greater than a cut-off value; otherwise it is marked negative [10]. Further savings on testing resources can be realized if the items are tested in batches instead of individually [17].

The application of batch testing in quality control is similar to threshold batch testing, first introduced by [18], who considered two cut-off values, lower and upper, and derived an algorithm that identifies the defective items with a minimum number of tests. Other authors, among them [19], have also studied threshold batch testing with a view to improving the identification algorithm.

Apart from the identification of defectives, batch testing is also concerned with estimating the proportion of a trait in a population [20]. This was extended to the quality control process by [15], who considered a batch testing model with only the lower cut-off value and with perfect tests. The maximum likelihood estimator of the proportion was derived and its properties investigated. It was found that, by introducing the cut-off value, the estimator of the proportion became more efficient than existing batch testing for large values of the proportion.

An aspect of concern when using batch testing is the misclassification of items due to blockers [20] or the dilution effect [5]. It was [5] who first studied estimation of a proportion by batch testing with imperfect tests. This concept of imperfect tests was later applied to batch testing in the quality control process by [21].

Many recent studies have focused on determining more efficient estimators and optimal group sizes when the tests are not 100% perfect. For instance, [22] developed a multistage adaptive batch testing model and [23] proposed a combination of experiments. These batch testing strategies produced estimators of the proportion that performed better than existing models in the presence of test errors. [24] suggested a batch testing procedure that incorporates re-testing, which is noted to recover lost sensitivity or specificity.

This paper considers a batch testing model in quality control processes with re-testing and compares it to one-stage batch testing in quality control and the usual two-stage batch testing.

2.1. Estimation of a Proportion Using Batch Testing Model Based on Quality Control

Consider a finite population of N items, each of which can be classified as good or bad depending on the presence or absence of a trait, and let p be the unknown proportion of items with the trait in the population. Suppose that the population can be divided into n batches, each of size k. A batch is rejected if the number of items with the trait in the batch is greater than a predefined threshold or cut-off value l. The probability of rejecting a batch is

$$\pi(p) = 1 - F(l) \tag{1}$$

where

$$F(l) = \sum_{d=0}^{l} \binom{k}{d} p^{d} (1-p)^{k-d}$$

Suppose X out of the n batches test positive on the test. Here X is a random variable. According to [2], X follows a binomial $(n, \pi(p))$ distribution. The Maximum Likelihood Estimator of p is obtained by solving Equation (2);

$$\sum_{d=0}^{l} \binom{k}{d} p^{d} (1-p)^{k-d} = 1 - \frac{x}{n} \tag{2}$$

Equation (2) has no solution in closed form except when $l = 0$, which leads to the result obtained by [3], among others, as

$$\hat{p} = 1 - \left(1 - \frac{x}{n}\right)^{1/k} \tag{3}$$

Hence, the proposed estimator generalizes the result of [3]. When $l > 0$, Equation (2) can be solved iteratively. The maximum likelihood estimator of p is positively biased; the bias is negligible for small values of p but can be very high for large values of p [15].
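The iterative solution of Equation (2) can be sketched in a few lines. The following is a minimal illustration rather than the authors' code: it uses plain bisection (one of several possible root finders), assumes 0 <= l < k so that F(l) is strictly decreasing in p, and the function names `batch_cdf` and `mle_perfect` as well as the example counts are hypothetical.

```python
from math import comb


def batch_cdf(l, k, p):
    """F(l): probability that a batch of size k holds at most l items with the trait."""
    return sum(comb(k, d) * p**d * (1 - p) ** (k - d) for d in range(l + 1))


def mle_perfect(x, n, k, l, tol=1e-12):
    """Solve F(l) = 1 - x/n for p by bisection (assumes 0 <= l < k)."""
    if x <= 0:
        return 0.0
    if x >= n:
        return 1.0
    target = 1 - x / n
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # F(l) decreases as p grows, so F(mid) > target means the root lies above mid.
        if batch_cdf(l, k, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical data: 12 of 100 batches of size 10 are rejected.
p_hat_l0 = mle_perfect(12, 100, 10, 0)  # cut-off l = 0, closed form (3) applies
p_hat_l1 = mle_perfect(12, 100, 10, 1)  # cut-off l = 1, no closed form
```

For l = 0 the bisection result can be compared directly with the closed form in Equation (3), which gives a quick sanity check on the solver.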

The asymptotic variance is obtained from the Fisher information, given by

$$\left\{ -E\left( \frac{\partial^{2} \log L(\cdot)}{\partial p^{2}} \right) \right\}^{-1} \tag{4}$$

which gives

$$\operatorname{var}(\hat{p}) = \frac{\pi(p)\,[1 - \pi(p)]\, p^{2} (1-p)^{2}}{n\,[E - pkF(l)]^{2}} \tag{5}$$

where

$$E = \sum_{d=0}^{l} d \binom{k}{d} p^{d} (1-p)^{k-d}$$
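The quantity E − pkF(l) in Equation (5) arises from differentiating F(l) with respect to p. The step is not spelled out in the text; a sketch of it, using only the definitions of F(l), E and π(p) given above, is:

```latex
\begin{aligned}
\frac{\partial F(l)}{\partial p}
  &= \sum_{d=0}^{l}\binom{k}{d}p^{d}(1-p)^{k-d}
     \left[\frac{d}{p}-\frac{k-d}{1-p}\right]
   = \sum_{d=0}^{l}\binom{k}{d}p^{d}(1-p)^{k-d}\,\frac{d-kp}{p(1-p)}
   = \frac{E-pkF(l)}{p(1-p)},\\[4pt]
\pi'(p) &= -\frac{E-pkF(l)}{p(1-p)},
\qquad
\operatorname{var}(\hat{p}) \approx \frac{\pi(p)\,[1-\pi(p)]}{n\,[\pi'(p)]^{2}}
  = \frac{\pi(p)\,[1-\pi(p)]\,p^{2}(1-p)^{2}}{n\,[E-pkF(l)]^{2}}.
\end{aligned}
```

The last expression uses the standard binomial Fisher information for X ~ binomial(n, π(p)), namely n[π'(p)]² / {π(p)[1 − π(p)]}.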

Notice that if the cut off value l = 0, the variance of the estimator becomes;

$$\operatorname{var}(\hat{p}_{T}) = \frac{1 - (1 - p_{T})^{k}}{n k^{2} (1 - p_{T})^{k-2}} \tag{6}$$

as obtained by [3] and other authors. Thus, the maximum likelihood estimator of p for batch testing based on quality control is a generalization of Thompson’s original estimator in the [2] model.
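The reduction of Equation (5) to Equation (6) at l = 0 is easy to check numerically. The sketch below is illustrative only; the function names and the parameter values (p = 0.05, n = 100, k = 10) are assumptions chosen for the example, not values taken from the paper.

```python
from math import comb


def batch_cdf(l, k, p):
    """F(l) = sum_{d=0}^{l} C(k, d) p^d (1 - p)^(k - d)."""
    return sum(comb(k, d) * p**d * (1 - p) ** (k - d) for d in range(l + 1))


def partial_mean(l, k, p):
    """E = sum_{d=0}^{l} d C(k, d) p^d (1 - p)^(k - d)."""
    return sum(d * comb(k, d) * p**d * (1 - p) ** (k - d) for d in range(l + 1))


def var_perfect(p, n, k, l):
    """Asymptotic variance of Equation (5), perfect tests, cut-off l."""
    F = batch_cdf(l, k, p)
    E = partial_mean(l, k, p)
    pi = 1 - F
    return pi * (1 - pi) * p**2 * (1 - p) ** 2 / (n * (E - p * k * F) ** 2)


def var_thompson(p, n, k):
    """Asymptotic variance of Equation (6), the l = 0 special case."""
    return (1 - (1 - p) ** k) / (n * k**2 * (1 - p) ** (k - 2))


# Illustrative values only.
v_general = var_perfect(0.05, 100, 10, 0)
v_special = var_thompson(0.05, 100, 10)
```

With l = 0 the sum defining E has only the d = 0 term, so E = 0 and the two expressions agree up to floating-point rounding.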

Utilizing the delta method, $\hat{p}$ is asymptotically normally distributed [25]. That is, for fixed k and l, as $n \to \infty$ we have

$$\sqrt{n}\,(\hat{p} - p) \xrightarrow{d} \text{Normal}\left(0,\ \frac{\pi(p)\,[1 - \pi(p)]\, p^{2} (1-p)^{2}}{[E - pkF(l)]^{2}}\right) \tag{7}$$

2.2. Estimation of a Proportion Using Batch Testing Model Based on Quality Control with Imperfect Tests

The batch testing model based on quality control procedure with imperfect tests is applied. The probability that a batch is classified as positive is;

$$\pi_{0}(p) = \eta\,(1 - F(l)) + (1 - \phi)\,F(l) \tag{8}$$

where $\eta$ and $\phi$ are the sensitivity and specificity of the tests, respectively, assumed to be constant in the course of testing. The sensitivity of a test is the probability of correctly detecting a positive batch, while the specificity is the probability of correctly identifying a negative batch. If n batches of size k are tested and $X_0$ batches test positive, then the likelihood function is

$$L(p \mid n, x_{0}, \eta, \phi) = \binom{n}{x_{0}} [\pi_{0}(p)]^{x_{0}} [1 - \pi_{0}(p)]^{n - x_{0}} \tag{9}$$

Using the maximum likelihood method, the estimator of p is obtained by solving

$$\sum_{d=0}^{l} \binom{k}{d} p^{d} (1-p)^{k-d} = \frac{\eta - \frac{x_{0}}{n}}{\eta + \phi - 1} \tag{10}$$

For l = 0, Equation (10) reduces to

$$\hat{p}_{0} = 1 - \left\{ \frac{\eta - \frac{x_{0}}{n}}{\eta + \phi - 1} \right\}^{1/k}$$

This is a result obtained by [16]. However, Equation (10) has no solution in closed form when $l > 0$. Therefore, the equation can be solved iteratively using R or MATLAB code that is easily developed.
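As an illustration of such an iterative solution (in Python rather than R or MATLAB, and not the authors' implementation), Equation (10) can be solved by bisection. The sketch assumes η + φ > 1 and 0 <= l < k, clips the right-hand side to [0, 1] when the observed fraction falls outside the range the model can produce, and uses hypothetical names and example counts.

```python
from math import comb


def batch_cdf(l, k, p):
    """F(l) = sum_{d=0}^{l} C(k, d) p^d (1 - p)^(k - d)."""
    return sum(comb(k, d) * p**d * (1 - p) ** (k - d) for d in range(l + 1))


def mle_imperfect(x0, n, k, l, eta, phi, tol=1e-12):
    """Solve F(l) = (eta - x0/n) / (eta + phi - 1) for p by bisection."""
    target = (eta - x0 / n) / (eta + phi - 1)
    # Outside (0, 1) the equation has no interior root; return the boundary value.
    if target >= 1:
        return 0.0
    if target <= 0:
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # F(l) is decreasing in p, so move the lower end up while F is too large.
        if batch_cdf(l, k, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical data: 15 of 100 batches of size 10 test positive, eta = phi = 0.99.
p0_l0 = mle_imperfect(15, 100, 10, 0, 0.99, 0.99)
p0_l1 = mle_imperfect(15, 100, 10, 1, 0.99, 0.99)
```

Setting η = φ = 1 turns the right-hand side into 1 − x0/n, so the same routine also covers the perfect-test case of Equation (2).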

The asymptotic variance is obtained from the Fisher’s information given by Equation (4) to get;

$$\operatorname{var}(\hat{p}_{0}) = \frac{\pi_{0}(p)\,(1 - \pi_{0}(p))\, p^{2} (1-p)^{2}}{n\,[\{\eta + \phi - 1\}\{E - pkF(l)\}]^{2}} \tag{11}$$

Notice that if the cut-off value l = 0 and $\eta = \phi = 1$, then the variance of the estimator reduces to Equation (6). Thus the estimator obtained in this section generalizes both the estimator obtained in Section 2.1 and Thompson's estimator.
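A short numerical sketch of Equation (11) makes the two limiting statements concrete: with η = φ = 1 and l = 0 it matches Equation (6), and lowering the sensitivity and specificity inflates the variance. The function names and parameter values below are illustrative assumptions.

```python
from math import comb


def cdf_and_partial_mean(l, k, p):
    """Return (F(l), E) for a batch of size k and trait proportion p."""
    terms = [comb(k, d) * p**d * (1 - p) ** (k - d) for d in range(l + 1)]
    return sum(terms), sum(d * t for d, t in enumerate(terms))


def var_imperfect(p, n, k, l, eta, phi):
    """Asymptotic variance of Equation (11), one-stage testing with test errors."""
    F, E = cdf_and_partial_mean(l, k, p)
    pi0 = eta * (1 - F) + (1 - phi) * F
    slope = (eta + phi - 1) * (E - p * k * F)
    return pi0 * (1 - pi0) * p**2 * (1 - p) ** 2 / (n * slope**2)


# Illustrative values only: p = 0.05, n = 100 batches of size k = 10.
v_perfect = var_imperfect(0.05, 100, 10, 1, 1.0, 1.0)
v_errors = var_imperfect(0.05, 100, 10, 1, 0.95, 0.95)
v_l0 = var_imperfect(0.05, 100, 10, 0, 1.0, 1.0)
```

The inequality `v_errors > v_perfect` is what one expects in general, since an error-prone test result is a noisier version of the perfect one and cannot carry more information about p.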

3. Estimation of a Proportion in the Proposed Model

In our proposed testing model, the finite population of N items is pooled into n batches, each of size k. The batches are then tested, and a batch that contains more than l items with the trait of interest is classified as positive; otherwise it is negative. Batches that test positive are then given a re-test; a batch found on re-test to have more than l items with the trait is labeled positive, and testing is stopped. The probability of declaring a batch negative on the initial test is

$$\pi_{1}(p) = (1 - \eta)(1 - F(l)) + \phi\,F(l) \tag{12}$$

The probability of declaring a batch negative on re-test after initially testing positive;

$$\pi_{2}(p) = \eta\,(1 - \eta)(1 - F(l)) + \phi\,(1 - \phi)\,F(l) \tag{13}$$

Suppose that $X_1$ and $X_2$ are, respectively, the numbers of batches out of n that test negative initially and that test negative on re-test. Then the likelihood function for the proposed model is

$$L(p \mid n, \underline{x}, \eta, \phi) \propto [\pi_{1}(p)]^{x_{1}} [\pi_{2}(p)]^{x_{2}} [1 - \pi_{1}(p) - \pi_{2}(p)]^{n - x_{1} - x_{2}} \tag{14}$$

The maximum likelihood estimator of p is obtained by solving

$$\frac{\partial \log L(\cdot)}{\partial p} = 0 \tag{15}$$

Equation (15) has no solution in closed form except when l = 0. Therefore, the equation is solved iteratively with the aid of code developed using R or MATLAB software.
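One way to carry out this iterative step is to maximize the log of the likelihood in Equation (14) directly. The sketch below is an illustration in Python, not the authors' R or MATLAB code. It uses a ternary search, which is a substitute for solving the score equation and relies on the log-likelihood having a single peak in p; that holds here because each cell probability is linear in F(l) and F(l) is monotone in p. It assumes 0 < η, φ < 1 and 0 <= l < k, and all names and counts are hypothetical.

```python
from math import comb, log


def batch_cdf(l, k, p):
    """F(l) = sum_{d=0}^{l} C(k, d) p^d (1 - p)^(k - d)."""
    return sum(comb(k, d) * p**d * (1 - p) ** (k - d) for d in range(l + 1))


def cell_probs(p, k, l, eta, phi):
    """Return (pi1, pi2, pi3) from Equations (12), (13) and pi3 = 1 - pi1 - pi2."""
    F = batch_cdf(l, k, p)
    pi1 = (1 - eta) * (1 - F) + phi * F
    pi2 = eta * (1 - eta) * (1 - F) + phi * (1 - phi) * F
    return pi1, pi2, 1 - pi1 - pi2


def log_lik(p, x1, x2, n, k, l, eta, phi):
    """Log of the likelihood in Equation (14), up to an additive constant."""
    pi1, pi2, pi3 = cell_probs(p, k, l, eta, phi)
    return x1 * log(pi1) + x2 * log(pi2) + (n - x1 - x2) * log(pi3)


def mle_retest(x1, x2, n, k, l, eta, phi, iters=200):
    """Maximize the log-likelihood over p in (0, 1) by ternary search."""
    lo, hi = 1e-9, 1 - 1e-9
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_lik(m1, x1, x2, n, k, l, eta, phi) < log_lik(m2, x1, x2, n, k, l, eta, phi):
            lo = m1  # the peak lies to the right of m1
        else:
            hi = m2  # the peak lies to the left of m2
    return (lo + hi) / 2


# Hypothetical counts: 850 negative initially, 50 negative on re-test, n = 1000.
p_hat_1 = mle_retest(850, 50, 1000, 10, 1, 0.95, 0.95)
```

A simple check of the routine is to feed it the expected counts n·π1(p) and n·π2(p) for a chosen p; the maximizer should then recover that p.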

Next, we consider the calculation of the asymptotic variance of our estimator. This variance is used to calculate the asymptotic relative efficiency of this estimator with respect to other estimators. The variance is obtained by computing Equation (4) to get;

$$\operatorname{var}(\hat{p}_{1}) = \frac{\pi_{1}(p)\,\pi_{2}(p)\,(1 - \pi_{1}(p) - \pi_{2}(p))\, p^{2} (1-p)^{2}}{n\,[E - pkF(l)]^{2}\; Y} \tag{16}$$

where

$$Y = (\eta + \phi - 1)^{2}\,\pi_{2}(p)(1 - \pi_{2}(p)) + (\phi(1-\phi) - \eta(1-\eta))^{2}\,\pi_{1}(p)(1 - \pi_{1}(p)) + 2(\eta + \phi - 1)(\phi(1-\phi) - \eta(1-\eta))\,\pi_{1}(p)\,\pi_{2}(p) \tag{17}$$

The derivation of the variance is given in the Appendix. This variance is used to determine the Wald interval for p as

$$\hat{p}_{1} \pm z_{\alpha/2} \sqrt{\operatorname{var}(\hat{p}_{1})} \tag{18}$$

where $(1 - \alpha)$ is the confidence level.
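The Wald interval in Equation (18) can be computed by plugging the estimate into the variance of Equation (16). The sketch below is illustrative only: evaluating the variance at the estimate is the usual plug-in convention and is assumed here rather than stated in the text, the function names are hypothetical, and the example estimate 0.05 is made up.

```python
from math import comb, sqrt
from statistics import NormalDist


def var_retest(p, n, k, l, eta, phi):
    """Asymptotic variance of Equation (16), with Y as in Equation (17)."""
    terms = [comb(k, d) * p**d * (1 - p) ** (k - d) for d in range(l + 1)]
    F = sum(terms)
    E = sum(d * t for d, t in enumerate(terms))
    pi1 = (1 - eta) * (1 - F) + phi * F
    pi2 = eta * (1 - eta) * (1 - F) + phi * (1 - phi) * F
    a = eta + phi - 1
    b = phi * (1 - phi) - eta * (1 - eta)
    Y = a * a * pi2 * (1 - pi2) + b * b * pi1 * (1 - pi1) + 2 * a * b * pi1 * pi2
    numerator = pi1 * pi2 * (1 - pi1 - pi2) * p**2 * (1 - p) ** 2
    return numerator / (n * (E - p * k * F) ** 2 * Y)


def wald_interval(p_hat, n, k, l, eta, phi, alpha=0.05):
    """Equation (18), with the variance evaluated at p_hat (plug-in assumption)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(var_retest(p_hat, n, k, l, eta, phi))
    return p_hat - half_width, p_hat + half_width


# Hypothetical estimate p_hat = 0.05 from n = 200 batches of size 10, cut-off l = 1.
lower, upper = wald_interval(0.05, 200, 10, 1, 0.99, 0.99)
```

Because the interval is symmetric about the estimate, the lower limit can fall below zero when p is small and n is modest; truncating at zero is a common practical remedy, though it is not discussed in the text.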

4. Model Comparison

In this section we compare our proposed model with the model outlined in Section 2.2 and with that proposed by [16]. This is accomplished by computing the Asymptotic Relative Efficiency (ARE). If the estimator from the other model is denoted by $\hat{p}_{B}$ and our estimator is denoted by $\hat{p}_{1}$, then

$$\text{ARE} = \frac{\operatorname{var}(\hat{p}_{B})}{\operatorname{var}(\hat{p}_{1})} \tag{19}$$

Therefore, ARE > 1 implies that the proposed model is more efficient than the other model. First, we check whether re-testing improves efficiency by comparing the proposed model with the one-stage testing procedure outlined in Section 2.2. The sensitivity and specificity of the tests are both set at 99% and then at 95%. The results for sensitivity and specificity of 99% are shown in Figure 1 and Table 1 below.
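The comparison with the one-stage procedure amounts to dividing Equation (11) by Equation (16). The following sketch shows one way to compute that ratio; it is not the code used to produce the paper's tables, the function names are hypothetical, and the grid of p values is chosen only for illustration. The sample size n cancels in the ratio, so it is set to 1.

```python
from math import comb


def cdf_and_partial_mean(l, k, p):
    """Return (F(l), E) for a batch of size k and trait proportion p."""
    terms = [comb(k, d) * p**d * (1 - p) ** (k - d) for d in range(l + 1)]
    return sum(terms), sum(d * t for d, t in enumerate(terms))


def var_one_stage(p, k, l, eta, phi, n=1):
    """Equation (11): one-stage quality control testing with test errors."""
    F, E = cdf_and_partial_mean(l, k, p)
    pi0 = eta * (1 - F) + (1 - phi) * F
    slope = (eta + phi - 1) * (E - p * k * F)
    return pi0 * (1 - pi0) * p**2 * (1 - p) ** 2 / (n * slope**2)


def var_retest(p, k, l, eta, phi, n=1):
    """Equation (16): positive batches are re-tested."""
    F, E = cdf_and_partial_mean(l, k, p)
    pi1 = (1 - eta) * (1 - F) + phi * F
    pi2 = eta * (1 - eta) * (1 - F) + phi * (1 - phi) * F
    a = eta + phi - 1
    b = phi * (1 - phi) - eta * (1 - eta)
    Y = a * a * pi2 * (1 - pi2) + b * b * pi1 * (1 - pi1) + 2 * a * b * pi1 * pi2
    numerator = pi1 * pi2 * (1 - pi1 - pi2) * p**2 * (1 - p) ** 2
    return numerator / (n * (E - p * k * F) ** 2 * Y)


def are_vs_one_stage(p, k, l, eta, phi):
    """Equation (19) with the one-stage estimator as the comparator."""
    return var_one_stage(p, k, l, eta, phi) / var_retest(p, k, l, eta, phi)


# Illustrative grid; these are not the settings of Tables 1 and 2.
are_grid = {p: are_vs_one_stage(p, 10, 1, 0.95, 0.95) for p in (0.01, 0.05, 0.1, 0.2)}
```

Since the re-test outcome refines the one-stage positive/negative outcome, the ratio computed this way should never fall below 1, which is a useful check on any implementation.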

For l = 0, the comparison is between batch testing with re-testing and the one-stage [2] model. Clearly, from Table 1 and Table 2, re-testing improves efficiency, which confirms what other authors have noted [16] and [14]. When l is greater than 0, we compare the one-stage batch testing model based on quality control with re-testing in the quality control model. It is noted that re-testing substantially improves efficiency. For instance, if p = 0.01, k = 10 and $\eta = \phi = 95\%$, re-testing improves efficiency 18-fold over one-stage batch testing in quality control. This increases to about 47-fold when $\eta = \phi = 99\%$.

Next, we compare the proposed model with the one developed by [16], in which batches that test negative initially are given a re-test. That author demonstrated that the model was more efficient than the [2] model with imperfect tests for relatively high values of p. The results are presented in Table 3 and Table 4.

Figure 1. Plots of the Asymptotic Relative Efficiency (ARE) against $\hat{p}_{1}$ for varying values of k.

Table 1. ARE of $\hat{p}_{1}$ with respect to the one-stage quality control procedure with $\eta = \phi = 99\%$.

Table 2. ARE of $\hat{p}_{1}$ with respect to the one-stage quality control procedure with $\eta = \phi = 95\%$.

Table 3. ARE of $\hat{p}_{1}$ with respect to the Nyongesa (2011) model with $\eta = \phi = 99\%$.

Table 4. ARE of $\hat{p}_{1}$ with respect to the Nyongesa (2011) model with $\eta = \phi = 95\%$.

It is evident from the tables above that the proposed model is more efficient than the model proposed by [16] when the rate of defectiveness is relatively high. For example, if l = 1, k = 15, p = 0.01 and $\eta = \phi = 99\%$, the proposed model is 26 times more efficient than the model of [16].

5. Discussion and Conclusion

A batch testing model applied in quality control, in which batches that test positive initially are re-tested, has been developed. The maximum likelihood estimator of p and its asymptotic variance have been derived. The asymptotic variance of the estimator is used to compare our model with two other models: the one-stage batch testing model based on quality control [15] and the one proposed by [16]. This is accomplished by computing the asymptotic relative efficiency of the estimator.

The results show that our estimator generally has a smaller variance than the estimators obtained from the other two models for relatively high values of p, for given cut-off values, batch sizes, sensitivity and specificity.

This work advances the field of batch testing by introducing a cut-off value greater than zero, thereby generalizing the model first introduced by [2], in which the cut-off value is strictly equal to zero. This case is particularly encountered in quality control. We recommend that a model in which negative batches are re-tested be considered. If some of the negative batches test positive on re-test, this indicates the presence of test errors.

Appendix. Derivation of Asymptotic Variance

We consider the computation of the asymptotic variance of our estimator as presented in Equation (16). The variance is computed by evaluating

$$\left\{ -E\left( \frac{\partial^{2} \log L(\cdot)}{\partial p^{2}} \right) \right\}^{-1}$$

But

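$$\frac{\partial \log L(\cdot)}{\partial p} = x_{1}\,\frac{\pi_{1}'(p)}{\pi_{1}(p)} + x_{2}\,\frac{\pi_{2}'(p)}{\pi_{2}(p)} + x_{3}\,\frac{\pi_{3}'(p)}{\pi_{3}(p)} \tag{20}$$

where $x_{3} = n - x_{1} - x_{2}$, and

$$\pi_{1}'(p) = (\eta + \phi - 1)\left( \frac{E - pkF(l)}{p(1-p)} \right)$$

$$\pi_{2}'(p) = [\phi(1-\phi) - \eta(1-\eta)]\left( \frac{E - pkF(l)}{p(1-p)} \right)$$

Note that

$$\pi_{3}(p) = 1 - \pi_{1}(p) - \pi_{2}(p)$$

Hence

$$\pi_{3}'(p) = -[\pi_{1}'(p) + \pi_{2}'(p)]$$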

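Taking the second derivative,

$$\frac{\partial^{2} \log L(\cdot)}{\partial p^{2}} = x_{1}\left[ \frac{\pi_{1}''(p)}{\pi_{1}(p)} - \frac{(\pi_{1}'(p))^{2}}{\pi_{1}^{2}(p)} \right] + x_{2}\left[ \frac{\pi_{2}''(p)}{\pi_{2}(p)} - \frac{(\pi_{2}'(p))^{2}}{\pi_{2}^{2}(p)} \right] + x_{3}\left[ \frac{\pi_{3}''(p)}{\pi_{3}(p)} - \frac{(\pi_{3}'(p))^{2}}{\pi_{3}^{2}(p)} \right] \tag{21}$$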

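Taking expectations of Equation (21), with $E(X_{i}) = n\pi_{i}(p)$ and $\pi_{1}''(p) + \pi_{2}''(p) + \pi_{3}''(p) = 0$, we obtain

$$-E\left( \frac{\partial^{2} \log L(\cdot)}{\partial p^{2}} \right) = n\left[ \frac{(\pi_{1}'(p))^{2}}{\pi_{1}(p)} + \frac{(\pi_{2}'(p))^{2}}{\pi_{2}(p)} + \frac{(\pi_{3}'(p))^{2}}{\pi_{3}(p)} \right] = \frac{n}{\pi_{1}(p)\,\pi_{2}(p)\,\pi_{3}(p)}\left[ \pi_{2}(p)(1 - \pi_{2}(p))(\pi_{1}'(p))^{2} + \pi_{1}(p)(1 - \pi_{1}(p))(\pi_{2}'(p))^{2} + 2\,\pi_{1}(p)\,\pi_{2}(p)\,\pi_{1}'(p)\,\pi_{2}'(p) \right] \tag{22}$$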

This implies that

$$\operatorname{var}(\hat{p}_{1}) = \frac{\pi_{1}(p)\,\pi_{2}(p)\,(1 - \pi_{1}(p) - \pi_{2}(p))\, p^{2} (1-p)^{2}}{n\,[E - pkF(l)]^{2}\; Y} \tag{23}$$

where

$$Y = (\eta + \phi - 1)^{2}\,\pi_{2}(p)(1 - \pi_{2}(p)) + (\phi(1-\phi) - \eta(1-\eta))^{2}\,\pi_{1}(p)(1 - \pi_{1}(p)) + 2(\eta + \phi - 1)(\phi(1-\phi) - \eta(1-\eta))\,\pi_{1}(p)\,\pi_{2}(p)$$

completing the proof as demanded in the text.
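The algebra leading from the three-term sum in Equation (22) to the closed form in Equation (23) can be checked numerically: the Fisher information computed from the sum, multiplied by the closed-form variance, should equal 1. The sketch below is such a check under illustrative parameter values; the function names are hypothetical.

```python
from math import comb


def cdf_and_partial_mean(l, k, p):
    """Return (F(l), E) for a batch of size k and trait proportion p."""
    terms = [comb(k, d) * p**d * (1 - p) ** (k - d) for d in range(l + 1)]
    return sum(terms), sum(d * t for d, t in enumerate(terms))


def fisher_info_sum(p, n, k, l, eta, phi):
    """n * sum_i (pi_i')^2 / pi_i, the first form in Equation (22)."""
    F, E = cdf_and_partial_mean(l, k, p)
    common = (E - p * k * F) / (p * (1 - p))
    pi1 = (1 - eta) * (1 - F) + phi * F
    pi2 = eta * (1 - eta) * (1 - F) + phi * (1 - phi) * F
    pi3 = 1 - pi1 - pi2
    d1 = (eta + phi - 1) * common
    d2 = (phi * (1 - phi) - eta * (1 - eta)) * common
    d3 = -(d1 + d2)
    return n * (d1**2 / pi1 + d2**2 / pi2 + d3**2 / pi3)


def var_closed_form(p, n, k, l, eta, phi):
    """The closed form of Equation (23)."""
    F, E = cdf_and_partial_mean(l, k, p)
    pi1 = (1 - eta) * (1 - F) + phi * F
    pi2 = eta * (1 - eta) * (1 - F) + phi * (1 - phi) * F
    a = eta + phi - 1
    b = phi * (1 - phi) - eta * (1 - eta)
    Y = a * a * pi2 * (1 - pi2) + b * b * pi1 * (1 - pi1) + 2 * a * b * pi1 * pi2
    numerator = pi1 * pi2 * (1 - pi1 - pi2) * p**2 * (1 - p) ** 2
    return numerator / (n * (E - p * k * F) ** 2 * Y)


# Illustrative settings (p, n, k, l, eta, phi); the product should be 1 for each.
settings = [(0.05, 100, 10, 1, 0.95, 0.95), (0.1, 50, 15, 2, 0.99, 0.9)]
products = [fisher_info_sum(*s) * var_closed_form(*s) for s in settings]
```

Agreement to within rounding error for unequal η and φ exercises all three terms of Y, including the cross term.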

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] Mwangi, O.W., Islam, A. and Luke, O. (2015) Bootstrap Confidence Intervals for Proportions of Unequal Sized Groups Adjusted for Overdispersion. Open Journal of Statistics, 5, 502-510.
https://doi.org/10.4236/ojs.2015.56052
[2] Dorfman, R. (1943) The Detection of Defective Members of Large Populations. Annals of Mathematical Statistics, 14, 436-440.
https://doi.org/10.1214/aoms/1177731363
[3] Thompson, K.H. (1962) Estimation of the Proportion of Vectors in a Natural Population of Insects. Biometrics, 18, 568-578.
https://doi.org/10.2307/2527902
[4] Gastwirth, J.L. and Johnson, W.O. (1994) Screening with Cost-Effective Quality Control: Potential Applications to HIV and Drug Testing. Journal of the American Statistical Association, 89, 972-981.
https://doi.org/10.1080/01621459.1994.10476831
[5] Tu, X., Litvak, E. and Pagano, M. (1995) On the Informativeness and Accuracy of Pooled Testing in Estimating Prevalence of a Rare Disease: Application to HIV Screening. Biometrika, 82, 287-297.
https://doi.org/10.1093/biomet/82.2.287
[6] McV Messam, L.L., Branscum, A.J., Collins, M.T. and Gardner, I.A. (2008) Frequentist and Bayesian Approaches to Prevalence Estimation Using Examples from Johne’s Disease. Animal Health Research Reviews, 9, 1-23.
https://doi.org/10.1017/S1466252307001314
[7] Swallow, W.H. (1985) Group Testing for Estimating Infection Rates and Probabilities of Disease Transmission. Phytopathology, 75, 882-889.
https://doi.org/10.1094/Phyto-75-882
[8] Chick, S.E. (1996) Bayesian Models for Limiting Dilution Assay and Group Test Data. Biometrics, 52, 1055-1062.
https://doi.org/10.2307/2533066
[9] Montesinos-López, O., Montesinos-López, A., Crossa, J., Eskridge, K. and Sáenz, R.A. (2011) Erratum: Optimal Sample Size for Estimating the Proportion of Transgenic Plants Using the Dorfman Model with a Random Confidence Interval. Seed Science Research, 21, 235-245.
https://doi.org/10.1017/S0960258511000055
[10] Montgomery, D.C. (2020) Introduction to Statistical Quality Control. John Wiley & Sons, Hoboken.
[11] Hwang, F.K. (1976) Group Testing with a Dilution Effect. Biometrika, 63, 671-680.
https://doi.org/10.1093/biomet/63.3.671
[12] Remund, K.M., Dixon, D.A., Wright, D.L. and Holden, L.R. (2001) Statistical Considerations in Seed Purity Testing for Transgenic Traits. Seed Science Research, 11, 101-119.
http://europepmc.org/abstract/AGR/IND23232809
[13] Kennedy, N.L. (2004) Multistage Group Testing Procedure (Group Screening). Communications in Statistics—Simulation and Computation, 33, 621-637.
https://doi.org/10.1081/SAC-200033231
[14] Brookmeyer, R. (1999) Analysis of Multistage Pooling Studies of Biological Specimens for Estimating Disease Incidence and Prevalence. Biometrics, 55, 608-612.
https://doi.org/10.1111/j.0006-341X.1999.00608.x
[15] Wanyonyi, R.W., Nyongesa, K. and Wasike, A.A.M. (2015) Estimation of Proportion of a Trait by Batch Testing Model in a Quality Control Process. American Journal of Theoretical and Applied Statistics, 4, 619-629.
https://doi.org/10.11648/j.ajtas.20150406.34
[16] Kennedy, N.L. (2011) Dual Estimation of Prevalence and Disease Incidence in Pool-Testing Strategy. Communications in Statistics—Theory and Methods, 40, 3218-3229.
https://doi.org/10.1080/03610926.2010.493257
[17] Kline, R.L., Brothers, T.A., Brookmeyer, R., Zeger, S. and Quinn, T.C. (1989) Evaluation of Human Immunodeficiency Virus Seroprevalence in Population Surveys Using Pooled Sera. Journal of Clinical Microbiology, 27, 1449-1452.
https://doi.org/10.1128/JCM.27.7.1449-1452.1989
[18] Damaschke, P. (2006) Threshold Group Testing. In: Ahlswede, R., Baumer, L., Cai, N., Aydinian, H., Blinovsky, V. and Deppe, C., Eds., General Theory of Information Transfer and Combinatorics, Springer, Berlin, 707-718.
[19] Chen, H.-B. and De Bonis, A. (2011) An Almost Optimal Algorithm for Generalized Threshold Group Testing with Inhibitors. Journal of Computational Biology, 18, 851-864.
https://doi.org/10.1089/cmb.2010.0030
[20] Xie, M., Tatsuoka, K., Sacks, J. and Young, S.S. (2001) Group Testing with Blockers and Synergism. Journal of the American Statistical Association, 96, 92-102.
https://doi.org/10.1198/016214501750333009
[21] Wanyonyi, R.W., Nyongesa, K.L. and Wasike, A. (2015) Estimation of Proportion of a Trait by Batch Testing with Errors in Inspection in a Quality Control Process. International Journal of Statistics and Applications, 5, 268-278.
http://article.sapub.org/10.5923.j.statistics.20150506.02.html
[22] Okoth, A.W., et al. (2017) Multi-Stage Adaptive Pool Testing Model with Test Errors, Improved Efficiency. IOSR Journal of Mathematics, 13, 43-55.
https://doi.org/10.9790/5728-1301024355
[23] Matiri, G., Nyongesa, K. and Islam, A. (2017) Sequentially Selecting Between Two Experiment for Optimal Estimation of a Trait with Misclassification. American Journal of Theoretical and Applied Statistics, 6, 79-89.
https://doi.org/10.11648/j.ajtas.20170602.12
[24] Lk, N. (2018) Multiple-Test Pool-Testing Strategy for Estimating HIV/AIDS-Prevalence and Its Extension to Multi-Stage.
[25] Lehmann, E.L. and Casella, G. (1998) Theory of Point Estimation (Springer Texts in Statistics). Second Edition, Springer, Berlin.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.