
This paper presents four methods of constructing a confidence interval for the proportion p of the binomial distribution. Evidence in the literature indicates that the standard Wald confidence interval for the binomial proportion is inaccurate, especially for extreme values of p. Even for moderately large sample sizes, the coverage probabilities of the Wald confidence interval prove to be erratic for extreme values of p. Three alternative confidence intervals, namely the Wilson confidence interval, the Clopper-Pearson interval and the likelihood interval, are compared to the Wald confidence interval on the basis of coverage probability and expected length by means of simulation.

Estimation of a binomial proportion p is one of the most commonly encountered statistical problems, with important applications in areas such as clinical medicine, business, politics and quality control. For instance, politicians are certainly interested in knowing what fraction of voters would favor them in the next election. Binomial data are obtained from a binomial experiment, which consists of a fixed number n of independent Bernoulli trials, each of which can result in either a success or a failure. The success probability p is assumed fixed. The binomial probability distribution is used to model the total number x of successes resulting from the binomial experiment. Once data are available, information about p can be summarized by the likelihood function and, on the basis of this summary, a point estimate for the binomial proportion, denoted by $\hat{p}$, is obtained by the method of maximum likelihood as $\hat{p} = x/n$. A number of two-sided confidence intervals for p have been proposed by several authors. The Wald method is the most commonly used technique since it is based on the normal approximation to the binomial distribution. However, the approximation is inaccurate whenever the sample size is small (n < 30) or when the proportion p is close to zero or one; the Wald confidence interval may have low coverage probability even if p is not close to zero or one, and it may produce confidence limits outside the interval $(0, 1)$. Matiri et al. [

Alternative methods for constructing a confidence interval for p have been proposed, such as the Wilson score, Clopper-Pearson and Agresti-Coull confidence intervals, among others. Just like the Wald confidence interval, the validity of the Wilson confidence interval depends heavily on a large-sample approximation. The Clopper-Pearson interval is an exact two-sided confidence interval derived from the binomial probability mass function. Past studies indicate that the Clopper-Pearson confidence interval is very conservative for small to moderate n [

This paper considers an alternative method, called the likelihood method, for constructing the approximate confidence interval for the binomial proportion. The likelihood intervals are determined from the graph of the relative likelihood function or its logarithm for a fixed likelihood level [

In order to identify the best confidence interval for the binomial proportion p, the Wald, Wilson score, Clopper-Pearson and likelihood methods of interval estimation are compared on the basis of coverage probability and interval width using simulated data. The four intervals are also applied to a real data example. The resulting confidence intervals for the binomial proportion are compared in terms of interval width and the plausibility of the parameter values they contain.

The paper is organized as follows: in Section 2, the four methods of interval estimation are described. In Section 3, the simulation results regarding coverage probability and expected length of the different intervals are presented and discussed. Section 4 applies the four intervals to real-life data from a clinical study and compares them in terms of interval length and the plausibility of the parameter values inside them. Section 5 is devoted to concluding remarks.

Let $X_1, \dots, X_n$ be IID Bernoulli(p) random variables, where the parameter $p \in (0, 1)$ is unknown. Then the sum $X = \sum_{i=1}^{n} X_i$ of the n Bernoulli random variables is a binomial random variable with parameters n and p. If the unknown proportion p is not too close to 0 or 1, then by the Central Limit Theorem, for n sufficiently large, the MLE $\hat{p} = X/n$ is approximately normally distributed with mean $\mu_{\hat{p}} = p$ and variance $\sigma_{\hat{p}}^2 = p(1-p)/n$. The Wald confidence interval is based on the normal approximation to the binomial distribution and is given by

$$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}},$$

where $z_{\alpha/2}$ is the $100(1 - \alpha/2)$th percentile of the standard normal distribution. The Wald method should be used only when $n \min(p, 1-p)$ is at least 5 (or 10); otherwise it will produce unreliable interval estimates.
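As an illustration of the Wald formula, the following is a minimal Python sketch. The function name `wald_interval` and the use of the standard-library `statistics.NormalDist` for the normal quantile are choices made here for illustration and are not taken from the paper. The two calls show the overshoot below zero and the zero-width interval that can arise for small counts.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(x, n, alpha=0.05):
    """Wald interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n) (hypothetical helper)."""
    p_hat = x / n
    # Upper alpha/2 point of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# One success in 15 trials: the lower limit falls below 0 (overshoot).
print(wald_interval(1, 15))
# No successes in 15 trials: the interval collapses to a single point.
print(wald_interval(0, 15))
```

Because the limits are not truncated to $[0, 1]$, the sketch reproduces the raw Wald interval rather than a clipped version of it.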

Clopper-Pearson [

Theorem 1

If $X \sim \mathrm{Beta}(\alpha, \beta)$ then $Z = 1 - X \sim \mathrm{Beta}(\beta, \alpha)$.

Proof

The density function of X is given by $f_X(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha - 1} (1 - x)^{\beta - 1}$ for $0 < x < 1$. By the change of variable technique the density function of Z is obtained as

$$f_Z(z) = f_X(1 - z) \left| \frac{dx}{dz} \right| = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} (1 - z)^{\alpha - 1} z^{\beta - 1},$$

which is the density function of a beta distribution with parameters β and α, implying that $Z \sim \mathrm{Beta}(\beta, \alpha)$.

Theorem 2

If $X \sim \mathrm{Bin}(n, p)$ then $P_p[X \ge x] = P[Y \le p]$, where $Y \sim \mathrm{Beta}(x, n - x + 1)$.

Proof

Consider the identity

$$\sum_{k=0}^{x} \binom{n}{k} p^k (1-p)^{n-k} = (n - x) \binom{n}{x} \int_0^{1-p} t^{n-x-1} (1-t)^{x} \, dt. \qquad \text{(i)}$$

We use the above identity to obtain

$$\begin{aligned}
P[X \ge x] &= 1 - P[X \le x - 1] \\
&= 1 - \sum_{k=0}^{x-1} \binom{n}{k} p^k (1-p)^{n-k} \\
&= 1 - (n - x + 1) \binom{n}{x-1} \int_0^{1-p} t^{n-(x-1)-1} (1-t)^{x-1} \, dt \\
&= 1 - \frac{\Gamma(n+1)}{\Gamma(x)\Gamma(n-x+1)} \int_0^{1-p} t^{n-(x-1)-1} (1-t)^{x-1} \, dt \\
&= 1 - P[T \le 1 - p], \quad \text{where } T \sim \mathrm{Beta}(n - x + 1, x) \\
&= P[T \ge 1 - p] \\
&= P[-T \le p - 1] \\
&= P[1 - T \le p] \\
&= P[Y \le p], \quad \text{where } Y = 1 - T.
\end{aligned}$$

Hence it follows by Theorem 1 that $Y \sim \mathrm{Beta}(x, n - x + 1)$.
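Theorem 2 can be checked numerically for particular values of n, x and p. The sketch below compares the binomial upper tail with the Beta(x, n − x + 1) distribution function. The helper names are hypothetical, and the Beta distribution function is approximated with Simpson's rule, which is a choice made here purely to keep the example within the Python standard library.

```python
from math import comb, factorial


def binom_upper_tail(x, n, p):
    """P[X >= x] for X ~ Bin(n, p), by direct summation."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))


def beta_cdf(p, a, b, steps=2000):
    """P[Y <= p] for Y ~ Beta(a, b) with integer a, b >= 1 (Simpson's rule, even steps)."""
    const = factorial(a + b - 1) / (factorial(a - 1) * factorial(b - 1))

    def density(t):
        return const * t**(a - 1) * (1 - t)**(b - 1)

    h = p / steps
    total = density(0.0) + density(p)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return total * h / 3


n, x, p = 15, 4, 0.3
# Both printed values should agree up to the numerical integration error.
print(binom_upper_tail(x, n, p), beta_cdf(p, x, n - x + 1))
```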

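Theorem 3

If X has an F distribution with u and v degrees of freedom, then the random variable $Y = \dfrac{\frac{u}{v} X}{1 + \frac{u}{v} X}$ has a $\mathrm{Beta}\left(\frac{u}{2}, \frac{v}{2}\right)$ distribution.

Proof

Let $y = \dfrac{\frac{u}{v} x}{1 + \frac{u}{v} x}$. Then $x = \dfrac{v}{u} \cdot \dfrac{y}{1 - y}$ and $\dfrac{dx}{dy} = \dfrac{v}{u} \cdot \dfrac{1}{(1 - y)^2}$. By the change of variable technique the density function of Y is obtained as

$$\begin{aligned}
f_Y(y) &= f_X\!\left( \frac{v}{u} \cdot \frac{y}{1 - y} \right) \left| \frac{dx}{dy} \right| \\
&= \frac{\Gamma\!\left(\frac{u+v}{2}\right) \left(\frac{u}{v}\right)^{u/2} \left( \frac{v}{u} \cdot \frac{y}{1-y} \right)^{u/2 - 1}}{\Gamma\!\left(\frac{u}{2}\right) \Gamma\!\left(\frac{v}{2}\right) \left( 1 + \frac{y}{1-y} \right)^{(u+v)/2}} \cdot \frac{v}{u} \cdot \frac{1}{(1-y)^2} \\
&= \frac{\Gamma\!\left(\frac{u+v}{2}\right)}{\Gamma\!\left(\frac{u}{2}\right) \Gamma\!\left(\frac{v}{2}\right)} \, y^{u/2 - 1} (1 - y)^{v/2 - 1},
\end{aligned}$$

which is the density function of a $\mathrm{Beta}\left(\frac{u}{2}, \frac{v}{2}\right)$ distribution. Hence $Y \sim \mathrm{Beta}\left(\frac{u}{2}, \frac{v}{2}\right)$.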

The above three theorems are now applied in the derivation of the closed forms of the lower and upper confidence limits of the Clopper-Pearson interval for the binomial proportion p as follows. Suppose that $Y \sim \mathrm{Beta}(x, n - x + 1)$, where x is the observed value of a $\mathrm{Bin}(n, p)$ random variable X. Then by Theorem 3 the random variable $\dfrac{n - x + 1}{x} \cdot \dfrac{Y}{1 - Y}$ has an F distribution with $2x$ and $2(n - x + 1)$ degrees of freedom. Therefore, for a fixed $\alpha \in (0, 1)$, the lower limit of a two-sided exact Clopper-Pearson interval is obtained by solving the equation

$$\frac{\alpha}{2} = P[X \ge x].$$

By Theorem 2 we have

$$\begin{aligned}
\frac{\alpha}{2} = P[X \ge x] &= P[Y \le p], \quad \text{where } Y \sim \mathrm{Beta}(x, n - x + 1) \\
&= P\!\left[ \frac{n - x + 1}{x} \cdot \frac{Y}{1 - Y} \le \frac{n - x + 1}{x} \cdot \frac{p}{1 - p} \right] \\
&= P\!\left[ F_{2x,\, 2(n - x + 1)} \le \frac{n - x + 1}{x} \cdot \frac{p}{1 - p} \right],
\end{aligned}$$

where $F_{2x,\, 2(n - x + 1)}$ is an F random variable with $2x$ and $2(n - x + 1)$ degrees of freedom. Writing $f_{\gamma, u, v}$ for the upper $\gamma$ critical value of the F distribution with u and v degrees of freedom, this implies that $f_{1 - \alpha/2,\, 2x,\, 2(n - x + 1)} = \dfrac{n - x + 1}{x} \cdot \dfrac{p}{1 - p}$. Since $f_{1 - \gamma, u, v} = 1 / f_{\gamma, v, u}$, solving for p gives

$$\frac{1}{1 + \frac{n - x + 1}{x} f_{\alpha/2,\, 2(n - x + 1),\, 2x}}$$

as the lower limit.

Similarly, the upper limit is obtained by solving the equation

$$\frac{\alpha}{2} = P[X \le x].$$

Equivalently, by identity (i), Theorem 1 and Theorem 3, we write

$$\begin{aligned}
\frac{\alpha}{2} = P[X \le x] &= P[T \le 1 - p], \quad \text{where } T \sim \mathrm{Beta}(n - x, x + 1) \\
&= P[1 - T \ge p] \\
&= P[Y \ge p], \quad \text{where } Y = 1 - T \sim \mathrm{Beta}(x + 1, n - x) \\
&= P\!\left[ \frac{n - x}{x + 1} \cdot \frac{Y}{1 - Y} \ge \frac{n - x}{x + 1} \cdot \frac{p}{1 - p} \right] \\
&= P\!\left[ F_{2(x + 1),\, 2(n - x)} \ge \frac{n - x}{x + 1} \cdot \frac{p}{1 - p} \right].
\end{aligned}$$

Solving this equation for p yields

$$\frac{\frac{x + 1}{n - x} f_{\alpha/2,\, 2(x + 1),\, 2(n - x)}}{1 + \frac{x + 1}{n - x} f_{\alpha/2,\, 2(x + 1),\, 2(n - x)}}$$

as the upper limit.

Therefore, the $100(1 - \alpha)\%$ exact Clopper-Pearson confidence interval for p becomes

$$\frac{1}{1 + \frac{n - x + 1}{x} f_{\alpha/2,\, 2(n - x + 1),\, 2x}} \le p \le \frac{\frac{x + 1}{n - x} f_{\alpha/2,\, 2(x + 1),\, 2(n - x)}}{1 + \frac{x + 1}{n - x} f_{\alpha/2,\, 2(x + 1),\, 2(n - x)}}.$$
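The Clopper-Pearson limits are expressed here through F critical values. As a sketch that avoids an F quantile routine, the same limits can be obtained by solving the two defining equations, $P_p[X \ge x] = \alpha/2$ for the lower limit and $P_p[X \le x] = \alpha/2$ for the upper limit, numerically. The code below does this by bisection. The helper names are hypothetical, and the conventions of taking the lower limit as 0 when x = 0 and the upper limit as 1 when x = n are assumptions made here.

```python
from math import comb


def binom_cdf(x, n, p):
    """P[X <= x] for X ~ Bin(n, p), by direct summation."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))


def bisect_root(g, lo, hi, tol=1e-12):
    """Root of g on [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    g_lo_positive = g(lo) > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (g(mid) > 0) == g_lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Clopper-Pearson limits found numerically rather than via F quantiles."""
    # Lower limit: solve P_p[X >= x] = alpha/2 (assumed to be 0 when x = 0).
    lower = 0.0 if x == 0 else bisect_root(
        lambda p: 1 - binom_cdf(x - 1, n, p) - alpha / 2, 0.0, 1.0)
    # Upper limit: solve P_p[X <= x] = alpha/2 (assumed to be 1 when x = n).
    upper = 1.0 if x == n else bisect_root(
        lambda p: binom_cdf(x, n, p) - alpha / 2, 0.0, 1.0)
    return lower, upper


print(clopper_pearson(3, 15))
```

For x = 0 the upper equation reduces to $(1 - p)^n = \alpha/2$, so the upper limit is $1 - (\alpha/2)^{1/n}$, which gives a simple check on the numerical solution.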

Let x be the observed value of a $\mathrm{Bin}(n, p)$ random variable X. The likelihood function of p is defined as

$$L(p) = k \, P[X = x; p],$$

where k is any positive constant not depending on p. We choose k to simplify the expression for $L(p)$, and a natural choice is $k = 1 / \binom{n}{x}$. Then the binomial likelihood function is

$$L(p) = p^x (1 - p)^{n - x} \quad \text{for } 0 < p < 1.$$

The log-likelihood function is now

$$l(p) = x \log(p) + (n - x) \log(1 - p), \quad \text{for } 0 < p < 1.$$

The relative likelihood function of p, denoted by $R(p)$, is given by

$$R(p) = \frac{L(p)}{L(\hat{p})} = \frac{p^x (1 - p)^{n - x}}{\left( \frac{x}{n} \right)^x \left( 1 - \frac{x}{n} \right)^{n - x}} = \left( \frac{np}{x} \right)^x \left( \frac{n(1 - p)}{n - x} \right)^{n - x}.$$

The log-relative likelihood function of p, denoted by $r(p)$, is

$$r(p) = \log R(p) = l(p) - l(\hat{p}) = x \log(p) + (n - x) \log(1 - p) - l(\hat{p}).$$

The likelihood intervals may be determined from a graph of $R(p)$ or its logarithm $r(p)$, although it is more convenient to work with $r(p)$. The set of p values for which $R(p) \ge c$ is called a $100c\%$ likelihood interval (LI). The maximum likelihood estimate (MLE) $\hat{p}$ of p is the most plausible value of p in that it makes the observed sample most probable. The relative likelihood function measures the plausibility of any specific value of p relative to that of $\hat{p}$. The end points of the $100c\%$ likelihood interval are obtained as the roots of the equation $r(p) - \log(c) = 0$. A numerical procedure is usually necessary to solve this equation. In repeated samples from the parent distribution $\mathrm{Bin}(n, p)$ using an arbitrary value of p, the resulting population of level c likelihood intervals will contain this value of p with known frequency. They are therefore also confidence intervals and so are likelihood confidence intervals.
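The root-finding step can be sketched as follows. Since $r(p)$ increases on $(0, \hat{p})$ and decreases on $(\hat{p}, 1)$, each end point can be bracketed between $\hat{p}$ and a boundary and found by bisection. The function names are hypothetical and bisection is only one possible numerical procedure. The level `c95`, computed as $\exp(-z_{0.025}^2 / 2) \approx 0.147$, is the level usually paired with approximate 95% confidence through the chi-square approximation to the likelihood ratio statistic; the text above does not state the level it uses, so this value is an assumption.

```python
from math import exp, log
from statistics import NormalDist


def log_relative_likelihood(p, x, n):
    """r(p) = l(p) - l(p_hat), defined for 0 < x < n and 0 < p < 1."""
    p_hat = x / n

    def loglik(q):
        return x * log(q) + (n - x) * log(1 - q)

    return loglik(p) - loglik(p_hat)


def likelihood_interval(x, n, c, tol=1e-12):
    """End points of the 100c% likelihood interval, i.e. the roots of r(p) = log(c)."""
    if not 0 < x < n:
        raise ValueError("this sketch handles 0 < x < n only")
    if not 0 < c < 1:
        raise ValueError("c must lie strictly between 0 and 1")
    log_c = log(c)

    def endpoint(outer):
        # r(p) >= log(c) at p_hat and r(p) < log(c) near the boundary `outer`.
        inner = x / n
        while abs(inner - outer) > tol:
            mid = (inner + outer) / 2
            if log_relative_likelihood(mid, x, n) >= log_c:
                inner = mid
            else:
                outer = mid
        return (inner + outer) / 2

    return endpoint(0.0), endpoint(1.0)


# Assumed level for approximate 95% confidence (about 0.147).
c95 = exp(-NormalDist().inv_cdf(0.975) ** 2 / 2)
print(likelihood_interval(3, 15, c95))
```

The explicit check on x reflects the fact that the log-likelihood is unbounded below at $\hat{p} = 0$ or $\hat{p} = 1$, so the sketch does not attempt those cases.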

The Wilson score method for constructing confidence interval for binomial proportion p was developed by Edward B. Wilson [

quadratic inequality $-z_{\alpha/2} \le \dfrac{\hat{p} - p}{\sqrt{pq/n}} \le z_{\alpha/2}$ for p, where $q = 1 - p$. This confidence interval is of the form

$$\frac{\left( 2n\hat{p} + z_{\alpha/2}^2 \right) \pm z_{\alpha/2} \sqrt{z_{\alpha/2}^2 + 4n\hat{p}(1 - \hat{p})}}{2 \left( n + z_{\alpha/2}^2 \right)}.$$

The score confidence interval is asymmetric and does not suffer from the problems of overshoot and zero-width confidence intervals associated with the Wald confidence interval.
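A minimal Python sketch of the Wilson score interval, written from the standard closed form $\left[ (2n\hat{p} + z^2) \pm z \sqrt{z^2 + 4n\hat{p}(1 - \hat{p})} \right] / \left[ 2(n + z^2) \right]$ with $z = z_{\alpha/2}$, is given below. The function name `wilson_interval` is hypothetical, and the normal quantile is taken from the standard-library `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(x, n, alpha=0.05):
    """Wilson score interval for a binomial proportion (hypothetical helper)."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    centre = 2 * n * p_hat + z**2
    spread = z * sqrt(z**2 + 4 * n * p_hat * (1 - p_hat))
    denom = 2 * (n + z**2)
    return (centre - spread) / denom, (centre + spread) / denom


# With no successes the interval still has positive width and stays inside [0, 1].
print(wilson_interval(0, 15))
print(wilson_interval(3, 15))
```

Each end point satisfies $(\hat{p} - p)^2 = z^2 p(1 - p)/n$, which is the boundary case of the score inequality and can be used to verify the output.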

In this section, simulation studies are carried out to make finite-sample comparisons of the performance of the Wald, Clopper-Pearson, Wilson score and likelihood intervals on the basis of coverage probability and expected length. For any confidence interval method for estimating p, the actual coverage probability at a fixed value of p is

$$C_p(p, n) = \sum_{k=0}^{n} I(k, n) \binom{n}{k} p^k (1 - p)^{n - k},$$

where $I(k, n)$ equals 1 if the interval contains p when $X = k$ and equals 0 if it does not contain p. Denote by $L(X)$ and $U(X)$ the lower and upper confidence limits, respectively. The expected length of this interval is

$$EL(p, n) = \sum_{k=0}^{n} \binom{n}{k} p^k (1 - p)^{n - k} \left[ U(k) - L(k) \right].$$

The coverage probability and expected length were computed for 1000 values of p, equally spaced in the interval (0.2, 0.8) for sample sizes n = 15, 30, 50 and 100, and for nominal 95% Clopper-Pearson, Wilson score, Wald and likelihood confidence intervals. For each sample size and for each method summary values for coverage probability and expected length are obtained by averaging over the values of p used in the simulation.
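The two performance measures can be evaluated exactly by summing over k = 0, ..., n, as in the sketch below. The function names are hypothetical, the Wald interval is included only to make the example self-contained, and the grid of p values (interior points equally spaced in the open interval) is an assumption, since the exact grid construction is not stated.

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_interval(x, n, alpha=0.05):
    """Wald interval, used here only as an example method."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def coverage_and_length(interval, p, n):
    """Coverage probability and expected length at a fixed p, summed over k."""
    coverage = 0.0
    expected_length = 0.0
    for k in range(n + 1):
        prob = comb(n, k) * p**k * (1 - p)**(n - k)
        lower, upper = interval(k, n)
        if lower <= p <= upper:  # indicator I(k, n)
            coverage += prob
        expected_length += prob * (upper - lower)
    return coverage, expected_length


def grid_average(interval, n, num_points=1000, lo=0.2, hi=0.8):
    """Average both measures over an assumed grid of interior points in (lo, hi)."""
    step = (hi - lo) / (num_points + 1)
    grid = [lo + step * (i + 1) for i in range(num_points)]
    results = [coverage_and_length(interval, p, n) for p in grid]
    mean_coverage = sum(c for c, _ in results) / num_points
    mean_length = sum(w for _, w in results) / num_points
    return mean_coverage, mean_length


print(grid_average(wald_interval, 15))
```

Any function that maps `(k, n)` to a pair of limits can be passed as `interval`, so the same two routines serve all four methods.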

Method | n = 15 | n = 30 | n = 50 | n = 100 |
---|---|---|---|---|
Clopper-Pearson | 0.974 (0.479) | 0.969 (0.347) | 0.965 (0.269) | 0.961 (0.190) |
Wilson score | 0.957 (0.416) | 0.951 (0.313) | 0.950 (0.2487) | 0.950 (0.180) |
Wald | 0.913 (0.456) | 0.931 (0.328) | 0.939 (0.256) | 0.945 (0.182) |
Likelihood | 0.951 (0.431) | 0.948 (0.319) | 0.950 (0.216) | 0.949 (0.81) |

for the Wilson interval are very close to the nominal level, and the interval has the smallest mean expected length for all n. On the other hand, the traditional Wald interval has mean coverage probabilities that are smaller than the nominal level. Finally, the mean coverage probabilities for the likelihood interval are very close to the nominal level for all the sample sizes.

For a large sample, n = 50, the same pattern is observed, but there is a remarkable improvement in terms of convergence of the coverage probabilities and reduced expected lengths. The Clopper-Pearson interval is still conservative and shows convergence to a value above the nominal level. Most coverage probabilities for the Wald interval are still below the nominal level and show poor convergence. The Wilson and likelihood intervals are again better than the Clopper-Pearson and Wald intervals in terms of the two performance measures (

The four methods of interval estimation are applied in a clinical study about the effectiveness of hyperdynamic therapy in treating cerebral vasospasm [

The likelihood interval looks optimal by evidence presented in

The Clopper-Pearson interval is conservative for both small and large samples, and it is always wider than it should be. The Wald interval is well known and frequently used in statistical practice. Unfortunately, according to the above simulation study, its coverage probabilities are lower than the nominal level and the interval suffers from the problem of overshoot. Therefore, inferential comparisons and judgements based on it might be misleading. On the other hand, the Wilson and likelihood intervals have coverage probabilities near the nominal level and shorter lengths. The Wilson interval for the real data application is wider than the likelihood interval and includes implausible values of the parameter. In summary, the Wilson and likelihood intervals are recommended for use in practice. It is worth noting that the likelihood interval looks superior to the Wilson interval in that it is shorter and includes only plausible values of the parameter p. The likelihood method has one drawback: it does not produce an interval when the number of successes x is 0 or n.

The author declares no conflicts of interest regarding the publication of this paper.

Orawo, L.A. (2021) Confidence Intervals for the Binomial Proportion: A Comparison of Four Methods. Open Journal of Statistics, 11, 806-816. https://doi.org/10.4236/ojs.2021.115047