Confidence Intervals for the Binomial Proportion: A Comparison of Four Methods
1. Introduction
Estimation of a binomial proportion p is one of the most commonly encountered statistical problems, with important applications in areas such as clinical medicine, business, politics and quality control. For instance, politicians are certainly interested in knowing what fraction of voters would favor them in the next election. Binomial data are obtained from a binomial experiment, which consists of a fixed number n of independent Bernoulli trials, each of which can result in either a success or a failure. The success probability p is assumed fixed. The binomial probability distribution is used to model the total number x of successes resulting from the binomial experiment. Once data are available, information about p can be summarized by the likelihood function and, on the basis of this summary, a point estimate for the binomial proportion, denoted by $\hat{p}$, is obtained by the method of maximum likelihood as $\hat{p} = x/n$. A number of two-sided confidence intervals for p have been proposed by several authors. The Wald method is the most commonly used technique since it is based on the normal approximation to the binomial distribution. However, the approximation is inaccurate whenever the sample size is small (n < 30) or when the proportion p is close to zero or one; the Wald confidence interval may have low coverage probability even if p is not close to zero or one, and confidence limits outside the interval $[0, 1]$. Matiri et al. [1] applied the Wald method to obtain interval estimates for the prevalence rate and encountered the problem of overshoot with negative lower confidence limits. Poor performance of the Wald confidence interval has been pointed out by many authors [2] [3] [4] [5] [6].
Alternative methods for constructing confidence intervals for p have been proposed, such as the Wilson score, Clopper-Pearson and Agresti-Coull confidence intervals, among others. Just like the Wald confidence interval, the validity of the Wilson confidence interval depends heavily on a large-sample approximation. The Clopper-Pearson interval is an exact two-sided confidence interval derived from the binomial probability mass function. Past studies indicate that the Clopper-Pearson confidence interval is very conservative for small to moderate n [3]. Panatiogis & Konstantinos [7] present a bootstrap method for estimating the binomial proportion and compare it with the Wald confidence interval and the Agresti-Coull interval.
This paper considers an alternative method, called the likelihood method, for constructing an approximate confidence interval for the binomial proportion. The likelihood intervals are determined from the graph of the relative likelihood function or its logarithm for a fixed likelihood level [8]. They are fully conditioned on the shape of the likelihood function and hence are optimal. The likelihood method can be used to construct a confidence interval for the proportion in situations where the traditional methods based on asymptotic normality are inaccurate.
In order to identify the best confidence interval for the binomial proportion p, the Wald, Wilson score, Clopper-Pearson and likelihood methods of interval estimation are compared on the basis of coverage probability and interval width using simulated data. The four intervals are also applied to a real data example. The resulting confidence intervals for the binomial proportion are compared in terms of interval width and the plausibility of the parameter values in them.
The paper is organized as follows: in Section 2, the four methods of interval estimation are described. In Section 3, the simulation results regarding coverage probability and expected length of the different intervals are presented and discussed. Section 4 applies the four intervals to real-life data from a clinical study and compares them in terms of interval length and the plausibility of the parameter values inside them. Section 5 is devoted to concluding remarks.
2. Interval Methods
2.1. Wald Interval
Let $X_1, X_2, \ldots, X_n$ be IID Bernoulli(p) random variables, where the parameter $p \in (0, 1)$ is unknown. Then the sum $X = \sum_{i=1}^{n} X_i$ of the n Bernoulli random variables is a binomial random variable with parameters n and p. If the unknown proportion p is not too close to 0 or 1, then by the Central Limit Theorem, for n sufficiently large, the MLE $\hat{p} = X/n$ is approximately normally distributed with mean $p$ and variance $p(1-p)/n$. The Wald confidence interval is based on the normal approximation to the binomial distribution and is given by
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},$$
where $z_{\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the standard normal distribution. The Wald method should be used only when $n\min(p, 1-p)$ is at least 5 (or 10); otherwise it will produce unreliable interval estimates.
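As an illustration, the Wald limits can be computed with a few lines of Python. This is only a minimal sketch using the standard library: the function name `wald_interval` and its `conf_level` argument are names chosen here for illustration, and the limits are deliberately not truncated to [0, 1] so that overshoot remains visible.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(x, n, conf_level=0.95):
    """Wald interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n), untruncated."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    alpha = 1.0 - conf_level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # 100(1 - alpha/2)th normal percentile
    p_hat = x / n                                 # maximum likelihood estimate
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Data of the clinical example in Section 4: 16 successes in 17 trials.
lower, upper = wald_interval(16, 17)
print(f"Wald 95% interval: ({lower:.4f}, {upper:.4f})")
```

With x = 0 or x = n the estimated variance is zero, so this sketch returns a zero-width interval, which is one of the known defects of the Wald method.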
2.2. Clopper-Pearson Interval
Clopper-Pearson [9] proposed a method of constructing an exact two-sided confidence interval for the binomial proportion p using the equal-tail rule. The derivation of the two-sided Clopper-Pearson confidence interval for the binomial proportion p is based on the relationships between the binomial, beta and F distributions. The relationships are stated in the following three theorems.
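Theorem 1
If $X \sim \mathrm{Beta}(\alpha, \beta)$, then $Z = 1 - X \sim \mathrm{Beta}(\beta, \alpha)$.
Proof
The density function of X is given by
$$f(x) = \frac{1}{B(\alpha, \beta)}\, x^{\alpha-1}(1-x)^{\beta-1}, \quad 0 < x < 1.$$
By the change of variable technique with $Z = 1 - X$, the density function of Z is obtained as
$$g(z) = \frac{1}{B(\beta, \alpha)}\, z^{\beta-1}(1-z)^{\alpha-1}, \quad 0 < z < 1,$$
which is the density function of a beta distribution with parameters β and α, implying that $Z \sim \mathrm{Beta}(\beta, \alpha)$.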
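Theorem 2
If $X \sim \mathrm{Bin}(n, p)$, then for $x = 1, \ldots, n$,
$$P(X \ge x) = P(Y \le p),$$
where $Y \sim \mathrm{Beta}(x, n-x+1)$.
Proof
Consider the identity
$$\sum_{k=0}^{x-1}\binom{n}{k}p^{k}(1-p)^{n-k} = \frac{1}{B(n-x+1, x)}\int_{0}^{1-p} t^{\,n-x}(1-t)^{x-1}\,dt. \quad \text{(i)}$$
We use the above identity to obtain
$$P(X \ge x) = 1 - P(X \le x-1) = 1 - P(W \le 1-p) = P(1 - W \le p),$$
where $W \sim \mathrm{Beta}(n-x+1, x)$.
Hence it follows by Theorem 1 that $Y = 1 - W \sim \mathrm{Beta}(x, n-x+1)$ and therefore $P(X \ge x) = P(Y \le p)$.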
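Theorem 3
If X has an F distribution with u and v degrees of freedom, then the random variable
$$Y = \frac{(u/v)X}{1 + (u/v)X}$$
has a $\mathrm{Beta}(u/2, v/2)$ distribution.
Proof
Let $y = \dfrac{(u/v)x}{1 + (u/v)x}$. Then $x = \dfrac{vy}{u(1-y)}$ and $\dfrac{dx}{dy} = \dfrac{v}{u(1-y)^{2}}$. By the change of variable technique the density function of Y is obtained as
$$g(y) = \frac{1}{B(u/2, v/2)}\, y^{u/2-1}(1-y)^{v/2-1}, \quad 0 < y < 1,$$
which is the density function of a $\mathrm{Beta}(u/2, v/2)$ distribution. Hence $Y \sim \mathrm{Beta}(u/2, v/2)$.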
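The above three theorems are now applied in the derivation of the closed forms of the lower and upper confidence limits of the Clopper-Pearson interval for the binomial proportion p as follows. Suppose that $Y \sim \mathrm{Beta}(x, n-x+1)$, where x is the observed value of a $\mathrm{Bin}(n, p)$ random variable X. Then by Theorem 3 the random variable
$$F = \frac{(n-x+1)\,Y}{x\,(1-Y)}$$
has an F distribution with $2x$ and $2(n-x+1)$ degrees of freedom. Throughout, $F_{\gamma}(u, v)$ denotes the $\gamma$ quantile of the F distribution with u and v degrees of freedom, that is, $P\{F \le F_{\gamma}(u, v)\} = \gamma$. Therefore, for a fixed $\alpha \in (0, 1)$, the lower limit of a two-sided exact Clopper-Pearson interval is obtained by solving the equation
$$P(X \ge x \mid p) = \alpha/2.$$
By Theorem 2 we have
$$P(Y \le p) = P\!\left(F \le \frac{(n-x+1)\,p}{x\,(1-p)}\right) = \alpha/2,$$
where $Y \sim \mathrm{Beta}(x, n-x+1)$ and F is an F random variable with $2x$ and $2(n-x+1)$ degrees of freedom. This implies that
$$\frac{(n-x+1)\,p}{x\,(1-p)} = F_{\alpha/2}\big(2x, 2(n-x+1)\big),$$
and solving for p we get
$$p_L = \frac{x\,F_{\alpha/2}\big(2x, 2(n-x+1)\big)}{n-x+1+x\,F_{\alpha/2}\big(2x, 2(n-x+1)\big)}$$
as the lower limit.
Similarly, the upper limit is obtained by solving the equation
$$P(X \le x \mid p) = \alpha/2.$$
Equivalently, since $P(X \ge x+1 \mid p) = 1 - \alpha/2$, we write
$$P(Y^{*} \le p) = P\!\left(F^{*} \le \frac{(n-x)\,p}{(x+1)(1-p)}\right) = 1 - \alpha/2,$$
where $Y^{*} \sim \mathrm{Beta}(x+1, n-x)$ and $F^{*}$ is an F random variable with $2(x+1)$ and $2(n-x)$ degrees of freedom.
Solving this equation for p yields
$$p_U = \frac{(x+1)\,F_{1-\alpha/2}\big(2(x+1), 2(n-x)\big)}{n-x+(x+1)\,F_{1-\alpha/2}\big(2(x+1), 2(n-x)\big)}$$
as the upper limit.
Therefore, the $100(1-\alpha)\%$ exact Clopper-Pearson confidence interval for p becomes $(p_L, p_U)$.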
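The Clopper-Pearson limits can also be obtained numerically from their defining equal-tail equations, P(X ≥ x | p) = α/2 for the lower limit and P(X ≤ x | p) = α/2 for the upper limit. The sketch below solves these two equations by bisection instead of using F quantiles, because the Python standard library has no F quantile function; this is a substitution made here for illustration, not the closed form of the paper. The names `binom_upper_tail`, `bisect_root` and `clopper_pearson_interval` are hypothetical, and setting the lower limit to 0 when x = 0 and the upper limit to 1 when x = n is the usual convention, assumed here.

```python
from math import comb


def binom_upper_tail(x, n, p):
    """P(X >= x) for X ~ Bin(n, p)."""
    return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(x, n + 1))


def bisect_root(g, lo, hi, tol=1e-12):
    """Root of a continuous function g that changes sign on [lo, hi]."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return 0.5 * (lo + hi)


def clopper_pearson_interval(x, n, conf_level=0.95):
    """Equal-tail exact interval from the binomial tail equations."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    alpha = 1.0 - conf_level
    # Lower limit: P(X >= x | p) = alpha/2 (assumed to be 0 when x = 0).
    if x == 0:
        lower = 0.0
    else:
        lower = bisect_root(
            lambda p: binom_upper_tail(x, n, p) - alpha / 2.0, 0.0, 1.0)
    # Upper limit: P(X <= x | p) = alpha/2, i.e. P(X >= x + 1 | p) = 1 - alpha/2
    # (assumed to be 1 when x = n).
    if x == n:
        upper = 1.0
    else:
        upper = bisect_root(
            lambda p: binom_upper_tail(x + 1, n, p) - (1.0 - alpha / 2.0), 0.0, 1.0)
    return lower, upper


lower, upper = clopper_pearson_interval(16, 17)
print(f"Clopper-Pearson 95% interval: ({lower:.4f}, {upper:.4f})")
```

Bisection is safe here because the upper tail probability P(X ≥ x | p) is a continuous, increasing function of p, so each equation has a single root in (0, 1).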
2.3. Likelihood Interval
Let x be the observed value of a $\mathrm{Bin}(n, p)$ random variable X. The likelihood function of p is defined as
$$L(p) = k\,P(X = x; p) = k\binom{n}{x}p^{x}(1-p)^{n-x},$$
where k is any positive constant not depending on p. We choose k to simplify the expression for $L(p)$, and a natural choice is $k = 1/\binom{n}{x}$. Then the binomial likelihood function is
$$L(p) = p^{x}(1-p)^{n-x} \quad \text{for } 0 < p < 1.$$
The log-likelihood function is now
$$l(p) = x\log p + (n-x)\log(1-p), \quad \text{for } 0 < p < 1.$$
The relative likelihood function of p, denoted by $R(p)$, is given by
$$R(p) = \frac{L(p)}{L(\hat{p})} = \left(\frac{p}{\hat{p}}\right)^{x}\left(\frac{1-p}{1-\hat{p}}\right)^{n-x}.$$
The log-relative likelihood function of p, denoted by $r(p)$, is
$$r(p) = l(p) - l(\hat{p}) = x\log\frac{p}{\hat{p}} + (n-x)\log\frac{1-p}{1-\hat{p}}.$$
The likelihood intervals may be determined from a graph of $R(p)$ or its logarithm $r(p)$, although it is more convenient to work with $r(p)$. The set of p values for which $R(p) \ge c$ is called a $100c\%$ likelihood interval (LI). The maximum likelihood estimate (MLE) $\hat{p}$ of p is the most plausible value of p in that it makes the observed sample most probable. The relative likelihood function measures the plausibility of any specific value of p relative to that of $\hat{p}$. The end points of the $100c\%$ likelihood interval (LI) are obtained as the roots of the equation $r(p) - \log c = 0$. The use of a numerical procedure is usually necessary to solve this equation. In repeated samples from the parent distribution $\mathrm{Bin}(n, p)$ using an arbitrary value of p, the resulting population of level c likelihood intervals will contain this value of p with known frequency. They are therefore also confidence intervals and so are likelihood confidence intervals.
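The numerical solution of r(p) − log c = 0, where r(p) = x log(p/p̂) + (n − x) log((1 − p)/(1 − p̂)) is the log-relative likelihood, can be sketched with a simple bisection on each side of the MLE. The likelihood level used below is an assumption: it is set by the usual large-sample chi-square calibration, log c = −z²/2 with z the standard normal percentile, which gives c ≈ 0.147 for a nominal 95% interval. The function names are hypothetical, and the sketch simply refuses the cases x = 0 and x = n.

```python
from math import log
from statistics import NormalDist


def log_relative_likelihood(p, x, n):
    """r(p) = l(p) - l(p_hat), valid for 0 < p < 1 and 0 < x < n."""
    p_hat = x / n
    return x * log(p / p_hat) + (n - x) * log((1.0 - p) / (1.0 - p_hat))


def likelihood_interval(x, n, conf_level=0.95, tol=1e-12):
    """End points of the likelihood interval {p : r(p) >= log c}."""
    if not 0 < x < n:
        raise ValueError("this sketch needs 0 < x < n")
    # Assumed calibration: log c = -z^2 / 2 (chi-square with 1 degree of freedom).
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf_level) / 2.0)
    log_c = -0.5 * z * z
    p_hat = x / n

    def g(p):
        return log_relative_likelihood(p, x, n) - log_c

    def bisect(inside, outside):
        # Invariant: g(inside) > 0 (inside the interval), g(outside) < 0.
        while abs(outside - inside) > tol:
            mid = 0.5 * (inside + outside)
            if g(mid) > 0:
                inside = mid
            else:
                outside = mid
        return 0.5 * (inside + outside)

    eps = 1e-15  # keeps the search strictly inside (0, 1)
    return bisect(p_hat, eps), bisect(p_hat, 1.0 - eps)


lower, upper = likelihood_interval(16, 17)
print(f"Likelihood 95% interval: ({lower:.4f}, {upper:.4f})")
```

Because r(p) increases up to p̂ and decreases afterwards, there is exactly one root on each side of p̂, which is why two separate bisections starting from p̂ are sufficient.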
2.4. Wilson Interval
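The Wilson score method for constructing a confidence interval for the binomial proportion p was developed by Edward B. Wilson [10] and is based on inverting the z-test for p. The endpoints of the $100(1-\alpha)\%$ Wilson interval are obtained by solving the quadratic inequality
$$(\hat{p} - p)^{2} \le z_{\alpha/2}^{2}\,\frac{p(1-p)}{n}$$
for p. This confidence interval is of the form
$$\frac{\hat{p} + \dfrac{z^{2}}{2n} \pm z\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} + \dfrac{z^{2}}{4n^{2}}}}{1 + \dfrac{z^{2}}{n}},$$
where $z = z_{\alpha/2}$ is the standard normal percentile defined in Section 2.1. The score confidence interval is asymmetric about $\hat{p}$ and does not suffer from the problems of overshoot and zero-width confidence intervals associated with the Wald confidence interval.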
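The closed form of the Wilson interval, that is, the pair of roots of the quadratic (p̂ − p)² = z² p(1 − p)/n, is easy to evaluate directly. The following is a minimal standard-library sketch; the function name `wilson_interval` and the `conf_level` argument are illustrative choices, not part of the original paper.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(x, n, conf_level=0.95):
    """Wilson score interval: roots of (p_hat - p)^2 = z^2 * p * (1 - p) / n."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf_level) / 2.0)
    p_hat = x / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2.0 * n)) / denom   # shrinks p_hat towards 1/2
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    return centre - half_width, centre + half_width


lower, upper = wilson_interval(16, 17)
print(f"Wilson 95% interval: ({lower:.4f}, {upper:.4f})")
```

The interval is centred on a value pulled from p̂ towards 1/2 rather than on p̂ itself, which is why it stays inside [0, 1] and keeps a positive width even when x = 0 or x = n.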
3. Simulations
In this section simulation studies are carried out, and finite-sample comparisons of the performance of the Wald, Clopper-Pearson, Wilson score and likelihood intervals are made on the basis of coverage probability and expected length. For any confidence interval method for estimating p, the actual coverage probability at a fixed value of p is
$$C_n(p) = \sum_{x=0}^{n} I(x, p)\binom{n}{x}p^{x}(1-p)^{n-x},$$
where $I(x, p)$ equals 1 if the interval contains p when $X = x$ and equals 0 if it does not contain p. Denote by $p_L(x)$ and $p_U(x)$ the lower and upper confidence limits, respectively. The expected length of this interval is
$$E_n(p) = \sum_{x=0}^{n}\big[p_U(x) - p_L(x)\big]\binom{n}{x}p^{x}(1-p)^{n-x}.$$
The coverage probability and expected length were computed for 1000 values of p, equally spaced in the interval (0.2, 0.8), for sample sizes n = 15, 30, 50 and 100, and for nominal 95% Clopper-Pearson, Wilson score, Wald and likelihood confidence intervals. For each sample size and each method, summary values for coverage probability and expected length are obtained by averaging over the values of p used in the simulation. Table 1 shows the mean of the actual coverage probabilities for the four methods of interval estimation at various sample sizes. The Clopper-Pearson interval is very conservative but has the highest mean interval length for all the values of n. The mean coverage probabilities for the Wilson interval are very close to the nominal level, and this interval has the smallest mean expected length for all n. On the other hand, the traditional Wald interval has mean coverage probabilities that are smaller than the nominal level. Finally, the mean coverage probabilities for the likelihood interval are very close to the nominal level for all the sample sizes.
Table 1. Mean coverage probabilities and mean expected lengths (in parentheses) of nominal 95% confidence intervals for the binomial parameter p.
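The two performance measures, the coverage probability (the binomial-weighted sum of the indicator that the interval contains p) and the expected length (the binomial-weighted sum of the interval widths), can be evaluated by exact enumeration over x = 0, …, n. The sketch below does this for the Wald interval only, to keep it short. It is an illustration under stated assumptions rather than the code behind Table 1: the function names are hypothetical, the grid of 1000 interior points in (0.2, 0.8) is one possible reading of "equally spaced", and closed intervals are used when checking whether p is covered.

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_interval(x, n, conf_level=0.95):
    """Wald interval, used here only as an example method."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf_level) / 2.0)
    p_hat = x / n
    half = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half, p_hat + half


def coverage_and_length(interval, n, p, conf_level=0.95):
    """Exact coverage probability and expected length at a fixed p."""
    coverage = 0.0
    expected_length = 0.0
    for x in range(n + 1):
        prob = comb(n, x) * p**x * (1.0 - p) ** (n - x)   # P(X = x)
        lower, upper = interval(x, n, conf_level)
        if lower <= p <= upper:                            # indicator I(x, p)
            coverage += prob
        expected_length += (upper - lower) * prob
    return coverage, expected_length


# Assumed grid: 1000 equally spaced interior points of (0.2, 0.8).
grid = [0.2 + 0.6 * (i + 1) / 1001 for i in range(1000)]
results = [coverage_and_length(wald_interval, 15, p) for p in grid]
mean_coverage = sum(c for c, _ in results) / len(results)
mean_length = sum(w for _, w in results) / len(results)
print(f"Wald, n = 15: mean coverage {mean_coverage:.4f}, mean length {mean_length:.4f}")
```

Any function with the signature `interval(x, n, conf_level)` returning a pair of limits can be passed in place of `wald_interval`, so the same loop serves for comparing several methods.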
Figure 1 and Figure 2 show, respectively, plots of coverage probability and expected length against the values of p for the four intervals when n = 15. It can be noted from Figure 1 that for the Wald interval most coverage probabilities are below the nominal level and are extremely low for values of p near 0.2 or 0.8. This may be due to the poor normal approximation when the sample is small and p is not close to 0.5. The Clopper-Pearson interval has coverage probabilities above the nominal level, with short spikes, but has the largest expected lengths. The Wilson and likelihood intervals are not conservative, but their coverage probabilities are close to the nominal level and they have smaller expected lengths than the Clopper-Pearson and Wald intervals (see Figure 2).
For a larger sample, n = 50, the same pattern is observed, but there is a remarkable improvement in terms of convergence of the coverage probabilities and reduced expected lengths. The Clopper-Pearson interval is still conservative and shows convergence to a value above the nominal level. Most coverage probabilities for the Wald interval are still below the nominal level and show poor convergence. The Wilson and likelihood intervals are again better than the Clopper-Pearson and Wald intervals in terms of the two performance measures (Figure 3 and Figure 4).
4. Application to Real Example
The four methods of interval estimation are applied to data from a clinical study of the effectiveness of hyperdynamic therapy in treating cerebral vasospasm [11]. The success of the therapy was defined as clinical improvement in terms of neurological deficits. The study reported 16 successes out of 17 patients. On the basis of these data the four 95% confidence intervals are computed as 1) (0.7131, 0.9985) for the Clopper-Pearson interval, 2) (0.7302, 0.9895) for the Wilson interval, 3) (0.8289, 1.053) for the Wald interval, and 4) (0.7658, 0.9965) for the likelihood interval. Each of these four confidence intervals is plotted on the graph of the relative likelihood function as shown in Figure 5. It is observed that the Clopper-Pearson and Wilson intervals include implausible values of the parameter p, whereas the Wald interval excludes plausible values and has an upper limit greater than 1.
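The plausibility comparison can be made concrete by evaluating the relative likelihood R(p) = (p/p̂)^x ((1 − p)/(1 − p̂))^(n − x) at the reported end points, with x = 16, n = 17 and p̂ = 16/17. The sketch below takes the interval limits as quoted in this section; the function and variable names are illustrative, and clipping the Wald upper limit to 1 before evaluating R is a choice made here because R is not defined for p > 1.

```python
def relative_likelihood(p, x, n):
    """R(p) = L(p) / L(p_hat) for the binomial model, 0 <= p <= 1, 0 < x < n."""
    p_hat = x / n
    return (p / p_hat) ** x * ((1.0 - p) / (1.0 - p_hat)) ** (n - x)


x, n = 16, 17
# Interval limits as reported in the text for the clinical example.
reported = {
    "Clopper-Pearson": (0.7131, 0.9985),
    "Wilson": (0.7302, 0.9895),
    "Wald": (0.8289, 1.053),
    "Likelihood": (0.7658, 0.9965),
}

plausibility = {}
for name, (lower, upper) in reported.items():
    upper_clipped = min(upper, 1.0)  # R(p) is only defined on [0, 1]
    plausibility[name] = (
        relative_likelihood(lower, x, n),
        relative_likelihood(upper_clipped, x, n),
    )
    r_low, r_up = plausibility[name]
    print(f"{name:16s} R(lower) = {r_low:.3f}  R(upper) = {r_up:.3f}")
```

End points of a likelihood interval have equal relative likelihood by construction, so unequal values of R at the two limits of another interval indicate that it extends further into low-plausibility values on one side than on the other.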
Figure 1. Coverage probabilities for n = 15.
Figure 3. Coverage probabilities for n = 50.
Figure 5. Confidence intervals plotted on the graph of relative likelihood function.
The likelihood interval looks optimal by evidence presented in Table 1. With these four confidence intervals we can conclude that hyperdynamic therapy is an effective method for treating ischaemic neurological symptoms due to vasospasm.
5. Conclusion
The Clopper-Pearson interval is conservative for both small and large samples; however, it is always wider than it should be. The Wald interval is well known and frequently used in statistical practice. Unfortunately, according to the above simulation study, its coverage probabilities are lower than the nominal level, and it is associated with the problem of overshoot. Therefore, inferential comparisons and judgements based on it might be misleading. On the other hand, the Wilson and likelihood intervals have coverage probabilities near the nominal level and shorter lengths. The Wilson interval for the real data application is wider than the likelihood interval and includes implausible values of the parameter. In summary, the Wilson and likelihood intervals are recommended for use in practice. It is worth noting that the likelihood interval looks superior to the Wilson interval in that it is shorter and includes only plausible values of the parameter p. The likelihood method has one drawback in the sense that it does not produce an interval when the number of successes x is 0 or n.