Inferences on the Difference of Two Proportions: A Bayesian Approach
1. Introduction
For two independent proportions $p_1$ and $p_2$, their difference $p = p_1 - p_2$ is frequently encountered in the frequentist statistical literature, where tests, or confidence intervals, for $p$ are well-accepted notions in theory and in practice, although most frequently the case under study is the equality, or inequality, of these proportions. For the Bayesian approach, Pham-Gia and Turkkan ([1] and [2]) have considered the case of independent, and dependent, proportions for inferences, and also in the context of sample size determination [3].
But testing $H_0: p_1 - p_2 = 0$ is only a special case of testing $H_0: p_1 - p_2 = \eta_0$, with $\eta_0$ being a positive constant value, which is much less frequently dealt with. In Section 2 we recall the unconditional approaches to testing $p_1 - p_2 = 0$ based on the maximum likelihood estimators of the two proportions and normal approximations. A new exact approach not using the normal approximation has been developed by our group and will be presented elsewhere. Fisher's exact test is also recalled here, for comparison purposes. The Bayesian approach to testing the equality of two proportions and the computation of credible intervals are given in Section 3. The Bayesian approach using the general beta distributions is given in Section 4. All related problems are completely solved, thanks to some closed-form formulas that we have established in earlier papers.
2. Testing the Equality of Two Proportions
2.1. Test Using Normal Approximation
As stated before, taking $\eta_0 = 0$ we have a test for equality between two proportions. Several well-known methods are presented in the literature. For example, the conditional test is usually called Fisher's exact test, and is based on the hypergeometric distribution. It is used when the sample size is small. Pearson's chi-square test with Yates's correction is usually used for intermediate sample sizes, while Pearson's chi-square test is used for large samples. Their appropriateness is discussed in D'Agostino et al. [4]. Normal approximation methods are based on formulas using estimated values of the mean and the variance of the two populations. For example, with $\hat{p}_i = x_i/n_i$, we have

$Z_1 = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}}$,

and the pooled version

$Z_2 = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\,(1/n_1 + 1/n_2)}}, \quad \hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$,

both being approximately $N(0, 1)$ under $H_0$. Cressie [5] gives conditions under which one of these statistics is better than the other, in terms of power. Previously, Eberhardt and Fligner [6] studied the same problem for a bilateral test.
Numerical Example 1
To investigate its proportions of customers in two separate geographic areas of the country, a company picks a random sample of 25 shoppers in area A, in which 17 are found to be its customers. A similar random sample of 20 shoppers in area B gives 8 customers. We wish to test the hypothesis $H_0: p_A = p_B$ against $H_1: p_A > p_B$.
We have here the observed values $Z_1 \approx 1.95$ and $Z_2 \approx 1.88$, which lead, in both cases, to the rejection of $H_0$ at significance level 5% (the critical value is 1.64) in favor of $H_1: p_A > p_B$.
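As a numerical check, the two statistics can be recomputed directly from the sample counts (a minimal sketch; the variable names are ours):

```python
from math import sqrt

# Data from Numerical Example 1
x1, n1 = 17, 25   # customers / shoppers in area A
x2, n2 = 8, 20    # customers / shoppers in area B

p1_hat, p2_hat = x1 / n1, x2 / n2
diff = p1_hat - p2_hat

# Unpooled statistic Z1
se1 = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
z1 = diff / se1

# Pooled statistic Z2
p_hat = (x1 + x2) / (n1 + n2)
se2 = sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
z2 = diff / se2

print(round(z1, 2), round(z2, 2))
```

Both values exceed the one-sided 5% critical value 1.64.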
2.2. Fisher’s Exact Test
Under $H_0: p_1 = p_2$, the number of successes coming from population 1 has the hypergeometric distribution. The argument is that, in the combined sample of size $n_1 + n_2$, with $x_1$ successes from population 1 out of the total number of successes $s = x_1 + x_2$, the number $x$ of successes coming from population 1 is a hypergeometric variable. To compute the significance of the observation we have to compute several tables corresponding to more extreme results than the observed table. It is known that the conditional test is less powerful than the unconditional one.
Numerical Example 2
We use the same data as in Numerical Example 1 to test $H_0: p_A = p_B$ vs $H_1: p_A > p_B$, i.e. that the proportion of customers in area A is significantly higher than the one in area B. We have Table 1: the observed table has probability $P(X = 17) \approx 0.043$, and adding the cases more extreme than it means computing $P(X \ge 17)$. The p-value of the test is hence approximately 0.057.
Although technically not significant at the 5% level, this result shows that the proportion of customers in area B can practically be considered as lower than the one in area A, in agreement with the frequentist test.
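The hypergeometric computation can be sketched in a few lines (the helper name is ours):

```python
from math import comb

# Data from Numerical Example 1: 17 of 25 in area A, 8 of 20 in area B
n1, n2, x_obs = 25, 20, 17
s = 17 + 8   # total number of successes in the combined sample

def hyper_pmf(x):
    """P(X = x) for X ~ Hypergeometric(N = n1 + n2, K = s, n = n1) under H0."""
    return comb(s, x) * comb(n1 + n2 - s, n1 - x) / comb(n1 + n2, n1)

# One-sided p-value: the observed table and all more extreme ones
p_value = sum(hyper_pmf(x) for x in range(x_obs, min(n1, s) + 1))
print(round(hyper_pmf(x_obs), 3), round(p_value, 3))
```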
REMARK: The problem is often associated with a 2 × 2 table, where there are three possibilities: constant column sums and row sums, one set constant and the other variable, and both variable. Other measures can then be introduced (e.g. Santner and Snell [7]). A Bayesian approach has been carried out by several authors, e.g. Howard [8] and also Pham-Gia and Turkkan [2], who computed the credible intervals for several of these measures.
3. The Bayesian Approach
In the estimation of the difference of two proportions the Bayesian approach certainly plays an important role. Agresti and Coull [9] provide some interesting remarks on various approaches.
Again, let $p = p_1 - p_2$. Using the Bayesian approach will certainly encounter some serious computational difficulties if we do not have a closed-form expression for the density of the difference of two independently beta-distributed random variables. Such an expression has been obtained by the first author some time ago and is recalled below.
3.1. Bayesian Test on the Equality of Two Proportions
Let us recall first the following theorem:
Theorem 1: Let $X_1, X_2$ be two independent beta-distributed random variables with parameters $(\alpha_1, \beta_1)$ and $(\alpha_2, \beta_2)$, respectively. Then the difference $D = X_1 - X_2$ has a density defined on $(-1, 1)$, given in closed form in terms of Appell's first hypergeometric function $F_1$:

(1)

where $F_1$ is Appell's first hypergeometric function, which is defined as

$F_1(a; b_1, b_2; c; x, y) = \displaystyle\sum_{m=0}^{\infty}\sum_{n=0}^{\infty} \dfrac{(a)_{m+n}(b_1)_m(b_2)_n}{(c)_{m+n}\, m!\, n!}\, x^m y^n$ (2)

where $(a)_k = a(a+1)\cdots(a+k-1)$ is the Pochhammer symbol. This infinite series is convergent for $|x| < 1$ and $|y| < 1$, where, as shown by Euler, it can also be expressed as a convergent integral:

$F_1(a; b_1, b_2; c; x, y) = \dfrac{\Gamma(c)}{\Gamma(a)\Gamma(c-a)} \displaystyle\int_0^1 u^{a-1}(1-u)^{c-a-1}(1-ux)^{-b_1}(1-uy)^{-b_2}\, \mathrm{d}u$ (3)

which converges for $c > a > 0$. In fact, Pham-Gia and Turkkan [1] established the expression of the density of the difference using (3) directly and not the series. Hence, the infinite series (2) can be extended outside the two circles of convergence, by analytic continuation, where it is also denoted by $F_1$.

Here, we denote the distribution with the above density (1) by $D(\alpha_1, \beta_1; \alpha_2, \beta_2)$.

Proof: See Pham-Gia and Turkkan [1].
The prior distribution of $p = p_1 - p_2$ is hence $D(\alpha_1, \beta_1; \alpha_2, \beta_2)$, obtained from the two beta priors. Various approaches in Bayesian testing are given below.
Bayesian Testing Using a Significance Level
While frequentist statistics frequently does not test $H_0: p_1 - p_2 = \eta_0$ for $\eta_0 \neq 0$, and limits itself to the case $\eta_0 = 0$, Bayesian statistics can easily do it.
a) One-sided test: $H_0: p \le \eta_0$ vs $H_1: p > \eta_0$.

Proposition 1: To perform the above test at the 0.05 significance level, using the two independent samples $(x_1, n_1)$ and $(x_2, n_2)$, we compute the posterior density $D(\alpha_1', \beta_1'; \alpha_2', \beta_2')$ of $p$, where $\alpha_i' = \alpha_i + x_i$ and $\beta_i' = \beta_i + n_i - x_i$, $i = 1, 2$. This expression of the posterior density of $p$, obtained by the conjugacy of binomial sampling with the beta prior, will allow us to compute $P(H_0 \mid \text{data}) = P(p \le \eta_0 \mid \text{data})$ and compare it with the significance level $\alpha = 0.05$.
For example, as in the frequentist example of Section 2.1, we consider $n_1 = 25$, $x_1 = 17$, $n_2 = 20$, $x_2 = 8$, and use two non-informative beta priors, that is, $p_1, p_2 \sim Beta(1, 1)$. We note first that $\alpha_1' = 18$, $\beta_1' = 9$, $\alpha_2' = 9$, $\beta_2' = 13$, giving the posterior $D(18, 9; 9, 13)$ for $p$.
We obtain the prior and posterior distributions of $p_1$ and $p_2$ (Figure 1). We wish to test:

$H_0: p \le 0.35$ vs $H_1: p > 0.35$. (4)

We have here $P(H_0 \mid \text{data}) = P(p \le 0.35 \mid \text{data}) \approx 0.75$: $H_1$ has posterior probability of only about 0.25, and we fail to reject $H_0$ at the 0.05 level. This means that the data, combined with our judgment, are not enough to make us accept that the difference of these proportions exceeds 0.35. Naturally, different informative, or non-informative, priors can be considered for $p_1$ and $p_2$ separately, and the test can be carried out in the same way.
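The posterior probability above can be cross-checked by simple Monte Carlo simulation, without evaluating $F_1$ at all (a sketch, not the authors' algorithm):

```python
import random
random.seed(1)

# Posterior of Example 1 with uniform priors: p1 ~ Beta(18, 9), p2 ~ Beta(9, 13)
N = 200_000
hits = sum(random.betavariate(18, 9) - random.betavariate(9, 13) > 0.35
           for _ in range(N))
post_prob = hits / N   # estimate of P(p1 - p2 > 0.35 | data)
print(round(post_prob, 2))
```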
b) Point-null hypothesis:
The point-null hypothesis $H_0: p = \eta_0$, to be tested at the significance level $\alpha$ in Bayesian statistics, has been a subject of study and discussion in the literature. Several difficulties still remain concerning this case, especially on the prior probability assigned to the value $\eta_0$ (see Berger [10]). We use here Lindley's compromise (Lee [11]), which consists of computing the $100(1-\alpha)\%$ highest posterior density (hpd) interval and accepting or rejecting $H_0$ depending on whether $\eta_0$ belongs or not to that interval. Here, for the same example, if $\eta_0 = 0$, using Pham-Gia and Turkkan's algorithm [12], the 95% hpd interval for $p$ is approximately $(-0.01, 0.52)$, which leads us to technically accept $H_0$ (see Figure 2), although the lower bound of the hpd interval can be considered as zero and we can practically reject $H_0$.

Figure 1. (a) Prior $Beta(1, 1)$ and posterior $Beta(18, 9)$ densities of $p_1$; (b) prior $Beta(1, 1)$ and posterior $Beta(9, 13)$ densities of $p_2$.

We can see that the above conclusions on $p$ are consistent with each other.
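The hpd interval can likewise be approximated by the shortest interval covering 95% of posterior draws (a Monte Carlo sketch, not the algorithm of [12]):

```python
import random
random.seed(2)

# Posterior draws of p = p1 - p2: p1 ~ Beta(18, 9), p2 ~ Beta(9, 13)
N = 100_000
draws = sorted(random.betavariate(18, 9) - random.betavariate(9, 13)
               for _ in range(N))

# The shortest window containing 95% of the draws approximates the 95% hpd interval
k = int(0.95 * N)
width, i0 = min((draws[i + k] - draws[i], i) for i in range(N - k))
print(round(draws[i0], 2), round(draws[i0 + k], 2))
```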
3.2. Bayesian Testing Using the Bayes Factor
Bayesian hypothesis testing can also be carried out using the Bayes factor B, which would give the relative weight of the null hypothesis w.r.t. the alternative one, when data is taken into consideration. This factor is defined as the ratio of the posterior odds over the prior odds. With the above expression of the difference of two betas given by (1) we can now accurately compute the Bayes factor associated with the difference of two proportions. We consider two cases:
a) Simple hypothesis: $H_0: p = \eta_0$ vs $H_1: p = \eta_1$. Then $B = \dfrac{\pi(\eta_0 \mid \text{data})}{\pi(\eta_1 \mid \text{data})}$, which corresponds to the value of the posterior density of $p$ at $\eta_0$, divided by the value of the posterior density of $p$ at $\eta_1$. As an application, let us consider two simple hypotheses of this form (different from the previous numerical example), $H_0: p = \eta_0$ vs. $H_1: p = \eta_1$, where we have uniform priors for both $p_1$ and $p_2$, and where we consider the sampling results from Table 1. We obtain the posterior parameters $(18, 9; 9, 13)$. Using the density of the difference (1), we calculate the Bayes factor $B$. Its value indicates that the data slightly favor $H_1$ over $H_0$, which is a logical conclusion when the posterior mean of $p$, about 0.26 here, lies closer to $\eta_1$ than to $\eta_0$.
Figure 2. Prior and posterior distributions of $p = p_1 - p_2$. The red dashed lines correspond to the bounds of the posterior 95%-hpd interval.
Table 1. Data on customers in areas A and B.

Area  Sample size  Customers
A     25            17
B     20             8
b) Composite hypothesis: As an application, let us consider the hypotheses (4), that is, $H_0: p \le 0.35$ vs. $H_1: p > 0.35$. In general, we test $H_0: p \in I_0$ vs. $H_1: p \in I_1$, where $I_0$ and $I_1$ partition the support of $p$. We have $P(p \in I_0 \mid \text{data})$ and $P(p \in I_1 \mid \text{data}) = 1 - P(p \in I_0 \mid \text{data})$ as posterior probabilities. Consequently, we define the posterior odds on $H_0$ against $H_1$ as $P(p \in I_0 \mid \text{data})/P(p \in I_1 \mid \text{data})$. Similarly, we have the prior odds on $H_0$ against $H_1$, which we define here as $P(p \in I_0)/P(p \in I_1)$. The Bayes factor is

$B = \dfrac{P(p \in I_0 \mid \text{data})/P(p \in I_1 \mid \text{data})}{P(p \in I_0)/P(p \in I_1)}$.

Again, we use the sampling results from Table 1, yielding the prior and posterior distributions presented in Figure 1, with a $Beta(1, 1)$ prior separately for both proportions.

Now, using (4), we can determine the required prior and posterior probabilities. The prior of $p$, the triangular density $1 - |t|$ on $(-1, 1)$, gives $P(H_0) = P(p \le 0.35) = 1 - (0.65)^2/2 = 0.78875$. In the same way, we obtain $P(H_0 \mid \text{data}) = P(p \le 0.35 \mid \text{data}) \approx 0.75$, using the posterior density $D(18, 9; 9, 13)$. Since $P(H_1) = 0.21125$ and $P(H_1 \mid \text{data}) \approx 0.25$, the prior odds are about 3.73 and the posterior odds about 3.0. Finally, the Bayes factor is $B \approx 0.8$, which is a mild argument in favor of $H_1$.
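These quantities are easy to reproduce: the prior probability follows from the triangular density of the difference of two uniform variables, and the posterior probability from Monte Carlo simulation (a sketch):

```python
import random
random.seed(3)

# Prior of p = p1 - p2 under two Beta(1,1) priors: triangular density 1 - |t| on (-1, 1)
pi0 = 1 - 0.65**2 / 2          # exact prior P(p <= 0.35)
prior_odds = pi0 / (1 - pi0)

# Posterior P(p <= 0.35 | data) with p1 ~ Beta(18, 9), p2 ~ Beta(9, 13)
N = 200_000
p0 = sum(random.betavariate(18, 9) - random.betavariate(9, 13) <= 0.35
         for _ in range(N)) / N
post_odds = p0 / (1 - p0)

B = post_odds / prior_odds     # Bayes factor for H0 against H1
print(round(pi0, 5), round(p0, 2), round(B, 2))
```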
4. Prior and Posterior Densities of $p_1 - p_2 - \eta_0$
The testing above can be seen to be quite straightforward, and is limited to some numerical values of the function $F_1$ that can be numerically computed. But to make an in-depth study of the Bayesian approach to the difference $p_1 - p_2 - \eta_0$, we need to consider the analytic expressions of the prior and posterior distributions of this variable, which can be obtained only from the general beta distribution. Naturally, the related mathematical formulas become more complicated. But Pham-Gia and Turkkan [13] have also established the expression of the density of $X_1 - X_2$, where both have general beta distributions.
4.1. The Difference of Two General Betas
The general beta (or GB) distribution, defined on a finite interval, say $(c, d)$, has density:

$f(x) = \dfrac{(x - c)^{\alpha - 1}(d - x)^{\beta - 1}}{B(\alpha, \beta)\,(d - c)^{\alpha + \beta - 1}}, \quad c \le x \le d,\ \alpha, \beta > 0$, (5)

and is denoted by $GB(\alpha, \beta; c, d)$. It reduces to the standard beta above when $c = 0$ and $d = 1$. Conversely, a standard beta can be transformed into a general beta by addition of, and/or multiplication by, a constant.
Theorem 2: Let $X \sim GB(\alpha, \beta; c, d)$ and let $\mu, \lambda$ be any two scalars, $\lambda \ne 0$. Then

1) $\mu + X \sim GB(\alpha, \beta; \mu + c, \mu + d)$;

2) $\lambda X \sim GB(\alpha, \beta; \lambda c, \lambda d)$ when $\lambda > 0$. Otherwise, $\lambda X \sim GB(\beta, \alpha; \lambda d, \lambda c)$ when $\lambda < 0$.

Proof:

1) We have $P(\mu + X \le t) = P(X \le t - \mu)$, and the change of variable $x = t - \mu$ in (5) shifts the support to $(\mu + c, \mu + d)$ without changing $\alpha$ and $\beta$.

2) For $\lambda > 0$, the change of variable $x = t/\lambda$ in (5) rescales the support to $(\lambda c, \lambda d)$. When $\lambda < 0$, the transformation also reverses the orientation of the support, which exchanges the roles of $\alpha$ and $\beta$.

Q.E.D.
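Part 2) can be checked empirically by comparing quantiles of simulated samples (a sketch; the parameter values are arbitrary):

```python
import random
random.seed(4)

def gb_sample(a, b, c, d):
    """Draw from GB(a, b; c, d) via the standard beta: X = c + (d - c) * Beta(a, b)."""
    return c + (d - c) * random.betavariate(a, b)

# For lam < 0, lam * GB(a, b; c, d) should match GB(b, a; lam*d, lam*c)
a, b, c, d, lam = 2.0, 5.0, 1.0, 3.0, -2.0
N = 100_000
lhs = sorted(lam * gb_sample(a, b, c, d) for _ in range(N))
rhs = sorted(gb_sample(b, a, lam * d, lam * c) for _ in range(N))

# Empirical quantiles of the two samples agree up to Monte Carlo error
print([round(lhs[q] - rhs[q], 2) for q in (N // 10, N // 2, 9 * N // 10)])
```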
Pham-Gia and Turkkan [13] gave the expression of the density of the sum $X_1 + X_2$, where $X_1$ and $X_2$ are independent general beta variables. The density of the difference $X_1 - X_2$, which is only mentioned there, is explicitly given below.
Proposition 2:

Let $X_1 \sim GB(\alpha_1, \beta_1; c_1, d_1)$ and $X_2 \sim GB(\alpha_2, \beta_2; c_2, d_2)$. For the difference $D = X_1 - X_2$, defined on $(c_1 - d_2, d_1 - c_2)$, there are two different cases to consider, depending on the relative values of the support lengths $d_1 - c_1$ and $d_2 - c_2$, since $X_1$ and $X_2$ do not have symmetrical roles.

Case 1: $d_1 - c_1 \le d_2 - c_2$. (6)

Case 2: $d_1 - c_1 > d_2 - c_2$. (7)
Theorem 3: Let $X_1$ and $X_2$ be two independent general betas with their supports satisfying (6). Then $D = X_1 - X_2$ has its density defined as follows:

For $c_1 - d_2 \le t \le d_1 - d_2$: (8)

For $d_1 - d_2 \le t \le c_1 - c_2$: (9)

and for $c_1 - c_2 \le t \le d_1 - c_2$: (10)

where $F_1$ is Appell's first hypergeometric function already discussed.

Proof:

The argument first uses part 2) of Theorem 2 to obtain that $-X_2 \sim GB(\beta_2, \alpha_2; -d_2, -c_2)$. Then, it uses the exact expression of the density of the sum of two general betas (see Theorem 2 in the article of T. Pham-Gia & N. Turkkan [14]).

Q.E.D.

We denote the distribution with the above density, given by (8), (9) and (10), by $D(\alpha_1, \beta_1, c_1, d_1; \alpha_2, \beta_2, c_2, d_2)$.

Note: The corresponding Case 2, when relation (7) is satisfied, is given in Appendix 1 (Theorem 3a).
To study the density of $p_1 - p_2 - \eta_0$, a particular case that will be used in our study here is the difference between $X_1 \sim Beta(\alpha_1, \beta_1)$ and $\eta_0 + X_2$, with $X_2 \sim Beta(\alpha_2, \beta_2)$ and $\eta_0$ being a positive constant. In this case both Theorem 2 and Theorem 3 apply, since $\eta_0 + X_2 \sim GB(\alpha_2, \beta_2; \eta_0, 1 + \eta_0)$ and the two supports have the same length, so that the middle definition section of the density disappears.
Theorem 4: Let $X_1 \sim GB(\alpha_1, \beta_1; 0, 1)$ and $X_2 \sim GB(\alpha_2, \beta_2; \eta_0, 1 + \eta_0)$ be two independent general beta distributed random variables. Then the density of $D = X_1 - X_2$, defined on $(-1 - \eta_0, 1 - \eta_0)$, is:

1) for $-1 - \eta_0 \le t \le -\eta_0$, given by (8), and

2) for $-\eta_0 \le t \le 1 - \eta_0$, given by (10),

and we denote this distribution by $D(\alpha_1, \beta_1; \alpha_2, \beta_2; \eta_0)$. (11)

Proof:

This is a special case of Theorem 3, with the two supports of equal length, so that the middle section (9) vanishes.

Q.E.D.
An equivalent form, using Theorem 3a, leads to a slightly different expression, which gives, however, the same numerical values for the density of $D$ (see Theorem 4a in Appendix 1).
4.2. Prior and Posterior Distributions of $p_1 - p_2 - \eta_0$
Let $p_1$ and $p_2$ be two independent beta-distributed random variables, the first being a regular beta, $p_1 \sim Beta(\alpha_1, \beta_1) = GB(\alpha_1, \beta_1; 0, 1)$, and the second shifted into a general beta, $\eta_0 + p_2 \sim GB(\alpha_2, \beta_2; \eta_0, 1 + \eta_0)$.

Binomial sampling, with these two different beta priors, leads to the following

Proposition 3: The prior distribution of $D = p_1 - p_2 - \eta_0$ is $D(\alpha_1, \beta_1; \alpha_2, \beta_2; \eta_0)$, given by (11), and its posterior distribution is $D(\alpha_1', \beta_1'; \alpha_2', \beta_2'; \eta_0)$, with $\alpha_i' = \alpha_i + x_i$ and $\beta_i' = \beta_i + n_i - x_i$, $i = 1, 2$.

Proof: $D$ is the difference of the two random variables $p_1$ and $\eta_0 + p_2$, with respective distributions $GB(\alpha_1, \beta_1; 0, 1)$ and $GB(\alpha_2, \beta_2; \eta_0, 1 + \eta_0)$. The prior distribution of $D$ is hence $D(\alpha_1, \beta_1; \alpha_2, \beta_2; \eta_0)$, as given by (11).

Binomial sampling affects these two distributions in different ways. For the first, the posterior is $Beta(\alpha_1 + x_1, \beta_1 + n_1 - x_1)$, while the posterior distribution of the second is $GB(\alpha_2 + x_2, \beta_2 + n_2 - x_2; \eta_0, 1 + \eta_0)$ (see Proposition 3a in Appendix 2). Figure 3 shows the prior and the posterior of $\eta_0 + p_2$.
From Theorem 4, we obtain the expression of the posterior density $D(\alpha_1', \beta_1'; \alpha_2', \beta_2'; \eta_0)$ of $D = p_1 - p_2 - \eta_0$ as follows:

(12)

Figure 4 shows the above density.
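As an illustration, the posterior of Proposition 3 can be simulated directly; the value $\eta_0 = 0.2$ below is our own illustrative choice, not one taken from the paper (a sketch):

```python
import random
random.seed(5)

# Posterior of D = p1 - p2 - eta0, with the Example 1 posteriors:
# p1 | data ~ Beta(18, 9) and eta0 + p2 | data ~ GB(9, 13; eta0, 1 + eta0)
eta0 = 0.2   # assumed value, for illustration only
N = 100_000
draws = [random.betavariate(18, 9) - (eta0 + random.betavariate(9, 13))
         for _ in range(N)]

post_mean = sum(draws) / N   # should be close to 18/27 - 9/22 - eta0
print(round(post_mean, 3))
```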
Figure 3. (a) Prior $GB(\alpha_2, \beta_2; \eta_0, 1 + \eta_0)$ distribution of $\eta_0 + p_2$ and (b) posterior $GB(\alpha_2 + x_2, \beta_2 + n_2 - x_2; \eta_0, 1 + \eta_0)$ distribution of $\eta_0 + p_2$. The posterior of $D$ is hence given by Theorem 4, as $D(\alpha_1', \beta_1'; \alpha_2', \beta_2'; \eta_0)$.
5. Conclusion
The Bayesian approach to testing the difference of two independent proportions leads to interesting results which agree with frequentist results when non-informative priors are considered. Undoubtedly, all preceding results can be generalized to other measures frequently used in a 2 × 2 table.

Figure 4. Posterior density of $D = p_1 - p_2 - \eta_0$.
Acknowledgements
Research partially supported by NSERC grant 9249 (Canada). The authors wish to thank the Université de Moncton Faculty of Graduate Studies and Research for the assistance provided while conducting this work.
Appendix 1
Below is the expression of the density of $D = X_1 - X_2$ when (7) is satisfied, instead of (6). This expression, with the one given in Theorem 3, covers all cases.

Theorem 3a: Let $X_1$ and $X_2$ be two independent general betas with their supports satisfying (7). Then $D = X_1 - X_2$ has its density defined as follows:

For $c_1 - d_2 \le t \le c_1 - c_2$: (13)

For $c_1 - c_2 \le t \le d_1 - d_2$: (14)

For $d_1 - d_2 \le t \le d_1 - c_2$: (15)

Proof:

By rewriting $X_1 - X_2 = -(X_2 - X_1)$, we can apply the above Theorem 2 and Theorem 3.

Q.E.D.
A parallel, and equivalent, result to Theorem 4 is given below:

Theorem 4a: The density of $D = X_1 - X_2$, with $X_1 \sim GB(\alpha_1, \beta_1; 0, 1)$ and $X_2 \sim GB(\alpha_2, \beta_2; \eta_0, 1 + \eta_0)$, is:

For $-1 - \eta_0 \le t \le -\eta_0$, given by (13);

For $-\eta_0 \le t \le 1 - \eta_0$, given by (15);

and we denote this equivalent form by $D_a(\alpha_1, \beta_1; \alpha_2, \beta_2; \eta_0)$.

Proof:

Similar to the proof of Theorem 4.

Q.E.D.
Appendix 2
Proposition 3a:

Suppose that $x_2 \mid p_2 \sim Bin(n_2, p_2)$ and that $Y = \eta_0 + p_2$ has the prior distribution $GB(\alpha_2, \beta_2; \eta_0, 1 + \eta_0)$; then the posterior distribution of $Y$ is $GB(\alpha_2 + x_2, \beta_2 + n_2 - x_2; \eta_0, 1 + \eta_0)$.

Proof:

The prior distribution of $Y$ is $GB(\alpha_2, \beta_2; \eta_0, 1 + \eta_0)$ (see Theorem 2), with the pdf

$f(y) = \dfrac{(y - \eta_0)^{\alpha_2 - 1}(1 + \eta_0 - y)^{\beta_2 - 1}}{B(\alpha_2, \beta_2)}, \quad \eta_0 \le y \le 1 + \eta_0.$

The likelihood function is, with $p_2 = y - \eta_0$,

$L(y) = \dbinom{n_2}{x_2}(y - \eta_0)^{x_2}(1 + \eta_0 - y)^{n_2 - x_2}.$

Thus the marginal distribution of $x_2$, the number of successes, with $x_2 = 0, 1, \ldots, n_2$, has density:

$m(x_2) = \dbinom{n_2}{x_2}\dfrac{B(\alpha_2 + x_2, \beta_2 + n_2 - x_2)}{B(\alpha_2, \beta_2)}.$

Therefore, the posterior distribution of $y$ given $x_2$ is

$f(y \mid x_2) = \dfrac{(y - \eta_0)^{\alpha_2 + x_2 - 1}(1 + \eta_0 - y)^{\beta_2 + n_2 - x_2 - 1}}{B(\alpha_2 + x_2, \beta_2 + n_2 - x_2)}, \quad \eta_0 \le y \le 1 + \eta_0.$

This is the p.d.f. of $GB(\alpha_2 + x_2, \beta_2 + n_2 - x_2; \eta_0, 1 + \eta_0)$.

Q.E.D.