Let π = π1 − π2 be the difference of two independent proportions related to two populations. We study the test H0: π ≤ 0 against different alternatives, in the Bayesian context. The various Bayesian approaches use standard beta distributions and are simple to derive and compute. But the more general test H0: π ≤ η, with η > 0, requires more advanced mathematical tools to carry out the computations. These tools, which include the density of the difference of two general beta variables, are presented in the article, with numerical examples for illustration, to facilitate comprehension of the results.

For two independent proportions π1 and π2, their difference is frequently encountered in the frequentist statistical literature, where tests, or confidence intervals, for π1 − π2 are well-accepted notions in theory and in practice, although the case most frequently studied is the equality, or inequality, of these proportions. For the Bayesian approach, Pham-Gia and Turkkan ( [

But testing π1 = π2 is only a special case of testing H0: π1 − π2 ≤ η, with η being a positive constant, which is much less frequently dealt with. In Section 2 we recall the unconditional approaches to testing H0, based on the maximum likelihood estimators of the two proportions and normal approximations. A new exact approach, not using a normal approximation, has been developed by our group and will be presented elsewhere. Fisher’s exact test is also recalled here, for comparison purposes. The Bayesian approach to testing the equality of two proportions, and the computation of credible intervals, are given in Section 3. The Bayesian approach using general beta distributions is given in Section 4. All related problems are completely solved, thanks to closed-form formulas that we have established in earlier papers.

As stated before, taking η = 0 we have a test for the equality of two proportions. Several well-known methods are presented in the literature. For example, the conditional test, usually called Fisher’s exact test, is based on the hypergeometric distribution and is used when the sample size is small. Pearson’s Chi-square test with Yates’ correction is usually used for intermediate sample sizes, while Pearson’s Chi-square test is used for large samples. Their appropriateness is discussed in D’Agostino et al. [

T1 = (X1/n1 − X2/n2) / [(X1/n1)(1 − X1/n1)/n1 + (X2/n2)(1 − X2/n2)/n2]^{1/2},

and the pooled version

T2 = (X1/n1 − X2/n2) / [p̂(1 − p̂)(1/n1 + 1/n2)]^{1/2}, with p̂ = (X1 + X2)/(n1 + n2),

both being approximately N(0, 1) under H0: π1 ≤ π2. Cressie [

To investigate its proportions of customers in two separate geographic areas of the country, a company picks a random sample of 25 shoppers in area A, of whom 17 are found to be its customers. A similar random sample of 20 shoppers in area B gives 8 customers. We wish to test H0: π1 ≤ π2 against H1: π1 > π2.

We have here the observed values T1 = 1.9459 and T2 = 1.8783, which lead, in both cases, to the rejection of H0 at the 5% significance level (the critical value is 1.64) in favor of H1: π1 > π2.
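As a quick check, both statistics can be computed directly from the data; the sketch below uses only the standard library (the helper name `two_proportion_z` is ours, not from the paper):

```python
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Unpooled (T1) and pooled (T2) statistics for comparing two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    # T1: each sample contributes its own estimated variance
    se1 = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # T2: variance estimated from the pooled proportion
    p = (x1 + x2) / (n1 + n2)
    se2 = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se1, (p1 - p2) / se2

t1, t2 = two_proportion_z(17, 25, 8, 20)
print(round(t1, 4), round(t2, 4))  # 1.9459 1.8783
```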

Under H0, the number X of successes coming from population 1 has the Hyp(n1 + n2, t, n1) distribution, with t = x1 + x2. The argument is that, in the combined sample of size n1 + n2 with t = x1 + x2 total successes, the number of successes falling in the subsample of size n1 is a hypergeometric variable.

To compute the significance of the observation, we have to compute the probabilities of all tables at least as extreme as the observed one. It is known that the conditional test is less powerful than the unconditional one.

Numerical Example 2

We use the same data as in Numerical Example 1 to test H0: πA = πB vs H1: πA > πB, i.e., whether the proportion of customers in area A is significantly higher than the one in area B. We have the observed table (xB = 8), and also the more extreme cases xB = 0, 1, 2, ⋯, 7. The p-value of the test is hence

p-value = Σ_{xB=0}^{8} C(25, 25 − xB) C(20, xB) / C(45, 25) = 0.0542.

Although technically not significant at the 5% level, this result suggests that the proportion of customers in area B can, for practical purposes, be considered lower than the one in area A, in agreement with the frequentist test.
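The hypergeometric sum above can be evaluated with exact integer arithmetic; a minimal sketch, stdlib only (the helper name `fisher_pvalue` is ours):

```python
from math import comb

def fisher_pvalue(n_a, n_b, total_yes, x_b_obs):
    """One-sided Fisher exact p-value: P(x_B <= x_b_obs) with both margins
    of the 2x2 table held fixed (hypergeometric distribution)."""
    denom = comb(n_a + n_b, total_yes)
    return sum(comb(n_a, total_yes - xb) * comb(n_b, xb)
               for xb in range(x_b_obs + 1)) / denom

# Data of Numerical Example 2: 45 shoppers, 25 customers in total, observed x_B = 8
print(fisher_pvalue(25, 20, 25, 8))
```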

REMARK: The problem is often associated with a 2 × 2 table, where there are three possibilities: both column and row sums constant, one set constant and the other variable, or both variable. Other measures can then be introduced (e.g. Santner and Snell [

In the estimation of the difference of two proportions the Bayesian approach certainly plays an important role. Agresti and Coull [

Again, let π = π1 − π2. The Bayesian approach encounters serious computational difficulties unless a closed-form expression is available for the density of the difference of two independently beta-distributed random variables. Such an expression was obtained by the first author some time ago and is recalled below.

Let us recall first the following theorem:

Theorem 1: Let πi ~ beta(αi, βi), i = 1, 2, be two independent beta-distributed random variables with parameters (α1, β1) and (α2, β2), respectively. Then the difference π = π1 − π2 has a density defined on (−1, 1) as follows:

pπ(x) = B(α2, β1) x^{β1+β2−1} (1 − x)^{α2+β1−1} F1(β1, α1+α2+β1+β2−2, 1−α1; β1+α2; 1−x, 1−x²) / A, for 0 < x ≤ 1;

pπ(0) = B(α1+α2−1, β1+β2−1) / A, if α1+α2 > 1 and β1+β2 > 1;

pπ(x) = B(α1, β2) (−x)^{β1+β2−1} (1 + x)^{α1+β2−1} F1(β2, α1+α2+β1+β2−2, 1−α2; α1+β2; 1+x, 1−x²) / A, for −1 ≤ x < 0;

where A = B(α1, β1) B(α2, β2). (1)

F 1 ( . ) is Appell’s first hypergeometric function, which is defined as

F1(a, b1, b2; c; x1, x2) = Σ_{i=0}^{∞} Σ_{j=0}^{∞} [a^{[i+j]} b1^{[i]} b2^{[j]} / c^{[i+j]}] x1^{i} x2^{j} / (i! j!) (2)

where a [ b ] = a ( a + 1 ) ⋯ ( a + b − 1 ) . This infinite series is convergent for | x 1 | < 1 and | x 2 | < 1 , where, as shown by Euler, it can also be expressed as a convergent integral:

F1(a, b1, b2; c; x1, x2) = [Γ(c) / (Γ(a) Γ(c − a))] ∫₀¹ u^{a−1} (1 − u)^{c−a−1} (1 − u x1)^{−b1} (1 − u x2)^{−b2} du (3)

which converges for c − a > 0 , a > 0 . In fact, Pham-Gia and Turkkan [
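Both representations of F1 are straightforward to evaluate numerically; the sketch below (our own helper names, stdlib only) implements the double series (2) and the Euler integral (3), so that each can be used to cross-check the other:

```python
from math import factorial, gamma

def rising(a, n):
    """Rising factorial a^[n] = a(a+1)...(a+n-1)."""
    r = 1.0
    for k in range(n):
        r *= a + k
    return r

def f1_series(a, b1, b2, c, x1, x2, terms=60):
    """Appell F1 via the double series (2); converges for |x1|, |x2| < 1."""
    return sum(rising(a, i + j) * rising(b1, i) * rising(b2, j)
               / rising(c, i + j) * x1**i * x2**j
               / (factorial(i) * factorial(j))
               for i in range(terms) for j in range(terms - i))

def f1_euler(a, b1, b2, c, x1, x2, n=20000):
    """Appell F1 via Euler's integral (3); needs c > a > 0 (midpoint rule)."""
    coef = gamma(c) / (gamma(a) * gamma(c - a))
    h = 1.0 / n
    return coef * h * sum(
        ((k + 0.5) * h)**(a - 1) * (1 - (k + 0.5) * h)**(c - a - 1)
        * (1 - (k + 0.5) * h * x1)**(-b1) * (1 - (k + 0.5) * h * x2)**(-b2)
        for k in range(n))
```

Well inside the convergence region the two routines agree to several decimals, a useful sanity check before plugging F1 into the density formulas.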

Here, we denote the above density (1) by π ~ ψ ( α 1 , β 1 , α 2 , β 2 ) .

Proof: See Pham-Gia and Turkkan [

The prior distribution of π is hence ψ ( α 1 , β 1 , α 2 , β 2 ) , obtained from the two beta priors. Various approaches in Bayesian testing are given below.

Bayesian Testing Using a Significance Level

While frequentist statistics rarely tests H0: π ≤ η vs. H1: π > η for η > 0, limiting itself to the case η = 0, Bayesian statistics can do it easily.

a) One-sided test:

Proposition 1: To perform the above test at the 0.05 significance level, using the two independent samples {X1,i}_{i=1}^{n1} and {X2,i}_{i=1}^{n2}, we compute the posterior density pπ(π | α1*, β1*, α2*, β2*), where αi* = αi + xi and βi* = βi + ni − xi, i = 1, 2. This expression of the posterior density of π, obtained by the conjugacy of binomial sampling with the beta prior, allows us to compute P(π > η) and compare it with the significance level α.

For example, as in the frequentist example of Section 2.1, we consider n 1 = 25 , x 1 = 17 , n 2 = 20 , x 2 = 8 and use two non-informative beta priors, that is, Beta ( 0.5 , 0.5 ) .

We note first that π ^ 1 = 17 / 25 = 0.68 , π ^ 2 = 8 / 20 = 0.40 , giving π ^ = 0.28 .

We obtain the prior and posterior distributions of π1 and π2, and consider the test

H 0 : π ≤ 0.35 vs H 1 : π > 0.35 (4)

We have α1* = 17.5, β1* = 8.5, α2* = 8.5, β2* = 12.5; H1 has posterior probability Pr(π > 0.35) = ∫_{0.35}^{1} ψ(x; 17.5, 8.5, 8.5, 12.5) dx = 0.2855, and we fail to reject H0 at the 0.05 level. This means that the data, combined with our prior judgment, are not enough to make us accept that the difference of these proportions exceeds 0.35. Naturally, different informative, or non-informative, priors can be considered for π1 and π2 separately, and the test can be carried out in the same way.
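This posterior probability is also easy to approximate by Monte Carlo, without evaluating ψ at all; a sketch (sample size and seed are our own choices):

```python
import random

random.seed(42)

# Posterior draws: pi1 ~ Beta(17.5, 8.5), pi2 ~ Beta(8.5, 12.5)
N = 200_000
hits = sum(random.betavariate(17.5, 8.5) - random.betavariate(8.5, 12.5) > 0.35
           for _ in range(N))
print(hits / N)  # should be close to the value 0.2855 reported in the text
```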

b) Point-null hypothesis:

The point null hypothesis H 0 : π = η vs . H 1 : π ≠ η to be tested at the significance level α in Bayesian statistics has been a subject of study and discussion

in the literature. Several difficulties still remain concerning this case, especially on the prior probability assigned to the value η (see Berger [

We can see that the above conclusions on π are consistent with each other.

Bayesian hypothesis testing can also be carried out using the Bayes factor B, which gives the relative weight of the null hypothesis w.r.t. the alternative one once the data are taken into consideration. This factor is defined as the ratio of the posterior odds to the prior odds. With the above expression (1) of the density of the difference of two betas, we can now accurately compute the Bayes factor associated with the difference of two proportions. We consider two cases:

a) Simple hypothesis: H0: π = a vs H1: π = b. Then B = pπ(a | data) / pπ(b | data), which corresponds to the value of the posterior density of π at a, divided by its value at b. As an application, let us consider the following hypotheses (different from the previous numerical example): H0: π = 0.35 vs. H1: π = 0.25, where we have uniform priors for both π1 and π2, and where we consider the sampling results from

B = ψ(0.35 | α1*, β1*, α2*, β2*) / ψ(0.25 | α1*, β1*, α2*, β2*) = 0.8416. This value indicates that the data slightly favor H1 over H0, which is a logical conclusion since π̂ = 0.28.

Response | Area A | Area B | Combined
Yes      | 17     | 8      | 25
No       | 8      | 12     | 20
Totals   | 25     | 20     | 45

b) Composite hypothesis: As an application, let us consider the hypotheses (4), that is, H 0 : π ≤ 0.35 vs. H 1 : π > 0.35 .

In general, H0: π ∈ Θ0 vs. H1: π ∈ Θ1, where Θ0 ∪ Θ1 is the whole range (−1, 1) of π. We have the posterior probabilities p0 = Pr(π ∈ Θ0 | data) and p1 = Pr(π ∈ Θ1 | data) = 1 − p0. Consequently, we define the posterior odds on H0 against H1 as p0/p1. Similarly, we have the prior odds on H0 against H1, defined here as z0/z1. The Bayes factor is B = (p0 z1)/(p1 z0). Again, we use the sampling results from

Now, with the posterior distribution π ~ ψ(α1*, β1*, α2*, β2*), we can determine the required prior and posterior probabilities. For example, p0 = ∫_{−1}^{0.35} ψ(t | α1*, β1*, α2*, β2*) dt gives p0 = 0.7145. In the same way, using the prior ψ(1/2, 1/2, 1/2, 1/2), we obtain z0 = 0.745. Since p1 = 1 − p0 and z1 = 1 − z0, we have p1 = 0.2855 and z1 = 0.255. Finally, the Bayes factor is B = 0.8566, a mild argument in favor of H1.
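A Monte Carlo sketch of this composite-hypothesis Bayes factor, needing no special functions (the helper name and seed are our own choices):

```python
import random

def prob_h0(a1, b1, a2, b2, eta, n=200_000, seed=1):
    """Monte Carlo estimate of P(pi1 - pi2 <= eta) for independent betas."""
    rng = random.Random(seed)
    return sum(rng.betavariate(a1, b1) - rng.betavariate(a2, b2) <= eta
               for _ in range(n)) / n

eta = 0.35
p0 = prob_h0(17.5, 8.5, 8.5, 12.5, eta)   # posterior probability of H0
z0 = prob_h0(0.5, 0.5, 0.5, 0.5, eta)     # prior probability of H0 (Jeffreys priors)
B = (p0 * (1 - z0)) / ((1 - p0) * z0)     # Bayes factor p0*z1 / (p1*z0)
print(p0, z0, B)
```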

The testing above is quite straightforward and requires only numerical values of the function ψ(·), which can be computed numerically. But to make an in-depth study of the Bayesian approach to the difference π − η = π1 − (π2 + η), we need the analytic expressions of the prior and posterior distributions of this variable, which can be obtained only from the general beta distribution. Naturally, the related mathematical formulas become more complicated. But Pham-Gia and Turkkan [

The general beta (or GB), defined on a finite interval, say (c, d), has a density:

f g b ( x ; α , β ; c , d ) = ( x − c ) α − 1 ( d − x ) β − 1 / [ ( d − c ) α + β − 1 B ( α , β ) ] , α , β > 0 , c ≤ x ≤ d (5)

and is denoted by X ~ GB(α, β; c, d). It reduces to the standard beta above when c = 0 and d = 1. Conversely, a standard beta can be transformed into a general beta by addition of, and/or multiplication by, a constant.
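Density (5) is simple to code; a minimal sketch (helper names are ours), which can be checked against the two facts just stated, that it integrates to one over (c, d) and reduces to the standard beta on (0, 1):

```python
from math import gamma

def beta_fn(a, b):
    """Euler beta function B(a, b)."""
    return gamma(a) * gamma(b) / gamma(a + b)

def gb_pdf(x, alpha, beta, c, d):
    """Density (5) of the general beta GB(alpha, beta; c, d)."""
    if not c <= x <= d:
        return 0.0
    return ((x - c) ** (alpha - 1) * (d - x) ** (beta - 1)
            / ((d - c) ** (alpha + beta - 1) * beta_fn(alpha, beta)))
```

For example, `gb_pdf(x, 2, 3, 0, 1)` is just the beta(2, 3) density at x.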

Theorem 2: Let X ~ GB(α, β; a, b), and let θ, λ be any two scalars. Then

1) X + θ ~ G B ( α , β ; a + θ , b + θ ) ,

2) λ X ~ G B ( α , β ; λ a , λ b ) when λ > 0 . Otherwise, λ X ~ G B ( β , α ; λ b , λ a ) when λ < 0 .

Proof:

1) We have

f_{X+θ}(y) = f_X(y − θ)
 = ((y − θ) − a)^{α−1} (b − (y − θ))^{β−1} / [(b − a)^{α+β−1} B(α, β)], a ≤ y − θ ≤ b
 = (y − (a + θ))^{α−1} ((b + θ) − y)^{β−1} / [((b + θ) − (a + θ))^{α+β−1} B(α, β)], a + θ ≤ y ≤ b + θ

2) For λ > 0 ,

f_{λX}(y) = (1/λ) f_X(y/λ)
 = (1/λ)(y/λ − a)^{α−1} (b − y/λ)^{β−1} / [(b − a)^{α+β−1} B(α, β)], a ≤ y/λ ≤ b
 = (y − λa)^{α−1} (λb − y)^{β−1} / [(λb − λa)^{α+β−1} B(α, β)], λa ≤ y ≤ λb

When λ < 0 ,

f_{λX}(y) = −(1/λ) f_X(y/λ)
 = −(1/λ)(y/λ − a)^{α−1} (b − y/λ)^{β−1} / [(b − a)^{α+β−1} B(α, β)], a ≤ y/λ ≤ b
 = (y − λb)^{β−1} (λa − y)^{α−1} / [(λa − λb)^{α+β−1} B(α, β)], λb ≤ y ≤ λa

Q.E.D.
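Theorem 2 can be sanity-checked by simulation: draw from GB(α, β; a, b) as a + (b − a)·Beta(α, β) and compare the sample means of X + θ and λX with the means of the predicted general betas, using the fact that the mean of GB(α, β; c, d) is c + (d − c)α/(α + β). All numbers below are our own illustrative choices:

```python
import random

def gb_mean(alpha, beta, c, d):
    """Mean of GB(alpha, beta; c, d): c + (d - c) * alpha / (alpha + beta)."""
    return c + (d - c) * alpha / (alpha + beta)

rng = random.Random(7)
alpha, beta, a, b = 2.0, 5.0, 1.0, 3.0
xs = [a + (b - a) * rng.betavariate(alpha, beta) for _ in range(100_000)]

theta, lam = 0.4, -1.5
m_shift = sum(x + theta for x in xs) / len(xs)   # should match GB(2, 5; 1.4, 3.4)
m_scale = sum(lam * x for x in xs) / len(xs)     # should match GB(5, 2; -4.5, -1.5)
print(m_shift, gb_mean(alpha, beta, a + theta, b + theta))
print(m_scale, gb_mean(beta, alpha, lam * b, lam * a))  # lam < 0 swaps the parameters
```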

Pham-Gia and Turkkan [

Proposition 2:

Let X 1 ~ G B ( α , β ; c , d ) and X 2 ~ G B ( γ , δ ; e , f ) . For the difference X 1 − X 2 defined on ( c − f , d − e ) , there are two different cases to consider, depending on the relative values of c − e and d − f , since X 1 and X 2 do not have symmetrical roles.

Case 1:

c − f ≤ d − f ≤ c − e ≤ d − e (6)

Case 2:

c − f ≤ c − e ≤ d − f ≤ d − e (7)

Theorem 3: Let X 1 and X 2 be two independent general betas with their supports satisfying (6). Then Y = X 1 − X 2 has its density defined as follows:

For c − f ≤ y ≤ d − f ,

f(y) = [(y − (c − f))^{α+δ−1} (d − f − y)^{β−1} B(δ, α) / ((d − c)^{α+β−1} (f − e)^{δ} B(δ, γ) B(α, β))] × F1(δ, 1 − β, 1 − γ; α + δ; ((c − f) − y)/((d − f) − y), (y − (c − f))/(f − e)) (8)

For d − f ≤ y ≤ c − e ,

f(y) = [(y − (d − f))^{δ−1} (d − e − y)^{γ−1} / ((f − e)^{δ+γ−1} B(δ, γ))] × F1(β, 1 − δ, 1 − γ; α + β; (c − d)/(y − (d − f)), (d − c)/(d − e − y)) (9)

and for c − e ≤ y ≤ d − e ,

f(y) = [((d − e) − y)^{β+γ−1} (y − (d − f))^{δ−1} B(β, γ) / ((d − c)^{β} (f − e)^{δ+γ−1} B(δ, γ) B(α, β))] × F1(β, 1 − α, 1 − δ; β + γ; ((d − e) − y)/(d − c), (y − (d − e))/(y − (d − f))) (10)

where F 1 ( . ) is Appell’s first hypergeometric function already discussed.

Proof:

The argument first uses part 2) of Theorem 2 to obtain that −X2 ~ GB(δ, γ; −f, −e). Then it uses the exact expression of the density of the sum of two general betas (see Theorem 2 in the article of T. Pham-Gia & N. Turkkan [

Q.E.D.

We denote the above density, given by (8), (9) and (10), by φπ(α1, β1, α2, β2; c, d, e, f).

Note: The corresponding case 2, when relation (7) is satisfied, is given in Appendix 1 (Theorem 3a).

To study the density of π − η = π1 − (π2 + η), a particular case that will be used in our study here is the difference between X1 ~ GB(α1, β1; 0, 1) and X2 ~ GB(α2, β2; η, η + 1), with η being a positive constant, 0 < η < 1.

In this case both Theorem 2 and Theorem 3 apply; since c − e = d − f, the middle piece of the definition of φπ(α1, β1, α2, β2; c, d, e, f) disappears.

Theorem 4: Let X 1 ~ G B ( α 1 , β 1 ; 0 , 1 ) and X 2 ~ G B ( α 2 , β 2 ; η , η + 1 ) be two independent general beta distributed random variables. Then the density of Y = X 1 − X 2 , defined on [ − ( η + 1 ) , 1 − η ] , is:

1) for − η − 1 ≤ y ≤ − η ,

f(y) = [(y + (η + 1))^{α1+β2−1} (−η − y)^{α2−1} B(α1, β2) / (B(α1, β1) B(α2, β2))] × F1(β2, 1 − β1, 1 − α2; α1 + β2; ((η + 1) + y)/(η + y), y + (η + 1))

2) for − η ≤ y ≤ 1 − η ,

f(y) = [((1 − η) − y)^{α2+β1−1} (y + η)^{β2−1} B(α2, β1) / (B(α1, β1) B(α2, β2))] × F1(β1, 1 − α1, 1 − β2; α2 + β1; (1 − η) − y, (y − (1 − η))/(y + η))

and we denote this distribution by

Y ~ ξ η ( α 1 , β 1 , α 2 , β 2 ; η ) . (11)

Proof:

This is a special case of Theorem 3.

Q.E.D.

An equivalent form, derived from Theorem 3a, leads to a slightly different expression which, however, gives the same numerical values for the density of π − η (see Theorem 4a in Appendix 1).

Let π i , i = 1 , 2 be two independent beta distributed random variables, the first being a regular beta, π 1 ~ beta ( α 1 , β 1 ) , and the second being a general beta, π 2 ~ G B ( α 2 , β 2 ; η , 1 + η ) .

Binomial sampling, with these two different beta priors, leads to the following

Proposition 3: The prior distribution of π − η = π 1 − ( π 2 + η ) is ξ η ( α 1 , β 1 , α 2 , β 2 ; η ) , given by (11), and its posterior distribution is ξ η ( α 1 ∗ , β 1 ∗ , α 2 ∗ , β 2 ∗ ; η ) with α i ∗ = α i + x i and β i ∗ = β i + n i − x i , i = 1 , 2.

Proof:

π1 − (π2 + η) is the difference of two random variables with respective distributions beta(α1, β1) and GB(α2, β2; η, η + 1). The prior distribution of π − η is hence ξη(α1, β1, α2, β2; η), as given by (11).

Binomial sampling affects these two distributions in different ways. For the first, the posterior is beta(α1 + x1, β1 + n1 − x1), while the posterior distribution of the second is GB(α2 + x2, β2 + n2 − x2; η, η + 1) (see Proposition 3a in Appendix 2).

From Theorem 4, we obtain the expression of the posterior density ξ_{0.35}(17.5, 8.5, 8.5, 12.5; 0.35) of π − η as follows:

f(x) = [(x + 1.35)^{29} (−0.35 − x)^{7.5} B(17.5, 12.5) / (B(17.5, 8.5) B(8.5, 12.5))] × F1(12.5, −7.5, −7.5; 30; (1.35 + x)/(0.35 + x), x + 1.35), for −1.35 ≤ x < −0.35;

f(x) = [(0.65 − x)^{16} (x + 0.35)^{11.5} B(8.5, 8.5) / (B(17.5, 8.5) B(8.5, 12.5))] × F1(8.5, −16.5, −11.5; 17; 0.65 − x, (x − 0.65)/(x + 0.35)), for −0.35 ≤ x < 0.65. (12)

The Bayesian approach to testing the difference of two independent proportions leads to interesting results which agree with frequentist results when non-informative priors are considered. Undoubtedly, all preceding results can be generalized to other measures frequently used in a 2 × 2 table.

Research partially supported by NSERC grant 9249 (Canada). The authors wish to thank the Université de Moncton Faculty of Graduate Studies and Research for the assistance provided while conducting this work.

Pham-Gia, T., Thin, N.V. and Doan, P.P. (2017) Inferences on the Difference of Two Proportions: A Bayesian Approach. Open Journal of Statistics, 7, 1-15. https://doi.org/10.4236/ojs.2017.71001

Below is the expression of the density of Y = X 1 − X 2 when (7) is satisfied, instead of (6). This expression, with the one given in Theorem 3, covers all cases.

Theorem 3a: Let X1 and X2 be two independent general betas with their supports satisfying (7). Then Y = X1 − X2 has its density defined as follows: for c − f ≤ y ≤ c − e,

f(y) = [(y − (c − f))^{α+δ−1} (c − e − y)^{γ−1} B(α, δ) / ((f − e)^{δ+γ−1} (d − c)^{α} B(α, β) B(δ, γ))] × F1(α, 1 − γ, 1 − β; α + δ; ((c − f) − y)/((c − e) − y), (y − (c − f))/(d − c)) (13)

For c − e ≤ y ≤ d − f ,

f(y) = [(y − (c − e))^{α−1} (d − e − y)^{β−1} / ((d − c)^{α+β−1} B(α, β))] × F1(γ, 1 − α, 1 − β; δ + γ; (e − f)/(y − (c − e)), (f − e)/(d − e − y)) (14)

For d − f ≤ y ≤ d − e ,

f(y) = [((d − e) − y)^{β+γ−1} (y − (c − e))^{α−1} B(β, γ) / ((f − e)^{γ} (d − c)^{α+β−1} B(δ, γ) B(α, β))] × F1(γ, 1 − δ, 1 − α; β + γ; ((d − e) − y)/(f − e), (y − (d − e))/y) (15)

Proof:

By rewriting Y = ( − X 2 ) − ( − X 1 ) , we can apply the above Theorem 2 and Theorem 3.

Q.E.D

A parallel, and equivalent, result to Theorem 4 is given below:

Theorem 4a: The density of X 1 − X 2 − η is:

For − η − 1 ≤ y ≤ − η ,

f(y) = [(y + (η + 1))^{α1+β2−1} (−η − y)^{α2−1} B(α1, β2) / (B(α1, β1) B(α2, β2))] × F1(α1, 1 − α2, 1 − β1; α1 + β2; ((η + 1) + y)/(η + y), y + (η + 1))

For − η ≤ y ≤ 1 − η ,

f(y) = [((1 − η) − y)^{α2+β1−1} (y + η)^{α1−1} B(α2, β1) / (B(α1, β1) B(α2, β2))] × F1(α2, 1 − β2, 1 − α1; α2 + β1; (1 − η) − y, (y − (1 − η))/(y + η))

and we denote Y ~ ξ η ∗ ( α 1 , β 1 , α 2 , β 2 ; η ) .

Proof:

Similar to the proof of Theorem 4.

Q.E.D

Proposition 3a:

Suppose that X2 ~ Bin(n2, π2) and that π2 has the prior distribution beta(α2, β2). Then the posterior distribution of π2 + η is GB(α2 + x2, β2 + n2 − x2; η, η + 1).

Proof:

The prior distribution of π 2 + η is G B ( α 2 , β 2 ; η , η + 1 ) (see Theorem 2) with the pdf

f_{π2+η}(θ) = [B(α2, β2)]^{−1} (θ − η)^{α2−1} (1 + η − θ)^{β2−1}, η ≤ θ ≤ η + 1.

The likelihood function is

f_{X2|π2+η}(x2 | θ) = f_{X2|π2}(x2 | π2) = C(n2, x2) π2^{x2} (1 − π2)^{n2−x2}, x2 = 0, 1, ⋯, n2

Thus the marginal distribution of X2, the number of successes, with π2 = θ − η, has density:

K(x2 | α2, β2, n2) = [C(n2, x2)/B(α2, β2)] ∫_{η}^{η+1} (θ − η)^{α2−1} (1 + η − θ)^{β2−1} π2^{x2} (1 − π2)^{n2−x2} dθ
 = [C(n2, x2)/B(α2, β2)] ∫_{η}^{η+1} (θ − η)^{α2−1} (1 + η − θ)^{β2−1} (θ − η)^{x2} (1 + η − θ)^{n2−x2} dθ
 = [C(n2, x2)/B(α2, β2)] ∫_{η}^{η+1} (θ − η)^{α2+x2−1} (1 + η − θ)^{β2+n2−x2−1} dθ
 = [C(n2, x2)/B(α2, β2)] B(α2 + x2, β2 + n2 − x2)

Therefore, the posterior distribution of θ given X2 = x2 is

f_{π2+η|X2}(θ | x2) = f_{π2+η}(θ) f_{X2|π2+η}(x2 | θ) / K(x2 | α2, β2, n2)
 = {[B(α2, β2)]^{−1} (θ − η)^{α2−1} (1 + η − θ)^{β2−1} C(n2, x2) π2^{x2} (1 − π2)^{n2−x2}} / {[C(n2, x2)/B(α2, β2)] B(α2 + x2, β2 + n2 − x2)}, with π2 = θ − η, η ≤ θ ≤ η + 1
 = (θ − η)^{α2+x2−1} (1 + η − θ)^{β2+n2−x2−1} / B(α2 + x2, β2 + n2 − x2), η ≤ θ ≤ η + 1

This is the p.d.f. of G B ( α 2 + x 2 , β 2 + n 2 − x 2 ; η , η + 1 ) .
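Proposition 3a can be verified numerically with the numbers of the running example (prior beta(0.5, 0.5), η = 0.35, x2 = 8 successes out of n2 = 20): discretize θ, multiply the GB prior by the binomial likelihood, renormalize, and compare with the closed-form GB posterior. A sketch, stdlib only (helper names are ours):

```python
from math import comb, gamma

def beta_fn(a, b):
    return gamma(a) * gamma(b) / gamma(a + b)

def gb_pdf(t, a, b, c, d):
    """General beta density on (c, d)."""
    return (t - c)**(a - 1) * (d - t)**(b - 1) / ((d - c)**(a + b - 1) * beta_fn(a, b))

a2, b2, eta, n2, x2 = 0.5, 0.5, 0.35, 20, 8

# Grid Bayes: prior GB(a2, b2; eta, eta+1) times binomial likelihood in pi2 = theta - eta
n = 20000
h = 1.0 / n
grid = [eta + (k + 0.5) * h for k in range(n)]
unnorm = [gb_pdf(t, a2, b2, eta, eta + 1)
          * comb(n2, x2) * (t - eta)**x2 * (1 + eta - t)**(n2 - x2)
          for t in grid]
z = sum(unnorm) * h
numeric = [u / z for u in unnorm]

# Closed form from Proposition 3a: GB(a2 + x2, b2 + n2 - x2; eta, eta + 1)
closed = [gb_pdf(t, a2 + x2, b2 + n2 - x2, eta, eta + 1) for t in grid]
print(max(abs(p - q) for p, q in zip(numeric, closed)))
```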

Q. E. D.
