Exact Distribution of Difference of Two Sample Proportions and Its Inferences

Comparing two population proportions using a confidence interval can be misleading in many cases, for example when the sample size is small and the test is based on the normal approximation. In such cases, the only option is to collect a large sample. Unfortunately, a large sample might not be possible; one example is a study of people suffering from a rare disease. The main purpose of this paper is to derive a closed formula for the exact distribution of the difference between two independent sample proportions, to use it to perform related inferences such as confidence intervals and hypothesis tests for two population proportions, and to compare the results with the existing Wald, Agresti-Caffo and Score intervals. This distribution requires no additional conditions and can be used regardless of the nature of the distribution and the sample sizes. We claim that the exact distribution gives the smallest confidence interval width among the Wald, Agresti-Caffo and Score methods, so it is suitable for inference on the difference between two population proportions regardless of sample size.


Introduction
Comparing two population proportions, especially when the sample size is small, is very challenging in statistics and has applications in many fields. Several procedures have been suggested; one of the most popular and common methods, which has been used for a long time, is the Wald interval. Due to simplicity and convenience, the first method that comes to the mind of most statisticians is the

Exact Distribution of Difference of Two Sample Proportions
Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be iid Bernoulli random samples from two different populations with parameters $p_1$ and $p_2$ respectively, and let $D = \bar{X} - \bar{Y}$ denote the difference between the two sample proportions. From the Theorem above, we derive the next results according to the different relations between $m$ and $n$.
Corollary 1
If $\gcd(m, n) = 1$, then the exact distribution of $D$ is given by:

Corollary 2
If $m = n$ and $k = l$, then the exact distribution of $D$ is given by
The exact distribution of $D$ is given by
The exact distribution of $D$ is symmetrical about zero if $m = n$ and $p_1 = p_2$.
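As a numerical cross-check on the distribution described by these corollaries, the pmf of $D$ can also be obtained by brute-force enumeration instead of a closed formula. The following Python sketch assumes that $D = X/m - Y/n$ with independent $X \sim \mathrm{Bin}(m, p_1)$ and $Y \sim \mathrm{Bin}(n, p_2)$; the function names `binom_pmf` and `exact_pmf` are illustrative and are not taken from the paper.

```python
from collections import defaultdict
from fractions import Fraction
from math import comb


def binom_pmf(k, size, p):
    """P(B = k) for B ~ Binomial(size, p)."""
    return comb(size, k) * p**k * (1 - p) ** (size - k)


def exact_pmf(m, n, p1, p2):
    """pmf of D = X/m - Y/n by enumerating every pair (i, j).

    Assumption: X ~ Bin(m, p1) and Y ~ Bin(n, p2) are independent.
    Keys are exact Fractions so that equal values of i/m - j/n
    are merged without floating-point comparison issues.
    """
    pmf = defaultdict(float)
    for i in range(m + 1):
        for j in range(n + 1):
            d = Fraction(i, m) - Fraction(j, n)
            pmf[d] += binom_pmf(i, m, p1) * binom_pmf(j, n, p2)
    return dict(sorted(pmf.items()))


# Symmetry check for the case m = n and p1 = p2.
pmf = exact_pmf(5, 5, 0.3, 0.3)
max_asymmetry = max(abs(pmf[d] - pmf[-d]) for d in pmf)
print(max_asymmetry)
```

The enumeration costs $(m+1)(n+1)$ terms, which is acceptable for the small samples this paper targets; the reported asymmetry should be at the level of floating-point rounding when $m = n$ and $p_1 = p_2$.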

Support of the Distribution
The support of the exact distribution is denoted by $D(m, n)$. For small values of $m$ and $n$, it can be derived manually. However, for larger values of $m$ and $n$ this is tedious and time-consuming, so software such as R is used.
The graphs of the probability mass function of the exact distribution of the difference of two sample proportions for $m = n$ and $p_1 = p_2$ are plotted in Figure 1. These graphs support Corollary 4.
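The support can be enumerated mechanically for any $m$ and $n$. The sketch below is written in Python rather than R and assumes that the support is the set of distinct values of $i/m - j/n$ for $0 \le i \le m$ and $0 \le j \le n$; the function name `support` is hypothetical.

```python
from fractions import Fraction


def support(m, n):
    """Sorted distinct values of i/m - j/n for 0 <= i <= m, 0 <= j <= n.

    Exact rational arithmetic avoids treating equal values such as
    1/2 - 1/2 and 1 - 1 as different points because of rounding.
    """
    return sorted(
        {Fraction(i, m) - Fraction(j, n)
         for i in range(m + 1)
         for j in range(n + 1)}
    )


for m, n in [(3, 3), (2, 3), (4, 6)]:
    points = support(m, n)
    print(m, n, len(points), [str(d) for d in points])
```

When $m = n$ the support reduces to the $2n + 1$ multiples of $1/n$ between $-1$ and $1$, while for unequal sample sizes the number of points depends on the common factors of $m$ and $n$.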

Hypothesis Testing
To test . Then the null distribution of $D$ is given by The critical region can be obtained by finding $c_{\alpha/2}$ and $c_{1-\alpha/2}$ such that: This means that: respectively.
Since the p-value is less than $\alpha$, we reject the null hypothesis and conclude that there is gender discrimination in promotion. However, the p-value is only slightly less than $\alpha$, so the evidence of gender discrimination in the promotion of the employees is moderate.
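The construction of a two-sided critical region from a discrete null distribution can be sketched in Python as follows. Several points here are assumptions and not the paper's stated procedure: the null hypothesis is taken to be $p_1 = p_2 = p$, the common $p$ is replaced by a pooled plug-in estimate, each tail of the critical region is required to have probability at most $\alpha/2$, and the two-sided p-value is defined as $P(|D| \ge |d_{obs}|)$. The function names and the counts in the usage lines are hypothetical.

```python
from fractions import Fraction
from math import comb


def binom_pmf(k, size, p):
    """P(B = k) for B ~ Binomial(size, p)."""
    return comb(size, k) * p**k * (1 - p) ** (size - k)


def null_pmf(m, n, p):
    """pmf of D = X/m - Y/n when both samples share the same p."""
    pmf = {}
    for i in range(m + 1):
        for j in range(n + 1):
            d = Fraction(i, m) - Fraction(j, n)
            pmf[d] = pmf.get(d, 0.0) + binom_pmf(i, m, p) * binom_pmf(j, n, p)
    return pmf


def critical_values(pmf, alpha):
    """Return (c_lo, c_hi): reject when D <= c_lo or D >= c_hi.

    c_lo is the largest support point with P(D <= c_lo) <= alpha/2 and
    c_hi the smallest with P(D >= c_hi) <= alpha/2. Either may be None
    if even the most extreme point carries more than alpha/2.
    """
    values = sorted(pmf)
    c_lo = c_hi = None
    cum = 0.0
    for d in values:                 # accumulate the lower tail
        cum += pmf[d]
        if cum > alpha / 2:
            break
        c_lo = d
    cum = 0.0
    for d in reversed(values):       # accumulate the upper tail
        cum += pmf[d]
        if cum > alpha / 2:
            break
        c_hi = d
    return c_lo, c_hi


def two_sided_p_value(pmf, d_obs):
    """P(|D| >= |d_obs|) under the supplied null pmf (assumed definition)."""
    return min(1.0, sum(q for d, q in pmf.items() if abs(d) >= abs(d_obs)))


# Hypothetical counts for illustration only (not the paper's data).
m, n, x, y = 10, 10, 8, 3
p_pooled = (x + y) / (m + n)         # plug-in estimate, an assumption
pmf = null_pmf(m, n, p_pooled)
d_obs = Fraction(x, m) - Fraction(y, n)
print(critical_values(pmf, 0.05), two_sided_p_value(pmf, d_obs))
```

Because $D$ is discrete, the attained size of such a test is usually strictly below the nominal $\alpha$, which is one reason a p-value close to $\alpha$ should be interpreted cautiously.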

Continuation of the example: Gender Discrimination
In this example, we rejected the null hypothesis at the significance level $\alpha = 0.05$. Now we want to find the power of the hypothesis test for

Confidence Interval
A relatively easy approach to compare the difference between population proportions (

Conclusion
Inference on the difference between two population proportions is a very basic problem in statistics. The standard Wald interval has been used universally. However, the standard Wald interval is persistently chaotic and has unacceptably poor coverage probabilities when either the sample sizes are small or one proportion is very large and the other is very small. Several other intervals have been suggested, but their performance is not satisfactory when the sample size is small. We have shown that our exact distribution can be used regardless of the sample sizes. We have also shown that the exact distribution gives the smallest confidence interval width among the Wald, Agresti-Caffo and Score methods, so it is suitable for inference on the difference between two population proportions regardless of sample size.
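The poor small-sample coverage of the Wald interval can be checked by exact enumeration. The Python sketch below uses the standard textbook forms of the Wald interval and of the Agresti-Caffo interval (one success and one failure added to each sample); it is not the paper's own code, the Score interval is omitted, and the values of $m$, $n$, $p_1$ and $p_2$ in the usage lines are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_pmf(k, size, p):
    """P(B = k) for B ~ Binomial(size, p)."""
    return comb(size, k) * p**k * (1 - p) ** (size - k)


def wald_interval(x, m, y, n, alpha=0.05):
    """Standard Wald interval for p1 - p2 from counts x of m and y of n."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p1, p2 = x / m, y / n
    se = sqrt(p1 * (1 - p1) / m + p2 * (1 - p2) / n)
    return p1 - p2 - z * se, p1 - p2 + z * se


def agresti_caffo_interval(x, m, y, n, alpha=0.05):
    """Wald formula after adding one success and one failure per sample."""
    return wald_interval(x + 1, m + 2, y + 1, n + 2, alpha)


def coverage(interval, m, n, p1, p2, alpha=0.05):
    """Exact probability that the interval contains the true p1 - p2."""
    delta = p1 - p2
    total = 0.0
    for x in range(m + 1):
        for y in range(n + 1):
            lo, hi = interval(x, m, y, n, alpha)
            if lo <= delta <= hi:
                total += binom_pmf(x, m, p1) * binom_pmf(y, n, p2)
    return total


# Hypothetical small-sample setting with two small proportions.
m, n, p1, p2 = 5, 5, 0.05, 0.02
print("Wald:", coverage(wald_interval, m, n, p1, p2))
print("Agresti-Caffo:", coverage(agresti_caffo_interval, m, n, p1, p2))
```

In this setting the most likely outcome is zero successes in both samples, for which the Wald interval degenerates to the single point 0 and therefore misses any nonzero true difference.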

Conflicts of Interest
The authors declare no conflicts of interest regarding the publication of this paper.

Proof of lemma
If we define , then $W$ can be written as
We multiply the right-hand sides of (2) and (3) to obtain (1).

Proof of Theorem
Notice that, even though the supports of $D$ and $W$ are different, their pmfs have the same probabilities:
For this case, k m − is either 0 or −1 and n l n − is either 0 or 1, so from the theorem we get

Proof of corollary 3
The exact distribution of $D$, using the lemma, is given by