Testing for a Zero Proportion

Tests for a proportion that may be zero are described. The setting is an environment in which there can be misclassifications or misdiagnoses, giving the possibility of nonzero counts from false positives even though no real examples may exist. Both frequentist and Bayesian tests and analyses are presented, and examples are given.


Introduction
To show that something is possible seems easy: just demonstrate or find one instance of its existence. However, due to misclassification, such an instance may seem to occur even when it is not possible. Deciding whether a few occurrences rule out that the event is really the empty set is our starting point. A clinician asserts that patients with gout are precluded from getting multiple sclerosis (MS), but a sample of 36,733 MS patients contains 4 with gout. Does this provide sufficient evidence that the clinician is wrong when there is a chance that these contradicting examples were misdiagnosed?
Our purpose is to present two statistical tests, one Bayesian and the other frequentist, to determine whether a set is empty in an environment of misclassifications. Both tests are presented so that practitioners can select the one that better fits their needs or statistical philosophy. Since the inputs and models are different, any two statistical procedures may give very different outcomes, as the present analyses and examples show.
As a second example, a sample of 223 eczema sufferers contains 3 who were diagnosed as having psoriasis. The question is: Does having eczema indicate that the patient does not have psoriasis in this population of 11-year-old British children?
For a third example, could it be that no one in a particular population of students has the psychiatric condition called Generalized Anxiety Disorder (GAD)? A psychologist contends that none of them has that disorder. In a sample of 2843 students, 111 were diagnosed with GAD.
Two less-serious examples are: Our colleague claims that no dogs eat statistics homework, even though there are reported cases. A friend says that no one has been abducted by space aliens, notwithstanding some first-person testimony and magazine articles. We leave homework-eating dogs and space-alien abductions for others to study.
In the first three examples, are the numbers 4, 3, and 111 sufficiently large to indicate that there are indeed MS, psoriasis, or GAD patients in the sampled populations, when there can be misdiagnoses?
We consider two statements to be equivalent: The proportion of a large population that has a certain feature is zero, and the probability that a randomly selected individual from the population will have the feature is zero.

The Frequentist Test
We incorporate misclassification rates into a statistical test for a proportion. Under the null hypothesis that the set is empty, that is, the probability of obtaining an element is $\theta = 0$, an imperfect classification process is the only way to obtain a positive count, $X$. The false positive rate is $p_+$. The number $X$ is a binomial random variable with parameters $n$ and $p_+$. Using the normal approximation to the binomial distribution, the test compares the count with the critical value
$$x_c = n p_+ + z_\alpha \sqrt{n p_+ (1 - p_+)},$$
where $\alpha = P(Z \ge z_\alpha)$ is the level of significance and $Z$ is a standard normal random variable. The sample's number of individuals designated as having the feature is $X_s$. If $X_s \ge x_c$, then the count is too large compared to the number expected from misclassifications alone, and we would reject the null hypothesis $\theta = 0$. If $X_s < x_c$, we would not have sufficient evidence to reject that $\theta = 0$.
For our first example, it was conjectured in a landmark study that excessive production of uric acid by people with gout might preclude the onset of multiple sclerosis [3]. The population is gout patients, and the null hypothesis is that no gout patients have MS. Indeed, in that study of 36,733 gout patients, only $X_s = 4$ were recorded as having MS. Take the misdiagnosis rate to be $p_+ = 0.001$, that is, the false positive rate is only 0.1% for MS among those patients with gout. The count in the sample, 4, is very small compared to the expected count $n p_+ = (36733)(0.001) \approx 37$. Larger error rates produce even larger expected counts. For level of significance 0.05, the critical value is approximately 47. Since $4 < 47$, we do not reject the null hypothesis that there are no cases of MS among people with gout. Of course, this does not establish a cause-and-effect relationship between the presence of uric acid and the absence of MS.
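The arithmetic of the gout and MS example can be checked with a short script. The following Python sketch is an illustration, not code from the study: it assumes the critical value comes from the normal approximation to the binomial, $x_c = n p_+ + z_\alpha \sqrt{n p_+ (1 - p_+)}$, and the function names `critical_value` and `reject_null` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def critical_value(n, p_plus, alpha=0.05):
    """Critical count under H0, assuming the normal approximation
    x_c = n*p_plus + z_alpha * sqrt(n*p_plus*(1 - p_plus))."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # upper-tail quantile of N(0, 1)
    mean = n * p_plus                            # expected false positives
    sd = sqrt(n * p_plus * (1.0 - p_plus))       # binomial standard deviation
    return mean + z_alpha * sd


def reject_null(x_s, n, p_plus, alpha=0.05):
    """Reject theta = 0 when the observed count reaches the critical value."""
    return x_s >= critical_value(n, p_plus, alpha)


# Inputs from the first example: 36,733 gout patients, 4 recorded with MS,
# and an assumed false positive rate of 0.001.
n, p_plus, x_s = 36733, 0.001, 4
x_c = critical_value(n, p_plus)
print(f"expected false positives: {n * p_plus:.1f}")
print(f"critical value: {x_c:.1f}")
print("reject H0" if reject_null(x_s, n, p_plus) else "do not reject H0")
```

With these inputs the critical value rounds to 47, in agreement with the value quoted in the text, so the observed count of 4 does not lead to rejection.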

The Bayesian Test
The spirit and intent of a Bayesian analysis is different. The parameters have distributions, and conclusions are probability statements concerning which hypothesis is more likely to be correct [4, pp. 145-167], [5, pp. 73-83].
This test requires that we introduce the false negative rate. Under the alternative hypothesis in which the set is not empty, there is the possibility of falsely designating a subject as not having the condition, with rate $p_-$. Under the particular value $\theta$, $X$ has a binomial distribution with parameters $n$ and
$$p = p_+ (1 - \theta) + (1 - p_-)\,\theta = p_+ + (1 - p_+ - p_-)\,\theta.$$
In order to avoid complications, assume that $1 - p_+ - p_- > 0$, which would almost always be true for a real experiment since error rates should be small.
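The role of the two error rates can be illustrated by simulation. The Python sketch below is an illustration only: it assumes that an individual is classified as positive with probability $p = p_+ (1 - \theta) + (1 - p_-)\theta$, and the values of `theta`, `p_plus`, and `p_minus`, along with the function names, are hypothetical choices made for the demonstration.

```python
import random


def observed_positive_rate(theta, p_plus, p_minus):
    """Probability that a randomly selected individual is classified positive."""
    return p_plus * (1.0 - theta) + (1.0 - p_minus) * theta


def simulate_count(n, theta, p_plus, p_minus, rng):
    """Simulate the number of individuals classified as positive."""
    count = 0
    for _ in range(n):
        has_feature = rng.random() < theta
        if has_feature:
            positive = rng.random() >= p_minus  # missed with rate p_minus
        else:
            positive = rng.random() < p_plus    # false alarm with rate p_plus
        count += positive
    return count


rng = random.Random(0)  # fixed seed so the run is repeatable
n, theta, p_plus, p_minus = 100_000, 0.05, 0.01, 0.02  # hypothetical values
p = observed_positive_rate(theta, p_plus, p_minus)
x = simulate_count(n, theta, p_plus, p_minus, rng)
print(f"p = {p:.4f}, simulated rate = {x / n:.4f}")
```

Setting `theta = 0` in `observed_positive_rate` returns `p_plus` itself, so in this sketch an empty set produces positives at the false positive rate only.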
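Then, $\theta = 0$ if and only if $p = p_+$, and under the null hypothesis $p_-$ does not explicitly enter the calculations. To test the null hypothesis, create the prior distribution of $p$ with mixed discrete and continuous parts
$$\pi(p) = \begin{cases} 0.5, & p = p_+, \\ \dfrac{0.5}{1 - p_+ - p_-}, & p_+ < p \le 1 - p_-. \end{cases} \qquad (1)$$
This prior distribution has been characterized as "virtually mandatory" [4, p. 151]. It gives probability 0.5 to each hypothesis with an uninformative uniform distribution covering the alternative hypothesis. The distribution of $X$ is the binomial probability mass function
$$f(x \mid p) = \binom{n}{x} p^x (1 - p)^{n - x}.$$
Bayes' theorem says that the posterior probability that the null hypothesis is true is
$$P(p = p_+ \mid x) = \frac{f(x \mid p_+)}{f(x \mid p_+) + g(x \mid n, p_+, p_-)}, \qquad (2)$$
where
$$g(x \mid n, p_+, p_-) = \frac{1}{1 - p_+ - p_-} \int_{p_+}^{1 - p_-} f(x \mid p)\, dp. \qquad (3)$$
The integral in Equation (3) is
$$\int_{p_+}^{1 - p_-} \binom{n}{x} p^x (1 - p)^{n - x}\, dp = \frac{1}{n + 1}\, P(p_+ \le W \le 1 - p_-), \qquad (4)$$
where $W$ has a beta distribution with parameters $x + 1$ and $n - x + 1$; that is, the integral is a constant times a probability computed from a beta distribution [4, p. 560], [5, pp. 33 and 48]. Decide in favor of the hypothesis with the larger posterior probability.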
For our second example, in a sample of 223 eleven-year-old children in Britain diagnosed with eczema, 3 were diagnosed with psoriasis [6]. For the false positive rate $p_+ = 0.01$ among eczema sufferers, $g(3 \mid 223, 0.01, 0.01)$ and [...]
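One way to evaluate this example numerically is sketched below in Python. It is an illustration under stated assumptions, not the authors' computation: it assumes prior probability 0.5 on the null hypothesis, a uniform prior for $p$ between $p_+$ and $1 - p_-$ under the alternative, and a false negative rate $p_- = 0.01$ (read from the last argument of $g$). The function names are hypothetical, and the beta probability is obtained from the standard identity relating the beta distribution function with integer parameters to a binomial tail.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """Binomial pmf computed in log space; requires 0 < p < 1."""
    log_coef = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coef + k * log(p) + (n - k) * log(1.0 - p))


def beta_cdf_int(p, a, b):
    """CDF of Beta(a, b) at p for positive integers a and b, using
    I_p(a, b) = P(Bin(a + b - 1, p) >= a)."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    m = a + b - 1
    return 1.0 - sum(binom_pmf(k, m, p) for k in range(a))


def posterior_null(x, n, p_plus, p_minus):
    """Posterior probability that p = p_plus (assumed 50/50 mixed prior)."""
    if not 0.0 < p_plus < 1.0 - p_minus:
        raise ValueError("need 0 < p_plus < 1 - p_minus")
    f0 = binom_pmf(x, n, p_plus)  # likelihood under the null hypothesis
    lo, hi = p_plus, 1.0 - p_minus
    beta_prob = (beta_cdf_int(hi, x + 1, n - x + 1)
                 - beta_cdf_int(lo, x + 1, n - x + 1))
    # Average likelihood under the alternative: uniform prior on (lo, hi).
    g = beta_prob / ((n + 1) * (hi - lo))
    return f0 / (f0 + g)


# Second example: 3 psoriasis diagnoses among 223 children with eczema.
post = posterior_null(x=3, n=223, p_plus=0.01, p_minus=0.01)
print(f"posterior probability of the null hypothesis: {post:.3f}")
```

Under these assumptions the posterior probability of the null hypothesis is about 0.98, well above 0.5, so the sketch favors the hypothesis that the three diagnoses are false positives.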
One choice that was made to create the test in Equations (2)-(4) is the prior distribution in Equation (1). Generally, the prior distribution's impact on the analysis matters less and less as sample size is increased. Another choice was that $p_+$ has a fixed value. A more complicated analysis would place a distribution on this error rate and average it out by integrating over the now-variable $p_+$ [4,5]. This type of analysis is presented in the next section.

More Extensive Tests
In this section we analyze our third example using expanded frequentist and Bayesian tests.
A person who suffers from Generalized Anxiety Disorder is in an almost constant state of apprehension. The null hypothesis is that there is no one in the population of American university students with GAD. That says GAD, as defined in psychiatry, does not exist in that population. Szasz [7] and others argue against the existence of such [...]
For this example, we perform a more extensive Bayesian analysis, which uses simulation and a beta distribution for the hyperparameter $p_+$. Suppose we feel that $p_+$ follows a beta distribution with mean 0.02 and standard deviation 0.005. The mean is the center of the interval $0.01 \le p_+ \le 0.03$, and the standard deviation is one-fourth the width of the interval. The mean and standard deviation uniquely determine the beta distribution's parameters: writing $\mu = 0.02$ and $\sigma = 0.005$, the two parameters sum to $\mu(1 - \mu)/\sigma^2 - 1 = 783$, so they are $783\mu = 15.66$ and $783(1 - \mu) = 767.34$.
[...] Natural Scavenger of Peroxynitrite, in Experimental Allergic Encephalomyelitis and Multiple Sclerosis," Proceedings of the National Academy of Sciences, USA, Vol. 95, No. 2, 1998, pp. 675-680. doi:10.1073/pnas.95.2.675