A Geometric Approach to Conditioning and the Search for Minimum Variance Unbiased Estimators
1. Introduction
We are given a coin that has probability θ of coming up heads when tossed once, where θ is an unknown real number in the interval (0, 1). We wish to estimate θ. Typically, we toss this coin a number of times that is fixed in advance and use the proportion of heads that appear as an estimator of θ. This approach not only makes sense intuitively but is also optimal: the estimator is unbiased and has the smallest possible variance based on the fixed number of tosses. It is the minimum variance unbiased estimator (MVUE) of θ [1]. The sample proportion is also the maximum likelihood estimator (MLE) of θ, because it maximizes the likelihood of obtaining the observed result of these coin tosses [1] [2].
Suppose that we proceed in a different way. We perform a “do until” experiment. Instead of fixing the number of tosses in advance, we toss this coin until we obtain heads for the nth time, where n is fixed. The recorded data are $(x_1, x_2, \ldots, x_n)$, where $x_1$ is the number of tosses it takes to obtain heads for the first time, and for $2 \le k \le n$, $x_k$ is the additional number of tosses it takes to obtain the kth occurrence of heads after we have obtained heads $k-1$ times. These data are realizations of independent random variables $X_1, X_2, \ldots, X_n$, each of which has the geometric distribution with probability mass function (pmf)
$$f(x; \theta) = \theta (1 - \theta)^{x-1} \quad \text{for } x = 1, 2, 3, \ldots$$
[1] [2]. Our goal is to find the MVUE of θ based on these alternative data.
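To make the sampling scheme concrete, here is a minimal simulation sketch of the “do until” experiment, using only the Python standard library (the function name `toss_until_n_heads`, the seed, and the parameter values are our own illustrative choices):

```python
import random

def toss_until_n_heads(theta, n, rng):
    """Toss a theta-coin until the n-th head; return (x_1, ..., x_n).

    Each x_k counts the tosses needed to get the k-th head after the
    (k-1)-th head, a realization of a Geometric(theta) random variable.
    """
    xs = []
    for _ in range(n):
        tosses = 1
        while rng.random() >= theta:  # tails with probability 1 - theta
            tosses += 1
        xs.append(tosses)
    return xs

rng = random.Random(0)
data = toss_until_n_heads(0.3, 5, rng)
y = sum(data)  # total number of tosses; Y has the negative binomial distribution
```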
We use this example to illustrate how conditioning an unbiased estimator on a sufficient statistic can be used to find a MVUE. It shows how we can discover the best estimator by using the very important concept of conditioning. The example demonstrates that this is not simply of theoretical interest: finding appropriate estimators is a consequential problem in inference, decision-making, and data reduction. In practical terms, an advantage of using conditioning in this manner is that it is guaranteed to yield minimum variance unbiased estimators. Other techniques, such as the method of moments, may be considered by some to be intuitive, but they need not produce estimators with those desirable properties [1] [2].
2. The Minimum Variance and Unbiased Estimator (MVUE)
One way to find an unbiased estimator of θ for the geometric distribution and the data $(X_1, X_2, \ldots, X_n)$ is to start with $X_1$. Because $P(X_1 = 1) = \theta$, the estimator
$$u(X_1) = \begin{cases} 1 & \text{if } X_1 = 1, \\ 0 & \text{otherwise}, \end{cases}$$
is an unbiased estimator of θ. However, this estimator ignores the values of $X_2, \ldots, X_n$, which makes it suspect. Indeed, it is not the MVUE of θ unless n = 1.
The random variable
$$Y = X_1 + X_2 + \cdots + X_n$$
is the total number of tosses of the coin that it takes to obtain n heads. It has the negative binomial distribution with pmf
$$g(y; \theta) = \binom{y-1}{n-1} \theta^n (1 - \theta)^{y-n} \quad \text{for } y = n, n+1, n+2, \ldots$$
[1] [2]. The variable Y is a sufficient statistic for θ in the sense that the conditional joint distribution of $(X_1, X_2, \ldots, X_n)$, given the value of Y, does not depend upon θ, as we now demonstrate. For positive integers $x_1, x_2, \ldots, x_n$ and $y = x_1 + x_2 + \cdots + x_n$,
$$P(X_1 = x_1, \ldots, X_n = x_n \mid Y = y) = \frac{\theta^n (1 - \theta)^{y-n}}{\binom{y-1}{n-1} \theta^n (1 - \theta)^{y-n}} = \frac{1}{\binom{y-1}{n-1}}. \tag{1}$$
When $x_1 + x_2 + \cdots + x_n \ne y$, the probability is zero.
The denominator of (1) is the number of ordered n-tuples of positive integers that sum to y. We conclude that, given Y = y, the conditional distribution of $(X_1, X_2, \ldots, X_n)$ is uniform on the set of all such n-tuples. The distribution does not depend upon θ. This means that, as long as the experimenters retain the value of Y, they may discard the data $(x_1, x_2, \ldots, x_n)$ without losing any information about θ. The statistic Y is therefore sufficient, or good enough, because it contains all the knowledge about the value of θ that is available in the data. In other words, Y drains the data of all the useful information that it has to convey about the value of θ.
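The calculation in (1) can be checked numerically: any two n-tuples with the same sum y receive the same conditional probability given Y = y, namely $1/\binom{y-1}{n-1}$, whatever the value of θ. A small sketch (the tuples and parameter values below are arbitrary choices):

```python
from math import comb

def geom_pmf(x, theta):
    # P(X = x): first head on toss x
    return theta * (1 - theta) ** (x - 1)

def neg_binom_pmf(y, n, theta):
    # P(Y = y): n-th head on toss y
    return comb(y - 1, n - 1) * theta**n * (1 - theta) ** (y - n)

n, y = 3, 7
for theta in (0.2, 0.5, 0.8):
    for tup in [(1, 2, 4), (3, 3, 1)]:  # two 3-tuples of positive integers summing to 7
        joint = 1.0
        for x in tup:
            joint *= geom_pmf(x, theta)
        cond = joint / neg_binom_pmf(y, n, theta)
        # The conditional probability is 1/C(y-1, n-1), free of theta
        assert abs(cond - 1 / comb(y - 1, n - 1)) < 1e-12
```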
The estimator
$$v(Y) = E\big[u(X_1) \mid Y\big]$$
is unbiased for θ, because
$$E[v(Y)] = E\Big[E\big[u(X_1) \mid Y\big]\Big] = E[u(X_1)] = \theta.$$
Thus, $u(X_1)$ and $v(Y)$ are both unbiased estimators of θ which can be computed from the data $(X_1, X_2, \ldots, X_n)$.
According to the conditional variance formula,
$$\operatorname{Var}(u(X_1)) = E\big[\operatorname{Var}(u(X_1) \mid Y)\big] + \operatorname{Var}\big(E[u(X_1) \mid Y]\big) \tag{2}$$
[1] [3] [4], so
$$\operatorname{Var}(v(Y)) = \operatorname{Var}(u(X_1)) - E\big[\operatorname{Var}(u(X_1) \mid Y)\big] \le \operatorname{Var}(u(X_1)). \tag{3}$$
This reasoning shows that the MVUE of θ must be a function of Y: by the conditional variance formula, conditioning any unbiased estimator of θ that is not a function of Y on Y itself yields another unbiased estimator of θ that is a function of Y and has a smaller variance. This is the content of the Rao-Blackwell theorem [3] [4].
There is only one function of Y that is an unbiased estimator of θ, which is demonstrated as follows. Suppose that there are two such estimators, $v_1(Y)$ and $v_2(Y)$. Then
$$E\big[v_1(Y) - v_2(Y)\big] = \sum_{y=n}^{\infty} \big(v_1(y) - v_2(y)\big) \binom{y-1}{n-1} \theta^n (1 - \theta)^{y-n} = 0 \quad \text{for } 0 < \theta < 1,$$
and, dividing by $\theta^n$,
$$\sum_{y=n}^{\infty} \big(v_1(y) - v_2(y)\big) \binom{y-1}{n-1} (1 - \theta)^{y-n} = 0 \quad \text{for } 0 < \theta < 1.$$
Because the series of powers of 1 − θ is identically zero for 0 < θ < 1, all of its coefficients must be zero. Therefore, $v_1(y) = v_2(y)$ for $y = n, n+1, n+2, \ldots$, and the uniqueness is established. Thus, $v(Y) = E[u(X_1) \mid Y]$ must be the unique MVUE of θ.
To compute the estimator, proceed as follows:
$$v(y) = E\big[u(X_1) \mid Y = y\big] = P(X_1 = 1 \mid Y = y) = \frac{P(X_1 = 1)\, P(X_2 + \cdots + X_n = y - 1)}{P(Y = y)},$$
where the independence of $X_1, X_2, \ldots, X_n$ yields the last equality. Further,
$$v(y) = \frac{\theta \binom{y-2}{n-2} \theta^{n-1} (1 - \theta)^{(y-1)-(n-1)}}{\binom{y-1}{n-1} \theta^n (1 - \theta)^{y-n}} = \frac{\binom{y-2}{n-2}}{\binom{y-1}{n-1}} = \frac{n-1}{y-1}.$$
Hence, the MVUE of θ is
$$v(Y) = \frac{n-1}{Y-1}.$$
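A Monte Carlo sketch can illustrate the conclusion: both $u(X_1)$ and $v(Y) = (n-1)/(Y-1)$ average close to θ, while $v(Y)$ has a much smaller variance (standard library only; the trial count, seed, and parameter values are arbitrary choices):

```python
import random

def simulate(theta, n, trials, seed=0):
    """Compare u(X1) = 1{X1 = 1} with v(Y) = (n - 1)/(Y - 1) by simulation."""
    rng = random.Random(seed)
    u_vals, v_vals = [], []
    for _ in range(trials):
        xs = []
        for _ in range(n):
            t = 1
            while rng.random() >= theta:  # tails with probability 1 - theta
                t += 1
            xs.append(t)
        u_vals.append(1.0 if xs[0] == 1 else 0.0)  # unbiased but crude
        v_vals.append((n - 1) / (sum(xs) - 1))     # the MVUE
    def mean(a):
        return sum(a) / len(a)
    def var(a):
        m = mean(a)
        return sum((x - m) ** 2 for x in a) / len(a)
    return mean(u_vals), mean(v_vals), var(u_vals), var(v_vals)

mu_u, mu_v, var_u, var_v = simulate(theta=0.4, n=5, trials=20000)
# Both sample means should land near theta = 0.4, with var_v well below var_u.
```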
3. Geometric Interpretation
There is a geometric interpretation of the conditional variance formula (2) that can provide a deeper understanding of these results. There is a long tradition of showing that variables and statistics reside in geometric spaces. Herr [5] gives a history of these representations. See [6] [7] [8] for other examples. The concept of projection, which is the essence of the present geometric approach, is widely used elsewhere, for example in machine learning. This approach places the process of computing MVUEs in the framework of the universally important concept of an inner product space and thus provides a deeper mathematical insight into this process.
The set of random variables on a fixed probability space which have finite second moment is a Hilbert space $L^2$, where its inner product $\langle \cdot, \cdot \rangle$ and metric d are defined for points U and V as
$$\langle U, V \rangle = E(UV)$$
and
$$d(U, V) = \sqrt{E\big[(U - V)^2\big]} = \|U - V\|$$
[9]. The conditional variance formula (2) is the “Pythagorean Theorem” in $L^2$, as
$$\|u(X_1) - \theta\|^2 = \|u(X_1) - v(Y)\|^2 + \|v(Y) - \theta\|^2, \tag{4}$$
because
$$\operatorname{Var}(u(X_1)) = \|u(X_1) - \theta\|^2, \qquad \operatorname{Var}(v(Y)) = \|v(Y) - \theta\|^2,$$
and
$$E\big[\operatorname{Var}(u(X_1) \mid Y)\big] = E\big[(u(X_1) - v(Y))^2\big] = \|u(X_1) - v(Y)\|^2,$$
by employing $E[v(Y)] = \theta$. Equation (4) makes clear that $v(Y)$ is the orthogonal projection of $u(X_1)$ onto the subspace of random variables that are functions of Y, as the representational illustration in Figure 1 shows.
If the unbiased estimator $u(X_1)$ is not a function of Y, then $\|u(X_1) - v(Y)\|^2 > 0$, and from (4),
$$\|v(Y) - \theta\|^2 < \|u(X_1) - \theta\|^2.$$
That is,
$$\operatorname{Var}(v(Y)) < \operatorname{Var}(u(X_1)),$$
as in (3).
Because the hypotenuse of a right triangle is its longest side, Figure 1 makes it clear that the variance of v(Y) is strictly less than the variance of u(X1).
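The Pythagorean decomposition (4) can also be verified numerically by computing each squared length from the pmfs, truncating the infinite sums at a large cutoff (a sketch; `Y_MAX` and the parameter values are arbitrary choices):

```python
from math import comb, isclose

theta, n, Y_MAX = 0.3, 4, 500  # Y_MAX truncates the infinite sums

def pmf_sum_geom(y, k):
    # P(X_1 + ... + X_k = y): negative binomial pmf
    return comb(y - 1, k - 1) * theta**k * (1 - theta) ** (y - k)

ys = range(n, Y_MAX)
v = lambda y: (n - 1) / (y - 1)  # the MVUE as a function of y

E_v = sum(v(y) * pmf_sum_geom(y, n) for y in ys)
E_v2 = sum(v(y) ** 2 * pmf_sum_geom(y, n) for y in ys)
# P(X1 = 1, Y = y) = theta * P(X2 + ... + Xn = y - 1)
E_uv = sum(v(y) * theta * pmf_sum_geom(y - 1, n - 1) for y in ys)

var_u = theta * (1 - theta)    # ||u(X1) - theta||^2, since u is a Bernoulli(theta) variable
leg = theta - 2 * E_uv + E_v2  # ||u(X1) - v(Y)||^2 = E[u^2] - 2E[uv] + E[v^2]
var_v = E_v2 - theta**2        # ||v(Y) - theta||^2
assert isclose(E_v, theta, abs_tol=1e-9)          # v(Y) is unbiased
assert isclose(var_u, leg + var_v, abs_tol=1e-9)  # Pythagorean identity (4)
```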
4. Conclusions
Using a prototypical example, we have shown how the powerful technique of conditioning can be used to find the minimum variance unbiased estimator (MVUE) of a parameter. The data collection technique and the observations’ probability distribution determine the MVUE. Oftentimes, these MVUEs are not simply standard statistics, like means and variances, as the example illustrates. The geometry of the conditional variance formula shows how the minimum variance estimator is obtained as a projection.
Future endeavors might involve further development of the geometrical representation of this technique. Also, the example reveals how a feature of a physical population can be viewed as a parameter in different probability distributions, depending upon the sampling methods. In the example, θ is in a Bernoulli distribution and a geometric distribution. Topics for potential study are such choices, the ease of the sampling methods, the required sample sizes or efficiency, and the formulations of the MVUEs.

Figure 1. $v(Y)$ as the result of an orthogonal projection in the Hilbert space $L^2$.
Acknowledgements
The authors want to thank an anonymous referee for many insightful comments.