Uniformly Minimum-Variance Unbiased Estimator (UMVUE) for the Gamma Cumulative Distribution Function with Known and Integer Shape Parameter

Abstract

This paper applies the Rao-Blackwell and Lehmann-Scheffé Theorems to deduce the uniformly minimum-variance unbiased estimator (UMVUE) for the gamma cumulative distribution function with known and integer shape parameter. The paper closes with an example comparing the empirical distribution function with the UMVUE estimates.


1. Introduction

Statistical inference is an important topic in scientific studies, from both theoretical and applied standpoints. Efficient estimators of the probability density function (PDF) or of the cumulative distribution function (CDF) are useful in various applications, such as the estimation of Fisher information or the estimation of quantiles. Other applications are mentioned in [1].

Several studies on CDF estimation for continuous distributions have appeared in the recent literature, for instance for the Pareto-Rayleigh distribution [2], the exponentiated Burr XII distribution [1], and general distributions [3].

The purpose of this paper is to present the uniformly minimum-variance unbiased estimator (UMVUE) for the gamma cumulative distribution function with known and integer shape parameter.

The gamma distribution is a member of a two-parameter family of continuous probability distributions. Two different parameterizations are in common use; in this article we work with the shape ($k$) and rate ($\lambda$) parameters, so $X \sim \mathrm{Gamma}(k, \lambda)$ and the probability density function of $X$ is defined by

$$f_X(x) = \frac{\lambda^k}{\Gamma(k)}\, x^{k-1} e^{-\lambda x}, \quad x > 0.$$

If $k$ is a positive integer, $\Gamma(k) = (k-1)!$; in this case the distribution is called an Erlang distribution and arises as the sum of $k$ independent exponential random variables, each with mean $1/\lambda$.
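As a quick numerical illustration of this fact (our addition, not part of the paper), one can check in R that a Gamma($k$, $\lambda$) sample and sums of $k$ independent exponentials agree in distribution:

```r
# Illustration (our addition): a Gamma(k, lambda) draw has the same distribution
# as the sum of k independent Exponential(lambda) draws.
set.seed(1)
k <- 3; lambda <- 2; n <- 1e5
gamma_draws  <- rgamma(n, shape = k, rate = lambda)
erlang_draws <- colSums(matrix(rexp(k * n, rate = lambda), nrow = k))
ks.test(gamma_draws, erlang_draws)  # two-sample KS test; large p-value expected
```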

Let $X_1, X_2, \dots, X_n$ be a random sample of size $n$ from the population $X$. The most common way to estimate the cumulative distribution function of $X$, denoted $F$, is through the empirical distribution function, defined by

$$\hat{F}_e(q) = \frac{\text{number of elements in the sample} \le q}{n} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}_{\{X_i \le q\}},$$

where $\mathbb{1}_A$ denotes the indicator of the event $A$. This paper presents another estimator of $F$ for the case where $k$ is a known integer: the uniformly minimum-variance unbiased estimator (UMVUE).
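For concreteness (our addition), the empirical distribution function is straightforward to evaluate in R, the language used in Section 4; the helper `F_hat_e` below simply mirrors the formula above:

```r
# Empirical distribution function of a sample xs, evaluated at a point q.
F_hat_e <- function(xs, q) mean(xs <= q)

set.seed(1)
x <- rgamma(5, shape = 3, rate = 1)
F_hat_e(x, 2)   # proportion of sample values <= 2
ecdf(x)(2)      # same value via R's built-in ecdf()
```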

The paper is organized as follows. Section 2 presents two lemmas needed for the main result. Section 3 states and proves the main result of this paper. Finally, Section 4 presents a simple example comparing the empirical distribution function with the UMVUE of $F(q)$.

2. Preliminaries

The following lemmas will be used for establishing the main results.

Lemma 1. Let $q, t \in \mathbb{R}$, $q < t$, $a \in \mathbb{Z}_+$ and $b \in \mathbb{Z}_+$. Then,

$$\int_q^t (t-x)^a x^b \, dx = a!\, b! \sum_{j=1}^{b+1} \frac{(t-q)^{a+j}\, q^{\,b-j+1}}{(a+j)!\,(b-j+1)!}.$$

Proof. The proof goes by induction on $b$. Write $h(q,t,a,b) = \int_q^t (t-x)^a x^b \, dx$.

Base case: check the result for $b = 1$. That is,

$$\begin{aligned}
h(q,t,a,1) = \int_q^t (t-x)^a x \, dx &= a! \sum_{j=1}^{2} \frac{(t-q)^{a+j}\, q^{2-j}}{(a+j)!\,(2-j)!} = \frac{(t-q)^{a+1}\, q}{a+1} + \frac{(t-q)^{a+2}}{(a+2)(a+1)} \\
&= (t-q)^{a+1}\,\frac{q(a+2) + t - q}{(a+1)(a+2)} = (t-q)^{a+1}\,\frac{q(a+1) + t}{(a+1)(a+2)},
\end{aligned}$$

for all $q, t \in \mathbb{R}$ and $a \in \mathbb{Z}_+$.

To verify this, perform an integration by substitution with $y = t - x$:

$$\begin{aligned}
h(q,t,a,1) = \int_q^t (t-x)^a x \, dx &= \int_0^{t-q} y^a (t-y)\, dy = \int_0^{t-q} \left( t\,y^a - y^{a+1} \right) dy \\
&= \left[ \frac{t\,y^{a+1}}{a+1} - \frac{y^{a+2}}{a+2} \right]_0^{t-q} = \frac{t\,(t-q)^{a+1}}{a+1} - \frac{(t-q)^{a+2}}{a+2} \\
&= (t-q)^{a+1} \left( \frac{t}{a+1} - \frac{t-q}{a+2} \right) = (t-q)^{a+1}\, \frac{t(a+2) - (t-q)(a+1)}{(a+1)(a+2)} \\
&= (t-q)^{a+1}\, \frac{t + q(a+1)}{(a+1)(a+2)}.
\end{aligned}$$

Inductive step: show that the result is true for b + 1 if it is true for b. That is, assuming that

$$h(q,t,a,b) = \int_q^t (t-x)^a x^b \, dx = a!\, b! \sum_{j=1}^{b+1} \frac{(t-q)^{a+j}\, q^{\,b-j+1}}{(a+j)!\,(b-j+1)!} \qquad \text{(induction hypothesis)}$$

we will conclude that

$$h(q,t,a,b+1) = \int_q^t (t-x)^a x^{b+1} \, dx = a!\,(b+1)! \sum_{j=1}^{b+2} \frac{(t-q)^{a+j}\, q^{\,b-j+2}}{(a+j)!\,(b-j+2)!}.$$

To evaluate $\int_q^t (t-x)^a x^{b+1}\,dx$, integrate by parts. Take $u = x^{b+1}$ and $dv = (t-x)^a\,dx$, so that $du = (b+1)\,x^b\,dx$ and $v = -\frac{(t-x)^{a+1}}{a+1}$.

$$\begin{aligned}
h(q,t,a,b+1) = \int_q^t (t-x)^a x^{b+1}\,dx &= \left[ -\frac{x^{b+1}\,(t-x)^{a+1}}{a+1} \right]_q^t + \int_q^t \frac{(t-x)^{a+1}}{a+1}\,(b+1)\,x^b\,dx \\
&= \frac{q^{b+1}\,(t-q)^{a+1}}{a+1} + \frac{b+1}{a+1} \int_q^t (t-x)^{a+1}\, x^b\,dx \\
&= \frac{q^{b+1}\,(t-q)^{a+1}}{a+1} + \frac{b+1}{a+1}\,(a+1)!\,b! \sum_{j=1}^{b+1} \frac{(t-q)^{a+1+j}\, q^{\,b-j+1}}{(a+1+j)!\,(b-j+1)!} \quad \text{(by the induction hypothesis)}
\end{aligned}$$

Substituting $l = j + 1$ in the summation above,

$$\begin{aligned}
&= \frac{q^{b+1}\,(t-q)^{a+1}}{a+1} + a!\,(b+1)! \sum_{l=2}^{b+2} \frac{(t-q)^{a+l}\, q^{\,b-l+2}}{(a+l)!\,(b-l+2)!} \\
&= a!\,(b+1)!\,\frac{(t-q)^{a+1}\, q^{b+1}}{(a+1)!\,(b+1)!} + a!\,(b+1)! \sum_{l=2}^{b+2} \frac{(t-q)^{a+l}\, q^{\,b-l+2}}{(a+l)!\,(b-l+2)!} \\
&= a!\,(b+1)! \left( \frac{(t-q)^{a+1}\, q^{b+1}}{(a+1)!\,(b+1)!} + \sum_{l=2}^{b+2} \frac{(t-q)^{a+l}\, q^{\,b-l+2}}{(a+l)!\,(b-l+2)!} \right) \\
&= a!\,(b+1)! \sum_{l=1}^{b+2} \frac{(t-q)^{a+l}\, q^{\,b-l+2}}{(a+l)!\,(b-l+2)!},
\end{aligned}$$

which closes the proof by induction.
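As an optional sanity check of Lemma 1 (ours, not part of the paper), the closed form can be compared against numerical quadrature in R for arbitrary admissible arguments:

```r
# Numerical sanity check of Lemma 1: closed form vs. numerical quadrature.
h_closed <- function(q, t, a, b) {
  j <- 1:(b + 1)
  factorial(a) * factorial(b) *
    sum((t - q)^(a + j) * q^(b - j + 1) /
          (factorial(a + j) * factorial(b - j + 1)))
}
q <- 0.7; t <- 2.3; a <- 4; b <- 3
h_closed(q, t, a, b)
integrate(function(x) (t - x)^a * x^b, lower = q, upper = t)$value  # should match
```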

Lemma 2. Let $X = \{X_1, \dots, X_n\}$ be a random sample of size $n$ from the population $X \sim \mathrm{Gamma}(k, \lambda)$. Let $T = \sum_{i=1}^n X_i$. Then,

$$f_{T \mid X_1}(t \mid x_1) = \frac{\lambda^{(n-1)k}}{\Gamma((n-1)k)}\,(t - x_1)^{(n-1)k - 1}\, e^{-\lambda (t - x_1)}, \quad t > x_1.$$

Proof. First, notice that

$$f_{T \mid X_1}(t \mid x_1) = f_{\sum_{i=1}^n X_i \mid X_1}(t \mid x_1) = f_{\sum_{i=2}^n X_i}(t - x_1),$$

since $X_1, \dots, X_n$ are independent.

Set $W = \sum_{i=2}^n X_i$. Then $W \sim \mathrm{Gamma}((n-1)k, \lambda)$ and

$$f_W(w) = \frac{\lambda^{(n-1)k}}{\Gamma((n-1)k)}\, w^{(n-1)k - 1}\, e^{-\lambda w}, \quad w > 0.$$

Then,

$$f_{T \mid X_1}(t \mid x_1) = f_{W}(t - x_1) = \frac{\lambda^{(n-1)k}}{\Gamma((n-1)k)}\,(t - x_1)^{(n-1)k - 1}\, e^{-\lambda (t - x_1)}, \quad t - x_1 > 0.$$

3. Main Result

Theorem 1. Let $X = \{X_1, X_2, \dots, X_n\}$ be a random sample of size $n$ from the population $X \sim \mathrm{Gamma}(k, \lambda)$, where $k$ is a known parameter. For any $q \in \mathbb{R}_+$, let $F_X(q) = P(X \le q) = \theta$ be an unknown parameter. The uniformly minimum-variance unbiased estimator (UMVUE) of $\theta$ is given by

$$\hat{\theta} = 1 - \left( \frac{\sum_{i=1}^n X_i - q}{\sum_{i=1}^n X_i} \right)^{nk - 1} \sum_{j=1}^{k} \binom{nk - 1}{k - j} \left( \frac{q}{\sum_{i=1}^n X_i - q} \right)^{k - j}, \quad \text{if } q < \sum_{i=1}^n X_i.$$

If $q \ge \sum_{i=1}^n X_i$, then $\hat{\theta} = 1$.
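Before turning to the proof, here is a minimal R sketch of this estimator (the function name `umvue_gamma_cdf` and the vectorization over $q$ are our choices, not code from the paper). Note from the formula that the estimator depends on the data only through $\sum_{i=1}^n X_i$ and does not involve $\lambda$:

```r
# Sketch (our implementation) of the UMVUE of P(X <= q) for
# X ~ Gamma(k, lambda) with known integer shape k; x is the sample.
umvue_gamma_cdf <- function(q, x, k) {
  n <- length(x)
  t <- sum(x)
  sapply(q, function(qi) {
    if (qi >= t) return(1)            # theta-hat = 1 when q >= sum of the sample
    j <- 1:k
    s <- sum(choose(n * k - 1, k - j) * (qi / (t - qi))^(k - j))
    1 - ((t - qi) / t)^(n * k - 1) * s
  })
}
```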

Proof. The Rao-Blackwell Theorem [4] [5] states that if $\tilde{\theta}$ is an unbiased estimator of $\theta$ and $T(X)$ is a sufficient statistic for $\theta$, then $\hat{\theta} = E(\tilde{\theta} \mid T(X))$ is an unbiased estimator of $\theta$ based on $T(X)$ and $\mathrm{Var}(\hat{\theta}) \le \mathrm{Var}(\tilde{\theta})$. The Lehmann-Scheffé Theorem [6] [7] states that if $\tilde{\theta}$ is an unbiased estimator of $\theta$ and $T(X)$ is a complete sufficient statistic for $\theta$, then $\hat{\theta} = E(\tilde{\theta} \mid T(X))$ is the (unique) uniformly minimum-variance unbiased estimator (UMVUE) of $\theta$. Let us find an unbiased estimator and a complete sufficient statistic for $\theta$ in order to deduce its UMVUE.

First note that $\tilde{\theta}$, defined below, is an unbiased estimator of $\theta$, since $E(\tilde{\theta}) = P(X_1 \le q) = \theta$:

$$\tilde{\theta} = \begin{cases} 1, & X_1 \le q \\ 0, & X_1 > q. \end{cases}$$

Also note that $X \sim \mathrm{Gamma}(k, \lambda)$, with $k$ a known parameter, belongs to a one-parameter exponential family. It is known that for an exponential family a complete sufficient statistic can be found directly [8]. In the case of $X$, $T(X) = \sum_{i=1}^n X_i$ is a complete sufficient statistic. Then,

$$\hat{\theta} = E\left( \tilde{\theta} \,\Big|\, \sum_{i=1}^n X_i \right)$$

is the UMVUE of $\theta$. Let us show that $\hat{\theta}$ has the expression given in the statement.

$$\begin{aligned}
\hat{\theta} = E\left( \tilde{\theta} \,\Big|\, \sum_{i=1}^n X_i \right) &= P\left( X_1 \le q \,\Big|\, \sum_{i=1}^n X_i \right) = 1 - P\left( X_1 > q \,\Big|\, \sum_{i=1}^n X_i \right) \\
&= 1 - P\left( X_1 > q \,\Big|\, \sum_{i=1}^n X_i = t \right)\bigg|_{t = \sum_{i=1}^n X_i} \\
&= \begin{cases} 1, & t \le q \\ 1 - \displaystyle\int_q^{\infty} f_{X_1 \mid \sum_{i=1}^n X_i}(x_1 \mid t)\, dx_1, & q < t \end{cases}\;\Bigg|_{t = \sum_{i=1}^n X_i}
\end{aligned}$$

Before proceeding with the computation, we derive the expression of $f_{X_1 \mid \sum_{i=1}^n X_i}$. For this, we use Lemma 2.

$$\begin{aligned}
f_{X_1 \mid \sum_{i=1}^n X_i}(x_1 \mid t) &= \frac{f_{\sum_{i=1}^n X_i \mid X_1}(t \mid x_1)\; f_{X_1}(x_1)}{f_{\sum_{i=1}^n X_i}(t)} = \frac{\dfrac{\lambda^{(n-1)k}\,(t-x_1)^{(n-1)k-1}\, e^{-\lambda(t-x_1)}}{\Gamma((n-1)k)} \cdot \dfrac{\lambda^{k}\, x_1^{k-1}\, e^{-\lambda x_1}}{\Gamma(k)}}{\dfrac{\lambda^{nk}\, t^{nk-1}\, e^{-\lambda t}}{\Gamma(nk)}} \\
&= \frac{\Gamma(nk)}{\Gamma(nk-k)\,\Gamma(k)}\,\frac{(t-x_1)^{nk-k-1}\, x_1^{k-1}}{t^{nk-1}}, \quad 0 < x_1 < t,\ t > 0.
\end{aligned}$$

Assuming q < t , and using Lemma 1,

$$\begin{aligned}
\hat{\theta} &= 1 - \int_q^{\infty} f_{X_1 \mid \sum_{i=1}^n X_i}(x_1 \mid t)\, dx_1 \bigg|_{t = \sum_{i=1}^n X_i} = 1 - \int_q^{t} \frac{\Gamma(nk)}{\Gamma(nk-k)\,\Gamma(k)}\,\frac{(t-x_1)^{nk-k-1}\, x_1^{k-1}}{t^{nk-1}}\, dx_1 \bigg|_{t = \sum_{i=1}^n X_i} \\
&= 1 - \frac{\Gamma(nk)}{\Gamma(nk-k)\,\Gamma(k)\, t^{nk-1}} \int_q^t (t-x_1)^{nk-k-1}\, x_1^{k-1}\, dx_1 \bigg|_{t = \sum_{i=1}^n X_i} \\
&= 1 - \frac{\Gamma(nk)}{\Gamma(nk-k)\,\Gamma(k)\, t^{nk-1}}\,(nk-k-1)!\,(k-1)! \sum_{j=1}^{(k-1)+1} \frac{(t-q)^{(nk-k-1)+j}\; q^{(k-1)-j+1}}{\big((nk-k-1)+j\big)!\;\big((k-1)-j+1\big)!} \Bigg|_{t = \sum_{i=1}^n X_i}
\end{aligned}$$

$$\begin{aligned}
&= 1 - \frac{\Gamma(nk)}{t^{nk-1}} \sum_{j=1}^{k} \frac{(t-q)^{nk-k-1+j}\; q^{k-j}}{\Gamma(nk-k+j)\,\Gamma(k-j+1)} \Bigg|_{t = \sum_{i=1}^n X_i} \\
&= 1 - \left( \frac{t-q}{t} \right)^{nk-1} \sum_{j=1}^{k} \frac{\Gamma(nk)}{\Gamma(nk-k+j)\,\Gamma(k-j+1)} \left( \frac{q}{t-q} \right)^{k-j} \Bigg|_{t = \sum_{i=1}^n X_i} \\
&= 1 - \left( \frac{\sum_{i=1}^n X_i - q}{\sum_{i=1}^n X_i} \right)^{nk-1} \sum_{j=1}^{k} \frac{(nk-1)!}{(nk-k+j-1)!\,(k-j)!} \left( \frac{q}{\sum_{i=1}^n X_i - q} \right)^{k-j} \\
&= 1 - \left( \frac{\sum_{i=1}^n X_i - q}{\sum_{i=1}^n X_i} \right)^{nk-1} \sum_{j=1}^{k} \binom{nk-1}{k-j} \left( \frac{q}{\sum_{i=1}^n X_i - q} \right)^{k-j},
\end{aligned}$$

which is the expression in the statement.
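As a numerical complement (our addition, reusing the `umvue_gamma_cdf` sketch given after Theorem 1), a quick Monte Carlo experiment in R illustrates the unbiasedness of $\hat{\theta}$:

```r
# Monte Carlo check: averaging the UMVUE over many samples should
# recover the true value theta = pgamma(q, k, rate = lambda).
set.seed(123)
k <- 3; lambda <- 1; n <- 5; q <- 2; reps <- 1e4
est <- replicate(reps, umvue_gamma_cdf(q, rgamma(n, shape = k, rate = lambda), k))
mean(est)                            # Monte Carlo average of the UMVUE
pgamma(q, shape = k, rate = lambda)  # true theta; the two should be close
```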

4. A Very Simple Example

Consider the population $X \sim \mathrm{Gamma}(3, \lambda)$ and the following random sample of size 5:

2.049282 2.429458 1.288630 3.967066 3.527220

which was generated in R [9] by the command

set.seed(1); rgamma(5, 3, 1)

For this sample, $\sum_{i=1}^5 x_i = 13.26166$, and the uniformly minimum-variance unbiased estimator of the gamma cumulative distribution function with shape parameter $k = 3$ is

$$\begin{aligned}
\hat{\theta} = \hat{F}_X(q) &= 1 - \left( \frac{\sum_{i=1}^5 X_i - q}{\sum_{i=1}^5 X_i} \right)^{14} \sum_{j=1}^{3} \binom{5 \times 3 - 1}{3-j} \left( \frac{q}{\sum_{i=1}^5 X_i - q} \right)^{3-j} \\
&= 1 - \left( \frac{13.26166 - q}{13.26166} \right)^{14} \left( 91 \left( \frac{q}{13.26166 - q} \right)^{2} + 14 \left( \frac{q}{13.26166 - q} \right) + 1 \right) \\
&= 1 - \frac{91\, q^{2}\,(13.26166 - q)^{12} + 14\, q\,(13.26166 - q)^{13} + (13.26166 - q)^{14}}{13.26166^{14}},
\end{aligned}$$

for $0 < q < 13.26166$, and $\hat{\theta} = 1$ for $q \ge 13.26166$.

Figure 1. Gamma cumulative distribution function—approximate and exact curves.

Figure 1 compares the curve of the uniformly minimum-variance unbiased estimator $\hat{F}$ (UMVUE) of the gamma cumulative distribution function with the empirical cumulative function (ECF), both computed from the random sample presented above. The black dotted line represents the exact curve (Real), computed with $\lambda = 1$, the parameter value used to generate the sample.
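For readers who want to reproduce Figure 1, a short R script along the following lines (the plotting choices are ours, not taken from the paper) regenerates the three curves:

```r
# Reproduce the example of Section 4 with the umvue_gamma_cdf sketch above.
set.seed(1)
x <- rgamma(5, shape = 3, rate = 1)
q <- seq(0.01, 10, by = 0.01)
plot(q, umvue_gamma_cdf(q, x, k = 3), type = "l",
     ylab = "F(q)", main = "Gamma CDF: UMVUE, ECF and exact curves")
lines(q, ecdf(x)(q), lty = 2)                       # empirical cumulative function
lines(q, pgamma(q, shape = 3, rate = 1), lty = 3)   # exact curve, lambda = 1
legend("bottomright", legend = c("UMVUE", "ECF", "Real"), lty = 1:3)
```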

5. Conclusions

In the previous sections, we discussed the construction of the UMVUE for the gamma cumulative distribution function with known and integer shape parameter. This is a point estimator of $P(X \le q)$, for $q \in \mathbb{R}_+$, where $X \sim \mathrm{Gamma}(k, \lambda)$ and $k$ is a known parameter.

The advantage of the UMVUE is that, besides being the uniformly minimum-variance unbiased estimator, it is continuous with respect to $q$. Moreover, the results are already satisfactory for small samples, as can be seen in Section 4. The disadvantage of the present approach is that for large values of $k$ the estimator has a lengthy expression, and if $n \times k$ is too large it may not be simple to compute $\binom{nk-1}{k-j}$.
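One practical workaround (our suggestion, not discussed in the paper) is to evaluate each summand of the estimator on the log scale, for instance with R's `lchoose`:

```r
# Numerically stabler evaluation of one summand of the UMVUE on the log scale,
# useful when choose(n*k - 1, k - j) would overflow for large n * k.
umvue_summand <- function(q, t, n, k, j) {
  exp(lchoose(n * k - 1, k - j) + (k - j) * (log(q) - log(t - q)))
}
umvue_summand(q = 2, t = 75, n = 25, k = 3, j = 1)  # equals choose(74, 2) * (2/73)^2
```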

Conflicts of Interest

The author declares no conflicts of interest.

References

[1] Hassan, A.S., Assar, S.M., Ali, K.A. and Nagy, H.F. (2021) Estimation of the Density and Cumulative Distribution Functions of the Exponentiated Burr XII Distribution. Statistics in Transition New Series, 22, 171-189.
https://doi.org/10.21307/stattrans-2021-044
[2] Jebeli, M. and Deiri, E. (2020) Estimation Methods for the Probability Density Function and the Cumulative Distribution Function of the Pareto-Rayleigh Distribution. Statistics, 54, 135-151.
https://doi.org/10.1080/02331888.2019.1689979
[3] Zamanzade, E., Mahdizadeh, M. and Samawi, H.M. (2020) Efficient Estimation of Cumulative Distribution Function Using Moving Extreme Ranked Set Sampling with Application to Reliability. AStA Advances in Statistical Analysis, 104, 485-502.
https://doi.org/10.1007/s10182-020-00368-3
[4] Rao, C. (1945) Information and Accuracy Attainable in Estimation of Statistical Parameters. Bulletin of the Calcutta Mathematical Society, 37, 81-91.
[5] Blackwell, D. (1947) Conditional Expectation and Unbiased Sequential Estimation. The Annals of Mathematical Statistics, 18, 105-110.
https://doi.org/10.1214/aoms/1177730497
[6] Lehmann, E.L. and Scheffé, H. (1950) Completeness, Similar Regions, and Unbiased Estimation: Part I. Sankhyā: The Indian Journal of Statistics (1933-1960), 10, 305-340.
[7] Lehmann, E.L. and Scheffé, H. (1955) Completeness, Similar Regions, and Unbiased Estimation: Part II. Sankhyā: The Indian Journal of Statistics (1933-1960), 15, 219-236.
[8] Casella, G. and Berger, R.L. (2021) Statistical Inference. Cengage Learning.
[9] R Core Team (2020) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna.
