Asymptotic Comparison of Method of Moments Estimators and Maximum Likelihood Estimators of Parameters in Zero-Inflated Poisson Model

This paper discusses the estimation of parameters in the zero-inflated Poisson (ZIP) model by the method of moments. The method of moments estimators (MMEs) are analytically compared with the maximum likelihood estimators (MLEs). The results of a modest simulation study are presented.


Introduction
Zero-inflated models have found applications in situations where an excess number of zero observations is generated. The application of the zero-inflated Poisson model by Lambert [1] in a count regression model is well known. Recently, Couturier et al. [2] have used the zero-inflated truncated generalized Pareto distribution for the analysis of radio audience data.
The ZIP model is introduced in this section in the context of a practical situation. Maximum likelihood estimation of the parameters involved in the model is discussed in Section 2. The MMEs of the parameters are obtained in Section 3. The ZIP model is shown to be a member of the two-parameter exponential family and hence the asymptotic normality of the MMEs is established. Further, in Section 4, the details of computing the Fisher information matrix corresponding to this model are shown. In Section 5, the MMEs and the MLEs are asymptotically compared. Also, empirical evidence for the asymptotic result is given through a modest simulation study in Section 6.
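A random variable X is said to have a zero-inflated Poisson distribution if its probability mass function (p.m.f.) is given by
$$p(x; \theta, \phi) = \begin{cases} \phi + (1-\phi)e^{-\theta}, & x = 0, \\ (1-\phi)\dfrac{e^{-\theta}\theta^{x}}{x!}, & x = 1, 2, \ldots, \end{cases} \tag{1.1}$$
where 0 < φ < 1 and θ > 0. Thus the distribution of X is a convex combination of a distribution degenerate at zero and a Poisson distribution with mean θ. This is known as the zero-inflated Poisson model.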
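As a minimal sketch of the model just described, the following Python functions evaluate the ZIP p.m.f. and draw a sample from it. The names `zip_pmf` and `zip_sample` are illustrative and do not come from the paper. The sketch assumes that the weight φ is attached to the point mass at zero, as in the mixture 0.3p₀(x) + 0.7p₁(x, 2.5) used for the figures.

```python
import math
import random


def zip_pmf(x, theta, phi):
    """P(X = x) for the ZIP model: weight phi on a point mass at zero,
    weight 1 - phi on a Poisson distribution with mean theta."""
    poisson = math.exp(-theta) * theta ** x / math.factorial(x)
    return phi * (x == 0) + (1.0 - phi) * poisson


def zip_sample(n, theta, phi, rng):
    """Draw n ZIP variates using the random.Random instance rng."""
    sample = []
    for _ in range(n):
        if rng.random() < phi:
            sample.append(0)  # structural zero
            continue
        # Poisson draw by inverting the c.d.f. (adequate for moderate theta)
        u = rng.random()
        k = 0
        p = math.exp(-theta)
        cdf = p
        while u > cdf and p > 0.0:
            k += 1
            p *= theta / k
            cdf += p
        sample.append(k)
    return sample
```

For instance, `zip_sample(25, 2.5, 0.3, random.Random(0))` draws one sample of size 25 from the configuration used in the simulation study.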

Maximum Likelihood Estimation
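Let $\mathbf{X} = (X_1, X_2, X_3, \ldots, X_n)$ be a random sample on X with the p.m.f. specified in (1.1). Then the likelihood function is given by
$$L(\theta, \phi; \mathbf{x}) = \left[\phi + (1-\phi)e^{-\theta}\right]^{n_0} \prod_{j:\, x_j > 0} (1-\phi)\frac{e^{-\theta}\theta^{x_j}}{x_j!}, \tag{2.1}$$
where $n_0$ is the number of zero observations in the sample. It is obvious that the above likelihood function does not yield closed form expressions for the MLEs of θ and φ.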
Yip [3] has shown that the conditional MLE of θ, treating φ as a nuisance parameter, is the solution of
$$\bar{x}_{+}\left(1 - e^{-\theta}\right) = \theta, \tag{2.2}$$
where $\bar{x}_{+}$ is the mean of the positive observations. The conditional MLE of θ also has no closed form expression and it has to be computed using a numerical procedure. Of course, solving (2.2) is much easier than computing the MLE of θ by maximizing (2.1). Yip [3] has also observed that finding the values of θ and φ that maximize the likelihood function (2.1) is difficult because of its flat surface and the boundary problem. Also, the global maximum is not necessarily located for every observed sample (see [3]). Kale [4] has obtained the optimal estimating equation for θ, treating φ as a nuisance parameter, when the non-degenerate component is the p.m.f. of a general power series distribution. When this component is the p.m.f. of a Poisson distribution with mean θ, the optimal estimating equation obtained by Kale [4] for θ reduces to (2.2).
Estimating φ may be of significant interest, and then it cannot be treated as a nuisance parameter.
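The numerical procedure needed for the conditional MLE can be sketched as follows. The sketch assumes that the estimating equation is the zero-truncated Poisson likelihood equation, mean of the positive counts = θ / (1 − e^(−θ)). The bisection solver and the name `conditional_mle_theta` are illustrative choices, not the procedure used in [3].

```python
import math


def conditional_mle_theta(xs, iterations=200):
    """Solve mean_pos = theta / (1 - exp(-theta)) for theta by bisection,
    where mean_pos is the mean of the positive observations in xs."""
    positives = [x for x in xs if x > 0]
    if not positives:
        raise ValueError("need at least one positive observation")
    mean_pos = sum(positives) / len(positives)
    if mean_pos <= 1.0:
        raise ValueError("no interior solution: every positive count equals 1")

    def g(theta):
        # theta / (1 - exp(-theta)) - mean_pos, which is increasing in theta
        return theta / (-math.expm1(-theta)) - mean_pos

    lo, hi = 1e-12, mean_pos  # g(lo) < 0 < g(hi), so the root lies in between
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Only the positive observations enter the computation, which is why φ drops out of the conditional approach.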

EM Algorithm
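When the likelihood function has a complicated structure and maximizing it by numerical methods is difficult, a simple alternative procedure is the EM algorithm developed by Dempster et al. [5]. Nanjundan [6] has computed the MLEs of θ and φ using the EM algorithm. He has obtained the E- and M-steps by rewriting the likelihood function so as to accommodate missing data. Let
$$Z_j = \begin{cases} 1, & \text{if the } j\text{-th sampled leaf is suitable}, \\ 0, & \text{otherwise}. \end{cases}$$
Then we have $P(Z_j = 1) = 1 - \phi$ and $P(Z_j = 0) = \phi$. If $X_j > 0$, then $Z_j = 1$, and if $X_j = 0$, then $Z_j = 0$ or 1. In other words, when $X_j = 0$, we have no information on $Z_j$.
In the E-step, the expectation of the likelihood function of the complete data is taken and $E(Z_j)$ is replaced by the conditional expectation $E(Z_j \mid X_j = 0, \theta_0, \phi_0)$, where $\theta_0$ and $\phi_0$ are the current estimates of θ and φ. The computational details of these steps can be summarized as follows.
1) Choose the initial estimates $\theta_0$ and $\phi_0$ of θ and φ.
2) E-step: replace each unobserved $Z_j$ by its conditional expectation given the observed sample and the current estimates.
3) M-step: compute the improved estimates $\theta_1$ and $\phi_1$ by maximizing the expected complete-data likelihood with respect to θ and φ.
4) Repeat steps 2) and 3) until the difference between the successive $L(\theta, \phi; \mathbf{x})$ [or $\log L(\theta, \phi; \mathbf{x})$] values is less than a desired threshold value.
The corresponding values of $\theta_1$ and $\phi_1$ are the MLEs of θ and φ respectively (see [5]). Unlike Fisher's method of scoring, the EM algorithm does not yield estimates of the standard errors of the MLEs as a by-product.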
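The E- and M-steps can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not the code of [6]: the latent indicator is taken to be 1 with probability 1 − φ, the update formulas are those obtained by maximizing the expected complete-data log-likelihood, and the starting values are supplied by the caller because the starting values used in the paper are not recoverable here. The names `zip_log_likelihood` and `zip_em` are hypothetical.

```python
import math


def zip_log_likelihood(xs, theta, phi):
    """Observed-data log-likelihood of the ZIP model."""
    n0 = sum(1 for x in xs if x == 0)
    ll = n0 * math.log(phi + (1.0 - phi) * math.exp(-theta))
    for x in xs:
        if x > 0:
            ll += (math.log(1.0 - phi) - theta + x * math.log(theta)
                   - math.lgamma(x + 1))
    return ll


def zip_em(xs, theta0, phi0, tol=1e-10, max_iter=10000):
    """EM iterations for (theta, phi).

    Assumes xs contains at least one positive count, theta0 > 0 and
    0 < phi0 < 1.
    """
    n = len(xs)
    n0 = sum(1 for x in xs if x == 0)
    total = sum(xs)
    theta, phi = theta0, phi0
    ll_old = zip_log_likelihood(xs, theta, phi)
    for _ in range(max_iter):
        # E-step: conditional expectation of Z_j for an observed zero
        pois0 = (1.0 - phi) * math.exp(-theta)
        z_zero = pois0 / (phi + pois0)
        expected_poisson = (n - n0) + n0 * z_zero
        # M-step: closed-form maximizers of the expected complete-data
        # log-likelihood
        phi = 1.0 - expected_poisson / n
        theta = total / expected_poisson
        # Stop when the log-likelihood changes by less than tol
        ll_new = zip_log_likelihood(xs, theta, phi)
        if abs(ll_new - ll_old) < tol:
            break
        ll_old = ll_new
    return theta, phi
```

At an interior maximum the fitted probability of a zero, φ + (1 − φ)e^(−θ), equals the observed proportion of zeros, which gives a convenient check on convergence.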
Nanjundan [7] has further compared the MLE and the conditional MLE of θ.

Method of Moments Estimators
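The first and the second theoretical moments of X having the p.m.f. (1.1) are
$$E(X) = (1-\phi)\theta \quad \text{and} \quad E(X^2) = (1-\phi)\theta(1+\theta).$$
If $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ is a random sample on X with the p.m.f. specified in (1.1), the MMEs of θ and φ are given by the following simultaneous equations:
$$m_1 = (1-\phi)\theta, \qquad m_2 = (1-\phi)\theta(1+\theta),$$
where
$$m_1 = \frac{1}{n}\sum_{j=1}^{n} X_j \quad \text{and} \quad m_2 = \frac{1}{n}\sum_{j=1}^{n} X_j^2.$$
The MMEs of θ and φ are respectively
$$\hat\theta_m = \frac{m_2 - m_1}{m_1} \quad \text{and} \quad \hat\phi_m = 1 - \frac{m_1^2}{m_2 - m_1}.$$
It is easy to see that $m_1 \to (1-\phi)\theta > 0$ and $m_2 - m_1 \to (1-\phi)\theta^2 > 0$ almost surely as $n \to \infty$. Hence the problem of division by zero in these MMEs doesn't arise when n is sufficiently large.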
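The closed-form moment estimators can be computed directly from the first two sample moments. The sketch below assumes the moment equations m₁ = (1 − φ)θ and m₂ = (1 − φ)θ(1 + θ); the function name `zip_mme` is illustrative.

```python
def zip_mme(xs):
    """Method of moments estimates (theta_hat, phi_hat) for the ZIP model."""
    n = len(xs)
    m1 = sum(xs) / n                 # first sample moment
    m2 = sum(x * x for x in xs) / n  # second sample moment
    if m1 <= 0 or m2 <= m1:
        # Occurs only when no observation exceeds 1
        raise ValueError("moment estimates undefined for this sample")
    theta_hat = (m2 - m1) / m1
    phi_hat = 1.0 - m1 * m1 / (m2 - m1)
    return theta_hat, phi_hat
```

In small samples `phi_hat` can fall below zero, since nothing in the moment equations constrains it to the interval (0, 1).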
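Note that the probability mass function (1.1) can be written as
$$p(x; \theta, \phi) = \left[\phi + (1-\phi)e^{-\theta}\right]^{1 - I(x)} \left[(1-\phi)\frac{e^{-\theta}\theta^{x}}{x!}\right]^{I(x)}, \tag{3.3}$$
where $I(x) = 1$ if $x \geq 1$ and $I(x) = 0$ if $x = 0$. Taking logarithms on both sides of (3.3), we get
$$\log p(x; \theta, \phi) = [1 - I(x)]\log\left[\phi + (1-\phi)e^{-\theta}\right] + I(x)\left[\log(1-\phi) - \theta + x\log\theta - \log x!\right].$$
After simple rearrangement of terms, using $x\,I(x) = x$ and $I(x)\log x! = \log x!$, the above expression can be written as
$$\log p(x; \theta, \phi) = I(x)\log\frac{(1-\phi)e^{-\theta}}{\phi + (1-\phi)e^{-\theta}} + x\log\theta + \log\left[\phi + (1-\phi)e^{-\theta}\right] - \log x!,$$
which is the general form of the two-parameter exponential family. Hence the zero-inflated Poisson model belongs to the two-parameter exponential family and thus $\left(\sum_{j=1}^{n} I(X_j), \sum_{j=1}^{n} X_j\right)$ is minimal sufficient and complete for (θ, φ). Since the ZIP model belongs to the two-parameter exponential family and the MMEs are based on these minimal sufficient statistics for the parameters,
$$\sqrt{n}\begin{pmatrix} \hat\theta_m - \theta \\ \hat\phi_m - \phi \end{pmatrix} \xrightarrow{d} N_2\left(\mathbf{0}, I^{-1}(\theta, \phi)\right),$$
where $I(\theta, \phi)$ is the Fisher information matrix, which is obtained in the next section.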
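The exponential family representation can be checked numerically. The sketch below assumes the rearranged form log p = I(x)·η₁ + x·η₂ + log P(X = 0) − log x!, with η₁ = log{(1 − φ)e^(−θ) / P(X = 0)} and η₂ = log θ, and compares it with the log of the p.m.f. evaluated directly. The function names are illustrative.

```python
import math


def log_pmf_direct(x, theta, phi):
    """log p(x; theta, phi) evaluated from the piecewise definition."""
    if x == 0:
        return math.log(phi + (1.0 - phi) * math.exp(-theta))
    return (math.log(1.0 - phi) - theta + x * math.log(theta)
            - math.lgamma(x + 1))


def log_pmf_expfam(x, theta, phi):
    """log p(x; theta, phi) in two-parameter exponential family form."""
    p_zero = phi + (1.0 - phi) * math.exp(-theta)
    eta1 = math.log((1.0 - phi) * math.exp(-theta) / p_zero)  # multiplies I(x)
    eta2 = math.log(theta)                                    # multiplies x
    indicator = 1 if x > 0 else 0
    return eta1 * indicator + eta2 * x + math.log(p_zero) - math.lgamma(x + 1)
```

Because the sample enters only through the number of positive observations and their sum, two samples that share these two totals have likelihoods that differ only by a factor free of θ and φ.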

Fisher Information Matrix
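Note that $\log p(x; \theta, \phi)$ is twice differentiable with respect to both θ and φ. For brevity, write $D = \phi + (1-\phi)e^{-\theta}$ for $P(X = 0)$, and let $I(x) = 1$ if $x \geq 1$ and $I(x) = 0$ if $x = 0$.
Taking logarithms on both sides of (1.1), we get
$$\log p(x; \theta, \phi) = [1 - I(x)]\log D + I(x)\left[\log(1-\phi) - \theta + x\log\theta - \log x!\right]. \tag{4.1}$$
We get the following partial derivatives:
$$\frac{\partial \log p}{\partial \theta} = -[1 - I(x)]\frac{(1-\phi)e^{-\theta}}{D} + I(x)\left(\frac{x}{\theta} - 1\right) \tag{4.2}$$
and
$$\frac{\partial \log p}{\partial \phi} = [1 - I(x)]\frac{1 - e^{-\theta}}{D} - \frac{I(x)}{1-\phi}. \tag{4.3}$$
Using (4.1), (4.2), (4.3), together with $E[I(X)] = (1-\phi)(1-e^{-\theta})$ and $E(X) = (1-\phi)\theta$, we can verify that
$$E\left[\frac{\partial \log p(X; \theta, \phi)}{\partial \theta}\right] = 0 \quad \text{and} \quad E\left[\frac{\partial \log p(X; \theta, \phi)}{\partial \phi}\right] = 0.$$
On simplification, we get
$$\frac{\partial^2 \log p}{\partial \theta^2} = [1 - I(x)]\frac{\phi(1-\phi)e^{-\theta}}{D^2} - \frac{x}{\theta^2}, \quad \frac{\partial^2 \log p}{\partial \phi^2} = -[1 - I(x)]\frac{(1-e^{-\theta})^2}{D^2} - \frac{I(x)}{(1-\phi)^2}, \quad \frac{\partial^2 \log p}{\partial \theta\,\partial \phi} = [1 - I(x)]\frac{e^{-\theta}}{D^2}.$$
After simplification we arrive at the following expressions:
$$-E\left[\frac{\partial^2 \log p}{\partial \theta^2}\right] = \frac{1-\phi}{\theta} - \frac{\phi(1-\phi)e^{-\theta}}{D}, \quad -E\left[\frac{\partial^2 \log p}{\partial \phi^2}\right] = \frac{1 - e^{-\theta}}{(1-\phi)D}, \quad -E\left[\frac{\partial^2 \log p}{\partial \theta\,\partial \phi}\right] = -\frac{e^{-\theta}}{D}.$$
Therefore the Fisher information matrix becomes
$$I(\theta, \phi) = \begin{pmatrix} \dfrac{1-\phi}{\theta} - \dfrac{\phi(1-\phi)e^{-\theta}}{D} & -\dfrac{e^{-\theta}}{D} \\ -\dfrac{e^{-\theta}}{D} & \dfrac{1 - e^{-\theta}}{(1-\phi)D} \end{pmatrix}.$$
The inverse of the Fisher information matrix is
$$I^{-1}(\theta, \phi) = \frac{1}{1 - e^{-\theta} - \theta e^{-\theta}} \begin{pmatrix} \dfrac{\theta(1 - e^{-\theta})}{1-\phi} & \theta e^{-\theta} \\ \theta e^{-\theta} & (1-\phi)\left(D - \phi\theta e^{-\theta}\right) \end{pmatrix}.$$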
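The entries of the Fisher information matrix can be cross-checked numerically. The sketch below assumes the parameter order (θ, φ) and the score functions obtained by differentiating the log of the ZIP p.m.f. It compares the closed-form entries with the expected outer product of the score, computed as a truncated sum over x. The function names and the truncation point `x_max` are illustrative.

```python
import math


def fisher_closed_form(theta, phi):
    """Closed-form Fisher information for one observation, order (theta, phi)."""
    e = math.exp(-theta)
    d = phi + (1.0 - phi) * e  # P(X = 0)
    i_tt = (1.0 - phi) / theta - phi * (1.0 - phi) * e / d
    i_tp = -e / d
    i_pp = (1.0 - e) / ((1.0 - phi) * d)
    return [[i_tt, i_tp], [i_tp, i_pp]]


def fisher_numeric(theta, phi, x_max=200):
    """E[score * score'] evaluated as a sum over x = 0, ..., x_max."""
    e = math.exp(-theta)
    d = phi + (1.0 - phi) * e
    info = [[0.0, 0.0], [0.0, 0.0]]
    pois = e  # Poisson probability at the current x
    for x in range(x_max + 1):
        if x == 0:
            prob = d
            score = (-(1.0 - phi) * e / d, (1.0 - e) / d)
        else:
            prob = (1.0 - phi) * pois
            score = (x / theta - 1.0, -1.0 / (1.0 - phi))
        for a in range(2):
            for b in range(2):
                info[a][b] += prob * score[a] * score[b]
        pois *= theta / (x + 1)
    return info


def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]
```

Calling `inverse_2x2(fisher_closed_form(2.5, 0.3))` gives the asymptotic covariance matrix, per observation, at the parameter values used in the simulation study.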

Asymptotic Relative Efficiency
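Hence the asymptotic relative efficiency of $\hat\theta_m$ with respect to $\hat\theta_{mle}$ is
$$\mathrm{ARE}(\hat\theta_m, \hat\theta_{mle}) = \frac{\text{asymptotic variance of } \hat\theta_{mle}}{\text{asymptotic variance of } \hat\theta_m} = 1.$$
Therefore, the MMEs and the MLEs of θ are asymptotically equally efficient. The same is true in the case of φ too.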

Simulation Study
Using R software, 1000 samples of various sizes were simulated fixing θ = 2.5 and φ = 0.3. For each of the samples, the MLEs and the MMEs of θ and φ were computed. The histograms of the MLEs and the MMEs were drawn separately for θ and φ. The histograms are given in Figures 1-4.
Nanjundan et al. [8] have carried out an elaborate simulation study by considering various values of θ and φ and varying the number of samples. From the above histograms, it can be observed that the MMEs are also approximately normally distributed even for moderate sample sizes. This gives graphical evidence for the asymptotic normality of the MLEs and the MMEs of both the parameters in the model.
The mean squared errors (MSEs) computed from the simulation study are shown in the Appendix. The following observations about the performance of the estimates are made from the mean squared errors.
1) Though the MSEs corresponding to the MLEs are less than the MSEs of the MMEs, the difference is negligible. That is, the MMEs perform nearly as well as the MLEs.
2) For the majority of the combinations of θ and φ, the MSEs corresponding to θ are less than the MSEs corresponding to φ in the case of both the estimates.
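A simulation of this kind can be sketched in Python as follows. The study itself was carried out in R, so this is an illustrative re-implementation and not the authors' code. The EM starting values (mean of the positive counts for θ, 0.5 for φ) and the stopping rule based on the change in the parameter values are assumptions made for the sketch, and all function names are hypothetical. Each sample is assumed to contain at least one count greater than 1.

```python
import math
import random


def zip_sample(n, theta, phi, rng):
    """Draw n ZIP variates (structural zero with probability phi)."""
    sample = []
    for _ in range(n):
        if rng.random() < phi:
            sample.append(0)
            continue
        u, k, p = rng.random(), 0, math.exp(-theta)
        cdf = p
        while u > cdf and p > 0.0:  # Poisson draw by c.d.f. inversion
            k += 1
            p *= theta / k
            cdf += p
        sample.append(k)
    return sample


def zip_mme(xs):
    """Method of moments estimates (theta_hat, phi_hat)."""
    n = len(xs)
    m1 = sum(xs) / n
    m2 = sum(x * x for x in xs) / n
    return (m2 - m1) / m1, 1.0 - m1 * m1 / (m2 - m1)


def zip_mle_em(xs, tol=1e-8, max_iter=5000):
    """MLEs via EM, using only n, the number of zeros and the sample total."""
    n, n0, total = len(xs), xs.count(0), sum(xs)
    theta, phi = total / (n - n0), 0.5  # assumed starting values
    for _ in range(max_iter):
        pois0 = (1.0 - phi) * math.exp(-theta)
        expected_poisson = (n - n0) + n0 * pois0 / (phi + pois0)  # E-step
        new_phi = 1.0 - expected_poisson / n                      # M-step
        new_theta = total / expected_poisson
        converged = abs(new_theta - theta) + abs(new_phi - phi) < tol
        theta, phi = new_theta, new_phi
        if converged:
            break
    return theta, phi


def simulate_mse(theta, phi, n, reps, seed=0):
    """Mean squared errors of (theta_hat, phi_hat) for the MMEs and the MLEs."""
    rng = random.Random(seed)
    squared = {"mme": [0.0, 0.0], "mle": [0.0, 0.0]}
    for _ in range(reps):
        xs = zip_sample(n, theta, phi, rng)
        for name, (t, p) in (("mme", zip_mme(xs)), ("mle", zip_mle_em(xs))):
            squared[name][0] += (t - theta) ** 2
            squared[name][1] += (p - phi) ** 2
    return {name: (s[0] / reps, s[1] / reps) for name, s in squared.items()}
```

A call such as `simulate_mse(2.5, 0.3, 100, 1000)` mirrors the configuration reported in the Appendix (sample size 100, 1000 samples) and returns the MSEs of the estimates of θ and φ for each method.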

Discussion and Conclusions
Zero-inflated Poisson models are readily applicable in many biological and social contexts. Two such situations are briefly discussed in this section.
Insects live on the leaves of a tree when they are found to be suitable for feeding and they do not live on those which are unsuitable for feeding. Suppose that the proportion of unsuitable leaves in a tree is φ and the number of insects on a suitable leaf has a Poisson distribution with mean θ. If an observed leaf has any insects on it, then it is definitely a suitable one. On the other hand, if a leaf has no insects, then it may or may not be suitable for feeding. Let X denote the number of insects on any leaf. Then X has the p.m.f. given in (1.1). For more applications, one can refer to Lambert [1] and Kale [4].
The MLEs of the parameters in the ZIP model have no closed form expressions, and computing them even by the EM algorithm requires computing facilities, whereas the MMEs have simple closed form expressions and can be computed even with a pocket calculator. The MMEs and the MLEs are asymptotically equally efficient. Hence the MMEs can easily be used instead of the MLEs when the sample size is sufficiently large.

Appendix
In the following table, the upper and the lower cells give the mean squared errors of the MLEs of θ and φ respectively. The values within the brackets are the mean squared errors corresponding to the MMEs. Sample size = 100 and number of samples = 1000.
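The likelihood function of the complete data is given by
$$L_c(\theta, \phi; \mathbf{x}, \mathbf{z}) = \prod_{j=1}^{n} \phi^{1 - z_j}\left[(1-\phi)\frac{e^{-\theta}\theta^{x_j}}{x_j!}\right]^{z_j},$$
where $z_j = 1$ if the j-th observation comes from the Poisson component and $z_j = 0$ otherwise, and $\theta_0$ and $\phi_0$ are respectively the initial estimates of θ and φ. In the M-step, the expected complete-data likelihood is maximized with respect to θ and φ. If $\theta_1$ and $\phi_1$ are the values of θ and φ which maximize it for the observed sample, the improved estimates of θ and φ are
$$\phi_1 = 1 - \frac{1}{n}\sum_{j=1}^{n} \hat z_j \quad \text{and} \quad \theta_1 = \frac{\sum_{j=1}^{n} x_j}{\sum_{j=1}^{n} \hat z_j},$$
where $\hat z_j = 1$ if $x_j > 0$ and $\hat z_j = (1-\phi_0)e^{-\theta_0} / \left[\phi_0 + (1-\phi_0)e^{-\theta_0}\right]$ if $x_j = 0$. The two steps are iterated until the difference between the successive $L(\theta, \phi; \mathbf{x})$ [or $\log L(\theta, \phi; \mathbf{x})$] values is less than a desired threshold value.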


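Since the MLEs of the parameters θ and φ in the ZIP model have no closed form expressions, their exact standard errors are unlikely to be obtainable. Hence we are left with the asymptotic relative efficiencies of the estimators for the analytical comparison. Since the ZIP model in (1.1) belongs to the two-parameter exponential family, the MLEs of θ and φ are also asymptotically normal and
$$\sqrt{n}\begin{pmatrix} \hat\theta_{mle} - \theta \\ \hat\phi_{mle} - \phi \end{pmatrix} \xrightarrow{d} N_2\left(\mathbf{0}, I^{-1}(\theta, \phi)\right),$$
where $I(\theta, \phi)$ is the Fisher information matrix.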

Figure 1. Histograms of the MMEs and the MLEs of θ and φ based on 1000 samples of size 25 each drawn from the distribution 0.3p₀(x) + 0.7p₁(x, 2.5).

Figure 2. Histograms of the moment estimators and the MLEs of θ and φ based on 1000 samples of size 50 each drawn from the distribution 0.3p₀(x) + 0.7p₁(x, 2.5).

Figure 3. Histograms of the moment estimators and the MLEs of θ and φ based on 1000 samples of size 100 each drawn from the distribution 0.3p₀(x) + 0.7p₁(x, 2.5).

Figure 4. Histograms of the moment estimators and the MLEs of θ and φ based on 1000 samples of size 250 each drawn from the distribution 0.3p₀(x) + 0.7p₁(x, 2.5).
A social group under study may have fertile and sterile couples. If the proportion of sterile couples is φ and the number of children per fertile couple has a Poisson distribution with mean θ, then X, the number of children of a randomly chosen couple, has the ZIP distribution specified in (1.1).