Bayesian Estimation of the Shape Parameters of the McDonald Generalized Beta-Binomial Distribution

1. Introduction
Data is an essential, multidisciplinary resource in the world today because it informs the decision-making processes of virtually every field. Therefore, understanding data, and modelling it to extract useful information, is a key step towards reaching good decisions. Confounding aspects of data, such as the presence of over-dispersion, are preliminary features that should be identified in a dataset before modelling [1]. With the changes happening in the world today, such as technological advancement and the new industrial era, large volumes of data are generated on a daily basis [2]. Therefore, models of higher dimension are becoming more useful in modelling such data. One characteristic of such data is the diversity of its features, and handling it requires flexible modelling techniques [3]. The Bayesian framework, for example, addresses such problems through its unique approach of incorporating prior information about the parameters, that is, allowing the parameters themselves to vary and quantifying this variation using probability distributions [4]. For this reason, Bayesian modelling has been adopted in areas such as artificial intelligence, bioinformatics, agriculture and economics, among many others [5].

Data in the form of proportions or counts is no exception to the emerging era of big data; it is often encountered in many scientific and social fields. One common feature of such data is over- or under-dispersion [6]. Over- or under-dispersion can result from variation in the success probability, which a simple binomial model treats as constant but which in reality usually is not. Mixture distributions are therefore developed to capture over-dispersion, where the mixing distribution is defined on the [0, 1] interval because the success probability must lie in this range. The McDonald Generalized Beta-Binomial (McGBB) distribution has been shown by [7] to be superior in modelling over-dispersed binomial data. The maximum likelihood estimation procedure was applied by [7], and [8] introduced estimating equations for the McGBB distribution. Estimation of the McGBB parameters within the Bayesian framework has not previously been addressed. Therefore, in this paper we develop a Bayesian framework through a Monte Carlo simulation technique, specifically a Metropolis-Hastings step within a Gibbs sampler, to obtain the marginal posterior samples of the parameters of the McGBB, which in turn support Bayesian inference, in particular estimation of the parameters. The first part of this paper introduces the topic, the second part covers the methodology, which is subdivided into sections, the third part presents the results and the fourth part gives a conclusion. Throughout the paper different symbols are used to identify different quantities; they are defined as follows:
α, β and γ: the shape parameters of the McGBB distribution, which the paper estimates using the Bayesian framework.
Θ: a notation representing the parameter vector (parameter space) containing the three shape parameters, Θ = (α, β, γ).
π(α), π(β) and π(γ): notation for the prior distribution of each shape parameter.
r: the Metropolis-Hastings ratio.
q(· | ·): the joint proposal distribution used in the Metropolis-Hastings algorithm.
(L_α, U_α): the credible region giving the lower and upper limits for the shape parameter α.
(L_β, U_β): the credible region giving the lower and upper limits for the shape parameter β.
(L_γ, U_γ): the credible region giving the lower and upper limits for the shape parameter γ.
2. Methodology
This section briefly discusses the distribution, the simulation algorithm and the Bayesian method. This ensures that the reader is able to follow each step of developing the Bayesian framework.
2.1. McDonald Generalized Beta Binomial Distribution
The McGBB is a mixture distribution used to capture and model over-dispersion in binomial data. The mixture distribution is obtained by mixing the binomial distribution with the McDonald generalized beta distribution of the first kind [7]. A random variable X follows a McGBB distribution with parameters (n, α, β, γ) if the probability mass function of X is expressed as:

$$P(X = x) = \binom{n}{x} \frac{1}{B(\alpha, \beta)} \int_{0}^{1} \gamma\, p^{\gamma\alpha + x - 1} \left(1 - p^{\gamma}\right)^{\beta - 1} (1 - p)^{n - x}\, dp$$

for $x = 0, 1, \ldots, n$, where $B(a, b) = \int_{0}^{1} t^{a-1}(1-t)^{b-1}\, dt$ is the beta function.

An alternative form of the probability mass function of X is given as:

$$P(X = x) = \binom{n}{x} \sum_{j=0}^{n-x} (-1)^{j} \binom{n-x}{j} \frac{B\!\left(\alpha + \frac{x+j}{\gamma}, \beta\right)}{B(\alpha, \beta)}$$

for $x = 0, 1, \ldots, n$. Source: [7].

The parameters α, β and γ are the shape parameters of the distribution and can only assume positive values.
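For concreteness, the finite-sum form of the pmf can be evaluated directly in R. The sketch below is illustrative only; the function name dmcgbb, the argument names and the parameter values are ours, not taken from [7], and the sanity check simply verifies that the probabilities sum to one.

```r
# Illustrative evaluation of the McGBB pmf via the finite-sum form above.
# dmcgbb() and the parameter values are illustrative choices, not from the paper.
dmcgbb <- function(x, n, alpha, bet, gam) {
  j <- 0:(n - x)                                     # summation index
  terms <- (-1)^j * choose(n - x, j) *
    beta(alpha + (x + j) / gam, bet) / beta(alpha, bet)
  choose(n, x) * sum(terms)
}

# Sanity check: the probabilities over x = 0, ..., n should sum to 1.
probs <- sapply(0:7, dmcgbb, n = 7, alpha = 1.5, bet = 2, gam = 0.8)
sum(probs)
```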
The properties of the distribution have been discussed in detail by [7]; this paper's primary attention is given to the Bayesian framework for estimating the parameters of the distribution.
2.2. Simulation of McDonald Generalized Beta-Binomial Variables
Simulation of the McGBB variables was implemented using the following algorithm. The algorithm provides a direct approach to simulating from a McGBB distribution while remaining in control of the parameters. The variables obtained are McGBB variables, as opposed to the algorithm suggested by [7], whose variables were over-dispersed random binomial variables (an R sketch of the steps is given after the list):
Step 1: Set fixed values of α, β and γ (and the binomial index n).
Step 2: Generate K random variables from Beta(α, β), i.e. $Y_i \sim \mathrm{Beta}(\alpha, \beta)$ for $i = 1, \ldots, K$.
Step 3: For each of the random variables, compute $p_i = Y_i^{1/\gamma}$ for $i = 1, \ldots, K$, so that each $p_i$ follows the McDonald generalized beta distribution of the first kind.
Step 4: For each $p_i$ in Step 3, generate a binomial random variable, $X_i \sim \mathrm{Binomial}(n, p_i)$; then $X_1, \ldots, X_K$ are McGBB(n, α, β, γ) variables.
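A minimal R sketch of Steps 1 to 4 is shown below; all names and parameter values are illustrative assumptions rather than the settings used in the paper.

```r
# Illustrative implementation of Steps 1-4; parameter values are assumptions.
set.seed(123)

# Step 1: fix the shape parameters, the binomial index n and the sample size K.
alpha <- 1.5; bet <- 2; gam <- 0.8
n <- 7; K <- 25

# Step 2: generate K Beta(alpha, beta) random variables.
y <- rbeta(K, alpha, bet)

# Step 3: transform each to a McDonald generalized beta (first kind) variable.
p <- y^(1 / gam)

# Step 4: generate one binomial variable per p_i; x_1, ..., x_K are McGBB variables.
x <- rbinom(K, size = n, prob = p)
```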
2.3. Bayesian Method
Prior information is what makes the Bayesian framework a unique method of parameter estimation, as it provides a mechanism through which expert opinion, possibly obtained from previous studies, is incorporated into the analysis [4]. Bayesian theory requires that the parameters of a distribution be treated as random variables, and the prior distribution is the main way in which such information (beliefs) is quantified [9]. The first step of a Bayesian analysis is to define a probability distribution that is best suited to model a given dataset. The second step is to choose appropriate prior distributions for the parameters that characterize the selected distribution. The choice of prior distributions is mostly guided by intuition and by existing information about these parameters [10]. This methodology of choosing a prior is, however, criticized by classical statisticians as subjective, since it is guided by intuitive knowledge, which often leads to the use of informative priors. To circumvent this limitation, theory offers the method of non-informative priors [11]. Non-informative prior distributions provide no information, or give an equal chance to all possible parameter values in the parametric space, before the data are observed [12]. Obtaining non-informative priors remains an active topic in Bayesian research; a commonly used non-informative prior is the flat or diffuse prior, and flat priors were used in this paper. In the Bayesian framework, inferences are made based on the marginal posterior distributions of the parameters. However, a closed form of the joint posterior distribution may not be available, making it challenging to sample from such a distribution. Further, in cases of high-dimensional parametric spaces, the integrations over such joint posterior distributions become intractable and complex, so the marginal posterior distributions are not easy to obtain [13]. To avoid this intractability problem, the most commonly used methods are Markov Chain Monte Carlo (MCMC) methods [14]. With these methods it is possible to obtain samples from the marginal posterior distributions of the parameters directly from the joint posterior without performing the integrations [15]. In this paper, the Bayesian framework developed utilized MCMC methods, specifically a Metropolis-Hastings step within Gibbs sampling.
In the Bayesian framework, the unknown parameters of the model were assumed to be random variables. Thus, there was a need to make appropriate assumptions about the distributions (prior distributions) of the unknown parameters. The McGBB distribution has three shape parameters, Θ = (α, β, γ), which we are interested in estimating. The study assumed that α, β and γ jointly had a flat prior, represented as:

$$\pi(\alpha, \beta, \gamma) = \pi(\alpha)\,\pi(\beta)\,\pi(\gamma) \propto 1, \quad \alpha, \beta, \gamma > 0$$
The joint posterior distribution of Θ = (α, β, γ) was obtained by multiplying the conditional distribution $f(\mathbf{x} \mid \alpha, \beta, \gamma)$ (essentially the likelihood function) with $\pi(\alpha, \beta, \gamma)$, the joint prior distribution of α, β and γ. Let $\mathbf{x} = (x_1, x_2, \ldots, x_N)$ be a random sample of size N from a McGBB distribution with unknown parameter vector Θ = (α, β, γ). The conditional distribution of $\mathbf{x}$ given Θ is obtained as:

$$f(\mathbf{x} \mid \alpha, \beta, \gamma) = \prod_{i=1}^{N} \binom{n}{x_i} \sum_{j=0}^{n-x_i} (-1)^{j} \binom{n-x_i}{j} \frac{B\!\left(\alpha + \frac{x_i + j}{\gamma}, \beta\right)}{B(\alpha, \beta)}$$

The posterior distribution becomes:

$$\pi(\alpha, \beta, \gamma \mid \mathbf{x}) \propto f(\mathbf{x} \mid \alpha, \beta, \gamma)\, \pi(\alpha, \beta, \gamma) \propto \prod_{i=1}^{N} \binom{n}{x_i} \sum_{j=0}^{n-x_i} (-1)^{j} \binom{n-x_i}{j} \frac{B\!\left(\alpha + \frac{x_i + j}{\gamma}, \beta\right)}{B(\alpha, \beta)}, \quad \alpha, \beta, \gamma > 0$$
It is evident that sampling from this joint posterior distribution is complicated; thus, the study employed MCMC methods, in particular a Metropolis-Hastings step within the Gibbs sampling technique. To implement this algorithm, the full conditional distributions of the parameters were computed; since the prior is flat, each full conditional is proportional to the joint posterior viewed as a function of the corresponding parameter:

$$\pi(\alpha \mid \beta, \gamma, \mathbf{x}),\ \pi(\beta \mid \alpha, \gamma, \mathbf{x}),\ \pi(\gamma \mid \alpha, \beta, \mathbf{x}) \ \propto\ \prod_{i=1}^{N} \sum_{j=0}^{n-x_i} (-1)^{j} \binom{n-x_i}{j} \frac{B\!\left(\alpha + \frac{x_i + j}{\gamma}, \beta\right)}{B(\alpha, \beta)}$$
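Under the flat prior, the unnormalized log posterior is simply the log likelihood restricted to the positive orthant. The R sketch below reuses the illustrative dmcgbb() function defined earlier; the function and argument names are ours.

```r
# Unnormalized log posterior under the flat prior; reuses the dmcgbb() sketch above.
log_posterior <- function(theta, x, n) {
  if (any(theta <= 0)) return(-Inf)          # flat prior supported on positive values only
  sum(log(sapply(x, dmcgbb, n = n,
                 alpha = theta[1], bet = theta[2], gam = theta[3])))
}
```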
Then the Metropolis-within-Gibbs sampling algorithm involved the following steps (an R sketch is given after the list):

Step 1. Start with j = 1 and the initial values $\Theta^{(0)} = (\alpha^{(0)}, \beta^{(0)}, \gamma^{(0)})$.

Step 2. Using the proposal distribution $q(\Theta^{*} \mid \Theta^{(j-1)})$, where the proposal for each parameter was centred at the current value with proposal standard deviations $\sigma_\alpha$, $\sigma_\beta$ and $\sigma_\gamma$ respectively, sample a candidate value $\Theta^{*} = (\alpha^{*}, \beta^{*}, \gamma^{*})$.

Step 3. Generate U from a Uniform(0, 1) distribution, i.e. $U \sim U(0, 1)$.

Step 4. Calculate the Metropolis-Hastings (MH) ratio at the candidate value $\Theta^{*}$ and the previous value $\Theta^{(j-1)}$, using block updating:

$$r = \frac{\pi(\Theta^{*} \mid \mathbf{x})\, q(\Theta^{(j-1)} \mid \Theta^{*})}{\pi(\Theta^{(j-1)} \mid \mathbf{x})\, q(\Theta^{*} \mid \Theta^{(j-1)})}$$

Step 5. If $U \le \min(1, r)$, accept the candidate point, i.e., set $\Theta^{(j)} = \Theta^{*}$. Otherwise set $\Theta^{(j)} = \Theta^{(j-1)}$. Repeating Steps 2-5 for $j = 1, \ldots, M$, a sample of size M is obtained from the joint posterior distribution.
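Putting Steps 1 to 5 together, a minimal block random-walk Metropolis sketch is shown below. It assumes a symmetric normal proposal (so the q terms cancel in the ratio) and illustrative starting values, proposal standard deviations and chain length, none of which are the paper's settings; it reuses log_posterior() and the simulated data x from the earlier sketches.

```r
# Block random-walk Metropolis sampler implementing Steps 1-5 (illustrative settings).
run_mh <- function(x, n, M = 10000, start = c(1, 1, 1),
                   sigma = c(0.1, 0.1, 0.1)) {
  chain <- matrix(NA_real_, nrow = M, ncol = 3,
                  dimnames = list(NULL, c("alpha", "beta", "gamma")))
  theta <- start                                   # Step 1: initial values
  lp <- log_posterior(theta, x, n)
  for (j in 1:M) {
    cand <- rnorm(3, mean = theta, sd = sigma)     # Step 2: candidate for the whole block
    lp_cand <- log_posterior(cand, x, n)
    log_r <- lp_cand - lp                          # Step 4: log MH ratio (symmetric proposal)
    if (log(runif(1)) <= log_r) {                  # Steps 3 and 5: accept or reject
      theta <- cand
      lp <- lp_cand
    }
    chain[j, ] <- theta
  }
  chain
}

chain <- run_mh(x, n = 7, M = 5000)
```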
Let $M_0$ be the burn-in period for the Markov chains of the parameters. Then, under the squared error loss function, the Bayesian estimates of the parameters were obtained as the means of the samples generated using the algorithm above, i.e.,

$$\hat{\alpha} = \frac{1}{M - M_0} \sum_{j = M_0 + 1}^{M} \alpha^{(j)}, \qquad \hat{\beta} = \frac{1}{M - M_0} \sum_{j = M_0 + 1}^{M} \beta^{(j)}, \qquad \hat{\gamma} = \frac{1}{M - M_0} \sum_{j = M_0 + 1}^{M} \gamma^{(j)}$$
The 95% Bayesian credible intervals and 95% highest posterior density (HPD) intervals for α, β and γ were obtained using the algorithm proposed by [16], which is implemented in the CODA package in the R language. For a parameter $\theta \in \{\alpha, \beta, \gamma\}$, let $\theta_{(1)} \le \theta_{(2)} \le \cdots \le \theta_{(M - M_0)}$ be the ordered post-burn-in sample corresponding to the MCMC chain of θ obtained using the algorithm above. Then the approximate 95% Bayesian credible intervals for α, β and γ were obtained from the 2.5th and 97.5th empirical percentiles of the respective ordered chains, i.e. as

$$\left(\alpha_{(\lceil 0.025(M - M_0) \rceil)},\ \alpha_{(\lceil 0.975(M - M_0) \rceil)}\right), \quad \left(\beta_{(\lceil 0.025(M - M_0) \rceil)},\ \beta_{(\lceil 0.975(M - M_0) \rceil)}\right) \quad \text{and} \quad \left(\gamma_{(\lceil 0.025(M - M_0) \rceil)},\ \gamma_{(\lceil 0.975(M - M_0) \rceil)}\right)$$

respectively.
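Given a chain from the sampler, the posterior means, equal-tail credible intervals and HPD intervals can be computed as sketched below; the burn-in length and probability level are illustrative, and HPDinterval() comes from the coda package mentioned above.

```r
# Posterior summaries after discarding a burn-in of M0 draws (illustrative values).
library(coda)

M0 <- 1000
post <- chain[-(1:M0), ]

colMeans(post)                                      # Bayesian estimates (squared error loss)
apply(post, 2, quantile, probs = c(0.025, 0.975))   # 95% equal-tail credible intervals
HPDinterval(as.mcmc(post), prob = 0.95)             # 95% HPD intervals via coda
```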
2.4. Real Dataset
The framework was also applied to a real dataset and point estimates obtained. The dataset is a secondary dataset, referred to as the alcohol dataset, which has also been used in the literature by [7] and [8] for illustration. It is documented in the fitODBOD package in the R language. It consists of self-reported alcohol consumption frequencies collected in the Netherlands from a randomly selected sample of 399 respondents over two independent weeks. When the number of days a respondent consumes alcohol out of 7 days is treated as a binomial random variable, the variation in different individuals' need to consume alcohol produces traits of over-dispersion.
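A hedged sketch of how the alcohol data might be loaded and passed to the sampler is shown below; the dataset object and column names used here (Alcohol_data, Days, week1) are assumptions about the fitODBOD package and should be verified against the installed version.

```r
# Illustrative loading of the alcohol data; object and column names are assumptions.
library(fitODBOD)

data("Alcohol_data")                    # assumed dataset name in fitODBOD
str(Alcohol_data)

# Assuming a frequency table with drinking days (0-7) and week-1 counts,
# expand to one observation per respondent and run the sampler sketched above.
x_alcohol <- rep(Alcohol_data$Days, Alcohol_data$week1)
chain_alcohol <- run_mh(x_alcohol, n = 7, M = 300000)
```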
3. Results and Discussion
A small sample of size k = 25 and a large sample of size k = 1000 were simulated from the McGBB distribution while fixing the shape parameters α, β and γ at chosen true values, and each sample was passed to the Bayesian framework to obtain estimates of these parameters (essentially to recover the fixed parameters). The results obtained are presented using trace plots to show the behaviour of the samples generated, histogram plots to show the distributions or shapes that the parameter estimates assume, autocorrelation plots, trace plots for the means and the over-dispersion parameter, and tables of the estimates.
Figure 1 shows the different chains generated from the posterior sampling when the sample size was k = 25, generated by setting the shape parameters at their true values. Starting points were set for the parameters α, β and γ, and fixed values of sigma were used for the proposal distributions. The first ten thousand values of the three hundred thousand iterations were discarded as the burn-in period. The blue horizontal line shows the true parameter value while the black horizontal line shows the parameter estimate, given by the mean value of each chain.
The horizontal lines on the plot for α show a larger difference between the true parameter value and the estimate. However, for β and γ the horizontal lines are close to each other, an indication that the estimates were relatively close to the true values. From the figure it can also be seen that there was higher serial correlation for α and γ than for β. To examine the autocorrelation behaviour, autocorrelation plots are presented in Figure 2.
From the plots, there was high autocorrelation in the α values, close to 1, confirming the behaviour of the trace plot for α. The parameters β and γ show decreasing autocorrelation as more iterations are performed. β had the lowest autocorrelation of all the parameters, which is also evident from its trace plot.
Figure 1. Trace plots for the three shape parameters of the McGBB when k = 25.
Figure 2. Autocorrelation plots for α, β and γ when k = 25.
Figure 3 shows the histogram plots for the shape parameters, illustrating the shape of the distribution that each parameter takes over the iterations performed. From the figure, the histograms for α and γ are skewed to the left, with γ being the most skewed; β has a relatively normal curve.
Figure 4 shows the trace plots of the mean computed at each set of parameter estimates obtained from the chains, together with a plot for the over-dispersion parameter. The blue horizontal line shows the average of the mean and of the over-dispersion parameter across the chains. The mean was close to 0.7 and the over-dispersion close to 0.5. Values of the over-dispersion parameter ρ close to one indicate high over-dispersion, while values close to zero indicate low over-dispersion. From the plot it can be seen that over-dispersion was present and was moderate on average.
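The mean and over-dispersion values plotted here can be computed from the shape parameters. The sketch below uses the raw moments of the McDonald generalized beta mixing distribution, E(p^k) = B(α + k/γ, β)/B(α, β), and the usual intra-class correlation form of the over-dispersion parameter ρ; this derivation is ours rather than a formula quoted from the paper, and the parameter values are illustrative.

```r
# Mean and over-dispersion parameter rho implied by the shape parameters (our derivation).
mcgbb_mean_rho <- function(alpha, bet, gam) {
  m1 <- beta(alpha + 1 / gam, bet) / beta(alpha, bet)   # E(p)
  m2 <- beta(alpha + 2 / gam, bet) / beta(alpha, bet)   # E(p^2)
  c(mean = m1, rho = (m2 - m1^2) / (m1 * (1 - m1)))
}

mcgbb_mean_rho(1.5, 2, 0.8)   # illustrative parameter values
```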
Figure 5 shows the different chains generated from the posterior sampling when a sample of size k = 1000 was used, with the true values of α, β and γ set as before. Starting points were set for the parameters α, β and γ, and fixed values of sigma were used for the proposal distributions. The black horizontal line shows the true parameter value while the blue horizontal line shows the parameter estimate, given by the mean value of each chain. From the figure, the plots of α and γ show that autocorrelation was high, while it was low for the β plot.
From Figure 6, autocorrelation was relatively moderate for the parameter β, and high for α and γ, consistent with the behaviour of the trace plots. Comparing these autocorrelation plots with those for sample size k = 25, it can be seen that autocorrelation increased for sample size k = 1000.
From Figure 7, the shape of α is skewed, as is that of γ; however, γ is more skewed than α, while β approaches a normal shape. It can be noticed that with the increase in sample size the shapes become flatter compared to when the sample size was k = 25.
Figure 3. Histogram plots for the parameters when k = 25.
Figure 4. Trace plots for means and over dispersion parameter ρ computed from the estimates generated.
Figure 6. Correlation plots for α, β and γ.
Figure 7. Histogram plots for α, β and γ.
Figure 8 shows the trace plots of the mean computed at each set of parameter estimates obtained from the chains, together with a plot for the over-dispersion parameter. The blue horizontal line shows the average of the mean and of the over-dispersion parameter across the chains. When the sample size is increased, an increase in over-dispersion is also noticed. On average, for every combination of estimates computed, the mean was 0.50, and the average over-dispersion was about 0.65 across the three hundred thousand iterations.
Table 1 presents the estimates obtained from the posterior means, with the true values of the three parameters set as described above. The first one thousand values sampled were discarded as the burn-in period. The acceptance rates of the sampled candidate values were within the acceptable limits for small samples but declined somewhat for large samples; acceptable acceptance rates should lie between 13% and 50% [5]. The acceptance rate is the percentage of the proposed candidate values, out of the three hundred thousand iterations in this case, that were accepted into the posterior sample. The estimates for the parameters β and γ were close to the true parameter values. The estimates of α were not as close to the true value as those of β and γ, but were not far from it either. Overall, the Bayesian framework produced estimates close to the true parameter values for small samples.
Table 2 shows the standard errors of the estimates. Standard errors show how precise the estimates are. From the table, the parameter α had higher standard errors, which is also reflected in the trace plots, where its variability is high. The parameter β had the lowest standard errors, evident from the low variation displayed in its trace plots, while γ had relatively low standard errors.
From Table 3, all the credible regions for all the parameters contained the true parameter value. Therefore, there is a 95% chance that the true value of each of the parameters α, β and γ lies within its respective computed credible region.
Table 4 shows the lengths of the credible intervals, which were short, especially for the parameter β. This further underscores the precision of the Bayesian framework in estimating the McGBB parameters.
Table 5 shows the highest posterior density intervals for the parameters. The HPD interval is the shortest credible interval among all intervals of a given probability; any point within the HPD interval has a higher density than any point outside it [17].
The article also applies the Bayesian framework to a real dataset. Table 6 shows the estimates obtained when the framework was applied to the alcohol dataset. Other estimation methods have been applied to the same dataset by [7] and [8]. While the primary aim of this article is to develop the Bayesian framework, readers can compare the estimates for the alcohol dataset with those obtained by [7] and [8].
Table 7 shows the standard errors of the estimates for the real dataset. From the table, the standard errors were high for α, which may be attributable to the high variation portrayed by α.
Figure 8. Trace plots for means and over dispersion parameter.
Table 1. Parameter estimates for α, β and γ.
Table 2. Standard errors of the estimates.
Table 3. Bayesian credible intervals.
Table 4. Lengths of the credible regions.
Table 6. Application to a real dataset (Alcohol dataset).
Table 7. Standard errors for the estimates of the real data.
4. Conclusion
This article explores a Bayesian framework for the mixture distribution known as the McDonald generalized beta-binomial distribution. From the framework, the paper has obtained point estimates, credible regions and HPD intervals. It is evident that the point estimates obtained do not deviate substantially from the true parameter values, which can also be seen from the standard errors obtained. The credible regions obtained are of short length and all include the true parameter value, which further emphasizes the precision of the Bayesian framework. This paper highlights the value of the Bayesian framework in modelling mixture models and distributions. The Bayesian framework is especially suitable for a mixture distribution whose likelihood is susceptible to problems of intractable integration; its ability to overcome intractable integrations, particularly in high-dimensional distributions, by applying a random search procedure to obtain estimates makes it a preferable method. The McGBB is one of many beta-type generated distributions, and this work can be extended to other distributions of the same class. Moreover, future studies can explore the Bayesian methodology using informative priors, as opposed to the non-informative priors used in this study.