
This paper presents a new approach to identifying and estimating the dispersion parameters for bivariate, trivariate and multivariate correlated binary data, both as a scalar value and as a matrix of values. To this end, we review some recent studies on the impact of over-dispersion in univariate data analysis and compare the new approach with them. Following the property given by McCullagh and Nelder [1] for identifying the dispersion parameter in the univariate case, we extend this property to the analysis of correlated binary data in higher dimensions. Finally, we use these estimates to modify the correlated binary data so as to reduce its over-dispersion, using the Hunua Ranges data as an ecological example.

The dispersion parameter should equal unity for univariate Bernoulli data, but it may deviate from unity when a study includes a sequence of Bernoulli outcomes that together form a binomial variable. Over-dispersion occurs when the variance of the actual response exceeds the nominal variance.
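As a rough illustration of this idea, the sketch below estimates a scalar dispersion parameter for grouped binomial counts by dividing the Pearson chi-square statistic by its residual degrees of freedom, in the spirit of the McCullagh and Nelder property cited above. It assumes an intercept-only model (one common success probability), and the function name `pearson_dispersion` and the example counts are hypothetical, not taken from the paper.

```python
def pearson_dispersion(successes, trials):
    """Pearson chi-square divided by (number of groups - 1).

    Assumption: an intercept-only binomial model, so a single
    probability is estimated and one degree of freedom is used.
    """
    if len(successes) != len(trials):
        raise ValueError("successes and trials must have the same length")
    n_groups = len(trials)
    if n_groups < 2:
        raise ValueError("at least two groups are needed")

    # Pooled estimate of the common success probability.
    p_hat = sum(successes) / sum(trials)
    if not 0.0 < p_hat < 1.0:
        raise ValueError("pooled proportion must lie strictly between 0 and 1")

    # Pearson chi-square: squared residuals scaled by the nominal binomial variance.
    chi2 = sum(
        (y - n * p_hat) ** 2 / (n * p_hat * (1.0 - p_hat))
        for y, n in zip(successes, trials)
    )
    return chi2 / (n_groups - 1)


# Hypothetical counts: six groups of 10 trials with widely varying successes.
example_successes = [1, 9, 2, 8, 0, 10]
example_trials = [10, 10, 10, 10, 10, 10]
phi_hat = pearson_dispersion(example_successes, example_trials)
print(f"estimated dispersion: {phi_hat:.3f}")  # about 8, i.e. well above 1
```

A value near 1 is what the nominal binomial variance predicts; a value well above 1, as in these made-up counts, is the pattern described here as over-dispersion.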

This paper is organized as follows. Some previous studies are presented in Section 2.

A proposed approach for identifying and estimating the dispersion parameters, as scalar and matrix values, together with the impact of over-dispersion for bivariate, trivariate and multivariate binary outcomes associated with covariates, is presented in Sections 3, 4 and 5, respectively.

Finally, the numerical examples for the vectorized generalized additive model, VGAM, or vectorized generalized linear model, VGLM, Yee and Wild [

In this section, we present some studies on the over-dispersion problem as shown below:

(1) Smith and Heitjan [

The hypothesis testing problem is formulated as

An appropriate procedure to test

where

And

with p degrees of freedom. The eventual rejection of

(2) Cook and Ng [

where, the expectation,

(3) Saefuddin et al. [

A simple method proposed by William, [

where

where

The algorithm of the William method is described as follows:

1. Assume

2. Compare

3. Using the initial weights

we can recalculate the value of

4. If

If

(4) Davila et al. [

where,

The conditional variance is

From the relation (12), we see that the marginal dispersion parameter is

Comparing relation (1) with relation (12), we note that the latter has the greater variance. In their study, they showed that the model based on the Beta-binomial model (BBM) yielded higher standard errors for the estimated parameters than the multivariate normal (MVN) model, the marginal GLM, and the marginal over-dispersion model (ODM).

(5) The vectorized generalized additive model (VGAM) introduced by Yee and Wild [

where,

And the

The conditional distribution of VGAM family function for trivariate binary responses,

Note that a third order association parameter,

The conditional distribution of VGAM (VGLM) function for multivariate correlated binary responses,

where

In the next section, we suggest a new approach to estimate the dispersion parameter,

Using the following notation for the link functions, which enables us to use the regression model:

we have the log-likelihood function for the bivariate AQEF measure as

The log-likelihood function for the trivariate AQEF measure is

where,

Finally, the log-likelihood function for the multivariate AQEF measure is

where,

In this section, we determine the identification and estimation of a fixed value for dispersion parameter,

We can use the variance-covariance matrix of

Following the GLM property, the variance-covariance matrix of Y is

where,

And,

Then, the estimator of

Hence, we can show that

Then,

Follows the non-central

Now, we use different values for dispersion parameter, such that

The estimator of dispersion parameters matrix is

Then,

From the equation (26), we have

Follows the non-central

We can correct the data using the estimates of dispersion parameters,

We can define the response vector

The variance-covariance matrix of Y can be written as

where,

The estimator of

Since,

Follows the non-central

The variance-covariance matrix of Y can be displayed as

The estimator of dispersion parameters,

Since,

Follows the non-central

Similarly, we can correct the data using the estimates of dispersion parameters,

We can define the response vector

The variance-covariance matrix of Y can be written as

where,

The estimator of

Since,

Follows non-central

The variance-covariance matrix of Y can be displayed as

The estimator of dispersion parameters,

Since,

Follows non-central

In this section, we present two examples. The first applies to bivariate correlated binary data and presents the results obtained using the AQEF measure and the VGLM measure, which are similar in the bivariate case. The second applies to trivariate binary data; here, the third-order association is absent from the VGAM (VGLM) measure. In both examples, we use the Hunua Ranges data, Yee [

At 392 sites in the forest, the presence or absence of 17 plant species was recorded along with the altitude. Each site had an area of 200 m². The Hunua Ranges data frame has 392 rows and 18 columns: altitude, a continuous variable (meters above sea level), and binary responses (presence = 1, absence = 0) for 17 plant species. The data frame contains the following columns: agaaus, beitaw, corlae, cyadea, cyamed, daccup, dacdac, eladen, hedarb, hohpop, kniexc, kuneri, lepsco, metrob, neslan, rhosap, vitluc and altitude.

Hence, we will use the first two columns, agaaus and beitaw, as correlated binary outcome variables,

We will use the estimates,

From

1. The estimates of the regression parameters are changed.

2. The standard errors of the association parameter estimates decrease. This leads to a significant association between the two binary outcome variables,

3. The Wald test statistics take lower values, which confirms a significant association between the two binary outcome variables,

4. The LRT increases, which also confirms the conclusion drawn from the Wald statistic.

5. The estimate of a scalar dispersion parameter,

6. The estimates of the matrix of dispersion parameters,

7. The scaled deviance value is increased.

We will use the columns, cyadea, beitaw and kniexc, as the dependent correlated binary variables,

| Parameters | Estimates | Standard Errors | Wald Statistic | Parameters/Tests | Estimates |
|---|---|---|---|---|---|
| | −0.9320 | 0.1487 | −6.2663 | | 1.4458 |
| | −1.0139 | 1.0793 | −0.9393 | S. Deviance | 13.8206 |
| | −0.2389 | 0.1191 | −2.0057 | LRT: | 14.5761 |
| | 1.0656 | 0.4686 | 2.2742 | | 0.8579 |
| | −0.9598 | 0.2876 | −3.3377 | | 0.9906 |
| | −10.9504 | 154.7484 | −0.0708 | | 0.7977 |
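The Wald statistics in the table above appear to be the ratio of each estimate to its standard error. The following sketch recomputes them from the tabulated estimates and standard errors as a consistency check. The two-sided p-values are an added assumption: they treat each statistic as approximately standard normal, which the source does not state explicitly, and the helper names `wald_z` and `two_sided_p` are hypothetical.

```python
import math

# (estimate, standard error, Wald statistic as printed) for the six table rows.
rows = [
    (-0.9320, 0.1487, -6.2663),
    (-1.0139, 1.0793, -0.9393),
    (-0.2389, 0.1191, -2.0057),
    (1.0656, 0.4686, 2.2742),
    (-0.9598, 0.2876, -3.3377),
    (-10.9504, 154.7484, -0.0708),
]


def wald_z(estimate, std_error):
    """Wald statistic: the estimate divided by its standard error."""
    return estimate / std_error


def two_sided_p(z):
    """P(|Z| > |z|) for a standard normal Z (assumed reference distribution)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


results = [(wald_z(est, se), two_sided_p(wald_z(est, se))) for est, se, _ in rows]

for (est, se, printed), (z, p) in zip(rows, results):
    print(f"est={est:10.4f}  se={se:9.4f}  z={z:8.4f}  printed={printed:8.4f}  p={p:.4f}")
```

The recomputed ratios agree with the printed Wald statistics up to rounding of the tabulated inputs. The last row shows how a very large standard error drives the statistic toward zero even though the estimate itself is large in magnitude.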

Hence, the LRTs will be compared with

| Parameters | Estimates | Standard Errors | Wald Statistic | Parameters/Tests | Estimates |
|---|---|---|---|---|---|
| | −0.8212 | 0.1456 | −5.6397 | | 1.4748 |
| | −1.0256 | 1.0457 | −0.9808 | S. Deviance | 26.5973 |
| | −0.2106 | 0.1203 | −1.7503 | LRT: | 16.1546 |
| | 1.0645 | 0.4720 | 2.2552 | | 0.9549 |
| | −0.9820 | 0.2788 | −3.5225 | | 1.0853 |
| | −10.9610 | 149.3405 | −0.0734 | | 0.8361 |

Hence, the LRTs will be compared with

| Estimates and Tests | AQEF (before modifying the data) | VGLM (before modifying the data) | AQEF (after modifying the data) | VGLM (after modifying the data) |
|---|---|---|---|---|
| | −0.2910 | −0.9517 | −0.2917 | −1.0348 |
| | −0.0023 | −0.0006 | −0.0026 | −0.0002 |
| | −0.5336 | −2.8037 | −0.4942 | −0.6708 |
| | 0.0009 | 0.0093 | 0.0009 | 0.0062 |
| | −0.0139 | −0.7867 | −0.0724 | −0.6435 |
| | 0.0015 | 0.0048 | 0.0010 | 0.0032 |
| | −0.1245 | 0.9098 | −0.1340 | 0.6782 |
| | −0.0014 | −0.0016 | −0.0003 | −0.0013 |
| | −0.1180 | −0.1369 | −0.1269 | 0.0400 |
| | −0.0007 | 0.0016 | −0.0006 | 0.0008 |
| | 0.0443 | 2.2313 | 0.0184 | 1.5890 |
| | 0.0006 | −0.0072 | 0.0007 | −0.0048 |
| | 0.0438 | None | 0.0243 | None |
| | 0.0053 | None | 0.0044 | None |
| Scaled Deviance | 248.8728 | 119.2507 | 272.9934 | 134.7810 |
| Log-likelihood | −762.1282 | −738.9422 | −767.4405 | −717.1155 |
| LRT: | 34.6890 | 53.2214 | 31.8918 | 112.5485 |
| LRT: | 2.7690 | 47.3497 | 0.4542 | 143.5384 |
| LRT: | 6.4283 | 57.8120 | 76.6875 | 136.8635 |
| LRT: | 23.9190 | None | 3.8179 | None |
| | 1.0600 | 1.2202 | 1.0235 | 0.9937 |
| | 0.9802 | 2.4209 | 0.9336 | 1.8922 |
| | 1.1670 | 1.2720 | 1.0416 | 0.9494 |
| | 1.1933 | 1.8250 | 1.1978 | 1.0703 |
| | 1.1767 | 1.6050 | 1.1295 | 1.0700 |
| | 0.9434 | 1.7933 | 0.8912 | 0.9673 |
| | 1.8424 | 3.3679 | 1.6760 | 2.0798 |

Hence, the LRTs will be compared with

From

1. The estimates of regression parameters in the two measures are changed.

2. The scaled deviance is increased for the two measures.

3. The estimate of a scalar dispersion parameter,

4. The estimates of values of dispersion parameters,

5. For the VGLM measure, the LRTs reflect significant association between the pairwise outcome variables,

For the AQEF measure, the LRTs also reflect significant association between the pairwise outcome variables,

However, no significant association is observed between the correlated binary outcome variables,

6. The LRT for the third association, which is observed from the AQEF measure, reflects no significant association between the correlated binary outcome variables,

So, when modifying the correlated data, the estimates of dispersion parameters,

For all my professors.

Ahmed Mohamed Mohamed El-Sayed (2016) A New Approach for Dispersion Parameters. Journal of Applied Mathematics and Physics, 04, 1554-1566. doi: 10.4236/jamp.2016.48165