This paper addresses the problem of inference for a multinomial regression model in the presence of likelihood monotonicity. It proposes translating the multinomial regression problem into a conditional logistic regression problem; using existing techniques to reduce this conditional logistic regression problem to one with fewer observations and fewer covariates, in such a way that the probabilities for the canonical sufficient statistic of interest, conditional on the remaining sufficient statistics, are unchanged; and translating this reduced conditional logistic regression problem back to the multinomial regression setting. The reduced multinomial regression problem does not exhibit monotonicity of its likelihood, and so conventional asymptotic techniques can be used.

We consider the problem of inference for a multinomial regression model. The sampling distribution of responses for this model, and, in turn, its likelihood, may be represented exactly by a certain conditional binary regression model.

Some binary regression models and response variable patterns give rise to likelihood functions that have no finite maximizer; instead, there exist one or more contrasts of the parameters such that, as such a contrast is increased to infinity, the likelihood continues to increase. For these models and response patterns, maximum likelihood estimators for the regression parameters do not exist in the conventional sense, and so monotonicity of the likelihood complicates estimation and testing of binary regression parameters. Because of the association between binary regression and multinomial regression, multinomial regression methods inherit this difficulty. In particular, methods like those suggested by [

[

Section 2.1 reviews binary and multinomial regression models, and relations between these models that let one swap back and forth between them. Section 2.2 reviews conditional inference for canonical exponential families. Section 2.3 reviews techniques of [

This section describes existing methods used in cases of likelihood monotonicity in multinomial models, and presents new methods for addressing these challenges.

Methods will be developed in this manuscript to address both multinomial and binary regression models. In this section, relationships between these models are made explicit.

Consider first the multinomial distribution. Suppose that M multinomial trials are observed; for trial

Here

Let

The binary regression model is similar; let

for

where

for

The binary regression model can be recast as a multinomial regression model. Furthermore, the multinomial regression model may be expressed as a conditional binary regression model. Suppose that (1) and (2) hold. Let

are 0. Let

with

The model for

for

When

Then

least

The cumulative probabilities implicit in (9) may be approximated as

for

logarithm of the likelihood (in the multinomial regression case, given by (2)),

and

Similar techniques may be applied to the multinomial regression model, with

This section reviews and clarifies techniques for inference in the presence of monotonicity in the logistic regression likelihood (4) given by [

for

Theorem 1. Suppose that random vectors

with

where

Furthermore, the conditional probabilities are the same as those arising if observations with positive entries in

The matrix

Suppose that there exists a vector

with strict inequality holding in place of at least one of the inequalities. Then the likelihood

Bias-correction is possible for maximum likelihood estimators [

Standard errors may be calculated from the second derivative of the unpenalized log likelihood [

Here the superscript 11 on

The approach using Jeffreys' prior to penalize the likelihood has some disadvantages. The union of all confidence intervals of the form of the penalized estimator plus or minus a multiple of its standard error has finite range, and so the confidence region procedure described above has coverage probability that vanishes for large values of the regression parameter.
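For concreteness, penalization by Jeffreys' prior is commonly implemented through Firth-type modified score equations. The sketch below (not the paper's code; the data are illustrative) fits a penalized logistic regression by Newton iteration and yields a finite estimate even for separated data, for which the ordinary maximum likelihood estimate is infinite.

```python
import numpy as np

def firth_logistic(X, y, n_iter=50, tol=1e-8):
    """Firth-type logistic regression: maximize the likelihood penalized
    by Jeffreys' prior, which yields finite estimates under separation."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        p = 1.0 / (1.0 + np.exp(-eta))
        W = p * (1.0 - p)
        XtWX = X.T @ (W[:, None] * X)
        # Diagonal of the hat matrix H = W^{1/2} X (X'WX)^{-1} X' W^{1/2}.
        h = np.einsum('ij,jk,ik->i', X, np.linalg.inv(XtWX), X) * W
        # Firth's modified score: X'(y - p + h(1/2 - p)).
        score = X.T @ (y - p + h * (0.5 - p))
        step = np.linalg.solve(XtWX, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Perfectly separated data: intercept plus one covariate.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(firth_logistic(X, y))  # finite estimates, positive slope
```

The inverse of the negative penalized Hessian at the converged estimate supplies the standard errors used in the asymptotic intervals discussed above.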

We investigate the behavior of maximum likelihood estimates in the multinomial regression model (2). Maximizers for both the original likelihood and for the likelihood of the distribution of sufficient statistics of interest (6) or (3) conditional on the remaining canonical sufficient statistics are considered. Conditional probabilities arising from the logistic regression model (4) are of form (2), and so may also be handled as below.

Consider the occurrence of infinite estimates for model (2). Denote the sample space for the

Corollary 1. Unique finite maximizers of the likelihood given in (2) exist if and only if

is greater than zero.

Proof. If

Suppose that

given by

Now take

One may determine whether such a c exists by maximizing c over non-negative c and
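This maximization is a linear program. A sketch of such a check (an illustrative implementation using SciPy, not the paper's code) searches for a direction along which the likelihood is monotone; a strictly positive optimum signals that no finite maximum likelihood estimate exists.

```python
import numpy as np
from scipy.optimize import linprog

def separation_direction(X, y):
    """Maximize c subject to (2y_i - 1) x_i' gamma >= c for all i,
    with |gamma_j| <= 1 and 0 <= c <= 1 to keep the program bounded.
    A positive optimal c indicates a direction of likelihood monotonicity."""
    n, p = X.shape
    s = 2.0 * y - 1.0                     # +1 / -1 coding of the response
    cost = np.zeros(p + 1)
    cost[-1] = -1.0                       # linprog minimizes, so minimize -c
    # Constraint rows: -(2y_i - 1) x_i' gamma + c <= 0.
    A_ub = np.hstack([-s[:, None] * X, np.ones((n, 1))])
    b_ub = np.zeros(n)
    bounds = [(-1.0, 1.0)] * p + [(0.0, 1.0)]
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[-1], res.x[:-1]          # optimal c and its direction

# Separated illustrative data: the check reports a positive optimum.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
c_opt, gamma = separation_direction(X, y)
print(c_opt > 0)  # True: no finite maximum likelihood estimate
```

When the optimal c is zero, every direction leaves at least one observation imperfectly fit, and a finite maximizer exists.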

The second corollary follows directly from Theorem 1.

Corollary 2. Suppose that the random vector

If either Corollary 1 or Corollary 2 indicates that finite maximum likelihood estimators do not exist, one might look for estimators in the extended real numbers

[

The following data reflect the results of a randomized clinical trial testing the effectiveness of a screening procedure designed to reduce hepatitis transmission in blood transfusions [

A log-linear model is used, implying a comparison between each of the two hepatitis categories and the third, no-disease, baseline category. Each of these comparisons involves parameters for I, S, T, and

The lack of any Hepatitis C cases among the treated individuals in the early period gives rise to infinite estimates for the T and

In this simple case, closed-form maximizers of (2) under the alternative hypothesis exist, since the alternative hypothesis may be viewed as a saturated model for a

The sampling distribution of the sufficient statistics is available under the hypothesis that all six S, T, and

Hepatitis outcome (response variable) by group:

Group | C | Non-ABC | No disease
---|---|---|---
Time 0 Treated | 0 | 2 | 400
Time 0 Untreated | 5 | 3 | 389
Time 1 Treated | 3 | 10 | 1896
Time 1 Untreated | 5 | 11 | 1864
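Using the counts in the table above, a short computation illustrates how the empty Hepatitis C cell among treated individuals at time 0 drives the corresponding treatment-effect estimate to infinity (the log odds ratio below is an illustration, not the paper's full log-linear fit).

```python
import math

# Time 0 counts from the table above: Hepatitis C versus no disease.
treated_c, treated_none = 0, 400
untreated_c, untreated_none = 5, 389

def log_odds_ratio(a, b, c, d):
    """log[(a/b) / (c/d)]; returns -inf when the first cell is zero."""
    if a == 0:
        return -math.inf
    return math.log((a / b) / (c / d))

lor = log_odds_ratio(treated_c, treated_none, untreated_c, untreated_none)
print(lor)  # -inf: the zero cell forces the estimate to minus infinity
```

Any maximum likelihood fit that includes this comparison inherits the infinite estimate, which is the difficulty the conditional reduction of this paper is designed to remove.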

Inference on the two

A second example concerns polling data related to British general elections [

The null distribution of this data set cannot be trivially expressed as the independence distribution for a contingency table. One might enumerate the conditional sample space for the sufficient statistics vectors of (3), and the associated conditional probabilities [
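Such an enumeration can be sketched by brute force for a small design (illustrative data, not the election polling data): fix the nuisance sufficient statistics at their observed values, enumerate all response vectors consistent with them, and tabulate the counts for each value of the sufficient statistic of interest. Under the null value of the parameter of interest, the conditional probabilities are proportional to these counts.

```python
from itertools import product
from collections import defaultdict

# Tiny logistic design: intercept (nuisance) and one covariate of interest.
x = [-1.0, 0.0, 1.0, 2.0]
y_obs = [0, 0, 1, 1]

t_obs = sum(y_obs)  # nuisance sufficient statistic: sum of y_i

# Enumerate all response vectors with the same nuisance statistic and
# count vectors for each value of the statistic of interest, sum y_i x_i.
counts = defaultdict(int)
for y in product((0, 1), repeat=len(x)):
    if sum(y) == t_obs:
        u = sum(yi * xi for yi, xi in zip(y, x))
        counts[u] += 1

total = sum(counts.values())
cond_probs = {u: c / total for u, c in sorted(counts.items())}
print(cond_probs)
```

In realistic problems this sample space is far too large for direct enumeration, which is why network algorithms or saddlepoint approximations, as referenced in this section, are used instead.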

This manuscript is primarily concerned with producing confidence intervals. One might use the asymptotic intervals calculated using the penalized likelihood (17), or (9), with probabilities calculated exactly using

 | Maximized Posterior | Conditional MLE
---|---|---
Minimum | −0.986 | −∞
Maximum | 0.967 | ∞

 | Asymptotic, based on posterior | Exact
---|---|---
Minimum | 0 | 0.950

Sufficient Statistic Value | Number of Corresponding Response Vectors | Conditional Probability
---|---|---
73 | 8673 | 0.03214
74 | 9009 | 0.03339
75 | 1335 | 0.00495
76 | 26,208 | 0.09712
77 | 62,412 | 0.23129
78 | 22,440 | 0.08316
79 | 15,120 | 0.05603
80 | 65,484 | 0.24268
81 | 59,160 | 0.21924

Confidence intervals by interval type:

Sufficient Statistic Value | (17) | Exact | Saddlepoint
---|---|---|---
73 | (−1.934, 0.118) | (−∞, −0.086) | (−∞, −0.242)
74 | (−1.508, 0.134) | (−2.996, 0.052) | (−8.668, 0.122)
75 | (−1.264, 0.204) | (−1.478, 0.068) | (−3.420, 0.400)
76 | (−1.092, 0.296) | (−1.436, 0.310) | (−2.246, 0.712)
77 | (−0.958, 0.408) | (−1.180, 0.674) | (−1.624, 1.118)
78 | (−0.848, 0.548) | (−0.784, 0.816) | (−1.216, 1.712)
79 | (−0.760, 0.728) | (−0.666, 1.010) | (−0.910, 2.850)
80 | (−0.694, 0.990) | (−0.616, 3.056) | (−0.632, 8.042)
81 | (−0.696, 1.492) | (−0.406, ∞) | (−0.216, ∞)

As noted above, coverage for the asymptotic interval is zero for

fails to cover some values of

In practice,

One can also consider simultaneous inference on both education parameters.

This paper presents an algorithm for converting a multinomial regression problem that features nuisance parameters estimated at infinity to a similar problem in which all nuisance parameters have finite estimates; this conversion is such that the distribution of a sufficient statistic associated with the parameter of interest, conditional on all other sufficient statistics, remains unchanged. These conditional probabilities in the reduced model may be approximated using standard asymptotic techniques to yield confidence intervals with coverage behavior superior to those that arise from, for example, asymptotics derived from the likelihood after penalizing using Jeffreys’ prior.

This research was supported in part by NSF grant DMS 0906569.

Kolassa, J.E. (2016) Inference in the Presence of Likelihood Monotonicity for Polytomous and Logistic Regression. Advances in Pure Mathematics, 6, 331-341. doi: 10.4236/apm.2016.65024