Log-Link Regression Models for Ordinal Responses

The adjacent-categories, continuation-ratio and proportional odds logit-link regression models provide useful extensions of the multinomial logistic model to ordinal response data. We propose fitting these models with a logarithmic link to allow estimation of different forms of the risk ratio. Each of the resulting ordinal response log-link models is a constrained version of the log multinomial model, the log-link counterpart of the multinomial logistic model. These models can be estimated using software that allows the user to specify the log likelihood as the objective function to be maximized and to impose constraints on the parameter estimates. In example data with a dichotomous covariate, the unconstrained models produced valid coefficient estimates and standard errors, and the constrained models produced plausible results. Models with a single continuous covariate performed well in data simulations, with low bias and mean squared error on average and appropriate confidence interval coverage in admissible solutions. In an application to real data, practical aspects of the fitting of the models are investigated. We conclude that it is feasible to obtain adjusted estimates of the risk ratio for ordinal outcome data.


Introduction
Several logit-link regression models have been proposed to deal with ordered categorical response data. Three of these are the adjacent categories model [1], the continuation-ratio model [2], and the cumulative odds model [3]. The last is referred to also as the proportional odds model [4]. The basis of each of these models is the discrete choice model [5] for nominal categorical outcomes that are also termed the multinomial logistic regression model [6].
The purpose of this paper is to investigate the practicality of fitting the ordinal models with a logarithmic link in place of the logit link. We refer to the resulting models as the adjacent categories (AC) probability model, the continuation-ratio (CR) probability model, and the proportional probability (PP) model. Each is a constrained form of the log multinomial model [7], the log-link counterpart of the multinomial logistic model. The ordinal log-link models make it possible to directly estimate different but related forms of the risk ratio in prospective studies and the prevalence ratio in cross-sectional studies, overcoming thereby a limitation of logit-link models.
Epidemiological research is grounded largely in assessment of average risk, and in that field the worth of the odds ratio as a measure of effect has long been questioned [8,9] particularly for prospective [10] and crosssectional [11] data.
To describe the log-link models for ordinal data, we have adapted specialist terminology used for ordinal logistic models. Several authors [6,12,13] have distinguished "forwards" and "backwards" versions of the CR logit-link model, with the outcome categories taken in reverse order in the "backwards" version. For proportional odds models, O'Connell [14] distinguished between an "ascending" version for lower-ordered categories versus higher categories, and a "descending" version for higher-ordered categories versus lower categories. Accordingly, we distinguish "forwards-ascending" and "forwards-descending" versions of the AC probability model and the PP model. The two versions of each model produce coefficients that differ both in sign and magnitude. For the CR probability model, it is necessary to additionally distinguish "backwards-ascending" and "backwardsdescending" versions because the four possible versions each produce a different set of estimates. For brevity, we 1 [15] used these data to demonstrate logit-link ordinal regression models.
For the example data, the log multinomial model for the final 1 2 J   outcomes involves estimation of the joint probability of the "Mild" outcome among all subjects, and the joint probability The results of estimating the model are shown at left in panel A of  Table 2, and the estimated standard errors are

Forwards-Descending AC Probability Model
For the example data, the ratio constraint is The particular assumption of the AC probability model is that the joint probabilities have a response to covariates that is log-linear in the coefficients and a multiple of category order. The forwards-descending AC probability model is: for and , and where the superscript r denotes a constrained estimate, and the intercepts for when, in general, these ratios are unequal and differ for each covariate holding. 3, , j 

Assessment of Loss of Model Fit
Three equivalent large-sample methods [17] are available to assess whether the constraints result in significant loss of model fit. Each has an asymptotic distribution that is chi-squared with degrees of freedom equal to the number of additional constraints imposed. Firstly, a test may be based on the likelihood ratio criterion [18]. Secondly, a test may be based on the efficient scores criterion [19]. Thirdly, a test may be based on the criterion proposed by Wald [20]. For a fully-constrained AC probability model in which the constraint is applied to each covariate, the null hypothesis for these tests is that for . It is possible to apply the constraints to a subset of the covariates in a partially-constrained model [21], and to test a composite hypothesis. Readers are referred to Rao [17] for details.
The score and Wald statistics are identical in the first four decimal places. The results suggest that the effect of exposure has been summarized into a single relative risk without significant loss of model fit.

Forwards-Descending Conditional Model
Consider ordinal outcome data arising from a hierarchical sequence of binary events in which the probability at each stage is conditional on having reached that stage. McCullagh and Nelder [22] provide an example of insemination of milch cows, and propose a model of the probability of impregnation of a cow at the attempt after unsuccessful attempts. This is the probability of terminating at the stage conditional on having reached that stage. O'Connell [14] provides an example of childhood proficiency in literacy in which mastery of any level of literacy requires mastery of all previous levels, and interest lies in the probability of success at a higher level. This is the probability of advancing beyond the Assume that the conditional probabilities ij  depend also upon the observed values of the covariates, and have the exponential form are parameters to be estimated. The forwards-descending conditional probability model is: The linear predictor is: The likelihood and log likelihood of the data under this model are given in Supplementary Materials. The model can be fitted with software that provides a procedure for maximising the log likelihood with respect to its An alternative approach to fitting the model is also described in Supplementary Materials.
For the example data, the forwards-descending conditional probability model involves estimating the conditional probability of a "Mild" or "Severe" outcomes among all subjects, and the conditional probability of the "Severe" outcome among subjects with a "Mild" or "Severe" outcome. The component tables for the model are shown in panel B of Table 2.
The results of estimating the model are shown at left in panel B of

RR
. Those estimates and their estimated standard errors can be verified from the data in Table 2.

Forwards-Descending CR Probability Model
The particular assumption of CR probability models is that the conditional probabilities have a response to covariates that is independent of the outcome category and log-linear in the coefficients.
The forwards-descending CR probability model is:

Forwards-Descending Cumulative Probability Model
One further way of partitioning the ordered response categories is to form cumulative "splits" of the outcome data between category 1 and all higher categories, category 1 or 2 and all higher categories, and so on. This prompts a model of the cumulative probability of falling in category j or a higher category: The forwards-descending cumulative probability model is: for and and where 1, 2, , , and hence . The linear predictor is: The likelihood and log likelihood of the data under this model are given in Supplementary Materials. The model can be fitted with software that provides a procedure for maximising the log likelihood with respect to the    

Forwards-Descending PP Model
The particular assumption of the PP model is that the cumulative probabilities have a response to covariates that is independent of the outcome category and log-linear in the coefficients. The forwards-descending PP model is: where the intercepts 0 c j  and common slope coefficient are unique. This model can be estimated by fitting the cumulative probability model (5)

Three Properties of a Fully-Constrained Cumulative Probability Model
Firstly, the constrained value of the largest fitted cumulative probability at each set of covariate values, is identical to its unconstrained value . Comparing solutions to the constrained and unconstrained models in panel C of Table 3, we see that and 21 21 , and hence that . This remarkable result is a property of a fully-constrained cumulative probability model.
Secondly, the PP model (6) can be fitted as a log multinomial model (1) estimated subject to an equality constraint on the coefficients of each covariate (see Supple-

 
from the log multinomial model. The PP model forces these joint probabilities to be equal.
Thirdly, in a fully-constrained model, the fitted probabilities To confirm that the three ordinal log-link regression for where and denote distinct covariate holdings. These ratio equalities justify the description of the fully-constrained cumulative probability model as a proportional probability (PP) model. 3

Starting Values
A careful choice of starting values is required for the cumulative probability models. This is a consequence of the need to evaluate terms of the form  in the log likelihood. As starting values we use the estimates from the common-slopes log multinomial model, with intercept values derived from the recursive relation given previously.

Non-Admissable Solutions
Solutions to a probability model are not admissible if any of the fitted probabilities exceed unity and if the sum of the fitted probabilities exceeds unity. The functional form of the three ordinal models does not constrain the fitted probabilities to take admissible values, and hence non-admissible solutions can occur in the extremes of covariate space if the model-based probabilities are very high.

Non-Convergence
The log likelihoods for each model contain terms that cannot be evaluated if the fitted values of the probabilities, or their sums in the AC model, reach or exceed unity. This may prevent convergence to a solution. For modelbased simulated data, the simple expedient of requiring evaluation of a term in the log likelihood only if its coefficient is non-zero eliminates almost all problems with convergence. Converged but non-admissible solutions are produced instead. We take advantage of this to study non-admissible solutions in the next section.

Simulation Study
models produce estimates that match their theoretical values on average and without excessive variation, and with estimated standard errors that provide appropriate confidence interval coverage, we repeatedly generated data from each probability model and fitted the ordinal models to each dataset. The simulations involved a single Each of t oduced estimates with minimal bias and mean squared error on average, and with 95 percent confidence interval coverage that was close to the nominal 95 percent. The summary model fit statistics were similar for each model, and the minor differences did not impart a clear advantage for any single model. Type 1 error rates were close to 5 percent but, particularly for the CR probability model, with a tendency to reject slightly too often. At larger values of the uniform covariate, for which the theoretical values from three probability models are most different, power to reject an incorrect model was moder-ate for each of the AC probability model and the PP model, and higher for the CR probability model. Bias and mean squared error were smaller, error rates on average were closer to 5 percent, and power in each setting was greater, in the larger datasets   500 n  than in the smaller datasets   200 n  . The com preceding ments refer to solutions for which th

Application to Real Data
on 189 births to e fitted values are admissible for a probability model. In non-admissible solutions that were deliberately engineered by allowing the uniform covariate to take large values, model fit was less satisfactory because bias was relatively high and confidence interval coverage was greatly reduced, and the tendency for each model to reject too often a correctly fitted model was more pronounced. tegories, we ch J = 4 categorizations of birth weight (2500 g, 2501 -3000 g, 3001 -3500 g and >3500 g). The principal study factor for this analysis is maternal smoking status during pregnancy. The data on birth weight category and maternal smoking status are shown in Table 4.
In respect of the ordering of outcome ca ose to retain the natural ordering of the birth weight categories from lightest   1 j  to heaviest   4 j  rather than reverse them as er and Lemesh did, and to fit forwards-descending models with the first category as the identified category. For log-link models, these decisions are important because reversing the ordering of the outcome categories and/or modeling them in ascending order results in different models of the data.
In respect of choice of modeling approach, the CR Hosm ow [6] probability model assumes the birth weight categories are the outcomes of a sequence of binary events when they arose from arbitrary categorization of an underlying continuous response variable. The AC and PP models provide poorer descriptions of the data in Table 4, however.
The relative risk estimates are  2 1.13 RR  ,  3 0.91 RR  and  4 0.49 RR  for the 2501 -3000 g, 3001 -3500 g and >3500 g categories respectively. They suggest that the impact of maternal smoking is restricted mainly to the highest birth weight category, but the AC probability model requires the response to covariates to increase monotonically and multiplicatively with birth weight category whilst the PP model constrains it to be identical in each outcome category.
The results of fitting the three models to the birth weight data are summarized in Table 5. Estimation results are provided for a binary (0 = no, 1 = yes) covariate for maternal smoking (smoker). The common relative risk increase is  0.88 AC RR  , the summary relative conditional risk is  0.81 CR RR  , and the summary relative cumulative risk is  0.
PP RR  80 . Also shown in Table 5 are the results of adjusting for confounding factors including maternal age (age). In these models, only the coefficients of smoker are constrained. The adjustment has increased the estimated summary effect of maternal smoking in each model.   Adjacent categories probability ded covariates are linear predictors for maternal the mother at the last enotes P < 0.05; † model; ‡ Ad age (age) and weight of menstrual period (lwt), two binary indicators for race (1 = white, 2 = black, 3 = other), and binary indicators for presence of uterine irritability (ui) and history of premature labour (ptd); § Continuation-ratio probability model; ¶ Proportional probability model. the pronounced impact of maternal smoking in the high-advantage of doing so in simulated data is that the estiest birth weight category, the AC probability model estimate is less extreme that those of the CR model, and the CR probability model estimate is less extreme than that of the PP model. To simplify matters for demonstration purposes, non-linearity in the scale of the continuous covariates (particularly age) and statistical interaction between covariates (including between smoker and age) have been ignored. The resultant models had 6 or more covariates including continuous variables, and produced converged and admissible solutions. Non-convergence was an occasional problem only, and occurred with very complex models that involved higher-powered polynomials of continuous variables and/or product terms for statistical interaction.
The likelihood ratio tests reveal that the constraints imposed to produce the summary estimates have resulted in substantially greater loss of model fit for the AC and PP models than for the CR model. If the subset   189 n  of data analyzed here is representative of all d ted in the birth weight study, the loss of model fit would be statistically significant if the full dataset was two (AC probability model), three (PP model) or 20 (CR probability model) times larger in size.

ata collec
In this paper, we proposed log-link alternatives to three logistic regression models for ordinal data, and demonstrated that it is possible to estimate these models with software that allows the user to specify the log likelihood as the objective function to be maximized and to impose constraints on the parameter estimates. In example data with a dichotomous covariate, the unconstrained models produced relative risk estimates and estimated standard errors that could be verified from the data, and the constrained estimates were credible approximations to the unconstrained estimates. Models with a single continuous covariate performed well in data simulations, with low bias and mean squared error on average and appropriate confidence interval coverage in admissible solutions. Finally, the models were successfully fitted to a complex real-world dataset with an ordinal outcome for which one of the models provided a questionable theoretical explanation but a superior practical description.
When the model-based probabilities were high, issues in fitting the ordinal response log-link regression models arose because the fitted probabilities are not restricted to values less than unity. The estimation algorithm may fail to converge or, if it does converge, the fitted values may not be admissible for a probability model. It is possible to write the algorithm in a manner that makes non-convergence a rare occurrence in model-based simulated data (less than 1 per 10,000 replications in our settings). The mation algorithm converges to a non-admissible solution, and this makes it feasible to study them. We found that the estimated coefficients of the inadmissible solutions had much greater bias on average, with estimated standard errors that were too small, and 95% confidence interval coverage that was poorer in consequence. These findings caution against relying on the results of nonadmissible solutions.
Each of the three ordinal response log-link models we considered is a constrained version of the log multinomial model. The constraints impose a simple linear relation on the slope coefficients, and it is possible to test whether the unconstrained model provides a better description of the data. We found that likelihood ratio, score and Wald tests of the constraints each had close to the correct type I error rates though they tended to be slightly conservative. The power of tests for each model to reject fits to data from any of the other ordinal response log-link models increased with the size of the maximum model-based probability, and was moderate to high when the maximum model-based probability was high.
In our estimation procedure, the user has flexibility to place linear constraints on the coefficients of all, or some, of the covariates. This makes it possible to fit "partiallyconstrained" models in which the linear constraint is applied to a subset only of the explanatory variables [21]. In the analyses of the birth weight data, which focused on the relationship between maternal smoking and birth weight outcome, it served no purpose to place restrictions on the other covariates, and to have constrained the coefficients of maternal age would have significantly reduced model fit.
The ordered birth weight data provided opportunity to consider the choice of model, investigate the merits of the fitted models, and probe the interpretation of the estimates in a real world setting. The modeling assumptions underlying the CR probability model were untenable for an outcome variable that arose from categorization of birth weight, because the birth weight categories did not truly arise as a hierarchy of binary events, but otherwise there was much to recommend this model. The constraint imposed in fitting the CR probability model produced far less loss of model fit than the constraints imposed in the other two models, and the summary estimate-a 24 percent reduction in the conditional probability of higher birth weight outcomes for the infants of mothers who smoked during pregnancy-was interpretable and plausible. Reasonably in the circumstances, it fell between the estimates from the AC and PP models.
The limitations of this work should be kept in mind. Firstly, the results are based on a simple data example for a dichotomous covariate, a single set of data simulations