On Expressing the Probabilities of Categorical Responses as Linear Functions of Covariates

Logistic regression is usually used to model probabilities of categorical responses as functions of covariates. However, the link connecting the probabilities to the covariates is non-linear. We show in this paper that when the cross-classification of all the covariates and the dependent variable has no empty cells, the probabilities of responses can be expressed as linear functions of the covariates. We demonstrate this for both dichotomous and polytomous dependent variables.


Introduction
The probability of a dichotomous response is usually modelled as a function of covariates using the following:
$$\Pr(Y = 1 \mid x_1, \ldots, x_p) = \frac{\exp(\alpha + \beta_1 x_1 + \cdots + \beta_p x_p)}{1 + \exp(\alpha + \beta_1 x_1 + \cdots + \beta_p x_p)}.$$
A feature of the above formulation is that the quantity on the right-hand side of the above equation is a fraction, and so the rule that probabilities have to lie in the interval [0, 1] is not violated, assuming the estimates of $\alpha, \beta_1, \ldots, \beta_p$ exist. In this paper, we are interested in the following question: under what conditions can we express the probabilities of a response $Y$ with possible values $1, \ldots, q$ as the following:
$$\Pr(Y = k \mid x_1, \ldots, x_p) = \alpha_k + \beta_{k1} x_1 + \cdots + \beta_{kp} x_p, \quad k = 1, \ldots, q - 1, \tag{1.1}$$
$$\Pr(Y = q \mid x_1, \ldots, x_p) = 1 - \sum_{k=1}^{q-1} \left( \alpha_k + \beta_{k1} x_1 + \cdots + \beta_{kp} x_p \right), \tag{1.2}$$
so that the quantities on the left-hand side of the above equations indeed lie in the interval [0, 1] once the estimates of the unknown parameters are known to be finite. We show in the remainder of the paper that the above linear formulation will yield estimates of probabilities lying in [0, 1] if the cross-classification of all the covariates and the dependent variable has no empty cells.
In Section 2, we formulate the problem and prove our main result. In Section 3, we work out a detailed example wherein the dependent variable is dichotomous. In Section 4, we work out a detailed example wherein the dependent variable is ordinally polytomous. In Section 5, we present a conjecture regarding the least-squares estimation of the parameters in our model. In Section 6, we end the paper with concluding remarks.
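As a minimal sketch of the contrast drawn in this section, the following Python snippet evaluates the logistic and the linear formulations side by side. The function names and the numeric parameter values are illustrative assumptions and are not taken from the paper. The logistic form always returns a value strictly between 0 and 1, whereas the linear form does so only for suitable parameter values.

```python
import math

def logistic_prob(alpha, betas, x):
    """Logistic formulation: exp(eta) / (1 + exp(eta)), always in (0, 1)."""
    eta = alpha + sum(b * xi for b, xi in zip(betas, x))
    return math.exp(eta) / (1.0 + math.exp(eta))

def linear_prob(alpha, betas, x):
    """Linear formulation: the probability is the linear predictor itself."""
    return alpha + sum(b * xi for b, xi in zip(betas, x))

# Assumed parameter values and two binary covariates, for illustration only.
alpha, betas = 0.2, (0.5, 0.4)
for x in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x, round(logistic_prob(alpha, betas, x), 4),
          round(linear_prob(alpha, betas, x), 4))
```

With these assumed values the linear predictor exceeds 1 when both covariates equal 1, which is exactly the situation the paper's conditions are meant to rule out for the estimated parameters.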

Problem Formulation and the Main Result
Let $Y$ be a categorical variable with possible values $1, \ldots, q$. $Y$ may be a dichotomous random variable, a nominal polytomous random variable, or an ordinal polytomous random variable. The covariates, $x_1, \ldots, x_p$, may be categorical or continuous. Let $L$ denote the likelihood specified using (1.1) and (1.2), and let $\log L$ denote the corresponding log-likelihood. Then we have the following result:
Theorem 1: Suppose that the cross-classification of the data $(y_j; x_{j1}, \ldots, x_{jp})$, $j = 1, \ldots, n$, has no empty cells. If the mle's obtained by specifying the likelihood using (1.1) and (1.2) exist, then the estimates of probabilities of the response given the covariates are constrained to lie in the interval (0, 1).
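Proof: Consider the likelihood $L$ as a function of the parameters in (1.1) and (1.2). The maximum of $L$ over the parameter space is either finitely positive or it is positive infinity. Suppose that the maximum of $L$ is finitely positive. Then the maximization of $L$ must yield the same parameter values as the maximization of $\log L$. Consider the parameter estimates obtained by maximizing $\log L$. Then note that for any observation and any response category, the estimated probability in (1.1) cannot be less than or equal to 0, as that would mean that $L \le 0$ and hence that $\log L$ is undefined. Similarly, for any observation and any response category, the estimated probability in (1.1) cannot be greater than or equal to 1, because then again $L \le 0$ and hence $\log L$ would be undefined. Furthermore, note that if for some observation the estimated probabilities in (1.1) summed to 1 or more, the estimated probability in (1.2) would not be positive and $\log L$ would again be undefined. Thus all the estimates of the probabilities in (1.1) and (1.2) are constrained to lie in the interval (0, 1).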
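The argument above rests on the log-likelihood being undefined whenever a fitted probability leaves (0, 1). A minimal sketch of that log-likelihood for a dichotomous response follows. The function name `linear_prob_loglik` and the grouped-data layout are assumptions made for illustration, and the sketch presumes that every cell count is positive, as the theorem requires.

```python
import math

def linear_prob_loglik(params, cells):
    """Log-likelihood of a linear-probability model for a dichotomous response.

    params: (alpha, beta_1, ..., beta_p)
    cells:  list of (x, n_event, n_nonevent), with x a tuple of covariate
            values and both counts assumed positive (no empty cells).
    Returns -inf when any fitted probability falls outside (0, 1), since
    log(p) or log(1 - p) is then undefined.
    """
    alpha, betas = params[0], params[1:]
    total = 0.0
    for x, n_event, n_nonevent in cells:
        p = alpha + sum(b * xi for b, xi in zip(betas, x))
        if p <= 0.0 or p >= 1.0:
            return -math.inf
        total += n_event * math.log(p) + n_nonevent * math.log(1.0 - p)
    return total

# Hypothetical grouped data with one binary covariate.
cells = [((0,), 3, 7), ((1,), 6, 4)]
print(linear_prob_loglik((0.3, 0.3), cells))   # finite: both fitted values in (0, 1)
print(linear_prob_loglik((0.3, 0.8), cells))   # -inf: fitted value 1.1 for x = 1
```

Because every maximizer must have a finite log-likelihood, any parameter value that returns `-inf` here cannot be a maximum-likelihood estimate.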

Detailed Example: Dichotomous Response
Consider the data in Table 1. The data come from a study on coronary artery disease and are reported in [1].
The question of interest is whether gender and electrocardiogram (ECG) measurement have an effect on disease status.
We wish to estimate the parameters $\alpha$, $\beta_1$ and $\beta_2$ of the linear model for the probability of disease, and check whether the estimated probabilities lie in the interval (0, 1). We wish to use the Newton-Raphson method for the purpose of estimation. To use the Newton-Raphson method, we need good starting estimates. As starting estimates, we use the estimates provided by least-squares estimation of the following linear model:
$$Y = \alpha + \beta_1\,\mathrm{SEX} + \beta_2\,\mathrm{ECG} + \varepsilon,$$
where $Y$ is the indicator of disease. The least-squares estimates are $\hat{\alpha} = 0.23563$ together with the corresponding estimates of $\beta_1$ and $\beta_2$. We use these as starting estimates of $\alpha$, $\beta_1$ and $\beta_2$, respectively. We stop the Newton-Raphson algorithm when the absolute difference of successive iterates is less than a fixed tolerance for all three parameters. Using this criterion, the Newton-Raphson algorithm converges.
Note that we can now witness the effect of the covariates on the disease status. For example, as SEX goes from 0 to 1, the probability of being diseased goes up. Similarly, as ECG status goes from 0 to 1, the probability of being diseased goes up. The estimated probabilities, using our method and the least-squares method, are given in Table 2.
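The estimation procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's implementation: the grouped counts are assumed for the example rather than copied from Table 1, the tolerance `1e-8` is an assumed value, and the function names are hypothetical. The least-squares step solves the normal equations for the disease indicator, and the Newton-Raphson step uses the gradient and Hessian of the log-likelihood in which the probability of disease is the linear predictor itself.

```python
import numpy as np

# Assumed grouped counts for illustration: (SEX, ECG, diseased, not diseased).
cells = [(0, 0, 4, 11), (0, 1, 8, 10), (1, 0, 9, 9), (1, 1, 21, 6)]

Z = np.array([[1.0, s, e] for s, e, _, _ in cells])  # columns: intercept, SEX, ECG
d = np.array([c[2] for c in cells], dtype=float)     # diseased counts
m = np.array([c[3] for c in cells], dtype=float)     # non-diseased counts

def least_squares_start(Z, d, m):
    """Ordinary least squares for the 0/1 disease indicator, from grouped counts."""
    n = d + m
    return np.linalg.solve(Z.T @ (n[:, None] * Z), Z.T @ d)

def newton_raphson(Z, d, m, theta, tol=1e-8, max_iter=100):
    """Maximize sum(d*log(p) + m*log(1-p)) with p = Z @ theta."""
    for _ in range(max_iter):
        p = Z @ theta
        if np.any(p <= 0.0) or np.any(p >= 1.0):
            raise ValueError("fitted probability left (0, 1); log-likelihood undefined")
        grad = Z.T @ (d / p - m / (1.0 - p))
        weights = d / p**2 + m / (1.0 - p) ** 2
        hess = -(Z.T @ (weights[:, None] * Z))
        theta_new = theta - np.linalg.solve(hess, grad)
        # Stop when every parameter changes by less than the tolerance.
        if np.max(np.abs(theta_new - theta)) < tol:
            return theta_new
        theta = theta_new
    raise RuntimeError("Newton-Raphson did not converge")

start = least_squares_start(Z, d, m)
theta_hat = newton_raphson(Z, d, m, start)
fitted = Z @ theta_hat
print("least-squares start:", start)
print("maximum-likelihood estimates:", theta_hat)
print("fitted probabilities:", fitted)
```

The explicit range check inside the loop mirrors the role of the interval (0, 1) in Theorem 1: an iterate whose fitted probabilities leave that interval has no defined log-likelihood, so the sketch stops rather than continuing from it.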
Note that the least-squares estimates of the probabilities are obtained by substituting the least-squares parameter estimates into the linear model above. Notice that all the estimates of probabilities in Table 2 lie in the interval (0, 1). Also notice the striking similarity between the estimates using our method and the corresponding estimates using the least-squares method. However, it seems difficult to prove a least-squares analogue of Theorem 1.
Now we turn our attention to goodness of fit. The two traditional goodness-of-fit statistics are Pearson's chi-square and the likelihood-ratio chi-square, namely, $Q_P$ and $Q_L$, respectively. The latter statistic is also known as the deviance.
The goodness-of-fit statistics thus indicate that the above model fits the data reasonably well. It must be noted that there are sample-size guidelines to be followed in order to ensure that Pearson's and the likelihood-ratio statistics approximately follow the chi-square distribution. These guidelines are mentioned in [1].
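For readers who want to reproduce goodness-of-fit computations of this kind, here is a minimal Python sketch of Pearson's chi-square and the likelihood-ratio chi-square (deviance) for grouped dichotomous data. The function names, the counts and the fitted probabilities are assumptions for illustration and are not the values reported in the paper. The p-value helper covers only the case of one degree of freedom, where the chi-square tail probability has a closed form through the complementary error function.

```python
import math

def goodness_of_fit(counts, probs):
    """Pearson chi-square Q_P and likelihood-ratio chi-square Q_L (deviance).

    counts: list of (events, non_events) per subpopulation, all positive.
    probs:  fitted event probabilities per subpopulation, each in (0, 1).
    """
    q_p = 0.0
    q_l = 0.0
    for (events, non_events), p in zip(counts, probs):
        n = events + non_events
        for observed, expected in ((events, n * p), (non_events, n * (1.0 - p))):
            q_p += (observed - expected) ** 2 / expected
            q_l += 2.0 * observed * math.log(observed / expected)
    return q_p, q_l

def chi2_pvalue_1df(q):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(q, 0.0) / 2.0))

# Assumed counts and assumed fitted probabilities, for illustration only.
counts = [(4, 11), (8, 10), (9, 9), (21, 6)]
probs = [0.24, 0.47, 0.52, 0.76]
q_p, q_l = goodness_of_fit(counts, probs)
print("Q_P =", q_p, "p-value =", chi2_pvalue_1df(q_p))
print("Q_L =", q_l, "p-value =", chi2_pvalue_1df(q_l))
```

Large p-values from either statistic are read as support for the fitted model, subject to the sample-size guidelines mentioned above.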

Detailed Example: Polytomous Response
Logistic regression is defined in terms of a dichotomous response variable. Therefore, for a polytomous response, one has to form cumulative logits in the case of an ordinal response, and generalized logits in the case of a nominal response. Thus, logistic regression is indirectly applied. However, the application of our model is direct in the sense that the possibility of a polytomous response is already accounted for. We illustrate with the following example.
Consider the following data in Table 4. The data is reported in [1] and concerns an arthritis study wherein males and females were administered either a drug or a placebo and their response (improvement) was measured as being one of "marked", "some" or "none".
The data in Table 4 does not meet the requirements of Theorem 1 since there is one zero count in the cross-classification. Since our purpose here is to illustrate our model and the estimation of model parameters, we will consider the fictional data set obtained by replacing the zero count with a count of 1. The fictional data is presented in Table 5. There are no zero counts in the cross-classification in Table 5. We model the probabilities of the response categories as linear functions of sex and treatment.
To estimate the model parameters, we specify the log-likelihood and apply the Newton-Raphson algorithm. Once again, we use least-squares estimates as starting values, obtained from two linear models. Among the resulting estimates is the value 0.13885. Note, again, that from the preceding estimates we can directly assess the effect of covariates on the probability of improvement. The estimated probabilities are given in Table 6.
Note that, once again, the probabilities in Table 6 lie in the interval (0, 1). Also, once again, note the similarity between the estimated probabilities obtained using our method and the ones obtained using the least-squares method. To take into account the ordinality in the response, read the probabilities across the rows in Table 6. The response levels are correlated with the row probabilities. Note that for any treatment, active or placebo, males respond more poorly than females. As expected, both males and females respond better to active treatment than to placebo in the sense that, for both sexes, the probability of some or marked improvement goes up with active treatment. The least-squares estimates of probabilities were obtained by substituting the least-squares parameter estimates into the two linear models.
The goodness-of-fit tests are conducted as in Section 3, except that the number of degrees of freedom for $Q_P$ and $Q_L$ is different. The goodness-of-fit statistics and their respective p-values are given in Table 7. So both Pearson's chi-square and the deviance statistics seem to support model-fit.
The response in this example is ordinal, so the question arises whether an analogue of the proportional-odds model can be defined. It can be defined. The problem with such a model is that the resulting likelihood is multi-modal, and no good starting estimates for the Newton-Raphson algorithm are available. Indeed, the author found that with some starting estimates, the resulting probabilities lay outside the interval [0, 1]. More research is needed on this front.
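The least-squares step for a three-category response can be sketched in Python as below. Everything specific here is an assumption made for illustration: the counts are not copied from Table 5, the choice to fit linear models for "marked" and "some" and to obtain "none" by subtraction is one possible reading of the two linear models, and the variable names are hypothetical. The sketch regresses the indicator of each of the two chosen categories on sex and treatment and then checks whether every estimated probability lies in [0, 1].

```python
import numpy as np

# Assumed counts for illustration: (SEX, TREATMENT, marked, some, none).
cells = [(0, 0, 6, 7, 19), (0, 1, 16, 5, 6), (1, 0, 1, 1, 10), (1, 1, 5, 2, 7)]

Z = np.array([[1.0, s, t] for s, t, *_ in cells])      # intercept, SEX, TREATMENT
counts = np.array([c[2:] for c in cells], dtype=float)  # one row per subpopulation
n = counts.sum(axis=1)                                   # subpopulation sizes

# Normal equations for the indicator regressions of "marked" and "some".
gram = Z.T @ (n[:, None] * Z)
theta = np.linalg.solve(gram, Z.T @ counts[:, :2])       # one column per category

# Fitted probabilities; the third category is obtained by subtraction.
p_first_two = Z @ theta
probs = np.column_stack([p_first_two, 1.0 - p_first_two.sum(axis=1)])

print(np.round(probs, 4))
print("all in [0, 1]:", bool(np.all((probs >= 0.0) & (probs <= 1.0))))
```

A check of this kind only reports what happens for one particular table. It does not by itself establish how least-squares estimates behave in general.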

A Conjecture Regarding the Least-Squares Estimates
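We saw in the preceding examples that the least-squares estimates of probabilities of responses lay in the interval [0, 1] when the cross-classification of the covariates and the responses contained no empty cells. The author believes that this is not a coincidence, but is unable to prove it. So we offer the following conjecture:
Conjecture: Suppose that the cross-classification of the covariates and the response has no empty cells, and let $\hat{\alpha}_k, \hat{\beta}_{k1}, \ldots, \hat{\beta}_{kp}$, $k = 1, \ldots, q - 1$, be the resulting estimators obtained using ordinary least-squares. Then the following estimates of probabilities lie in the interval [0, 1]:
$$\hat{\alpha}_k + \hat{\beta}_{k1} x_1 + \cdots + \hat{\beta}_{kp} x_p, \quad k = 1, \ldots, q - 1, \quad \text{and} \quad 1 - \sum_{k=1}^{q-1} \left( \hat{\alpha}_k + \hat{\beta}_{k1} x_1 + \cdots + \hat{\beta}_{kp} x_p \right).$$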

Concluding Remarks
In this article, we demonstrated that probability estimates lying in the interval [0, 1] can be obtained if the probabilities themselves are modelled as linear functions of covariates, provided that the cross-classification of the covariates and the response has no empty cells. The main advantage of this formulation is that effects of covariates on the probabilities can be directly measured, unlike in logistic regression, where the link function is non-linear.
The emphasis of this article is on estimation. However, hypothesis testing using the maximum-likelihood and least-squares estimates can be done routinely, as is discussed extensively in the literature. See, for example, [2,3]. Also, the data sets we have considered in this paper are complete. When data are missing at random, one may multiply impute the data sets, say, m times, and then combine the m estimates to yield a single estimate. See [4] for more details.
Our method does have its limitations. For example, when one of the covariates is continuous, there are likely to be several empty cells in the cross-classification. Consequently, our method will usually be applicable when the covariates as well as the response are categorical. Another limitation seems to be that the analogue of the proportional-odds model is not straightforward to implement. Also, both maximum-likelihood estimation and least-squares estimation find their utility when the underlying sample sizes are relatively large. For smaller sample sizes, one has to develop exact methods, which will be the subject of one of the author's future articles.

For the model of Section 3, there are four subpopulations and three parameters, giving us $4 - 3 = 1$ degree of freedom for each of Pearson's and the likelihood-ratio statistics. The values of $Q_P$ and $Q_L$ and the respective p-values are given in Table 3.

Table 6. Estimates of probabilities.
 

Table 7. Goodness-of-fit statistics and their respective p-values.