Properties of the Maximum Likelihood Estimates and Bias Reduction for the Logistic Regression Model
1. Introduction
Logistic regression methods are often used in the statistical analysis of dichotomous outcome variables; logistic regression is a commonly applied procedure for describing the relationship between a binary outcome variable and a set of covariates. The standard method of estimating the logistic regression parameters is maximum likelihood (ML). In a very general sense, the ML method yields the values of the unknown parameters that maximize the probability of the observed data. A common problem with the ML method is non-convergence, which occurs when the maximum likelihood estimates (MLE) do not exist. Assessing the behaviour of the MLE for the logistic regression model is important, as the logistic model is widely used in medical statistics. Much of the work on the logistic regression model addresses the convergence problem, as in [1], or bias reduction, as in [2] [3]. For assumptions and details concerning the distribution of the coefficients estimated by the ML approach and the bias reduction technique, and for applications and the effects of sample size, see [4] [5]. However, the behaviour and properties of bias correction methods are less well investigated. Recent work uses the bias correction technique proposed by [2] to guarantee that the estimates exist. The present paper evaluates the behaviour and properties of the bias reduction method using simulated data with different sample sizes and parameter values. The next section describes the logistic regression model and its fitting. Section 3 discusses the ML convergence problem. Section 4 applies the modified score function to the logistic regression model and illustrates a special case of the modified function, giving the two equations that are used to estimate the parameters. Section 5 investigates the asymptotic properties of the logistic regression model, comparing the parameters estimated by the ML method and by the bias reduction technique on simulated data. The discussion, conclusions and some general remarks about the results are in Section 6.
2. The Logistic Regression Model
The goal of a logistic regression analysis is to find the best fitting model to describe the relationship between an outcome and covariates where the outcome is dichotomous. [6] considered the logistic regression model as a member of the class of generalized linear models. For more details of the logistic model see [7] [8] [9] and also [10] [11] [12].
The Model
Suppose now $Y_i \sim \mathrm{Binomial}(n_i, \pi_i)$, $i = 1, \ldots, n$, where $Y_i$ is a response variable. Suppose that the probabilities $\pi_i$ are related to a collection of covariates $x_i = (x_{i1}, \ldots, x_{ip})^T$ according to the equation

$\log\left(\dfrac{\pi_i}{1-\pi_i}\right) = x_i^T \beta$  (1)

We consider the special case $n_i = 1$, so $Y_i \sim \mathrm{Bernoulli}(\pi_i)$, where $\pi_i$ is the probability of success for each $i$. We also define $\eta_i = x_i^T \beta$, so that

$\pi_i = \dfrac{e^{\eta_i}}{1 + e^{\eta_i}}$  (2)

and

$\eta_i = \log\left(\dfrac{\pi_i}{1-\pi_i}\right)$  (3)

Here $g(\pi) = \log\{\pi/(1-\pi)\}$ is called the logit link function and $\eta_i$ is the linear predictor.

There are some other link functions which can also be used instead of the logit link function, such as the probit link function $g(\pi) = \Phi^{-1}(\pi)$ and the complementary log-log link function $g(\pi) = \log\{-\log(1-\pi)\}$.
Fitting the Model
The logistic model, when $Y_i \sim \mathrm{Bernoulli}(\pi_i)$ with $\pi_i = e^{\eta_i}/(1+e^{\eta_i})$, can be fitted using the method of maximum likelihood to estimate the parameters. The first step is to construct the likelihood function, which is a function of the unknown parameters; we then choose those values of the parameters that maximize this function. The probability function of the model is

$P(Y_i = y_i) = \pi_i^{y_i}(1-\pi_i)^{1-y_i}, \quad y_i = 0, 1,$  (4)

where the likelihood function is

$L(\beta) = \prod_{i=1}^n P(Y_i = y_i)$  (5)

Since the observations are independent, the likelihood function is as follows:

$L(\beta) = \prod_{i=1}^n \pi_i^{y_i}(1-\pi_i)^{1-y_i}$  (6)

The maximum likelihood estimate of $\beta$ is the value which maximizes the likelihood function. In general the log-likelihood function is easier to work with mathematically and is:

$\ell(\beta) = \sum_{i=1}^n \left[ y_i \log \pi_i + (1-y_i)\log(1-\pi_i) \right]$  (7)
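For concreteness, the following is a minimal Python sketch (an illustration, not the paper's code) of the log-likelihood (7), written via the algebraically equivalent form $\ell(\beta) = \sum_i \{y_i \eta_i - \log(1+e^{\eta_i})\}$:

```python
import numpy as np

def log_likelihood(beta, X, y):
    """Log-likelihood (7) for the Bernoulli logistic model.

    beta: parameter vector (p,); X: n x p design matrix; y: 0/1 responses.
    """
    eta = X @ beta                                # linear predictor
    # np.logaddexp(0, eta) is a numerically stable log(1 + exp(eta))
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))
```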
2.1. Special Case of the Logistic Model with Two Covariates
In this case the logistic regression model has two covariates, thus $p = 2$, with one covariate being the general mean (intercept). So we have $x_i = (1, x_i)^T$ and $\beta = (\beta_0, \beta_1)^T$, such that

$\eta_i = \beta_0 + \beta_1 x_i$  (8)

where $x_i$ is now a scalar covariate and

$\pi_i = \dfrac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}}$  (9)

Therefore we can write the log-likelihood function as:

$\ell(\beta_0, \beta_1) = \sum_{i=1}^n \left[ y_i(\beta_0 + \beta_1 x_i) - \log\left(1 + e^{\beta_0 + \beta_1 x_i}\right) \right]$  (10)

To estimate the values of $\beta_0$ and $\beta_1$ we differentiate $\ell$ with respect to $\beta_0$ and $\beta_1$ respectively:

$\dfrac{\partial \ell}{\partial \beta_0} = \sum_{i=1}^n (y_i - \pi_i)$  (11)

$\dfrac{\partial \ell}{\partial \beta_1} = \sum_{i=1}^n x_i (y_i - \pi_i)$  (12)

Now we set $\partial\ell/\partial\beta_0 = 0$ and $\partial\ell/\partial\beta_1 = 0$, and so the maximum likelihood estimates of $\beta_0$ and $\beta_1$ are the solutions of the following equations

$\sum_{i=1}^n y_i = \sum_{i=1}^n \pi_i$  (13)

and

$\sum_{i=1}^n x_i y_i = \sum_{i=1}^n x_i \pi_i,$  (14)

and will be denoted as $\hat\beta_0$ and $\hat\beta_1$. We know that for logistic regression the last two equations are non-linear in $\beta_0$ and $\beta_1$, and we need to use a numerical method for their solution, such as the Newton-Raphson method.
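As an illustration of this step, here is a minimal Python/NumPy sketch (not the code used in this paper; the simulated data and parameter values are for demonstration only) of the Newton-Raphson iteration for solving Equations (13) and (14):

```python
import numpy as np

def fit_logistic_mle(x, y, tol=1e-10, max_iter=50):
    """Newton-Raphson fit of the two-parameter model (8)-(9)."""
    X = np.column_stack([np.ones_like(x), x])    # design matrix rows (1, x_i)
    beta = np.zeros(2)
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))   # Equation (9)
        U = X.T @ (y - pi)                       # score, Equations (11)-(12)
        W = pi * (1.0 - pi)
        info = X.T @ (X * W[:, None])            # Fisher information, cf. (16)
        step = np.linalg.solve(info, U)
        beta = beta + step
        if np.max(np.abs(step)) < tol:           # (13)-(14) solved to tolerance
            break
    return beta

# demonstration on simulated data with beta_0 = -1, beta_1 = 0.5
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-1.0 + 0.5 * x))))
print(fit_logistic_mle(x, y))
```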
2.2. The Asymptotic Distribution of the MLE
The estimated parameters $\hat\beta$ have an asymptotic distribution which is given by $\hat\beta \sim N\left(\beta, I(\beta)^{-1}\right)$, where $I(\beta)$ is Fisher's information matrix, defined as

$I(\beta) = -E\left[\dfrac{\partial^2 \ell(\beta)}{\partial \beta \, \partial \beta^T}\right]$  (15)

where the matrix is evaluated at the MLE. For logistic regression the estimated Fisher information matrix can be written as

$I(\hat\beta) = X^T \hat{W} X$  (16)

where $\hat{W} = \operatorname{diag}\{\hat\pi_i(1-\hat\pi_i)\}$ and $X$ is the design matrix. The variance of $\hat\beta$ is then approximated by $\operatorname{Var}(\hat\beta) \approx I(\hat\beta)^{-1}$.
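A short sketch of this approximation (reusing the hypothetical two-covariate setup from the previous snippet):

```python
import numpy as np

def mle_covariance(x, y, beta_hat):
    """Approximate Var(beta_hat) by inverting the estimated information (16)."""
    X = np.column_stack([np.ones_like(x), x])
    pi = 1.0 / (1.0 + np.exp(-(X @ beta_hat)))
    W = pi * (1.0 - pi)
    return np.linalg.inv(X.T @ (X * W[:, None]))   # I(beta_hat)^{-1}

# standard errors: se = np.sqrt(np.diag(mle_covariance(x, y, beta_hat)))
```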
3. Maximum Likelihood Convergence Problems
A problem occurs in estimating logistic regression models when the maximum likelihood estimates do not exist and one or more components of $\hat\beta$ are infinite. One case in which this problem occurs is when all of the observations have the same response. For example, suppose that $\eta_i = \beta_0 + \beta_1 x_i$ and that all of the response variables equal zero, i.e., $y_i = 0$ for $i = 1, \ldots, n$. In this case the log-likelihood function is

$\ell(\beta_0, \beta_1) = -\sum_{i=1}^n \log\left(1 + e^{\beta_0 + \beta_1 x_i}\right)$  (17)

Now differentiating $\ell$ with respect to $\beta_0$ and $\beta_1$ respectively and setting the derivatives equal to zero gives

$\sum_{i=1}^n \pi_i = 0$  (18)

and

$\sum_{i=1}^n x_i \pi_i = 0$  (19)

The first equation has no solution, because it is a sum of positive quantities and so cannot equal zero. To make this sum approach zero we need to make $\beta_0$ large and negative, i.e., tend to $-\infty$. However, if precisely one of the response variables equals 1, the resulting maximum likelihood equations become

$\sum_{i=1}^n \pi_i = 1$  (20)

$\sum_{i=1}^n x_i \pi_i = x_k,$  (21)

where we have assumed the observations are numbered such that the single observation with $y_i = 1$ is observation $k$. Here the maximum likelihood estimates exist and convergence of the MLE is achieved, because the two equations now set sums of positive quantities equal to positive values. Considering the first equation: if the parameter is large and positive, the sum is larger than one, while if it is large and negative, the sum is smaller than one and will not satisfy the equation; hence we can find finite estimates of the parameters which satisfy the equation.
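The non-existence case is easy to reproduce numerically. In the sketch below (an illustration under an arbitrary design, not from the paper), every response is zero, so Equation (18) has no finite root and the Newton-Raphson intercept simply drifts towards $-\infty$:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 20)
y = np.zeros_like(x)                        # all responses equal zero
X = np.column_stack([np.ones_like(x), x])
beta = np.zeros(2)
for it in range(12):
    pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
    W = pi * (1.0 - pi)
    step = np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (y - pi))
    beta = beta + step
    # the intercept decreases by roughly one unit per iteration, never converging
    print(it, beta[0])
```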
4. Modified Score Function
Firth [2] proposed a method to reduce the bias of the MLE; the maximum likelihood convergence problem does not arise with the modified score function. The idea extends two standard approaches that have been extensively studied in the literature. The first is the computationally intensive jackknife method proposed by [13] [14]. The second approach simply substitutes the MLE $\hat\beta$ for the unknown $\beta$ in the bias function $b(\beta)$. The difficulty, as discussed for small data sets, is that it is not uncommon for $\hat\beta$ to be infinite in some samples of logistic regression models [15] [16]. We know that the maximum likelihood approach depends on the derivative of the log-likelihood function, the MLE being a solution of the score equation

$U(\beta) \equiv \dfrac{\partial \ell(\beta)}{\partial \beta} = 0$  (22)

[2] proposed that instead we solve $U^*(\beta) = 0$, where the appropriate modification to $U(\beta)$ is:

$U^*(\beta) = U(\beta) - I(\beta)\, b(\beta)$  (23)

and the expected value of $\hat\beta$, as proposed by [3], is given by:

$E(\hat\beta) = \beta + b(\beta) + O(n^{-2}),$  (24)

where $b(\beta)$ is the first-order bias term, of order $O(n^{-1})$. The variance of the resulting estimator $\hat\beta^*$ is approximated by $\{I(\hat\beta^*)\}^{-1}$.
4.1. Modified Function with Logistic Regression Model
In this part we will apply the modified score function to the simple logistic regression model. We know that the bias vector is given in the form $b(\beta) = (X^T W X)^{-1} X^T \xi$, as proposed by [17]. Here $\xi$ has $i$th element $\xi_i = h_i\left(\pi_i - \tfrac{1}{2}\right)$, and $h_i$ is the $i$th diagonal element of the hat matrix $H = W^{1/2} X (X^T W X)^{-1} X^T W^{1/2}$, where $W = \operatorname{diag}\{\pi_i(1-\pi_i)\}$ and $X$ is the design matrix. Then the modified score function is written as

$U^*(\beta) = \sum_{i=1}^n \left\{ y_i - \pi_i + h_i\left(\tfrac{1}{2} - \pi_i\right) \right\} x_i$  (25)

In this case, the modified score function $U^*(\beta) = 0$ gives two equations,

$\sum_{i=1}^n \left\{ y_i - \pi_i + h_i\left(\tfrac{1}{2} - \pi_i\right) \right\} = 0$  (26)

and

$\sum_{i=1}^n x_i \left\{ y_i - \pi_i + h_i\left(\tfrac{1}{2} - \pi_i\right) \right\} = 0$  (27)

These are used to estimate the parameters.
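A minimal Python sketch of solving (26)-(27) numerically follows (this is a standard Firth-type iteration using $I(\beta)$ as an approximate Jacobian of the modified score; it is not the authors' own implementation):

```python
import numpy as np

def fit_logistic_firth(x, y, tol=1e-10, max_iter=100):
    """Bias-reduced fit: Newton-type iteration on the modified score (25)."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
        W = pi * (1.0 - pi)
        XtWX = X.T @ (X * W[:, None])                  # I(beta), Equation (16)
        # h_i: diagonal of H = W^{1/2} X (X^T W X)^{-1} X^T W^{1/2}
        h = W * np.einsum('ij,jk,ik->i', X, np.linalg.inv(XtWX), X)
        U_star = X.T @ (y - pi + h * (0.5 - pi))       # modified score (25)
        step = np.linalg.solve(XtWX, U_star)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```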
4.2. Special Case of Modified Function
For further evaluation, we will discuss the behaviour of the adjusted score function when all the observations have the same response, i.e., $y_i = 0$ for all $i$. As a special case, suppose we have one explanatory variable $x$ taking values 0 or 1. Before we calculate the adjusted score function, we first calculate the form of $h_i$, which we obtain from $H = W^{1/2} X (X^T W X)^{-1} X^T W^{1/2}$. Here $h_i$ is the $i$th diagonal element of the $H$ matrix and is

$h_i = w_i\, x_i^T (X^T W X)^{-1} x_i,$  (28)

where $\pi_0 = e^{\beta_0}/(1+e^{\beta_0})$, $\pi_1 = e^{\beta_0+\beta_1}/(1+e^{\beta_0+\beta_1})$, $w_0 = \pi_0(1-\pi_0)$ and $w_1 = \pi_1(1-\pi_1)$, and where $n_0$ and $n_1$ are the numbers of observations with $x$ equal to 0 and 1 respectively. Hence

$h_i = \dfrac{1}{n_0} \quad \text{for } x_i = 0$  (29)

and

$h_i = \dfrac{1}{n_1} \quad \text{for } x_i = 1$  (30)

Therefore, when we set the adjusted score function $U^*(\beta) = 0$ with $y_i = 0$ for all $i$, we have

$\sum_{i=1}^n \left\{ -\pi_i + h_i\left(\tfrac{1}{2} - \pi_i\right) \right\} = 0, \qquad \sum_{i=1}^n x_i \left\{ -\pi_i + h_i\left(\tfrac{1}{2} - \pi_i\right) \right\} = 0$  (31)

This gives

$n_0\left\{ -\pi_0 + \tfrac{1}{n_0}\left(\tfrac{1}{2} - \pi_0\right) \right\} + n_1\left\{ -\pi_1 + \tfrac{1}{n_1}\left(\tfrac{1}{2} - \pi_1\right) \right\} = 0$  (32)

and

$n_1\left\{ -\pi_1 + \tfrac{1}{n_1}\left(\tfrac{1}{2} - \pi_1\right) \right\} = 0$  (33)

Now, solving (33),

$\pi_1 = \dfrac{1}{2(n_1+1)},$  (34)

and so, substituting (33) into (32),

$-n_0 \pi_0 + \tfrac{1}{2} - \pi_0 = 0,$  (35)

we get

$\pi_0 = \dfrac{1}{2(n_0+1)}$  (36)

Before calculating $\hat\beta_0$ and $\hat\beta_1$, we can consider the following way to express $\beta_0$ and $\beta_1$ in terms of $\pi_0$ and $\pi_1$. Let $\eta_0 = \beta_0$ and $\eta_1 = \beta_0 + \beta_1$. Then $\pi_0 = e^{\eta_0}/(1+e^{\eta_0})$, $\pi_1 = e^{\eta_1}/(1+e^{\eta_1})$ and $\beta_1 = \eta_1 - \eta_0$, so we can write $\beta_1$ as

$\beta_1 = \log\left(\dfrac{\pi_1}{1-\pi_1}\right) - \log\left(\dfrac{\pi_0}{1-\pi_0}\right)$  (37)

Therefore, $\hat\beta_0$ and $\hat\beta_1$ can be written as

$\hat\beta_0 = \log\left(\dfrac{\pi_0}{1-\pi_0}\right)$  (38)

and

$\hat\beta_1 = \log\left(\dfrac{\pi_1(1-\pi_0)}{\pi_0(1-\pi_1)}\right)$  (39)

Then, substituting (34) and (36), we obtain

$\hat\beta_0 = -\log(2n_0+1)$  (40)

and

$\hat\beta_1 = \log\left(\dfrac{2n_0+1}{2n_1+1}\right)$  (41)

As a result of this example, with one binary covariate and $y_i = 0$ for all $i$, we can say that the estimates of the parameters are finite: the modified function works well and the problem of convergence does not arise.
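This closed form is easy to verify numerically; the sketch below (assuming the hypothetical `fit_logistic_firth` from the previous snippet, with illustrative group sizes) checks (40) and (41):

```python
import numpy as np

n0, n1 = 12, 8
x = np.concatenate([np.zeros(n0), np.ones(n1)])
y = np.zeros(n0 + n1)                          # every response equals zero

print(fit_logistic_firth(x, y))                # finite, despite all y_i = 0
print(-np.log(2 * n0 + 1),                     # closed-form beta_0, Equation (40)
      np.log((2 * n0 + 1) / (2 * n1 + 1)))     # closed-form beta_1, Equation (41)
```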
5. Simulation Study
The following discussion presents the simulation plan and the designs used in generating the data, to identify the effect of sample size and proportion of events (the percentage of $y = 1$ or $y = 0$) on the estimation of the parameters. We will examine the precision of the estimation by calculating the variance of the parameters obtained by simulation for the two approaches, MLE and Firth, and compare these with $I(\beta)^{-1}$ evaluated at the known values of $\beta$. The simulation study is designed as follows (a sketch of the resulting simulation loop is given after this list):

1) Three sample sizes have been used: $n = 40$, $n = 120$ and $n = 500$.

2) For each sample size we choose the covariate values $x_i$ as draws from a fixed distribution. The $x$ variables are fixed at these values throughout the simulation.

3) We choose $\beta_0$ and $\beta_1$ to give three cases: choose $\beta_1$ and adjust $\beta_0$ so that, averaged over the covariates, the proportion of $y = 1$ is approximately (a) 0.5, (b) 0.1, (c) 0.05.

4) For each sample size and set of parameter values we perform 100,000 simulations.

5) Two approaches are used to estimate the parameters: MLE and Firth's bias-reduced estimator.
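The sketch below outlines one cell of this design in Python (the covariate distribution, parameter values and reduced replicate count are illustrative assumptions, not the paper's exact design; `fit_logistic_mle` and `fit_logistic_firth` are the hypothetical helpers sketched earlier):

```python
import numpy as np

rng = np.random.default_rng(2024)
n, n_sim = 120, 1000                     # the paper uses 100,000 replicates
x = rng.normal(size=n)                   # fixed covariate values (illustrative)
beta0, beta1 = -2.0, 1.0                 # adjust beta0 to set the y = 1 rate
pi = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))

mle, firth = [], []
for _ in range(n_sim):
    y = rng.binomial(1, pi)
    if 0 < y.sum() < n:                  # crude screen: drops all-0/all-1 samples
        mle.append(fit_logistic_mle(x, y))    # (separation can still occur)
    firth.append(fit_logistic_firth(x, y))    # Firth fit exists in every sample

print("MLE   variances:", np.var(np.asarray(mle), axis=0))
print("Firth variances:", np.var(np.asarray(firth), axis=0))
```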
5.1. Results and Discussion of Sample Size n = 500
The simulation reports the accuracy of the estimation of $\beta$ using the information matrix. We calculate $\operatorname{Var}(\hat\beta_0)$ and $\operatorname{Var}(\hat\beta_1)$ from the simulated values of $\hat\beta_0$ and $\hat\beta_1$, and also by evaluating $I(\beta)^{-1}$ at the known values of $\beta$. The results are in Table 1, which shows that all three cases of the proportion of $y = 1$ achieved convergence of the likelihood maximization algorithm.

As can be seen in Table 1, the columns labelled Sim give the variances of the parameters estimated by the MLE and by Firth's method respectively, and Ratio L and Ratio F denote the corresponding variance ratios for the MLE and Firth's method. The results showed that the variance of the parameters calculated from the simulation and the variance calculated by evaluating the information matrix at the known values of $\beta$ are almost the same. We note that the ratio in the first case, when the proportion of $y = 1$ is 0.5, is very close to one, but in the second and third cases the ratio is slightly larger than in the first case.

The variances of the parameters calculated by Firth's method were smaller than those calculated by MLE, and the ratio in general was close to 1. Moreover, the bias ($\hat\beta - \beta$) was smaller.
5.2. Results and Discussion of Sample Size n = 120
In this part we use the same approach as in the previous case with $n = 500$; the results of the simulation are shown in Table 2.
Table 1. Results of 100,000 simulations with sample size n = 500 and (0.5, 0.1, 0.05) proportion of y = 1: (a) the variances of the parameters estimated by MLE and Firth; (b) the bias values.
Maximum likelihood convergence problems occurred when the proportion of $y = 1$ was 0.05. Note that there are many situations in which the likelihood function has no maximum, in which case we say that the maximum likelihood estimate does not exist. In the simulation, which generates the data set 100,000 times, the coefficients in some cases tend to infinity in the final iterations, so that no estimates of $\beta_0$ and $\beta_1$ are obtained and the algorithm has not converged. In our summaries we record the cases in which the algorithm did not converge.
Here, for only 99,806 (99.8%) of the data sets was it possible to obtain finite, converged estimates of $\beta_0$ and $\beta_1$. Moreover, the variance of the estimates $\hat\beta_0$ and $\hat\beta_1$ is large. This is because, even though convergence is achieved when the proportion of $y = 1$ is 0.05, there are some very large negative values of $\hat\beta_0$. In the other two cases of the proportion of $y = 1$ we achieved ML convergence in every simulation. We note that the ratio is nearly one but is a bit higher than in the case of $n = 500$. Firth's approach showed reasonable results: all cases achieved convergence. Moreover, the ratio was better than for the MLE approach, as was the bias $\hat\beta - \beta$.
5.3. Results and Discussion of Sample Size n = 40
We used the same analysis as in the previous cases, now with $n = 40$. As can be seen in Table 3, the results showed that the MLE approach had convergence problems:
Table 2. Results of 100,000 simulations with sample size n = 120 and (0.5, 0.1, 0.05) proportion of y = 1: (a) the variances of the parameters estimated by MLE and Firth; (b) the bias values.
Table 3. Results of 100,000 simulations with sample size n = 40 and (0.5, 0.1, 0.05) proportion of y = 1: (a) the variances of the parameters estimated by MLE and Firth; (b) the bias values.
98,273 (98%) and 85,967 (86%) of the data sets achieved ML convergence when the proportion of $y = 1$ was 0.1 and 0.05, respectively. Convergence was achieved in every simulation only in the case where the proportion of $y = 1$ was 0.5; there the ratio was close to one, though a little higher than in the previous cases. Moreover, we found the same problem as discussed in the case of $n = 120$, in that the variance of the estimates $\hat\beta_0$ and $\hat\beta_1$ is large. However, when we use Firth's approach, all data sets achieved convergence. Moreover, the ratio was better than for the MLE approach, and the bias $\hat\beta - \beta$ was smaller.
6. Conclusion
Attention has been directed in this work to determining the behaviour of the asymptotic estimation of the parameters by two methods, the MLE and the bias reduction technique, compared with the results from the information matrix. Where the convergence problem arises, the modified score function showed appropriate behaviour, indicating that the bias may be removed from the MLE by subtracting the bias term. The asymptotic variance of the MLE can behave strangely: the results showed that the variances of the parameters were large in some cases even though convergence was achieved, which is due to some very large negative values of $\hat\beta_0$, as shown in the results section. We can report that small sample sizes and the value of the proportion of $y = 1$ affect the behaviour of the parameter estimation when using the MLE. Clearly, we found convergence problems for some combinations of sample size and proportion of $y = 1$. Firth's approach produced satisfactory results, in that the data sets for all combinations of sample size and proportion of $y = 1$ achieved convergence. Overall, we can conclude that the bias reduction technique worked well and behaved reasonably in almost all cases which were investigated. Moreover, the convergence problem is not the only issue affecting the behaviour of the MLE: even when convergence is achieved, the variances of the parameter estimates can be large.