1. Introduction
Logistic regression is a statistical method for modeling the relationship between a binary dependent variable and one or more explanatory variables, and it is widely applied across many fields. The parameters of a statistical model are usually estimated by the maximum likelihood estimator (MLE), obtained by optimizing the likelihood function. In logistic regression models, the MLE is used to estimate the coefficients of the predictor variables that best fit the data. Unfortunately, this approach is not robust to unusual observations. Several robust estimators have been proposed as alternatives to the MLE to address this issue. The MLE in logistic regression was shown to be very sensitive to outlying data by [1], who also devised a diagnostic assessment of outlying observations; see also [2]. For binary regression, [3] studied several M-estimators whose estimates downweight leverage points, a Mallows-type estimator. [4] created a robust estimate for the logistic regression model based on a modified median estimator, and they also investigated a Wald-type test statistic. [5] created highly robust projection estimators for the GLM, but their computation is quite difficult. [6] proposed a quasi-likelihood estimator by substituting the least absolute deviation estimator (L1 norm) for the least squares estimator (L2 norm) in the definition of quasi-likelihood. [7] suggested robust estimators and testing techniques for Poisson and binomial models based on the idea of the quasi-likelihood estimator created by [8]. The breakdown of the MLE in the logistic model was investigated in [9], whereas [10] offered a robust method for the logistic regression model. [11] created a new robust technique for Poisson regression. [12] provided reliable estimators for generalized linear models; the fundamental concept is to transform the response via a variance-stabilizing transformation before estimation.
[13] introduced an estimator and showed that it is consistent and reliable. [14] provides a reliable and efficient approach for computing the M-estimator described in [13]. [15] presented a fast technique for the generalized linear model based on breakdown points of the trimmed likelihood. Fisher-consistent estimators are another kind of robust estimator, as introduced in [13]. [16] investigated a resistant robust estimator whose estimation relies on the misclassification model. [17] presented a new family of robust methods for logistic regression. [18] compared minimum distance approaches to more robust methods, derived a unique weighted likelihood, and applied it to Poisson and binary regression. The optimally bounded scoring functions described by [19] for linear models were applied to the logistic model in [20].
All of these estimators differ significantly in their resistance to outliers and their efficiency under the model. In this paper, we conduct a comprehensive investigation of the behavior of some of these estimators, both in terms of their asymptotic properties and their behavior in finite samples. Our findings indicate that the Mallows-type estimator proposed by [3] is very robust to outlier contamination but inefficient under the model, while the Schweppe-type estimators proposed by [2] are very efficient under the model but show poor outlier resistance. We propose an estimator that can be as robust as Mallows estimators under contamination but is much more efficient under the model; this is achieved by an adaptive continuous weight. This continuous weighted maximum likelihood estimator depends on a nuisance parameter estimator derived from Kolmogorov-Smirnov statistics. The maximum likelihood estimator and the logistic regression model are covered in Section 2. Robust methods for the logistic regression model are discussed in Section 3. We propose a robust technique for logistic regression in Section 4. Section 5 displays the findings of the Monte Carlo simulation study and real data analyses. Section 6 contains the conclusions.
2. Logistic Regression Model and ML Estimator
The logistic regression model is a popular technique used to examine the relationship between a categorical response variable and one or more predictor variables. It is often used in binary classification problems, where the dependent variable has only two potential outcomes (e.g., true or false, yes or no). The logistic function is the foundation of the logistic regression model; it transforms a continuous input into a probability value between 0 and 1. Assume a random sample of observations $(x_i, y_i)$, $i = 1, \dots, n$, where $x_i = (x_{i1}, \dots, x_{ip})^T$ represents $p$ predictor variables and $y_i \in \{0, 1\}$ is a binary variable, and suppose the probability of a positive response $\pi_i = P(y_i = 1 \mid x_i)$ is associated with the covariates through the relation $\mathrm{logit}(\pi_i) = x_i^T \beta$, such that $\mathrm{logit}(\pi) = \log\big(\pi/(1-\pi)\big)$, known as the logit link function, converts probabilities in $(0, 1)$ to values in the range $(-\infty, \infty)$.
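As a minimal illustration (our own code, not taken from the paper), the logit link and its inverse, the logistic function, can be written as:

```python
import math

def logit(p):
    """Logit link: maps a probability in (0, 1) to the whole real line."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Inverse logit (logistic function): maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# The two functions are inverses of each other:
p = 0.8
assert abs(inv_logit(logit(p)) - p) < 1e-12
```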
Using the logit link function, the multiple logistic regression model may be written as

$$\pi_i = \frac{\exp(x_i^T \beta)}{1 + \exp(x_i^T \beta)}, \qquad i = 1, \dots, n, \qquad (1)$$

where $x_i = (x_{i1}, \dots, x_{ip})^T$ are the predictor variable values and $\beta = (\beta_1, \dots, \beta_p)^T$ represents an unknown parameter vector. We may characterize the binary regression model as $\pi_i = F(\eta_i)$, where $\eta_i = x_i^T \beta$ is the linear predictor and $F$ is sometimes referred to as the transformation function, with $F(\eta) = e^{\eta}/(1 + e^{\eta})$. The MLE is a method for estimating the parameters
of a statistical model by maximizing the likelihood function. The MLE assumes that the data are generated by a specific probability distribution (in this case, a Bernoulli distribution with logistic link) and finds the parameter values under which the observed data are most likely. The MLE is often used in logistic regression because it provides consistent estimates of the model parameters and has good statistical properties. Assume that the dependent variables $y_1, \dots, y_n$ have a Bernoulli distribution; the probability distribution for the $i$th observation is

$$f(y_i) = \pi_i^{y_i} (1 - \pi_i)^{1 - y_i}, \qquad y_i \in \{0, 1\},$$

so each observation $y_i$ takes the value 1 with probability $\pi_i$ or the value 0 with probability $(1 - \pi_i)$. The likelihood function is defined as

$$L(\beta) = \prod_{i=1}^{n} \pi_i^{y_i} (1 - \pi_i)^{1 - y_i}. \qquad (2)$$
Then we compute the log-likelihood of the preceding formula:

$$\ell(\beta) = \sum_{i=1}^{n} \big[ y_i \log \pi_i + (1 - y_i) \log(1 - \pi_i) \big],$$

where $\log\big(\pi_i/(1 - \pi_i)\big) = x_i^T \beta$ and $\log(1 - \pi_i) = -\log\big(1 + \exp(x_i^T \beta)\big)$. So we can express the log-likelihood as

$$\ell(\beta) = \sum_{i=1}^{n} \big[ y_i x_i^T \beta - \log\big(1 + \exp(x_i^T \beta)\big) \big]. \qquad (3)$$
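The log-likelihood in Equation (3) translates directly into code; the following is a small sketch with our own variable names:

```python
import math

def log_likelihood(beta, X, y):
    """Logistic log-likelihood, Eq. (3): sum of y_i * eta_i - log(1 + exp(eta_i))."""
    ll = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))   # linear predictor x_i^T beta
        ll += yi * eta - math.log1p(math.exp(eta))   # log1p for numerical stability
        # note: for very large |eta| a fully stable version would branch on the sign
    return ll

# Small check: at beta = 0 every pi_i equals 1/2, so the value is -n * log(2)
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0]]
y = [1, 0, 1]
print(log_likelihood([0.0, 0.0], X, y))  # -3 * log(2), approximately -2.0794
```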
In experimental designs, we take repeated observations at each level of the independent variables $x$. Let $n_i$ be the number of trials at the $i$th predictor level and $y_i$ be the number of 1's observed at the $i$th level, with $\sum_{i} n_i = n$. Therefore, we may express the log-likelihood as

$$\ell(\beta) = \sum_{i=1}^{k} \big[ y_i x_i^T \beta - n_i \log\big(1 + \exp(x_i^T \beta)\big) \big]. \qquad (4)$$
The likelihood function is maximized by differentiating it with respect to $\beta$ and setting the derivative to zero:

$$\frac{\partial \ell(\beta)}{\partial \beta} = \sum_{i=1}^{k} \big( y_i - n_i \pi_i \big)\, x_i = 0,$$

where $\pi_i = \exp(x_i^T \beta)/\big(1 + \exp(x_i^T \beta)\big)$. Writing $\bar{y}_i = y_i / n_i$ for the average of the binomial variable, the preceding formula may be written in matrix form as $X^T N (\bar{y} - \pi) = 0$, where $X$ is the $k \times p$ design matrix, $N = \mathrm{diag}(n_1, \dots, n_k)$, $\bar{y} = (\bar{y}_1, \dots, \bar{y}_k)^T$ and $\pi = (\pi_1, \dots, \pi_k)^T$. Hence, the MLE is normally computed by solving the score equation:

$$s(\beta) = X^T N \big( \bar{y} - \pi(\beta) \big) = 0. \qquad (5)$$
Since Equation (5) is a nonlinear function of $\beta$, the iteratively reweighted least squares (IRLS) technique may be used to find a solution.
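A minimal IRLS sketch for the two-parameter case (our own code; a production implementation would use a general linear solver and a convergence criterion rather than a fixed iteration count):

```python
import math

def irls_logistic(X, y, iters=25):
    """Fit a 2-parameter logistic regression by iteratively reweighted least
    squares (IRLS), i.e. Newton-Raphson on the score equation (5).

    X: rows [1.0, x_i] (intercept plus one covariate); y: 0/1 responses.
    Each step solves (X^T W X) d = X^T (y - pi) with W = diag(pi * (1 - pi)).
    """
    beta = [0.0, 0.0]
    for _ in range(iters):
        g = [0.0, 0.0]                       # score vector X^T (y - pi)
        H = [[0.0, 0.0], [0.0, 0.0]]         # information matrix X^T W X
        for xi, yi in zip(X, y):
            eta = beta[0] * xi[0] + beta[1] * xi[1]
            pi = 1.0 / (1.0 + math.exp(-eta))
            w = pi * (1.0 - pi)
            for j in range(2):
                g[j] += (yi - pi) * xi[j]
                for k in range(2):
                    H[j][k] += w * xi[j] * xi[k]
        # Solve the 2x2 system H d = g by Cramer's rule
        det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
        beta[0] += (H[1][1] * g[0] - H[0][1] * g[1]) / det
        beta[1] += (H[0][0] * g[1] - H[1][0] * g[0]) / det
    return beta

# Non-separable toy data, so the MLE exists and is finite
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 0, 1]
beta_hat = irls_logistic(X, y)
```

At convergence the score vector vanishes, which is a convenient way to check the fit.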
3. Robust Methods for Logistic Regression Model
In logistic regression models, robust estimators are statistical methods for estimating parameters that are less sensitive to outliers and influential observations. These techniques are designed to provide reliable estimates of the regression coefficients even when the data contain extreme values or other anomalies that can distort the results. Several robust estimators can be used in logistic regression. [13] suggested robust estimators that control the deviances to obtain theoretically unbiased estimators; however, an extra bias-correction component must be introduced, which makes the calculation of their estimator very complex and the estimator itself not straightforward. [2] developed Mallows-type estimators by separately downweighting the covariates and the residuals in the estimating equation. In the case of logistic regression, these estimators are characterized as solutions of

$$\sum_{i=1}^{n} w(x_i)\,\Big[ \psi\big(y_i - \pi_i(\beta)\big) - c(x_i, \beta) \Big]\, x_i = 0, \qquad (6)$$
where $\tau = (\mu, \Sigma)$ represents the nuisance parameters (location and scatter estimates of the covariates), $\psi$ is often taken to be Huber's function

$$\psi_c(r) = \min\big(c, \max(-c, r)\big),$$

$c(x_i, \beta)$ is the bias-correction function that restores Fisher consistency, and the weights $w(x_i)$ often depend only on the continuous covariates. Suppose we write $x_i = (u_i^T, v_i^T)^T$, where $u_i$ are qualitative variables and $v_i$ are the continuous variables; the weights then usually take the form

$$w(x_i) = m\!\left( \frac{(v_i - \hat{\mu})^T \hat{\Sigma}^{-1} (v_i - \hat{\mu})}{t} \right),$$

with $m$ a non-increasing function, $\hat{\mu}$ a robust estimator of the location of the $v_i$, $\hat{\Sigma}$ a robust estimator of the scatter of the $v_i$, and $t$ a threshold value (usually $t = \chi^2_{q,\,1-\alpha}$ for some small $\alpha$, with $q$ the number of continuous covariates). The initial robust estimators of location and scatter for the predictor variables, $\hat{\mu}$ and $\hat{\Sigma}$, can be calculated using the minimum covariance determinant (MCD) method. The MCD was one of the first multivariate location and scatter estimators that is both affine equivariant and highly robust. It finds the $h$ observations whose classical covariance matrix has the smallest possible determinant; the corresponding location estimate is the average of those $h$ points. As observed, the residual weight $\psi$ and the covariate weight $w$ are independent of each other; this decreases the efficiency of the resulting estimators, because the estimating equation downweights well-fitted observations with extreme covariates. [21] presented a family of resilient adaptive weighted maximum likelihood techniques for logistic regression models. These adaptive weights depend on adaptive cut-off thresholds to regulate observations with extreme covariates. They demonstrated that estimators based on adaptive thresholds are more efficient under the clean model than estimators based on non-adaptive thresholds, and have comparable robustness under contaminated models.
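The following sketch illustrates Mallows-type hard-threshold covariate weights. It is our own illustration: to stay self-contained it substitutes coordinate-wise medians and MADs (a simple, diagonal robust location/scatter) for the MCD estimator discussed above.

```python
import math

def median(v):
    s = sorted(v)
    n = len(s)
    return s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])

def robust_distances_sq(V):
    """Squared robust distances for rows of V (two continuous covariates).

    Coordinate-wise median/MAD is used here as a simple stand-in for the
    MCD location/scatter estimator described in the text.
    """
    cols = list(zip(*V))
    mu = [median(c) for c in cols]
    # MAD scaled by 1.4826 to be consistent at the normal distribution
    sigma = [1.4826 * median([abs(x - m) for x in c]) for c, m in zip(cols, mu)]
    return [sum(((x - m) / s) ** 2 for x, m, s in zip(row, mu, sigma)) for row in V]

def mallows_weights(V, t=7.3778):
    """Hard-threshold Mallows weights: w_i = 1 if d_i^2 <= t, else 0.

    The default t is the 0.975 quantile of chi-square with 2 df, -2*log(0.025).
    """
    return [1.0 if d2 <= t else 0.0 for d2 in robust_distances_sq(V)]

V = [[0.1, -0.2], [0.5, 0.3], [-0.4, 0.1], [0.2, -0.1], [0.0, 0.4], [8.0, 9.0]]
print(mallows_weights(V))  # the gross outlier [8.0, 9.0] receives weight 0
```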
The lack of dependence between the weights assigned to covariates and the weights assigned to deviances in Equation (6) is the underlying cause of the generally lower efficiency of Mallows-type estimators compared to Schweppe-type estimators. This occurs because observations with extreme covariates are downweighted even when they exhibit a good fit. The efficiency of Mallows-type estimators can evidently be enhanced by reducing the thresholding proportions, although this may compromise the robustness of the estimator. In order to simultaneously achieve both high efficiency and high robustness, it becomes necessary to employ adaptive thresholds, as detailed in the next section.
4. Weighted Maximum Likelihood Technique (WMLT)
In this section, we build a novel class of continuous weighted maximum likelihood estimators based on a nuisance parameter estimator that is a function of Kolmogorov-Smirnov statistics. We will refer to these estimators as WMLT (weighted maximum likelihood technique). First, build two estimators $\hat{\mu}_n$ and $\hat{\Sigma}_n$ that are initial estimates of the location and scatter of the predictor variables $x_i$; thereafter, calculate the squared Mahalanobis distances of the $x_i$, characterized by

$$d_i^2 = (x_i - \hat{\mu}_n)^T \hat{\Sigma}_n^{-1} (x_i - \hat{\mu}_n).$$

Furthermore, the empirical distribution function of the $d_i^2$ may be expressed as

$$F_n(t) = \frac{1}{n} \sum_{i=1}^{n} I\big(d_i^2 \le t\big).$$
When the $x_i$ have a normal distribution, $F_n$ converges to $F_{\chi^2_p}$ (the $\chi^2_p$ distribution function). Then we can estimate the proportion of outliers in the covariates by [21]

$$\epsilon_n = \sup_{t \ge t_0} \big( F_{\chi^2_p}(t) - F_n(t) \big)^{+},$$

where $(\cdot)^{+}$ represents the positive part and $t_0$ marks the beginning of the tail ($t_0 = \chi^2_{p,\,0.975}$ is an acceptable option). When $\epsilon_n$ is large for a large $t_0$, it indicates that the sample contains outliers. Hence, an adaptive threshold may be described as the empirical $(1 - \epsilon_n)$ quantile of the squared distances, $t_n = F_n^{-1}(1 - \epsilon_n)$.
[21] proposed the adaptive threshold estimators; specifically, they developed a Mallows-type estimator with hard-threshold weights $w_i = I(d_i^2 \le t_n)$, which yields what is basically a weighted maximum likelihood estimator. Our proposed weight function is instead continuous, $w_i = m(d_i^2 / t_n)$, where we assume $m$ is a non-increasing continuous mapping from $[0, \infty)$ to $[0, 1]$ such that $m(0) = 1$, $m(u) = 0$ for all sufficiently large $u$, and the first derivative is bounded, $\sup_u |m'(u)| < \infty$. Define the adaptive weight function
$$W(x; \hat{\tau}_n) = m\!\left( \frac{(x - \hat{\mu}_n)^T \hat{\Sigma}_n^{-1} (x - \hat{\mu}_n)}{t_n} \right), \qquad (7)$$

where $\hat{\tau}_n = (\hat{\mu}_n, \hat{\Sigma}_n, t_n)$ collects the location, scatter, and goodness-of-fit (adaptive threshold) estimates for the explanatory variables, respectively, with $\hat{\mu}_n \in \mathbb{R}^q$, $\hat{\Sigma}_n$ a $q \times q$ real matrix, and $t_n > 0$. Finally, we define our adaptive estimator $\hat{\beta}_n$ of $\beta$ as the solution to the estimating equation

$$\frac{1}{n} \sum_{i=1}^{n} W(x_i; \hat{\tau}_n)\,\big( y_i - F(x_i^T \beta) \big)\, x_i = 0,$$

where $\hat{\tau}_n$ is a consistent estimator of $\tau_0$, the limiting value of the nuisance parameters.
If the $x_i$ are normally distributed, then $\epsilon_n \to 0$ in probability, and the reweighted estimators are asymptotically equivalent to the sample mean and the sample covariance matrix, and therefore fully efficient. This efficiency carries over to the adaptive Mallows-type estimators, as shown in the next subsection.
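To make the adaptive device concrete, here is a small sketch (our own code, assuming $p = 2$ covariates so that the $\chi^2_2$ CDF has the closed form $1 - e^{-t/2}$; the quantile convention used for the cutoff is one plausible choice, not necessarily the paper's):

```python
import math

def chi2_2_cdf(t):
    """Chi-square CDF with 2 degrees of freedom (closed form: 1 - exp(-t/2))."""
    return 1.0 - math.exp(-0.5 * t) if t > 0.0 else 0.0

def adaptive_threshold(d2, t0=7.3778):
    """Kolmogorov-Smirnov-type estimate of the outlier proportion and an
    adaptive cutoff from squared robust distances d2 (p = 2 assumed, so the
    reference distribution is chi-square(2); default t0 is its 0.975 quantile).

    eps is the largest positive gap F_chi2(t) - F_n(t-) over observed tail
    points t >= t0; the cutoff is an empirical (1 - eps) quantile of d2.
    """
    d2s = sorted(d2)
    n = len(d2s)
    eps = 0.0
    for i, t in enumerate(d2s):
        if t >= t0:
            eps = max(eps, chi2_2_cdf(t) - i / n)  # F_n just below t is i/n
    k = min(n - 1, max(0, math.floor((1.0 - eps) * n) - 1))
    return eps, d2s[k]

# A clean-looking sample gives eps = 0; gross outliers inflate eps and
# pull the cutoff back to the bulk of the distances.
print(adaptive_threshold([0.5, 1.2, 2.0, 3.1, 4.0]))
print(adaptive_threshold([0.5, 1.0, 1.5, 2.0, 50.0, 60.0]))
```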
4.1. Asymptotic of Proposed Method
The estimating equation (6) can be written as $\frac{1}{n}\sum_{i=1}^{n} \Psi(x_i, y_i; \beta, \tau) = 0$, with $\Psi$ the corresponding score function. Under appropriate regularity conditions, the classical asymptotics of M-estimators hold; see [22]. Let $\beta_0$ be the model parameter and $E_0$ denote expectation under the model; define $Q = E_0\big[\Psi \Psi^T\big]$, with $\Psi$ evaluated at $(\beta_0, \tau_0)$. If $M = E_0\big[D_\beta \Psi\big]$, where $D$ denotes the differential, and $M$ is nonsingular, then

$$\sqrt{n}\,\big(\hat{\beta}_n - \beta_0\big) \xrightarrow{d} N\!\big(0,\; M^{-1} Q (M^{-1})^T\big).$$

This result is valid for non-adaptive and adaptive weights alike, as long as $\hat{\tau}_n$ converges to $\tau_0$ in probability. For the weighted maximum likelihood estimator given by (7), $M$ and $Q$ have the simple expressions

$$M = -E_0\big[ W(x; \tau_0)\, F'(x^T \beta_0)\, x x^T \big], \qquad (8)$$

$$Q = E_0\big[ W^2(x; \tau_0)\, F(x^T \beta_0)\{1 - F(x^T \beta_0)\}\, x x^T \big]. \qquad (9)$$

Using the asymptotic normality of $\hat{\beta}_n$, it is possible to construct confidence ellipsoids for $\beta_0$. First, we estimate the matrices (8) and (9) by their sample analogues

$$\hat{M}_n = -\frac{1}{n}\sum_{i=1}^{n} W(x_i; \hat{\tau}_n)\, F'(x_i^T \hat{\beta}_n)\, x_i x_i^T, \qquad \hat{Q}_n = \frac{1}{n}\sum_{i=1}^{n} W^2(x_i; \hat{\tau}_n)\, F(x_i^T \hat{\beta}_n)\{1 - F(x_i^T \hat{\beta}_n)\}\, x_i x_i^T.$$

Then the estimated asymptotic variance of $\hat{\beta}_n$ is $\hat{V}_n = \hat{M}_n^{-1} \hat{Q}_n (\hat{M}_n^{-1})^T / n$. The asymptotic confidence ellipsoid of level $1 - \alpha$ for $\beta_0$ is given by $\{\beta : (\hat{\beta}_n - \beta)^T \hat{V}_n^{-1} (\hat{\beta}_n - \beta) \le \chi^2_{p,\,1-\alpha}\}$. This can be generalized to linear transformations of $\beta_0$.
4.2. Asymptotic Properties of WMLT
This subsection focuses on the asymptotic properties of the suggested estimator $\hat{\beta}_n$ described in the preceding section. We will show that the estimator is consistent under some general assumptions on the moments of the predictor variables. Suppose that $\beta_0$, $\mu_0$ and $\Sigma_0$ are the actual values of $\beta$, $\mu$ and $\Sigma$, respectively, and that the independent sample $(x_i, y_i)$, $i = 1, \dots, n$, follows the logistic model $P(y_i = 1 \mid x_i) = F(x_i^T \beta_0)$, $F(\eta) = e^{\eta}/(1 + e^{\eta})$. We define the functions

$$\Psi_n(\beta) = \frac{1}{n} \sum_{i=1}^{n} W(x_i; \hat{\tau}_n)\,\big(y_i - F(x_i^T \beta)\big)\, x_i \quad \text{and} \quad \Psi(\beta) = E_P\big[ W(x; \tau_0)\,\big(y - F(x^T \beta)\big)\, x \big],$$

where $W(x; \tau_0)$ is calculated by Equation (7) with $\hat{\tau}_n$ replaced by $\tau_0$, and $P$ indicates the joint probability distribution of the $(x, y)$'s. Theorem 1 establishes that $\hat{\beta}_n$ is consistent; its proof makes use of the conclusions of Lemmas 1, 2, and 3 below. To prove the lemmas and the theorem, the following assumptions are made:
B1: $\hat{\mu}_n \xrightarrow{P} \mu_0$ and $\hat{\Sigma}_n \xrightarrow{P} \Sigma_0$.
B2: $E_P\big[ W(x; \tau_0)\, F'(x^T \beta_0)\, x x^T \big]$ is nonsingular.
B3: $E_P \|x\|^2 < \infty$.
B4: The function $m$ is a continuous weight function and has a first derivative that is bounded, with $m(0) = 1$ and $0 \le m \le 1$.
B1 is met for the vast majority of well-known initial robust estimators, such as the minimum covariance determinant used in the simulation experiments. In the following lemmas and theorem, the asymptotic statements are understood to hold as $n \to \infty$. The proof of Lemma 1 is given in [21].
Lemma 1. If B1 holds, then $\hat{\tau}_n \xrightarrow{P} \tau_0$.
Lemma 2. If B2 holds, then $\Psi(\beta_k) \to 0$ implies $\beta_k \to \beta_0$ for any sequence $\{\beta_k\}$.
Proof of Lemma 2. If $\Psi(\beta_k) \to 0$, then, since $E_P\big[ W(x; \tau_0)\,(y - F(x^T \beta_0))\, x \big] = 0$, we may see from this equality that $\Psi(\beta_0) - \Psi(\beta_k) \to 0$. Note that

$$\Psi(\beta_0) - \Psi(\beta_k) = E_P\big[ W(x; \tau_0)\,\{F(x^T \beta_k) - F(x^T \beta_0)\}\, x \big],$$

where $F'(\eta) = F(\eta)\{1 - F(\eta)\}$ is the first derivative of the logistic link. By the mean value theorem, $F(x^T \beta_k) - F(x^T \beta_0) = F'(x^T \tilde{\beta}_k)\, x^T (\beta_k - \beta_0)$ for some $\tilde{\beta}_k$ between $\beta_k$ and $\beta_0$. Since $E_P\big[ W(x; \tau_0)\, F'(x^T \tilde{\beta}_k)\, x x^T \big]$ is nonsingular by B2, $\Psi(\beta_0) - \Psi(\beta_k) \to 0$ implies $\beta_k \to \beta_0$.
Lemma 3. Suppose that B3 and B4 hold. Then the class $\mathcal{F} = \{\psi_\theta : \theta \in \Theta\}$ is P-Glivenko-Cantelli for some compact parameter set $\Theta$, where $\psi_\theta$ denotes a coordinate of the estimating function indexed by $\theta$.
Proof of Lemma 3. To demonstrate that a class $\mathcal{F}$ of vector-valued functions is Glivenko-Cantelli, it suffices to show that each of the coordinate classes, with the index $\theta$ ranging over $\Theta$, is Glivenko-Cantelli.
The class $\mathcal{F}$ is a set of measurable functions indexed by a bounded subset of Euclidean space together with a collection of positive semidefinite matrices. This is because $\Sigma$ is basically a variance-covariance matrix of the continuous predictor variables, hence positive semidefinite and symmetric. For the norm we use the sum of the component norms, where $\|\cdot\|$ indicates the Euclidean norm for vectors and, for matrices, the induced operator norm. For two values $\theta_1$, $\theta_2$ of $\theta$, we have
(10)
But
(11)
where $B$ represents the upper bound of the first derivative of the link function $F'$ (for the logistic link, $\sup_\eta F'(\eta) = 1/4$).
Making use of the Mean Value Theorem, for each $x$ and each pair $\theta_1, \theta_2$, there exists an intermediate point $\tilde{\theta}$ such that
In addition, we have
(12)
then, we get
(13)
and
(14)
Given that $\Sigma$ is a positive semidefinite (and symmetric) matrix, denote its eigenvalues by $\lambda_1 \ge \cdots \ge \lambda_q \ge 0$. Consequently, $\Sigma$ has a set of orthonormal eigenvectors, say $q_1, \dots, q_q$, such that $\Sigma q_j = \lambda_j q_j$. In matrix form, there exists an orthogonal matrix $Q$ such that $\Sigma = Q\,\mathrm{diag}(\lambda_1, \dots, \lambda_q)\,Q^T$. Then we get
(15)
where $\rho(\Sigma)$ is the spectral radius of $\Sigma$, and the bound $\rho(\Sigma) \le \|\Sigma\|$ holds for any induced norm. Similarly, $\Sigma^{-1}$ is also symmetric; denote its eigenvalues by $1/\lambda_q \ge \cdots \ge 1/\lambda_1 > 0$, so that there exists an orthogonal matrix $\tilde{Q}$ with $\Sigma^{-1} = \tilde{Q}\,\mathrm{diag}(1/\lambda_1, \dots, 1/\lambda_q)\,\tilde{Q}^T$. Next, we have
(16)
[23] presented a simple and popular M-estimator that minimizes a “bounded” version of the sum of residuals. The estimating equation is
(17)
Then, using (12), (13), (14), (15) and (17) we get
(18)
From (10), (11) and (16), we can now provide a bound of the form $|\psi_{\theta_1}(x, y) - \psi_{\theta_2}(x, y)| \le L(x)\,\|\theta_1 - \theta_2\|$ for every pair $\theta_1, \theta_2$. We have thus constructed a Lipschitz criterion for each coordinate class, with a common Lipschitz envelope $L(x)$, where
Now investigate the bracketing entropy relative to the $L_1(P)$-norm. Use brackets of the type $[\psi_\theta - \epsilon L,\; \psi_\theta + \epsilon L]$ for $\theta$ ranging over a suitably chosen subset of $\Theta$; these brackets have $L_1(P)$-size $2\epsilon\, E_P L$. If $\theta$ ranges over a grid of mesh width $\epsilon$ over $\Theta$, then by the Lipschitz condition the brackets cover $\mathcal{F}$. Hence, as many brackets are required as there are grid points: we need fewer than $C\,(\mathrm{diam}\,\Theta/\epsilon)^d$ cubes of side $\epsilon$ to cover the parameter space, where $d$ is its dimension. If $E_P L < \infty$, then a constant $J$ exists, depending only on $\Theta$ and $P$, such that the bracketing numbers satisfy $N_{[\,]}\big(\epsilon, \mathcal{F}, L_1(P)\big) \le J\,(1/\epsilon)^d < \infty$ for every $0 < \epsilon < \mathrm{diam}\,\Theta$.
Given that all $\psi_\theta$ are continuous functions, they are measurable. If B3 is fulfilled, then $E_P L < \infty$, and as a result the class $\mathcal{F}$ is P-Glivenko-Cantelli by Theorem 19.4 (Glivenko-Cantelli) in [24]. ∎
Theorem 1. If B1, B2, B3 and B4 hold, then the estimator $\hat{\beta}_n$, defined as the solution to the equation $\Psi_n(\beta) = 0$, converges to $\beta_0$ in probability.
Proof of Theorem 1. Denote $\Psi(\beta)$ as above and note that $\Psi(\beta_0) = 0$. We first establish that $\Psi(\hat{\beta}_n) \to 0$ in probability, since $\Psi_n(\hat{\beta}_n) = 0$ and $\Psi_n$ converges uniformly to $\Psi$. Thus, based upon Lemma 2, we then have $\hat{\beta}_n \xrightarrow{P} \beta_0$. To show the uniform convergence, we consider the following decomposition:
where
Lemma 3 informs us that
is a P-Glivenko-Cantelli class, so
. For
,
Using (16), we get
(19)
Since
, we have
,
and
. Also,
, as a result
. Therefore, (19) follows that
. Because B2 and B3 are fulfilled, we also have
, then we get
. For
and the Cauchy-Schwarz inequality was employed in the final inequality. We already know
Then
and
is bounded by 1, using the dominated convergence theorem we get
We also have
and from assumption B3, we obtain
. It then follows that
. ∎
5. Assessing the Robustness of the Estimators
To evaluate the robustness of the methods, two approaches have been employed. The first uses simulated models to contrast the novel technique with the traditional MLE, the Mallows-type estimator (Mallows), and the conditionally unbiased bounded-influence estimator (CUBIF). In the second, we used the real leukemia data set and the Erythrocyte Sedimentation Rate data.
5.1. Simulation Study
A Monte Carlo simulation study was performed to assess the efficiency and robustness of the suggested estimator (WMLT). For the initial robust estimators of location and scatter, $\hat{\mu}_n$ and $\hat{\Sigma}_n$, we utilized the minimum covariance determinant (MCD). We calculated the following estimators for comparison: the MLE, the conditionally unbiased bounded-influence estimator (CUBIF) of [2], and the Mallows-type estimator (Mallows) of [3]. In the simulation, a weight function $m$ of the kind described in Section 4 was applied: a continuous, non-increasing function with $m(0) = 1$, vanishing beyond the adaptive threshold, and with a bounded first derivative.
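The paper does not reproduce its exact weight function here, so as an illustrative stand-in we sketch one continuous, non-increasing choice of $m$ with $m(0) = 1$, a bounded (piecewise constant) derivative, and value 0 beyond a cutoff:

```python
def m_weight(u):
    """A candidate continuous weight m: [0, inf) -> [0, 1].

    m(u) = 1 for u <= 1, decreases linearly to 0 on (1, 2], and stays 0 after,
    so it is non-increasing with a bounded derivative. This is an illustrative
    choice, not necessarily the function used in the paper's simulations.
    """
    if u <= 1.0:
        return 1.0
    if u <= 2.0:
        return 2.0 - u
    return 0.0

# Observations with d_i^2 well inside the threshold keep full weight,
# borderline ones are smoothly downweighted, and extreme ones are dropped.
print([m_weight(u) for u in (0.0, 0.5, 1.0, 1.5, 2.0, 3.0)])  # [1.0, 1.0, 1.0, 0.5, 0.0, 0.0]
```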
Three models are involved in the simulation study: a clean logistic regression model, a contaminated model with a 10% contamination rate, and a contaminated model with a 20% contamination rate. First, for the clean model, two predictor variables were generated from the standard normal distribution, and three sample sizes were used. We generate the response variable according to the Bernoulli distribution with parameter given by the logistic model (1), using the same true parameter values for all three models. Second, for the model with 10% contamination, the contaminated predictor values are generated from a normal distribution with shifted location and inflated scale. Third, the model with 20% contamination is constructed in a similar manner.
The performance of these estimators is evaluated using the bias and mean squared error (MSE) for the various models; the estimator with the smallest bias and MSE is considered best. Each scenario was simulated over 1000 replications. Consequently, for each parameter, the bias and mean squared error are computed as

$$\mathrm{Bias}(\hat{\beta}_j) = \frac{1}{R} \sum_{r=1}^{R} \hat{\beta}_j^{(r)} - \beta_j \quad \text{and} \quad \mathrm{MSE}(\hat{\beta}_j) = \frac{1}{R} \sum_{r=1}^{R} \big( \hat{\beta}_j^{(r)} - \beta_j \big)^2,$$

where $R$ is the number of replications and $\hat{\beta}_j^{(r)}$ is the estimate from the $r$th replication.
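These Monte Carlo summaries are straightforward to compute; a small sketch with our own variable names:

```python
def bias_and_mse(estimates, true_value):
    """Monte Carlo bias and MSE of a parameter estimate over R replications.

    Bias = mean(estimates) - true_value; MSE = mean((estimate - true_value)^2).
    """
    R = len(estimates)
    bias = sum(estimates) / R - true_value
    mse = sum((b - true_value) ** 2 for b in estimates) / R
    return bias, mse

# e.g. four replications of a slope estimate around a true value of 1.0:
bias, mse = bias_and_mse([1.1, 0.9, 1.2, 0.8], 1.0)
print(bias, mse)  # approximately 0.0 and 0.025
```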
5.2. Numerical Results
The numerical results displayed in this paper are based on the simulation studies and two real-data applications, and are intended to evaluate the performance of the proposed method. Table 1 shows the bias and mean squared errors of the four estimation techniques for the clean model. The findings show that the bias and MSE of the MLE, Mallows, and CUBIF estimators are relatively similar, while the WMLT estimator performs somewhat worse than the other estimators. As the sample size increases, the bias and mean squared errors decrease. As shown in Table 2, when 10% of the data are contaminated, the new robust approach WMLT has the best overall performance among all compared estimators for the various sample sizes. Table 3 demonstrates that when 20% of the data are contaminated, our proposed technique (WMLT) again outperforms the other estimators in terms of bias and mean squared errors. Due to its sensitivity to anomalies, the conventional maximum likelihood estimator performs inadequately under the contaminated models. In conclusion, the proposed method outperforms all other compared methods on contaminated data. Furthermore, the new estimator performs reasonably well with clean data.
Table 1. Bias and mean squared errors of estimators for clean model.
Table 2. Bias and (MSE) of estimators for second model (10% of data are contaminated).
Table 3. Bias and (MSE) of estimators for third model (20% of data are contaminated).
5.3. Leukemia Data
This study uses data from [25], which includes information from 33 people who died from acute myeloid leukemia. Each patient was measured for three variables: AG, WBC, and time. The response variable represents the survival time in weeks of the patient; it was converted into a binary variable with Y = 1 indicating patients whose survival time exceeded 52 weeks and Y = 0 indicating those who did not. WBC represents the white blood cell concentration of the patient, whereas AG (present = 1, absent = 0) indicates the presence or absence of a morphologic characteristic of white blood cells. Observation number 17 appears to be atypical. Using AG and WBC as predictor variables and binary survival time Y as the response variable, a logistic regression model was constructed. The estimators analyzed here are the weighted maximum likelihood technique (WMLT), MLE, MLE17 (the maximum likelihood estimator computed after excluding observation 17), Mallows (the Mallows-type estimator), and CUBIF (the conditionally unbiased bounded-influence function estimator).
Table 4 demonstrates that the MLE is extremely sensitive to influential observations. Furthermore, eliminating observation 17 lowered the estimated impact of WBC to near nil. For the leukemia data, the new estimator (WMLT) demonstrated the best performance among all the estimators, although the Mallows estimates are reasonably close to the MLE17.
5.4. The Erythrocyte Sedimentation Rate Data
The Erythrocyte Sedimentation Rate (ESR) data were collected to determine whether the levels of two plasma proteins (fibrinogen and γ-globulin) are responsible for elevated ESR in healthy individuals. The study was conducted by the Institute for Medical Research in Kuala Lumpur, Malaysia, on 32 patients, and the original data were collected by [26]. A response of zero indicates a healthy person, whereas a response of one indicates an unwell person. Here, the continuous variables (FIB and γ-GLO) are related to the binary response (ESR). In the original ESR data, [27] identified two outliers (cases 13 and 29) in X-space. Cases 14 and 15 are influential observations; therefore, removing cases 14 and 15 would result in cases with no overlap.
Table 4. The estimated parameters and standard errors for the leukemia data.
Case 13 was modified in order to carry out an uncontaminated data analysis. From the uncontaminated data, a contaminated version of the ESR data was created by swapping the occurrences (Y = 1) and non-occurrences (Y = 0) for cases 14 and 15; this leaves only one of the three overlapping cases in the ESR data.
Under the contaminated ESR data, β0 and se(β0) are more affected by the outliers than the other parameters for all estimators (see Table 5). The results shown in Table 5 also indicate that the MLE is the estimator most influenced by the outliers. After the modification of the contaminated data, only one overlapping observation, case 13, remains; this is why the coefficients and standard errors of the WMLT, which downweights this observation, are so large. Since the WMLT has the smallest χ2 value, the WMLT estimator should also be taken into consideration. The results shown in Table 5 indicate that the WMLT is a suitable estimator for the ESR data because its estimates are relatively close to the MLE for the uncontaminated data.
6. Conclusions
In this paper, we developed a new robust technique for logistic regression, termed the weighted maximum likelihood technique (WMLT), and demonstrated the asymptotic consistency of the proposed estimator.
In order to evaluate the robustness of the new technique, we conducted simulation experiments using a variety of scenarios and data sets. The classical maximum likelihood estimates show a lack of robustness in the presence of outliers. Our simulation study for the clean model illustrated that the MLE, Mallows, and CUBIF estimators perform similarly, while the new weighted technique performs somewhat less effectively than the other estimators. The simulation study also shows that the WMLT technique outperforms the other estimators when dealing with contaminated data, demonstrating the best performance among all estimators across the various scenarios and real data sets. The proposed method (WMLT) can be applied to other generalized linear models (GLMs) and is expected to perform well in practical applications. The findings
Table 5. Estimated coefficients, standard errors, and χ2 for ESR.
of this paper provide researchers and practitioners with a new approach to developing robust estimators for logistic regression and potentially other generalized linear models (GLMs).
Acknowledgments
We thank the Editor and the referee for their comments. Research of M. Baron is funded by the National Science Foundation grant DMS 1322353. This support is greatly appreciated.