On the Restricted Almost Unbiased Ridge Estimator in Logistic Regression
1. Introduction
Multicollinearity inflates the variance of the maximum likelihood estimator (MLE) in logistic regression. As a result, one may not obtain an efficient estimate of the parameter in the logistic regression model. To combat multicollinearity in logistic regression, several alternative techniques have been proposed in the literature. One of the best-known techniques is to use a suitable biased estimator in place of the MLE. The biased estimators proposed in the literature include the Ridge Logistic Estimator (RLE) (Schaefer et al., 1984 [1]), the Liu Logistic Estimator (LLE) (Liu, 1993 [2]; Urgan and Tez, 2008 [3]; Mansson et al., 2012 [4]), the Principal Component Logistic Estimator (PCLE) (Aguilera et al., 2006 [5]), the Modified Logistic Ridge Estimator (MLRE) (Nja et al., 2013 [6]), the Liu-type estimator (Inan and Erdogan, 2013 [7]), and the Almost Unbiased Liu Logistic Estimator (AULLE) (Xinfeng, 2015 [8]). Moreover, Asar (2015) [9] proposed new methods of estimating the shrinkage parameter in Liu-type estimators to address multicollinearity in logistic regression. All of the above estimation procedures use only the sample information. An alternative technique for the multicollinearity problem is to estimate the parameters under linear restrictions on the unknown parameters. Such restrictions are generally based on prior information and may be exact or stochastic. By adding linear restrictions to the sample information, different types of biased estimators have been introduced in the literature, and some researchers have combined these estimators with the logistic regression estimator to improve its performance.
In the presence of exact linear restrictions in addition to the sample logistic regression model, Duffy and Santner (1989) [10] introduced the Restricted Maximum Likelihood Estimator (RMLE) by incorporating the restricted least squares estimator based on exact linear restrictions into logistic regression. Later, the Restricted Logistic Ridge Estimator (Asar et al., 2016 [11]), the Restricted Logistic Liu Estimator (RLLE) (Şiray et al., 2015 [12]), the Modified Restricted Liu Estimator (Wu, 2016 [13]), and the Restricted two parameter Liu type estimator (Asar et al., 2016 [14]) were introduced for logistic regression with exact linear restrictions. In the presence of stochastic linear restrictions in addition to the sample logistic regression model, Nagarajah and Wijekoon (2015) [15] introduced the Stochastic Restricted Maximum Likelihood Estimator (SRMLE). Following Nagarajah and Wijekoon (2015) [15], Varathan and Wijekoon (2016) [16] proposed the Stochastic Restricted Ridge Maximum Likelihood Estimator (SRRMLE) by combining the Ridge Logistic Estimator (RLE) with the SRMLE.
Wu and Asar (2016) [17] proposed a new biased estimator called the Almost Unbiased Ridge Logistic Estimator (AURLE) and showed its superiority over the other available estimators. In this article, we further improve the logistic regression estimator by combining AURLE with RMLE, and name the result the Restricted Almost Unbiased Ridge Logistic Estimator (RAURLE). Further, we examine the performance of RAURLE with the ridge parameter estimated by different methods given in the literature, and compare each of these cases with MLE, RLE, AURLE and RMLE. The remaining sections of the article are organized as follows. The model specification and estimation are discussed in Section 2. The proposed estimator and its asymptotic properties are given in Section 3. Section 4 describes existing methods for estimating the ridge parameter. In Section 5, a Monte Carlo simulation study compares the proposed estimator, under different ridge parameters, with MLE, RLE, AURLE and RMLE in terms of the scalar mean squared error (SMSE). Finally, conclusions of the study are presented in Section 6.
2. Model Specification and Estimation
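Consider the following logistic regression model
$$y_i = \pi_i + \varepsilon_i, \quad i = 1, \ldots, n \tag{1}$$
which follows the Bernoulli distribution with parameter $\pi_i$ as
$$\pi_i = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)} \tag{2}$$
where $x_i$ is the ith row of X, which is an $n \times (p+1)$ data matrix with p predictor variables, $\beta$ is a $(p+1) \times 1$ vector of coefficients, and the $\varepsilon_i$ are independent with mean zero and variance $\pi_i(1-\pi_i)$ of the response $y_i$. The maximum likelihood estimator (MLE) of $\beta$ can be obtained as follows:
$$\hat{\beta}_{MLE} = C^{-1} X' \hat{W} Z \tag{3}$$
where $C = X'\hat{W}X$; Z is the column vector with ith element equal to $\mathrm{logit}(\hat{\pi}_i) + \dfrac{y_i - \hat{\pi}_i}{\hat{\pi}_i(1-\hat{\pi}_i)}$; and $\hat{W} = \mathrm{diag}[\hat{\pi}_i(1-\hat{\pi}_i)]$. The MLE is an asymptotically unbiased estimate of $\beta$. The covariance matrix of $\hat{\beta}_{MLE}$ is
$$\mathrm{Cov}(\hat{\beta}_{MLE}) = (X'\hat{W}X)^{-1} = C^{-1} \tag{4}$$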
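The fixed-point form of the MLE in (3) can be sketched as an iteratively reweighted least squares loop. This is a minimal illustration, not the authors' implementation: the function name `fit_logistic_mle`, the zero starting value, and the stopping rule are assumptions, and the working response uses $\mathrm{logit}(\hat{\pi}_i) = x_i'\hat{\beta}$.

```python
import numpy as np

def fit_logistic_mle(X, y, n_iter=50, tol=1e-10):
    """Iteratively reweighted least squares for the logistic MLE.

    Each step solves beta = (X' W X)^{-1} X' W Z with
    W = diag(pi_i (1 - pi_i)) and Z the working response.
    Returns (beta_hat, C) with C = X' W X at the final estimate.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])          # assumed starting value
    for _ in range(n_iter):
        eta = X @ beta                   # linear predictor = logit(pi)
        pi = 1.0 / (1.0 + np.exp(-eta))
        w = pi * (1.0 - pi)              # diagonal of W
        z = eta + (y - pi) / w           # working response Z
        C = X.T @ (w[:, None] * X)       # C = X' W X
        beta_new = np.linalg.solve(C, X.T @ (w * z))
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    # Recompute C so that it matches the returned estimate.
    pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = pi * (1.0 - pi)
    C = X.T @ (w[:, None] * X)
    return beta, C
```

The sketch has no safeguard against separation or fitted probabilities that reach 0 or 1, where the weights vanish; under strong multicollinearity and small n this can occur in practice.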
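In the presence of multicollinearity, Schaefer et al. (1984) [1] proposed the Logistic Ridge Estimator (LRE), referred to as the RLE in Section 1, in place of the MLE in the logistic regression model (1):
$$\hat{\beta}_{LRE} = (X'\hat{W}X + kI)^{-1} X'\hat{W}X \hat{\beta}_{MLE} = Z_k \hat{\beta}_{MLE} \tag{5}$$
where $Z_k = (C + kI)^{-1}C$ with $C = X'\hat{W}X$, and k is the ridge parameter, $k \ge 0$.
The asymptotic properties of the LRE are
$$E(\hat{\beta}_{LRE}) = Z_k \beta \tag{6}$$
$$\mathrm{Cov}(\hat{\beta}_{LRE}) = Z_k C^{-1} Z_k' \tag{7}$$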
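However, the LRE is a biased estimator, which produces inconsistent estimates for the parameter (Wu and Asar, 2016 [17]). Consequently, the Almost Unbiased Ridge Logistic Estimator (AURLE) was introduced by Wu and Asar (2016) [17], and it is defined as
$$\hat{\beta}_{AURLE} = W_k \hat{\beta}_{MLE} \tag{8}$$
where $W_k = I - k^2 (C + kI)^{-2}$ with $C = X'\hat{W}X$.
The asymptotic properties of the AURLE are
$$E(\hat{\beta}_{AURLE}) = W_k \beta \tag{9}$$
$$\mathrm{Cov}(\hat{\beta}_{AURLE}) = W_k C^{-1} W_k' \tag{10}$$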
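The two shrinkage estimators of this section can be sketched as follows. The sketch assumes the standard forms $\hat{\beta}_{LRE} = (C + kI)^{-1} C \hat{\beta}_{MLE}$ and $\hat{\beta}_{AURLE} = [I - k^2 (C + kI)^{-2}] \hat{\beta}_{MLE}$ with $C = X'\hat{W}X$. The function names are illustrative, and `C` and `beta_mle` are taken as given inputs.

```python
import numpy as np

def ridge_logistic(C, beta_mle, k):
    """LRE: Z_k beta_mle with Z_k = (C + k I)^{-1} C."""
    I = np.eye(C.shape[0])
    return np.linalg.solve(C + k * I, C @ beta_mle)

def almost_unbiased_ridge_logistic(C, beta_mle, k):
    """AURLE: W_k beta_mle with W_k = I - k^2 (C + k I)^{-2}."""
    I = np.eye(C.shape[0])
    A_inv = np.linalg.inv(C + k * I)
    W_k = I - k**2 * (A_inv @ A_inv)
    return W_k @ beta_mle

# Small illustrative inputs (not taken from the article).
C = np.array([[2.0, 1.0], [1.0, 2.0]])
beta_mle = np.array([1.0, -1.0])
print(ridge_logistic(C, beta_mle, k=1.0))                  # -> [ 0.5 -0.5]
print(almost_unbiased_ridge_logistic(C, beta_mle, k=1.0))  # -> [ 0.75 -0.75]
```

In this example `beta_mle` lies along the eigenvector of `C` with eigenvalue 1, so the LRE scales it by 1/(1 + k) = 0.5 and the AURLE by 1 - k²/(1 + k)² = 0.75. The AURLE shrinks less, which is the sense in which it is almost unbiased.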
As another remedy for multicollinearity, one may use exact linear restrictions in addition to the sample logistic regression model (1). The resulting estimator is called the restricted estimator.
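Suppose that the following exact restriction is given in addition to the general logistic regression model (1):
$$H\beta = h \tag{11}$$
where H is a $q \times (p+1)$ known matrix and h is a $q \times 1$ vector of known constants.
In the presence of the above restriction (11) in addition to the logistic regression model (1), Duffy and Santner (1989) [10] proposed the following Restricted Maximum Likelihood Estimator (RMLE):
$$\hat{\beta}_{RMLE} = \hat{\beta}_{MLE} + C^{-1}H'(HC^{-1}H')^{-1}(h - H\hat{\beta}_{MLE}) \tag{12}$$
where $C = X'\hat{W}X$. The asymptotic mean and variance of $\hat{\beta}_{RMLE}$ are
$$E(\hat{\beta}_{RMLE}) = \beta + C^{-1}H'(HC^{-1}H')^{-1}(h - H\beta) \tag{13}$$
and
$$\mathrm{Cov}(\hat{\beta}_{RMLE}) = C^{-1} - C^{-1}H'(HC^{-1}H')^{-1}HC^{-1} \tag{14}$$
Consequently, the bias of $\hat{\beta}_{RMLE}$ is
$$\mathrm{Bias}(\hat{\beta}_{RMLE}) = C^{-1}H'(HC^{-1}H')^{-1}(h - H\beta) \tag{15}$$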
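The restricted estimator can be sketched numerically as below, assuming the usual form $\hat{\beta}_{RMLE} = \hat{\beta}_{MLE} + C^{-1}H'(HC^{-1}H')^{-1}(h - H\hat{\beta}_{MLE})$ with $C = X'\hat{W}X$. The function name and the example inputs are illustrative assumptions.

```python
import numpy as np

def restricted_mle(C, beta_mle, H, h):
    """RMLE: correct beta_mle so that the exact restriction H beta = h holds."""
    C_inv_Ht = np.linalg.solve(C, H.T)       # C^{-1} H'
    M = H @ C_inv_Ht                         # H C^{-1} H'
    return beta_mle + C_inv_Ht @ np.linalg.solve(M, h - H @ beta_mle)

# Small illustrative inputs (not taken from the article).
C = np.array([[2.0, 1.0], [1.0, 2.0]])
beta_mle = np.array([1.0, -1.0])
H = np.array([[1.0, 1.0]])                   # single restriction beta_1 + beta_2 = 1
h = np.array([1.0])
beta_rmle = restricted_mle(C, beta_mle, H, h)
print(beta_rmle)                             # -> [ 1.5 -0.5]
print(H @ beta_rmle)                         # -> [1.]
```

By construction the result satisfies the restriction exactly, which is a convenient check on any implementation.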
3. The Proposed Estimator
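To improve the performance of the estimators further, in this section we combine AURLE and RMLE to propose a new estimator, called the Restricted Almost Unbiased Ridge Logistic Estimator (RAURLE) and defined as
$$\hat{\beta}_{RAURLE} = W_k \hat{\beta}_{RMLE} \tag{16}$$
where $W_k = I - k^2 (C + kI)^{-2}$ and $C = X'\hat{W}X$. Note that this estimator is based on the ridge parameter k, and its performance depends on the choice of k.
The asymptotic properties of $\hat{\beta}_{RAURLE}$ follow from (13) and (14):
$$E(\hat{\beta}_{RAURLE}) = W_k E(\hat{\beta}_{RMLE}) \tag{17}$$
$$\mathrm{Cov}(\hat{\beta}_{RAURLE}) = W_k \mathrm{Cov}(\hat{\beta}_{RMLE}) W_k' \tag{18}$$
and
$$\mathrm{Bias}(\hat{\beta}_{RAURLE}) = W_k E(\hat{\beta}_{RMLE}) - \beta \tag{19}$$
Consequently, the mean square error can be obtained as
$$\mathrm{MSE}(\hat{\beta}_{RAURLE}) = \mathrm{Cov}(\hat{\beta}_{RAURLE}) + \mathrm{Bias}(\hat{\beta}_{RAURLE}) \mathrm{Bias}(\hat{\beta}_{RAURLE})' \tag{20}$$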
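A minimal sketch of the combined estimator follows. It rests on an assumption: that RAURLE applies the AURLE shrinkage matrix $W_k = I - k^2 (C + kI)^{-2}$ to the RMLE, that is $\hat{\beta}_{RAURLE} = W_k \hat{\beta}_{RMLE}$. The function name and inputs are hypothetical.

```python
import numpy as np

def raurle(C, beta_mle, H, h, k):
    """Assumed RAURLE: W_k applied to the restricted MLE."""
    I = np.eye(C.shape[0])
    # Step 1: restricted MLE under the exact restriction H beta = h.
    C_inv_Ht = np.linalg.solve(C, H.T)
    beta_rmle = beta_mle + C_inv_Ht @ np.linalg.solve(H @ C_inv_Ht, h - H @ beta_mle)
    # Step 2: almost unbiased ridge shrinkage W_k = I - k^2 (C + k I)^{-2}.
    A_inv = np.linalg.inv(C + k * I)
    W_k = I - k**2 * (A_inv @ A_inv)
    return W_k @ beta_rmle

# Small illustrative inputs (not taken from the article).
C = np.array([[2.0, 1.0], [1.0, 2.0]])
beta_mle = np.array([1.0, -1.0])
H = np.array([[1.0, 1.0]])
h = np.array([1.0])
print(raurle(C, beta_mle, H, h, k=0.0))   # -> [ 1.5 -0.5]  (reduces to the RMLE)
print(raurle(C, beta_mle, H, h, k=1.0))   # -> [ 1.21875 -0.28125]
```

Under this assumed form, k = 0 recovers the RMLE, while for k > 0 the shrunken estimate no longer satisfies the restriction exactly.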
4. Some Ridge Estimators
Since RAURLE depends on the ridge parameter k, we now consider existing methods for obtaining an estimated value of k. Many researchers have suggested methods of estimating the ridge parameter in ridge regression, and these methods have recently been extended to logistic regression. In this research, we consider the following existing ridge parameter estimation methods to compare the performance of the proposed estimator with some existing estimators in logistic regression.
1) Hoerl and Kennard (1970) [18] ;
(21)
where is the maximum element of, is the eigen vector of.
2) Hoerl et al. (1975) [19] ;
(22)
where p is the number of predictor variables in the model (1).
3) Lawless and Wang (1976) [20] ;
(23)
4) Lindley and Smith (1972) [21] ;
(24)
5) Schaefer et al. (1984) [1] ;
(25)
5. Simulation Study
It is difficult to compare the mean square errors of the estimators theoretically, since none of the estimators MLE, RLE, AURLE, RMLE and RAURLE is always superior. So, we use a Monte Carlo simulation to examine the performance of the proposed estimator relative to the existing estimators under different levels of multicollinearity. Following McDonald and Galarneau (1975) [22] and Kibria (2003) [23], the explanatory variables are generated using the following equation:
$$x_{ij} = (1 - \rho^2)^{1/2} z_{ij} + \rho z_{i,p+1}, \quad i = 1, \ldots, n, \; j = 1, \ldots, p \tag{26}$$
where $z_{ij}$ are independent pseudo standard normal random numbers and $\rho^2$ represents the correlation between any two explanatory variables. The n observations for the response variable are obtained from the Bernoulli($\pi_i$) distribution in (1). Four explanatory variables are generated using (26), and four different values of $\rho$, namely 0.80, 0.90, 0.95 and 0.99, are considered. Further, three different values of the sample size n, namely 25, 60 and 100, are considered. The parameter values of $\beta$ are chosen so that $\beta'\beta = 1$ and $\beta_1 = \beta_2 = \beta_3 = \beta_4$, which are common restrictions in many simulation studies. For the ridge parameter k, five different choices are used, as defined in Equations (21)-(25). The simulation is repeated 2000 times by generating new pseudo-random numbers, and the simulated SMSE values of the estimators are obtained using the following equation:
$$\widehat{\mathrm{SMSE}}(\hat{\beta}^*) = \frac{1}{2000} \sum_{r=1}^{2000} (\hat{\beta}_r - \beta)'(\hat{\beta}_r - \beta) \tag{27}$$
where $\hat{\beta}_r$ is any estimator considered in the rth simulation. The simulation results are given in Tables 1-3. It can be seen from Tables 1-3 that the scalar mean square error of the proposed estimator RAURLE is smaller than those of MLE, RLE, AURLE and RMLE for all the selected values of n, $\rho$, and k considered in this research. Further, the new estimator RAURLE has better performance when is used.
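The data-generating step and the SMSE summary of the simulation can be sketched as below. The sketch assumes the McDonald and Galarneau form $x_{ij} = (1-\rho^2)^{1/2} z_{ij} + \rho z_{i,p+1}$ and the average squared distance $(\hat{\beta}_r - \beta)'(\hat{\beta}_r - \beta)$ over replications. The function names, the seed, and the value β = (0.5, 0.5, 0.5, 0.5) (one vector satisfying β'β = 1 with equal components) are illustrative assumptions.

```python
import numpy as np

def generate_predictors(n, p, rho, rng):
    """Draw an n x p matrix of collinear predictors sharing one latent column."""
    z = rng.standard_normal((n, p + 1))
    # Column p (the last one) is the shared component z_{i,p+1}.
    return np.sqrt(1.0 - rho**2) * z[:, :p] + rho * z[:, [p]]

def simulate_response(X, beta, rng):
    """Draw y_i ~ Bernoulli(pi_i) with pi_i = exp(x_i' beta) / (1 + exp(x_i' beta))."""
    pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return rng.binomial(1, pi)

def smse(estimates, beta):
    """Average squared Euclidean distance between the estimates (one per row) and beta."""
    diff = np.asarray(estimates, dtype=float) - np.asarray(beta, dtype=float)
    return float(np.mean(np.sum(diff**2, axis=1)))

rng = np.random.default_rng(2016)            # arbitrary seed
beta = np.full(4, 0.5)                       # beta' beta = 1, equal components
X = generate_predictors(n=100, p=4, rho=0.95, rng=rng)
y = simulate_response(X, beta, rng)
```

With this construction the population correlation between any two generated columns is ρ², so ρ = 0.95 corresponds to a pairwise correlation of about 0.90. In a full study, each replication would fit every estimator to a fresh (X, y) pair and pass the stacked estimates to `smse`.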
6. Concluding Remarks
In this paper, we proposed a restricted almost unbiased ridge logistic estimator (RAURLE) in logistic regression with exact linear restrictions when the explanatory variables are highly correlated. Through a Monte Carlo simulation study, we examined the performance of the proposed estimator over some existing estimators MLE, RLE, AURLE and RMLE in terms of scalar mean square error. Also, five different choices of existing ridge parameter estimates were used to compare the estimators. The results show that the newly proposed estimator outperforms all the other estimators considered in this study under the selected values of n, $\rho$, and k by means of SMSE.
Table 1. The estimated SMSE values for different k, when.
Table 2. The estimated SMSE values for different k, when.
Table 3. The estimated SMSE values for different k, when.
Acknowledgements
We thank the Editor and the referee for their comments and suggestions, and the Postgraduate Institute of Science, University of Peradeniya, Sri Lanka for providing necessary facilities to complete this research.