Evaluating the Performance of Liu Logistic Regression Estimator

This study compares the performance of logistic Liu estimators with the Maximum Likelihood (ML), Stein and ridge regression estimators using a Monte Carlo simulation. The performance criteria are the mean squared and mean absolute errors of the parameter estimates, together with the mean squared and mean absolute errors between the actual probability and the estimated probability. An algorithm for the simulation steps is included. An application analyzes the effect of the quantities of household wastes and its components on the probability of having a running waste recycling factory. Results from both the simulation and the application show that logistic Liu estimators are mostly preferred for correcting multicollinearity in logistic regression.


INTRODUCTION
Logistic regression, a member of the generalized linear models family, allows one to predict a discrete outcome. Generally, the dependent or response variable $y$ is dichotomous (presence/absence, success/failure, etc.), whereas the independent or explanatory variables $x_1, x_2, \ldots, x_p$ may be continuous, discrete, dichotomous, or a mix of these. The relationship between $y$ and $x_1, x_2, \ldots, x_p$ is estimated using the maximum likelihood method. Maximum likelihood estimates (MLE) have minimum variances, but in the presence of multicollinearity they become inflated and have large variances.
Biased estimators such as the Stein, ridge regression and Liu estimators were introduced for correcting multicollinearity. These biased estimators share a common advantage: all the explanatory variables are considered simultaneously, without dropping any of them, to improve accuracy (Belsley et al., 1980; Belsley, 1991).
These biased estimators were first used for correcting multicollinearity in linear regression (Hoerl and Kennard, 1970a, b; Dielman, 2005; Farag et al., 2012; Hamed et al., 2013; Rong, 2010). They aimed at achieving two goals:
• Reducing the Mean Squared Errors (MSEs) of the parameter estimates
• Improving the conditioning of the information matrix, so that the obtained parameter estimates and their standard errors are smaller than the ML estimates
Stein estimators achieved the first goal, but they have a disadvantage: the shrinkage parameter $d_{St}$ (James and Stein, 1961) is calculated using the MLE, which is already inflated as a result of multicollinearity, so that Stein estimators and their standard errors are inflated as well. Ridge regression estimators achieved the second goal, but there is still no consensus on how to select the ridge parameter $d_{ridge}$ (Le Cessie and Van Houwelingen, 1992; Kibria et al., 2012; Farag et al., 2012; Kan et al., 2013). Liu estimators were introduced to combine the two methods (Stein estimators and ridge regression estimators), obtaining the advantages of both and avoiding their disadvantages (Liu, 1993, 2003, 2004; Akdeniz and Erol, 2003; Rong, 2010).
This study evaluates the performance of logistic Liu estimators in comparison with the MLE, Stein and ridge regression estimators using a Monte Carlo simulation. Four performance criteria are used: the mean squared error of the parameters $MSE(\beta)$, the mean absolute error of the parameters $MAE(\beta)$, the mean squared error between the actual probability $\pi(x)$ and the estimated probability $\hat{\pi}(x)$, $MSE(\pi(x))$, and the mean absolute error between the actual and the estimated probability, $MAE(\pi(x))$. The factors varied in the simulation study are the degree of correlation, the sample size and the number of explanatory variables. The estimator with the lowest standard errors and the minimum $MSE(\beta)$, $MAE(\beta)$, $MSE(\pi(x))$ and $MAE(\pi(x))$ is considered the best option for correcting multicollinearity in logistic regression.
Finally, the benefits of using the logistic Liu estimator are shown using data on municipal solid waste management in Egypt, where the effect of the quantities of household wastes and its components (packing paper, plastic, glass and metal) on the probability of having a running waste recycling factory is investigated.

METHODOLOGY
This section describes the binomial logistic regression model and the effect of multicollinearity on the parameter estimates and their standard errors. Furthermore, different biased estimators are presented to correct multicollinearity in logistic regression.
The binomial logistic regression model: Let the true relationship between the response variable $y$ and the explanatory variables $x_1, x_2, \ldots, x_p$ be as follows (Hosmer and Lemeshow, 2002; Agresti, 2002):

$$y_i = \pi(x_i) + \varepsilon_i, \qquad \pi(x_i) = \frac{\exp\left(\beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}\right)}{1 + \exp\left(\beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}\right)} \qquad (1)$$

where, $i = 1, 2, \ldots, n$ and:
$n$ = Sample size
$p$ = Number of the explanatory variables
$x_{ij}$ = The measurement of the $j$th explanatory variable for the $i$th observation, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, p$
$\beta_j$ = The $j$th regression parameter, $j = 1, 2, \ldots, p$
$\varepsilon_i$ = Random error for the $i$th observation, $i = 1, 2, \ldots, n$

$$y_i = \begin{cases} 1 & \text{if the } i\text{th observation has the property under consideration} \\ 0 & \text{otherwise} \end{cases} \qquad i = 1, 2, \ldots, n \qquad (2)$$

The fitted logistic regression model is as follows:

$$\hat{\pi}(x_i) = \frac{\exp\left(\hat{\beta}_0 + \sum_{j=1}^{p} \hat{\beta}_j x_{ij}\right)}{1 + \exp\left(\hat{\beta}_0 + \sum_{j=1}^{p} \hat{\beta}_j x_{ij}\right)} \qquad (3)$$

It is well known that the Maximum Likelihood Estimates (MLE) $\hat{\beta}_{ML}$ for the logistic regression model are obtained by solving the following nonlinear system of equations using numerical methods, as presented in Hosmer and Lemeshow (2002) and Agresti (2002):

$$\sum_{i=1}^{n} \left[ y_i - \pi(x_i) \right] = 0 \qquad (4)$$

$$\sum_{i=1}^{n} x_{ij} \left[ y_i - \pi(x_i) \right] = 0, \qquad j = 1, 2, \ldots, p \qquad (5)$$

The covariance matrix is calculated as follows:

$$Var(\hat{\beta}_{ML}) = (X' W X)^{-1} \qquad (6)$$

where, $(X' W X)$ is the estimated weighted information matrix of order $(p + 1) \times (p + 1)$, $X$ is a matrix of order $n \times (p + 1)$ that consists of the measurements of the explanatory variables for each observation at each level of the response variable, and $W$ is a diagonal matrix of order $(n \times n)$ with general element $\hat{\pi}(x_i)\left(1 - \hat{\pi}(x_i)\right)$.
$\hat{\beta}_{ML}$ are unbiased estimators with minimum variances when the explanatory variables $x_1, x_2, \ldots, x_p$ are uncorrelated, but they often produce poor results because of multicollinearity among the explanatory variables (Lesaffre and Marx, 1993; Tutz and Leitenstorfer, 2006). Multicollinearity may produce coefficient signs opposite to those of the pairwise correlations and may leave theoretically important variables with insignificant coefficients. It also degrades predictive ability, widens confidence intervals and leads to incorrect decisions when testing hypotheses about the regression parameters (Agresti, 2002; Hosmer and Lemeshow, 2002; Månsson et al., 2012, 2015).
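The numerical solution of the likelihood equations can be illustrated with a short Newton-Raphson (iteratively reweighted least squares) routine. This is a minimal sketch rather than the procedure used in the study: the function name `fit_logistic_mle`, the convergence settings and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def fit_logistic_mle(X, y, max_iter=50, tol=1e-8):
    """Newton-Raphson fit of a logistic model.

    X must already contain a leading column of ones for the intercept.
    Returns the estimates, their estimated covariance (X'WX)^{-1} and
    the diagonal of W.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-X @ beta))        # fitted probabilities
        w = pi * (1.0 - pi)                          # diagonal of W
        info = X.T @ (w[:, None] * X)                # X' W X
        step = np.linalg.solve(info, X.T @ (y - pi)) # Newton step from the score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    pi = 1.0 / (1.0 + np.exp(-X @ beta))
    w = pi * (1.0 - pi)
    cov = np.linalg.inv(X.T @ (w[:, None] * X))
    return beta, cov, w

# Synthetic illustration (hypothetical data, not the data of this study)
rng = np.random.default_rng(0)
n = 500
x = rng.standard_normal((n, 2))
X = np.column_stack([np.ones(n), x])
true_beta = np.array([0.0, 0.7, 0.7])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))

beta_ml, cov_ml, w_ml = fit_logistic_mle(X, y)
se_ml = np.sqrt(np.diag(cov_ml))  # estimated standard errors
```

When the columns of `X` are highly correlated, `X' W X` becomes ill-conditioned and the diagonal of its inverse grows, which is the variance inflation discussed above.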

Multicollinearity in binomial logistic regression:
Many studies have addressed the correction of multicollinearity in logistic regression models (Schaefer et al., 1984; Schaefer, 1986; Steyerberg et al., 2001; Aguilera et al., 2006; Camminatiello and Lucadamo, 2010; Farghali, 2012, 2014; Asar and Genc, 2016). They extended the methods that were used for correcting multicollinearity in linear regression (such as Stein estimators, ridge regression estimators, principal components regression, Liu estimators and mathematical programming). Most of these studies evaluated the performance of a single method by comparing it with the MLE, but not with other methods (Månsson and Shukur, 2011; Månsson et al., 2012, 2015).
In this study, we are concerned with evaluating the performance of logistic Liu estimators (with two different shrinkage parameters) in comparison with the MLE, logistic Stein estimators and logistic ridge regression estimators.
Farghali (2012) introduced the logistic Stein estimators, with a shrinkage parameter $0 < d_{St} < 1$ calculated as in Eq. (9), together with their estimated standard errors $SE(\hat{\beta}_{St})$. The same study introduced the logistic ridge regression estimators, with the estimated covariance matrix given in Eq. (11) and an estimated ridge parameter $\hat{d}_{ridge}$. Logistic Stein estimators have the disadvantage that the shrinkage parameter $d_{St}$ is calculated using the MLE, which is already inflated as a result of multicollinearity, so that Stein estimators and their standard errors are inflated. Logistic ridge regression estimators also have a disadvantage: there is still no consensus on how to select the ridge parameter $d_{ridge}$ (El-Dash et al., 2011; Hamed et al., 2013; Farghali, 2012).

Logistic Liu estimator:
The hope that combining two different methods (Stein estimators and ridge regression estimators) might inherit the advantages of both and avoid their disadvantages motivated Liu (1993, 2003, 2004) to suggest another biased estimator for correcting multicollinearity in linear regression. Månsson et al. (2012) suggested logistic Liu estimators to correct multicollinearity in binomial logistic regression. They also suggested different methods for estimating the shrinkage parameter $d_{Liu}$, one of which is given in Eq. (14), where $q_j$ is the $j$th eigenvalue of the standardized weighted information matrix $(X^{*\prime} W X^{*})$, $j = 1, 2, \ldots, p$, and $v_j$ is the eigenvector corresponding to the $j$th eigenvalue of the standardized estimated weighted information matrix. The shrinkage parameter $\hat{d}_{Liu}$ in Eq. (14) was estimated in two steps: first, the value of each individual parameter $\hat{d}_j$ was calculated as in Eq. (16); second, the individual values from Eq. (16) were reduced to a single value $\hat{d}_{Liu}$ as shown in Eq. (14). The disadvantages of Månsson et al. (2012) were:
• They did not introduce an exact method for determining a single value $\hat{d}_{Liu}$
• They did not study the performance of the suggested estimator in comparison to other biased estimators
Farghali (2014) suggested multinomial logistic Liu estimators to correct multicollinearity in multinomial logistic regression. Following Liu (1993), the estimated shrinkage parameter was chosen to minimize the mean squared error of the parameters, which gives the estimated shrinkage parameter $(\hat{d}_{Liu})_{I}$ in Eq. (18); the estimated covariance matrix $Var(\hat{\beta}_{Liu})$ is given in Eq. (19). Thus, Farghali (2014) obtained a single value of $(\hat{d}_{Liu})_{I}$ and introduced $Var(\hat{\beta}_{Liu})$. She also extended Månsson et al. (2012) to correct multicollinearity in multinomial logistic regression.
The disadvantage of Farghali (2014) was that the performance of the suggested biased estimator was studied only on a set of hypothetical data.
In this study, the logistic Liu estimator with $(\hat{d}_{Liu})_{I}$ is introduced as a special case of the multinomial logistic Liu estimators: putting $C = 2$ in Eq. (18) and (19) gives the estimated shrinkage parameter $(\hat{d}_{Liu})_{I}$ and the estimated covariance matrix $Var(\hat{\beta}_{Liu})$ for binomial logistic regression. Simulation studies were conducted to evaluate the performance of the logistic Liu estimator with both $\hat{d}_{Liu}$ and $(\hat{d}_{Liu})_{I}$ in comparison with the MLE, logistic Stein estimators and logistic ridge regression estimators. Furthermore, the logistic Liu estimators were applied to a real-life dataset.
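To make the three families of biased estimators concrete, the sketch below applies them to a given ML estimate. It assumes the forms commonly published in the cited literature: scalar shrinkage of the MLE for Stein, $(X'WX + kI)^{-1} X'WX \hat{\beta}_{ML}$ for ridge (Schaefer et al., 1984) and $(X'WX + I)^{-1}(X'WX + dI)\hat{\beta}_{ML}$ for Liu (Månsson et al., 2012). These may differ in detail from the equations of this study, and the shrinkage values are supplied by hand rather than estimated; the function names and the numbers are hypothetical.

```python
import numpy as np

def stein_estimator(beta_ml, d_st):
    """Scalar shrinkage of the MLE, with 0 < d_st < 1 (assumed form)."""
    return d_st * beta_ml

def ridge_estimator(info, beta_ml, k):
    """(X'WX + kI)^{-1} X'WX beta_ml, where info = X'WX (assumed form)."""
    p = info.shape[0]
    return np.linalg.solve(info + k * np.eye(p), info @ beta_ml)

def liu_estimator(info, beta_ml, d):
    """(X'WX + I)^{-1} (X'WX + dI) beta_ml, where info = X'WX (assumed form)."""
    p = info.shape[0]
    return np.linalg.solve(info + np.eye(p), (info + d * np.eye(p)) @ beta_ml)

# Hypothetical ill-conditioned information matrix and ML estimate
info = np.array([[10.0, 9.5],
                 [9.5, 10.0]])
beta_ml = np.array([2.0, -1.0])

b_stein = stein_estimator(beta_ml, 0.5)
b_ridge = ridge_estimator(info, beta_ml, 1.0)
b_liu = liu_estimator(info, beta_ml, 0.5)
```

Under these assumed forms, setting `k = 0` in the ridge estimator or `d = 1` in the Liu estimator returns the MLE unchanged, so both parameters control how far the estimate is pulled away from the MLE.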

Judging the performance of the estimators:
To investigate the performance of logistic Liu estimators in comparison with the MLE, logistic Stein estimators and logistic ridge regression estimators, we calculate $MSE(\beta)$, $MAE(\beta)$, $MSE(\pi(x))$ and $MAE(\pi(x))$ using Eq. (20)-(23), where $\hat{\beta}$ is the estimator of $\beta$ obtained from the MLE, logistic Stein estimators, logistic ridge regression estimators and logistic Liu estimators with both $\hat{d}_{Liu}$ and $(\hat{d}_{Liu})_{I}$, and $R = 2000$ is the number of replicates used in the Monte Carlo simulation.

Monte Carlo simulation:
This section briefly describes how the data are generated and discusses the results.
The design of the experiment: The response variable of the logistic regression model is generated using pseudo-random numbers from the $Be(\pi(x_i))$ distribution, where $\pi(x_i)$ is the success probability of the logistic regression model. The parameter values of $\beta$ are chosen so that $\beta'\beta = 1$ and $\beta_1 = \beta_2 = \cdots = \beta_p$ (Månsson and Shukur, 2011). To generate data with different degrees of correlation, we use formula (25), where $z_{ij}$ are pseudo-random numbers generated from the standard normal distribution and $\rho^2$ represents the degree of correlation. In the design of the experiment, three values of $\rho$ are considered: 0.75, 0.85 and 0.95.
The other factors varied in the simulation study are the values of $n$ and $p$. We use sample sizes of 50, 70, 100, 150 and 200 observations and regression models with 2 and 3 explanatory variables.
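The data-generating design can be sketched as follows. The sketch assumes the formula widely used in this literature for correlated regressors, $x_{ij} = (1 - \rho^2)^{1/2} z_{ij} + \rho z_{i,p+1}$, which matches the description of $z$ and $\rho^2$ above but is not confirmed to be identical to formula (25). It also omits an intercept, which is a further assumption, and the function names are hypothetical.

```python
import numpy as np

def generate_design(n, p, rho, rng):
    """Correlated regressors: x_ij = sqrt(1 - rho^2) z_ij + rho z_i,p+1 (assumed)."""
    z = rng.standard_normal((n, p + 1))
    return np.sqrt(1.0 - rho**2) * z[:, :p] + rho * z[:, [p]]

def generate_response(x, beta, rng):
    """Draw y_i from Be(pi(x_i)) with a logistic success probability."""
    pi = 1.0 / (1.0 + np.exp(-x @ beta))
    return rng.binomial(1, pi), pi

rng = np.random.default_rng(1)
n, p, rho = 200, 2, 0.95
beta = np.full(p, 1.0 / np.sqrt(p))   # equal components with beta'beta = 1

x = generate_design(n, p, rho, rng)
y, pi_true = generate_response(x, beta, rng)
pairwise_corr = np.corrcoef(x, rowvar=False)[0, 1]
```

Under this construction each regressor has unit variance and any two regressors have correlation $\rho^2$, so $\rho = 0.95$ corresponds to a pairwise correlation of about 0.90.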

The proposed algorithm:
Step 1: Set the sample size $n$, the total number of experiments $R$, the number of explanatory variables $p$ and the parameters $\beta$.
Step 3: Generate data with different degrees of correlation according to formula (25).
Step 4: Obtain the maximum likelihood estimates (MLE) $\hat{\beta}_{ML}$ for the logistic regression model by solving the nonlinear system of Eq. (4) and (5).
Step 5: Estimate the shrinkage parameter for logistic Stein estimators using Eq. (9).
Step 9: Estimate the logistic Liu shrinkage parameters as in Eq. (14) and (18).
Step 11: Calculate the mean squared error of the parameters $MSE(\beta)$, the mean absolute error of the parameters $MAE(\beta)$, the mean squared error between the actual probability and the estimated probability $MSE(\pi(x))$ and the mean absolute error between the actual probability and the estimated probability $MAE(\pi(x))$ as in Eq. (20)-(23).
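The final step of the algorithm can be sketched as a function that averages the four criteria over the $R$ replicates. The exact definitions in Eq. (20)-(23) are not reproduced here, so the forms below are assumptions: squared and absolute parameter errors are summed over the parameters and averaged over replicates, and probability errors are averaged over both observations and replicates. The function name and the toy numbers are hypothetical.

```python
import numpy as np

def performance_criteria(beta_hats, beta, pi_hats, pi_true):
    """Average the four criteria over replicates.

    beta_hats: (R, p) estimates, beta: (p,) true parameters,
    pi_hats and pi_true: (R, n) estimated and actual probabilities.
    """
    err_b = beta_hats - beta
    err_p = pi_hats - pi_true
    return {
        "MSE_beta": np.mean(np.sum(err_b**2, axis=1)),
        "MAE_beta": np.mean(np.sum(np.abs(err_b), axis=1)),
        "MSE_pi": np.mean(err_p**2),
        "MAE_pi": np.mean(np.abs(err_p)),
    }

# Toy illustration with R = 2 replicates, p = 2 parameters, n = 2 observations
beta = np.array([1.0, 1.0])
beta_hats = np.array([[1.0, 2.0],
                      [0.0, 1.0]])
pi_true = np.full((2, 2), 0.5)
pi_hats = np.array([[0.6, 0.4],
                    [0.5, 0.5]])

criteria = performance_criteria(beta_hats, beta, pi_hats, pi_true)
```

In a full simulation, one such dictionary would be computed for each estimator under each combination of $n$, $p$ and $\rho$, and the estimator with the smallest values would be preferred.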
As observed from Table 2, at $\rho = 0.75$ and small sample sizes ($n = 50$ and 70), the best options according to $MSE(\beta)$ are Liu estimators with $\hat{d}_{Liu}$, followed by ridge regression estimators and Liu estimators with $(\hat{d}_{Liu})_{I}$, respectively, while according to $MAE(\beta)$, $MSE(\pi(x))$ and $MAE(\pi(x))$, Liu estimators with both $\hat{d}_{Liu}$ and $(\hat{d}_{Liu})_{I}$ are the best option. For moderate and large sample sizes ($n = 100$, 150 and 200), the best option is Liu estimators with $(\hat{d}_{Liu})_{I}$ according to the four criteria. In the case of high multicollinearity ($\rho = 0.85$ and 0.95), Liu estimators with $\hat{d}_{Liu}$ performed best at all sample sizes, reducing all four criteria.
Thus, Liu estimators with both $\hat{d}_{Liu}$ and $(\hat{d}_{Liu})_{I}$ are mostly preferred for correcting the multicollinearity problem in binomial logistic regression.

Real data application:
In this section, a real data set taken from the annual statistical book for environment in Egypt (September 2014) is used for comparing the different methods for correcting multicollinearity in logistic regression. A logistic regression model is estimated, where the response variable is defined as follows:

$$y_i = \begin{cases} 1 & \text{if the } i\text{th governorate has a running waste recycling factory} \\ 0 & \text{otherwise} \end{cases} \qquad i = 1, 2, \ldots, 25$$

The data were collected from all Egyptian governorates (25 governorates) for the year 2011.
This response variable is explained by the following explanatory variables: the quantity of home wastes in tons/year $(X_1)$, the quantity of packing paper wastes in tons/year $(X_2)$, the quantity of plastic wastes in tons/year $(X_3)$, the quantity of glass wastes in tons/year $(X_4)$ and the quantity of metal wastes in tons/year $(X_5)$. Hence, in this real data application, the effect of changing the type and the quantity of wastes on the probability of having a running waste recycling factory is explored. The correlation matrix of the explanatory variables shows that all the bivariate correlations are greater than 0.88, which indicates a multicollinearity problem. The logistic regression model is estimated using the computer software R by applying the IWLS algorithm.
It can be noticed that the quantity of household wastes $(X_1)$ and the quantity of plastic wastes $(X_3)$ have a positive impact on the running of a waste recycling factory, whereas the quantity of packing paper wastes $(X_2)$ and the quantity of metal wastes $(X_5)$ have a negative impact. The quantity of glass wastes $(X_4)$ has a negative impact for all estimators except the Liu estimators with $\hat{d}_{Liu}$. This means that one can increase the probability of having a running waste recycling factory by increasing the quantities of household wastes and plastic wastes and decreasing the quantities of packing paper wastes, metal wastes and glass wastes.
Table 3 indicates that the smallest parameter estimates and standard errors are obtained by the logistic Liu estimators with both $\hat{d}_{Liu}$ and $(\hat{d}_{Liu})_{I}$, while the largest are obtained by the ML estimates, which suffer from multicollinearity. This means that logistic Liu estimators with both $\hat{d}_{Liu}$ and $(\hat{d}_{Liu})_{I}$ are mostly preferred over the other estimators for correcting multicollinearity in logistic regression, which confirms the simulation results.
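The multicollinearity check described above, inspecting all bivariate correlations among the explanatory variables, can be sketched as follows. The data are synthetic stand-ins with 25 rows and 5 columns, not the Egyptian waste data, and the condition number is an additional diagnostic that the study does not report; the function name is hypothetical.

```python
import numpy as np

def min_offdiag_correlation(data):
    """Return the correlation matrix and its smallest off-diagonal entry."""
    corr = np.corrcoef(data, rowvar=False)
    off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
    return corr, off_diag.min()

# Synthetic stand-in: five variables driven by one common factor
rng = np.random.default_rng(2)
common = rng.standard_normal(25)
data = np.column_stack(
    [common + 0.1 * rng.standard_normal(25) for _ in range(5)]
)

corr, smallest = min_offdiag_correlation(data)
cond = np.linalg.cond(corr)  # large values signal near-linear dependence
```

If `smallest` exceeds a chosen threshold such as 0.88, every pair of explanatory variables is strongly correlated, which is the situation reported for the waste quantities.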

CONCLUSION
In this study, a new shrinkage parameter for the logistic Liu estimator, named $(\hat{d}_{Liu})_{I}$, was introduced; it provides an alternative method for dealing with multicollinearity in logistic regression. We designed an algorithm for a Monte Carlo experiment that generates random numbers for the explanatory variables and the response variable. We considered several sample sizes, degrees of correlation and numbers of explanatory variables. We compared the logistic Liu estimators with both $\hat{d}_{Liu}$ and $(\hat{d}_{Liu})_{I}$ against the MLE and the other estimators (Stein and ridge regression) that have been used to correct multicollinearity in binomial logistic regression. $MSE(\beta)$, $MAE(\beta)$, $MSE(\pi(x))$ and $MAE(\pi(x))$ were used as performance criteria. The results showed that logistic Liu estimators with both $\hat{d}_{Liu}$ and $(\hat{d}_{Liu})_{I}$ are much more robust to correlation than the other estimators used to correct multicollinearity in logistic regression. Therefore, the MLE should not be used in the presence of severe multicollinearity, as it becomes unstable, with large variances and a large $MSE(\beta)$. Logistic Liu estimators with both $\hat{d}_{Liu}$ and $(\hat{d}_{Liu})_{I}$ performed better in the simulation than the other estimators. These results agree with Kibria et al. (2012) and Farghali (2014).
Finally, the estimators were applied to a real dataset, exploring the effect of changing the type and the quantity of wastes on the probability of having a running waste recycling factory, to show that the logistic Liu estimators with both $\hat{d}_{Liu}$ and $(\hat{d}_{Liu})_{I}$ are practical.

Table 3: The estimated parameters and the standard errors of the different estimators