Shrinkage Estimation in the Random Parameters Logit Model

In this paper, we explore the properties of a positive-part Stein-like estimator, which is a stochastically weighted convex combination of a fully correlated parameter model estimator and an uncorrelated parameter model estimator in the Random Parameters Logit (RPL) model. The results of our Monte Carlo experiments show that the positive-part Stein-like estimator provides smaller mean squared error (MSE) than the pretest estimator in the fully correlated RPL model. Both of them outperform the fully correlated RPL model estimator and provide more accurate information on the share of the population putting a positive or negative value on the alternative attributes than the fully correlated RPL model estimates. The Monte Carlo mean estimates of direct elasticity with the pretest and positive-part Stein-like estimators are closer to the true value and have smaller standard errors than those with the fully correlated RPL model estimator.


Introduction
The random parameters logit (RPL) model is a generalization of the conditional logit model for multinomial choices. The conditional logit model is derived from an assumption that the errors in the underlying random utility functions for each choice alternative are statistically independent and identically distributed (iid) extreme value type I. This leads to the property known as the Independence of Irrelevant Alternatives (IIA): the ratio of the probabilities of any two alternatives remains constant no matter how many choices there are. This is widely regarded to be a very restrictive assumption.
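The IIA property can be checked numerically. The following is a minimal sketch, assuming hypothetical utility values that are not taken from this paper; the function name `logit_probs` is ours.

```python
import math

def logit_probs(utilities):
    """Conditional logit choice probabilities for a list of systematic utilities."""
    m = max(utilities)  # subtract the maximum for numerical stability
    expu = [math.exp(v - m) for v in utilities]
    total = sum(expu)
    return [e / total for e in expu]

# Hypothetical systematic utilities (illustrative values only).
three = logit_probs([1.0, 0.5, -0.2])
four = logit_probs([1.0, 0.5, -0.2, 0.8])  # same set plus a fourth alternative

# Ratio of the probabilities of alternatives 1 and 2 in each choice set.
ratio_three = three[0] / three[1]
ratio_four = four[0] / four[1]
```

Both ratios equal exp(1.0 - 0.5), so adding the fourth alternative leaves the odds between the first two unchanged. This is the restriction that the RPL model relaxes.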
The key feature of the RPL model is that response parameters can vary randomly, following a chosen distribution, across the population from which samples are drawn. The random coefficients capture individual heterogeneity, and the model does not impose the independence of irrelevant alternatives assumption. The random coefficients can be correlated in the RPL model, as is generally expected in reality, because the unobservable preference of each individual is used to evaluate the attributes of all alternatives in each choice situation. Estimation is by maximum simulated likelihood (MSL), which is described by [1].
In this paper we explore a problem that can exist in any correlated random parameters model. Let $y_n$, $n = 1, \ldots, N$, be an observable outcome variable from a density $f(y_n \mid x_n, \beta_n)$, where $x_n$ is a vector of $K$ explanatory variables and $\beta_n$ are random parameters with mean $\beta$ and covariance matrix $\Sigma$. Using MSL we estimate the population parameters $\beta$ and $\Sigma$. Allowing the random parameters to be correlated introduces potentially many new parameters, $K(K-1)/2$ covariance terms, that are difficult to estimate. Most applied researchers will test the significance of the covariance parameters before deciding to rely on the fully correlated random parameter model instead of the model in which the parameters are random but uncorrelated, so that $\Sigma$ is diagonal. We explore whether a pretesting strategy improves postestimation inference. We also explore the use of a Stein-like shrinkage estimator as an alternative to pretesting. This estimator shrinks the estimates from the fully correlated parameter model towards the estimates of the uncorrelated random parameter model. In numerical experiments using the RPL model we find that both the pretest estimator and the shrinkage estimators have improved mean squared error (MSE) relative to the MSL estimator of the fully correlated parameter model. Last, we analyze the share of the population putting a positive or negative value on the alternative attributes, and the Monte Carlo mean estimates of direct elasticity, using the fully correlated RPL model estimates and the pretest and shrinkage estimates. Based on our Monte Carlo experiment results, the pretest and shrinkage estimates provide more accurate estimates of both quantities than the fully correlated RPL model estimates.

The Random Parameters Logit Model
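The RPL model is described in [2]. With $\theta$ containing the unknown mean and covariance parameters, the probability that individual $n$ chooses alternative $i$ is $P_{ni}(\theta) = \int L_{ni}(\beta)\, f(\beta \mid \theta)\, d\beta$, where $L_{ni}(\beta)$ is the logit probability conditional on $\beta$ and $f(\beta \mid \theta)$ is the density of the random coefficients. For estimation purposes we use the Cholesky decomposition and write $\Sigma = AA'$, where $A$ is lower triangular. The parameter means $\beta_k$ and the elements of $A$ are the objects of estimation. The parameters of the fully correlated RPL model (FCRPLM) are $\theta = (\beta_1, \ldots, \beta_K, a_{11}, \ldots, a_{KK}, a_{21}, \ldots, a_{K,K-1})$, where $a_{kk}$ are the diagonal elements of $A$ and $a_{jk}$, $k < j$, are below the diagonal. If the random coefficients in the RPL model are uncorrelated, denoted UCRPLM, then $\theta$ is $(\beta_1, \ldots, \beta_K, \sigma_1, \ldots, \sigma_K)$, where $\sigma_k^2 = a_{kk}^2$.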
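As an illustration of how the Cholesky factor enters the simulated choice probability, here is a minimal sketch. All numerical values and function names are hypothetical, and it uses pseudo-random normal draws for brevity rather than the Halton draws used in our experiments.

```python
import math
import random

def logit_probs(beta, x):
    """Logit probabilities given coefficients beta (length K) and attributes x (M rows of K)."""
    v = [sum(b * xk for b, xk in zip(beta, row)) for row in x]
    m = max(v)  # subtract the maximum for numerical stability
    e = [math.exp(vi - m) for vi in v]
    s = sum(e)
    return [ei / s for ei in e]

def draw_beta(mean, chol, rng):
    """Draw beta_n = mean + A eta, with eta ~ N(0, I) and A lower triangular."""
    eta = [rng.gauss(0.0, 1.0) for _ in mean]
    return [mean[j] + sum(chol[j][k] * eta[k] for k in range(j + 1))
            for j in range(len(mean))]

def simulated_probs(mean, chol, x, n_draws, rng):
    """Average the conditional logit probabilities over n_draws coefficient draws."""
    acc = [0.0] * len(x)
    for _ in range(n_draws):
        p = logit_probs(draw_beta(mean, chol, rng), x)
        acc = [a + pi for a, pi in zip(acc, p)]
    return [a / n_draws for a in acc]

rng = random.Random(0)
mean = [1.0, -0.5]                         # hypothetical coefficient means
chol = [[0.8, 0.0], [0.3, 0.6]]            # hypothetical lower-triangular A, Sigma = A A'
x = [[1.0, 2.0], [0.5, 1.0], [2.0, 0.3]]   # hypothetical attributes, M = 3, K = 2
probs = simulated_probs(mean, chol, x, 500, rng)
```

Setting the below-diagonal element of `chol` to zero gives the uncorrelated specification, so the same routine covers both models.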

Stein-Like Shrinkage Estimation
Stein-rule estimators follow the work of [3] and [4] and combine sample information with non-sample information in a way that improves the precision of the estimation process and the quality of subsequent predictions. The Stein-rule estimator is a weighted average of the restricted and unrestricted estimators, the weight being a function of the magnitude of the test statistic used to test the restrictions. In the linear regression model $y = X\beta + e$, with $G$ independent linear restrictions $R\beta = r$ on $\beta$, the Stein-rule estimator combines sample and non-sample information and dominates the maximum likelihood estimator (MLE) under weighted quadratic loss with weight matrix $W$. It shrinks the unrestricted estimator $\hat\beta$ towards $\beta^*$, the restricted estimator obtained by minimizing the sum of squared errors subject to the set of restrictions. Sufficient conditions for minimaxity, meaning that the estimator minimizes the maximum risk over the entire parameter space, are $G > 2$ restrictions and a scalar $a$ chosen to lie within the interval $(0, a_{\max})$, where the bound $a_{\max}$ depends on $\eta_L$, the largest characteristic root of a matrix entering the bound. The estimator can be written as $\delta(\hat\beta) = (1 - c^*/u)\,\hat\beta + (c^*/u)\,\beta^*$, where $u$ is the test statistic for the hypothesis $R\beta = r$ and $c^*$ is a constant determined by $a$. If the data support the non-sample information then $u$ will be small and a relatively large weight is placed on the restricted estimator $\beta^*$.
Conversely, if the data do not support the imposed restrictions, $u$ will be large and the unrestricted estimator $\hat\beta$ is more heavily weighted. When $u < c^*$, the weight on $\hat\beta$ is negative, so the Stein estimator reverses the sign of the estimator $\hat\beta$, or the latter is shrunk beyond the hypothesis vector. The problem is resolved by the use of the "positive rule" estimator, which preserves the sign of the estimates and dominates the Stein-rule estimator over the entire parameter space.
The positive-part Stein-like estimator $\hat\theta^{+}$ is a stochastically weighted convex combination of the MLE from an unrestricted model and a restricted MLE subject to $J$ constraints. In our case the unrestricted MLE $\hat\theta_f$ comes from the FCRPLM estimates and the restricted MLE $\hat\theta_u$ from the UCRPLM estimates: $\hat\theta^{+} = \hat\theta_u + [1 - a/u]\, I_{[a,\infty)}(u)\, (\hat\theta_f - \hat\theta_u)$, where $I_{[a,\infty)}(u)$ is the indicator function of a test statistic $u$ for the null hypothesis that the coefficient covariance matrix is diagonal, or equivalently that the Cholesky elements in $A$ below the diagonal are zero. The scalar $a$ controls the amount of shrinkage towards the UCRPLM estimates. The shrinkage estimator $\hat\theta^{+}$ becomes the UCRPLM estimator $\hat\theta_u$ when the test statistic $u$ is less than the value of $a$. The larger the value of $a$, the more weight is given to the UCRPLM estimates. [5] show that if the number of constraints $J > 2$, then under information weighted quadratic loss the risk of the shrinkage estimator is smaller than the risk of the unrestricted maximum likelihood estimator for any $c > 0$. Common choices for the shrinkage constant depend on $J$, the number of covariance terms constrained to zero when obtaining the UCRPLM estimates.
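The positive-part rule described above can be sketched in a few lines. This is an illustrative sketch only: the function name, the numerical values, and the convention that the UCRPLM vector carries zeros in the positions of the restricted covariance terms are our assumptions.

```python
def positive_part_stein(theta_f, theta_u, u, a):
    """Return theta_u + [1 - a/u] * 1{u >= a} * (theta_f - theta_u).

    theta_f: unrestricted (FCRPLM) estimates
    theta_u: restricted (UCRPLM) estimates, padded with zeros for the
             covariance terms so both vectors have the same length
    u:       test statistic for the null of a diagonal covariance matrix
    a:       shrinkage constant
    """
    # The indicator truncates the weight at zero, so the estimator never
    # shrinks past the restricted estimates.
    weight = 1.0 - a / u if (u > 0.0 and u >= a) else 0.0
    return [tu + weight * (tf - tu) for tf, tu in zip(theta_f, theta_u)]

# Hypothetical estimates: one mean, one standard deviation, one covariance term.
theta_f = [1.0, 0.5, 0.2]
theta_u = [0.8, 0.4, 0.0]
shrunk = positive_part_stein(theta_f, theta_u, u=10.0, a=4.0)    # weight 0.6 on the difference
restricted = positive_part_stein(theta_f, theta_u, u=3.0, a=4.0) # u < a, returns theta_u
```

With `u = 10` and `a = 4` the estimates move 60% of the way from the UCRPLM values towards the FCRPLM values; with `u = 3` the rule returns the UCRPLM estimates unchanged.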
With test statistic $u$, the pretest estimator is $\hat\theta^{*} = I_{[0, c_\alpha)}(u)\, \hat\theta_u + I_{[c_\alpha, \infty)}(u)\, \hat\theta_f$, where $c_\alpha$ is the critical value of the chi-square distribution with $J$ degrees of freedom at significance level $\alpha$. Given the degrees of freedom, the critical value $c_\alpha$ is determined by the level of test significance $\alpha$, which is between 0 and 1. When $\alpha = 0$, the pretest estimator $\hat\theta^{*}$ becomes the UCRPLM estimator $\hat\theta_u$. When $\alpha = 1$, the pretest estimator $\hat\theta^{*}$ is the FCRPLM estimator $\hat\theta_f$.
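A minimal sketch of the pretest rule with a likelihood ratio statistic follows. The log-likelihood values and estimates are hypothetical. The critical value 12.592 is the 5% chi-square critical value for 6 degrees of freedom, which assumes $J = K(K-1)/2 = 6$ restrictions for $K = 4$ random coefficients.

```python
def lr_statistic(loglik_full, loglik_restricted):
    """Likelihood ratio statistic: twice the gain in maximized log-likelihood."""
    return 2.0 * (loglik_full - loglik_restricted)

def pretest(theta_f, theta_u, u, critical_value):
    """Keep the restricted estimates unless the test rejects the restrictions."""
    return list(theta_f) if u >= critical_value else list(theta_u)

CRIT_5PCT_6DF = 12.592  # chi-square 5% critical value, 6 degrees of freedom (assumed J = 6)

theta_f = [1.0, 0.5, 0.2]   # hypothetical FCRPLM estimates
theta_u = [0.8, 0.4, 0.0]   # hypothetical UCRPLM estimates, zero-padded

u = lr_statistic(-250.3, -258.1)  # hypothetical maximized log-likelihoods
chosen = pretest(theta_f, theta_u, u, CRIT_5PCT_6DF)
```

Unlike the shrinkage rule, the pretest rule is discontinuous in `u`: it returns one set of estimates or the other, never a blend.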

Design
In our experiments the number of choice alternatives is $M = 4$ and the number of individuals is $N = 200$. Each individual is assumed to be observed once. The four explanatory variables for each individual and each alternative, $x_{ni}$, are generated from independent log-normal distributions $\ln N(1, 0.25)$. The coefficients for each individual, $\beta_n$, are generated from a multivariate normal distribution $N(\beta, \Sigma)$. The correlation $\rho$ takes the values 0, 0.2, 0.4, 0.6, 0.8. The values of $x_{ni}$ and $\beta_n$ are held fixed over the $NSAM = 1000$ Monte Carlo samples in each experiment. The choice probability for each individual is generated with the logit-smoothed accept-reject simulator suggested by [6].
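A data-generating process of this kind can be sketched as follows. This is not the code used for the experiments: the coefficient means and standard deviations are hypothetical, 0.25 is read as the variance of $\ln x$, a common correlation $\rho$ between all coefficient pairs is assumed, and choices are drawn by direct utility maximization with type I extreme value errors instead of the logit-smoothed accept-reject simulator of [6].

```python
import math
import random

def cholesky(s):
    """Lower-triangular A with A A' = s, for a symmetric positive definite matrix s."""
    k = len(s)
    a = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            acc = s[i][j] - sum(a[i][p] * a[j][p] for p in range(j))
            a[i][j] = math.sqrt(acc) if i == j else acc / a[j][j]
    return a

def equicorrelated_cov(sd, rho):
    """Covariance matrix with standard deviations sd and common correlation rho."""
    k = len(sd)
    return [[sd[i] * sd[j] * (1.0 if i == j else rho) for j in range(k)]
            for i in range(k)]

def draw_design(n, m, mean, sd, rho, rng):
    """Draw attributes and coefficients once; they stay fixed across samples."""
    a = cholesky(equicorrelated_cov(sd, rho))
    k = len(mean)
    xs, betas = [], []
    for _ in range(n):
        eta = [rng.gauss(0.0, 1.0) for _ in range(k)]
        betas.append([mean[j] + sum(a[j][p] * eta[p] for p in range(j + 1))
                      for j in range(k)])
        # ln x ~ N(1, 0.25), with 0.25 read as the variance (assumption), so sd = 0.5
        xs.append([[rng.lognormvariate(1.0, 0.5) for _ in range(k)]
                   for _ in range(m)])
    return xs, betas

def draw_choices(xs, betas, rng):
    """One Monte Carlo sample: each individual picks the utility-maximizing alternative."""
    choices = []
    for x, beta in zip(xs, betas):
        # -log of an Exp(1) draw is a type I extreme value (Gumbel) error
        util = [sum(b * v for b, v in zip(beta, row)) - math.log(rng.expovariate(1.0))
                for row in x]
        choices.append(max(range(len(x)), key=util.__getitem__))
    return choices

rng = random.Random(42)
xs, betas = draw_design(200, 4, [1.0, -1.0, 0.5, -0.5], [1.0, 1.0, 1.0, 1.0], 0.4, rng)
samples = [draw_choices(xs, betas, rng) for _ in range(5)]  # 1000 samples in the experiments
```

Only `draw_choices` is repeated across Monte Carlo samples, which mirrors the design choice of holding $x_{ni}$ and $\beta_n$ fixed.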
Our simulation and RPL model estimation were carried out in NLOGIT 5.0. Based on our Monte Carlo experiment results and on [7] and [8], we use 100 Halton draws to simulate choice probabilities during MSL estimation. The positive-part Stein-like and pretest estimators were calculated based on the likelihood ratio (LR), Lagrange multiplier (LM) and Wald test statistics with 25%, 5% and 1% significance levels. Because the empirical percentile values of the LR test are closer to the related critical values than those of the LM and Wald tests, we only provide the results based on the LR test statistic. Using Monte Carlo experiments to study the RPL model, especially with correlated parameters, is numerically challenging. Key elements that are worth mentioning are: 1) for the uncorrelated parameter model, conditional logit estimates were used as starting values; 2) for the correlated parameter model, the estimates from 1) were used as starting values; 3) samples for which convergence was not achieved were discarded; only 0.3% of the samples failed to converge in our Monte Carlo experiments.
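For readers unfamiliar with Halton draws, the sketch below generates a Halton sequence and maps it to standard normal draws. It is an illustration, not NLOGIT's implementation: the choice of primes, one per random coefficient, and the number of skipped initial points are our assumptions.

```python
from statistics import NormalDist

def halton(index, base):
    """Radical-inverse value of a positive integer index in the given prime base."""
    result, f = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * f
        f /= base
    return result

def halton_normal_draws(n_draws, primes=(2, 3, 5, 7), skip=10):
    """n_draws rows of standard normal draws, one column per prime base.

    The first `skip` points are discarded (an assumed, commonly used precaution).
    """
    inv = NormalDist().inv_cdf  # maps uniform (0, 1) values to N(0, 1)
    return [[inv(halton(i, p)) for p in primes]
            for i in range(skip + 1, skip + n_draws + 1)]

draws = halton_normal_draws(100)  # 100 draws for K = 4 random coefficients
```

The base-2 sequence starts 0.5, 0.25, 0.75, filling the unit interval more evenly than pseudo-random numbers, which is why a modest number of draws can suffice.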

Results
To study how the pretest and shrinkage estimators reduce the estimation risk of the FCRPLM estimators, we calculate the MSEs of the estimated parameter means, variances and covariances with the pretest, shrinkage and FCRPLM estimators respectively. First, we compare the MSEs of the FCRPLM estimators with those of the UCRPLM estimators, where MSE is the Monte Carlo average of the squared error loss. In Table 1, the MSEs of the UCRPLM estimators are all smaller than those of the FCRPLM estimators. The risk of the estimated parameter means with the FCRPLM is more than twice that of the UCRPLM. The MSEs of the estimated variances with the UCRPLM are about 25% of those with the FCRPLM. With nonzero correlation $\rho$, the MSEs of the estimated covariance parameters based on the FCRPLM are much bigger than those based on the UCRPLM. When the correlation $\rho = 0.2$ and 0.4, the ratios of MSEs of the estimated covariance elements are relatively smaller compared to the results for higher correlations. This implies that when the specification error is small, the FCRPLM, which is the correct model, has a much larger relative MSE for parameter covariance elements than the UCRPLM. The covariance elements reveal important information about the joint effect of alternative attributes on people's decisions. If two random coefficients are highly positively correlated with each other, it means people are attracted and motivated by both of the related attributes. In our Monte Carlo experiments, the shrinkage estimators with a higher shrinkage constant $a$ outperform estimators with less shrinkage and most of the pretest estimators.
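The MSE ratios reported in the tables can be computed along the following lines. This is a sketch with made-up numbers; summing the squared errors over the parameters in a group is our assumption about how the loss is aggregated.

```python
def mse(estimates, truth):
    """Monte Carlo average of the squared error loss, summed over parameters."""
    losses = [sum((e - t) ** 2 for e, t in zip(est, truth)) for est in estimates]
    return sum(losses) / len(losses)

truth = [1.0, 2.0]                        # hypothetical true parameter values
ucrplm_draws = [[1.1, 2.0], [0.9, 2.2]]   # hypothetical estimates, one row per sample
fcrplm_draws = [[1.2, 2.0], [1.0, 2.4]]

# A ratio below one means the first estimator has the smaller risk.
ratio = mse(ucrplm_draws, truth) / mse(fcrplm_draws, truth)
```

In this toy example the ratio is 0.03 / 0.10 = 0.3.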
Since one of the advantages of the RPL model is that it provides information on the share of the population that places a positive or negative value on the alternative attributes, we also calculate the joint probability that the first two estimated parameters are less than zero. Table 3 shows the share of the population putting a negative value on the attributes. Comparing the results with UCRPLM and FCRPLM estimates, the joint probabilities with FCRPLM estimates are closer to the true value but have larger MSEs, except for $\rho = 0$. From Table 3, the pretest and shrinkage estimates reduce the MSE of the joint probability estimator compared to the FCRPLM estimates. Even though the biases of the joint probability with pretest and shrinkage estimates are higher than those with UCRPLM and FCRPLM estimates, the difference is small in magnitude.
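The joint probability that two normally distributed coefficients are both negative can be approximated by simulation. The sketch below is illustrative; the means, standard deviations and number of draws are hypothetical, and the function name is ours.

```python
import math
import random

def share_both_negative(mean, sd, rho, n_draws, rng):
    """Simulated P(beta_1 < 0, beta_2 < 0) for a bivariate normal coefficient pair."""
    count = 0
    for _ in range(n_draws):
        z1 = rng.gauss(0.0, 1.0)
        # z2 has unit variance and correlation rho with z1
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        if mean[0] + sd[0] * z1 < 0.0 and mean[1] + sd[1] * z2 < 0.0:
            count += 1
    return count / n_draws

rng = random.Random(0)
# Hypothetical coefficient distribution; rho = 0.6 raises the joint share
# relative to what independent coefficients with the same margins would give.
share = share_both_negative([0.5, 0.3], [1.0, 1.0], 0.6, 20000, rng)
```

For zero means the result can be checked against the known orthant probability $1/4 + \arcsin(\rho)/(2\pi)$, which shows directly how ignoring a positive correlation understates the joint share.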
To analyze the sensitivity of the RPL model in response to a change in the level of an alternative attribute, we calculate the mean estimates of direct elasticity with the true parameters and with the pretest, shrinkage and FCRPLM estimates. The first explanatory variable in each alternative, $x_{ni,1}$, is chosen to calculate the related mean estimates of direct elasticity.
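A direct elasticity in a random parameters logit model can be simulated as sketched below. The sketch assumes the standard mixed logit expression, $(x_{ni,k} / P_{ni}) \int \beta_k L_{ni}(\beta)[1 - L_{ni}(\beta)] f(\beta \mid \theta)\, d\beta$, for a single individual; the function names and all numerical values are hypothetical, and this is not necessarily the routine used to produce the tables.

```python
import math
import random

def logit_probs(beta, x):
    """Logit probabilities given coefficients beta and attributes x (M rows of K)."""
    v = [sum(b * xk for b, xk in zip(beta, row)) for row in x]
    m = max(v)
    e = [math.exp(vi - m) for vi in v]
    s = sum(e)
    return [ei / s for ei in e]

def simulated_direct_elasticity(mean, chol, x, alt, attr, n_draws, rng):
    """Elasticity of the probability of `alt` with respect to its own attribute `attr`."""
    k = len(mean)
    prob_sum, deriv_sum = 0.0, 0.0
    for _ in range(n_draws):
        eta = [rng.gauss(0.0, 1.0) for _ in range(k)]
        beta = [mean[j] + sum(chol[j][p] * eta[p] for p in range(j + 1))
                for j in range(k)]
        l = logit_probs(beta, x)[alt]
        prob_sum += l                             # accumulates the simulated probability
        deriv_sum += beta[attr] * l * (1.0 - l)   # accumulates dP/dx for this draw
    # The 1/n_draws factors cancel in the ratio.
    return x[alt][attr] * deriv_sum / prob_sum

rng = random.Random(0)
mean = [1.0, -0.5]                         # hypothetical coefficient means
chol = [[0.8, 0.0], [0.3, 0.6]]            # hypothetical Cholesky factor of Sigma
x = [[1.0, 2.0], [0.5, 1.0], [2.0, 0.3]]   # hypothetical attributes for one individual
elasticity = simulated_direct_elasticity(mean, chol, x, alt=0, attr=0, n_draws=500, rng=rng)
```

When the Cholesky factor is zero the expression collapses to the conditional logit elasticity $\beta_k x_{ni,k}(1 - P_{ni})$. Averaging the individual values over the sample gives a mean direct elasticity.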
Comparing the results in Table 4 to those in Table 5, we find that the results with FCRPLM estimates are all higher than the true value. When $\rho > 0.2$, the results with the pretest and shrinkage estimators are closer to the true value than those based on the FCRPLM estimators. The shrinkage estimators with the larger shrinkage constant have smaller bias of the Monte Carlo mean direct elasticity estimates than the pretest estimates and the shrinkage estimates with a smaller shrinkage constant. At the same time, the shrinkage and pretest estimators have smaller standard errors of the Monte Carlo mean direct elasticity estimates than the FCRPLM estimates. Based on our Monte Carlo experiment results, the shrinkage and pretest estimates will give more reliable mean direct elasticity estimates than the FCRPLM estimates, especially with a larger shrinkage constant.

Conclusion
According to our Monte Carlo experiment results, the UCRPLM estimators have smaller estimation risk than the FCRPLM estimators. The pretest and positive-part Stein-like estimators both perform better than the FCRPLM estimators. The positive-part Stein-like estimators with a higher shrinkage constant $a$ outperform those with a smaller one and the pretest estimators. Shrinkage estimation reduces the risk of the FCRPLM estimators by shrinking the FCRPLM estimates towards the UCRPLM estimates. Providing information on the share of the population putting a negative or positive value on the alternative attributes is one of the advantages of the RPL model. When the random coefficients are correlated with each other, the FCRPLM estimator of this quantity has a smaller bias and a slightly larger MSE than the UCRPLM estimator. Based on our Monte Carlo experiments, the pretest and shrinkage estimates can also reduce the MSEs of the estimated share of the population putting a positive or negative value on the alternative attributes. The Monte Carlo mean estimates of direct elasticity based on the pretest and shrinkage estimators with a larger shrinkage constant are closer to the true value, with smaller standard errors, than those based on the FCRPLM estimators.
Consider individual $n$ facing $M$ alternatives. The random utility associated with alternative $i$ is $U_{ni} = \beta_n' x_{ni} + \varepsilon_{ni}$, where $x_{ni}$ are $K$ observed explanatory variables for alternative $i$, and $\varepsilon_{ni}$ is an iid type I extreme value error which is independent of $\beta_n$ and $x_{ni}$. The random coefficients $\beta_n$ can be regarded as being composed of a mean $\beta$ and deviations $\tilde\beta_n$. The RPL model decomposes the unobserved part of the utility into the extreme value term $\varepsilon_{ni}$ and the random part $\tilde\beta_n' x_{ni}$. Conditional on $\beta_n$, the probability that individual $n$ chooses alternative $i$ is of the usual logistic form, $L_{ni}(\beta_n) = \exp(\beta_n' x_{ni}) / \sum_{j=1}^{M} \exp(\beta_n' x_{nj})$.

Table 1. The ratios of the UCRPLM estimator MSE to the FCRPLM estimator MSE.

In Table 2, we compare the MSEs of the LR-based pretest and shrinkage estimators to those of the FCRPLM estimators. All Table 2 ratios are less than one. The pretest and shrinkage estimators all perform better than the FCRPLM estimators. With a smaller level of test significance $\alpha$, the UCRPLM estimator $\hat\theta_u$ is more frequently chosen as the pretest estimator and the pretest estimator has a smaller MSE. However, compared to the

The mean estimates of direct elasticity with the true parameters are reported in Table 4, and the Monte Carlo mean estimates of direct elasticity based on the pretest, positive-part Stein-like and FCRPLM estimates are reported in Table 5.

Table 2. The ratios of the LR-based pretest and shrinkage estimator MSEs to the FCRPLM estimator MSE.

Table 3. The share of the population putting a negative value on the first two attributes of each alternative, $P(\beta_1 < 0, \beta_2 < 0)$.
Note: [ ] provides the MSE results, { } provides the bias results.

Table 4. The mean estimates of direct elasticity with true parameters.

In Table 5, since the pretest estimator with a smaller level of test significance has a smaller MSE, we use the pretest estimator with the 1% significance level. The elasticity is taken with respect to the first explanatory variable in each alternative, $x_{ni,1}$.

Table 5. The Monte Carlo mean estimates of direct elasticity based on pretest, shrinkage and FCRPLM estimates.
Note: ( ) provides the standard error results.