1. Introduction
A shrinkage estimator is an estimator that, either explicitly or implicitly, incorporates the effects of shrinkage. In loose terms this means that a naive or raw estimate is improved by combining it with other information. The term reflects the idea that the supplied information moves the improved estimate closer to the "true value" than the raw estimate. Shrinkage estimation is a technique used in inferential statistics to reduce the mean squared error (MSE) of a given estimator. The idea of shrinking an estimator dates to 1956, when Stein [1] established that we can reduce the MSE of an estimator if we give up a little on bias. This means that, given an estimator, we can shrink it to obtain another estimator with lower MSE, and the new estimator estimates the "true" parameter value more efficiently. This works well when the number of parameters is more than two ($p > 2$), the "James-Stein classical condition". When we shrink a maximum likelihood estimator (MLE) under this condition, we obtain a new shrinkage estimator which is closer to the assumed true value than the MLE. The magnitude of the improvement depends on the distance between the "true" parameter value and the parametric restriction, which yields a shrinkage target denoted by $\tilde{\theta}_n$. With all these modifications and restrictions needed to achieve this estimator, we ask ourselves whether this desirable shrinkage estimator is asymptotically consistent and efficient.
The literature on shrinkage estimators is extensive, but we mention only a few of the contributions most relevant to our study. James and Stein [2] used shrinking techniques to construct an estimator, the James-Stein shrinkage estimator (JSSE), which has lower squared risk loss than the MLE. Baranchik [3] showed that the positive-part James-Stein shrinkage estimator has lower risk than the ordinary JSSE. Berger [4] discusses selecting a minimax estimator of a multivariate normal mean by considering different James-Stein type estimators. Stein [5] used shrinking techniques to estimate the mean of a multivariate normal distribution. Carter and Ullah [6] constructed the sampling distribution and F-ratios for a James-Stein shrinkage estimator obtained by shrinking an ordinary least squares (OLS) estimator in regression models. George [7] proposed a new minimax multiple shrinkage estimator that allows multiple specifications for the set of targets towards which a given estimator is shrunk. Geyer [8] studied the asymptotics of constrained M-estimators, which also fall in the class of shrinkage estimators. Hansen [9] constructed a generalised James-Stein shrinkage estimator obtained by shrinking an MLE, and Hansen [10] derived its asymptotic distribution and showed that we can shrink towards a sub-parameter space.
2. Preliminaries
The theory of shrinkage techniques plays an important role in developing efficient statistical estimators, which in turn play a key role in statistical decision theory. Therefore, a clear understanding of the asymptotic behaviour of the James-Stein shrinkage estimator $\hat{\theta}_n^{JS}$ provides knowledge of the stability and efficiency of the estimator as the sample size $n$ grows without bound.
This paper investigates the asymptotic consistency and efficiency of the James-Stein shrinkage estimator (JSSE) obtained by shrinking a maximum likelihood estimator (MLE) when we have observed variables $X_1, X_2, \ldots, X_n$. Though the shrinkage estimator we are interested in is biased, its study is important because of the realisation that efficiency (lower risk) dominates all other properties in estimation. Efron [11] discusses how bias can dominate unbiasedness in estimation.
We proceed by considering the asymptotic distributions of the three estimators important to this study, drawing on results in Hansen [10]. Using the asymptotic distribution derived by Hansen [10], we employ Taylor's theorem and some limit theorems to show that the JSSE converges in probability to the "true" parameter value as $n \to \infty$. We then evaluate the asymptotic distributional bias (ADB) of the estimators $\hat{\theta}_n$ (the MLE), $\tilde{\theta}_n$ (the restricted MLE) and $\hat{\theta}_n^{JS}$ (the JSSE), and show that the variance of the latter achieves the Cramér-Rao bound (CRB) as $n \to \infty$. The analysis is done along local sequences of the form $\theta_n = \theta_0 + h n^{-1/2}$ as $n \to \infty$. Simulation plots are produced in the statistical package R to compare the JSSE and the MLE in terms of mean squared error (MSE), consistency and convergence.
The paper is organised as follows. Section 2.1 presents the parametric set-up. Section 2.2 gives the form of the JSSE considered in the study, while Section 2.3 discusses the asymptotic distributions of the estimators. In Section 3 we present the main results, beginning in Section 3.1 with a lemma on the convergence in probability of the shrinking factor; we then show the consistency of the shrinkage estimator in Theorem 1. In Section 3.2 we evaluate the ADBs of the estimators in play. We show that the James-Stein shrinkage estimator is asymptotically efficient in Section 3.3 and establish the rate of convergence in Section 3.4. In Section 3.5 we present MSE plots comparing the JSSE and the MLE, and in Section 4 we give a discussion and analysis of the whole study and conclude by stating the main results.
The following definitions are used to establish the asymptotic consistency and efficiency of the James-Stein shrinkage estimator $\hat{\theta}_n^{JS}$.
Definition 1
An estimator $\hat{\theta}_n$ is said to be consistent for $\theta$ if it converges in probability to $\theta$. That is, if for all $\varepsilon > 0$,
$$\lim_{n \to \infty} P\left( \left\| \hat{\theta}_n - \theta \right\| > \varepsilon \right) = 0,$$
or equivalently $\hat{\theta}_n \xrightarrow{p} \theta$, where $n$ is the sample size.
Definition 2
Let $X_1, \ldots, X_n$ be independent and identically distributed (iid) according to a probability density $f(x; \theta)$ satisfying suitable regularity conditions. Suppose that an estimator $\hat{\theta}_n$ of $\theta$ is asymptotically normal, say
$$\sqrt{n}\left( \hat{\theta}_n - \theta \right) \xrightarrow{d} N(0, \Sigma)$$
for a positive definite matrix $\Sigma$. Then a sequence of estimators $\hat{\theta}_n$ satisfying
$$\sqrt{n}\left( \hat{\theta}_n - \theta \right) \xrightarrow{d} N\left( 0, I(\theta)^{-1} \right)$$
for the Fisher information $I(\theta)$ is said to be asymptotically efficient.
We now consider a statistical model. We describe the set-up of the parameter of interest and the shrinking strategy used in the study.
2.1. Parametric Structure
Consider an unbiased estimator $\hat{\theta}_n$ for $\theta \in \Omega$ such that $\sqrt{n}(\hat{\theta}_n - \theta)$ is asymptotically $p$-multivariate normal, where the elements of Ω are p-dimensional parameter vectors. Let $\tilde{\theta}_n$ (the shrinkage target) be a restricted maximum likelihood estimator (RMLE) for $\theta \in \Omega_0$, a sub-parameter space partitioned from the whole parameter space Ω by a parametric restriction that leaves $m$ free parameters, where $m \le p$. The sub-parameter space $\Omega_0$ provides a simple model of interest to shrink towards. If the restriction removes all the parameters, $\Omega_0$ reduces to the kernel of the restriction, a singleton zero vector; if only some parameters are restricted, we create a sub-model of particular interest.
Let $S$ be a shrinkage matrix of dimension $m \times p$. We introduce another matrix $G$ which harmonises the dimension of the RMLE from $m$ to $p$; that is, $G$ defines a mapping from the $m$-dimensional restricted space into the $p$-dimensional space, so that the restricted estimate can be compared with the unrestricted one. The matrix $G$ is a $p \times m$ matrix when we consider a sub-parameter space $\Omega_0$, and it is the $p$-dimensional identity matrix when we have the whole parameter space Ω. We note that the matrix $G$ is used to increase the dimension of the RMLE, since the RMLE itself is $m$-dimensional. Therefore we have a plug-in restricted maximum likelihood estimator $\tilde{\theta}_n$. The matrix $G$ thus harmonises the dimension induced by the shrinkage restriction with the actual dimension $p$ of the parameters of interest. The plug-in unrestricted MLE in the shrinkage sense is denoted by $\hat{\theta}_n$. With all parameters set, we present the generalised James-Stein shrinkage estimator $\hat{\theta}_n^{JS}$ in the next section.
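To make the dimension-harmonising step concrete, the following small R sketch (ours, not from the paper's appendix) builds a selection-type matrix $G$ for a hypothetical restriction that keeps only the first $m$ coordinates free and sets the remaining $p - m$ coordinates to zero; the values of $p$, $m$ and the restricted estimate are illustrative assumptions.

```r
# Hypothetical example: lift an m-dimensional restricted estimate to p
# dimensions with a selection-type matrix G (p x m). The restriction assumed
# here fixes the last p - m coordinates at zero.
p <- 3
m <- 1
G <- rbind(diag(m), matrix(0, nrow = p - m, ncol = m))  # p x m

gamma_tilde <- c(0.8)             # hypothetical m-dimensional RMLE
theta_tilde <- G %*% gamma_tilde  # plug-in RMLE, now p x 1
theta_tilde
```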
2.2. Positive Part James-Stein Shrinkage Estimator
Let $\hat{\theta}_n$ be the MLE for $\theta \in \Omega$ and $\tilde{\theta}_n$ be a restricted maximum likelihood estimator for $\theta \in \Omega_0$, a sub-parameter space of the whole parameter space Ω whose elements lie in $\mathbb{R}^p$ as described before. Let $\tilde{\theta}_n$ also denote the plug-in estimator of the RMLE of dimension $p$. Then the James-Stein shrinkage estimator $\hat{\theta}_n^{JS}$ obtained by shrinking the MLE towards the target $\tilde{\theta}_n$ is given by
$$\hat{\theta}_n^{JS} = \tilde{\theta}_n + \left( 1 - \frac{\tau}{D_n} \right)_{+} \left( \hat{\theta}_n - \tilde{\theta}_n \right), \qquad (1)$$
where $(\cdot)_{+}$ is the positive trimming function and $\tau > 0$ is a shrinkage constant. The shrinkage estimator in (1) can be expressed as a weighted average by letting
$$D_n = n \left( \hat{\theta}_n - \tilde{\theta}_n \right)^{\mathsf{T}} \widehat{V}^{-1} \left( \hat{\theta}_n - \tilde{\theta}_n \right), \qquad (2)$$
a distance statistic which is the same as the weighted quadratic loss function, where $\widehat{V}$ is a covariance matrix estimate, and
$$w_n = \left( 1 - \frac{\tau}{D_n} \right)_{+}, \qquad (3)$$
with $0 \le w_n \le 1$. Then (1) becomes
$$\hat{\theta}_n^{JS} = w_n \hat{\theta}_n + (1 - w_n)\, \tilde{\theta}_n, \qquad (4)$$
which is a weighted average of $\hat{\theta}_n$ and $\tilde{\theta}_n$. The James-Stein shrinkage estimator presented above has lower risk than the MLE, as shown by Hansen [10] and James and Stein [2]. To check whether the shrinkage estimator is asymptotically consistent we need its asymptotic distribution. We therefore present the asymptotic distributions of the MLE, RMLE and JSSE in the next section.
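The following R sketch shows one way to compute the positive-part estimator in the weighted-average form (1)-(4); the function name, the choice of shrinkage constant $\tau$ and the zero target in the example are our assumptions, not the paper's appendix code.

```r
# Minimal sketch of the positive-part James-Stein shrinkage estimator in the
# weighted-average form (1)-(4).
js_shrink <- function(theta_hat, theta_tilde, V_hat, n, tau) {
  d  <- theta_hat - theta_tilde
  Dn <- n * as.numeric(t(d) %*% solve(V_hat) %*% d)  # distance statistic (2)
  w  <- max(1 - tau / Dn, 0)                         # positive trimming, (3)
  w * theta_hat + (1 - w) * theta_tilde              # weighted average, (4)
}

# Example with a hypothetical zero shrinkage target and identity covariance:
set.seed(1)
n <- 200; p <- 3
x <- matrix(rnorm(n * p, mean = 0.3), n, p)
js_shrink(colMeans(x), rep(0, p), diag(p), n, tau = p - 2)
```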
2.3. Asymptotic Distribution
We assume that the maximum likelihood estimator satisfies the regularity conditions given in Hansen [10] and Newey [12]. With these assumptions in mind, the asymptotic distributions of $\hat{\theta}_n$, $\tilde{\theta}_n$ and $\hat{\theta}_n^{JS}$ are analysed along the sequences
$$\theta_n = \theta_0 + h n^{-1/2},$$
where $\theta_0$ is the assumed true parameter value and $h$ is a constant providing a neighbourhood for the true parameter value $\theta_0$. From the normality of the MLE we have
$$\sqrt{n}\left( \hat{\theta}_n - \theta_n \right) \xrightarrow{d} Z \sim N(0, V) \qquad (5)$$
as $n \to \infty$. Using (5), Hansen [10] obtained the asymptotic distribution of the restricted maximum likelihood estimator, given in (6), which carries a shrinkage value effect. As a consequence of the convergences in (5) and (6) we also obtain the limit (7), the asymptotic distribution to which the MLE converges when it is estimating the RMLE. The distance statistic $D_n$ in Equation (2) converges to a non-central chi-squared distribution, as described by Hansen [10],
$$D_n \xrightarrow{d} \xi \sim \chi^2_p(\lambda), \qquad (8)$$
where $\lambda$ is a non-centrality parameter determined by the local parameter $h$. Using (2), Hansen [10] showed that the limit in (9) holds, a positive limit involving the inverse of a non-central chi-squared variable. Therefore, using (9) as $n \to \infty$, Hansen [10] obtained the asymptotic distribution of the shrinkage estimator, given in (10), which is normal with some shrinkage value effect. With the asymptotic distribution of the shrinkage estimator in place, we now present the main results.
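As a quick numerical illustration of the limit in (8), the sketch below (our own, assuming a zero shrinkage target and identity covariance so that $D_n = n\,\hat{\theta}_n^{\mathsf T}\hat{\theta}_n$) compares the simulated mean of $D_n$ under a local sequence $\theta_n = h/\sqrt{n}$ with the mean $p + \|h\|^2$ of the corresponding non-central chi-squared distribution.

```r
# Simulate D_n under a local sequence and compare with the non-central
# chi-squared mean p + ||h||^2 (simplifying assumptions: zero target, V = I).
set.seed(2)
p <- 3; n <- 500; h <- c(1, -1, 0.5)
Dn <- replicate(5000, {
  x <- matrix(rnorm(n * p, mean = h / sqrt(n)), n, p, byrow = TRUE)
  theta_hat <- colMeans(x)
  n * sum(theta_hat^2)            # D_n with V = I and zero target
})
c(simulated = mean(Dn), theoretical = p + sum(h^2))
```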
3. Main Results
In this section we present our main results. We show that the James-Stein shrinkage estimator $\hat{\theta}_n^{JS}$ is asymptotically consistent. Secondly, we evaluate the asymptotic distributional bias of the three estimators in play. We then show that the shrinkage estimator is asymptotically efficient by showing that its variance achieves the Cramér-Rao bound. Further, we explore the convergence rate of the James-Stein shrinkage estimator and present the simulation plots for the MSE produced in R.
3.1. Consistency of the James-Stein Shrinkage Estimator
We present Lemma 1, which gives the convergence in probability of the weight (shrinkage factor) $w_n$. The result is used when establishing the consistency of the James-Stein shrinkage estimator $\hat{\theta}_n^{JS}$.
Lemma 1
From Equation (8) we have
$$w_n = \left( 1 - \frac{\tau}{D_n} \right)_{+} \xrightarrow{d} \left( 1 - \frac{\tau}{\xi} \right)_{+}, \qquad (11)$$
where $\xi$ is a non-central chi-squared random variable with non-centrality parameter $\lambda$ and $p$ degrees of freedom. Along the sequences $\theta_n = \theta_0 + h n^{-1/2}$, if $\|h\| \to \infty$ then
$$w_n \xrightarrow{p} 1, \qquad (12)$$
and if $h$ is fixed then $w_n \xrightarrow{p} 1$ whenever the ratio $\tau / \xi$ vanishes; otherwise
$$w_n \xrightarrow{p} c, \qquad (13)$$
where $c$ is a constant such that $0 \le c < 1$, and $(\cdot)_{+}$ in (11) is a positive trimming function which keeps what is in the brackets greater than or equal to zero.
Proof.
We begin by considering the first case, when $\|h\|$ diverges to infinity. Suppose that $\|h\| \to \infty$; then the non-centrality of the limit of the distance statistic grows without bound, so that
$$D_n \to \infty \quad \text{as } n \to \infty. \qquad (14)$$
Therefore from (14) we have
$$\frac{\tau}{D_n} \to 0 \quad \text{as } n \to \infty. \qquad (15)$$
Now, considering the form of the weight in (3) and using (15), we have
$$w_n = \left( 1 - \frac{\tau}{D_n} \right)_{+} \xrightarrow{p} 1 \quad \text{as } n \to \infty. \qquad (16)$$
Hence we have established (12).
Secondly, suppose that $h$ is fixed. Then the sequence $\theta_n = \theta_0 + h n^{-1/2}$ satisfies $\theta_n \to \theta_0$ as $n \to \infty$, implying that $D_n$ converges in distribution to the non-central chi-squared limit $\xi$ in (8). Suppose
$$\frac{\tau}{D_n} \xrightarrow{d} \frac{\tau}{\xi} \quad \text{as } n \to \infty, \qquad (17)$$
where the constant $\tau$ is fixed and $\xi$ is not affected by an increase in $n$. If the ratio $\tau / \xi$ vanishes as $n \to \infty$, then $w_n \to 1$. If $\tau > \xi$ then $1 - \tau/\xi$ will be negative and, by definition of the positive trimming function, we end up with zero. This will vary as $p$ changes, but the probability of $\tau > \xi$ depends on the degrees of freedom $p$ and will vary according to the chi-squared distribution, implying that the ratio $\tau / \xi$ settles at a constant value as $n \to \infty$. Therefore, when the ratio vanishes, proceeding in the same way as in the first case we have
$$w_n \xrightarrow{p} 1 \quad \text{as } n \to \infty. \qquad (18)$$
Otherwise, if the ratio $\tau / \xi$ is such that $0 < \tau / \xi < 1$ as $n \to \infty$, we have $w_n \xrightarrow{p} c$, where $0 \le c < 1$. ∎
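A small numerical illustration of the weight's behaviour (ours, not part of the proof, and assuming a zero shrinkage target with identity covariance): with a fixed true mean away from the target, $D_n$ grows with $n$ and the weight $w_n = (1 - \tau / D_n)_+$ approaches 1, as in (12).

```r
# Illustrate (12): the shrinkage weight tends to 1 as n grows when the true
# mean is held fixed away from the zero target (simplifying assumptions).
set.seed(3)
p <- 3; tau <- p - 2; theta <- c(0.5, 0.5, 0.5)
for (n in c(50, 500, 5000)) {
  x  <- matrix(rnorm(n * p, mean = theta), n, p, byrow = TRUE)
  Dn <- n * sum(colMeans(x)^2)          # distance statistic with V = I
  cat("n =", n, " w_n =", max(1 - tau / Dn, 0), "\n")
}
```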
Lemma 1 above establishes convergence of the weight $w_n$, which determines shrinkage. In this case we realise that the same weight determines the convergence in distribution and in probability of the shrinkage estimator. From the regularity conditions we know that the MLE $\hat{\theta}_n$ is asymptotically consistent, and this consistency extends to the RMLE $\tilde{\theta}_n$. With this fact in mind, we now present the main result, which shows the consistency of the shrinkage estimator $\hat{\theta}_n^{JS}$.
Theorem 1
Let $\theta \in \Omega$, where Ω is a parameter space with elements in $\mathbb{R}^p$. Suppose we have a James-Stein shrinkage estimator $\hat{\theta}_n^{JS}$ obtained by shrinking the maximum likelihood estimator $\hat{\theta}_n$ of $\theta$, where the shrinkage target $\tilde{\theta}_n$ is the restricted maximum likelihood estimator of $\theta \in \Omega_0$, a sub-parameter space partitioned from Ω by the restriction described in Section 2.1. Then the JSSE is given by
$$\hat{\theta}_n^{JS} = \tilde{\theta}_n + \left( 1 - \frac{\tau}{D_n} \right)_{+} \left( \hat{\theta}_n - \tilde{\theta}_n \right),$$
where $D_n$ is the distance statistic in (2), $\tau > 0$, $w_n = (1 - \tau / D_n)_{+}$ and $(\cdot)_{+}$ is a positive trimming function. If $\hat{\theta}_n$ is consistent for $\theta$ as $n \to \infty$, then the James-Stein shrinkage estimator $\hat{\theta}_n^{JS}$ is also consistent for $\theta$ as $n \to \infty$, where the sequence $\theta_n = \theta_0 + h n^{-1/2}$ is as defined in Section 2.3.
Proof.
Let $\theta \in \Omega$. To show that $\hat{\theta}_n^{JS}$ is consistent for $\theta$ as $n \to \infty$, we consider the value of $h$, which determines the neighbourhood of the sequence $\theta_n$, when it diverges to infinity and when it is just fixed. Suppose that $\|h\|$ diverges to infinity. To evaluate
$$\hat{\theta}_n^{JS} = w_n \hat{\theta}_n + (1 - w_n)\, \tilde{\theta}_n \qquad (19)$$
as $n \to \infty$, we first consider the weight $w_n$ from (3). By Lemma 1,
$$w_n \xrightarrow{p} 1 \quad \text{as } n \to \infty. \qquad (20)$$
Hence, from (20) and substituting $w_n$ by 1 in (19), we have $\hat{\theta}_n^{JS} = \hat{\theta}_n + o_p(1)$, which gives
$$\hat{\theta}_n^{JS} \xrightarrow{p} \theta \qquad (21)$$
as $n \to \infty$, by the consistency of $\hat{\theta}_n$. Thus we have $P\left( \| \hat{\theta}_n^{JS} - \theta \| > \varepsilon \right) \to 0$ for any $\varepsilon > 0$. Hence $\hat{\theta}_n^{JS}$ is consistent for $\theta$.
Secondly, suppose $h$ is fixed at some finite value; then the sequence $\theta_n = \theta_0 + h n^{-1/2}$ converges to $\theta_0$ as $n \to \infty$. From this limit two conditions arise. The first is that the sequence $\theta_n$ lies within the restricted parameter space $\Omega_0$. From the restriction defining $\Omega_0$, this means that the shrinkage target is exactly at the true value and our consideration is on just one parameter space. In that case the limit in (6) is the same as the asymptotic distribution of $\hat{\theta}_n$, since we only consider the sub-parameter space $\Omega_0$ together with the shrinkage value effect acting on it. Thus the two estimators share the same limit,
$$\tilde{\theta}_n \xrightarrow{p} \theta \quad \text{as } n \to \infty, \qquad (22)$$
because we are estimating $\theta \in \Omega_0$ and, from Section 2.1, there is no difference between the MLE and the RMLE. Due to this equality of the two maximum likelihood estimators, from (4) the weighted average collapses, and substituting (22) in (19) we have $\hat{\theta}_n^{JS} = w_n \hat{\theta}_n + (1 - w_n)\, \tilde{\theta}_n \xrightarrow{p} w_n \theta + (1 - w_n)\, \theta$ as $n \to \infty$, which simplifies to
$$\hat{\theta}_n^{JS} \xrightarrow{p} \theta \qquad (23)$$
as $n \to \infty$, the same limit as that of $\tilde{\theta}_n$. Therefore, using the consistency of the RMLE and (23), the consistency of the James-Stein shrinkage estimator $\hat{\theta}_n^{JS}$ follows from the consistency of $\tilde{\theta}_n$.
Lastly, we consider the case when we have two well defined parameter spaces, $\Omega_0$ and Ω, so that the target and the true value need not coincide. Analysing (19) further, we consider the shrinkage value effect, which is not affected by the sample size $n$ but is affected by an increase or decrease in the number of parameters $p$. Since the MLE is asymptotically normal, the limit in (24) follows by the linearity property of the normal distribution, also implying the limit in (25) for some matrix of appropriate dimension. From (13) of Lemma 1 we have
$$w_n \xrightarrow{p} c \qquad (26)$$
as $n \to \infty$. Therefore (19) becomes a weighted combination of the two limits with some shrinkage value effect matrix. Evaluating this asymptotic distribution as $n \to \infty$, and using the consistency of $\hat{\theta}_n$ and $\tilde{\theta}_n$, we obtain
$$\hat{\theta}_n^{JS} \xrightarrow{p} \theta \quad \text{as } n \to \infty.$$
Hence the consistency of $\hat{\theta}_n^{JS}$ follows from the consistency of $\tilde{\theta}_n$, which is consistent since $\hat{\theta}_n$ is consistent. Similarly, if the weight converges to 1, the consistency of the James-Stein shrinkage estimator follows from the consistency of the restricted maximum likelihood estimator together with the fact that $\hat{\theta}_n$ is consistent for $\theta$. Thus the shrinkage estimator $\hat{\theta}_n^{JS}$ is asymptotically consistent for $\theta$. ∎
In Theorem 1, we first consider the case when the sequence $\theta_n$ has a neighbourhood which is not restricted, by letting $\|h\|$ diverge to infinity. When this is the case, the entire parameter space becomes of interest; the weight satisfies $w_n \to 1$ and the JSSE and the MLE share the same limit as $n \to \infty$. Hence there is no difference in how the parameters in $\Omega_0$ and Ω are asymptotically distributed. As a result, the asymptotic distribution of the James-Stein shrinkage estimator is the same as that of the initial maximum likelihood estimator under this condition. Therefore the consistency of the James-Stein shrinkage estimator follows from the consistency of the maximum likelihood estimator.
In the second case we take $h$ as a fixed finite value. In this case the two parameter spaces are well defined and distinct in terms of where the parameters of interest are located. When the sequence lies in the restricted sub-parameter space $\Omega_0$, the maximum likelihood estimator and the restricted maximum likelihood estimator are asymptotically distributed the same. The consequence of having the two maximum likelihood estimators (MLE and RMLE) distributed the same is that the James-Stein shrinkage estimator has the same asymptotic distribution as the MLE and RMLE. Furthermore, all three estimators converge to the true value as $n \to \infty$. Stone [13] obtained similar results, though under invariant estimators; in this case the two maximum likelihood estimators do not have to be invariant. Therefore the James-Stein shrinkage estimator $\hat{\theta}_n^{JS}$ is asymptotically consistent for $\theta$.
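A quick empirical check of Theorem 1 (our sketch, again assuming a zero shrinkage target and identity covariance): the estimation error of the JSSE shrinks towards zero as the sample size grows, mirroring the MLE.

```r
# Consistency check: errors of the MLE and the JSSE both shrink as n grows
# (simplifying assumptions: zero target, identity covariance).
set.seed(4)
p <- 3; tau <- p - 2; theta <- c(1, -0.5, 0.25)
for (n in c(100, 1000, 10000)) {
  x         <- matrix(rnorm(n * p, mean = theta), n, p, byrow = TRUE)
  theta_hat <- colMeans(x)                       # MLE
  Dn        <- n * sum(theta_hat^2)
  theta_js  <- max(1 - tau / Dn, 0) * theta_hat  # JSSE
  cat(sprintf("n = %5d  MLE error = %.4f  JSSE error = %.4f\n",
              n, sqrt(sum((theta_hat - theta)^2)), sqrt(sum((theta_js - theta)^2))))
}
```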
In the next section we investigate the asymptotic distributional bias of $\hat{\theta}_n^{JS}$. The results in that section are used in showing the asymptotic efficiency of the shrinkage estimator $\hat{\theta}_n^{JS}$.
3.2. Asymptotic Distributional Bias
We study the asymptotic distributional bias (ADB) of the three estimators by analysing their asymptotic bias values. The ADB of an estimator $\theta_n^{*}$ is given by
$$\mathrm{ADB}\left( \theta_n^{*} \right) = \lim_{n \to \infty} E\left[ \sqrt{n}\left( \theta_n^{*} - \theta \right) \right], \qquad (27)$$
where the estimator $\theta_n^{*}$ is estimating $\theta$. We present the asymptotic distributional bias of $\hat{\theta}_n$, $\tilde{\theta}_n$ and $\hat{\theta}_n^{JS}$ in the theorem below.
Theorem 2
Suppose that the regularity assumptions for the MLE and RMLE hold. Then, along the sequence $\theta_n = \theta_0 + h n^{-1/2}$ with the sample size $n \to \infty$, the ADBs of the estimators $\hat{\theta}_n$, $\tilde{\theta}_n$ and $\hat{\theta}_n^{JS}$ are respectively:
1. $\mathrm{ADB}\left( \hat{\theta}_n \right) = 0$, as given in (28);
2. $\mathrm{ADB}\left( \tilde{\theta}_n \right)$ is the non-zero shrinkage value effect given in (29);
3. $\mathrm{ADB}\left( \hat{\theta}_n^{JS} \right)$ is the non-zero quantity given in (30), which depends on the local parameter $h$ and the shrinkage weight.
Proof.
1. For the maximum likelihood estimator,
$$\mathrm{ADB}\left( \hat{\theta}_n \right) = \lim_{n \to \infty} E\left[ \sqrt{n}\left( \hat{\theta}_n - \theta_n \right) \right] = 0, \qquad (28)$$
since $\sqrt{n}\left( \hat{\theta}_n - \theta_n \right) \xrightarrow{d} Z \sim N(0, V)$ as $n \to \infty$.
2. For the restricted maximum likelihood estimator, the ADB in (29) follows from Equation (6); the limit retains the shrinkage value effect and is therefore non-zero.
3. From Equation (19) of Theorem 1 we have $\hat{\theta}_n^{JS} = w_n \hat{\theta}_n + (1 - w_n)\, \tilde{\theta}_n$, where $w_n = (1 - \tau / D_n)_{+}$. Therefore, applying the definition in (27) and using the limits of $w_n$, $\hat{\theta}_n$ and $\tilde{\theta}_n$ as $n \to \infty$ (where $w_n$ converges to a functional of the non-central chi-squared variable $\xi$), we obtain the ADB of the shrinkage estimator,
$$\mathrm{ADB}\left( \hat{\theta}_n^{JS} \right) \ne 0, \qquad (30)$$
which depends on the local parameter $h$ through $\xi$.
Remark 1
When the fixed constant $h = 0$, the asymptotic distributional bias values of the three estimators are zero. Therefore $\mathrm{ADB}\left( \hat{\theta}_n \right) = \mathrm{ADB}\left( \tilde{\theta}_n \right) = \mathrm{ADB}\left( \hat{\theta}_n^{JS} \right) = 0$.
From Equation (28) of Theorem 2, the maximum likelihood estimator is asymptotically unbiased. Equations (29) and (30) show that the restricted maximum likelihood estimator and the James-Stein shrinkage estimator are both asymptotically biased. This means that both shrinking and partitioning of a parameter space introduce bias into estimators.
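A Monte Carlo illustration of this contrast (our sketch, with a zero shrinkage target, identity covariance, and a hypothetical true mean close to the target so the local effect is visible): the MLE's bias is negligible while the positive-part JSSE is pulled towards the target.

```r
# Estimate the finite-sample bias of the MLE and the JSSE by Monte Carlo
# (simplifying assumptions: zero target, identity covariance).
set.seed(5)
p <- 3; tau <- p - 2; n <- 100
theta <- c(0.1, 0.1, 0.1)                 # true mean close to the zero target
reps <- replicate(20000, {
  x         <- matrix(rnorm(n * p, mean = theta), n, p, byrow = TRUE)
  theta_hat <- colMeans(x)
  Dn        <- n * sum(theta_hat^2)
  theta_js  <- max(1 - tau / Dn, 0) * theta_hat
  c(theta_hat, theta_js)
})
rowMeans(reps)[1:p] - theta               # bias of the MLE (close to 0)
rowMeans(reps)[(p + 1):(2 * p)] - theta   # bias of the JSSE (pulled towards 0)
```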
Using the bias of the shrinkage estimator obtained above, we now analyse whether the James-Stein shrinkage estimator $\hat{\theta}_n^{JS}$ is asymptotically efficient.
3.3. Asymptotic Efficiency
To check whether the shrinkage estimator $\hat{\theta}_n^{JS}$ is asymptotically efficient, we use the Cramér-Rao bound for biased estimators. In the theorem below we show that the variance of the JSSE achieves this bound as $n \to \infty$. We use concepts from the study by Hodges and Lehmann [14].
Theorem 3
Let $\hat{\theta}_n^{JS}$ be a James-Stein shrinkage estimator obtained by shrinking a maximum likelihood estimator $\hat{\theta}_n$, where the two estimators are as defined in Section 2.2. Given the asymptotic bias $b(\theta)$ of the JSSE $\hat{\theta}_n^{JS}$, the Cramér-Rao bound for the $j$th element is given by
$$\mathrm{Var}\left( \hat{\theta}_{n,j}^{JS} \right) \ge \frac{\left[ 1 + b_j'(\theta) \right]^2}{n\, I_j(\theta)}, \qquad j = 1, \ldots, p, \qquad (31)$$
where $I(\theta)$ is the Fisher information and $b_j'(\theta)$ is the derivative of the $j$th element of the bias vector. Then
$$\mathrm{Var}\left( \hat{\theta}_{n,j}^{JS} \right) \longrightarrow \frac{\left[ 1 + b_j'(\theta) \right]^2}{n\, I_j(\theta)} \quad \text{as } n \to \infty,$$
and thus the James-Stein shrinkage estimator $\hat{\theta}_n^{JS}$ is asymptotically efficient for all $j = 1, \ldots, p$.
Proof. We analyse asymptotic efficiency by evaluating the Cramér-Rao bound as $n \to \infty$. Consider the bias of the estimator $\hat{\theta}_n^{JS}$ from part 3 of Theorem 2, written as (32). The expectation of the ratio $\tau / \xi$, which follows a distribution determined by the distribution of $\xi$, has a value (a constant) free of the parameter $\theta$; therefore we regard it as a constant. With this constant in place, (32) becomes the bias vector in (33), and the matrix of partial derivatives of the bias, $\partial b_j(\theta) / \partial \theta_j$, becomes the expression in (34), a matrix of dimension $p$. Using the definition of the CRB and then combining (32) and (34), we obtain the bound in (35) for $j = 1, \ldots, p$, where $b_j'(\theta)$ is the partial derivative of the $j$th element of the bias vector and $I_j(\theta)$ is the corresponding element of the Fisher information.
We begin our analysis of the bound by considering the terms involved. The Fisher information term remains the same as $n \to \infty$: its off-diagonal elements are zeros and its diagonal elements are ones, as in (36), since the observations are iid and follow a p-multivariate standard normal distribution. Thus from (36) the information term contributes no extra factor, and the derivative term behaves as in (37) as $n \to \infty$ for $j = 1, \ldots, p$. This implies the limit in (38) for $j = 1, \ldots, p$ as $n \to \infty$, and from (38) we obtain the limit in (39) for $j = 1, \ldots, p$ as $n \to \infty$. Therefore, from (38) and (39), and then using (35), we obtain (40) as $n \to \infty$ for $j = 1, \ldots, p$. Since the limiting correction factor equals one for all $j$, it follows from (40) that the variance of the James-Stein shrinkage estimator converges to the CRB as $n \to \infty$ for all $j = 1, \ldots, p$. This means that the bound in (31) is attained asymptotically for all $j = 1, \ldots, p$. Thus the James-Stein shrinkage estimator $\hat{\theta}_n^{JS}$ is asymptotically efficient. ∎
Theorem 3 above shows that the James-Stein shrinkage estimator obtained by shrinking the MLE achieves the CRB asymptotically. This means that the shrinkage estimator is asymptotically efficient and hence stable for large sample sizes. Since the initial estimator (the MLE) is known to be asymptotically efficient, we see that the shrinking process has no effect on the asymptotic efficiency of the estimator being shrunk.
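A rough numerical check of this claim (our sketch, under the same simplifying assumptions as before: standard normal data, zero shrinkage target, fixed true mean): $n$ times the sampling variance of a JSSE coordinate approaches the Cramér-Rao bound for the normal mean, which is 1 per coordinate.

```r
# Compare n * Var(JSSE coordinate) with the CRB (= 1 for a standard normal
# mean) as n grows (simplifying assumptions: zero target, identity covariance).
set.seed(6)
p <- 3; tau <- p - 2; theta <- c(1, 1, 1)
for (n in c(100, 1000, 5000)) {
  est <- replicate(2000, {
    x         <- matrix(rnorm(n * p, mean = theta), n, p, byrow = TRUE)
    theta_hat <- colMeans(x)
    Dn        <- n * sum(theta_hat^2)
    max(1 - tau / Dn, 0) * theta_hat
  })
  cat("n =", n, " n * var(first JSSE coordinate) =", n * var(est[1, ]), "\n")
}
```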
3.4. Rate of Convergence
We now investigate the rate of convergence of the shrinkage estimator $\hat{\theta}_n^{JS}$ (JSSE) by using concepts applied to the MLE and discussed in Hoeffding [15]. To proceed we consider the shrinkage estimator of the form in (1), built from the plug-in maximum likelihood estimators $\hat{\theta}_n$ and $\tilde{\theta}_n$. Let $\hat{\theta}_n^{JS}$ be the James-Stein shrinkage estimator obtained when we shrink the MLE $\hat{\theta}_n$ defined earlier, for $\theta \in \Omega$. We find the rate of convergence of this estimator by using its relationship with the MLE. Since the shrinkage target value may have no effect on the convergence rate, for easier transformation of our sequence we set the target to the zero vector, so that the JSSE is simply a rescaling of the MLE. Thus we have $\hat{\theta}_n^{JS} = w_n \hat{\theta}_n$, which becomes $\hat{\theta}_n^{JS} = k\, \hat{\theta}_n$ when we factor out the weight and drop the $n$ in the denominator to obtain a form with a lower MSE according to the James-Stein shrinkage strategy. Let
$$\hat{\theta}_n^{JS} = k\, \hat{\theta}_n, \qquad (41)$$
then
$$\hat{\theta}_n = \frac{1}{k}\, \hat{\theta}_n^{JS}. \qquad (42)$$
Now consider the sequence in (43), indexed by the $j$th component, where $\theta_j$ is the "true" $j$th parameter value. From the equality in (42) we have the relation (44) for the shrinkage value $k$. Therefore, substituting the right-hand side of (44) in (43), the sequence becomes an expression in the shrinkage estimator, and hence we have the sequence (45), which is in terms of the shrinkage estimator with the shrinking effect value $k$ such that $0 < k \le 1$. Analysing this sequence further shows that it satisfies the smoothness regularity conditions for the MLE, therefore we can proceed.
Let $\theta^{*} = k\, \theta$ be the true value in the shrinkage sense, obtained when we scale the true value $\theta$ with the shrinkage factor $k$. Then the sequence (45) becomes the expression in (46), implying the relation (47) for all $j = 1, \ldots, p$. This means that $\theta^{*}$ is still within the neighbourhood of $\theta$ since $0 < k \le 1$. Therefore, using the second-order Taylor's theorem, we obtain the expansion in (48) of the log-likelihood around the maximiser. Since the score vanishes at the maximum likelihood estimator $\hat{\theta}_n$, the corresponding term also vanishes for the rescaled estimator, giving (49) for all $j = 1, \ldots, p$. Assuming that the log-likelihood function is differentiable, from (48) and (49) we obtain (50), which simplifies to the expression implying (51). Rearranging (51) we obtain (52) for $j = 1, \ldots, p$, where $\sigma_j^2$ is the variance of the $j$th element of the covariance matrix of the limiting distribution, so that the standardised $j$th element follows a standard normal distribution. Now, dividing the left- and right-hand sides of (52) by $\sqrt{n}$, we obtain (53), where the limiting law of the $j$th element is normal. Using the sequence (45), Equation (53) becomes (54) for the rescaled quantities, where the limiting distribution is a normal distribution for each $j$th element of the plug-in estimator, as described in the analysis above.
Thus Equation (54) establishes the condition which implies local asymptotic normality (LAN) and differentiability in quadratic mean (DQM) for the estimator $\hat{\theta}_n^{JS}$, which in turn implies that the rate of convergence is of order $n^{-1/2}$, that is, rate $\sqrt{n}$. This can also be obtained from the fact that the risk of the James-Stein shrinkage estimator is bounded by that of the MLE, and the latter converges at the rate $\sqrt{n}$. Hence the James-Stein shrinkage estimator is $\sqrt{n}$-consistent.
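A quick empirical look at this rate (our sketch, under the same simplifying assumptions of a zero target and identity covariance): the raw error of the JSSE shrinks with $n$, while $\sqrt{n}$ times the error stays roughly stable, consistent with $\sqrt{n}$-consistency.

```r
# Empirical check of the sqrt(n) rate: sqrt(n) * error stays roughly stable
# as n grows (simplifying assumptions: zero target, identity covariance).
set.seed(7)
p <- 3; tau <- p - 2; theta <- c(0.7, -0.3, 0.4)
for (n in c(100, 1000, 10000, 100000)) {
  x         <- matrix(rnorm(n * p, mean = theta), n, p, byrow = TRUE)
  theta_hat <- colMeans(x)
  Dn        <- n * sum(theta_hat^2)
  theta_js  <- max(1 - tau / Dn, 0) * theta_hat
  err       <- sqrt(sum((theta_js - theta)^2))
  cat(sprintf("n = %6d  error = %.5f  sqrt(n) * error = %.3f\n", n, err, sqrt(n) * err))
}
```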
3.5. Simulation Plots
In this section the behaviour of the mean squared error (MSE) of the maximum likelihood estimator $\hat{\theta}_n$ is compared to that of the James-Stein shrinkage estimator $\hat{\theta}_n^{JS}$ as the sample size $n$ increases. The statistical package R is used to simulate plots of the MSE for different sample size values $n$, using the R package MASS to simulate data which follow a multivariate normal distribution. The data are generated using a 3 × 3 correlation matrix to obtain the covariance matrix $\Sigma$, which is symmetric with variances of 1 on the main diagonal, representing standard normal variances. Thus we take $p = 3$, and this meets the James-Stein classical condition of $p > 2$. Since the data are multivariate normal, the MLE of the mean vector is the vector of sample means,
$$\hat{\theta}_n = \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad (55)$$
making the MLE $\hat{\theta}_n$ a 3 × 1 vector, which implies that the dimension of the shrinkage estimator is also 3 × 1. Now, knowing that the maximum likelihood estimator $\hat{\theta}_n$ is unbiased and the James-Stein shrinkage estimator $\hat{\theta}_n^{JS}$ is biased, the following expressions were used to calculate the mean squared error (MSE) of the two estimators. Using (55), the MSE of $\hat{\theta}_n$ in (56) equals its variance, since the MLE is unbiased. Similarly, the MSE of the James-Stein shrinkage estimator in (57) is its variance plus the squared bias, which becomes the expression in (58) in terms of the shrinkage value $k$ that shrinks the maximum likelihood estimator $\hat{\theta}_n$ to the James-Stein shrinkage estimator $\hat{\theta}_n^{JS}$. Thus the mean squared error of the shrinkage estimator in (58) is obtained by using (56). The shrinkage value $k$ is evaluated using the expression in (59). The commands for all expressions and plots produced in R are provided in the appendix.
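A rough sketch of the kind of simulation described above (ours; the actual appendix commands may differ, and the correlation values, the true mean vector, the zero shrinkage target and the classical positive-part factor $k$ used here are assumptions):

```r
library(MASS)  # for mvrnorm

set.seed(8)
p <- 3
Sigma <- matrix(c(1.0, 0.3, 0.2,
                  0.3, 1.0, 0.4,
                  0.2, 0.4, 1.0), p, p)   # symmetric, unit diagonal
theta <- c(0.5, 0.5, 0.5)                 # hypothetical true mean vector

mse_pair <- function(n, reps = 2000) {
  out <- replicate(reps, {
    x         <- mvrnorm(n, mu = theta, Sigma = Sigma)
    theta_hat <- colMeans(x)                                          # MLE
    Dn        <- n * as.numeric(t(theta_hat) %*% solve(Sigma) %*% theta_hat)
    k         <- max(1 - (p - 2) / Dn, 0)                             # shrinkage value
    theta_js  <- k * theta_hat                                        # JSSE (zero target)
    c(MLE = sum((theta_hat - theta)^2), JSSE = sum((theta_js - theta)^2))
  })
  rowMeans(out)
}

sapply(c(30, 200, 2000), mse_pair)        # MSE of MLE and JSSE by sample size
```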
We present MSE plots obtained by simulating the mean squared error using the sample size values n = 30, 2000 and 100,000. The MSE plots for both the James-Stein shrinkage estimator and the maximum likelihood estimator for each sample size considered are plotted on the same graph for easy comparison of the MSE trends. We begin by considering a small sample size of n = 30 to compare the way the MSE line plots change from one point to another. Since we are interested in asymptotic behaviour, we increase the sample size to 2000 and then 100,000 to analyse the MSE trends and the rate at which the line plots become smooth.
Figure 1. MSE plots for the MLE and JSSE for n = 30.
Figure 2. MSE for the JSSE and MLE for n = 2000.
Figure 3. MSE plots for JSSE and MLE for n = 10,000.
Collectively, the scaled plots show that there is some reduction in the mean squared error of the James-Stein shrinkage estimator compared to that of the initial estimator (the MLE). The trend in mean squared error for both the maximum likelihood estimator and the James-Stein shrinkage estimator shows that, as the sample size $n$ increases, the MSE values converge to some value. The MSE plots suggest that the James-Stein shrinkage estimator converges to a lower MSE value of 0.9, compared to the maximum likelihood estimator, which converges to an MSE value of 1.0. They also show that the James-Stein shrinkage estimator converges faster than the MLE, though the difference is minimal.
4. Conclusions and Suggestions
In this study, we explored the asymptotic properties of the James-Stein shrinkage estimator $\hat{\theta}_n^{JS}$, which is obtained by shrinking an MLE $\hat{\theta}_n$. Asymptotic consistency and efficiency of the shrinkage estimator were investigated. From the regularity conditions, the MLE is known to be unbiased, consistent and efficient as the sample size $n \to \infty$. Therefore, the study analysed these asymptotic properties by checking whether the new (shrinkage) estimator obtained after shrinking possesses them; that is, the study examined whether the shrinking process has an effect on these properties. The results show that the James-Stein shrinkage estimator $\hat{\theta}_n^{JS}$ is asymptotically consistent and efficient. The study also showed that the shrinkage estimator (JSSE) is asymptotically biased, a property it possesses even for small values of the sample size $n$; the bias is introduced by the shrinking factor $k$ given in Equation (59). We therefore see that the shrinking process introduces bias into the estimators obtained, but it preserves asymptotic consistency and efficiency and, more importantly, it reduces the MSE.
Thus the James-Stein shrinkage estimator obtained by shrinking techniques proves to be useful even though it is biased. This estimator is more effective than the maximum likelihood estimator, as shown in this study and by Hansen [10]. The study also showed that the JSSE is stable for large values of the sample size $n$, making it suitable in practical applications, since large samples are normally required for effective estimation. Since error is always present in estimation, we regard shrinking (minimising error) as a very important technique for yielding effective estimators.
The study has investigated the asymptotic behaviour of the James-Stein shrinkage estimator. Asymptotic properties analysed in the study include rate of convergence, consistency and efficiency. The results show that the James-Stein shrinkage estimator has a lower mean squared error compared to the maximum likelihood estimator though it is biased. The results further show that the JSSE is asymptotically consistent and efficient.
Acknowledgements
The authors gratefully acknowledge the Department of Mathematics and Statistics at the University of Zambia for supporting this work. Sincere thanks to the managing editor, Alline Xiao, for a rare commitment to high quality.