Asymptotic Consistency of the James-Stein Shrinkage Estimator

Abstract

The study explores the asymptotic consistency of the James-Stein shrinkage estimator obtained by shrinking a maximum likelihood estimator. We use Hansen's approach to show that the James-Stein shrinkage estimator converges asymptotically to a multivariate normal distribution with a shrinkage effect value. We establish that the rate of convergence is of order $1/(k\sqrt{n})$ and rate $k\sqrt{n}$, hence the James-Stein shrinkage estimator is $k\sqrt{n}$-consistent. We then visualise its consistency by studying the asymptotic behaviour through simulation plots in R of the mean squared error of the maximum likelihood estimator and of the shrinkage estimator. The latter graphically shows a lower mean squared error than that of the maximum likelihood estimator.

Share and Cite:

Mungo, A. and Nawa, V. (2023) Asymptotic Consistency of the James-Stein Shrinkage Estimator. Open Journal of Statistics, 13, 872-892. doi: 10.4236/ojs.2023.136044.

1. Introduction

A shrinkage estimator is an estimator that, either explicitly or implicitly, incorporates the effects of shrinkage. In loose terms this means that a naive or raw estimate is improved by combining it with other information. The term relates to the notion that the improved estimate is made closer to the "true value" by the supplied information than the raw estimate. Shrinkage estimation is a technique used in inferential statistics to reduce the mean squared error (MSE) of a given estimator. The idea of shrinking an estimator dates to 1956, when Stein [1] established that we can reduce the MSE of an estimator if we give up a little on bias. This means that given an estimator we can shrink it to obtain another estimator with lower MSE, and the efficiency of the new estimator is desirable in the way it estimates the "true" parameter value. This works well when the number of parameters exceeds two ($p \geq 3$), a requirement known as the "James-Stein classical condition". When we shrink a maximum likelihood estimator (MLE) under this condition, we obtain a new shrinkage estimator which is closer to the assumed true value than the MLE. The magnitude of the improvement depends on the distance between the "true" parameter value and the parametric restriction, which yields a shrinkage target denoted by $\tilde{\theta}_n^o$. With all these modifications and restrictions in place, we ask whether this desirable shrinkage estimator is asymptotically consistent and efficient.

The literature on shrinkage estimators is extensive, so we mention only a few of the most relevant contributions to our study. James and Stein [2] used shrinking techniques to construct an estimator, now called the James-Stein shrinkage estimator (JSSE), which has lower squared risk loss than the MLE. Baranchik [3] showed that the positive-part James-Stein shrinkage estimator has lower risk than the ordinary JSSE. Berger [4] discusses selecting a minimax estimator of a multivariate normal mean by considering different types of James-Stein type estimators. Stein [5] used shrinking techniques to estimate the mean of a multivariate normal distribution. Carter and Ullah [6] constructed the sampling distribution and F-ratios for a James-Stein shrinkage estimator obtained by shrinking an ordinary least squares (OLS) estimator in regression models. George [7] proposed a new minimax multiple shrinkage estimator that allows multiple specifications for selecting a set of targets towards which to shrink a given estimator. Geyer [8] looked at the asymptotics of constrained M-estimators, which also fall in the class of shrinkage estimators. Hansen [9] constructed a generalised James-Stein shrinkage estimator obtained by shrinking an MLE, and Hansen [10] derived its asymptotic distribution and showed that we can shrink towards a sub-parameter space.

2. Preliminaries

The theory of shrinkage techniques plays an important role in developing efficient statistical estimators, which in turn play a key role in statistical decision theory. Therefore, a clear understanding of the asymptotic behaviour of the James-Stein shrinkage estimator $\hat{\beta}_n^*$ provides knowledge of the stability and efficiency of the estimator when the sample size n grows without bound.

This paper investigates the asymptotic consistency and efficiency of the James-Stein shrinkage estimator (JSSE) obtained by shrinking a maximum likelihood estimator (MLE) when the observed variables satisfy $X \sim N_p(\theta, \Sigma)$. Although the shrinkage estimator we are interested in is biased, its study is important because efficiency (lower risk) is increasingly recognised as dominating all other properties in estimation. Efron [11] discusses how bias can dominate unbiasedness in estimation.

We proceed by considering the asymptotic distributions of the three estimators important to this study, following results in Hansen [10]. Using the asymptotic distribution derived by Hansen [10], we employ Taylor's theorem and some limit theorems to show that $\hat{\beta}_n^* \xrightarrow{p} \theta_o$, the "true" parameter value, as $n \to \infty$. We then evaluate the asymptotic distributional bias (ADB) of the estimators $\hat{\beta}_n$, $\tilde{\theta}_n^o$ and $\hat{\beta}_n^*$ and show that the variance of the latter achieves the Cramér-Rao bound (CRB) as $n \to \infty$. The analysis is done along the sequences $\theta_n$ as $n \to \infty$. Simulation plots are produced in the statistical package R to compare the JSSE and MLE in terms of mean squared error (MSE), consistency and convergence.

The paper is organised as follows. Section 2.1 presents the parametric set-up. Section 2.2 gives the form of the JSSE considered in the study, while Section 2.3 discusses the asymptotic distributions of the estimators. In Section 3 we present the main results, beginning in Section 3.1 with a lemma on the convergence in probability of the shrinking factor and the consistency of the shrinkage estimator in Theorem 1. In Section 3.2 we evaluate the ADBs of the estimators in play. We then show that the James-Stein shrinkage estimator is asymptotically efficient in Section 3.3 and establish the rate of convergence in Section 3.4. In Section 3.5 we present MSE plots comparing the JSSE and MLE, and in Section 4 we give a discussion and analysis of the whole study and conclude by summarising the main results.

The following definitions are used to establish the asymptotic consistency and efficiency of the James-Stein shrinkage estimator $\hat{\beta}_n^*$.

Definition 1

An estimator $T_n = h(X_1, \ldots, X_n)$ is said to be consistent for $\theta_n$ if it converges in probability to $\theta_n$. That is, if for all $\varepsilon > 0$

$\lim_{n \to \infty} \Pr(|T_n - \theta_n| < \varepsilon) = 1$

or

$\lim_{n \to \infty} \Pr(|T_n - \theta_n| > \varepsilon) = 0$

where n is the sample size.

Definition 2

Let $X_1, \ldots, X_n$ be independent and identically distributed (iid) according to a probability density $f_\theta(X)$ satisfying suitable regularity conditions. Suppose that $T_n = h(X_1, \ldots, X_n)$ is asymptotically normal, that is, $\sqrt{n}(T_n - \theta_n) \xrightarrow{d} N_p(0, \Sigma_n)$ for a positive definite matrix $\Sigma_n$, where $T_n$ is estimating $\theta_n$. Then a sequence of estimators $\{T_n\} = \{h(X_1, \ldots, X_n)\}$ satisfying

$\lim_{n \to \infty} [\,n \operatorname{Var}(T_n)\,] = J_n(\theta)^{-1}$

for the Fisher information $J_n(\theta)$ is said to be asymptotically efficient.

We now consider a statistical model. We describe the set up of the parameter of interest and the shrinking strategy used in the study.

2.1. Parametric Structure

Consider an unbiased estimator $\hat{\theta}_n$ for $\theta \in \Omega$ such that $\hat{\theta}_n \sim N_p(\theta, \Sigma_n)$ is p-multivariate normal, where the elements of $\Omega$ are p-dimensional parameter vectors. Let $\tilde{\theta}_n^o$ (the shrinkage target) be a restricted maximum likelihood estimator (RMLE) for $\theta \in \Omega_o$, a sub-parameter space partitioned from the whole parameter space $\Omega$ by the parametric restriction $\Omega_o = \{\theta \in \Omega : a(\theta) = 0\}$, where $a(\theta): \mathbb{R}^p \to \mathbb{R}^m$. The sub-parameter space $\Omega_o$ provides a simple model of interest to shrink towards. If $m = p$ then $\Omega_o = \{0\}$ (the kernel in $\mathbb{R}^p$), a singleton containing only the zero vector, and if $m < p$ we create a sub-model of particular interest.

Let $A(\theta) = \frac{\partial}{\partial \theta} a(\theta)$, where A is a shrinkage matrix of dimension $p \times m$. We introduce another matrix G which harmonises the dimension of the RMLE from m to p. Hence we have a mapping $g(\theta): \mathbb{R}^m \to \mathbb{R}^p$ such that $G(\theta) = \frac{\partial}{\partial \theta} g(\theta)$. The matrix G is an $m \times p$ matrix when we consider a sub-parameter space $\Omega_o$, and $G = I_p$, the p-dimensional identity matrix, when we have the whole parameter space $\Omega$. We note that the matrix G is used to increase the dimension of the RMLE, since the RMLE is m-dimensional. Therefore we have a plug-in restricted maximum likelihood estimator $g(\tilde{\theta}_n^o) = \tilde{\beta}_n^o$. The matrix G thus reconciles the dimension reduced by shrinkage with the actual dimension p of the parameters of interest. The plug-in unrestricted MLE $\hat{\theta}_n$ in the shrinkage sense is denoted by $\hat{\beta}_n$. With all parameters set, we present the generalised James-Stein shrinkage estimator $\hat{\beta}_n^*$ in the next section.

2.2. Positive Part James-Stein Shrinkage Estimator

Let $\hat{\theta}_n$ be the MLE for $\theta \in \Omega$ and $\tilde{\theta}_n^o$ be the restricted maximum likelihood estimator for $\theta \in \Omega_o$, a sub-parameter space of the whole parameter space $\Omega$, such that the elements of the parameter space are in $\mathbb{R}^p$ as described before. Let $g(\tilde{\theta}_n^o) = \tilde{\beta}_n^o$ be the p-dimensional plug-in estimator of the RMLE. Then the James-Stein shrinkage estimator $\hat{\beta}_n^*$, obtained by shrinking the MLE towards the target $\tilde{\theta}_n^o$, is given by

$\hat{\beta}_n^* = \hat{\beta}_n - \left( \dfrac{p-2}{n (\hat{\beta}_n - \tilde{\beta}_n^o)^\top \Sigma^{-1} (\hat{\beta}_n - \tilde{\beta}_n^o)} \right)_+ (\hat{\beta}_n - \tilde{\beta}_n^o)$ (1)

where $(a)_+ = \max(a, 0)$ is the positive trimming function and $p \geq 3$. The shrinkage estimator in (1) can be expressed as a weighted average by letting

$D_n = n (\hat{\beta}_n - \tilde{\beta}_n^o)^\top \Sigma^{-1} (\hat{\beta}_n - \tilde{\beta}_n^o)$ (2)

be a distance statistic, which is the same as the loss function $n\,\ell(\hat{\beta}_n, \tilde{\beta}_n^o)$, where $\Sigma$ is a covariance matrix, and

$\hat{w} = \left( 1 - \dfrac{\tau}{D_n} \right)_+$ (3)

for $\tau = p - 2$. Then (1) becomes

$\hat{\beta}_n^* = \hat{w}\,\hat{\beta}_n + (1 - \hat{w})\,\tilde{\beta}_n^o$ (4)

which is a weighted average of $\hat{\beta}_n$ and $\tilde{\beta}_n^o$. The James-Stein shrinkage estimator presented above has lower risk than the MLE, as shown by Hansen [10] and James and Stein [2]. To check whether the shrinkage estimator is asymptotically consistent we need its asymptotic distribution. We therefore present the asymptotic distributions of the MLE, RMLE and JSSE in the next section.
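To make the construction in (1)-(4) concrete, the following R sketch computes the positive-part James-Stein estimate from given plug-in estimates. It is only an illustration: the function name `js_shrink` and the example inputs (sample mean, zero target, identity covariance) are our own choices, not the authors' appendix code.

```r
# Minimal sketch of the positive-part James-Stein estimator in (1)-(4).
# beta_hat, beta_tilde: p-vectors (plug-in MLE and shrinkage target);
# Sigma: p x p covariance matrix; n: sample size. Assumes p >= 3.
js_shrink <- function(beta_hat, beta_tilde, Sigma, n) {
  p    <- length(beta_hat)
  diff <- beta_hat - beta_tilde
  Dn   <- n * as.numeric(t(diff) %*% solve(Sigma) %*% diff)  # distance statistic (2)
  w    <- max(1 - (p - 2) / Dn, 0)                           # positive-part weight (3)
  w * beta_hat + (1 - w) * beta_tilde                        # weighted average (4)
}

# Example: shrink the sample mean of simulated trivariate data towards zero.
set.seed(1)
X <- matrix(rnorm(50 * 3), ncol = 3)
js_shrink(colMeans(X), rep(0, 3), diag(3), nrow(X))
```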

2.3. Asymptotic Distribution

We assume that the maximum likelihood estimator satisfies the regularity conditions given in Hansen [10] and Newey [12]. With these assumptions in mind, the asymptotic distributions of $\tilde{\theta}_n^o$ and $\hat{\beta}_n^*$ are analysed along the sequences $\theta_n = \theta_o + n^{-1/2} h$, where $\theta_o$ is the assumed true parameter value and $h \in \mathbb{R}^p$ is a constant providing a neighbourhood for the true parameter value $\theta_o$. From the normality of the MLE we have

$\sqrt{n}(\hat{\theta}_n - \theta_n) \xrightarrow{d} Z \sim N_p(0, \Sigma)$ (5)

as $n \to \infty$. Using (5), Hansen [10] obtained the asymptotic distribution of the restricted maximum likelihood estimator as

$\sqrt{n}(\tilde{\theta}_n^o - \theta_n) \xrightarrow{d} Z - \Sigma A (A^\top \Sigma A)^{-1} A^\top (Z + h)$ (6)

which has a shrinkage effect value $k = \Sigma A (A^\top \Sigma A)^{-1} A^\top$. As a consequence of the convergence in (5) and (6) we have

$\sqrt{n}(\hat{\beta}_n - \tilde{\beta}_n^o) \xrightarrow{d} G^\top \Sigma A (A^\top \Sigma A)^{-1} A^\top (Z + h)$ (7)

which is the asymptotic distribution of the scaled difference between the plug-in MLE and the plug-in RMLE, where $\beta_n = g(\theta_n)$. The distance statistic $D_n$ in Equation (2) converges to a non-central chi-squared distribution, as described by Hansen [10]:

$D_n = n\,\ell(\hat{\beta}_n, \tilde{\beta}_n^o) \xrightarrow{d} (Z + h)^\top B (Z + h) = \xi \sim \chi^2_p(h^\top B h)$ (8)

where $B = A (A^\top \Sigma A)^{-1} A^\top \Sigma\, G\, \Sigma^{-1} G^\top \Sigma\, A (A^\top \Sigma A)^{-1} A^\top$. Using (2), Hansen [10] showed that

$\hat{w} \xrightarrow{d} w(Z) = \left( 1 - \dfrac{p-2}{\xi} \right)_+$ (9)

which is the positive part of a transformation involving the inverse of a chi-squared random variable, with constant $\tau = p - 2$ for $p \geq 3$. Therefore, using (9) as $n \to \infty$, Hansen [10] showed that

$\sqrt{n}(\hat{\beta}_n^* - \beta_n) \xrightarrow{d} w(Z)\, G^\top Z + (1 - w(Z)) \left( G^\top Z - G^\top \Sigma A (A^\top \Sigma A)^{-1} A^\top (Z + h) \right)$ (10)

which is normally distributed with a shrinkage effect value. With the asymptotic distribution of the shrinkage estimator in place, we now present the main results.
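As an informal check of the limit in (8), the R sketch below simulates the distance statistic $D_n$ in a simple shrink-to-zero setting (taking $A = I_p$ and $G = I_p$, so that $B$ reduces to $\Sigma^{-1}$) along the local sequence $\theta_n = n^{-1/2} h$, and compares its quantiles with those of $\chi^2_p(h^\top \Sigma^{-1} h)$. The values of h, n and the number of replications are illustrative choices, not taken from the paper.

```r
# Monte Carlo check of (8) in the shrink-to-zero case with A = G = I_p,
# where B = Sigma^{-1} and D_n converges to chi^2_p(h' Sigma^{-1} h).
library(MASS)                       # for mvrnorm()
set.seed(1)
p     <- 3
Sigma <- diag(p)
h     <- c(1, -1, 0.5)              # local parameter (illustrative)
n     <- 500
reps  <- 5000
Dn <- replicate(reps, {
  X         <- mvrnorm(n, mu = h / sqrt(n), Sigma = Sigma)
  theta_hat <- colMeans(X)
  n * as.numeric(t(theta_hat) %*% solve(Sigma) %*% theta_hat)
})
ncp <- as.numeric(t(h) %*% solve(Sigma) %*% h)
# Compare simulated quantiles with non-central chi-squared quantiles.
round(rbind(simulated   = quantile(Dn, c(0.25, 0.50, 0.75, 0.95)),
            theoretical = qchisq(c(0.25, 0.50, 0.75, 0.95), df = p, ncp = ncp)), 2)
```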

3. Main Results

In this section we present our main results. We show that the James-Stein shrinkage estimator $\hat{\beta}_n^*$ is asymptotically consistent. Secondly, we evaluate the asymptotic distributional bias of the three estimators in play. We then show that the shrinkage estimator is asymptotically efficient by showing that its variance achieves the Cramér-Rao bound. Further, we explore the convergence rate of the James-Stein shrinkage estimator and present simulation plots for the MSE produced in R.

3.1. Consistency of the James-Stein Shrinkage Estimator

We present Lemma 1, which shows the convergence in probability of the weight (shrinkage factor) $\hat{w}$. The result is used when establishing the consistency of the James-Stein shrinkage estimator $\hat{\beta}_n^*$.

Lemma 1

From Equations (8) and (9) we have

$\hat{w} \xrightarrow{d} w(Z) = \left( 1 - \dfrac{\tau}{\xi} \right)_+$ (11)

where $\xi = (Z + h)^\top B (Z + h) \sim \chi^2_p(h^\top B h)$ is a non-central chi-squared random variable with non-centrality parameter $h^\top B h$, $\tau = p - 2$ and $p \geq 3$. Along the sequences $\theta_n$, if $\|h\| \to \infty$ then

$w(Z) \xrightarrow{p} 1$ (12)

and if h is fixed then

$w(Z) \xrightarrow{p} 0$ or otherwise $w(Z) \xrightarrow{p} r$ (13)

where r is a constant such that $0 < r < 1$ and $(a)_+$ in (11) is the positive trimming function, which keeps what is in the brackets greater than or equal to zero.

Proof.

We begin by considering the first case, when h diverges to infinity. Suppose that $\|h\| \to \infty$; then

$\|Z + h\| \to \infty$. (14)

Therefore from (14) we have

$(Z + h)^\top B (Z + h) = \xi \to \infty$ as $\|h\| \to \infty$ and $n \to \infty$. (15)

Now, recalling that

$\hat{w} \xrightarrow{d} w(Z) = \left( 1 - \dfrac{\tau}{\xi} \right)_+$

as $n \to \infty$ and using (15), we have

$w(Z) \xrightarrow{p} 1$ as $n \to \infty$. (16)

Hence we have established (12).

Secondly, suppose that h is fixed; then we have the sequence

$\theta_n = \theta_o + n^{-1/2} h$

which becomes $\theta_n = \theta_o$ as $n \to \infty$, implying that $\hat{\theta}_n \to \theta_o$ as $n \to \infty$. Suppose

$\xi \sim \chi^2_p(h^\top B h) \xrightarrow{p} D$ as $n \to \infty$ (17)

where D is a constant, h is fixed and B is not affected by an increase in n; then

$w(Z) \xrightarrow{p} \left( 1 - \dfrac{\tau}{D} \right)_+$

where $p \geq 3$. If $\tau / D = 1$ as $n \to \infty$, then

$w(Z) \xrightarrow{p} 0$.

If $\tau / D > 1$ then $(1 - \tau/D)$ will be negative, and by definition of the positive trimming function we end up with zero. This will vary as p changes, but still considering $\xi \sim \chi^2_p(h^\top B h)$, the probability of $\xi$ depends on the degrees of freedom p and will vary according to the chi-squared distribution, implying that the ratio $\tau / D = M > 1$ as $n \to \infty$. Therefore we have

$w(Z) \xrightarrow{p} \left( 1 - \dfrac{\tau}{D} \right)_+$ as $n \to \infty$,

$w(Z) \xrightarrow{p} (1 - M)_+$ as $n \to \infty$

for a constant $M > 1$. Proceeding in the same way we have

$w(Z) \xrightarrow{p} (F)_+$ as $n \to \infty$, for $1 - M = F < 0$,

$w(Z) \xrightarrow{p} 0$ as $n \to \infty$

by definition of $(x)_+$. Thus

$w(Z) \xrightarrow{p} 0$ as $n \to \infty$. (18)

Otherwise, if the ratio $\tau / D$ is such that $0 < \tau / D < 1$ as $n \to \infty$, we have

$w(Z) \xrightarrow{p} r$

where $r \in (0, 1)$.

Lemma 1 above establishes the convergence of the weight $\hat{w}$ which determines shrinkage. In this case we see that the same weight determines the convergence in distribution and in probability of the shrinkage estimator. From the regularity conditions we know that the MLE $\hat{\theta}_n$ is asymptotically consistent, and this consistency extends to the RMLE $\tilde{\theta}_n^o$. With this fact in mind, we now present the main result, which shows the consistency of the shrinkage estimator $\hat{\beta}_n^*$.

Theorem 1

Let $\theta \in \Omega$, where $\Omega$ is a parameter space with elements in $\mathbb{R}^p$. Suppose we have a James-Stein shrinkage estimator $\hat{\beta}_n^*$ obtained by shrinking the maximum likelihood estimator $\hat{\theta}_n$ of $\theta \in \Omega$, where the shrinkage target $\tilde{\theta}_n^o$ is the restricted maximum likelihood estimator of $\theta \in \Omega_o$, a sub-parameter space partitioned from $\Omega$ by the restriction described in Section 2.1. Then the JSSE is given by

$\hat{\beta}_n^* = \hat{\beta}_n - \left( \dfrac{p-2}{n (\hat{\beta}_n - \tilde{\beta}_n^o)^\top \Sigma^{-1} (\hat{\beta}_n - \tilde{\beta}_n^o)} \right)_+ (\hat{\beta}_n - \tilde{\beta}_n^o)$

where $\hat{\beta}_n = g(\hat{\theta}_n)$, $\tilde{\beta}_n^o = g(\tilde{\theta}_n^o)$, $p \geq 3$ and $(x)_+$ is the positive trimming function. If $\hat{\theta}_n$ is consistent for $\theta_n$ as $n \to \infty$, then the James-Stein shrinkage estimator $\hat{\beta}_n^*$ is also consistent for $\theta_n$ as $n \to \infty$, where the sequence $\theta_n$ is as defined in Section 2.3.

Proof.

Let $\beta_n = \theta_n$. To show that $\hat{\beta}_n^*$ is consistent for $\theta_n$ as $n \to \infty$ we consider the value of h, which determines the neighbourhood of the sequence $\theta_n$: the case where it diverges to infinity and the case where it is fixed. Suppose that h diverges to infinity. To evaluate

$\sqrt{n}(\hat{\beta}_n^* - \beta_n) \xrightarrow{d} w(Z)\, G^\top Z + (1 - w(Z))\, G^\top \left( Z - \Sigma A (A^\top \Sigma A)^{-1} A^\top (Z + h) \right)$ (19)

as $n \to \infty$, we first consider $w(Z)$ from (3). By Lemma 1,

$w(Z) \xrightarrow{p} 1$ as $n \to \infty$. (20)

Hence, from (20), substituting $w(Z)$ by 1 in (19), we have

$\sqrt{n}(\hat{\beta}_n^* - \beta_n) \xrightarrow{d} G^\top Z$, where $Z \sim N_p(0, \Sigma)$,

which gives

$\sqrt{n}(\hat{\beta}_n^* - \beta_n) \xrightarrow{d} N_p(0, \Sigma_\beta)$ (21)

as $n \to \infty$, where $\Sigma_\beta = G^\top \Sigma G$. Thus we have

$\hat{\beta}_n^* \xrightarrow{p} \beta_n$,

that is,

$\lim_{n \to \infty} P(|\hat{\beta}_n^* - \beta_n| > \varepsilon) = 0$

for any $\varepsilon > 0$. Hence $\hat{\beta}_n^*$ is consistent for $\beta_n = \theta_n$.

Secondly, suppose h is fixed at a finite value; then the sequence

$\theta_n = \theta_o + n^{-1/2} h$

becomes $\theta_n = \theta_o$ as $n \to \infty$. From this equality we have $\hat{\theta}_n = \hat{\theta}_o$, and two conditions arise. The first is that the sequence $\theta_n$ lies within the restricted parameter space $\Omega_o$ with $\theta_o \in \Omega_o$. From the restriction $\{\theta \in \Omega : a(\theta) = 0\}$ defining $\Omega_o$, this means that the shrinkage target is exactly at the true value and our consideration is on one parameter space only. Therefore we have $\hat{\theta}_n = \tilde{\theta}_n^o$, but from (6)

$\sqrt{n}(\tilde{\theta}_n^o - \theta_n) \xrightarrow{d} \zeta = Z - \Sigma A (A^\top \Sigma A)^{-1} A^\top (Z + h)$

which is the same as the asymptotic distribution of $\hat{\theta}_n$, since we only consider the sub-parameter space $\Omega_o$ and the shrinkage value $\Sigma A (A^\top \Sigma A)^{-1} A^\top (Z + h)$ affects it. Thus

$\sqrt{n}(\hat{\theta}_n - \theta_n) \xrightarrow{d} \zeta = Z - \Sigma A (A^\top \Sigma A)^{-1} A^\top (Z + h)$ as $n \to \infty$ (22)

because we are estimating $\theta \in \Omega_o$ and, from Section 2.1, there is no difference between the MLE and the RMLE. Due to this equality of the two maximum likelihood estimators, from (4)

$\hat{\beta}_n^* = \hat{w}\,\hat{\beta}_n + (1 - \hat{w})\,\tilde{\beta}_n^o$

and substituting (22) in (19) we have

$\sqrt{n}(\hat{\beta}_n^* - \beta_n) \xrightarrow{d} w(Z)\, G^\top \left( Z - \Sigma A (A^\top \Sigma A)^{-1} A^\top (Z + h) \right) + (1 - w(Z))\, G^\top \left( Z - \Sigma A (A^\top \Sigma A)^{-1} A^\top (Z + h) \right)$

as $n \to \infty$, which becomes

$\sqrt{n}(\hat{\beta}_n^* - \beta_n) \xrightarrow{d} \left[ w(Z)\, G^\top \left( Z - \Sigma A (A^\top \Sigma A)^{-1} A^\top (Z + h) \right) \right] - \left[ w(Z)\, G^\top \left( Z - \Sigma A (A^\top \Sigma A)^{-1} A^\top (Z + h) \right) \right] + G^\top \left( Z - \Sigma A (A^\top \Sigma A)^{-1} A^\top (Z + h) \right)$

as $n \to \infty$, and then simplifies to

$\sqrt{n}(\hat{\beta}_n^* - \beta_n) \xrightarrow{d} G^\top \left( Z - \Sigma A (A^\top \Sigma A)^{-1} A^\top (Z + h) \right)$ (23)

as $n \to \infty$, which is the same as the asymptotic distribution of $\tilde{\beta}_n^o = g(\tilde{\theta}_n^o)$. Therefore, using the consistency of the RMLE and (23), the consistency of the James-Stein shrinkage estimator $\hat{\beta}_n^*$ follows from the consistency of $\tilde{\theta}_n^o$.

Lastly, we consider the case where we have two well-defined parameter spaces, $\Omega_o$ and $\Omega \setminus \Omega_o$. Then we have $\hat{\theta}_n \neq \tilde{\theta}_n^o$. Analysing (19) further, we consider the shrinkage effect value $\Sigma A (A^\top \Sigma A)^{-1} A^\top$, which is not affected by the sample size n but is affected by an increase or decrease in the number of parameters p. Since

$Z \sim N_p(0, \Sigma)$

then

$Z + h \sim N_p(h, \Sigma)$ (24)

by the linearity property of the normal distribution. This also implies that

$\eta (Z + h) \sim N_p(\eta h, \eta^\top \Sigma \eta)$ (25)

for a matrix $\eta$ of dimension $p \times p$. From (13) of Lemma 1 we have

$w(Z) \xrightarrow{p} 0$ if $\tau / D_n \geq 1$ (26)

as $n \to \infty$. Therefore (19) becomes

$\sqrt{n}(\hat{\beta}_n^* - \beta_n) \xrightarrow{d} w(Z)\, G^\top Z + (1 - w(Z))\, G^\top \left( Z - \eta (Z + h) \right)$

for the shrinkage effect value matrix $\eta = \Sigma A (A^\top \Sigma A)^{-1} A^\top$. Evaluating this asymptotic distribution as $n \to \infty$ we have

$w(Z)\, G^\top Z + (1 - w(Z))\, G^\top \left( Z - \eta (Z + h) \right) \xrightarrow{d} G^\top \left( Z - \eta (Z + h) \right)$

as $n \to \infty$, since $w(Z) \xrightarrow{p} 0$. Thus

$\sqrt{n}(\hat{\beta}_n^* - \beta_n) \xrightarrow{d} G^\top \left( Z - \Sigma A (A^\top \Sigma A)^{-1} A^\top (Z + h) \right)$ as $n \to \infty$.

Hence the consistency of $\hat{\beta}_n^*$ follows from the consistency of $\tilde{\beta}_n^o = g(\tilde{\theta}_n^o)$, which is consistent since $\tilde{\theta}_n^o$ is consistent. Similarly, if $w(Z) \xrightarrow{p} r \in (0, 1)$, the consistency of the James-Stein shrinkage estimator $\hat{\beta}_n^*$ follows from the consistency of the restricted maximum likelihood estimator and the fact that $\hat{\theta}_n$ is consistent for $\theta_n$. Thus the shrinkage estimator $\hat{\beta}_n^*$ is asymptotically consistent for $\theta_n$.

In Theorem 1 we first consider the case where the sequence $\theta_n$ has an unrestricted neighbourhood, by letting h diverge to infinity. In that case the entire parameter space becomes of interest and, for $\|h\| \to \infty$, we obtain $\xi \xrightarrow{p} \infty$ and $\hat{w} \xrightarrow{p} 1$ as $n \to \infty$. Hence there is no difference in how the parameters in $\Omega_o$ and $\Omega$ are asymptotically distributed. As a result, the asymptotic distribution of the James-Stein shrinkage estimator is the same as that of the initial maximum likelihood estimator under this condition. Therefore the consistency of the James-Stein shrinkage estimator follows from the consistency of the maximum likelihood estimator.

In the second case we take h as a fixed finite value. Here the two parameter spaces are well defined and distinct in terms of where the parameters of interest are located. When $n \to \infty$, then $\theta_n = \theta_o$ since $n^{-1/2} h \to 0$. Thus, when we are within the restricted sub-parameter space $\Omega_o$, the maximum likelihood estimator and the restricted maximum likelihood estimator have the same asymptotic distribution. The consequence of the two maximum likelihood estimators (MLE and RMLE) being identically distributed is that the James-Stein shrinkage estimator has the same asymptotic distribution as the MLE and RMLE. Furthermore, we have $\sqrt{n}(\hat{\beta}_n - \tilde{\beta}_n^o) \xrightarrow{p} 0$ as $n \to \infty$. Stone [13] obtained similar results, though for invariant estimators; in our case the two maximum likelihood estimators need not be invariant. Therefore the James-Stein shrinkage estimator $\hat{\beta}_n^*$ is asymptotically consistent for $\theta_n$.
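The consistency established in Theorem 1 can also be illustrated numerically. The short R sketch below shrinks the sample mean towards zero and reports its distance from the true parameter as n grows; the true mean, covariance matrix and sample sizes are illustrative choices, so this is a numerical illustration rather than a proof.

```r
# Numerical illustration of Theorem 1: the positive-part James-Stein estimate
# approaches the true mean as the sample size grows (illustrative settings).
library(MASS)
set.seed(2)
p     <- 3
theta <- c(2, -1, 0.5)              # assumed "true" parameter value
Sigma <- diag(p)
for (n in c(30, 300, 3000, 30000)) {
  X        <- mvrnorm(n, mu = theta, Sigma = Sigma)
  beta_hat <- colMeans(X)                                          # MLE
  Dn       <- n * as.numeric(t(beta_hat) %*% solve(Sigma) %*% beta_hat)
  w        <- max(1 - (p - 2) / Dn, 0)                             # weight (3)
  beta_js  <- w * beta_hat                                         # shrink towards zero
  cat(sprintf("n = %6d   ||JSSE - theta|| = %.4f\n",
              n, sqrt(sum((beta_js - theta)^2))))
}
```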

In the next section we investigate the asymptotic distributional bias of $\hat{\beta}_n^*$. The results of that section are used in showing the asymptotic efficiency of the shrinkage estimator $\hat{\beta}_n^*$.

3.2. Asymptotic Distributional Bias

We study the asymptotic distributional bias (ADB) of the three estimators by analysing their asymptotic bias values. The ADB of an estimator $T_n$ is given by

$ADB(T_n) = \lim_{n \to \infty} E\!\left[ \sqrt{n}(T_n - \theta_n) \right]$ (27)

where the estimator $T_n$ is estimating $\theta_n$. We present the asymptotic distributional bias of $\hat{\theta}_n$, $\tilde{\theta}_n^o$ and $\hat{\beta}_n^*$ in the theorem below.

Theorem 2

Suppose that the regularity assumptions for the MLE and RMLE hold. Then, under the local sequences $\theta_n$ defined in Section 2.3 and for $p \geq 3$, the ADBs of the estimators $\hat{\theta}_n$, $\tilde{\theta}_n^o$ and $\hat{\beta}_n^*$ are respectively

1. $ADB(\hat{\theta}_n) = 0$

2. $ADB(\tilde{\theta}_n^o) = -\Sigma A (A^\top \Sigma A)^{-1} A^\top h$

3. $ADB(\hat{\beta}_n^*) = -\vartheta\, G^\top \Sigma A (A^\top \Sigma A)^{-1} A^\top h$

where $\vartheta = E_\theta\!\left[ \dfrac{p-2}{\xi} \right]$.

Proof.

1.

$ADB(\hat{\theta}_n) = \lim_{n \to \infty} E_\theta\!\left( \sqrt{n}(\hat{\theta}_n - \theta_n) \right) = \lim_{n \to \infty} 0 = 0$ (28)

since $\sqrt{n}(\hat{\theta}_n - \theta_n) \xrightarrow{d} Z \sim N_p(0, \Sigma)$ as $n \to \infty$.

2.

$ADB(\tilde{\theta}_n^o) = \lim_{n \to \infty} E_\theta\!\left( \sqrt{n}(\tilde{\theta}_n^o - \theta_n) \right) = \lim_{n \to \infty} \left( -\Sigma A (A^\top \Sigma A)^{-1} A^\top h \right) = -\Sigma A (A^\top \Sigma A)^{-1} A^\top h$ (29)

from Equation (6).

3. $ADB(\hat{\beta}_n^*) = \lim_{n \to \infty} E_\theta\!\left( \sqrt{n}(\hat{\beta}_n^* - \beta_n) \right)$. From Equation (19) of Theorem 1 we have

$\sqrt{n}(\hat{\beta}_n^* - \beta_n) \xrightarrow{d} w(Z)\, G^\top Z + (1 - w(Z))\, G^\top \left( Z - \Sigma A (A^\top \Sigma A)^{-1} A^\top (Z + h) \right)$

where

$w(Z) = \left( 1 - \dfrac{p-2}{\xi} \right)_+$.

Therefore,

$E_\theta[w(Z)] = E_\theta\!\left[ 1 - \dfrac{p-2}{\xi} \right] = 1 - E_\theta\!\left[ \dfrac{p-2}{\xi} \right] = 1 - \vartheta$

where $\vartheta = E_\theta\!\left[ \dfrac{p-2}{\xi} \right]$, $p \geq 3$ and $E_\theta(Z) = 0$ since $Z \sim N_p(0, \Sigma)$.

Then

$ADB(\hat{\beta}_n^*) = \lim_{n \to \infty} \left[ (1 - \vartheta) \cdot 0 + (1 - 1 + \vartheta) \left( G^\top \left( -\Sigma A (A^\top \Sigma A)^{-1} A^\top h \right) \right) \right] = \lim_{n \to \infty} \left[ -\vartheta\, G^\top \Sigma A (A^\top \Sigma A)^{-1} A^\top h \right] = -\vartheta\, G^\top \Sigma A (A^\top \Sigma A)^{-1} A^\top h$ (30)

where $\vartheta = E_\theta\!\left[ \dfrac{p-2}{\xi} \right]$ for $p \geq 3$.

Remark 1

When the fixed constant $h = 0$, the asymptotic distributional bias values of the three estimators are all zero. We therefore take $h \neq 0$.

From Equation (28) of Theorem 2, the maximum likelihood estimator is asymptotically unbiased. Equations (29) and (30) show that the restricted maximum likelihood estimator and the James-Stein shrinkage estimator are both asymptotically biased. This means that both shrinking and partitioning of a parameter space introduce bias into the resulting estimators.
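The constant $\vartheta = E_\theta[(p-2)/\xi]$ appearing in the ADB expressions has no simple closed form for a general non-centrality parameter, but it is easy to approximate by simulation. The R sketch below does so for illustrative values of p and of the non-centrality $h^\top B h$; these values are not taken from the paper.

```r
# Monte Carlo approximation of vartheta = E[(p-2)/xi], where xi ~ chi^2_p(ncp),
# the constant in the ADB of the shrinkage estimator (illustrative p and ncp).
set.seed(3)
p   <- 5
ncp <- 2                            # plays the role of h' B h
xi  <- rchisq(1e6, df = p, ncp = ncp)
mean((p - 2) / xi)                  # approximate value of vartheta
```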

Using the bias of the shrinkage estimator obtained above, we now analyse whether the James-Stein estimator $\hat{\beta}_n^*$ is asymptotically efficient.

3.3. Asymptotic Efficiency

To check whether the shrinkage estimator $\hat{\beta}_n^*$ is asymptotically efficient, we use the Cramér-Rao bound for biased estimators. In the theorem below we show that the variance of the JSSE achieves this bound as $n \to \infty$. We use concepts from the study by Hodges and Lehmann [14].

Theorem 3

Let $\hat{\beta}_n^*$ be a James-Stein shrinkage estimator obtained by shrinking a maximum likelihood estimator $\hat{\theta}_n$, where the two estimators are as defined in Section 2.2. Given the asymptotic bias $b_\theta(\hat{\beta}_n^*)$ of the JSSE $\hat{\beta}_n^*$, the Cramér-Rao bound for $\hat{\beta}_{nj}^*$ is given by

$CRB = \dfrac{\left[ 1 + b'_{\theta_j}(\hat{\beta}_n^*) \right]^2}{J_{jj}(\theta)}$ for $j = 1, 2, \ldots, p$ (31)

where $J(\theta)$ is the Fisher information and $b'_{\theta_j}$ is the derivative of the jth element of the bias vector. Then

$\dfrac{CRB}{\Sigma_{jj}(\hat{\beta}_n^*)} \to 1$ as $n \to \infty$,

and thus the James-Stein shrinkage estimator $\hat{\beta}_n^*$ is asymptotically efficient for all $j = 1, 2, \ldots, p$.

Proof. We analyse asymptotic efficiency by evaluating the Cramér-Rao bound as $n \to \infty$. Consider the bias of the estimator $\hat{\beta}_n^*$ from part 3 of Theorem 2,

$b_\theta(\hat{\beta}_n^*) = -\vartheta\, G^\top \Sigma A (A^\top \Sigma A)^{-1} A^\top h$ (32)

where $\vartheta = E\!\left[ \dfrac{p-2}{\xi} \right]$ for $p \geq 3$. The expectation $E\!\left[ \dfrac{p-2}{\xi} \right]$ of the fraction $\dfrac{p-2}{\xi}$, whose distribution is determined by $\xi \sim \chi^2_p(h^\top B h)$, is a constant free of the parameter $\theta$; we therefore treat it as a constant. Let $\alpha = -\vartheta$; then (32) becomes

$b_\theta(\hat{\beta}_n^*) = \alpha\, G^\top \Sigma A (A^\top \Sigma A)^{-1} A^\top h$ (33)

and $b'_\theta(\hat{\beta}_n^*)$ will be

$b'_\theta(\hat{\beta}_n^*) = \dfrac{\partial}{\partial \theta} b_\theta(\hat{\beta}_n^*) = \alpha \dfrac{\partial}{\partial \theta} \left( G^\top \Sigma A (A^\top \Sigma A)^{-1} A^\top h \right)$ (34)

a matrix of dimension $p \times p$. Using the definition of the CRB and combining (32) and (34) we obtain

$\dfrac{\left[ 1 + b'_{\theta_j}(\hat{\beta}_n^*) \right]^2}{J_{jj}(\theta)} = \dfrac{\left[ 1 + \alpha \dfrac{\partial}{\partial \theta_j} \left( G^\top \Sigma A (A^\top \Sigma A)^{-1} A^\top h \right) \right]^2}{J_{jj}(\theta)}$ (35)

for $j = 1, 2, \ldots, p$, where $\dfrac{\partial}{\partial \theta_j}$ denotes the partial derivative with respect to the jth element, $\Sigma = \Sigma(\theta) = \left( -\sum_{i=1}^n \dfrac{\partial^2}{\partial \theta \partial \theta^\top} \log f_\theta(X_i) \right)^{-1}$, $A = A(\theta_o) = \dfrac{\partial}{\partial \theta} a(\theta)$ and

$J = J(\theta) = E\!\left[ -\sum_{i=1}^n \dfrac{\partial^2}{\partial \theta \partial \theta^\top} \log f_\theta(X_i) \right]$.

We begin our analysis of the bound by considering the terms involved. Thus

$\dfrac{\partial}{\partial \theta} A = \dfrac{\partial^2}{\partial \theta \partial \theta^\top} a(\theta)$

remains the same as $n \to \infty$. We have

$\Sigma(\theta) = \left( -\sum_{i=1}^n \dfrac{\partial^2}{\partial \theta \partial \theta^\top} \log f_\theta(X_i) \right)^{-1} \to \Sigma_p$ (36)

as $n \to \infty$, where the elements of $\Sigma_p$ are zeros apart from the diagonal elements $\Sigma_{jj}(\theta)$, which are ones for $j = 1, 2, \ldots, p$, since the observations are iid and follow a p-multivariate standard normal distribution. Thus from (36) we have

$\dfrac{\partial}{\partial \theta_j} \Sigma_{jj}(\theta) \to 0$ and $\dfrac{\partial}{\partial \theta} \Sigma \to 0$ (37)

as $n \to \infty$ for $j = 1, 2, \ldots, p$. This implies that

$\dfrac{\partial}{\partial \theta_j} G^\top \Sigma A (A^\top \Sigma A)^{-1} A^\top h \to 0$ and $\dfrac{\partial}{\partial \theta} G^\top \Sigma A (A^\top \Sigma A)^{-1} A^\top h \to 0$ (38)

for $j = 1, 2, \ldots, p$ as $n \to \infty$. Then from (38) we have

$b'_{\theta_j}(\hat{\beta}_n^*) \to 0$ and $b'_\theta(\hat{\beta}_n^*) \to 0$ (39)

for $j = 1, 2, \ldots, p$ as $n \to \infty$. Therefore, from (38) and (39), and using (35), we have

$\dfrac{\left[ 1 + b'_{\theta_j}(\hat{\beta}_n^*) \right]^2}{J_{jj}(\theta)} = \dfrac{\left[ 1 + \alpha \dfrac{\partial}{\partial \theta_j} \left( G^\top \Sigma A (A^\top \Sigma A)^{-1} A^\top h \right) \right]^2}{J_{jj}(\theta)} \to \dfrac{[1 + 0]^2}{J_{jj}(\theta)} = J_{jj}(\theta)^{-1}$ (40)

as $n \to \infty$ for $j = 1, 2, \ldots, p$. Since for all $j = 1, 2, \ldots, p$ we have $\Sigma_{jj}(\theta) = J_{jj}(\theta)^{-1}$, then $\Sigma = J^{-1}$ as $n \to \infty$. Hence from (40) the variance of the jth component of the James-Stein shrinkage estimator, $\Sigma_{jj}(\hat{\beta}_n^*)$, converges to the CRB as $n \to \infty$ for all $j = 1, 2, \ldots, p$. This means that

$\dfrac{CRB}{\Sigma_{jj}(\hat{\beta}_n^*)} \to \dfrac{\Sigma_{jj}(\hat{\beta}_n^*)}{\Sigma_{jj}(\hat{\beta}_n^*)} = 1$ as $n \to \infty$

for all $j = 1, 2, \ldots, p$. Thus the James-Stein shrinkage estimator $\hat{\beta}_n^*$ is asymptotically efficient.

Theorem 3 above shows that the James-Stein shrinkage estimator obtained by shrinking the MLE achieves the CRB asymptotically. This means that the shrinkage estimator is asymptotically efficient and hence stable for large sample sizes. Since the initial estimator (the MLE) is known to be asymptotically efficient, we see that the shrinking process has no effect on the asymptotic efficiency of the estimator being shrunk.

3.4. Rate of Convergence

We now investigate the rate of convergence of the shrinkage estimator $\hat{\beta}_n^*$ (JSSE) using concepts applied to the MLE and discussed in Hoeffding [15]. To proceed, we consider the shrinkage estimator of the form in (1) using the plug-in maximum likelihood estimators $\hat{\beta}_n$ and $\tilde{\beta}_n^o$. Let $\hat{\beta}_n^*$ be the James-Stein shrinkage estimator obtained when we shrink the MLE $\hat{\beta}_n = g(\hat{\theta}_n)$ defined earlier, for $p \geq 3$. We find the rate of convergence of this estimator by using its relationship with the MLE. Since the shrinkage target value may have no effect on the convergence rate, for an easier transformation of our sequence $\theta_n$ we set $\tilde{\theta}_n^o = 0$, implying $\tilde{\beta}_n^o = 0$. Thus we have

$\hat{\beta}_n^* = \hat{\beta}_n - \left( \dfrac{p-2}{n\, \hat{\beta}_n^\top V^{-1} \hat{\beta}_n} \right)_+ \hat{\beta}_n$

which becomes

$\hat{\beta}_n^* = \left( 1 - \dfrac{p-2}{\hat{\beta}_n^\top V^{-1} \hat{\beta}_n} \right)_+ \hat{\beta}_n$

when we factor out $\hat{\beta}_n$ and drop the n in the denominator, to have a form with lower MSE according to the James-Stein shrinkage strategy. Let

$k = \left( 1 - \dfrac{p-2}{\hat{\beta}_n^\top V^{-1} \hat{\beta}_n} \right)_+$, (41)

then

$\hat{\beta}_n^* = k\, \hat{\beta}_n$. (42)

Now consider the sequence

$\hat{\beta}_{nj} = \beta_{oj} + O_p\!\left( \dfrac{1}{\sqrt{n}} \right)$ (43)

for $j = 1, 2, \ldots, p$, where $\beta_{oj}$ is the "true" jth parameter value. From the equality in (42) we have

$\hat{\beta}_n = \dfrac{1}{k}\, \hat{\beta}_n^*$ (44)

for the shrinkage value k. Therefore, substituting the right-hand side of (44) in (43), the sequence $\hat{\beta}_{nj}$ becomes

$\dfrac{1}{k}\, \hat{\beta}_{nj}^* = \beta_{oj} + O_p\!\left( \dfrac{1}{\sqrt{n}} \right)$,

hence we have the sequence

$\hat{\beta}_{nj}^* = k\, \beta_{oj} + k\, O_p\!\left( \dfrac{1}{\sqrt{n}} \right)$ (45)

which is in terms of the shrinkage estimator with the shrinkage effect value k such that $0 < k \leq 1$. Analysing this sequence further shows that it satisfies the smoothness regularity conditions for the MLE, so we can proceed.

Let $\beta_{oj}^* = k\, \beta_{oj}$ be the true value in the shrinkage sense, obtained when we scale the true value $\beta_{oj}$ by the shrinkage factor k. Then the sequence (45) becomes

$\hat{\beta}_{nj}^* = \beta_{oj}^* + k\, O_p\!\left( \dfrac{1}{\sqrt{n}} \right)$ (46)

implying that

$\hat{\beta}_{nj}^* - \beta_{oj}^* = k\, O_p\!\left( \dfrac{1}{\sqrt{n}} \right)$ (47)

for all $j = 1, 2, \ldots, p$. This means that $(\hat{\beta}_{nj}^* - \beta_{oj}^*)$ is still within a $\dfrac{1}{\sqrt{n}}$ neighbourhood, since $0 < k \leq 1$. Therefore, using the second-order Taylor theorem we have

$\ln \dfrac{\prod_{i=1}^n f_{\hat{\beta}_{nj}^*}(x_i)}{\prod_{i=1}^n f_{\beta_{oj}^*}(x_i)} = (\hat{\beta}_{nj}^* - \beta_{oj}^*) \sqrt{n\, I_{jj}(\beta_o^*)}\, Z_j - \dfrac{n}{2} (\hat{\beta}_{nj}^* - \beta_{oj}^*)^2 I_{jj}(\beta_o^*) + O_p(1)$ (48)

for $j = 1, 2, \ldots, p$. Since

$\dfrac{\partial}{\partial \beta} \ln L(\hat{\beta}_n) = 0$

for the maximum likelihood estimator $\hat{\beta}_n$, then also

$\dfrac{\partial}{\partial \beta} \ln L(\hat{\beta}_n^*) = 0$,

implying that

$\dfrac{\partial}{\partial \beta_j} \ln L(\hat{\beta}_{nj}^*) = 0$ (49)

for all $j = 1, 2, \ldots, p$. Assuming that the log-likelihood function is differentiable, from (48) and (49) we have

$\dfrac{\partial}{\partial \hat{\beta}_{nj}^*} \ln \left( \dfrac{\prod_{i=1}^n f_{\hat{\beta}_{nj}^*}(x_i)}{\prod_{i=1}^n f_{\beta_{oj}^*}(x_i)} \right) = (\hat{\beta}_{nj}^* - \beta_{oj}^*)^{1-1} \sqrt{n\, I_{jj}(\beta_o^*)}\, Z_j - \dfrac{2n}{2} (\hat{\beta}_{nj}^* - \beta_{oj}^*)^{2-1} I_{jj}(\beta_o^*) + O_p(1)$ (50)

which, on simplifying, becomes

$\dfrac{\partial}{\partial \hat{\beta}_{nj}^*} \ln \left( \dfrac{\prod_{i=1}^n f_{\hat{\beta}_{nj}^*}(x_i)}{\prod_{i=1}^n f_{\beta_{oj}^*}(x_i)} \right) = \sqrt{n\, I_{jj}(\beta_o^*)}\, Z_j - n (\hat{\beta}_{nj}^* - \beta_{oj}^*)\, I_{jj}(\beta_o^*) + O_p(1) = 0$,

implying

$\sqrt{n\, I_{jj}(\beta_o^*)}\, Z_j - n (\hat{\beta}_{nj}^* - \beta_{oj}^*)\, I_{jj}(\beta_o^*) + O_p(1) = 0$. (51)

Rearranging (51) we have

$n\, I_{jj}(\beta_o^*) (\hat{\beta}_{nj}^* - \beta_{oj}^*) + O_p(1) = \sqrt{n\, I_{jj}(\beta_o^*)}\, Z_j$ (52)

for $j = 1, 2, \ldots, p$, where $Z_j \sim N(0, V_{\beta j})$, $V_{\beta j}$ being the jth diagonal element of the covariance matrix $V_\beta$ of the distribution $G^\top N_p(0, V)$, so that $Z_j$ is the normal variable corresponding to the jth element of $\hat{\beta}_n$. Now, dividing both sides of (52) by $\sqrt{n\, I_{jj}(\beta_o^*)}$, we obtain

$\sqrt{n\, I_{jj}(\beta_o^*)}\, (\hat{\beta}_{nj}^* - \beta_{oj}^*) + O_p(1) = Z_j$ (53)

where $Z_j$ is the distribution of the jth element of $\hat{\beta}_n$ and $Z \sim N_p(0, V_\beta)$. Using the sequence (45), Equation (53) becomes

$k \sqrt{n\, I_{jj}(\beta_o^*)}\, (\hat{\beta}_{nj} - \beta_{oj}) + O_p(1) = Z_j$ (54)

for $\beta_{oj}^* = k\, \beta_{oj}$ and $j = 1, 2, \ldots, p$, where $Z \sim N_p(0, V_\beta)$ and $V_\beta = G^\top V G$. The distribution of $Z_j$ is normal for each jth element of the plug-in estimator $\hat{\beta}_n$, as described in the analysis above.

Thus Equation (54) establishes the condition which implies local asymptotic normality (LAN) and differentiability in quadratic mean (DQM) for the estimator $\hat{\beta}_n^*$, which in turn implies that the rate of convergence is of order $\dfrac{1}{k\sqrt{n}}$ and the rate is $k\sqrt{n}$. This can also be obtained from the fact that the risk of the James-Stein shrinkage estimator is bounded by that of the MLE and the latter converges at the rate $\sqrt{n}$. Hence the James-Stein shrinkage estimator is $k\sqrt{n}$-consistent.

3.5. Simulation Plots

In this section the behaviour of the mean squared error (MSE) of the maximum likelihood estimator $\hat{\theta}_n$ is compared to that of the James-Stein shrinkage estimator $\hat{\beta}_n^*$ as the sample size n increases. The statistical package R is used to produce plots of the MSE for different sample sizes n, using library(MASS) to simulate data which follow a multivariate normal distribution. The data are generated using a 3 × 3 correlation matrix $\rho$ to obtain the covariance matrix $\Sigma$ given by

$\Sigma = \begin{pmatrix} 1 & 0.3 & 0.1 \\ 0.3 & 1 & 0.2 \\ 0.1 & 0.2 & 1 \end{pmatrix}$

which is symmetric with unit variances on the main diagonal, representing standard normal variances. Thus we take $p = 3$, which meets the James-Stein classical condition $p \geq 3$. Since $X \sim N_3(\theta, \Sigma)$, we have

$\hat{\theta}_n = \bar{X}_n = \dfrac{1}{n} \sum_{i=1}^n X_i$ for $p = 3$ (55)

making the MLE $\hat{\theta}_n$ a 3 × 1 vector, which implies that the dimension of the shrinkage estimator is also 3 × 1. Now, knowing that the maximum likelihood estimator $\hat{\theta}_n$ is unbiased and the James-Stein shrinkage estimator $\hat{\beta}_n^*$ is biased, the following expressions were used to calculate the mean squared error (MSE) of the two estimators. Using (55), we have

$MSE_\theta(\hat{\theta}_n) = \Sigma_n(\hat{\theta}_n) = \Sigma_n(\bar{X}_n) = \operatorname{Var}(\bar{X}_n)$ (56)

as the mean squared error of $\hat{\theta}_n$ for $p = 3$, since $b_\theta(\hat{\theta}_n) = 0$. Similarly we have

$MSE_\theta(\hat{\beta}_n^*) = \Sigma_n(\hat{\beta}_n^*) + [b_\theta(\hat{\beta}_n^*)]^2 = \operatorname{Var}(\hat{\beta}_n^*) + [b_\theta(\hat{\beta}_n^*)]^2$ (57)

for the James-Stein shrinkage estimator, which becomes

$MSE_\theta(\hat{\beta}_n^*) = \Sigma_n(k\, \hat{\theta}_n) + [b_\theta(k\, \hat{\theta}_n)]^2 = \operatorname{Var}(k\, \hat{\theta}_n) + [b_\theta(k\, \hat{\theta}_n)]^2$ (58)

where k is the shrinkage value which shrinks the maximum likelihood estimator $\hat{\theta}_n$ to the James-Stein shrinkage estimator $\hat{\beta}_n^*$ for $p = 3$. Thus the mean squared error of the shrinkage estimator $\hat{\beta}_n^*$ in (58) is obtained by using (56). The shrinkage value k is evaluated using the expression

$k = \left( 1 - \dfrac{1}{\bar{X}_n^\top\, \Sigma_n(\bar{X}_n)\, \bar{X}_n} \right) = 1 - \dfrac{1}{\bar{X}_n^\top \operatorname{Var}(\bar{X}_n)\, \bar{X}_n}$ (59)

for $p = 3$. The commands for all expressions and plots produced in R are provided in the appendix.
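Since the appendix itself is not reproduced here, the R sketch below indicates one way such an MSE comparison can be set up with library(MASS): for each sample size, the MSE of the sample mean and of the shrunken mean (shrinkage towards zero, with the positive-part weight from (3)) is estimated by Monte Carlo and the two curves are plotted together. The true mean vector, grid of sample sizes and number of replications are illustrative choices, not the authors' exact commands.

```r
# Illustrative reconstruction of the MSE comparison in Section 3.5
# (not the authors' appendix code). Data follow N_3(theta, Sigma).
library(MASS)
set.seed(4)
Sigma <- matrix(c(1.0, 0.3, 0.1,
                  0.3, 1.0, 0.2,
                  0.1, 0.2, 1.0), nrow = 3)
theta <- c(0.2, -0.1, 0.3)          # illustrative "true" mean
p     <- 3
sizes <- seq(30, 2000, by = 50)
reps  <- 200
mse_mle <- mse_js <- numeric(length(sizes))
for (s in seq_along(sizes)) {
  n   <- sizes[s]
  err <- replicate(reps, {
    X        <- mvrnorm(n, mu = theta, Sigma = Sigma)
    beta_hat <- colMeans(X)
    Dn       <- n * as.numeric(t(beta_hat) %*% solve(Sigma) %*% beta_hat)
    k        <- max(1 - (p - 2) / Dn, 0)
    c(mle = sum((beta_hat - theta)^2), js = sum((k * beta_hat - theta)^2))
  })
  mse_mle[s] <- mean(err["mle", ])
  mse_js[s]  <- mean(err["js", ])
}
matplot(sizes, cbind(mse_mle, mse_js), type = "l", lty = 1:2, col = 1:2,
        xlab = "sample size n", ylab = "MSE", main = "MSE of the MLE and the JSSE")
legend("topright", legend = c("MLE", "JSSE"), lty = 1:2, col = 1:2)
```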

We present MSE plots obtained by simulating the mean squared error for the sample sizes n = 30, 2000 and 100,000. The MSE plots for the James-Stein shrinkage estimator and the maximum likelihood estimator are drawn on the same graph for each sample size considered, for easy comparison of the MSE trends. We begin by considering a small sample size of n = 30 to compare the way the MSE line plots change from one point to the next. Since we are interested in asymptotic behaviour, we increase the sample size to 2000 and then to 100,000 to analyse the MSE trends and the rate at which the line plots become smooth.

Figure 1. MSE plots for the MLE and JSSE for n = 30.

Figure 2. MSE for the JSSE and MLE for n = 2000.

Figure 3. MSE plots for JSSE and MLE for n = 10,000.

Collectively, the scaled plots show a reduction in the mean squared error of the James-Stein shrinkage estimator compared to that of the initial estimator (the MLE). The trend in mean squared error for both the maximum likelihood estimator and the James-Stein shrinkage estimator shows that, as the sample size n increases, the MSE values converge to some value. The MSE plots suggest that the James-Stein shrinkage estimator converges to a lower MSE value of about 0.9, compared to the maximum likelihood estimator, which converges to an MSE value of about 1.0. They also show that the James-Stein shrinkage estimator converges faster than the MLE, though the difference is minimal.

4. Conclusions and Suggestions

In this study we explored the asymptotic properties of the James-Stein shrinkage estimator $\hat{\beta}_n^*$, which is obtained by shrinking an MLE $\hat{\theta}_n$. Asymptotic consistency and efficiency of the shrinkage estimator $\hat{\beta}_n^*$ were investigated. From the regularity conditions, the MLE is known to be unbiased, consistent and efficient as the sample size $n \to \infty$. Therefore, the study analysed these asymptotic properties by checking whether the new (shrinkage) estimator obtained after shrinking possesses them; that is, whether the shrinking process has any effect on these properties. The results show that the James-Stein shrinkage estimator $\hat{\beta}_n^*$ is asymptotically consistent and efficient. The study also showed that the shrinkage estimator (JSSE) is asymptotically biased, a property it possesses even for small sample sizes n. The bias is introduced by the shrinking factor k given in Equation (59). We therefore see that the shrinking process introduces bias into the estimators obtained, but it preserves asymptotic consistency and efficiency and, more importantly, it reduces the MSE.

Thus the James-Stein shrinkage estimator obtained by shrinking techniques proves to be useful even though it is biased. This estimator is more effective than the maximum likelihood estimator, as shown in this study and by Hansen [10]. The study also showed that the JSSE is stable for large sample sizes n, making it suitable for practical applications, since large samples are normally required for effective estimation. Since error is always present in estimation, we regard shrinking (minimising error) as a very important technique for yielding effective estimators.

The study has investigated the asymptotic behaviour of the James-Stein shrinkage estimator. Asymptotic properties analysed in the study include rate of convergence, consistency and efficiency. The results show that the James-Stein shrinkage estimator has a lower mean squared error compared to the maximum likelihood estimator though it is biased. The results further show that the JSSE is asymptotically consistent and efficient.

Acknowledgements

The authors gratefully acknowledge the Department of Mathematics and Statistics at the University of Zambia for supporting this work. Sincere thanks to the managing editor Alline Xiao for a rare attitude of high quality.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] Stein, C. (1956) Inadmissibility of the Usual Estimator for the Mean of a Multivariate Normal Distribution. In: Neyman, J., ed., Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Volume I, Statistical Laboratory of the University of California, Berkeley, 197-206.
https://doi.org/10.1525/9780520313880-018
[2] James, W. and Stein, C. (1961) Estimation with Quadratic Loss. In: Neyman, J., ed., Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Statistical Laboratory of the University of California, Berkeley.
[3] Baranchik, A.J. (1964) Multiple Regression and Estimation of the Mean of a Multivariate Normal Distribution. Technical Report No. 51. Department of Statistics, Stanford University, Stanford, CA.
[4] Berger, J.O. (1976) Minimax Estimation of Multivariate Normal Mean with Arbitrary Quadratic Loss. Journal of Multivariate Analysis, 6, 256-264.
https://doi.org/10.1016/0047-259X(76)90035-X
[5] Stein, C. (1981) Estimation of the Mean of a Multivariate Normal Distribution. Annals of Statistics, 9, 1135-1151.
https://doi.org/10.1214/aos/1176345632
[6] Carter, R.L. and Ullah, A. (1984) The Sampling Distribution of Estimators and Their F-Ratios in Regression Model. Journal of Econometrics, 25, 109-122.
https://doi.org/10.1016/0304-4076(84)90040-X
[7] George, E. I. (1986) Minimax Multiple Shrinkage Estimation. Annals of Statistics, 14, 188-205.
https://doi.org/10.1214/aos/1176349849
[8] Geyer, C.J. (1994) On the Asymptotic of Constrained M-Estimation. Annals of Statistics, 22, 1993-2010.
https://doi.org/10.1214/aos/1176325768
[9] Hansen, B.E. (2008) Generalized Shrinkage Estimators.
https://web-docs.stern.nyu.edu/old_web/emplibrary/shrink3.pdf
[10] Hansen, B.E. (2016) Efficient Shrinkage in Parametric Models. Journal of Econometrics, 190, 188-205.
https://doi.org/10.1016/j.jeconom.2015.09.003
[11] Efron, B. (1975) Biased versus Unbiased Estimation. Advances in Mathematics, 16, 259-277.
https://doi.org/10.1016/0001-8708(75)90114-0
[12] Newey, W.K. and McFadden, D.L. (1994) Large Sample Estimation and Hypothesis Testing. In: Handbook of Econometrics, Vol. 4, North-Holland, Amsterdam, 2113-2245.
https://doi.org/10.1016/S1573-4412(05)80005-4
[13] Stone, C.J. (1974) Asymptotic Properties of Estimators of a Location Parameter. The Annals of Statistics, 6, 1127-1137.
https://doi.org/10.1214/aos/1176342869
[14] Hodges, J.L. and Lehmann, E.L. (1951) Some Applications of the Cramér-Rao Inequality. In: Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, CA, 13-22.
[15] Hoeffding, W. (1948) A Class of Statistics with Asymptotically Normal Distribution. Annals of Mathematical Statistics, 19, 293-325.
https://doi.org/10.1214/aoms/1177730196

Copyright © 2023 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.