Comparison of the Sampling Efficiency in Spatial Autoregressive Model

A random walk Metropolis-Hastings algorithm has been widely used in sampling the parameter of spatial interaction in spatial autoregressive model from a Bayesian point of view. In addition, as an alternative approach, the griddy Gibbs sampler is proposed by [1] and utilized by [2]. This paper proposes an acceptance-rejection Metropolis-Hastings algorithm as a third approach, and compares these three algorithms through Monte Carlo experiments. The experimental results show that the griddy Gibbs sampler is the most efficient algorithm among the algorithms whether the number of observations is small or not in terms of the computation time and the inefficiency factors. Moreover, it seems to work well when the size of grid is 100.


Introduction
Spatial models have been widely used in various research fields such as physical, environmental, biological science and so on.Recently, a lot of researches are also emerging in econometrics (e.g., [3] [4] and so on), and [5] gave an excellent survey from the viewpoint of econometrics.When we focus on the estimation methods, properties of several estimation methods are studied.For example, the efficient maximum likelihood (ML) method was proposed by [6], and [7] first formally proved that the quasi maximum likelihood estimator had the usual asymptotic properties, including n -consistency, asymptotic normality, and asymptotic efficiency.A class of moment estimators was examined by [8] and [9].The Bayesian approach was first considered by [10] and [11] proposed a Markov chain Monte Carlo (hereafter MCMC) method to estimate the parameters of the model.We have to mention that in economic analysis typically the sample size is small, for instance, areal data such as state-level data is widely used.The maximum likelihood methods depend on their asymptotic properties while the Bayesian method does not, because the latter evaluates the posterior distributions of the parameters conditioned on the data.Therefore, it is reasonable to examine the properties of Bayesian estimators (see [12]).
Although there are a lot of works using spatial models in a Bayesian framework, previous literature has rarely examined sampling methods for the parameter of spatial correlation.[13] proposed a random walk Metropolis-Hastings (hereafter RMH) algorithm.This method is widely used (e.g., [11] [12] [14] and so on).On the other hand, [2] applied a griddy Gibbs sampler (hereafter GGS) proposed by [1] and showed the GGS got an advantage over the RMH method from a simulated data and estimated the regional electricity demand in Japan.However, [2] has examined only one case.In this paper, we compare the properties of the GGS in the case that the number of observation is small (or large) through the Monte Carlo experiments.Desirable properties for sampling methods in the Bayesian inference are efficiency and well mixing, which yield fast convergence.In addition to these properties, computational requirements and model flexibility are important for applied econometrics.Therefore, the purpose of this paper is to investigate the properties of some sampling algorithms given several parameters of a model.
In this paper, we examine the efficiency of the existing Markov chain Monte Carlo methods for the spatial autoregressive (hereafter SAR) model which is the simplest and most commonly used model in the spatial models.Moreover, we propose an acceptance-rejection Metropolis-Hastings (hereafter ARMH) algorithm as an alternative MH algorithm, which is proposed by [15] because it is well known that the RMH is inefficient.This algorithm is widely used for the acceleration of MCMC convergence, for example, in the time series models (see [16]- [18] and so on).The advantage of this method is that the computational requirement is very small since it is irrelevant to the shape of the full conditional density.Therefore, we apply the algorithm to the SAR model.
We illustrate the properties of these algorithms using simulated data set given the three number of observations and the seven values of spatial correlation.From the results, we find that the GGS is the most efficient method whether the number of observations is small or not in terms of both the computation time and the inefficiency factors.Furthermore, we show that it is efficient when the number of grid in the GGS sampler is one hundred.These results give a benchmark of sampling the spatial correlation parameter of the models.
The rest of this paper is organized as follows.Section 2 summarizes the SAR model.Section 3 discusses the computational strategies of the MCMC methods, and reviews three sampling methods for spatial correlation parameter.Section 4 gives the Monte Carlo experiments using simulated data set and discusses the results.Finally, we summarize the results and provide concluding remarks.

Spatial Autoregressive (SAR) Model
Spatial autoregressive model explains the spatial spillover using a weight matrix (see [19]).There are numerous approaches to construct the weight matrix, which plays an important role in the model.For example, those are a first order contiguity matrix, inverse distance one and so on.Among the approaches, [20] recommended the first order contiguity dummies, because they showed that the first order contiguity weight matrix identifies the true model more frequently than the other matrices through the Monte Carlo simulations.Thus, we also utilize the first order contiguity dummies as the weight matrix.
Let C be an n n × matrix of contiguity dummies, with 1 ij c = if areas i and j are adjacent and 0 ij c = otherwise (with 0 ii c = ).We standardized the weight matrix as follows , where ij w denotes the spatial weight on the j -th unit with respect to the i -th unit.
Note that we have Next, let i y and i x be a dependent variable and a 1 k × vector of covariates on the i th unit for 1, , i n =  , respectively.Then, the SAR model conditioned on the parameters ρ , β , 2 σ is written as follows: ( ) where ρ and 2 σ indicates the spatial correlation, and the variance of the disturbance term, respectively.As is shown in [21], we know that 1 min 1 λ − = − amd 1 max 1 λ − = , where min λ and max λ denote the minimum and maximum eigenvalue of W , since we standardize the weight matrix like W . Thus, we restrict ρ to ( ) Then the likelihood function of the model ( 1) is given as follows: , , , , 2π exp 2 where ( )

Joint Posterior Distribution
Since we adopt the Bayesian approach, we complete the model by specifying the prior distribution over the parameters.We use the following independent prior distribution: , , Given a prior density ( ) and the likelihood function given in (2), the joint posterior distribution can be expressed as Finally, we assume the following prior distributions: , IG a b denotes an inverse gamma distribution with scale and shape parameters a and b .Since the joint posterior distribution is given by (3), we can now adopt the MCMC method.The Markov chain sampling scheme can be constructed from the full conditional distributions of ρ , β and 2 σ .

Sampling ρ
From (3), the full conditional distribution of ρ is written as ( ) As it is difficult to sample from the standard distribution, we examine three approaches for sampling ρ .First, we introduce the GGS, which is applied by [2].Second, we overview the RMH algorithm, which is extended by [13].Finally, we propose an ARMH algorithm.These sampling methods are summarized in the following.

Griddy Gibbs Sampler
The GGS was proposed by [1].This sampling algorithm approximates a cumulative distribution function of the full conditional distribution by each kernel function over a grid of points and uses a numerical integration method, and is sampling method from the full conditional distribution by using the inverse transform method.Let the grid be as follows Thus, we select the grid i a * with probabilities, ( ) ∑ Finally, we sample ρ from the uniform ( ) . [22] stated that the choice of the grid of points has to be made carefully and constitute the main difficulty in applying GGS.In this paper, we select the equal interval among as in [1].Then, our numerical experiments examines to choice the size of grid for estimating the spatial correlation.

Random Walk Metropolis-Hastings Algorithm
The RMH method is a simple algorithm because it needs the previous value and a random walk process such as , where old φ is the parameter of the previous sampling, and τ denotes the tuning para- meter, respectively.Therefore, the following Metropolis step is used:  where s is the tuning parameter.In the numerical example below, we select the tuning parameter such that the acceptance rate lies between 0.4 and 0.6 (see [13]).Next, we evaluate the acceptance probability 1,1 − because the constraint is part of the target density.Thus, if the proposed value of ρ is not within the interval, the conditional posterior is zero, and the proposal value is rejected with probability one (see [23]).It is well known that the method is not efficient because the convergence is slow for using the previous sampled parameter.

Acceptance-Rejection Metropolis-Hastings Algorithm
An acceptance-rejection Metropolis-Hastings (ARMH) algorithm method was proposed by [15].This algorithm samples the parameter using the AR and MH steps.Suppose that there is a candidate function ( ) g ρ such that it is possible to sample directly from ( ) g ρ by some known method.Then, the AR step proceeds as follows.
We sampling the parameter from the candidate function ( ) g ρ , and accepts the candidate draw with probability ( ) ( ) ρ .This step is iterated until the candidate draw is accepted.
Next, suppose the candidate new ρ is produced from above AR step.The MH part proceeds as follows.We calculate the acceptance probability, q as following: In this step, the candidate draw is accepted with probability q and rejected with probability 1 q − .If the draw is rejected, the previously sampled value is sampled again.If q is small, the probability of sampling the same value consecutively is high, causing high autocorrelation across sample values (see [24]).Hence, we should also make q as close to one as possible.The advantage of this method is that it is free to functional form which differs from the GGS and RMH.In this paper, in order to construct the candidate function, we utilize the result of [7], which showed the consistency and asymptotic normality of quasi-ML estimators of model parameters, to the candidate density.Then, we construct the candidate density ( )

Sampling Other Parameters
The full conditional distributions of β and 2 σ are .These parameters are easily sampled from the Gibbs sampler (see [25]).

Measures of Efficiency for Comparison
In this section, we examine the properties of three MCMC methods by simulated data sets.Desirable properties for sampling methods in MCMC are efficiency and well mixing, which yield fast convergence.[17] compared from the view point of acceptance rate in the AR and MH step.[26] [27] evaluated the efficiency of sampling methods, comparing the inefficiency factor and time of MCMC simulation.Following previous literatures, we also compare inefficiency factor and computational time.
The inefficiency factor is defined as where s r is the sample autocorrelation at lag s calcu- lated from the sampled values.It is used to measure how well the chain mixes and is the ratio of the numerical variance of the sample posterior mean to the variance of the sample mean from the hypothetical uncorrelated draws (see [28]).

Data Generating Process and Estimation Procedures
We now explain the simulated data for an experiment.First, we give the weight matrix as an exogenous variable.We construct the spatial weight matrix W as follows: 1) generate ij c for i j > from Bernoulli distribution with a probability of success 0.3, 2) set ij ji c c = for i j = / and 0 ij c = for i j = , and 3) compute for all i , j .Next, for the independent variables ( ) , we take the standard normal variates and set the X , which are 4 n × covariate matrices.Given W , X , 0.9, 0.6, 0.3, 0, 0.3, 0.6, 0.9 ρ = − − − , and 50, 100, 200 n = , the true data generating process is as follows: where the i  is normally and independently distributed with ( ) The parameter is set to be ( ) ( ) , respectively.The parameters of ρ for simulated data reflect the values obtained in [12].All the results in this paper were calculated using the Ox version 5.1 (see [29]).
The prior distributions are as follows: ,100 , 1.0 2, 0.01 2 , and 1,1 We perform the MCMC procedure by generating 35,000 draws in a single sample path and discard the first 20,000 draws as the initial burn-in.For the GGS, we consider the number of grid, 50, 100, 300 m = for estimating the parameters.

Results of Comparison
Table 1 reports inefficiency factors by using three methods.Although there are some differences, the performances of the GGS are almost equivalent to those of the ARMH.In addition, these algorithms are more efficient than RMH.For example, from the table in 50 n = , the inefficiency factors calculated by the ARMH are smaller than the other methods.However, if spatial correlation is positive strong such as 0.9 ρ = , the value by the GGS ( ) has the smallest inefficiency factor.Next, we focus on the results in 100 n = . In this case, the GGS ( ) 50,100 m = perform the best for 0.6, 0.9 ρ = , respectively.In the case of 200 n = , the values of the GGS ( ) and the ARMH are similar in each parameter.We can also find such similarity in sample paths and autocorrelation functions.Table 1.Inefficiency factor of models.
Observation:  Table 2 shows CPU time on a Pentium Core2 Duo 2.4GHz including discarded and rejected draws.For the GGS, the computation time depends on the number of grid because the increase of grid number causes the cost of computation time.In all cases, the GGS ( ) , the GGS needs much shorter time than the RMH and ARMH methods.Summarizing the results of inefficiency factors and computational time, if the number of observation is not only small (like 50 n = ) but also large, then it is suitable to use the GGS.In addition, the choice of grid number affects to the computational time.In this numerical experiments, the results of selecting 100 m = seem to work well in terms of inefficiency factors and computational time.
Table shows the results with acceptance probabilities in both AR and MH parts in the ARMH.From the table, the acceptance probabilities in those part are exceed 89%.This result shows that our candidate function seems to work well, and the probabilities of sampling the same value consecutively are low.However, our ARMH algorithm does not improve the values of inefficiency factor.Thus, we think that the SAR model has the problem of identification.
Figure 2 and Table 4 depict the sample path and the correlation among the parameters in the case of 100 n = , 0.9 ρ = , 100 m = using the GGS.From 2 β to 4 β and 2 σ in the figure, the MCMC draws seem to be well mixing.In addition, correlations among these parameters are very small.On the other hand, strong correlation between 0 β and ρ can be found from the figure.Moreover, the correlation between 0 β and ρ is 0.995 − from the table.Therefore, we assume that the spatial correlation and constant term is weakly identified.

Concluding Remarks
This paper reviewed the MCMC estimation procedures for sampling the spatial correlation of SAR model, and proposed the ARMH algorithm as more efficient than the RMH in order to show the property of the GGS proposed by [2].To illustrate the differences between the estimates of three MCMC methods, we compared these algorithms by simulated data set.From the Monte Carlo experiments, we found that the GGS was the most efficient algorithm with respect to the mixing, efficiency and computational requirement of the MCMC.Moreover,    the results of selecting 100 m = seem to work well in terms of inefficiency factors and computational time.Therefore, the GGS is beneficial algorithm for estimating the spatial parameter as same as the result of [22].
Finally, we will state our remaining issues.In this paper, we found that the GGS was the most efficient algorithm in sampling the intensity of spatial interaction.On the other hand, we showed the problem of the SAR model such that the spatial correlation and constant term was weakly identified.Thus, we have to construct the model which is identified, or appropriate algorithm to sample the intensity of spatial interaction.Furthermore, we found that the number of grids is appropriate when 100 m = . In this paper, we could not derive the theoretical reason why 100 m = was appropriate number of grids, that was, we only showed the results of Monte Carlo experiments.However, it is important to know the properties of the existing sampling methods, though research on the convergence of the GGS algorithm has never been examined.We think that, in this respect, our experiment gives the benchmark in applied econometrics.
value of ρ is not truncated to the interval ( )

g
ρ as an approximation to the the conditional posterior density by omitting the determinant .Thus we sample new ρ from the distribution, and apply the ARMH algorithm.

Figure 1
shows the results of MCMC simulation in each method in the cases of 0shows that the marginal posterior densities (middle of the figure)
overwhelms the others.If we focus on the case of 50 n = , the computational time of the GGS ( ) 100 m = are as same as those of the RMH and ARMH methods.Futhermore, if 200 n = parameter is 0.9.The number of observation set to be 100.

Table 2 .
Time of convergence.

Table 3 .
Acceptance probability of the ARMH methods.

Table 4 .
Correlation of parameters.