A Simulation Study of Hierarchical Bayesian Fusion Spatial Small Area Model for Binary Outcome under Spatial Misalignment

Simulation (stochastic) methods are based on obtaining random samples s θ from the desired distribution ( ) p θ and estimating the expectation of any function ( ) h θ . Simulation methods can be used for high-dimensional distributions, and there are general algorithms which work for a wide variety of models. Markov chain Monte Carlo (MCMC) methods have been important in making Bayesian inference practical for generic hierarchical models in small area estimation. Small area estimation is a method for producing reliable estimates for small areas. Model based Bayesian small area estimation methods are becoming popular for their ability to combine information from several sources as well as taking account of spatial prediction of spatial data. In this study, detailed simulation algorithm is given and the performance of a non-trivial extension of hierarchical Bayesian model for binary data under spatial misalignment is assessed. Both areal level and unit level latent processes were considered in modeling. The process models generated from the predictors were used to construct the basis so as to alleviate the problem of collinearity between the true predictor variables and the spatial random process. The performance of the proposed model was assessed using MCMC simulation stu-dies. The performance was evaluated with respect to root mean square error (RMSE), Mean absolute error (MAE) and coverage probability of corresponding 95% CI of the estimate. The estimates from the proposed model perform better than the direct estimate.


Introduction
Simulation (stochastic) methods are based on obtaining random samples of the parameters of interest from the desired distribution and estimating the expectation of any function of the parameters. It can be used for high-dimensional distributions, and there are general algorithms which work for a wide variety of models; where necessary, more efficient computation can be obtained by combining these general ideas with tailored simulation methods, deterministic methods, and distributional approximations. Markov chain Monte Carlo (MCMC) methods have been important in making Bayesian inference practical for generic hierarchical models in small area estimation.
Small Area Estimation (SAE) is a strategy for improving estimation accuracy and delivering reliable parameter estimates for small areas, where a small area refers small sub-population in terms of sample size [1] [2] [3] [4]. To improve accuracy and reliability, a variety of estimators have been developed that combine survey data for the target small areas with data from outside the survey, frequently related to recent censuses and current administrative data.
In small area estimation, statistical models can be utilized in a model assisted approach or model based approach. Even when a model is misspecified, [5] that use working models, in which a model is specified but desirable design based attributes are preserved, are called model assisted estimators.
There are now a variety of model based approaches available, depending on the nature of the measurement (e.g., binary, count or continuous variable) [6] [7] and on the auxiliary data available for the individual units in the population versus those available only at the aggregate level (areal level or unit level) [8] [9].
Unit level estimations are expected to be more precise than areal level estimates on average [10]. Access to auxiliary data at the unit level, on the other hand, may be difficult. As a result, a fusion model that contains both unit and areal level data should be considered to accommodate this. The fusion small area model makes use of census auxiliary data at the area level, which is available in both observation and prediction regions, as well as individual survey variables. The fusion predictive model links individual survey variables with a hierarchical areal level model, allowing prediction to be carried out to other geographically defined small areas. A fusion small area model, which was successfully used by other authors [11], comprising both unit level survey and areal level auxiliary census data was adopted.
The Bayesian approach to modeling has the ability to combine information from several sources using melding or data fusion [12]. The evaluation of prediction accuracy is an important aspect of SAE [13] although its computation may be complicated. The Bayesian approach solves the problem automatically, resulting to realizations of the target quantities' from the posterior distribution [13].
A generalized linear model with or without spatial correlation structure was proposed by Ghosh et al. [14] using a hierarchical Bayesian framework. In addition, a hierarchical Bayesian model was proposed by Bakar et al. [11] for several categorical data under purely spatial setting. Besides, misalignment issues were addressed following this. Furthermore, difficulties with misalignment were resolved as a result of this. Under the hierarchical Bayesian framework, misaligned data models have recently been presented to solve the issue of making inference for domains other than those for which survey data is available [11] [15] [16]. Following this, Trevisani and Gelfand [17] proposed a model to handle spatial misalignment for count data. For a categorical data under spatial misalignment, Bakar et al. [11] had proposed a hierarchical Bayesian model to handle this issue.
Furthermore, Bakar and Jin [18] proposed for estimating a binary outcome variable under purely spatial setting for secondary geography where survey data were not collected. Primary and secondary geographies are spatially misaligned. However, the study by Bakar and Jin [18] considered a real level data.
A fusion model developed by Muchie et al. [19] for the binary data has some appealing features. However, it has to be evaluated using simulation studies. As the model has closed forms for all required full conditionals, Gibbs sampler will be used for generating samples.
As a result, the aim of this study was to assess the performance of the hierarchical Bayesian spatial fusion small area model for binary data under spatial misalignment.

Model
Here we introduce a spatial hierarchical statistical model for the binary data. Let and L is the number of predictors or covariates in the model.

( ) r
A ω , is spatially correlated process, modeled explicitly to take the statistical dependency due to spatial proximity. As understanding and defining the spatial is often important for constructing the full conditional distribution of ω .
The Moran's I basis function [20] is one of the popular ways to model areal spatial process ω , and consequently, is useful for addressing problems with large spatial data [21]. Hence, for dimension reduction, m eigenvectors of a Moran's I operator matrix are used, where m n < . Similarly, dimension reduction approach is used with a set of multiresolution spatial basis function [22] [23]. In multiresolution models [23] m knots are defined over the spatial domain by using reduced rank approximation [24] [25].
Hence, the spatial basis function for the spatial process ω comprises the Moran's I and multiresolution which is written as: is a combination of eigenvectors  is an independent and identically distributed error process to capture the remaining random component so as to take the uncertainities arising from The process models generated from the predictors ( µ ) were used to construct the basis so as to alleviate the well-known problem of collinearity [26] between the true predictor variables and the spatial random process. Accordingly, the Moran's I operator matrix is defined as ; φ is specified to be 1.5 times the minimum distance between basis function centers of the same resolution [22]. Another issue that is related to ( ) A R is the selection of the number and position of knots. A regular grid, or its modification, is often a popular choice for the position of knots [28] [29]. An orthogonal spatial basis ( ) constructed on the basis of the distances between the centroid of the areas and knot points. We model η , which is a vector of spatial random effect, by using a Gaussian distribution with zero mean and a lower dimensional covariance matrix Σ , . Here Σ comprises a smoothing parameter φ and where Q is n n × matrix with non-diagonal entries 1 ij q = if regions i and j are neighbors and 0 otherwise, and diagonal entries ii q are equal to the number of region i's neighbors [26]. It can be computed using the neighborhood weight , and 1 is a vector of 1 s. Since Q is singular [30], we use a spectral decomposition of Σ to obtain the full conditional distribution of the spatial process η . Thus, we can write where Ψ is an orthogonal matrix of order m m × and Λ is a diagonal matrix, see details in [21]. Following [20], in this research we assume that eigen vectors of Σ are known and its eigen values are known up to a multiplicative constant.

Predictive Distributions
The geographical areas The change of support for the mean process ( ) where, * A is the area of the areal unit * A . The integral in Equation (9) is however difficult to solve, and there are approximation methods available to address this difficulty [31]. However, the approximations are best suited for situations when

Gibbs Sampler Algorithm
The Gibbs sampler, introduced by [32], has been developed by [33]. The Gibbs sampler simulates from multidimensional posterior distributions by iterative sampling from the lower-dimensional full conditional posterior distributions.
The conditional distributions are required for implementation of the model through Markov Chain Monte Carlo (MCMC) simulation. Full conditional densities are derived by abstracting out from the full joint posterior distribution on-ly those elements including the parameter of interest and treating other components as constants [34]. Accordingly, we obtain the full conditional distribution for the variance parameters and fixed effect coefficients of the model as given below where the detailed derivations are given in a manuscript done by the same authors [19]: The Gibbs sampler updates the chain one component at a time, instead of updating the entire vector. In our case, starting from an initial values ( ) The densities on the right hand sides of the above are called the complete conditional distributions or full conditional distributions. We now provide step by step instructions for running this MCMC algorithm in the list below.   Hence, in step 4, we aggregate the results by averaging the individual predicted probabilities in each * k A for each draw (t). Note that, in the modelling hierarchy, probabilities are aggregated only at the prediction stage.

Measures of Model Evaluation
The R iterations for the direct and SAE model estimates were evaluated for bias, relative bias, mean squared error (MSE), root mean squared error (RMSE), percent coverage for a 95% nominal rate, and 95% confidence interval width for direct estimates and 95% credible interval width for the SAE models. Each of these metrics were calculated in two ways: 1) for an individual small area, and 2) average across all individual small areas.
Accordingly the first, Bias was calculated as the average difference between the posterior mean of the population total, ˆi The third, Mean squared error (MSE) was calculated as the average squared deviations of the posterior mean of the population total from the true population, and RMSE was the square root of these deviations: where; Ti Y refers true parameter value for the th i small area, ˆi θ is the posterior mean of the population total, and R refers the number of iterations. A trade-off between bias and precision (mean squared error, ...) is present when applying a SAE model.
The last one, Percent coverage was defined as the average number of times the 95% SAE credible interval for ij θ , or the direct estimate 95% confidence interval, included truth for each small area. The proportion of times the 95% CI of the estimated summary mean contains the true value.
( ) 1 19 5% CI of ; It is desirable to have coverage near 95%. Coverage higher than 95% indicates an inefficient estimator whereas coverage less than 95% indicates an inaccurate estimator.

Simulation Design
Proper and weakly informative prior distributions for the model parameters were used. For the fixed effect coefficients ( β ) we used conjugate normal distribution as the prior distribution with mean zero and a variance of 10 to represent weakly informative and proper prior specification. For the variance and smoothing parameters we consider a conjugate inverse-gamma prior distributions, weakly Open Journal of Statistics informative hyper-parameters for the inverse-gamma prior.
As noted in [35], inverse-gamma prior with very small hyper-parameters does not have any proper limiting posterior distribution. Hence, we consider a proper and weakly informative invers gamma prior with reasonable values of the shape hyper-parameter 2 a = ; and the rate hyper-parameters 1 b = [36] [37], which is popularly suggested in literature [38] [39] [40]. Therefore, for the variance parameters we let the shape and rate hyper-parameters of the inverse gamma prior distribution be 2 and 1, as the inverse gamma (2, 1) distribution is often treated as proper and weakly informative [36] [37].
To get small area estimation at r A , a total of 400 square grid cells were considered where To clarify the insight of the data layout over the domain of the study area we have clearly stated it as follows. Firstly, we took a random sample of 10% [20] [41] of the total population and identify those points inside the square grid cells. Hence, the sample data did not contain all the square grid cells. Figure 1 & Figure 2 showed the squared grid cells and a representation of the samples in a grid cell together with the population points that were generated in that cell. Figure 1 showed that the sampled grid cells (shaded in grey colour) (  ) were superimposed,  which is based on randomly selected sample points; two different resolutions of knot points ( * , ◊ ) used for the model proposed were also superimposed. Secondly, the sample values of the response variable, the aggregated values of auxiliary variables and other spatial components were used to find estimates of hyper-parameters under the SA1.
Thirdly, based on data in r A , we generate small area estimates for the second geographic areas * k A (i.e. SA2) having spatial misalignment with the SA1, r A .
The data are generated in a manner similar to that of r A small area estimation.
However, we now aim predictions for ( ) * i k p A , individual level probability at the 100 grid cells, using different knot point resolutions.

MCMC Output and Diagnosis
Regarding the software implementation, all simulations were performed using the freely available software "R version 4.0.3 (2020-10-10)". It took 9 hours to generate an MCMC sample of size 50,000 using a machine with Intel(R) Core(TM) i7-4600U CPU @ 2.10 GHz 2.70 GHz and RAM of 8.00 GB (7.90 GB usable). A gibbs sample of size 50,000 was generated and the first 10,000 samples were discarded as the burn-in period. Further, thinning was applied, every 10 th sample selected discarding all others, so as to reduce auto-correlation, to reduce burden on computer memory/storage and to facilitate great deal of post-processing. Hence, 4000 posterior samples were collected after burn-in and thinning. Accordingly, an MCMC sample of size 4000 were used for the entire analysis here under. Accordingly, the trace plot, density plot and auto-correlation plots based on the final 4000 MCMC samples were given below. Figure 3 showed the trace plots of the MCMC samples for selected model parameters of the model. Each plot illustrated that the chain was sampling from their corresponding stationary distributions. The MCMC chain of variance parameters and the entries of φ were mixed well and converged quickly, while the MCMC samples of regression coefficients, i β , appear to have relatively slow to converge.
The quick convergence of the predictive sample guarantees that the subsequent analyses (such as the averaged predictive means of Similarly, the kernel density plot (Figure 4) also showed there is no serious problem of convergence as the plots are uni-modal. Fast decay of auto-correlation ( Figure 5) as the lags increase also supplements the above statement regarding convergence.
Henceforth, probabilities for the second geographical areas were computed from posterior predictive distribution. All the following model comparisons were based on these probabilities.

Model Comparison
The estimates from the model, hierarchical Bayesian fusion spatial small area model, were compared with direct estimate. The RMSE, MAE and 95% probability coverage were considered for comparison purpose. Accordingly, the RMSE    and MAE of the estimates from proposed model is less than that of the direct estimate showing the proposed model performs better (Table 1). Additionally, the 95% probability coverage for the estimate from the proposed model was greater than that of the direct estimate. This also strengthens the improvement of the proposed model.

Concluding Remarks
In this study, we have assessed the performance of the recently developed spatial hierarchical Bayesian fusion spatial small area model for binary data under spatial misalignment. The process models generated from the predictors were used to construct the basis so as to alleviate the well-known problem of collinearity between the true predictor variables and the spatial random process. The performance of the proposed model was assessed using simulation studies. The model proposed has the ability to address the large dimensional spatial problem by reducing dimension through the use of basis function knots. The proposed model performs better than the direct estimate. The developed model can be applied in many broad areas including but not limited to health sciences, public health, agriculture, and economics.