Dynamic Spatio-Temporal Modeling in Disease Mapping

Abstract

Spatio-temporal models are valuable tools for disease mapping and understanding the geographical distribution of diseases and temporal dynamics. Spatio-temporal models have been proven empirically to be very complex and this complexity has led many to oversimply and model the spatial and temporal dependencies independently. Unlike common practice, this study formulated a new spatio-temporal model in a Bayesian hierarchical framework that accounts for spatial and temporal dependencies jointly. The spatial and temporal dependencies were dynamically modelled via the matern exponential covariance function. The temporal aspect was captured by the parameters of the exponential with a first-order autoregressive structure. Inferences about the parameters were obtained via Markov Chain Monte Carlo (MCMC) techniques and the spatio-temporal maps were obtained by mapping stable posterior means from the specific location and time from the best model that includes the significant risk factors. The model formulated was fitted to both simulation data and Kenya meningitis incidence data from 2013 to 2019 along with two covariates; Gross County Product (GCP) and average rainfall. The study found that both average rainfall and GCP had a significant positive association with meningitis occurrence. Also, regarding geographical distribution, the spatio-temporal maps showed that meningitis is not evenly distributed across the country as some counties reported a high number of cases compared with other counties.

Share and Cite:

Otieno, F. , Tamba, C. , Okenye, J. and Orawo, L. (2023) Dynamic Spatio-Temporal Modeling in Disease Mapping. Open Journal of Statistics, 13, 893-916. doi: 10.4236/ojs.2023.136045.

1. Introduction

Disease mapping has a long history and is becoming an indispensable tool for epidemiologists in analyzing disease incidence in a geographical location and over time. Often, disease counts are characterized by space and time structures. In most cases, spatial and temporal dependencies are modelled independently, implying that the spatial variation and temporal variation are modelled separately without considering their joint interactions [1] . Spatial and temporal dependencies are a phenomenon in which observations made at one location and time are impacted by the observations made at different locations in past and present times. The spatio-temporal models used in disease mapping have been proven to be very complex due to the inherent interactions between space and time. This complexity of the spatio-temporal models often leads to oversimplification of the model by separating space and time as independent dimensions for computational efficiency [2] . Even though separable spatio-temporal models on disease mapping are common, it has their drawbacks. One of the limitations is that it often leads to spatial and temporal autocorrelation problems, bias estimates, and in certain situations, it fails to allow for space-time interactions between locations [3] [4] . To fully understand the dynamic evolution of a disease in both space and time, it is essential to consider their interdependencies jointly to allow for space-time interactions between locations in the modeling process.

Over the years, studies on spatio-temporal disease mapping have incorporated the Besag York Mollie (BYM) model [5] [6] [7] . The BYM model is a type of generalized linear mixed model (GLMM) that has two spatial random effects; the spatially structured and spatially unstructured random effects and a temporal component. Details about BYM model are discussed in [8] . Most work in spatio-temporal disease mapping with BYM model takes into account the spatial and temporal dependencies independently. Most commonly, the Conditional Autoregressive (CAR) models are used to model the spatial dependency, and first-order autoregressive (AR1) structures are used to account for temporal dependency. CAR models are usually assigned as priors to one of the spatial random effects. However, it has been pointed out that CAR priors are improper, and they have the undesirable property of inducing spatial autocorrelation through adjacency structure, and informative hyperpriors are needed for posterior inference [9] [10] . Other approaches based on splines have also been developed to account for temporal dependency in disease mapping. For example, [11] , proposed the use of Basis spline (B-spline) model and [12] used Penalized spline (P-spline) model to account for temporal dependency. One limitation about splines is that they are heavily influenced by hyperprior specification especially when the data is limited [13] . This study has extended the BYM model by incorporating the spatial and temporal dependencies jointly.

In this paper, the main focus is to develop a Bayesian Spatio-temporal model that accounts for the spatial and temporal dependencies jointly. This will be facilitated by using the non-separable, stationary matern covariance function, specifically, the matern exponential covariance function for space-time process to model the spatial and temporal dependencies jointly. This will incorporate the space-time interaction between the geographical locations and also eliminate the spatial and temporal autocorrelation problems. The model will then be applied to the simulated data set and Kenya meningitis incidence data from 2013 to 2019. The study also seeks to identify significant risk factors associated with meningitis and the construction of spatio-temporal occurrence maps for different years both on the simulation data and real data will be done.

The rest of the paper is structured as follows: section 2 presents materials and methods utilized in the paper. In section 3, results are discussed both for simulation study and real data application, and finally, in section 4, conclusions to this study are presented.

2. Materials and Methods

2.1. Spatio-Temporal Model

We develop a spatio-temporal model based on the Poisson regression model. For each county we assume y i t ~ P o i s s o n ( η i t ) , where, y i t denotes meningitis cases (or counts) in the ith county at time t and η i t is the mean for Poisson distribution or the estimated disease count for meningitis cases. We then, formulate a spatio-temporal model as follows;

η i t = α + x i t β _ + w i t + ε i t (1)

where α is a common intercept for the entire study region, β _ is a vector of coefficients, x i t is a vector of covariates for all the 47 counties, w i t is the spatial-temporal dependencies, and ε i t is the white-noise error term for space-time interaction.

2.2. Prior Specifications

The common intercept α , was assigned a conjugate prior, which was assumed to have a normal distribution, N ( μ α , σ α 2 ) . The vector of coefficients β _ was assumed to have a Multivariate Normal distribution, M V N ( μ _ β , Σ β ) . The white-noise error term for space-time interaction ε i t was assumed to have a normal distribution, ε i t ~ N ( 0 , τ ε 2 ) . The precision parameter τ ε 2 was assigned a conjugate prior of an inverse gamma distribution τ ε 2 ~ I n v - G a m m a ( a ε , b ε ) . The spatial and temporal dependencies w i t , followed a zero-mean Multivariate Normal distribution with the matern covariance function, specifically, the matern exponential covariance function. That is; w i t ~ M V N ( 0 , Σ ( t ) ) , where Σ ( t ) is the matern exponential covariance function that is dependent on time t. This can be expressed as

Σ ( t ) = [ exp ( | S _ i S _ j | θ t ) , i , j = 1 , , n ; t = 1 , , T ]

where | S _ i S _ j | is the Euclidean distance between two locations defined by their longitude and latitude coordinates. θ t = ϕ θ θ t 1 + ξ t , ξ t ~ N ( 0 , σ ξ 2 ) i.e. θ t is A R ( 1 ) process. We assumed ϕ θ to follow a Uniform distribution

ϕ θ ~ U n i f o r m ( 1 , 1 ) and σ ξ 2 was assigned a priori with a conjugate prior of inverse gamma distribution σ ξ 2 ~ I n v - G a m m a ( a ξ , b ξ ) . The set of parameters of interest was λ = ( α , β _ , ϕ θ , τ ε 2 , σ ξ 2 ) and the vector of hyperparameters was ψ = ( μ α , σ α 2 , μ β , Σ β , a ε , b ε , a ξ , b ξ ) . The choice of hyperparameters was selected in a manner that the prior specification does not swamp the data and hence has minimal effect on the overall result.

2.3. Bayesian Hierarchical Model

The Bayesian hierarchical model consists of the data model, process model, and parameter model.

2.3.1. The Data Model

The data model is the probability distribution of the observed cases given the parameters of the model. In this study, the observed cases ( y i t ) followed a Poisson distribution with mean η i t i.e., y i t ~ P o i s s o n ( η i t ) , i = 1 , , n , t = 1 , , T which was written conditionally as y i t | η i t , λ ~ P o i s s o n ( η i t ) and the likelihood function was given as:

P ( y 1 _ , , y T _ | η 1 _ , , η T _ , λ ) = P ( y { 11 } , , y { n , T } | η { 11 } , , η { n , T } , λ ) = i = 1 n t = 1 T e η i t η i t y i t y i t ! (2)

2.3.2. The Process Model

The process model was generated based on the spatio-temporal model specified in Equation (1) and further expressed in vector form as:

η t _ = X t β _ + w t _ + ε t _ (3)

where X t in Equation (3) is a design matrix at time t (non-stochastic) with n × ( p + 1 ) dimension and β _ = [ α β 1 β p ] has a ( p + 1 ) × 1 dimension. We assumed ε t _ to be independent, thus; η t _ X t β _ w t _ = ε t _ ~ N ( 0 _ , τ ε 2 I ) was expressed conditionally as; η t _ | X t β _ , θ t , τ ε 2 ~ N ( 0 _ , τ ε 2 I ) . Written as:

f ( η t _ | X t , β _ , θ t , τ ε 2 ) = t = 1 T | τ ε 2 I | 1 2 exp { 1 2 τ ε 2 t = 1 T ( η t _ x t β _ w t _ ) I ( η t _ x t β _ w t _ ) } (4)

The spatio-temporal dependency, w t _ was also expressed conditionally as:

f ( w t _ | θ t ) = | Σ ( t ) | 1 2 exp { 1 2 w t _ ( Σ ( t ) ) 1 w t _ } . (5)

The matern exponential covariance function, Σ ( t ) was given as:

Σ ( t ) = { σ i j t } = [ exp ( | S _ i S _ j | θ t ) , i , j = 1 , , n ; t = 1 , , T ] ,

where, we assumed θ t in the matern exponential covariance function to follow an AR1 process written as; θ t = ϕ θ θ t 1 + ξ t and, ξ t ~ N ( 0 , σ ξ 2 ) . We generated the conditional distribution of θ t from t = 1 , , t as:

f ( θ 1 , , θ t ) = f ( θ 1 ) f ( θ 2 | θ 1 ) f ( θ 3 | θ 2 ) f ( θ t | θ t 1 ) = 1 2 π ( σ ξ 2 1 ϕ θ 2 ) exp { 1 ϕ θ 2 2 σ ξ 2 ( θ 1 ) 2 } t = 2 T 1 2 π σ ξ 2 exp { 1 2 σ ξ 2 ( θ t ϕ θ θ t 1 ) 2 } (6)

Now, the process model was given as the product of Equation (4), Equation (5), and Equation (6) denoted as P ( η 1 _ , , η T _ , λ ) , written as:

P ( η 1 _ , , η T _ , λ ) { t = 1 T | τ ε 2 I | 1 2 exp { 1 2 τ ε 2 t = 1 T ( η t _ x t β _ w t _ ) I ( η t _ x t β _ w t _ ) } } × { | Σ ( t ) | 1 2 exp { 1 2 w t _ ( Σ ( t ) ) 1 w t _ } } × { 1 2 π ( σ ξ 2 1 ϕ θ 2 ) exp { 1 ϕ θ 2 2 σ ξ 2 ( θ 1 ) 2 } t = 2 T 1 2 π σ ξ 2 exp { 1 2 σ ξ 2 ( θ t ϕ θ θ t 1 ) 2 } } (7)

2.3.3. The Parameter Model

The parameter model was given as the product of the prior distributions of the set parameters λ , which was denoted as P ( λ ) . In this case, α was contained in β _ and β _ was assumed to follow a Multivariate Normal distribution, M V N ( μ _ β , Σ β ) . Thus, P ( λ ) is given as:

P ( λ ) = { ( 2 π ) p + 1 2 | Σ β | 1 2 exp { 1 2 ( β _ μ _ β ) ( Σ β ) 1 ( β _ μ _ β ) } } × { ( 1 τ ε 2 ) a ε + 1 exp ( b ε τ ε 2 ) } × { ( 1 σ ξ 2 ) a ξ + 1 exp ( b ξ σ ξ 2 ) } (8)

The joint posterior distribution was given as the proportional of the product of the data model, the process model, and the parameters model denoted as P ( η 1 _ , , η T _ , λ | y 1 _ , , y T _ ) . Thus, by applying Bayes’ rule, the joint posterior distribution was expressed as:

P ( η 1 _ , , η T _ , λ | y 1 _ , , y T _ ) P ( y 1 _ , , y T _ | η 1 _ , , η T _ , λ ) P ( η 1 _ , , η T _ | λ ) P ( λ ) (9)

where P(.) in Equation (9) is the probability density function. The right-hand side of Equation (9) entails, P ( y 1 _ , , y T _ | η 1 _ , , η T _ , λ ) the data model, P ( η 1 _ , , η T _ | λ ) the process model, and P ( λ ) the parameter model. The study assumed the hyperparameters are independent of the parameter model i.e., P ( λ ) = i dim ( λ ) P ( λ i ) .

2.4. Parameter Estimation

In the Bayesian framework, the unknown parameters of the model are treated as random variables with assigned prior distributions. While the Bayesian method has proven to be effective for parameter estimation, it often presents challenges due to the intractability of the posterior distribution, making direct sampling impossible especially when the conditional distributions of the parameters are unknown. The remedy to this problem is to employ the Markov Chain Monte Carlo (MCMC) method called Metropolis-Hastings within Gibbs sampling algorithm to sample from the unknown distribution [14] [15] . The Metropolis-Hastings within Gibbs sampling algorithm is a sampling technique that integrates the Metropolis-Hastings algorithm within the iterative process of a Gibbs sampler [14] . The full conditional for β _ , ϕ θ , τ ε 2 , σ ξ 2 , θ t , and η t were determined as follows:

P ( β _ | ϕ θ , τ ε 2 , σ ξ 2 , η t , θ t , y t _ ) P ( η 1 _ , , η T _ , λ | y 1 _ , , y T _ ) P ( β _ | ϕ θ , τ ε 2 , σ ξ 2 , η t , θ t , y t _ )

P ( ϕ θ | β _ , τ ε 2 , σ ξ 2 , η t , θ t , y t _ ) P ( η 1 _ , , η T _ , λ | y 1 _ , , y T _ ) P ( ϕ θ | β _ , τ ε 2 , σ ξ 2 , η t , θ t , y t _ )

P ( τ ε 2 | β _ , ϕ θ , σ ξ 2 , η t , θ t , y t _ ) P ( η 1 _ , , η T _ , λ | y 1 _ , , y T _ ) P ( τ ε 2 | β _ , ϕ θ , σ ξ 2 , η t , θ t , y t _ )

P ( σ ξ 2 | β _ , τ ε 2 , ϕ θ , η t , θ t , y t _ ) P ( η 1 _ , , η T _ , λ | y 1 _ , , y T _ ) P ( σ ξ 2 | β _ , τ ε 2 , ϕ θ , η t , θ t , y t _ )

P ( η t _ | β _ , σ ξ 2 , τ ε 2 , ϕ θ , θ t , y t _ ) P ( η 1 _ , , η T _ , λ | y 1 _ , , y T _ ) P ( η t _ | β _ , σ ξ 2 , τ ε 2 , ϕ θ , θ t , y t _ )

The full conditional distribution for θ t was determined in three phases using the formula below. The first phase was when time t = 1 , the second phase was when time t = 2 to time t = T 1 , and the last phase was when time t = T .

P ( θ t = 1 | β _ , σ ξ 2 , τ ε 2 , ϕ θ , η t _ , y t _ ) P ( η 1 _ , , η T _ , λ | y 1 _ , , y T _ ) P ( θ t = 1 | β _ , σ ξ 2 , τ ε 2 , ϕ θ , η t _ , y t _ )

P ( θ t = 2 to T 1 | β _ , σ ξ 2 , τ ε 2 , ϕ θ , η t _ , y t _ ) P ( η 1 _ , , η T _ , λ | y 1 _ , , y T _ ) P ( θ t = 2 to T 1 | β _ , σ ξ 2 , τ ε 2 , ϕ θ , η t _ , y t _ )

P ( θ t = T | β _ , σ ξ 2 , τ ε 2 , ϕ θ , η t _ , y t _ ) P ( η 1 _ , , η T _ , λ | y 1 _ , , y T _ ) P ( θ t = T | β _ , σ ξ 2 , τ ε 2 , ϕ θ , η t _ , y t _ )

The full conditional distribution for τ ε 2 and σ ξ 2 were found to be known and that of β _ , ϕ θ , η t _ , and θ t were unrecognizable, thus difficult to sample from them directly leading to the utilization of Metropolis-Hastings within Gibbs algorithm to sample from the conditional posterior distributions of the parameters in the spatio-temporal model formulated. The samples obtained were used to estimate the model parameters under the Squared Error Loss Function (SELF) and Mean Absolute Error Loss Function (MAELF). For each parameter, both the posterior mean and median as well as the Bayesian credible intervals were calculated. The convergence of the samples generated from the posterior marginals of the parameters was assessed using the trace plot as shown in the appendix.

2.5. Identification of Risk Factors and Model Selection

A significant interest in understanding the evolution of meningitis lies in identifying associated risk factors that influence its occurrence. This study utilized the stepwise regression approach, in particular, the backward selection method in building regression models. Backward selection is a criterion for building regression models starting with a full model (model with all covariates) to a null model (model with the intercept only). The study builds regression models permutationally based on the number of covariates.

The study later computed a scale deviance for all the reduced models to test the significance and contribution of each covariate in the model. Scale deviance denoted by D * was defined as:

D * = 2 ( log L R e d log L F u l l )

where L R e d is the likelihood of the reduced model and L F u l l is the likelihood of the full model.

The log-likelihood of each reduced model in the study was calculated based on the number of regression coefficients present in each model. The scale deviance, D * followed a chi-square distribution with k-p degrees of freedom, D * ~ χ k p 2 , where k is the total number of regression coefficients in the full model and p is the total number of regression coefficients in the reduced model. The scaled deviance, D*, was later used in performing the hypotheses tests on the sets of regression coefficients in the model. Statistically significant predictors were considered useful and cannot be dropped from the model, and thus, they were identified as risk factors that influence the occurrence of meningitis based on this study.

The model performance was examined via Deviance Information Criterion (DIC) which is due to [16] . The DIC was computed for each model and the model with the smallest DIC was assumed the best.

2.6. Disease Mapping

The spatio-temporal maps were obtained by mapping stable posterior means for the specific location and time from the best model that includes the significant risk factors. The posterior means were evaluated from the posterior samples obtained in the MCMC sampling. This enabled the examination of the spatio-temporal variability (geographic patterns and temporal dependence) and also helped in evaluating risk areas. All the computation was performed in R programming language.

3. Results

This section presents the results obtained for both the simulation study and real data application.

3.1. Simulation Study

The spatio-temporal model formulated was fitted to the simulated data to test the methodological development of the study. The simulation is based on yearly disease counts along with three covariates based on 47 counties in Kenya for 10 years. The years were assumed to start from 2011 to 2020. A real case study will be examined in section 3.2. The dataset used for simulation was generated using the following algorithm.

1) Setting the value of the true parameters ( α , β _ , ϕ θ , τ ε 2 , σ ξ 2 ) = ( 0.5 , 0.5 , 0.5 , 0.01 , 0.01 ) respectively (these parameter values were chosen arbitrarily).

2) Builds a data frame consisting of the C = 47 counties based on their respective longitude and latitude of the county headquarters (county center).

3) Create a Euclidean distance D.

4) Specify the time frame.

5) Create a function consisting of the 47 counties, time, and the true parameters of the Spatio-temporal model, set N to be the dimension of the counties.

6) For time t = 1 to T, do the following steps.

Step 6.1: Simulate the matern parameter θ t = ϕ θ θ t 1 + ξ t , ξ t ~ N ( 0 , σ ξ 2 )

Step 6.2: Create matern exponential covariance matrix, Σ ( t ) , using a Euclidean distance

Step 6.3: Simulate w t = M V N ( 0 , Σ ( t ) )

Step 6.4: For iteration s = 1 to C (county), do the following sub-steps

Step 6.4.a: Simulate the covariate vector

x 1 , s , t = r u n i f o r m ( a 1 = 1 , b 1 = 2 ) , x 2 , s , t = r u n i f o r m ( a 2 = 1 , b 2 = 5 ) and x 3 , s , t = r u n i f o r m ( a 3 = 0 , b 3 = 2 ) , and equate it to x s , t = ( x 1 , s , t , x 2 , s , t , x 3 , s , t )

Step 6.4.b: Simulate the white noise error term for space-time interaction (random error term) ε s , t ~ N ( 0 , τ ε 2 )

Step 6.4.c: Compute η s , t = α + x s , t β _ + w s , t + ε s , t

Step 6.4.d: Simulate Y s , t = P o i s s o n ( η s , t )

Due to the complexity of the model, the study simulated only one set of data which was used to sample the parameters of interest from the joint posterior distribution via the Metropolis-Hastings within Gibbs sampling technique. For each parameter, trace plots were constructed to assess the convergence of the samples generated from the posterior marginals of the parameters, see Appendix A. Based on the plots, we can observe that the trace plot of β _ , ϕ θ , τ ε 2 , σ ξ 2 , θ t and η t were first in converging while that for α (intercept) was slow in converging but it eventually converged. The poor mixing of chain for α could as a result of high variation of the parameter α . The true value, estimated posterior mean value, and 95% credible intervals for each parameter were also included in their respective trace plot. We can observe that the true values for β _ , ϕ θ , σ ξ 2 , θ t and η t all fall within their corresponding 95% credible set and they were also close to their estimated mean value apart from τ ε 2 which was outside its 95% credible set. We believe the reason why the true value for τ ε 2 happens to be outside its credible set could have been attributed to the characteristics of the sample set generated for this study, which, in turn, led the credible set to be narrower. Point estimate and Bayesian credible set for each parameter were also computed and the results are shown from Tables 1-4. Table 1 provides the summaries of the parameters α , ϕ θ , τ ε 2 , and σ ξ 2 respectively from the posterior sample of size 20,000 (i.e. the number of MCMC iterations). It can be seen that the true values of α , ϕ θ , and σ ξ 2 are all within their corresponding 95% credible interval set apart from τ ε 2 which is outside the credible set which could be a result of the sample generated in this study. The table further provides the posterior mean and the posterior median of the parameters stated via SELF and MAELF respectively. Table 2 provides the summaries of the regression coefficients β 1 , β 2 and β 3 from the posterior sample of size 20,000. It can be observed that the true values are all within the 95% credible intervals. It also provides the estimates using SELF which is the posterior mean and the MAELF which is the posterior median.

Table 3 shows the summaries of the matern parameter θ t that governs the interaction of the locations at different times. θ t depends on time t and it was simulated to take values from time, t = 1 to t = 10 . It can be observed from Table 3 that the true value for θ 3 to θ 10 are within their corresponding 95%

Table 1. Posterior mean, median, and 95% credible intervals for α , ϕ θ , τ ε 2 , and σ ξ 2 .

Table 2. Posterior mean, median, and 95% credible intervals for β _ .

credible set apart from θ 1 and θ 2 which falls out of the 95% credible set. Note that we were evolving the matern parameter θ t via the AR1 process, which means that the observations at the current time point are influenced by the recent observation. In this note, θ 1 lack prior information, which also affects θ 2 , thus causing the credible set to be wider. Despite the true value of θ 1 and θ 2 falling outside the credible set due to the aforementioned reasons, there is a positive indication that the matern exponential covariance function has captured the space-time interaction effect between locations and time.

Table 4 provides the summaries of posterior mean, true value, median, and 95% credible intervals for parameter η i , t (the mean for Poisson distribution) which depends on both space (county) and time, t for a few locations and times

Table 3. Posterior mean, median, and 95% credible intervals for θ t at different times.

Table 4. Posterior mean, median, and 95% credible intervals for η t _ at different counties and at different times (t).

that were randomly chosen. It can be observed that all the credible intervals for η t contained the true value.

For model selection and identification of significant risk factors for the simulation study, the study built 8 spatio-temporal models (one full model and seven reduced models) based on the three simulated covariates, x 1 i t , x 2 i t , and x 3 i t using the stepwise regression approach explained in section 2.5. The 8 models are shown below and the results for each of the model is presented in Table 5. The importance of using the stepwise regression was to build regression models by removing covariates step by step until the pre-set significance level is met for all the covariates. This was done to assess the significance and contribution of each covariate in the model.

Model 1; y ^ = α ^ + β ^ 1 x 1 i t + β ^ 2 x 2 i t + β ^ 3 x 3 i t

Model 2; y ^ = α ^ + β ^ 1 x 1 i t + β ^ 2 x 2 i t

Model 3; y ^ = α ^ + β ^ 1 x 1 i t + β ^ 3 x 3 i t

Model 4; y ^ = α ^ + β ^ 2 x 2 i t + β ^ 3 x 3 i t

Model 5; y ^ = α ^ + β ^ 1 x 1 i t

Model 6; y ^ = α ^ + β ^ 2 x 2 i t

Model 7; y ^ = α ^ + β ^ 3 x 3 i t

Model 8; y ^ = α ^

Model selection is the process of selecting the best model from a set of candidate models. In this work, the approach utilized in model selection was based primarily on DIC. The DIC was computed for all eight models as −2loglikelihood + 2pD, where pD is the effective number of the parameters present in that particular model. The DIC computed was later compared to all the eight models and the model with the smallest DIC was considered the best. It is clear from Table 5 that the full model (model 1) has the lowest DIC value (213.2963) suggesting that model 1 is the best in the simulation study. The subsequent analysis which was the construction of spatio-temporal maps was therefore generated based on the best model which in this case was the full model. To compare disease cases, two spatio-temporal maps were created. One displays the simulated disease

Table 5. Model comparison for the 8 Spatio-temporal models.

count (see Figure 1) and the other one presents the estimated disease count from the best model that includes the significant risk factors (see Figure 2).

Figure 1 shows the spatio-temporal maps for simulated disease counts from 2011 to 2020. The regions coloured red signify the high number of disease cases while the regions coloured purple indicate the low number of cases. The maps reveal an increasing trend in disease incidence over the 10 years. Notably, counties in the Rift Valley, Coastal, Nairobi, North Eastern, and Western regions of Kenya exhibit a significantly high risk of disease. The persistence of high incidence in these regions suggests the presence of underlying factors contributing to their elevated risk throughout the study period. Figure 2 displays the spatio-temporal

Figure 1. Spatio-temporal maps for simulated disease counts from 2011-2020.

Figure 2. Spatio-temporal maps for estimated disease counts from the best model using simulated data.

evolution of the estimated disease counts in Kenya from 2011-2020 using simulated data. The maps were obtained by mapping posterior means from the specific location and time from the best model as stated in section 2.6. There is a rising trend for estimated disease counts from the model over the years (2011-2020). Considering that the spread of the disease was linked to three simulated covariates that showed significant influence, we believe that these factors might have played a role in the observed increase in risk over time. Upon visual inspection, there is a close resemblance between the map for observed simulated counts and that of estimated counts.

3.2. Application to Kenya Meningitis Incidence Data

The model formulated in Equation (1) was also fitted to a real dataset, in particular, the Kenya meningitis incidence data from 2013 to 2019. We chose meningitis data as a practical application to our model. The dataset was obtained from the Kenya Health Services for all the 47 counties. Due to data availability constraints, we incorporated only two covariates in the model, Gross County Product (GCP) and averaged rainfall for all 47 counties in Kenya. The GCP and averaged rainfall data were extracted from the Kenya National Bureau of Statistics (KNBS) and weather and climate websites respectively. The two covariates were in different units and therefore standardization technique, in particular, the Z-transformation technique was applied to normalize them, thus making them directly comparable. Posterior estimates for each parameter were obtained using the MCMC approach and the results are shown in Tables 6-9. Table 6 shows the posterior summary statistics (mean, median, and 95% credible interval) of α , ϕ θ , τ ε 2 , and σ ξ 2 respectively. From the table, it can be observed that the posterior mean of the common intercept α is 25.9428 and it lies within a 95% credible interval (23.5370, 28.4434). This shows that the average number of meningitis occurrences in the region is about 25.9428. Moreover, the posterior mean for ϕ θ , τ ε 2 , and σ ξ 2 are all within their corresponding 95% confidence interval.

Table 7 shows the posterior summary statistics of the covariate coefficient β _ . It can be observed that there is a significant and positive relationship between average rainfall and meningitis occurrence. This is because the estimated posterior mean of average rainfall is positive (4.2376) and is within the 95% credible set. Similarly, a notable significant and positive relationship exists between GCP and meningitis occurrence as evidenced by the positive posterior mean (31.4808) falling within a 95% credible interval. This therefore indicates that average rainfall and GCP are factors associated with meningitis occurrence in Kenya. These significant risk factors will therefore help policymakers in the development (in particular where and when) and implementation of the intervention to be undertaken concerning the management of meningitis in Kenya. Average rainfall was used to account for climatic factors and since Kenya has two distinct seasons (dry and wet), it served as a proxy to determine if a region

Table 6. Posterior mean, median, and 95% credible intervals for α , ϕ θ , τ ε 2 , and σ ξ 2 .

Table 7. Posterior mean, median, and 95% credible intervals for covariate coefficient β _ .

Table 8. Posterior mean, median, and 95% credible intervals for θ t at different time.

Table 9. Posterior mean, median, and 95% credible intervals for η t _ at different counties and at different times (t).

experienced a dry or wet season at a particular time. Meningitis incidence is high during the dry season and less during the rainy season [17] . During the rainy season, there are fewer irritating conditions that prevent the outbreak from developing further, and during the dry season, there is a presence of bacteria which facilitates the spread of meningitis. GCP accounted for socioeconomic factors and was used as a measure of the county's economic level. According to [18] , regions with high poverty levels, high economic burdens, and low per capita income have the highest burden of meningitis. This is due to the adverse impact on quality of life caused by poverty and economic strain, including poor living conditions and difficulties accessing healthcare.

Table 8 shows the posterior summary statistics (mean, median, and 95% credible interval) of the matern parameter θ t . From the table, it is clear that all the posterior mean estimates of θ t that is from time 1 to 7 lies within their corresponding 95% credible interval, suggesting that the matern exponential covariance function has captured the interaction effect of locations at different time points. Table 9 displays the posterior summary statistics (mean, median, and 95% credible interval) of some of the estimates of η t which were randomly chosen. It can be noted that all the credible intervals for η t contained the estimated mean value.

Factors such as climatic, socioeconomic, and demographic have been found to have a quantitative impact on meningitis occurrence [19] . This study examined two risk factors; climatic factors and socioeconomic factors as factors associated with meningitis occurrence in Kenya. Climatic factors were accounted for with average rainfall and socioeconomic factors were accounted for with GCP. Four spatio-temporal models (one full model and three reduced models) were formulated using the method highlighted in section 2.5. The four models are given below and the results obtained are given in Table 10 below and explained in the cases thereafter in Appendix B.

Model 1: y ^ = α ^ + β ^ 1 a v g r a i n + β ^ 2 g c p

Model 2: y ^ = α ^ + β ^ 1 a v g r a i n

Model 3: y ^ = α ^ + β ^ 2 g c p

Model 4: y ^ = α ^

From cases 1 to 3, see Appendix B, it is evident that average rainfall and GCP are jointly significant predictors in the model. This therefore suggests that climatic factors and socioeconomic factors are risk factors associated with meningitis

Table 10. Model comparison for the 4 Spatio-temporal models.

occurrence in Kenya. Table 10 also shows DIC values for the four models stated above. It can be observed from Table 10 that model 1 (full model) has the lowest DIC value (724.2428) compared to other spatio-temporal models fitted to meningitis count data. Similar to the simulation study, the subsequent analysis, which involves creating spatio-temporal maps was performed using the best model, which includes the significant risk factors. We created spatio-temporal maps for the observed meningitis counts (see Figure 3) and the estimated counts from the best model that includes the significant risk factors (see Figure 4).

Figure 3. Spatio-temporal maps of observed meningitis count across the 47 counties in Kenya from 2013-2019.

Figure 4. Posterior estimated Spatio-temporal maps of meningitis count across the 47 counties in Kenya from the best spatio-temporal model.

Figure 3 shows the spatio-temporal maps for observed meningitis cases across the 47 counties in Kenya for seven years (2013-2019). The regions colored dark red indicate the high number of meningitis cases while light-colored regions show the low number of cases. From the figure, the maps keep changing significantly over the seven years, highlighting areas where more attention should be given by policymakers It is clear from Figure 3 that the lowest cases of meningitis were reported in the years 2013, 2015, and 2016 while the highest cases were recorded in 2014, 2017, 2018, and 2019. Counties reporting high incidences are counties in Rift Valley, Nairobi, Coastal, and Western side of Kenya which appeared as hot spot areas. The spread of meningitis was found to be associated with GCP and average rainfall, suggesting that these factors may have contributed to the high risk in these counties. Therefore, further attention should be given to these counties especially when planning prevention and intervention and also in managing the disease.

Figure 4 displays the Spatio-temporal maps of estimated meningitis counts from the best spatio-temporal model for the year 2013 to 2019. The model has shown some apparent hot spots in the years 2015, 2016, 2017, 2018, and 2019. In 2015, 2016, and 2017 the hot spot areas are counties in the Rift Valley, Nairobi, and western side of Kenya. In 2018 and 2019 the hot spot areas increased with some appearing along the Eastern side and the North Eastern region of Kenya. Upon visual inspection, the spatial and temporal patterns of the estimated meningitis counts show a close relationship with the actual count data. For instance, continuities can be seen with the clustering of high meningitis cases in the Western side, Rift Valley, and Nairobi region when compared with the observed counts map. However, we note that in the actual count data, Tana River County had a high incidence of meningitis cases in the last three years, yet the model did not yield similarly elevated estimates for this county. There could be several reasons attributed to this. One possibility is that there could be an unforeseen surge in cases that the model cannot capture. Alternatively, there might have been an improvement in surveillance measures within Tana River County, resulting in reported meningitis cases that the model cannot fully explain. It's also possible that we are missing an important covariate, such as population, that could provide a more accurate understanding of the situation. Consequently, a further investigation should be able to be carried out to explain this phenomenon in Tana River County. In this regard, it is worth mentioning that meningitis is not evenly distributed across the country as some counties record a high number of cases compared to others.

4. Conclusion

The study highlights the importance of incorporating and modelling spatial and temporal dependencies jointly in the spatio-temporal disease mapping. The study contributes to the field of disease mapping by developing a spatio-temporal model in the Bayesian hierarchical framework that accounts for the spatial and temporal dependencies jointly across both space and time. The spatial and temporal dependencies were dynamically modelled together via the matern exponential covariance function where the parameter of the matern exponential θ t was treated as a function of time with first-order autoregressive structure so that it captures the space-time interactions between geographical locations and over time. Parameter estimation was performed using the MCMC approach and the posterior estimates were computed from a posterior sample of size 20,000 which was used to explain the model results. Spatio-temporal maps were obtained by mapping stable posterior means for the specific location and time from the best model that includes the significant risk factors. Both simulation data and Kenya meningitis incidence data from 2013 to 2019 were employed in the model alongside two covariates GCP and average rainfall. The study found that both average rainfall and GCP are positively associated with meningitis occurrence in Kenya. Concerning the geographical distribution of meningitis cases in Kenya, the spatio-temporal maps showed that meningitis is not evenly distributed across the country as some counties reported a high number of cases compared to other counties. Counties situated in the Rift Valley, Western side, Coastal, and Nairobi regions exhibit a notable high risk of meningitis, thus requiring prioritized attention in intervention planning. We believe the model developed will aid the policymakers and public health authorities in understanding the joint dynamic of the evolution of disease across different locations and over time. In this paper, we did not do a comparative study with existing models in the literature. We propose a further study on the comparison with other models be undertaken. A more Bayesian approach to selecting relevant significant predictors such as Bayesian shrinkage approach could be explored in this model over stepwise regression. Additionally, alternative measures of model selection such as BIC or Log-marginal likelihood could be considered for this model.

Appendix A. Trace Plots for Parameters in the Simulation Study

The red horizontal line displayed in the plots is the true value of the parameter. The green horizontal line is the estimated posterior mean value while the black horizontal lines are the Bayesian credible interval values

Figure A1. Trace plots for α , ϕ θ , τ ε 2 , and σ ξ 2 .

Figure A2. Trace plots for the components of β.

Figure A3. Trace plots for matern parameter θ t .

Figure A4. Trace plots for a few locations and time ( η i , t ).

Appendix B. Hypotheses Tests on the Sets of Regression Coefficients in the Real Case Study

Case 1: Model 1: y = α + β 1 a v g r a i n + β 2 g c p

In case 1, we tested the significance of average rainfall and GCP jointly in the model. The hypothesis was given as:

H 0 : β 1 = β 2 vs. H A : at least one of β i ( i = 1 , 2 ) is not equal to zero.

It can be observed from Table 10 that the model is useful overall and average rainfall and GCP are jointly significant predictors in the model ( D * = 229.6140 , p-value = 0.0000).

Case 2: Model 2: y = α + β 1 a v g r a i n

In case 2 the significance of the regression coefficient β 2 was tested using the following hypothesis

H 0 : β 2 = 0 vs. H A : β 2 0 .

It can be perceived on Table 10 that D * = 40.6339 and when compared to a χ ( 1 ) 2 the p-value was.0000 which leads to rejection of H 0 . This therefore suggest that GCP is a significant predictor of meningitis occurrence given average rainfall is in the model and hence it cannot be removed from the model.

Case 3: Model 3: y = α + β 2 g c p

In this case, we examine the significance of the regression coefficient β 1 in the model. The hypothesis was given as:

H 0 : β 1 = 0 vs. H A : β 1 0 .

From Table 10, D * = 72.2695 with a p-value of 0.0000. Consequently, there is enough evidence that average rainfall is a significant predictor of meningitis incidence given GCP is in the model. That means it cannot be dropped from the model.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] Nobre, A.A., Schmidt, A.M. and Lopes, H.F. (2005) Spatio-Temporal Models for Rapping the Incidence of Malaria in Para. Environmetrics, 16, 291-304.
https://doi.org/10.1002/env.704
[2] Adegboye, O., Leung, D. and Wang, Y. (2017) Modelling of Spatio-Temporal Zero Truncated Patterns in Infectious Disease Surveillance Data. Preprints, 2017, Article 2017030065.
https://doi.org/10.20944/preprints201703.0065.v1
[3] Abd Naeeim, N.S., Abdul Rahman, N. and Muhammad Fahimi, F.A. (2019) A Spatial-Temporal Study of Dengue in Peninsular Malaysia for the Year 2017 in Two Different Space-Time Model. Journal of Applied Statistics, 47, 739-756.
https://doi.org/10.1080/02664763.2019.1648391
[4] Bernardinelli, L., Clayton, D., Pascutto, C., Montomoli, C., Ghislandi, M. and Songini, M. (1995) Bayesian Analysis of Space-Time Variation in Disease Risk. Statistics in Medicine, 14, 2433-2443.
https://doi.org/10.1002/sim.4780142112
[5] D’Angelo, N., Abbruzzo, A. and Adelfio, G. (2021) Spatio-Temporal Spread Pattern of COVID-19 in Italy. Mathematics, 9, Article 2454.
https://doi.org/10.3390/math9192454
[6] Samat, N.A. and Mey, L.W. (2017) Malaria Disease Mapping in Malaysia Based on Besag-York-Mollie (BYM) Model. Journal of Physics: Conference Series, 890, 012167.
https://doi.org/10.1088/1742-6596/890/1/012167
[7] Semakula, M., Niragire, F., & Faes, C. (2020) Bayesian Spatio-Temporal Modeling of Malaria Risk in Rwanda. PLOS ONE, 15, e0238504.
https://doi.org/10.1371/journal.pone.0238504
[8] Besag, J., York, J. and Molli, A. (1991) Bayesian Image Restoration, with Two Applications in Spatial Statistics. Annals of the Institute of Statistical Mathematics, 43, 1-20.
https://doi.org/10.1007/bf00116466
[9] Botella-Rocamora, P., López-Quílez, A. and Martinez-Beneito, M.A. (2012) Spatial Moving Average Risk Smoothing. Statistics in Medicine, 32, 2595-2612.
https://doi.org/10.1002/sim.5704
[10] MacNab, Y.C. (2012) On identification in Bayesian Disease Mapping and Ecological-Spatial Regression Models. Statistical Methods in Medical Research, 23, 134-155.
https://doi.org/10.1177/0962280212447152
[11] MacNab, Y.C. and Dean, C.B. (2001) Autoregressive Spatial Smoothing and Temporal Spline Smoothing for Mapping Rates. Biometrics, 57, 949-956.
https://doi.org/10.1111/j.0006-341x.2001.00949.x
[12] Ugarte, M.D., Goicoa, T. and Militino, A.F. (2009) Spatio-Temporal Modeling of Mortality Risks Using Penalized Splines. Environmetrics, 21, 270-289.
https://doi.org/10.1002/env.1011
[13] MacNab, Y.C. and Gustafson, P. (2007) Regression B-Spline Smoothing in Bayesian Disease Mapping: With an Application to Patient Safety Surveillance. Statistics in Medicine, 26, 4455-4474.
https://doi.org/10.1002/sim.2868
[14] Ghirmai, T. (2015) Applying Metropolis-Hastings-within-Gibbs Algorithms for Data Detection in Relay-Based Communication Systems. 2015 IEEE Signal Processing and Signal Processing Education Workshop (SP/SPE), Salt Lake City, UT, 9-12 August 2015.
https://doi.org/10.1109/dsp-spe.2015.7369547
[15] Van de Schoot, R., Depaoli, S., King, R., Kramer, B., Märtens, K., Tadesse, M.G., Vannucci, M., Gelman, A., Veen, D., Willemsen, J. and Yau, C. (2021) Bayesian Statistics and Modelling. Nature Reviews Methods Primers, 1, Article No. 1.
https://doi.org/10.1038/s43586-020-00001-2
[16] Spiegelhalter, D.J., Best, N.G., Carlin, B.P. and Van Der Linde, A. (2002) Bayesian Measures of Model Complexity and Fit. Journal of the Royal Statistical Society Series B: Statistical Methodology, 64, 583-639.
https://doi.org/10.1111/1467-9868.00353
[17] Cooper, L.V., Kristiansen, P.A., Christensen, H., Karachaliou, A. and Trotter, C.L. (2019) Meningococcal Carriage by Age in the African Meningitis Belt: A Systematic Review and Meta-Analysis. Epidemiology and Infection, 147, e228.
https://doi.org/10.1017/s0950268819001134
[18] Abdussalam, A.F. and Qaffas, Y. (2016) Spatiotemporal Patterns and Social Risk Factors of Meningitis in Nigeria. Open Access Library Journal, 3, 1-13.
https://doi.org/10.4236/oalib.1102909
[19] Mazamay, S., Broutin, H., Bompangue, D., Muyembe, J. and Guégan, J. (2020) The Environmental Drivers of Bacterial Meningitis Epidemics in the Democratic Republic of Congo, Central Africa. PLOS Neglected Tropical Diseases, 14, e0008634.
https://doi.org/10.1371/journal.pntd.0008634

Copyright © 2024 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.