Spatial Modeling of COVID-19 Occurrence and Vaccination Rate across Counties in Ohio State from Jan. 2020 to April 2023
1. Introduction
Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), the virus that causes Coronavirus Disease 2019 (COVID-19), was first reported in late December 2019 in Wuhan, China [1] [2]. The novel disease attracted the attention of medical geographers globally because of its high basic reproduction number (R0) and because diagnoses showed it to be a pneumonia-related disease that killed quickly during the first waves of transmission [1]-[3]. In the United States, the disease’s first occurrence and community spread were recorded in February 2020 [3] [4]. Subsequent studies in the United States identified distinct waves of transmission across seasons, which demanded urgent attention to curtail the spread of the disease [5] [6]. Ohio is one of the states within the contiguous United States that has experienced recurring COVID-19 outbreaks, resulting in thousands of deaths [7]. On March 9, 2020, Ohio announced its first cluster of cases [8]. Towards the end of 2021, the state had recorded close to one million cases and close to 19,000 deaths [8].
According to Sorrels, Arduser [9], the COVID-19 statistics across the 88 counties in Ohio tell the story of a public health disaster that revealed social and health disparities across the state. As of July 2023, over 3 million cases and over 42,000 deaths had been attributed to the disease [10]. Reported case counts have fallen recently because home testing kits became available to Ohioans [9]. Since the outbreak of COVID-19 in early 2020 in the United States, several attempts have been made to reduce the spatial transmission of the disease. For example, Non-Pharmaceutical Interventions (NPIs) such as social distancing, “shelter-in-place” orders, bans on social gatherings in public places, and school closures have proven effective in reducing transmission [11] [12].
During the first wave of COVID-19 transmission in the United States, between January 2020 and May 2020, there were no pharmaceutical interventions such as vaccines, and thus no effective and efficient way of caring for patients in critical condition. Notable pharmaceutical companies such as Pfizer, Johnson & Johnson, and Moderna received swift permission from the US Food and Drug Administration (FDA) to begin COVID-19 vaccine development. The first doses were delivered on December 14, 2020 [12]. According to the Centers for Disease Control and Prevention (CDC), six months later the total number of administered doses was over 350 million, about 78% of the doses that had been delivered (CDC, 2020). Indeed, several empirical studies have shown that understanding COVID-19 and vaccination coverage patterns, and determining the factors that explain the spread of the disease and vaccination coverage in the United States, relies on combining different social and epidemiological datasets [1] [2]. Access to reliable long-term social and epidemiological datasets would support advanced spatial analysis using modern technologies.
Most theories used in medical and geographical studies of infectious diseases hold that the occurrence and transmission of diseases depend heavily on the natural foci of the disease and on the socio-cultural and behavioral characteristics of the people among whom the disease occurs [13]-[16]. Thus, disease incidence varies with the dynamics of the environmental risk cells that operate in an area at any point in time and across space. Likewise, the vaccination rate in an area is better explained in terms of socio-cultural and socio-economic factors and individual perception of a particular disease, which in turn generate individual “actions” towards prevention of or exposure to the disease. These “actions” are often spatial and can be modeled spatially [12]. This study is underpinned by Epidemiological Transition Theory, the Ecological Model of Disease Causation, and Spatial Diffusion Theory. According to McGlashan and Harington [17], the occurrence and mortality rate of diseases depend solely on the socio-economic and health conditions of people. Hence, COVID-19 occurrence and vaccination rates across the eighty-eight (88) counties in Ohio can be modeled with advanced spatial statistical techniques. Since the emergence of the different waves of COVID-19 and the population response to the disease in terms of vaccination, many scientists have studied the geographic variation of COVID-19 incidence [1]-[3] and the accessibility of vaccination centers to individuals [18].
Several studies, both within and outside the United States, have found that COVID-19 incidence and vaccination coverage vary spatially at global and local scales, with socioeconomic and underlying health conditions acting as the main risk factors [1]-[3]. Savla, Roberto [19], one of the studies that evaluated vaccination coverage between rural and urban centers in the United States, affirmed that COVID-19 incidence and mortality are higher in rural counties than in urban counties, and that vaccination coverage also differs visibly between rural and urban counties. Several published empirical studies have noted the paucity of literature on the explanatory factors that underlie the observed variation in vaccination coverage. Akinwumiju, Oluwafemi [2] found that socio-economic factors and underlying health conditions, for example obesity and diabetes, significantly explain COVID-19 variations, mostly in the South, Midwest, and Northeastern parts of the United States. In this study, we adopted a geospatial evaluation approach to explain the patterns of COVID-19 cases and response to vaccination in Ohio. The study further investigates and identifies the parameters that define the observed pattern, with the aim of supporting early warning surveillance during public health emergencies.
2. Methodology
2.1. Study Area
According to Hossain and Smirnov [20], Ohio is the most densely populated state in the Midwest of the United States and is home to over 11.8 million people across 44,825 square miles. Ohio was also identified as the tenth most populous state in the United States, with eighty-eight (88) counties, 2952 Census tracts, and 9238 block groups across both rural and urban communities (see Figure 1). The twelve Metropolitan Statistical Areas (MSAs) accommodate 85% of the entire population [20]. Ohio is one of the Midwestern states where heavy recurrence of COVID-19 has been recorded. The first COVID-19 occurrence was on March 9, 2020, and incidence has since increased over the first, second, and third waves of the disease [8]. The progress of mass vaccination campaigns, on the other hand, is attributable to the willingness of individuals to get vaccinated and their acceptance of vaccination programs.
Recent studies of COVID-19 vaccination coverage in the United States have found that between 56% and 74% of people are willing to receive the COVID-19 vaccine [21]-[23]. Recent literature has also identified spatial variations in vaccination acceptance across both rural and urban communities in the United States [21] [23] [24]. The study area includes all 88 counties in Ohio, where COVID-19 incidence and vaccination coverage data are available from January 2020 to April 2023. The data also include socio-economic variables such as income and race, among others.
2.2. Data Collection and Preprocessing
The Ohio Department of Health continues to monitor county-level data on COVID-19 occurrence across the state. Secondary data were used for this study. The dataset comprises county-level counts of COVID-19 cases across the 88 counties in Ohio from January 2020 to April 2023, obtained from USA Facts (usafacts.org). The vaccination records were acquired from the Ohio vaccination tracker dashboard. Socio-economic variables such as poverty, Black race, and population density were acquired from the American Community Survey of the US Census Bureau. The epidemiological and socio-economic data were subjected to data cleaning and consistency tests. The datasets were built into a file geodatabase in the ArcGIS 10.8 environment, and the socio-economic variables were populated as themes in the same environment.
2.3. Model Descriptions
We employed four global and local regression models (OLS, SLM, SEM, and GWR). The first equation examines the relationship between COVID-19 occurrence (the dependent variable) and four explanatory variables with low Variance Inflation Factors (VIF): poverty, population density, Black race, and vaccination count. These explanatory variables were selected based on the epidemiological and spatial theoretical models that we adopted. The second equation examines the relationship between vaccination count and the explanatory variables poverty, population density, Black race, and COVID-19 case count. The regression models were implemented in GIS and in a stand-alone program to further explain the relationship between COVID-19 occurrence and the selected explanatory variables. The comprehensive definitions of the regression models are presented below.
2.3.1. Global Regression Models
Ordinary Least Squares (OLS) is a global regression model. Its basic assumption is that observations are independent, which implies that values at one location do not affect values at another. In direct contrast, COVID-19 occurrence and vaccination coverage exhibit spatial dependence. Hence, the independence assumption of OLS is violated, and OLS on its own is unsuitable for modeling the relationship between COVID-19 occurrence and vaccination coverage [1] [2] [5] [25]. Thus, we adopted two spatial autoregressive models, the Spatial Lag Model (SLM) and the Spatial Error Model (SEM). Both are extensions of OLS [26] [27] that take spatial interactions into account, each modeling them in a different manner.
2.3.2. Ordinary Least Squares (OLS)
OLS is defined as:
$$y_i = \beta_0 + \sum_{j} \beta_j x_{ij} + \varepsilon_i \qquad (1)$$
where, for county $i$, $y_i$ is the dependent variable, $x_{ij}$ are the predictor variables, $\beta_0$ is the intercept, $\beta_j$ are the partial regression coefficients, and $\varepsilon_i$ is the error term. In this study, COVID-19 occurrence (case count) is the dependent variable, while poverty, population density, Black race, and vaccination count are the explanatory variables. OLS chooses the coefficients $\beta$ that minimize the sum of squared prediction errors [5] [25]. It also assumes that the error terms are uncorrelated [28].
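The OLS fit in Equation (1) and the VIF screening mentioned in Section 2.3 can be illustrated with a minimal sketch. It assumes NumPy and uses synthetic data for 88 hypothetical units. The function names `fit_ols` and `vif`, the simulated predictors, and the coefficient values are illustrative assumptions, not the study's data or software.

```python
import numpy as np

def fit_ols(X, y):
    """Return OLS coefficients (intercept first) and residuals."""
    design = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return beta, residuals

def vif(X):
    """Variance Inflation Factor of each column of X: 1 / (1 - R_j^2)."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        _, resid = fit_ols(others, target)  # regress predictor j on the rest
        ss_res = float(resid @ resid)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        factors.append(ss_tot / ss_res)  # algebraically equal to 1 / (1 - R^2)
    return np.array(factors)

# Synthetic illustration: 88 hypothetical units, 4 hypothetical predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(88, 4))
true_beta = np.array([10.0, 0.05, -0.03, 0.0, 0.001])  # assumed values
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.1, size=88)

beta_hat, resid = fit_ols(X, y)
vif_values = vif(X)
print(np.round(beta_hat, 3))
print(np.round(vif_values, 2))
```

A VIF close to 1 indicates that a predictor is nearly uncorrelated with the others. The threshold used to call a VIF "low" is a modeling choice that the text does not state.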
2.3.3. Spatial Lag Model (SLM)
Equation (2) defines the spatial lag model:
$$y_i = \beta_0 + \sum_{j} \beta_j x_{ij} + \rho W_i y + \varepsilon_i \qquad (2)$$
where $\rho$ is the spatial autoregressive coefficient (i.e. the spatial lag parameter), and $W_i$ is a row of the matrix of spatial weights (that is, a vector of spatial weights). Equation (2) originates from the decomposition of the error term in Equation (1) [2] [5] [27]. In this model, $W_i$ holds the spatial weights of the neighbors around county $i$. Hence, it describes the influence of the explanatory variables on the dependent variable while accounting for the counties bordering county $i$ [5] [25].
The SLM assumes that observations are not independent, which implies that both dependent and independent variables exhibit spatial autocorrelation. Hence, the spatial lag model accounts for the “spatially lagged dependent variable” [2] [26] [27]. SLM is applicable when investigating the potential influence of spatial autocorrelation embedded in spatial data. SLM is estimated by Maximum Likelihood, and its spatial weight matrix can be built from either Queen or Rook contiguity [1].
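The difference between Queen and Rook contiguity, and the spatially lagged variable that enters the SLM, can be sketched on a small regular grid. This is an illustrative example assuming NumPy. The grid, the function name `contiguity_weights`, and the values of `y` are hypothetical, and real county polygons would require a GIS library to derive adjacency.

```python
import numpy as np

def contiguity_weights(n_rows, n_cols, queen=True):
    """Row-standardized contiguity matrix for the cells of a regular grid."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # a cell is not its own neighbor
                    if not queen and dr != 0 and dc != 0:
                        continue  # rook: shared edges only, no corners
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n_rows and 0 <= cc < n_cols:
                        W[i, rr * n_cols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)  # each row sums to 1

# Hypothetical values on a 3 x 3 grid, numbered row by row.
y = np.arange(9, dtype=float)
W_queen = contiguity_weights(3, 3, queen=True)
W_rook = contiguity_weights(3, 3, queen=False)

# The spatial lag of y is the neighbor-weighted average at each cell.
lag_queen = W_queen @ y
lag_rook = W_rook @ y
print(lag_queen[0], lag_rook[0])
```

With row standardization, each element of `W @ y` is the mean of the neighboring values, which is the quantity multiplied by the spatial lag parameter in Equation (2).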
2.3.4. Spatial Error Model (SEM)
The Spatial Error Model is a spatial autoregressive model and is defined as:
$$y_i = \beta_0 + \sum_{j} \beta_j x_{ij} + \lambda W_i \xi_i + \varepsilon_i \qquad (3)$$
where $\xi_i$ is the spatial component of the error, $\lambda$ is the level of correlation among these components, and $\varepsilon_i$ is the spatially uncorrelated error term. The main assumption in SEM is that spatial dependency is present in the error term of OLS, so the error term in Equation (1) is decomposed into two separate terms ($\xi_i$ and $\varepsilon_i$) [2] [5] [26]. SEM further assumes that the dependent variable relies heavily on observed local characteristics, with the error terms being autocorrelated across space.
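The error decomposition can be made concrete with a short simulation. The sketch below assumes NumPy and the standard SEM error process, in which the spatial error satisfies ξ = λWξ + ε. The ring-shaped neighbor structure, the value λ = 0.8, and the function names are hypothetical choices for illustration only.

```python
import numpy as np

def ring_weights(n):
    """Hypothetical layout: each unit neighbors the two adjacent units on a ring."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    return W  # rows already sum to 1

def simulate_sem_errors(W, lam, eps):
    """Solve xi = lam * W @ xi + eps, i.e. xi = (I - lam * W)^(-1) eps."""
    n = W.shape[0]
    return np.linalg.solve(np.eye(n) - lam * W, eps)

def morans_i(values, W):
    """Global Moran's I of `values` under weights W."""
    z = values - values.mean()
    return (len(values) / W.sum()) * (z @ W @ z) / (z @ z)

rng = np.random.default_rng(1)
n = 88
W = ring_weights(n)
eps = rng.normal(size=n)                       # spatially uncorrelated part
xi = simulate_sem_errors(W, lam=0.8, eps=eps)  # spatially correlated error

print("Moran's I of eps:", round(float(morans_i(eps, W)), 3))
print("Moran's I of xi: ", round(float(morans_i(xi, W)), 3))
```

Comparing the two Moran's I values shows how a non-zero λ induces spatial autocorrelation in the composite error even though ε itself has none by construction.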
2.3.5. Local Regression Model
The general assumption of local regression models is that relationships between variables change across space and time. A local regression model such as Geographically Weighted Regression (GWR) allows the relationship between the predictors and the dependent variable to vary spatially. Brunsdon, Fotheringham [29] proposed GWR as a local regression model that allows the parameters of a regression estimation to vary over the spatial domain [30].
2.3.6. Geographically Weighted Regression
According to Lin and Wen [30], GWR is a localized regression model that focuses on the variability of the regression parameter estimates across space and time. According to Fotheringham and Oshan [31], the GWR model can be expressed as:
$$y_i = \beta_{i0} + \sum_{j} \beta_{ij} x_{ij} + \varepsilon_i \qquad (4)$$
where, at county $i$, $y_i$ is the value of the COVID-19 incidence rate, $\beta_{i0}$ is the intercept, $\beta_{ij}$ is the $j$th regression parameter, $x_{ij}$ is the value of the $j$th explanatory variable, and $\varepsilon_i$ is a random error term. Parameter estimates for each explanatory variable and at each county are given in matrix form by [31]:
$$\hat{\beta}(i) = \left(X^{T} W(i) X\right)^{-1} X^{T} W(i)\, y \qquad (5)$$
where $\hat{\beta}(i)$ denotes the vector of parameter estimates (m × 1), $X$ is the matrix of the selected explanatory variables (n × m), $W(i)$ is the matrix of spatial weights (n × n), and $y$ is the vector of observations of the dependent variable (n × 1) [31]. The diagonal matrix $W(i)$ holds the weight of each observation, which depends on its distance from location $i$, and the calibration is based on a locally weighted regression [5] [31] [32]. Computing $W(i)$ requires specifying both a kernel function and a bandwidth. Among the available kernel functions, Gaussian and bi-square are the most commonly preferred, and the bandwidth is defined either by a Euclidean distance or by a number of nearest neighbors. The choice of bandwidth determines the neighborhood over which the local weighting occurs.
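Equation (5) can be sketched directly in code. The example below is a minimal illustration assuming NumPy, a fixed-distance Gaussian kernel, and synthetic data in which the slope increases from west to east. The function names, the bandwidth of 0.2, and the simulated coordinates are assumptions for illustration, and no bandwidth optimization is performed.

```python
import numpy as np

def gaussian_kernel(distances, bandwidth):
    """Gaussian distance-decay weights in (0, 1]."""
    return np.exp(-0.5 * (distances / bandwidth) ** 2)

def gwr_fit(coords, X, y, bandwidth):
    """Local coefficients from Equation (5); one row per location, intercept first."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    betas = np.empty((n, design.shape[1]))
    for i in range(n):
        dist = np.linalg.norm(coords - coords[i], axis=1)  # Euclidean distance to i
        w = gaussian_kernel(dist, bandwidth)               # diagonal of W(i)
        XtW = design.T * w                                 # X^T W(i), no n x n matrix needed
        betas[i] = np.linalg.solve(XtW @ design, XtW @ y)
    return betas

# Synthetic illustration: the true slope grows with the x-coordinate.
rng = np.random.default_rng(42)
coords = rng.uniform(0.0, 1.0, size=(100, 2))
x = rng.normal(size=100)
local_slope = 1.0 + 2.0 * coords[:, 0]
y = 2.0 + local_slope * x + rng.normal(scale=0.05, size=100)

betas = gwr_fit(coords, x[:, None], y, bandwidth=0.2)
print("slope range:", betas[:, 1].min().round(2), betas[:, 1].max().round(2))
```

As a sanity check on the design, a very large bandwidth makes all weights nearly equal, so the local estimates collapse to the global OLS solution. A small bandwidth lets the estimated slope follow the west-to-east gradient built into the synthetic data.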
3. Results and Discussion
This study explores the spatial patterns, distributions, and associations of COVID-19 prevalence and vaccinations across counties in Ohio. The study tested the vaccination and prevalence data for normality and corrected for non-normality by taking the natural logarithm of the vaccinations and case rate. Figure 1 shows similarities in the spatial association of COVID-19 cases and its natural logarithm. Hamilton, Franklin, and Cuyahoga counties have the highest cases in the period investigated in this study. These epicenters are also surrounded by counties with moderately high COVID-19 cases. Over 50% of the counties in Ohio had between 3747 and 18,185 COVID-19 cases. Ibukun, Oluwafemi [33] found a similar pattern in Georgia from January 2020 to April 2023, with counties with high COVID-19 cases located close to other counties with high or moderately high cases. Figure 2 shows a corresponding pattern for vaccinations: counties with high COVID-19 cases in Ohio also have high COVID-19 vaccinations. Residents of Hamilton, Franklin, and Cuyahoga counties received between 557,618 and 918,759 COVID-19 vaccinations of at least one dose between January 2020 and April 2023. These vaccination epicenters are also surrounded by counties with moderately high vaccinations. The epidemiological transition model adopted in this study explains the cumulative effect of population on COVID-19 occurrence across the counties. The results of the exploratory analysis reflect the long-run changes in demographic structure in the study area and how they influence COVID-19 occurrence in counties with higher population concentrations. The Spatial Diffusion Theory also adopted in the study explains how the disease moves across the study area from counties with larger populations to counties with smaller populations.
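The effect of the natural-log transformation on a right-skewed count variable can be illustrated with a short calculation. The sketch below uses only the Python standard library. The counts are hypothetical values chosen to mimic a few very large counties among many small ones, and they are not the study's data.

```python
import math

def skewness(values):
    """Moment-based (population) skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical county case counts (illustrative only).
cases = [3747, 5000, 8000, 12000, 18185, 60000, 150000, 300000]
log_cases = [math.log(c) for c in cases]

raw_skew = skewness(cases)
log_skew = skewness(log_cases)
print(f"skewness of raw counts: {raw_skew:.2f}")
print(f"skewness after ln:      {log_skew:.2f}")
```

A skewness closer to zero after the transformation indicates a more symmetric distribution, which is the motivation for using log-transformed cases and vaccinations in the regression models.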
Ibukun, Oluwafemi [33] investigated the cluster and outlier distribution of COVID-19 cases and vaccinations, specifically to ascertain the most prevalent clustering pattern. They found a prevalence of high clusters of COVID-19 in counties in the northwestern part of Georgia, which coincide with the epicenters. Figure 3 shows that most counties in Ohio, approximately 81% (seventy-one counties), have no significant clustering of COVID-19 cases. However, counties in the northeastern region, namely Cuyahoga, Medina, Summit, and Portage, exhibit high-high clustering. Hamilton, Butler, and Warren Counties in the southwestern part also have significant high-high clustering. This implies that these counties exhibit statistically significant concentrations of high COVID-19 cases, that is, spatial dependence. These counties have high COVID-19 cases and are surrounded by other counties with high COVID-19 cases. Tuscarawas and Muskingum counties are high-low outliers. This means these counties have unusually high COVID-19 cases but are surrounded by neighboring counties with low COVID-19 cases. A similar pattern of clustering is observed for COVID-19 vaccination (Figure 4). Eight counties, Defiance, Jackson, Athens, Washington, Noble, Belmont, Jefferson, and Van Wert, form low-low clusters. These counties have low COVID-19 vaccinations and are surrounded by other counties with low vaccination rates. Preble County is the only county that has low COVID-19 vaccination and is surrounded by counties with high vaccination rates.
Figure 1. Spatial distribution of COVID-19 cases.
Figure 2. Spatial distribution of COVID-19 vaccinations.
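The high-high, low-low, high-low, and low-high labels used in the cluster and outlier maps can be illustrated with a local Moran's I calculation. The sketch below assumes NumPy, a hypothetical chain of eleven units with adjacent neighbors, and made-up values. It shows only the quadrant classification and omits the permutation-based significance test that decides which units are reported as significant.

```python
import numpy as np

def line_weights(n):
    """Hypothetical layout: units on a line, neighbors are adjacent units."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                W[i, j] = 1.0
    return W / W.sum(axis=1, keepdims=True)  # row-standardized

def local_moran(values, W):
    """Local Moran's I and quadrant label per unit.
    W is assumed row-standardized with a zero diagonal."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()
    lag = W @ z                      # average standardized value of neighbors
    local_i = z * lag
    labels = []
    for zi, li in zip(z, lag):
        if zi >= 0 and li >= 0:
            labels.append("HH")      # high value, high neighbors
        elif zi < 0 and li < 0:
            labels.append("LL")      # low value, low neighbors
        elif zi >= 0:
            labels.append("HL")      # high outlier among low neighbors
        else:
            labels.append("LH")      # low outlier among high neighbors
    return local_i, labels

# Hypothetical values: a high cluster at the start and one isolated high unit.
values = [10, 10, 10, 0, 0, 0, 0, 10, 0, 0, 0]
local_i, labels = local_moran(values, line_weights(len(values)))
print(labels)
```

In this toy layout the isolated high unit is labeled HL, which corresponds to the high-low outlier pattern described for Tuscarawas and Muskingum counties, and its local statistic is negative.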
This study also employs hot spot analysis to map the counties with statistically significant clustering of high and low COVID-19 cases and vaccinations in Ohio. We implemented this using the Getis-Ord Hot Spot algorithm in ArcGIS 10.8 to rank the cases and vaccinations. Hot spot analysis also indicates similar patterns in COVID-19 cases and vaccination from January 2020 to April 2023. The majority of the hotspots are in counties in the Northeastern and Southwestern parts of the state and are statistically significant (p < 0.05), as shown in Figure 5 and Figure 6. For COVID-19 cases, eleven counties are hotspots, seven of which (Cuyahoga, Medina, Summit, Portage, Hamilton, Butler, and Warren) are significant at the 95% and 99% levels. For vaccinations, fourteen counties are hotspots, eight of which (Cuyahoga, Medina, Summit, Portage, Hamilton, Butler, Warren, and Franklin) are significant at the 95% and 99% levels.
Figure 3. Cluster and outlier analysis of COVID-19 cases.
Figure 4. Cluster and outlier analysis of COVID-19 vaccination.
Figure 5. Hotspot analysis of COVID-19 cases.
Figure 6. Hotspot analysis of COVID-19 vaccinations.
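The Getis-Ord Gi* statistic behind hot spot maps of this kind is a z-score, so values above about 1.96 and 2.58 correspond to the 95% and 99% confidence levels. The sketch below assumes NumPy, binary adjacency weights on a hypothetical chain of eleven units, and made-up values. It is an illustration of the statistic, not a reproduction of the ArcGIS tool or its settings.

```python
import numpy as np

def line_adjacency(n):
    """Hypothetical layout: binary adjacency for units on a line."""
    A = np.zeros((n, n))
    for i in range(n - 1):
        A[i, i + 1] = A[i + 1, i] = 1.0
    return A

def gi_star(values, neighbors):
    """Getis-Ord Gi* z-score per unit, using binary weights."""
    x = np.asarray(values, dtype=float)
    n = len(x)
    W = np.asarray(neighbors, dtype=float) + np.eye(n)  # Gi* includes the unit itself
    mean = x.mean()
    s = x.std()                                         # population standard deviation
    w_sum = W.sum(axis=1)
    w_sq_sum = (W ** 2).sum(axis=1)
    numerator = W @ x - mean * w_sum
    denominator = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator

# Hypothetical values: three adjacent high units followed by low units.
values = [10, 10, 10, 0, 0, 0, 0, 0, 0, 0, 0]
z = gi_star(values, line_adjacency(len(values)))
hot_99 = z > 2.58   # hot spot at the 99% level
hot_95 = z > 1.96   # hot spot at the 95% level
print(np.round(z, 2))
```

Only units whose own value and neighboring values are jointly high reach the significance thresholds, which is why hot spots appear as contiguous groups of counties rather than isolated ones.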
Relationship between COVID-19 Cases, Vaccinations and Selected Predictors
This study also investigates the predictors that drive the spatial prevalence of COVID-19 cases in Ohio counties using both spatial lag and spatial error models. Both backward and forward stepwise spatial regression methods were employed in both models to determine the predictors that explain the most variation in COVID-19 cases in Ohio between January 2020 and April 2023. Since the COVID-19 cases did not follow a normal distribution, their natural logarithm was taken and used in the regression models. The predictors included in the spatial lag and spatial error models are % Black, % poverty, COVID-19 vaccination, and population density. Table 1 indicates that two of the included predictors (% Black and % poverty) are statistically significant in explaining the variation in COVID-19 prevalence.
The results from the spatial lag and spatial error models show that the selected predictors explain the variation in COVID-19 cases, with R-squared values of 68.42% and 68.37%, respectively. The better-performing model is the spatial error model, as evidenced by its Akaike information criterion (AIC) of 158.627, which is smaller than that of the spatial lag model (160.51). The percentage of Blacks has a statistically significant positive relationship with the prevalence of COVID-19 in Ohio, while poverty has a significant negative coefficient (Table 1). Population density also has a positive relationship with COVID-19, but the relationship is insignificant. COVID-19 vaccination has a negative coefficient in the spatial lag model, although the estimate is not statistically significant in either model. The negative sign is consistent with the importance of COVID-19 vaccination interventions in reducing the prevalence of COVID-19. These findings agree with those of Ibukun, Oluwafemi [33] and Akinwumiju, Oluwafemi [2], that vulnerable populations suffer more COVID-19 incidence and that COVID-19 vaccinations reduce the spread.
Table 1. Spatial lag and spatial error model of the predictors of COVID-19 cases.
| Variable | Coefficient (SLM) | Coefficient (SEM) | Standard error (SLM) | Standard error (SEM) | Z-value (SLM) | Z-value (SEM) | P-value (SLM) | P-value (SEM) |
|---|---|---|---|---|---|---|---|---|
| Constant | 10.1144 | 9.80612 | 0.762623 | 0.202956 | 13.2626 | 48.3164 | 0.00000*** | 0.00000*** |
| Blacks | 0.0542895 | 0.0541224 | 0.0258412 | 0.0257943 | 2.10089 | 2.09823 | 0.03565*** | 0.03588*** |
| Poverty | −0.0286117 | −0.0286658 | 0.0138736 | 0.0139245 | −2.06231 | −2.05866 | 0.03918*** | 0.03953*** |
| Vaccination | −1.54459e−07 | 9.69299e−08 | 1.97152e−06 | 1.97853e−06 | 1.49089 | 0.0489909 | 0.13599 | 0.96093 |
| Population density | 0.00109488 | 0.00111937 | 0.00073438 | 0.000736245 | 1.49089 | 1.52038 | 0.13599 | 0.12842 |
| R-squared | 0.684178 | 0.683719 | | | | | | |
| AIC | 160.51 | 158.627 | | | | | | |
| Breusch-Pagan test | 4.0782 | 4.0118 | | | | | | |
Note: ***p < 0.01.
Table 2 compares the spatial lag and spatial error models of the predictors driving COVID-19 vaccinations across the counties in Ohio. As with the COVID-19 cases, both models explain the spatial variation in COVID-19 vaccinations, with R-squared values of 70.71% and 70.78%, respectively. The spatial error model again performed better, with an Akaike information criterion (173.083) smaller than that of the spatial lag model (175.198). The percentage of Blacks and poverty are statistically significant, while COVID-19 cases are significant only at the 10% level in the spatial error model. For every one-percent increase in the Black population, the (log-transformed) COVID-19 vaccination count increases by about 0.055. The negative coefficient of poverty in the spatial lag model indicates that for every one-percent increase in poverty, the (log-transformed) vaccination count falls by about 0.048. This could be because a share of the poor population may not be insured and consequently has difficulty accessing COVID-19 vaccines. The positive coefficient of COVID-19 cases shows that as cases increased, the number of COVID-19 vaccinations also increased in Ohio between January 2020 and April 2023.
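Coefficients from a model with a log-transformed dependent variable can be converted to approximate percent changes. The sketch below uses only the Python standard library and the SLM coefficients reported in Table 2. It assumes that the dependent variable is the natural logarithm of the vaccination count and that the predictors are measured in percentage points, and the helper name `percent_change` is hypothetical.

```python
import math

def percent_change(coef, delta=1.0):
    """Percent change in the outcome for a `delta`-unit change in a predictor,
    assuming the model's dependent variable is ln(outcome)."""
    return (math.exp(coef * delta) - 1.0) * 100.0

# SLM coefficients as reported in Table 2.
coef_black = 0.0545982
coef_poverty = -0.0483366

pct_black = percent_change(coef_black)
pct_poverty = percent_change(coef_poverty)
print(f"% Black: {pct_black:+.2f}% vaccinations per percentage point")
print(f"Poverty: {pct_poverty:+.2f}% vaccinations per percentage point")
```

Under these assumptions, small coefficients translate to percent changes of nearly 100 times their value, so the exponential correction matters mainly for larger coefficients.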
Table 2. Spatial lag and spatial error model of the predictors of COVID-19 vaccinations.
| Variable | Coefficient (SLM) | Coefficient (SEM) | Standard error (SLM) | Standard error (SEM) | Z-value (SLM) | Z-value (SEM) | P-value (SLM) | P-value (SEM) |
|---|---|---|---|---|---|---|---|---|
| Constant | 10.7575 | 10.5804 | 0.781904 | 0.216978 | 13.7581 | 48.7627 | 0.00000*** | 0.00000*** |
| Blacks | 0.0545982 | 0.0541461 | 0.0275929 | 0.0276693 | 1.9787 | 1.9569 | 0.04785*** | 0.05036** |
| Poverty | −0.0483366 | 0.0497281 | 0.0151726 | 0.0149856 | 3.18577 | 3.3184 | 0.00144*** | 0.00091*** |
| Population density | 0.000311312 | 0.000199569 | 0.000762657 | 0.000757287 | 0.408194 | 0.263532 | 0.68313 | 0.79214 |
| COVID-19 cases | 8.20621e−06 | 9.16963e−06 | 5.36179e−06 | 5.31897e−06 | 1.5305 | 1.72395 | 0.12589 | 0.08472* |
| R-squared | 0.707099 | 0.707756 | | | | | | |
| AIC | 175.198 | 173.083 | | | | | | |
| Breusch-Pagan test | 11.0742 | 11.6855 | | | | | | |
Note: ***p < 0.01, **p < 0.05, *p < 0.10.
This study also evaluates how well the Geographically Weighted Regression (GWR) model explains the variability in COVID-19 cases and vaccinations across the counties in Ohio by assessing how the coefficient of determination varies across counties. The local coefficient of determination (R²) at each county represents the share of the variation in COVID-19 cases or vaccinations that is explained by the selected predictors. Figure 7 shows the variation in COVID-19 cases that is explained by % Black, poverty, vaccination, and population density. The Western Counties (red) have the higher R² values, ranging from 0.6834 to 0.6835. This shows that the predictors explain a large proportion of the variation in COVID-19 cases and capture the spatial association between COVID-19 cases and the predictors. The Eastern Counties (blue) have a lower local coefficient of determination, which means the model explains slightly less of the variation in COVID-19 cases there.
Figure 8 shows how much of the variation in COVID-19 vaccination in Ohio from January 2020 to April 2023 is explained by % Black, population density, poverty, and COVID-19 cases across the counties. In the western counties, the coefficient of determination ranges from 0.72 to 0.83. This implies that the variables included in the model explain much of the variation in COVID-19 vaccination in those counties. The eastern counties, on the other hand, have a coefficient of determination between 0.59 and 0.70, which indicates that the model explains less of the variation in COVID-19 vaccination there.
Figure 7. Coefficient of determination map of local geographically weighted regression for COVID-19 cases.
Figure 8. Coefficient of determination map of local geographically weighted regression for COVID-19 vaccinations.
The difference in the coefficient of determination across the counties for both COVID-19 cases and COVID-19 vaccinations shows that the relationship between COVID-19 cases and vaccinations and the predictors is not uniform across Ohio. There may be regional factors contributing to these differences.
The findings of this study compare favorably with those of Ibukun, Oluwafemi [33], Akinwumiju, Oluwafemi [2], and Oluwafemi and Oladepo [1], where the spatial error model showed a better R-squared and AIC in evaluating the COVID-19 rate. Our study also agrees with Xu and Wang [34] that GWR is suited to modeling disease diffusion and vaccination rates where the spatial resolution of analysis is at the county level or finer.
4. Conclusion
The results of the study revealed that the occurrence of COVID-19 and vaccinations varies across space and time. The study further highlighted that certain socioeconomic and demographic characteristics are associated with the incidence of COVID-19 and the vaccination rate. In addition, the study suggested that, of the two spatial autoregressive models, the spatial error model is the better one for evaluating COVID-19 and vaccination rates. There are spatial disparities in the patterns produced by the three models (SLM, SEM, and GWR). We conclude that GWR can closely capture the spatial disparities of diseases. The results further showed that vaccination increases with the COVID-19 rate across the study area. Thus, it is recommended that COVID-19 prevention efforts focus on the spatial characteristics of economic and demographic variables, most importantly poverty and race.