Assessing the Association between Heart Attack, High Blood Pressure, and Heart Disease Mortality Rates and Particulate Matter and Socioeconomic Status Using a Multivariate Geostatistical Model

This study addresses public concern about the potential adverse health effects of ambient fine particulate matter and of socioeconomic factors. Heart attack, high blood pressure, and heart disease mortality rates were examined in relation to fine particulate matter and socioeconomic status for all counties in the United States in 2013. Multivariate multiple regression and multivariate geostatistical prediction show that these factors are significant for assessing causal inferences between the three mortality rates and both exposure to air pollution and socioeconomic status.


Introduction
Heart disease, a cardiovascular disease, is the leading cause of death in the world, with its mortality representing one third of all global deaths. Its documented risk factors include smoking, physical inactivity, diet, obesity, cholesterol, and diabetes [1]. As for heart attack, research shows that despite progress toward faster hospital care, its mortality rates remain unchanged [2]. The third health outcome of interest, high blood pressure mortality, contributes to thirteen percent of all deaths in the US and is continually increasing [3]. The three health outcomes have been associated with air pollution through oxidative stress leading to inflammation, which drives physiological processes that manifest as cardiorespiratory symptoms such as airway narrowing, shortness of breath, wheezing, and cough; fine particles can also penetrate the lung wall and accumulate in the pulmonary interstitium between the lung and the bloodstream [4]-[6]. The outcomes have also been negatively associated with socioeconomic status (SES) through lifestyle effects [7] [8].
The purpose of this study is to quantify the association of the three health outcomes with exposure to fine particulate matter (PM2.5) and with different aspects of SES in the US. Assessing causal inferences between exposure to fine particulate matter and mortality from heart attack, high blood pressure, and heart disease provides epidemiological evidence on the adverse health effects of air pollution. Studying this association at the county level can also help answer a question that remains open: whether the current network of air quality monitoring stations adequately represents the populations and locations at highest risk for respiratory and cardiovascular disease. An important underlying reason for this gap is the uniform approach used to select monitoring site locations, which ignores dissimilarities among sites, e.g., Houston, Texas versus Jackson, Wyoming [9] [10]. In addition, the effect of local air pollution versus that of neighboring levels has not been fully explored [11] [12], which points to the importance of studying community-related characteristics, because these factors contribute to causal inference about air pollution and its respiratory and cardiovascular health effects.

Methods
To achieve the study objective, data were collected from the Centers for Disease Control and Prevention (CDC) [13] and the US Environmental Protection Agency (EPA) [14] for the year 2013. Table 1 lists the twenty-four variables collected and analyzed. Statistical analyses were conducted using R version 3.1.1.

Results
Table 2 gives descriptive statistics for the main variables. Skewness and kurtosis values far from 0 and 3, respectively (the values characteristic of a normal distribution), show the asymmetric, non-normal nature of heart attack mortality, high blood pressure mortality, heart disease mortality, educational attainment, poverty, unemployment rate, percentage of population aged 65 years and older, percentage of population within half a mile of a park, annual average ambient PM2.5, and percentage of households living with severe housing problems.
Figure 1 presents average mortality rates for heart attack, heart disease, and high blood pressure by county type. Heart disease mortality rates are lower in rural counties than in urban and metro counties. High blood pressure mortality rates follow a roughly opposite pattern, whereas heart attack mortality rates are lowest in metro counties and highest in urban counties.
As for pairwise associations (Table 3), the percentage of households living with severe housing problems was positively associated with poverty, the percentage of food stamp/Supplemental Nutrition Assistance Program recipients, and the percentages of Native Americans, Hispanics, and African Americans. It was negatively associated with the percentage of population aged 65 years and older and represented about a third of the population. Ease of access to nearby parks (percentage of population living within half a mile of a park) was negatively associated with both heart attack and heart disease mortality. Heart attack mortality rate was positively associated with heart disease mortality rate, educational attainment, and poverty. High blood pressure mortality rate was positively associated with heart disease mortality rate, poverty, educational attainment, and percentage of African Americans. Heart disease mortality rate was positively associated with educational attainment, poverty, and percentage of African Americans. Understandably, the percentage of population in poverty was positively associated with the percentage of food stamp recipients, median income, and unemployment.
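The skewness and kurtosis screen described above can be reproduced with the simple moment estimators below. This is a minimal sketch: the paper does not say which estimator its R analysis used (several bias-corrected variants exist), and the function names and the tolerance of 1.0 in the hypothetical `departs_from_normal` helper are illustrative choices, not the authors' criteria.

```python
from statistics import fmean


def sample_skewness(xs):
    """Moment-based skewness g1 = m3 / m2**1.5 (0 for a normal distribution)."""
    m = fmean(xs)
    n = len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def sample_kurtosis(xs):
    """Moment-based (non-excess) kurtosis b2 = m4 / m2**2 (3 for a normal distribution)."""
    m = fmean(xs)
    n = len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m4 / m2 ** 2


def departs_from_normal(xs, skew_tol=1.0, kurt_tol=1.0):
    """Hypothetical screen: flag a variable whose skewness is far from 0
    or whose kurtosis is far from 3 (tolerances are illustrative)."""
    return (abs(sample_skewness(xs)) > skew_tol
            or abs(sample_kurtosis(xs) - 3.0) > kurt_tol)
```

For example, a right-skewed county variable such as `[1, 1, 1, 1, 10]` has skewness 1.5 and is flagged, while the symmetric `[1, 2, 3, 4, 5]` has skewness 0.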
Multivariate analysis of variance (MANOVA) tests on the three dependent variables (heart attack, high blood pressure, and heart disease mortality rates) using Pillai's trace, Wilks' lambda, and the Hotelling-Lawley trace gave p-values of effectively zero, rejecting at the 0.05 significance level the null hypothesis of no effect on the three health outcomes. Three univariate (single dependent variable) multiple linear regressions were then fitted after log-transforming the most significant independent variables: percentage without a high school diploma, families with a female head of household, percentage living in poverty, percentage of Asians, percentage of African Americans, percentage of Hispanics, percentage of white population, percentage aged 65 years or older, percentage of population living within half a mile of a park, and annual average PM2.5. The other variables were excluded because of their weak association with the outcomes or their strong association with the other predictors. Table 4 presents the results of the three univariate multiple linear regressions, one per health outcome, and Table 5 presents the results of the multivariate multiple linear regression. The three univariate models are statistically significant, and the significant predictors are the same in the three univariate models and the multivariate model. One advantage of multivariate multiple regression, however, is the ability to test coefficients across the different health outcomes. Here, ten hypotheses were tested, one per predictor, each stating that the predictor's coefficient is zero in all three equations simultaneously. The residuals have a mean near 0.0001 but a variance as high as 769. The spatial distribution of the residuals is shown in Figure 2. Large positive residuals (under-prediction, taking residuals as observed minus predicted) and large negative residuals (over-prediction) indicate that the model needs improvement, which is supported by the low adjusted R-squared values.
Codispersion coefficients for the three possible pairs of dependent variables (heart attack vs. high blood pressure mortality, heart attack vs. heart disease mortality, and high blood pressure vs. heart disease mortality) were not constant across distances. Thus, the hypothesis of intrinsic correlation was rejected. This means that the correlation structure of the three dependent variables depends on the spatial scale, which calls for cokriging [15].
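The relationship between the univariate and multivariate fits described above can be sketched as follows. When all three outcomes share one design matrix, multivariate multiple regression yields, column by column, exactly the ordinary least squares coefficients of the separate univariate regressions; what the multivariate model adds is the joint residual covariance used by Pillai's trace and the related tests, which this sketch does not compute. The sketch assumes strictly positive predictor values (the paper does not say how zero percentages were handled before taking logarithms), and the helper names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_multivariate_ols(predictors, outcomes):
    """Hypothetical helper.
    predictors: rows of raw, strictly positive predictor values (one row per county).
    outcomes: rows of outcome values, e.g. (heart attack, HBP, heart disease) rates.
    Returns one coefficient vector (intercept first) per outcome column."""
    # Log-transform every predictor and prepend an intercept column.
    x = [[1.0] + [math.log(v) for v in row] for row in predictors]
    p = len(x[0])
    # X'X is shared by all outcomes, which is why the multivariate point
    # estimates coincide with the separate univariate OLS estimates.
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    coefs = []
    for k in range(len(outcomes[0])):
        xty = [sum(r[i] * y[k] for r, y in zip(x, outcomes)) for i in range(p)]
        coefs.append(solve(xtx, xty))
    return coefs
```

In R, the same multivariate fit is typically obtained by giving `lm()` a `cbind()` response, with the MANOVA-type tests computed on the resulting multivariate model.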

Discussion
Research has shown associations between air pollution and respiratory and cardiovascular mortality [16]-[19]. Socioeconomic status (SES) has also been associated with these outcomes. Associations among heart attack, high blood pressure, and heart disease mortality rates, and between these rates and SES, have been documented as well [20]-[24]. Nevertheless, such associations are seldom investigated in the context of exposure to air pollution. Multivariate geostatistical techniques can help in this regard because they make use of the spatial correlations between observations from neighboring communities to predict values at unsampled locations. These spatial associations can exist even in the absence of strong pairwise correlations. When all direct and cross variograms are proportional to a single common variogram, the variables are said to be intrinsically correlated, and their correlation does not depend on spatial scale. Multivariate intrinsic correlation allows the data modeling to be simplified, which in turn enables us to address previously unidentified or seldom studied associations [25]-[27]. Consequently, new public health findings relevant to each community can be sought.
"Hotspots" are geographic areas, emerging at specific times and places, where the mortality or morbidity rate is consistently rising. Such clustering reflects a high occurrence and frequency of disease, to the point that rates become highly predictable. However, the definition of a hotspot can be ambiguous because, among other issues, there is no agreed-upon quantitative measure. Nevertheless, studying the three mortality rates at the county level shows a consistent spatial distribution in which certain areas persistently have higher rates than the rest. Paying more attention to such hotspots can lead research to potentially stronger causal inferences [28]-[32].
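The intrinsic correlation hypothesis, tested with codispersion coefficients in the Results, can be stated in standard geostatistical notation. The symbols here are assumed rather than taken from the paper: Z_i(s) is the i-th mortality rate at county location s, N(h) is the set of location pairs separated by approximately lag h, gamma_ij(h) is the cross variogram (the direct variogram when i = j), and rho_ij(h) is the codispersion coefficient.

```latex
\begin{aligned}
\hat{\gamma}_{ij}(h) &= \frac{1}{2\,|N(h)|} \sum_{(s_a,s_b)\in N(h)}
  \bigl[Z_i(s_a)-Z_i(s_b)\bigr]\bigl[Z_j(s_a)-Z_j(s_b)\bigr],\\[4pt]
\rho_{ij}(h) &= \frac{\gamma_{ij}(h)}{\sqrt{\gamma_{ii}(h)\,\gamma_{jj}(h)}},\\[4pt]
\text{intrinsic correlation:}\quad \gamma_{ij}(h) &= b_{ij}\,\gamma(h),
  \qquad B=[b_{ij}]\ \text{positive semidefinite},\\[4pt]
\text{hence}\quad \rho_{ij}(h) &= \frac{b_{ij}}{\sqrt{b_{ii}\,b_{jj}}}\quad \text{for all } h.
\end{aligned}
```

Under intrinsic correlation every codispersion coefficient is therefore constant in h, so codispersion coefficients that vary with distance are evidence against the model; in that case the cross variograms carry scale-dependent information that cokriging can exploit.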

Conclusion
This study investigated the mortality rates of heart attack, high blood pressure, and heart disease in the United States in 2013, and their association with fine particulate matter and with socioeconomic status represented by educational attainment, poverty, household income, unemployment rate, demographics, access to a nearby park, and urban-rural classification. The multivariate geostatistical model used latitude, longitude, educational attainment, poverty, percentage Asian, percentage African American, percentage white population, and annual PM2.5 concentration as the mean covariates. These variables were also significant in the univariate and multivariate multiple linear regressions. This study has several limitations: it did not explore temporal variation, it is subject to issues inherent in the data, and it did not consider other air pollutants or demographic factors such as gender and age. Nevertheless, it yielded significant findings, including a consistent county-level spatial distribution in which certain areas persistently had higher rates than the rest. Paying more attention to such hotspots could lead research to potentially stronger causal inferences.

Figure 1. Rates of health outcomes in rural/urban counties.

Figure 2. Residuals of the multivariate multiple linear regression model.

Figure 3 gives the direct variograms (diagonal) and cross variograms (off-diagonal), along with the fitted linear model of coregionalization (straight lines), with distance capped at 90 kilometers. The three mortality rates are predicted to keep rising with the persisting spatial concentrations (hotspots) shown in Figure 4.
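The direct and cross variograms in Figure 3, and the codispersion coefficients derived from them, can be estimated empirically as sketched below. This is an illustration under stated assumptions: counties are represented by hypothetical centroid (latitude, longitude) pairs, distances use the haversine formula with a mean Earth radius of 6371 km, and the 10 km lag bins are an arbitrary choice; only the 90 km cutoff comes from the text. Fitting the linear model of coregionalization to these empirical values is not shown.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def empirical_cross_variogram(coords, zi, zj, bin_width_km=10.0, max_km=90.0):
    """Classical estimator gamma_ij(h) = sum((zi_a - zi_b) * (zj_a - zj_b)) / (2 N(h))
    over location pairs in each lag bin; passing zi twice gives the direct variogram.
    Returns {bin_center_km: gamma}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(coords)
    for a in range(n):
        for b in range(a + 1, n):
            d = haversine_km(*coords[a], *coords[b])
            if d >= max_km:
                continue  # keep only pairs closer than the distance cutoff
            k = int(d // bin_width_km)  # lag-bin index
            sums[k] += (zi[a] - zi[b]) * (zj[a] - zj[b])
            counts[k] += 1
    return {(k + 0.5) * bin_width_km: sums[k] / (2 * counts[k]) for k in sorted(counts)}


def codispersion(coords, zi, zj, **kw):
    """Codispersion rho_ij(h) = gamma_ij / sqrt(gamma_ii * gamma_jj) for each lag bin."""
    gij = empirical_cross_variogram(coords, zi, zj, **kw)
    gii = empirical_cross_variogram(coords, zi, zi, **kw)
    gjj = empirical_cross_variogram(coords, zj, zj, **kw)
    return {h: gij[h] / math.sqrt(gii[h] * gjj[h])
            for h in gij if gii[h] > 0 and gjj[h] > 0}
```

Codispersion values that change from bin to bin, as reported for the three mortality-rate pairs, argue against intrinsic correlation; a variable paired with an exact rescaling of itself gives a codispersion of 1 in every bin.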

Table 1. Data collected from CDC and EPA.

Table 3. The most notable pairwise associations.

Table 4. Univariate multiple linear regression results.

Table 5. Multivariate multiple linear regression results.
Educational attainment, poverty, percentage Asian, percentage African American, percentage white, percentage aged 65 years or older, percentage near parks, and annual PM2.5 concentration.
Educational attainment, poverty, percentage Asian, percentage African American, percentage white, and annual PM2.5 concentration.