Application of Multivariate Geostatistics in Environmental Epidemiology : Case Study from Houston , Texas

This study represents an example of investigating the associations between the joint exposure to ozone (O3) and particulate matter of sizes less than or equal to 2.5 micrometers in aerodynamic diameter (PM2.5) and cardiovascular disease (CVD) emergency room (ER) visits and chronic obstructive pulmonary disease (COPD) ER visits using multivariate geostatistics in Houston, Texas, from 2004 to 2009. Analyses showed lack of strong pair-wise association among the predictors of O3, PM2.5, wind speed, relative humidity, and temperature. Whereas CVD and COPD ER visits exhibited a strong positive correlation. Both outcomes drastically increased from 2006 possibly due to immigration from neighboring locations. Parametric testing showed that the data differed significantly between the years. Multivariate multiple regression results on the 2009 data showed that PM2.5, relative humidity, and temperature were significant to both CVD and COPD ER visits. Codispersion coefficients were constant which justified the assumption of intrinsic correlation. That is, our predictors had strong influence on the spatial variability of CVD and COPD ER visits. This multivariate geostatistics approach predicted an increase of 34% in CVD ER visits and 24% increase in COPD ER visits, which calls for more attention from policy makers. The use of multivariate geostatistics analyses enabled us to successfully detect the effects of risk factors on both outcomes.


Introduction
Assessing causal inferences between exposure to air pollution and respiratory and cardiovascular diseases provides epidemiological evidence regarding the adverse health effects of air pollution.Most of relevant studies quantify the possible causality through association between the pollutants and the health outcomes due to exposure to pollutants either singularly or jointly.However, the effect of local air pollution versus neighboring levels is not fully explored yet [1] [2], which points to the importance of studying community-related characteristics which could help in estimating the causal inference between air pollution and its respiratory and cardiovascular adverse health effects.Application of multivariate geostatistical techniques can assist in this regards because it has the advantage of making use of the spatial correlations between observations pertinent to neighborhoods to predict values at unsampled places [3] [4].
This study capitalizes on previous studies which assess the effect of ozone and particulate matter on CVD and COPD ER visits in Harris County, Texas from 2004 to 2009 [5] [6] by studying the effect of joint exposure to air pollutants and other factors like temperature and relative humidity on both health outcomes instead of univariate modeling.The use of multivariate geostatistics has been useful in other applications like mining, but with the increased quality of spatial health data, their relevance is gradually increasing.This study presents a bivariate example with two dependent variables (CVD and COPD ER visits, which could be replicated to incorporate more dependent variables.This can provide immeasurable benefits to epidemiologists and policy makers by incorporating more outcomes that have been documented to be associated, thus, representing a more realistic situation than modeling the effects on one outcome at a time.

Results and Discussion
Table 1 presents descriptive statistics for the collected variables.Analyzed variables included O3 and PM2.5 lagged at two and one day as well.Data exhibited high shifts of skewness and kurtosis from 0 and 3, respectively, which are the characteristic values of normal distribution.Therefore, the variables were of non-symmetric distributions, with long tails and several outliers.The total number of CVD ER visits during the six years was 96,596.Those visits kept increasing with time, starting with 3309 visits in 2004 and ending with 30,765 in 2009, a greater than 900% increase (Figure 1).Modeling the association of each visit to temperature, relative humidity, wind speed, wind direction, age, ethnicity, gender, day of the week, PM2.5 of the same day, PM2.5 of the previous day, O3 of the same day, and O3 of the previous day showed that the following variables were significant: visits on Friday, Saturday, Sunday and Tuesday, being African American male, wind speed, O3 of the same day, PM2.5 of the same day and previous day [6].The total number of COPD ER visits was 95,765.Similar to CVD ER visits.COPD ER visits had positive significant associations with the being African American male, relative humidity, O3 of the same day, and PM2.5 of the same day [5].Pair-wise correlations between the variables are given in Table 2.The most notable associations include the strong positive association between CVD and COPD ER visits of 0.74 and the negative association between temperature and COPD ER visits of −0.27.The number of CVD and COPD ER visits was then summed up by date and zip code.Fisher's F test to test the null hypothesis of non-changing variances of each outcome between the years, gave p-values less than the significance level of 0.05.The same result was obtained for t test.Hence we reject the null hypothesis of equal variances and means respectively.That is, neither CVD nor COPD ER cases for Houston during those years can be combined as if they were coming from one statistical distribution.Consequent analyses were conducted on the 2009 data.Multivariate multiple regression results showed that the model is statistically significant and that lagged ozone and PM2.5 concentrations were not statistically significant to both outcomes.Interestingly, only PM2.5, relative humidity and temperature were significant to both CVD and COPD ER visits (Table 3).The effect   of these three factors on the mechanisms of toxicity for respiratory and cardiovascular disease can be attributed to the oxidative stress leading to inflammation, which generates the physiological processes evolving as respiratory and/or cardiovascular symptoms like narrowing of airways, shortness of breath, wheezing, cough, and the ability of particles to penetrate the lung wall accumulating in the pulmonary interstitium between the lung and the bloodstream [13]- [15].As for heat's effect on the health outcomes, research shows that hot-humid conditions are more challenging for patients with respiratory and cardiovascular diseases because heat and humidity affects the body's ability to preserve normothermia [16] [17].The increase in humidity combined with particulate matter pollution allows for the delivery of the pollutant deeper into the lung [18].Both heat and humidity exacerbate respiratory and cardiovascular conditions [19].
As for the spatial association between CVD and COPD ER visits across the different zip codes of Houston, the codispersion coefficients were constant with a variance of 0.001 (Figure 2).These are the ratios of the cross versus the direct variograms which can capture spatial association that is not possible to calculate following conventional methods.Therefore, we can assume the presence of intrinsic correlation whereby the model of intrinsic correlation is entirely specified by its spatial structure, and by the variance-covariance matrix [3] [20] [21].Tests of normality lead to rejection of the null hypothesis of normality for CVD and COPD ER visits.Therefore, they were transformed using the logarithmic function.In order to predict future numbers of ER visits for each of the zip codes variograms were estimated, which exhibited anisotropy.This indicates that the independent variables had strong influence on the spatial variability of the two health outcomes.Directional variograms were estimated to characterize the spatial variability in different directions.Fitted variogram parameters (nugget, range, sill, smoothness parameter, practical range) were used in ordinary kriging [22].Cross validation proved reliability of the approach used.This approach predicts that CVD and COPD ER visits will increase by more than 34% and 24% respectively.These predictions can be compared to future ER visits from following years.Knowing that Houston is one of the top cities with unpaid ER visits [23], this high percent carries multi-fold responsibilities: The need for more sophisticated interventions and resources in order to manage the continually increasing number of visits.
The persistent increase in CVD and COPD ER visits (Figure 3) in Houston, Texas could be attributed to variations in data coverage and completeness as well as socioeconomic factors including immigrations due to severe weather disasters where the majority of movers are usually those who belong to the traditionally under-served under-represented groups [5] [6].The previous models were univariate, i.e.; the joint effects of the risk factors were investigated to one health outcome at a time.While these analyses provided good prediction because of inclusion of multipollutants (O3 of the same day, O3 of the previous day, PM2.5 of the same day, and PM2.5 of the previous day), they did not investigate the joint effect of the risk factors on both outcomes.This study allowed us to see how CVD and COPD ER visits are related once control variables have been included (through the examination of the correlation of the residuals), and that we can test the effect of ozone and PM2.5 both across CVD and COPD ER visits jointly [24].Figure 4 gives a proposed decision tree for applying multivariate geostatistics in environmental epidemiology, which was conducted in this study.Grayed boxes were not followed.

Conclusion
This paper presents an approach to study the multivariate spatial nature between health outcomes and multipollutants using multivariate geostatistics.The example presented modeled the effect of ozone, PM2.5, relative humidity, wind speed, and temperature on both CVD and COPD ER visits at the same time.Now that we answered questions like the effect on both outcomes simultaneously, we can address more complex questions like the effect on more health outcomes that have been historically associated to each other.Further investigation is recommended in order to address potential issues like computational complexity and bias [22].

Figure 1 .
Figure 1.Daily Numbers of CVD and COPD ER visits from January 2004 to December 2009 in Houston, Texas.

Figure 3 .
Figure 3.Total number of ER visits for CVD and COPD in Houston, Texas.

Figure 4 .
Figure 4. Proposed decision tree for applying multivariate geostatistics in environmental epidemiology.

Table 1 .
Summary statistics for the dependent and independent variables.

Table 3 .
Summary of type of association investigated by dependent variable.