Visualizing Fit between Dengue and Climatic Variables on Capitals of the Brazilian Northeast Region by Generalized Additive Models

Recent analysis indicates that the numbers of dengue cases may be as high as 400 million/year in the world. According to the Ministry of Brazilian Health, in 2015, there were 1,621,797 probable cases of dengue in the country including all classifications except discarded, the highest number recorded in the historical series since 1990. Many studies have found associations between climatic conditions and dengue transmission, especially using generalized models. In this study, Generalized Additive Models (GAM) was used associated to visreg package to understand the effect of climatic variables on capitals of Northeast Brazilian, from 2001 to 2012. From 12 climatic variables, it was verified that the relative humidity was the one that obtained the highest correlation to dengue. Afterwards, GAM associated with visreg was applied to understand the effects between them. Relative humidity explains the dengue incidence at an adjusted rate of 78.0% (in São Luis-MA) and 82.3% (in Teresina-PI) delayed in, respectively, −1 and −2 months.


Introduction
Dengue Fever is fast emerging pandemic-prone viral disease in many parts of the Tropical countries are the most heavily affected due to environmental, climatic, and social conditions.Studies of climatic variables can improve knowledge and prediction of epidemic seasonality.The climate is an important factor in the temporal and spatial distribution of vector-transmitted diseases as dengue fever [2].
Many works sought to identify climatic influences on dengue, and to evaluate the ability of the climate-based dengue models to describe associations between climate and dengue, simulate outbreaks by generalized additive models-GAMs [3].This model provides a flexible method for identifying nonlinear covariate effects in exponential family models and other likelihood-based regression models.For this, it used a degree of freedom estimate to assess the importance of covariates based on the expected decrease in the deviance due to smoothing, computable from the trace of the appropriate smoother matrix [4].
We assessed the potential contribution of climatic variables on Dengue Fever (DF) incidences based in GAM, according Hastie and Tibshirani (1990) [5], and we provided suggestions to improve their performance generated from the statistical analyses of the direct and indirect associations.Mordecai et al. (2017)  [6] used generalized models associated with R package visreg to understand the impact of temperature on transmission of Zika, dengue and chikungunya.Specifically, Oliveira (2016) [7] used visreg associated with GAM to understand the effect of temperature on ovulation by Aedes aegypti in Rio de Janeiro.It wasn't found in the literature, to date, a study involving GAM and visreg package associated with relative humidity.

Methods
To understand the risk of DF incidence by the occurrence limits parametrization Open Journal of Epidemiology c) Annual and monthly heat index, respectively, HI-a and HI-m; as well as, its Function (HI-f), both according to Steadman (1979) [11]; and d) Human comfort index (HCI), according to Rosenberg (1983) [12].
2) DF cases collected by the site SINAN-Net 2 , from the Departamento de Informática do Sistema Único de Saúde-DATASUS; and transformed into DF Incidence Rate; and 3) Annual population size for each of the nine capitals studied, collected by the Sistema de Recuperação Automática-SIDRA 3 , from the Brazilian Institute of Geography and Statistics-IBGE.
The monthly reporting DF cases were converted to DF incidence rates, which, according to the Ministry of Health [13], this is defined as the number of confirmed cases (classic and hemorrhagic DF), by 100,000 people in certain geographic space and the current year, and calculated according to Equation (1): number of confirmed dengue cases in residents DF incidence= 100, 000 Total resident population in the given period × DF incidence are classified by occurrence bands, as criteria of the National Program for Dengue Control-PNCD/MS [13], which considers: 1) low incidence = 0| … 100; 2) average incidence = 100| … |300; and 3) high incidence = 300 … ∞.
All analyses were conducted using the R-Project Software, Version 3.0.34 .

Generalized Additive Model
The class of models known as generalized linear models, or GLMs, was formally introduced by Nelder and Wedderburn (1972) [15].temporal (months and years of the time series data) and climatic variables ) a set of predictors or independent/explanatory variables, a regression procedure can be viewed as a method for estimating the expected value of Y given the values of i X .The standard linear regression model assumes a li- near form for the dependency, according Hair Jr. et al. (2005) [16], described as: where ( ) 0 Var ε σ = .Given a sample, estimates of 0 1 , , .p β β β … are usually obtained by the least squares method.
According Hastie and Tibshirani (1990) [5], GAM consist of a random component, an additive component, and a link function relating that two components, like generalized linear models (GLM).The response Y, the random component, is assumed to have exponential family density: where θ is called the natural parameter and ∅ is the scale parameter.The conditional mean μ of the response variable x to the linear predictor η is re- lated to the set of covariates i X by a link function g.The quantity: Open Journal of Epidemiology ( ) ( ) defines the additive component, where i f are smooth functions, and the rela- tionship between the conditional mean ( ) x µ and the linear predict ( ) where i f are smooth functions, ( ) 0 Var ε σ = .
In order to be estimable, the smooth functions i f have to satisfy standar- dized conditions such as GAM and GLM can be applied in similar situations, but they serve different analytic purposes.GLM emphasizes estimation and inference for the parameters of the model, while GAM focus on non-parametric data, and this is more suitable for exploring the data and visualizing the relationship between dependent and independent variables, considering the estimation of the smoothing terms i f in GAM, described in Equation ( 6) [4].

Smoothers
The spatial distribution was modeled using a bi-dimensional smooth function.A smoother is a tool for summarizing the trend of a response measurement Y as a function of one or more predictor measurements , , . An important property of a smoother is its nonparametric nature.It does not assume a rigid form for the dependence of Y on , , , producing an estimate of the trend that is less variable than Y itself, since of penalized least squares method.Each smoother i s is controlled by a single smoothing parameter, specificity in the model or choose it automatically by the generalized cross validation method [17] [18] [19].The GAMs used in this work included a set of directly observed covariates and an s spline smothing function, as depicted in the equations below: where i Y is the response variable, in this work the Dengue incidence simulated index, β 's are the slope coefficients of the model, so

Package "Visreg"
This interface was used in this work for visualize the fit of regression models arising from of GAM, as from constructing surface by cross-sectional plots using two-dimensional contour or perspective plots.In addition to estimates of this relationship, the package also provides pointwise confidence bands and partial residuals to allow assessment of variability as well as outliers and other deviations from modeling assumptions [21] [22].The contourlines with high relative risk of DF incidence (Dengue RR) presented in the "visreg" plot were identified on the maps and their climatic limits observed in the model parameterization were considered, as areas with high occurrences of dengue rates.

Pearson's Correlation Coefficient
Pearson correlation coefficient (r) [23] [24] [25] was used for measuring direction and degree of linear association between dengue and climatic variables, by each capital of the Brazilian Northest.According to Bewick et al. ( 2003) [26], r can be given by: where Pearson correlation coefficient or product moment correlation coefficient (r) is a measure of shared variance between two variables, i x and i y based in their averages X and Y , and their standard deviations S x and S y .The sign indicates a positive or negative direction of the correlation, and the value suggests the power of the relationship between the variables, which value r can vary from −1 to +1, indicating a perfect and very strong positive linear relationship (r = +1), a perfect and very strong negative linear relationship (r = −1), or no linear relationship (r = 0) between the variables [26] [27] [28].
GAM shows high significant (p-value < 0.001) association between DF incidence and relative humidity over a range of time-lags 0 -2, 4 -5 and 9, being the lag 2 the most significant, with the largest z-value (z = 6.802).Already, in São Luis-MA, the simulated GAM presents high significant level (p-value < 0.001) association between DF incidence and relative humidity over a range of time-lags 0 -3 and 11, being the lag 1 the most significant, with the largest z-value (z = 7.993).
Table 3 shows the adjust coefficients for GAMs simulated in lags 0 and 1 with DF Cases (using logarithmic function of the population) and DF Incidences, as- making the models with DF incidences more parsimonious and therefore more suitable for use [29].In Teresina-PI, the modeling by GAM with relative humidity over a time-lag 2 explain 82.3% of the deviance on DF incidences while São Luis-MA over a time-lag 1 explain 78.0% of the deviance on DF incidences, with significant effects in the adjust coefficients with low effective degree freedom, respectively, 6.067 and 7.276; and low estimate of the intercept and respective z-value, making it the best simulated model.In the lag 0 (no lag effect), both models presented the best estimate and z-value, although they had the lowest R-adjusted between the variables measured, 0.699 in both.
Figure 2 shows the distribution of DF incidence and relative humidity as     Ehelepola and Ariyaratne (2016) [31] evidenced in their study a median increase in 7x of dengue incidences, for a relative humidity of 86%, according to figure 6 of that study.Neto and Rebêlo (2004) [32], studying the association between dengue cases and climatic variables in São Luis-MA, from 1997 to 2002, verified that dengue cases are directly related to the increase of precipitation and relative humidity, with a positive correlation this variable of 76.0% (r = 0.76; p < 0.05).In addition, the authors identified peaks of relative humidity in the months of March and April, an average variation from 85.6% (March 1999) to 89.3% (April 1997), while the highest percentage of dengue was presented in May, with an average percentage of 20.2% of cases recorded.While in this study, we identify an r-adjusted between these variables of 0.75 for a −1 month lag of relative humidity in relation to dengue.

Conclusions
The formulation of GAM model is nearly exactly the same as for GLM.These models use all the same families and link functions; but GAM is wrapping the predictors in a non-parametric smoother function, in this paper, specifically, the s spline.The GAM fit is more sensitive to minimizing deviance (higher wiggliness) than the default fit of the loess function.This model is also able to minimize deviance based on the logit transformation.The model output shows that an overall (parametric) intercept is fitted (the mean) on the scale of the logit transformation (logarithmic population of the capitals studied).
Modeling by GAM, assuming a Poisson distribution, explained 82.3% of the deviance of DF incidences, and significant effects were found in the estimates of all climatic variables on dengue; however, the high values of the effective degrees of freedom (edf) of smooth functions indicate that the association between dengue and climate is highly nonlinear.The estimate initially found, by the GLM and GEEGLM models for these studied variables, was too high, indicating the overdispersion data, however regressions by GAM reduced significantly excess How to cite this paper: da Silva, J.C.B., Karam, H.A. and Machado, C.J.S. (2018) Visualizing Fit between Dengue and Climatic Variables on Capitals of the Brazilian Northeast Region by Generalized Additive Models.Open Journal of Epidemiology, 8, 259-275.https://doi.org/10.4236/ojepi.2018.84020Open Journal of Epidemiology world.Dengue flourishes in urban poor areas, suburbs and the countryside but also affects more affluent neighbor hoods in tropical and subtropical countries.Dengue is a mosquito-borne viral infection causing a severe flu-like illness and, sometimes causing a potentially lethal complication called severe dengue.The incidence of dengue has increased 30-fold over the last 50 years.Up to 50 -100 million infections are now estimated to occur annually in over 100 endemic countries, putting almost half of the world's population at risk [1].

Ferreira
et al. (2017) [8] used (Logistic Regression and) GAM associated to Binomial Negative distribution and model offset to understand DF cases relationship meteorological variables, specifically, temperature, rainfall and humidity.This work aims to identify the risk of DF incidence by the occurrence limits parametrization of climatic variables as a function of the time (months and years), in capitals of the NEB, from January 2001 to December 2012, as from visualizing the fit of regression models arising from of GAM, assuming Poisson Distribution, by cross-sectional plots using two-dimensional contour, by "visreg" package function.
of climatic variables on capitals of the NEB, we conducted the GAM analysis by average monthly data observed from 9 capitals of Brazilian Northeast (NEB), in the period of January 2001 to December 2012.These capitals and their respective codes of the Federative Unit (referring to their States): Aracaju-SE, Fortaleza-CE, João Pessoa-PB, Maceió-AL, Natal-RN, Recife-PE, Salvador-BA, São Luís-MA and Teresina-PI, according Figure 1.The data were provided by: 1) Climatological variables: rainfall, in mm, (PRP); minimum, average and maximum temperature, in ˚C, (respectively, T-min, T-mean and T-max); relative humidity, in % (RH); all collected by Meteorological Databank for Education and Research-BDMEP 1 , from the National Institute of Meteorology-INMET.From these variables collected, we calculated: a) Vapour pressure deficit (VPD) and saturated vapour pressure deficit (SVPD), all according Allen et al. (1998) [9]; b) Evapotranspiration of Reference (ETO), according Thornthwaite (1948) [10];
most commonly used link function is the canonical link, for which η θ = .Assuming that μ(x), is the mean of the Poisson distribution, the dependence of μ(x) and independent variables i X , the link function for the Poisson model is the log function ( 1986, 1990)[4] [5], the generalized additive model (GAM) fits a response variable Y by a sum of smooth functions of the explanatory variables, i X for i = 1, ..., p by modeling the dependency as: suming a Poisson distribution, in Teresina-PI and São Luis-MA, in the period from 2001 to 2012.The largest effective degree freedom (edf) values in DF cases simulations indicate nonlinear data when compared to DF incidences.Already, high values of the mean square error (Chi.sq), also simulated with those cases, characterize the overdispersion data.Although the models with DF cases have better fit of the explained deviance; however, your BIAS are extremely high,

Figure 2 .
Figure 2. Seasonality effect, by boxplot, on DF incidence (DF Cases/100.000inhabitants) in function of months (A and E) and years (B and F); and relative humidity (%) in function of months (C and G) and years (D and H), in Teresina-PI and São Luis-MA, respectively, from January 2001 to December 2012.

Figure 3
Figure3shows the visualization of the effect of simultaneous variance between relative humidity and time (months and years, Figure3(A) and Figure3(B), respectively), in relationship to DF incidence risk (Dengue RR), simulated on DF incidences, from January 2000 to December 2012, by "visreg" function on simulated regression GAM using penalized s splines smoothing, in São Luís-MA.

Figure 3 (Figure 3 .
Figure 3(A) shows a large nucleus limited to between 87.0% and 90.0%relative humidity between August and October months, with high relative risk Dengue (RR = 5.0), that is, high DF incidences.Comparing this figure to contour in function of the years, Figure 3(B), 3 nuclei are observed, characterizing the years of highest DF incidences, being 83.0% and 90.0% of relative humidity range highly significant to occurrences those incidences (RR ≥ 4.0).
All numerical output of GAM and respective intercepts in the capitals of the NEB have obtained setting on p-value of 0.001.The capital of João Pessoa-PB is the one with the smaller values of mean squared error (Chi.sq); in other words, the values of the estimated parameters of climatic variables around the true value of DF cases present a greater accuracy and precision in the quality of response by GAM.AccordingBolker (2008) [29], the mean square error represented by the sum of the variance and bias square indicates the quality of an estimator and shows the total change around a true value, in this study, the DF cases.We found a high correlation of DF incidence with relative humidity is lagged in 1 and 2 months, respectively, in São Luis and Teresina cities. Wu et al. (2007)[30] observed cross-correlation with statistical significance between DF incidence and relative humidity over a range of time-lags from −1 to −3 months, above all the most dominant effect at a lag −2 months (r = 0.202, p < 0.005).

Table 1 .
Table 1, it is observed the Pearson's correlation coefficient (r) with respective Pearson correlation coefficient (r) with respective 95% confidence interval (CI) and p-value, between DF cases and climatic variables, by capital of the NEB.suggesting that there are other factors involved in the increase of their DF cases.Table2presents the parametric coefficients of GAM between DF incidence and relative humidity over the period of one year (0 -12 time-lags), using temporal variables (months) in Teresina-PI and São Luis-MA cities.In Teresina-PI, (p-value < 0.001) observed to seven of 12 climatic variables in relationship DF cases.Already, in Aracaju-SE, Recife-PE and Salvador-BA, there are the lower absolute mean correlations and respective no significance p-value observed,

Table 2 .
Parametric coefficients of GAM between DF incidence and relative humidity over the period of one year (0 -12 time-lags), using temporal variables (months) with s term splines smooth, in Teresina-PI and São Luis-MA, in the period from 2001 to 2012.