Assessing the Effect of Climate Factors on Dengue Incidence via a Generalized Linear Model
1. Introduction
Dengue is a mosquito-borne viral disease transmitted by female Aedes aegypti mosquitoes, the same species that spreads the Zika and chikungunya viruses. The dengue virus belongs to the Flaviviridae family. Four distinct serotypes of the virus cause dengue, namely DEN-1, DEN-2, DEN-3, and DEN-4 [1]. People who are reinfected with a different serotype are at high risk of developing more severe diseases such as dengue hemorrhagic fever (DHF) and dengue shock syndrome (DSS) [2] [3]. According to Bakshi [2], DHF and DSS exhibit symptoms distinct from those of classic dengue fever. Classic dengue fever presents with rash, joint, muscle, and back pain, and high body temperature, while DHF and DSS display signs of bleeding under the skin, abdominal pain, severe vomiting, and sudden shock. DSS appears to affect mainly children under the age of 10.
Dengue is a rapidly spreading disease that has become a primary concern for international health bodies, especially the World Health Organization (WHO). According to the WHO, dengue is most widely reported in Asia-Pacific countries, including Bangladesh, Malaysia, the Philippines, and Vietnam, as well as in South America [4] [5]. Nearly 40% of the world’s population is now at risk of dengue fever [6]. Countries with a subtropical climate are the most suitable areas for the reproduction of the dengue vector. Dengue has increased rapidly with urbanization in tropical and subtropical countries such as Malaysia [7]. It was first reported in Malaysia in early 1902 [8], followed by dengue hemorrhagic fever in Penang in 1962 [9] [10].
Climate change is the global phenomenon of climate transformation characterized by changes in rainfall, temperature, wind patterns, humidity, and other climate measures that occur over several decades or longer. Variations in these climate factors could affect the incidence of dengue disease. Climate factors play a significant role in transmitting dengue disease [11] [12]. Researchers have found that rainfall can influence the transport and spread of the infectious agent, while temperature stimulates its growth and survival [13] [14] [15]. In general, climate conditions have an important effect on the life cycle of the dengue vector. According to Liu-Helmersson et al. [16], temperature fluctuations can significantly affect dengue transmission and the Aedes mosquito population. The size and density of the larva population are also determined by the amount and pattern of rainfall [16] [17]. Environmental conditions such as warm temperatures and high humidity promote the growth of the larva population. Dengue incidence recorded in Asia-Pacific countries such as Thailand [18] and Singapore [19] was also associated with temperature, relative humidity, and rainfall. Together, these findings show that climate conditions play a significant role in the transmission of dengue fever.
Dengue epidemics tend to have a seasonal pattern, with transmission often peaking or increasing during rainy seasons. Humidity, rainfall, wind speed, and ambient temperature can affect the distribution, reproduction, and feeding pattern of the mosquito population. Aedes mosquitoes are living organisms that can adapt rapidly to a new environment, which makes it challenging to monitor vector populations and interrupt transmission. The Aedes life cycle progresses rapidly when heavy rainfall leads to a high humidity level.
For most climate-sensitive infectious diseases, climate change may impact the whole process of a disease’s development [20], including the survival, reproduction, or distribution of disease pathogens and their hosts, as well as the availability and means of such pathogens’ transmission environments. Modelling the vector-borne disease was challenging for several researchers [21] [22] due to the complex factors included in the interaction between vector and host dynamics, immunity level, or vaccination [23]. Ten Bosch et al. [24] used mathematical models by considering six separate dengue models with important dengue characteristics, such as cross-immunity, antibody-dependent enhancement, and seasonal power. In their analysis, pattern-oriented modelling strategies were adopted to reflect dengue dynamics such as multi-annual variability and mean period between peaks. In the other study by Atique et al. [25], time series models were used to examine associations between recorded dengue cases and climatic parameters. The authors found that regional climate phenomena played a role in the transmission of dengue. In a recent analysis by Jayaraj et al. [26], a multivariate Poisson Regression and Seasonal Autoregressive Integrated Moving Average (SARIMA) predictive model was developed to predict future outbreaks in Tawau, Malaysia. The model developed in their study demonstrated the ability to predict future dengue outbreaks for one to four months in advance.
However, some of the models mentioned are subject to constraints, such as the inability to include non-continuous variables, to account for nonlinear relationships, or to handle non-Gaussian error distributions. Hence, a generalized linear model (GLM) is more appropriate for dealing with such data. A GLM can accommodate non-normal responses such as binary or count data [27]. Previous studies have shown that a GLM with the Poisson family can suffer from over-dispersion, in which the assumption of equality of mean and variance is violated [27] [28]. Over-dispersion can be addressed using the Negative Binomial or Quasi-Poisson family [28]. However, the two approaches cannot be ranked in general, because their results depend on the type of study and the context [29] [30].
Dengue epidemics have long been observed in Malaysia. Despite various attempts by the Ministry of Health to control the disease, the number of cases has continued to increase. How local climate variation is associated with the rapid increase in dengue incidence is not well understood. Moreover, climate variables in a dataset are commonly collinear. Thus, the primary goal of this study is to investigate the effect of climate factors on dengue incidence using a generalized linear model while accounting for the collinearity between climate factors.
2. Study Area and Data
Petaling Jaya is the largest city in Selangor, situated in the western part of Peninsular Malaysia, with a surface area of approximately 97.2 km². It is recognized as a satellite town of Kuala Lumpur. Petaling Jaya has an equatorial climate and is heavily influenced by monsoons. The city has two rainy seasons, aligned with the Southwest monsoon from May to August and the Northeast monsoon from November to February. The average annual rainfall in Petaling Jaya is approximately 2438 mm, with an average temperature of 27˚C. Petaling Jaya is densely populated, with around 217,700 inhabitants in 2017.
Data on dengue cases for the year 2014 were retrieved from the Ministry of Health Malaysia (MOH) through the Malaysian open data portal, known as Data Mampu, at data.gov.my. Climate factors, namely temperature, rainfall, humidity, and wind speed, were used as explanatory variables. These climate data were obtained from the Malaysian Meteorological Department. Temperature is one of the principal thermodynamic parameters that shape the ecological system. The minimum, maximum, and mean temperatures were recorded daily. Daily rainfall is the amount of rain, in millimetres, collected over 24 hours. Daily relative humidity is defined as the ratio of the partial pressure of water vapour in the air to the saturation vapour pressure of water at a given temperature. It is measured with a hygrometer and expressed as a percentage (%). Daily wind speed describes the movement of air around the weather station. This study used weekly dengue cases and weekly climate data. Table 1 lists the response and the explanatory variables.
3. Statistical Methodology
3.1. Multicollinearity Detection
Multicollinearity is a common issue in linear regression. It occurs when two or more regressors are highly correlated. Under perfect collinearity, the model’s parameters become indeterminate and the standard errors of the estimates become infinitely large. The R package “mctest” was used to detect multicollinearity between variables. It computes both overall and individual multicollinearity diagnostic measures. The overall diagnostics consider several measures, such as the determinant and the Chi-squared statistic. The individual diagnostics are used to identify collinearity among specific independent variables. The present study focuses only on a commonly used indicator, the Variance Inflation Factor (VIF). The higher the VIF, the more collinear the regressor is with the others. As a rule of thumb, if the VIF of a variable exceeds 10, the variable is highly collinear, implying that the jth variable’s squared multiple correlation $R_j^2$ exceeds 0.90. The package also indicates which regressors may be the source of the collinearity.
The formula for VIF is written as follows:
$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2}, \qquad (1)$$
where j indexes the independent variables in the model, and $R_j^2$ is the unadjusted coefficient of determination from regressing the jth independent variable on the remaining ones.
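As an illustration of Equation (1), the sketch below computes VIFs from the inverse of the regressor correlation matrix, whose jth diagonal element equals $1/(1 - R_j^2)$. It is written in plain Python and is not the R package used in the study. The temperature and wind series are made-up values and the function names are hypothetical, so it should be read as a minimal sketch under those assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """VIF of each regressor: diagonal of the inverse correlation matrix."""
    k = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    inv = invert(corr)
    return [inv[j][j] for j in range(k)]

# Hypothetical weekly values, NOT data from the study.
tmax = [32.1, 33.0, 31.5, 34.2, 32.8, 33.6, 31.9, 34.0]
tmin = [24.0, 24.6, 23.5, 25.1, 24.9, 24.2, 23.8, 25.3]
tmean = [28.2, 28.7, 27.4, 29.8, 28.7, 29.1, 27.9, 29.5]
wind = [1.8, 1.2, 2.1, 1.5, 1.1, 1.9, 1.6, 1.3]

vifs = vif([tmax, tmin, tmean, wind])
for name, value in zip(["tmax", "tmin", "tmean", "wind"], vifs):
    flag = "highly collinear" if value > 10 else "acceptable"
    print(f"{name}: VIF = {value:.2f} ({flag})")
```

Because the made-up mean temperature is almost the average of the maximum and minimum temperatures, its VIF lands well above the threshold of 10, mirroring the kind of redundancy the rule of thumb is meant to flag.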
3.2. Generalized Linear Model
A Poisson GLM is used in this study since the dengue cases are count data. A GLM is appropriate when the response variable is not normally distributed and consists of non-negative integers. The regression analysis assumes that the response variable Y follows a distribution from the Poisson family. Let $Y_1, \ldots, Y_n$ be independent random variables with $Y_i \sim \mathrm{Poisson}(\mu_i)$, where $Y_i$ is the dependent variable representing the number of dengue cases in week i of a given period and $\mu_i$ represents both the mean and the variance of $Y_i$. The Poisson probability distribution is given as
$$P(Y_i = y_i) = \frac{e^{-\mu_i}\, \mu_i^{y_i}}{y_i!}, \qquad (2)$$
where $y_i = 0, 1, 2, \ldots$ and $\mu_i > 0$.
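The equality of mean and variance under the Poisson distribution can be checked numerically. The sketch below evaluates the Poisson probability function on the log scale for numerical stability and sums over a truncated support. The mean of 12 cases per week is a hypothetical value chosen for illustration, not an estimate from the study.

```python
import math

def poisson_pmf(y, mu):
    """P(Y = y) for a Poisson variable with mean mu, computed via logs."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

mu = 12.0               # hypothetical mean weekly case count
support = range(0, 200)  # truncation point; the tail beyond it is negligible here
probs = [poisson_pmf(y, mu) for y in support]

mean = sum(y * p for y, p in zip(support, probs))
variance = sum((y - mean) ** 2 * p for y, p in zip(support, probs))
print(f"mean = {mean:.4f}, variance = {variance:.4f}")
```

When the sample variance of observed counts is far larger than their sample mean, this equality is violated and the data are said to be over-dispersed.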
A generalized linear model [31] is built on a linear predictor that can be written as:
$$\eta_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}, \qquad (3)$$
where the linear predictor $\eta_i$ is a linear combination of the explanatory variables $x_{i1}, \ldots, x_{ip}$ and represents the linear-model component. The relationship between the mean of the response, $\mu_i$, and the linear predictor, $\eta_i$, is specified by a link function, denoted as $g(\cdot)$. The link function is given as
$$g(\mu_i) = \eta_i. \qquad (4)$$
For the Poisson family, the mean $\mu_i$ is expressed as a function of the independent variables via a log link function, as stated below:
$$\log(\mu_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}. \qquad (5)$$
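One consequence of the log link is that coefficients act multiplicatively on the expected count: raising a regressor by one unit multiplies the mean by the exponential of its coefficient. The sketch below illustrates this with hypothetical coefficients and covariate values, none of which are estimates from the study.

```python
import math

def expected_cases(beta, x):
    """Inverse log link: mu = exp(beta_0 + sum_j beta_j * x_j)."""
    eta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return math.exp(eta)

# Hypothetical intercept and coefficients for humidity (%) and wind speed.
beta = [3.0, -0.02, -0.5]

mu_base = expected_cases(beta, [80.0, 1.5])
mu_windier = expected_cases(beta, [80.0, 2.5])  # wind speed raised by one unit

rate_ratio = mu_windier / mu_base  # equals exp(beta for wind speed)
print(f"rate ratio for +1 unit of wind speed: {rate_ratio:.4f}")
```

A negative coefficient therefore corresponds to a rate ratio below one, which is how a negative relationship between a climate factor and dengue incidence would show up on the count scale.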
The key assumption of the Poisson GLM is the equality of the mean and variance [32]. Negative binomial regression is a generalization of Poisson regression that relaxes this restrictive assumption. It is based on the Poisson-gamma mixture distribution.
The negative binomial model for count data can therefore be used to deal with over-dispersion. For fixed covariates, the Poisson mean is allowed to vary across observations i according to a gamma distribution with mean $\mu_i$ and a fixed shape parameter $\theta$, which reflects the additional variation due to heterogeneity in the context of GLM. The expected value of the response is $E(Y_i) = \mu_i$, and the variance function $V(\mu_i)$ can be written as follows
$$V(\mu_i) = \mu_i + \frac{\mu_i^2}{\theta}, \qquad (6)$$
where $\theta$ acts as a dispersion parameter.
As a result, the density function $f(y_i; \mu_i, \theta)$ is given as
$$f(y_i; \mu_i, \theta) = \frac{\Gamma(y_i + \theta)}{\Gamma(\theta)\, y_i!} \cdot \frac{\mu_i^{y_i}\, \theta^{\theta}}{(\mu_i + \theta)^{y_i + \theta}}, \qquad (7)$$
where $\Gamma(\cdot)$ denotes the gamma function, which reduces to a factorial for integer arguments [33].
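To make the role of the shape parameter concrete, the sketch below evaluates the standard negative binomial probability function in its mean and shape parameterization and confirms numerically that the variance equals $\mu + \mu^2/\theta$. The symbols follow that common parameterization, which is assumed here to be the one intended in the text, and the values of the mean and shape are hypothetical.

```python
import math

def nb_pmf(y, mu, theta):
    """Negative binomial probability with mean mu and shape theta, via logs."""
    log_p = (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
             + y * math.log(mu) + theta * math.log(theta)
             - (y + theta) * math.log(mu + theta))
    return math.exp(log_p)

def nb_variance(mu, theta):
    """Variance function V(mu) = mu + mu^2 / theta."""
    return mu + mu ** 2 / theta

mu, theta = 12.0, 3.0     # hypothetical mean and shape
support = range(0, 1000)  # truncation point; the tail beyond it is negligible here
probs = [nb_pmf(y, mu, theta) for y in support]

mean = sum(y * p for y, p in zip(support, probs))
variance = sum((y - mean) ** 2 * p for y, p in zip(support, probs))
print(f"mean = {mean:.3f}, variance = {variance:.3f}, "
      f"V(mu) = {nb_variance(mu, theta):.3f}")
```

Smaller values of the shape parameter give more over-dispersion, while very large values make the variance approach the mean, recovering the Poisson case.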
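For the GLM, the deviance takes the form of a likelihood ratio, which is defined as
$$D = 2\left[\ell(y; y) - \ell(\hat{\mu}; y)\right], \qquad (8)$$
where $\ell(y; y)$ is the log-likelihood of the saturated model and $\ell(\hat{\mu}; y)$ is the log-likelihood of the fitted model.
For models estimated by maximum likelihood, one way of comparing non-nested models is through the Akaike Information Criterion (AIC), which is based on the fitted log-likelihood function and is written as
$$\mathrm{AIC} = -2\,\ell(\hat{\beta}) + 2k, \qquad (9)$$
where $\ell(\hat{\beta})$ is the maximized log-likelihood and k is the number of estimated parameters.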
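The deviance and AIC can be sketched for the Poisson case as follows. The counts and fitted means are hypothetical values, and the parameter count passed to the AIC function is likewise an assumption made for illustration only.

```python
import math

def poisson_loglik(y, mu):
    """Poisson log-likelihood for counts y and means mu (all mu > 0)."""
    return sum(-m + yi * math.log(m) - math.lgamma(yi + 1)
               for yi, m in zip(y, mu))

def poisson_deviance(y, mu):
    """Twice the gap between the saturated and the fitted log-likelihood."""
    total = 0.0
    for yi, m in zip(y, mu):
        # y * log(y / mu) is taken as 0 when y == 0
        term = yi * math.log(yi / m) if yi > 0 else 0.0
        total += term - (yi - m)
    return 2.0 * total

def aic(loglik, k):
    """Akaike Information Criterion: -2 * log-likelihood + 2 * k."""
    return -2.0 * loglik + 2 * k

y = [3, 1, 7, 12, 5]                 # hypothetical weekly counts
mu_hat = [4.0, 1.5, 6.5, 10.0, 5.5]  # hypothetical fitted means

ll = poisson_loglik(y, mu_hat)
print(f"deviance = {poisson_deviance(y, mu_hat):.4f}")
print(f"AIC with k = 2 parameters = {aic(ll, 2):.4f}")
```

When two candidate models are fitted to the same data, the one with the smaller AIC is preferred, which is the comparison used between the Poisson and Negative Binomial fits.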
4. Results and Discussion
This section examines the collinearity among studied variables, followed by Poisson GLM and negative binomial analysis.
4.1. Testing for Multicollinearity
Based on Table 2, the value of the standardized determinant is 0.000. Besides, the calculated value of the Chi-square test statistics was recorded as 559.409, resulting in high significance that implies the presence of multicollinearity in the model specification.
Table 2. Overall multicollinearity diagnostics.
According to Imdadullah et al. [34], collinearity among regressors exists if the standardized determinant is close to zero. The same holds for the Chi-square test: collinearity exists among regressors if the calculated statistic exceeds the tabulated critical value, $\chi^2_{\text{calc}} > \chi^2_{\text{tab}}$. Hence, collinearity among regressors is present, as the Chi-square statistic exceeds the critical value at the 5% significance level, i.e., 559.409 > 24.996.
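For orientation, the overall Chi-square diagnostic can be sketched with the textbook Farrar-Glauber statistic, which is computed from the determinant of the regressor correlation matrix. That the package uses exactly this form is an assumption here, and the determinant and sample size below are hypothetical. With six regressors the test has 15 degrees of freedom, which is consistent with the 5% critical value of 24.996 quoted above if the six climate variables of Section 2 are the regressors.

```python
import math

def farrar_chi_square(det_r, n, p):
    """Farrar-Glauber statistic and its degrees of freedom.

    det_r: determinant of the p x p correlation matrix of the regressors
    n:     number of observations
    p:     number of regressors
    """
    if not 0.0 < det_r <= 1.0:
        raise ValueError("determinant of a correlation matrix must lie in (0, 1]")
    stat = -(n - 1 - (2 * p + 5) / 6.0) * math.log(det_r)
    df = p * (p - 1) // 2
    return stat, df

# Hypothetical inputs: 52 weekly observations, six regressors.
stat, df = farrar_chi_square(det_r=0.01, n=52, p=6)
print(f"chi-square = {stat:.3f} on {df} degrees of freedom")
```

A determinant near zero makes the logarithm strongly negative and the statistic large, so the two overall diagnostics point in the same direction.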
The individual collinearity diagnostics are shown in Table 3. A variable with a VIF value exceeding 10 is considered highly collinear. Table 3 shows that three independent variables, maximum temperature (X2), minimum temperature (X3), and mean temperature (X4), have VIF values exceeding 10, which indicates that these three variables are highly correlated.
Among the variables with VIF above 10, the present study removes those that logically carry redundant information. Since X2 and X3 supply information similar to the mean temperature (X4), these two variables are removed as the remedial measure, and the mean temperature is retained in the analysis. After removing these variables, the VIF values fall below 10. Figure 1 plots the individual VIF diagnostics before and after the remedial measures. The red dashed line marks the default threshold.
4.2. Poisson and Negative Binomial GLM
Symptoms of dengue fever, such as sudden high fever or prolonged fever, severe joint or muscle pain, skin rash, and nausea, can be detected within three to seven days. Skin rashes can appear two to five days after the onset of fever. According to the WHO, a time lag of one week (seven days) is relevant for studies of dengue fever. Therefore, a lag of one week is chosen for the model and used throughout the analysis. A GLM using the Poisson and Negative Binomial families was applied to determine the significant relationships between dengue cases and climate factors. Table 4 shows that the AIC values for the Poisson GLM are high relative to those for the Negative Binomial, that is, the AIC values are reduced when a Negative Binomial is used. Hence, the Negative Binomial model is preferred over the Poisson GLM when the dataset is over-dispersed.
Table 3. VIF values before remedial measures.
Figure 1. Individual VIF multicollinearity diagnostics: (a) VIF values of all regressors before remedial measures; (b) VIF values of all regressors after remedial measures.
Table 4. AIC values for a time lag of one week.
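The one-week lag described above amounts to pairing the dengue cases of each week with the climate values of the preceding week. A minimal sketch of that alignment is given below. The function name and the numbers are hypothetical, and the study's own data handling may differ.

```python
def lag_align(cases, climate, lag=1):
    """Pair cases in week t with the climate value from week t - lag."""
    if len(cases) != len(climate):
        raise ValueError("series must have the same length")
    if lag < 0 or lag >= len(cases):
        raise ValueError("lag must be between 0 and len(cases) - 1")
    return list(zip(cases[lag:], climate[:len(climate) - lag]))

# Hypothetical weekly series, NOT data from the study.
weekly_cases = [10, 20, 30, 40]
weekly_rain = [1.0, 2.0, 3.0, 4.0]

pairs = lag_align(weekly_cases, weekly_rain, lag=1)
print(pairs)
```

Note that lagging by one week drops the first week of cases and the last week of climate data, so the model is fitted on one fewer observation than the raw series contains.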
A correlation analysis was carried out to investigate whether the climate factors, namely rainfall, temperature, humidity, and wind speed, significantly affected dengue incidence. The analysis was done with the R package “mctest”. Based on Table 5, all regressors appear to be negatively correlated with dengue cases. Some variables are significant, while others are not, based on the p-values. Variables X4 and X5 are highly correlated with each other, since a correlation above the threshold of 0.7 may be considered collinear. These two regressors may be the source of the collinearity among regressors, as suggested by the “mctest” package. Hence, we considered three separate models using a Negative Binomial GLM.
Table 5. Correlation analysis between the dependent variable, each regressor and their p-values.
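The first model consists of all four regressors, the second model excludes X4, and the third model excludes X5, with the other variables unchanged. With the remaining regressors X1, X4, X5, and X6 and a log link, the GLM models can be written as follows
$$\text{Model 1: } \log(\mu_i) = \beta_0 + \beta_1 X_1 + \beta_4 X_4 + \beta_5 X_5 + \beta_6 X_6,$$
$$\text{Model 2: } \log(\mu_i) = \beta_0 + \beta_1 X_1 + \beta_5 X_5 + \beta_6 X_6,$$
$$\text{Model 3: } \log(\mu_i) = \beta_0 + \beta_1 X_1 + \beta_4 X_4 + \beta_6 X_6.$$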
Using the time-lapse of one week, each model’s AIC values will then be compared to each other. The results of AIC values for all three models are displayed in Table 6. It can be seen that the second model provides a smaller AIC value, followed by the third model. Hence, the second model, including humidity, wind speed, and rainfall, is the optimal model. These variables have the greatest significant impact on dengue incidence.
Table 7 presents the significance values of the climate variables on dengue incidence. All the climate factors had a significant effect on dengue incidence. The results show that the p-values of the Chi-square distribution were less than a 5% significance level, which is consistent with the early assumption that rainfall, humidity, temperature, and wind speed might influence the incidence of dengue in Petaling Jaya.
The optimal model, which includes wind speed, humidity, and rainfall, yields a negative relationship between each of these factors and dengue incidence. These findings indicate that low humidity, low rainfall, and low wind speed influence Aedes survival, contributing to an increase in dengue incidence. Of all climate factors, wind speed has the strongest influence on dengue incidence. Wind speed has a complex effect on the environmental factors of dengue infection, interacting with them both directly and indirectly. Wind speed is inversely associated with dengue cases [35].
Table 6. AIC values for all models of dengue incidence in Petaling Jaya.
Table 7. Parameter estimation for all models and their corresponding probability values.
***Statistically significant with p-values < 0.05.
The third model also indicates that temperature significantly influences the rise of dengue incidence, because mosquitoes need an appropriate temperature to develop rapidly from egg to adult. Temperature affects the potential spread of the dengue virus through each stage of the mosquito’s life cycle. Low temperatures adversely affect the survival of adult and immature Aedes. In contrast, high temperatures may help the eggs or larvae survive in the winter seasons.
Figure 2 displays the relationship between dengue incidence and climate factors graphically. Figure 2(a) illustrates the relationship between the number of dengue cases and mean rainfall, while Figure 2(b) shows the number of dengue cases against mean humidity. Although rain generally promotes breeding, heavy rain or storms may damage existing breeding sites and interrupt the development of mosquito eggs or larvae. In other words, heavy rainfall flushes eggs and larvae awaiting development out of the pools. An increase in rain is usually associated with a rise in humidity, which stimulates the growth and survival of Aedes.
Figure 2(c) shows that the growth of Aedes mosquitoes can be affected under both high and low temperatures [16] [36]. Aedes mosquitoes can stay active and keep laying eggs during rainy seasons when the surrounding temperature is favourable. High temperature can affect the mosquitoes’ mating behaviour and reproductive activity, while low temperature helps their survival. Based on the parameter estimates in Table 7, temperature has a negative effect on dengue incidence. Low wind speed is typically accompanied by high dengue cases, which supports the finding that low wind speed favours the rapid growth of mosquitoes [35]. This is consistent with the plot in Figure 2(d). Wind tends to suppress mosquito flight, which affects their oviposition. Increasing wind speed generally reduces mosquito flight, with flight inhibition thresholds on the order of 1 - 4 metres per second.
Based on the plots in Figures 2(a)-(d), there may be a seasonal pattern of dengue cases in Petaling Jaya. An upward trend in dengue incidence was detected at the beginning and the end of the year, whereas a decreasing trend was observed in the middle of the year. It is possible that the monsoonal flow influences the pattern of dengue incidence in Petaling Jaya.
5. Conclusions
This study focused on how climate factors affect dengue incidence in Petaling Jaya, Selangor, the second-largest city in Malaysia. The climate factors, including temperature, humidity, rainfall, and wind speed, were significantly associated with dengue cases. Since climate factors are highly correlated with each other, the present study applied several collinearity diagnostic measures through the R package “mctest”. The package helps to identify which regressors cause the collinearity.
Wind speed and temperature were the two regressors that caused the collinearity. Hence, this study proposed three models: one including all regressors, and two excluding either temperature or wind speed. The regressors suggested by “mctest” were included in the GLMs. The presence of over-dispersion in the data set was also considered. The negative binomial was applied for all three GLMs since it provided a smaller AIC value than the Poisson GLM. A lag of one week (seven days) was used throughout the analysis since this lag is relevant to dengue cases.
Rainfall, temperature, humidity, and wind speed are the climate factors that significantly affect dengue incidence in Malaysia and worldwide, particularly in countries with a subtropical climate. A negative effect of climate factors on dengue incidence was identified, with the most prominent effect contributed by wind speed. Low humidity, low rainfall, and low wind speed influence Aedes survival, contributing to dengue incidence. Therefore, the health authorities can strengthen surveillance by enhancing early disease detection, planning accurately, and improving health care financing mechanisms.
Climate factors are not the only predictors that influence the rise in dengue incidence. Future studies should also consider socio-demographic components such as population growth, travel, or migration rate in relation to dengue incidence. This study is limited to one area; hence, several areas representing various geographical regions should be considered in the future.
Acknowledgements
The authors would like to express their gratitude to the Ministry of Higher Education (MOHE) for the funding given under the Fundamental Research Grant Scheme (FRGS/1/2020/STG06/UTM/02/3) under vote 5F311. We are also grateful to Universiti Teknologi Malaysia for supporting this project with Research University Grant (QJ130000.3854.19J58).