Temporal Modelling of Dengue Fever in Côte d’Ivoire: Performance of the GAM and ARMAX Models with and without Climate Lag
1. Introduction
Dengue fever (dengue) is an infectious disease that occurs mainly in tropical and subtropical regions [1]. It is caused by the dengue virus (DENV), a member of the Flaviviridae family and the Flavivirus genus, which is transmitted by mosquitoes of the Aedes genus, mainly Aedes aegypti and Aedes albopictus.
Dengue affects more than 50 million people annually, while around two and a half billion people worldwide are at risk of infection [2]. Dengue fever is estimated to be responsible for 10,000 deaths in more than 125 nations [3]. Current statistical forecasts estimate that by 2080, 60% of the world’s population will be exposed to dengue fever [4]. Africa is one of the regions most affected by arboviruses such as dengue [5]. In 2023, 171,991 cases of dengue fever and 753 deaths were reported in the region [5]. Circulation of the virus has been confirmed in more than 30 countries, with outbreaks in 15 countries, including Côte d’Ivoire, Burkina Faso and Nigeria. As of 19 December 2023, 11 countries were still experiencing an outbreak. Burkina Faso was the worst affected country, with 146,878 suspected cases and 688 deaths, giving a case-fatality rate of 0.5% [5].
In 2024, dengue fever continues to represent a major threat to public health in Africa. Between week 1 and week 50, 176,481 cases were reported in 15 member countries of the African Union, including 30,324 confirmed cases, 25,249 probable cases and 120,908 suspected cases, with 136 deaths recorded (case-fatality rate: 0.08%). Burkina Faso remains the worst affected country, with 102,849 cases and 99 deaths, followed by Cape Verde and Mali [6].
In Côte d’Ivoire, although the number of cases is relatively low (39 cases reported without deaths over the period), the continued presence of the virus confirms its circulation in the country. This epidemiological context justifies the need to strengthen surveillance systems and to study the environmental and climatic factors influencing the dynamics of dengue transmission.
Numerous mathematical and statistical models have been developed to predict the onset, dynamics and scale of dengue epidemics, combining environmental and biological approaches [7]. Among the most widely studied factors are climatic variables, which are thought to play a key role in triggering epidemics [7]. Research carried out in Indonesia, Singapore, Mexico, Puerto Rico, Taiwan Region and Thailand has highlighted the impact of high temperatures, humidity and heavy rainfall on the increase in dengue cases [8]-[16].
In addition, various statistical models have been used around the world to analyse the relationship between climatic factors and dengue fever. According to a study carried out on this subject, predictive modelling approaches fall into two main categories: statistical models (60.6%) and machine learning techniques (39.4%) [17]. The most frequently used statistical models include time series and autoregressive models (26.7%), linear regression models (18.3%), Poisson regression models (18.3%), and generalised additive models (GAM) (16.7%), which are particularly well suited to exploring non-linear relationships between climatic variables and cases of dengue fever [17].
In Côte d’Ivoire, very few multivariate dengue modelling initiatives have been documented to date, even though these tools could prove essential for anticipating epidemic outbreaks. The general aim of this study is to assess the performance of different statistical models in predicting confirmed cases of dengue fever in Côte d’Ivoire.
To do this, a mixed approach was adopted, drawing on two of the model families mentioned above. First, a generalised additive model (GAM) was used to take into account the non-linear and seasonal effects of the climatic variables. Second, an ARMAX model (autoregressive moving-average model with exogenous variables) was used to model weekly trends in confirmed cases of dengue fever, incorporating temporal components and exogenous factors. This methodological combination provides a more comprehensive and robust analysis of the phenomenon, taking into account both the temporal structure of the data and the delayed effects of environmental variables.
2. Methods
2.1. Materials and Methods
This is a retrospective secondary analysis of a historical cohort of dengue cases recorded in Côte d’Ivoire between 2017 and 2023. The data used are raw and aggregated data extracted from the Vigile weekly epidemiological bulletin. Daily meteorological data covering the period from 1 January 2017 to 31 December 2023 were obtained online [18]. Variables included: temperature (maximum, minimum, maximum of the maxima, maximum of the minima, minimum of the minima, and average), total precipitation, daily precipitation records, relative humidity and wind speed. To ensure temporal consistency with the weekly notifications of dengue cases, the meteorological data were aggregated by week by calculating the weekly average of the daily measurements.
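The weekly aggregation step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes ISO calendar weeks (the morbidity-week convention of the bulletin may differ), and the function name `weekly_means` and the sample readings are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def weekly_means(daily):
    """Average daily readings by ISO (year, week).

    `daily` maps a datetime.date to a numeric reading.
    Returns a dict {(iso_year, iso_week): mean of that week's readings}.
    """
    buckets = defaultdict(list)
    for day, value in daily.items():
        iso_year, iso_week, _weekday = day.isocalendar()
        buckets[(iso_year, iso_week)].append(value)
    return {week: mean(values) for week, values in sorted(buckets.items())}


# Made-up readings: 14 consecutive days starting on Monday 2 January 2017.
start = date(2017, 1, 2)
daily_temp = {start + timedelta(days=i): 30.0 + i for i in range(14)}

weekly_temp = weekly_means(daily_temp)
print(weekly_temp)
```

Each weekly value is then aligned with the dengue notifications for the same week.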
2.2. Statistical Analysis Plan
2.2.1. Descriptive Analysis
The time trend in epidemiological curves (suspected and confirmed cases) and in climatic variables was explored graphically over the entire period.
2.2.2. Modelling
1) Preparing data for modelling
Prior to analysis, the dataset was split into two subsets. The training set included observations from the first morbidity week of 2017 to the last week of 2022, and the test set included all morbidity weeks of 2023. The training set was used to build the statistical models, while the test set was used to validate them. Principal component analysis (PCA) was performed to identify the key meteorological factors to be included in the multivariate analysis. A biplot of the first two principal components was generated to visualise the dispersion of the variables and identify groups of correlated predictors. Pearson correlation coefficients were calculated to assess the strength of the linear relationships between the meteorological variables. For each statistical modelling technique, two separate data sets were considered. The first, called MSD (Meteorological Factors), included the meteorological variables measured in the same morbidity week as the cases. The second, called MD (Temporal Shifts), contained only meteorological variables shifted in time. To determine the optimal time lag for each meteorological factor, a cross-correlation analysis was performed on lags ranging from 0 to 25 weeks.
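The lag search can be illustrated with a short sketch. It assumes that the "optimal" lag is the one with the largest absolute Pearson correlation between the climate series at week t − lag and the case series at week t. The function names and the synthetic series are hypothetical and serve only to show the mechanics.

```python
import random
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def best_lag(climate, cases, max_lag=25):
    """Return (lag, r) with the largest |r| between climate[t - lag] and cases[t]."""
    correlations = {}
    for lag in range(max_lag + 1):
        shifted_climate = climate[: len(climate) - lag]
        aligned_cases = cases[lag:]
        correlations[lag] = pearson(shifted_climate, aligned_cases)
    lag = max(correlations, key=lambda k: abs(correlations[k]))
    return lag, correlations[lag]


# Synthetic weekly series: cases copy the climate signal with a 3-week delay.
rng = random.Random(0)
climate = [rng.random() for _ in range(80)]
cases = [0.0, 0.0, 0.0] + climate[:-3]

lag, r = best_lag(climate, cases)
print(lag, round(r, 3))  # the built-in 3-week delay is recovered
```

In practice the search would be run on the training weeks only, so that the 2023 test set does not influence the choice of lag.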
2) ARMAX model (ARMA with exogenous variables)
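The ARMAX model extends the ARMA model by incorporating exogenous explanatory variables [19]:
$$y_t = c + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \sum_{k=1}^{m} \beta_k \, x_{k,t} + \varepsilon_t \tag{1}$$
where:
$y_t$ is the dependent variable at time $t$;
$c$ is the constant;
$\phi_i$ are the AR (autoregressive) coefficients;
$\theta_j$ are the MA (moving average) coefficients;
$\beta_k$ are the coefficients associated with the exogenous variables $x_{k,t}$;
$\varepsilon_t$ is the error term.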
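To make Equation (1) concrete, the sketch below computes a one-step-ahead expected value from its four components: constant, autoregressive part, moving-average part and exogenous part. The usual ARMAX notation is assumed (constant c, AR coefficients phi, MA coefficients theta, exogenous coefficients beta). The function name and the numbers are illustrative and are not the fitted coefficients reported later.

```python
def armax_one_step(c, phi, theta, beta, y_past, eps_past, x_now):
    """One-step-ahead mean of an ARMAX process (current error set to zero).

    y_past[0] is y_{t-1}, y_past[1] is y_{t-2}, and so on; eps_past follows
    the same convention for past errors. x_now holds the exogenous values
    at time t, in the same order as beta.
    """
    ar_part = sum(p * y for p, y in zip(phi, y_past))
    ma_part = sum(q * e for q, e in zip(theta, eps_past))
    exo_part = sum(b * x for b, x in zip(beta, x_now))
    return c + ar_part + ma_part + exo_part


# Illustrative AR(1) model with two exogenous variables and no MA term.
y_hat = armax_one_step(
    c=1.0,
    phi=[0.5],
    theta=[],
    beta=[0.1, -0.2],
    y_past=[4.0],
    eps_past=[],
    x_now=[10.0, 5.0],
)
print(y_hat)
```

In the lagged variant of the model, `x_now` would simply hold each climate variable measured at its selected lag rather than in the current week.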
3) GAM model
GAMs have been used to analyse the non-linear influence of meteorological factors on dengue cases and to predict the course of the disease [20].
The formula is as follows:
$$g(\mu) = \beta_0 + \sum_{j} \beta_j x_j + \sum_{k} f_k(z_k) \tag{2}$$
where $\mu = E(y)$ is the expected number of cases and $g(\cdot)$ denotes a link function, chosen according to the statistical characteristics of the response. The distribution of dengue cases in this study corresponds to a Poisson distribution. Thus, the corresponding link function for GAMs is the natural logarithm, $g(\mu) = \ln(\mu)$;
$\beta_0$ is a constant term;
$\sum_j \beta_j x_j$ represents the linear fit function; and
$\sum_k f_k(z_k)$ represents the non-linear fit function.
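The role of the log link in Equation (2) can be shown with a small sketch: the linear and smooth terms are summed on the log scale and then exponentiated to obtain an expected count. The smooth function below is a hypothetical stand-in. In a fitted GAM each smooth is a penalised spline estimated from the data, not a fixed formula.

```python
from math import exp, log


def gam_expected_cases(beta0, linear_terms, smooth_terms):
    """Expected count under a Poisson GAM with log link.

    linear_terms: iterable of (coefficient, value) pairs.
    smooth_terms: iterable of (smooth_function, value) pairs.
    """
    eta = beta0
    eta += sum(coef * x for coef, x in linear_terms)
    eta += sum(f(z) for f, z in smooth_terms)
    return exp(eta)  # inverse of the log link


def humidity_smooth(h):
    """Hypothetical smooth effect that peaks at h = 0.3 (illustration only)."""
    return -0.5 * (h - 0.3) ** 2


mu = gam_expected_cases(
    beta0=0.0,
    linear_terms=[(0.2, 1.0)],
    smooth_terms=[(humidity_smooth, 0.3)],
)
print(mu)
```

Because the link is logarithmic, a linear coefficient acts multiplicatively: a one-unit increase in a linear predictor multiplies the expected count by exp(coefficient).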
2.2.3. Performance Assessment
Five indicators were used to assess the predictive performance of the statistical models: mean absolute error (MAE), root mean square error (RMSE), R², AIC and BIC. In addition, the Ljung-Box test was used to verify, at the 5% significance level, the absence of significant autocorrelation in the residuals.
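$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{3}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \tag{4}$$
where $y_i$ is the observed number of cases, $\hat{y}_i$ is the predicted number of cases and $n$ is the number of observations.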
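The error measures can be computed directly from paired observed and predicted values, as in the sketch below. The R² shown here is the plain (unadjusted) coefficient of determination, which is an assumption: the study also reports adjusted values for the GAMs. The sample numbers are made up.

```python
from math import sqrt


def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Root mean square error."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def r_squared(y, y_hat):
    """Unadjusted coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot


# Made-up weekly counts and predictions.
observed = [0, 2, 4, 6]
predicted = [1, 2, 3, 6]

print(mae(observed, predicted), rmse(observed, predicted), r_squared(observed, predicted))
```

RMSE penalises large errors more heavily than MAE, so the gap between the two gives a rough indication of how much a few badly predicted weeks contribute to the total error.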
3. Results
Figure 1 shows a multivariable time series from 2017 to 2023, illustrating the monthly change in confirmed dengue cases (blue line) in relation to three meteorological factors: humidity (red line), rainfall (green line) and temperature (orange line). Peaks in the number of cases are mainly observed between mid-2017 and mid-2019, with a marked resurgence around the year 2022. Prolonged periods with no or few cases are also noted, reflecting strong seasonal variability in dengue transmission. Temperature follows a very regular seasonal pattern, characterised by well-defined annual cycles. It generally fluctuates between 32°C and 44°C, with no obvious direct correlation with dengue peaks. Humidity is relatively stable over the observation period, showing a slight cyclical pattern and varying between 25% and 31%, indicating a low amplitude of fluctuation. Rainfall shows a strong seasonal pattern, with peaks often preceding or coinciding with periods of increased dengue cases. This variable appears to be the most visually correlated with the incidence of cases.
Figure 1. Trends in climatic variables and dengue cases (2017-2022).
3.1. Search for Explanatory Variables
3.1.1. Principal Component Analysis
The first two axes explain 71% of the total variance (inertia). On the correlation circle, the incidence of dengue appears to be positively correlated with the number of suspected cases, record rainfall and humidity (Figure 2).
Figure 2. Principal component analysis of variables.
Correlation analysis shows that six (6) variables are significantly associated with the incidence of dengue fever (Table 1): the number of suspected cases of dengue fever and five meteorological variables, namely the minimum of the minimum temperatures, the daily rainfall record, humidity, average daily rainfall and maximum temperature.
Table 1. Correlation coefficients between confirmed cases of dengue fever and other variables.
| Variable | Correlation | p-value |
|---|---|---|
| Suspected dengue | 0.65 | <0.001 |
| Min of min Temperature | 0.19 | <0.001 |
| Daily rainfall record | 0.19 | <0.001 |
| Humidity | 0.16 | 0.002 |
| Average daily rainfall | 0.15 | 0.004 |
| Max temperature | −0.12 | 0.025 |
| Average temperature | −0.07 | 0.191 |
| Max of max temperature | −0.07 | 0.177 |
| Min temperature | 0.02 | 0.643 |
| Max of min temperature | −0.02 | 0.704 |
| Wind speed | −0.02 | 0.636 |
| Wind temperature | −0.02 | 0.664 |
Figure 3 and Table 2 show the lag at which each meteorological factor was most strongly correlated with dengue incidence. Relative humidity, maximum temperature and minimum temperature showed their strongest association at lags of 21, 17 and 3 weeks, respectively. The number of suspected cases and record rainfall were most strongly correlated without lag.
Figure 3. Cross-correlation analysis.
Table 2. Peak correlations between climate variables and dengue.
| Variable | Lag | Correlation |
|---|---|---|
| Humidity | 21 | −0.269 |
| Daily Record_Precipitation | 0 | 0.186 |
| Max_Temperature | 17 | 0.288 |
| Minimum_Min_Temperature | 3 | 0.203 |
| Suspected Dengue | 0 | 0.645 |
3.1.2. ARIMAX Models
Table 3 presents the estimated coefficients (± standard error) of the variables influencing the number of confirmed dengue cases in two approaches: a model incorporating time lags and another without lags.
The autoregressive component AR(1) was stable and high in both models, at around 0.76, reflecting a strong temporal dependence of dengue cases from one period to the next. Relative humidity, with a lag of 21 weeks, showed a significant negative effect (coefficient = −9.84 ± 3.43), suggesting that high humidity 21 weeks earlier is associated with a decrease in the number of confirmed cases. With a lag of 3 weeks, the minimum temperature had a weak and non-significant effect (−0.043 ± 0.10), whereas without a lag it had a significant positive effect (0.690 ± 0.191), suggesting that its influence is contemporaneous rather than delayed. The maximum temperature, lagged by 17 weeks, showed a very weak effect (−0.0128 ± 0.1045), whereas the model without lag revealed a more marked negative effect (−0.182 ± 0.083), indicating that this variable has a greater influence on cases of dengue fever in the immediate rather than the medium term.
Daily rainfall has coefficients close to zero in both models, with opposite signs (negative with lag, positive without), and a high degree of uncertainty, suggesting either no direct effect or a non-linear relationship that is difficult to model.
The number of suspected dengue cases retained a significant positive effect in both models, slightly more pronounced without lag (0.0149 ± 0.0058) than with lag (0.0121 ± 0.0059), confirming that suspected cases are a good immediate predictor of confirmed cases.
Table 3. Comparative table of estimated coefficients with and without lag.
| Variable | With lag (coef ± s.e.) | Without lag (coef ± s.e.) |
|---|---|---|
| AR(1) | 0.7607 ± 0.0378 | 0.7583 ± 0.0386 |
| Intercept | 10.3593 ± 5.6345 | −12.6489 ± 5.7483 |
| Humidity (lag 21 weeks) | −9.8420 ± 3.4250 | n/a |
| Humidity (no lag) | n/a | 1.9644 ± 3.4526 |
| Min Temp (lag 3 weeks) | −0.0434 ± 0.1017 | n/a |
| Min Temp (no lag) | n/a | 0.6896 ± 0.1911 |
| Max Temp (lag 17 weeks) | −0.0128 ± 0.1045 | n/a |
| Max Temp (no lag) | n/a | −0.1821 ± 0.0833 |
| Daily Precipitations | −0.0010 ± 0.0026 | 0.0006 ± 0.0027 |
| Suspected Dengue | 0.0121 ± 0.0059 | 0.0149 ± 0.0058 |
Figure 4. Comparison between actual and fitted values of ARMAX models for confirmed cases of dengue fever in Côte d’Ivoire (2017-2022).
Figure 4 compares the observed dengue cases (in black) with the values fitted by the ARIMAX models (in blue for the model without lag and in red for the model with lag), over the period from 2017 to 2022. Both models appear to fit the observed data. The Ljung-Box test indicates no significant autocorrelation in the residuals, which are consistent with white noise (p = 0.31 with lag; p = 0.63 without lag).
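For reference, the Ljung-Box statistic behind such p-values can be sketched as follows. The code computes only the Q statistic, which is compared with a chi-square distribution to obtain the p-value. The number of lags tested in the study is not stated, so `max_lag` is an assumption, and the residual series is made up to show a case with strong autocorrelation.

```python
def autocorrelation(residuals, k):
    """Sample autocorrelation of the residuals at lag k."""
    n = len(residuals)
    mean_r = sum(residuals) / n
    denom = sum((r - mean_r) ** 2 for r in residuals)
    num = sum(
        (residuals[t] - mean_r) * (residuals[t - k] - mean_r) for t in range(k, n)
    )
    return num / denom


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q = n (n + 2) * sum_{k=1..h} rho_k^2 / (n - k)."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


# Made-up residuals that alternate in sign, i.e. strongly autocorrelated.
alternating = [1.0, -1.0] * 10
q = ljung_box_q(alternating, max_lag=1)
print(q)
```

A large Q relative to the chi-square reference leads to rejecting the white-noise hypothesis. The p-values above 0.05 reported for both ARIMAX models correspond to the opposite situation, where Q is small.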
3.1.3. GAM Model with Lag
The model presented incorporates lagged explanatory variables as well as smoothed terms, in a generalised additive model (GAM) with a Poisson distribution and log link, aimed at modelling the incidence of dengue cases. Among the linear variables, only the minimum temperature with a 3-week lag (lag3) had a positive and significant effect on the number of cases (β = 0.2078; p = 0.0424). On the other hand, the maximum temperature lagged by 17 weeks (lag17) had no significant effect (p = 0.2872). The smoothed terms showed highly significant effects: humidity at lag 21 (p = 0.000642), record precipitation at lag 18 (p = 1.5e−06) and suspected cases of dengue at lag 1 (p < 2e−16). The model performed well overall, with an adjusted R² of 0.837 and a predictive correlation of 0.92 (Table 4).
Table 4. Summary of the GAM model with lag and its performance.
| Category | Variables | Estimate | Std_Error | p_value | Significance |
|---|---|---|---|---|---|
| Model Parameters | Min_Temperature_lag3 | 0.2078 | 0.1024 | 0.0424 | * |
| | Max_Temperature_lag17 | 0.1783 | 0.1675 | 0.2872 | |
| Smoothed Terms | Humidity_lag21 | | | 0.000642 | *** |
| | Record_Precipitation_lag18 | | | 1.5e−06 | *** |
| | Dengue_susp_lag1 | | | <2e−16 | *** |
| Performance Criteria | Adjusted_R2 | 0.837 | | | |
| | RMSE | 0.92 | | | |
| | Correlation | 0.92 | | | |
3.1.4. GAM Model without Lag
The model explains around 10.1% of the variance in the data, and the correlation between observed and predicted values is moderate. The lag-free model nevertheless reveals that several factors are significantly associated with the number of confirmed cases of dengue fever.
Ambient humidity had a positive and highly significant effect (β = 24.30; p < 0.001), which suggests that an increase in humidity is associated with an increase in cases. Maximum temperature also had a strong positive effect (β = 0.73; p < 0.001), indicating that it plays an important role in incidence. The number of suspected cases is a good predictor (β = 0.011; p < 2e−16), confirming a robust association with confirmed cases. Extreme daily rainfall has a modest but significant effect (β = 0.0027; p = 0.023). Minimum temperature had no significant effect (β = −0.065; p = 0.476), suggesting a negligible influence in this model.
The model has low explanatory power, with an adjusted R² of 0.101. The correlation between predicted and observed values was moderate at 0.35 (Table 5).
Table 5. Summary of the lag-free GAM model and performance.
| Category | Variables | Estimate | Std_Error | z_value | p_value | Significance |
|---|---|---|---|---|---|---|
| Model Parameters | (Intercept) | −44.969 | 6.299 | −7.139 | 9.42e−13 | *** |
| | Humidity | 24.297 | 4.794 | 5.068 | 4.02e−07 | *** |
| | Minimum_Min_Temperature | −0.065 | 0.091 | −0.713 | 0.476 | |
| | Max_Daily_Precipitation | 0.003 | 0.001 | 2.273 | 0.023 | * |
| | Suspected dengue | 0.011 | 0.001 | 9.566 | <2e−16 | *** |
| | Maximum_Temperature | 0.725 | 0.105 | 6.885 | 5.77e−12 | *** |
| Performance Metrics | Adjusted_R2 | 0.101 | | | | |
| | RMSE | 2.21 | | | | |
| | Correlation | 0.35 | | | | |
3.2. Model Comparison
The lagged GAM model showed the best overall performance, with a low root mean square error (RMSE) of 0.92 and a mean absolute error (MAE) of 0.26. Its coefficient of determination (R²) was 0.837, meaning that it explained around 84% of the variance in confirmed dengue cases. In addition, its AIC (243.58) and BIC (318.61) values are substantially lower than those of the other models, reflecting a good balance between fit and parsimony. The ARIMAX model with lag performed intermediately, with an RMSE of 1.41 and a MAE of 0.47. Its R² was 0.62, indicating a moderate explanation of variance. Its AIC (1049.75) and BIC (1079.19) are higher, reflecting a less favourable trade-off between accuracy and complexity. The ARIMAX model without lag shows a similar performance, with an RMSE of 1.40 and a MAE of 0.52, but a lower R² of 0.53, which suggests a lower explanatory capacity than its counterpart with lag. However, it has slightly better AIC (1046.07) and BIC (1075.52) values, reflecting a slightly more parsimonious model. Finally, the GAM without lag shows the weakest performance, with a high RMSE of 2.21, a MAE of 0.82 and a very low R² of 0.101, explaining only about 10% of the variance. Despite a moderate AIC (712.91), this model performs worst on every error and fit measure (Table 6).
Table 6. Comparison of the performance of the 4 models.
| Model | RMSE | MAE | R2 | AIC | BIC |
|---|---|---|---|---|---|
| ARIMAX_WITH_OFFSET | 1.4101 | 0.4667 | 0.620 | 1049.75 | 1079.19 |
| ARIMAX_WITHOUT_OFFSET | 1.4013 | 0.5151 | 0.530 | 1046.07 | 1075.52 |
| GAM_WITH_OFFSET | 0.9200 | 0.2600 | 0.837 | 243.58 | 318.61 |
| GAM_WITHOUT_OFFSET | 2.2100 | 0.8200 | 0.101 | 712.91 | 735.00 |
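For readers less familiar with the information criteria in Table 6, their standard definitions are recalled below. This assumes that the fitting software used the usual forms, where k is the number of estimated parameters, n the number of observations and L̂ the maximised likelihood. The exact parameter counts used in the study are not reported.

```latex
\begin{align}
\mathrm{AIC} &= 2k - 2\ln\hat{L} \\
\mathrm{BIC} &= k\ln n - 2\ln\hat{L} \\
\mathrm{BIC} - \mathrm{AIC} &= k\,(\ln n - 2)
\end{align}
```

Lower values indicate a better trade-off between fit and complexity. The criteria are most directly comparable between models fitted to the same observations with the same likelihood family.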
4. Discussion
The aim of our work was to compare the predictive performance of several statistical methods for anticipating cases of dengue fever in Côte d’Ivoire, with or without lagged climatic variables. The statistical models explored include ARIMAX and GAM, each with and without time lag. The temporal evolution of confirmed dengue cases between 2017 and 2023 was compared with that of three meteorological variables: humidity, temperature and rainfall. The graph shows a strong seasonality in precipitation and temperature, while dengue cases show irregular peaks that seem to occur mostly during or after periods of high precipitation. This visual link suggests an association between increased rainfall and the occurrence of dengue fever, probably linked to the proliferation of vectors. Humidity and temperature, on the other hand, show less variation and appear to play a more modulating role. These observations reinforce the hypothesis of a meteorological influence on the dynamics of dengue transmission.
A comparative assessment of the performance of the models identified the GAM with lag as the best-performing model. This model had the lowest forecast errors (RMSE = 0.92, MAE = 0.26), while explaining nearly 84% of the observed variance (R² = 0.837). Its AIC and BIC values were also the lowest, indicating a good balance between model accuracy and complexity. The lagged GAM model clearly stands out as the best of the four options for forecasting confirmed cases of dengue.
As regards the ARIMAX models, the two versions (with and without lag) show intermediate performances. The ARIMAX model with lag offers a better explanatory capacity (R² = 0.62) compared with the version without lag (R² = 0.53), although the latter has slightly lower AIC and BIC values. These models are therefore reasonably effective, but still lag behind the GAM in terms of accuracy and variance explained.
Finally, the GAM without lag shows the weakest performance. It is characterised by high prediction errors (RMSE = 2.21, MAE = 0.82) and very low explanatory power (R² = 0.101), indicating that it explains only around 10% of the observed variance. Although its AIC is relatively moderate, these results indicate that this model should be discarded or considerably reworked.
In a comparative study of statistical models and machine learning techniques by Xiang Chen in Brazil, the ARIMA model performed best using only historical data [21]. However, by taking climatic factors such as temperature and humidity into account, SARIMAX provided a more complete analysis, thereby improving accuracy. In addition, the use of lagged covariates in the SARIMAX model further improved the accuracy of long-term forecasts, taking into account the uncertainties inherent in extended forecast horizons [21]. These findings are consistent with ours: the lagged ARIMAX model explained more variance (R² = 0.62) than the version without lag (R² = 0.53).
In a study by Oswaldo Santos Baquero et al. on the prediction of dengue fever in the city of São Paulo using generalised additive models, artificial neural networks and seasonal autoregressive integrated moving average models, a generalised additive model with lags of the number of cases and meteorological variables performed best, predicting epidemics on an unprecedented scale [22]. Their GAM model had an RMSE of 2152, whereas ours was much lower (0.92) [22].
There are a number of limitations to this study, which open up prospects for future work.
Challenges such as incomplete data availability and under-reporting of cases can affect transmission dynamics and the accuracy of forecasts.
In addition, only climatic parameters such as temperature, humidity and rainfall were used as explanatory variables. Other factors such as population density, human mobility, urbanisation and socio-economic conditions could also play a decisive role in dengue transmission. Incorporating these factors in the future would enhance the models and provide a better understanding of epidemiological mechanisms. Finally, the use of advanced techniques such as reinforcement learning could further improve the quality of forecasts.
5. Conclusion
The study compared the performance of GAM and ARIMAX models in predicting weekly dengue cases in Côte d’Ivoire. The lagged GAM model performed best, explaining 84% of the variance with the smallest errors. These results highlight the importance of incorporating lagged climatic variables such as humidity, minimum temperature and precipitation, together with suspected dengue cases, into decision-support tools to anticipate dengue incidence in Côte d’Ivoire.