Dynamic and accurate forecasting of the monthly streamflow of a river is important for managing extreme events such as floods and droughts and for the optimal design of water storage structures and drainage networks. Four rivers are considered in this study: the White Nile, Blue Nile, Atbara River and main Nile. This paper aims to recommend the best linear stochastic model for forecasting monthly streamflow in rivers. Two commonly used hydrologic models, the deseasonalized autoregressive moving average (DARMA) model and the seasonal autoregressive integrated moving average (SARIMA) model, are selected for modeling monthly streamflow in all rivers in the study area. Two different types of monthly streamflow data (deseasonalized data and differenced data) were used to develop time series models that use previous flow conditions as predictors. The one-month-ahead forecasting performances of all models over the prediction period were compared using graphical and numerical criteria. The results indicate that the DARMA models perform better than the SARIMA models for monthly streamflow in these rivers.

Streamflow forecasting is of great importance to water resources management and planning. Medium- to long-term forecasting, at weekly, monthly, seasonal, or even annual time scales, is particularly useful in reservoir operations and irrigation management, as well as in the institutional and legal aspects of water resources management and planning. Because of this importance, a large number of models have been developed for streamflow forecasting, including concept-based process-driven models such as the low flow recession model and rainfall-runoff models, and statistics-based data-driven models such as regression models and stochastic time series models [

Stochastic time series models are popular and useful tools for medium-range forecasting and for generating synthetic data. A number of stochastic time series models, such as the Markov, Box-Jenkins (BJ) Seasonal Autoregressive Integrated Moving Average (SARIMA), Deseasonalized Autoregressive Moving Average (DARMA), Periodic Autoregressive (PAR), Transfer Function Noise (TFN) and Periodic Transfer Function Noise (PTFN) models, are in use for these purposes [

The univariate time series models that deal with only one time series, including the autoregressive integrated moving average (ARIMA) model and its derivatives such as seasonal ARIMA (SARIMA), periodic ARIMA, and deseasonalized ARIMA models, have long been applied in streamflow forecasting, particularly in the modeling of monthly streamflow [

The objective of this research is to select the best linear stochastic model for modeling monthly streamflow in rivers. Two models are compared to determine which performs better: the deseasonalized autoregressive moving average (DARMA) model and the seasonal autoregressive integrated moving average (SARIMA) model. It is expected that this study will provide useful information for modeling monthly streamflow in rivers, for developing an appropriate strategy for managing the surface water under consideration, and for forming the basis of planning of major water resources. For example, a prediction may be required for the construction of a hydrologic structure such as a bridge or dam.

Two main approaches to modeling seasonal time series at the main key stations are considered in this paper. In the first approach, the series is deseasonalized by subtracting the seasonal means and, optionally, dividing the result by the seasonal standard deviations. A nonseasonal ARMA model is then fitted to the deseasonalized series. This model is called the deseasonalized autoregressive moving average (DARMA) model. In the second approach, a linear stochastic model containing both seasonal and nonseasonal parameters is fitted to the differenced series. This model is called the Seasonal Autoregressive Integrated Moving Average (SARIMA) model. This type of seasonal model is discussed by Box and Jenkins [

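The DARMA model is a widely used approach for modeling seasonal data series. In this method, the series is first deseasonalized, and then an appropriate nonseasonal stochastic ARMA(p, q) model is fitted to the deseasonalized data. Two standard deseasonalization techniques that have been widely used are:

$$Z_{y,m} = X_{y,m} - \bar{X}_{m}$$

where X_{y,m} is the observed flow in year y and season (month) m, \bar{X}_{m} is the mean of season m, and Z_{y,m} is the deseasonalized value; and

$$Z_{y,m} = \frac{X_{y,m} - \bar{X}_{m}}{s_{m}}$$

where s_{m} is the standard deviation of season m.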
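The two deseasonalization options described above (subtracting the seasonal mean, and additionally dividing by the seasonal standard deviation) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `deseasonalize`, the use of the population standard deviation, and the made-up data are all assumptions.

```python
from statistics import mean, pstdev

def deseasonalize(flows, period=12, standardize=True):
    """Remove seasonal means and optionally scale by seasonal std.

    Assumes flows[0] belongs to season 0 and every seasonal std is nonzero.
    The population std (pstdev) is an assumption of this sketch.
    """
    means = [mean(flows[m::period]) for m in range(period)]
    stds = [pstdev(flows[m::period]) for m in range(period)]
    z = []
    for t, x in enumerate(flows):
        m = t % period                 # season index of observation t
        value = x - means[m]           # technique 1: subtract seasonal mean
        if standardize:
            value = value / stds[m]    # technique 2: also divide by seasonal std
        z.append(value)
    return z, means, stds

# Made-up data: three "years" of four seasons each (period=4 for brevity).
flows = [10, 50, 30, 20, 14, 70, 36, 22, 12, 60, 33, 21]
z, means, stds = deseasonalize(flows, period=4)
```

After standardization each season of `z` has zero mean and unit standard deviation, which is what allows a single nonseasonal ARMA(p, q) model to be fitted to the whole series.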

The second stochastic model, the ARIMA model, is constructed by combining moving average (MA) and autoregressive (AR) processes after differencing the data to remove nonstationarity. In the general nonseasonal ARIMA(p, d, q) model, AR(p) refers to the order of the autoregressive part, I(d) to the degree of differencing involved, and MA(q) to the order of the moving average part. The equation for the simplest ARIMA(p, d, q) model is:

where U_{t} is the d-th difference of the X_{t} process,

Box and Jenkins generalized the ARIMA model to seasonal data through the multiplicative ARIMA(p, d, q) × (P, D, Q)_{ω} model, which consists of a seasonal ARMA(P, Q) model fitted to the D-th seasonal difference of the data, coupled with an ARMA(p, q) model fitted to the d-th difference of the residuals of the former model. The general form of the ARIMA(p, d, q) × (P, D, Q)_{ω} model is given by:

where ω refers to number of periods per season,
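The differencing step that precedes the ARMA fit in a seasonal ARIMA model can be illustrated as follows. This is a hedged sketch: the helper names `difference` and `sarima_difference` and the synthetic series are assumptions, and only the differencing operators (1 − B)^d and (1 − B^ω)^D are shown, not parameter estimation.

```python
def difference(series, lag=1, order=1):
    """Apply the operator (1 - B^lag) to the series `order` times."""
    out = list(series)
    for _ in range(order):
        out = [out[t] - out[t - lag] for t in range(lag, len(out))]
    return out

def sarima_difference(series, d=0, D=1, period=12):
    """Seasonal differencing of order D, then nonseasonal differencing of order d."""
    seasonal = difference(series, lag=period, order=D)
    return difference(seasonal, lag=1, order=d)

# Synthetic series: a repeating four-season pattern plus a linear trend.
pattern = [5, 40, 25, 10]
series = [pattern[t % 4] + 2 * t for t in range(12)]

seasonal_diff = difference(series, lag=4)                   # removes the pattern
fully_diff = sarima_difference(series, d=1, D=1, period=4)  # also removes the trend
```

Each application of a difference shortens the series by its lag, so one seasonal difference of monthly data (period 12) costs the first twelve observations.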

Calibration of time-series models is conducted based on the three stages of model building: identification, estimation, and diagnostic checking [

The residual autocorrelation function (RACF) should be obtained to determine whether the residuals are white noise. Three checks based on the RACF are useful for testing the independence of the residuals. The first is the correlogram, drawn by plotting r_{k}(є) against lag k. If some of the RACF values are significantly different from zero, the present model may be inadequate. The second is the portmanteau lack-of-fit test (Q). The Q statistic is calculated by

$$Q = (N - d)\sum_{k=1}^{L} r_{k}^{2}(\epsilon)$$

where N is the number of observations, r_{k} is the correlogram of the residual є_{t}, L is the maximum lag considered, and d is the number of differences. The statistic Q is approximately chi-square distributed with L − p − q degrees of freedom. The adequacy of an ARIMA(p, d, q) model may be checked by comparing Q with the chi-square value χ^{2}(L − p − q) at a given significance level. If Q < χ^{2}(L − p − q), є_{t} is an independent series and the model is adequate; otherwise the model is inadequate.
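The portmanteau check can be sketched as follows, assuming the common Box-Pierce form Q = (N − d) Σ r_k² summed over lags 1 to L. The function names are hypothetical, and the critical value is passed in by the caller (for example from a chi-square table) rather than computed here.

```python
def residual_acf(residuals, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of the residual series."""
    n = len(residuals)
    mu = sum(residuals) / n
    dev = [e - mu for e in residuals]
    c0 = sum(v * v for v in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]

def portmanteau_q(residuals, max_lag, d=0):
    """Assumed form: Q = (N - d) * sum of squared residual autocorrelations."""
    r = residual_acf(residuals, max_lag)
    return (len(residuals) - d) * sum(rk * rk for rk in r)

def is_adequate(q, chi2_critical):
    """Decision rule from the text: adequate when Q is below the critical value."""
    return q < chi2_critical

# A strongly autocorrelated toy residual series (alternating signs).
alternating = [1, -1, 1, -1, 1, -1, 1, -1]
q_toy = portmanteau_q(alternating, max_lag=1)

# Decision for the Malakal DARMA row of the independence table (Q, chi-square).
malakal_ok = is_adequate(58.73, 60.48)
```

For the toy series, r_1 = −7/8, so Q = 8 × (49/64) = 6.125, which exceeds the 5% chi-square critical value for one degree of freedom (3.841) and would flag the residuals as dependent.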

The third check is the Ljung-Box Q, or Q(r), statistic, which can also be used to test the independence of the residuals. If Q(r) is less than the tabulated χ^{2} critical value at a given level of significance, the model is adequate. The Q(r) statistic is calculated by the following equation [
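As an illustration, the Ljung-Box statistic can be computed as below. The sketch assumes the standard form Q(r) = n(n + 2) Σ r_k² / (n − k) over lags k = 1 to L; the function name and the toy data are assumptions.

```python
def ljung_box_q(residuals, max_lag):
    """Assumed standard form: n * (n + 2) * sum_k r_k^2 / (n - k)."""
    n = len(residuals)
    mu = sum(residuals) / n
    dev = [e - mu for e in residuals]
    c0 = sum(v * v for v in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        # Lag-k sample autocorrelation of the residuals.
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total

# Toy residual series with strong lag-1 dependence.
alternating = [1, -1, 1, -1, 1, -1, 1, -1]
q_r = ljung_box_q(alternating, max_lag=1)
```

The (n + 2)/(n − k) weighting makes Q(r) larger than the unweighted portmanteau statistic computed from the same autocorrelations, which is consistent with the Q(r) column exceeding the Q column in the independence results.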

There are many standard tests available to check whether the residuals are normally distributed. Chow et al. [

The Nile Basin covers a surface of about 2.9 million square kilometers, approximately one-tenth of the surface area of Africa. It extends from 4˚S to 31˚N latitude and from about 21˚30'E to 40˚30'E longitude as shown in

The data collected in this study consisted of monthly discharge at Key stations as shown in

The ACF and PACF of each monthly data sequence were plotted to gather information about the seasonal and nonseasonal AR and MA operators of the monthly series. The ACF graphs show an attenuating sine wave pattern that reflects the random periodicity of the data. Because these data sequences contain a cyclic seasonal component, they were either seasonally differenced, taking the order of seasonal differencing as one (1), or standardized (see

For differenced data (

Station | River name | Total data length | Forecasting (calibration) period | Prediction (testing) period |
---|---|---|---|---|
Malakal | White Nile | 1905 to 2002 | 1905 to 1997 | 1998 to 2002 |
Eldiem | Blue Nile | 1963 to 2002 | 1963 to 1997 | 1998 to 2002 |
Khartoum | Blue Nile | 1903 to 2002 | 1903 to 1997 | 1998 to 2002 |
Atbara | Atbara River | 1903 to 2002 | 1903 to 1997 | 1998 to 2002 |
Dongola | Main Nile | 1903 to 2002 | 1903 to 1997 | 1998 to 2002 |

stress that seasonal AR terms are required. The PACFs possess significant values at some lags but tail off, which may imply the presence of moving average (MA) terms. There are peaks in the PACF graphs at lags that are multiples of 12, which may suggest seasonal MA terms.

For standardized data (Zt) as shown in

Logarithmic transformations were applied to the monthly data so that they are approximately normally distributed at all stations except the Atbara River, where a square root transformation was used. Alternative models were selected by inspecting the ACFs and PACFs of monthly streamflow at all stations. Diagnostic checks were applied to determine whether the residuals of the alternative models were independent and normally distributed. Not all of the DARMA and SARIMA models selected from the ACF and PACF graphs fulfilled the residual assumptions (independence and normality). Models that failed at least one of the diagnostic checks were eliminated. The selected best DARMA and SARIMA models are presented in

Two approaches, the portmanteau lack-of-fit test and the Ljung-Box Q (LBQ) test, were applied to test the independence assumption for the residuals of the best models. The results of these tests are presented in

For the selected best models, the results related to the normality of residuals using Shapiro-Wilk test, Anderson-Darling test, Jarque-Bera test and skewness tests are given in

Station | DARMA(p, q) model | AR parameters (φ) | MA parameters (θ) |
---|---|---|---|
Malakal | DARMA(3, 1) | φ_{1} = 1.96, φ_{2} = −1.27, φ_{3} = 0.3 | θ_{1} = 0.87 |
Eldiem | DARMA(1, 2) | φ_{1} = 0.81 | θ_{1} = 0.08, θ_{2} = 0.19 |
Khartoum | DARMA(3, 2) | φ_{1} = 1.25, φ_{2} = −0.08, φ_{3} = −0.18 | θ_{1} = 0.56, θ_{2} = 0.41 |
Atbara | DARMA(1, 4) | φ_{1} = 1.00 | θ_{1} = 0.43, θ_{2} = 0.31, θ_{3} = 0.15, θ_{4} = 0.06 |
Dongola | DARMA(4, 5) | φ_{1} = 1.91, φ_{2} = −1.02, φ_{3} = −0.34, φ_{4} = 0.30 | θ_{1} = 1.06, θ_{2} = 0.13, θ_{3} = −0.63, θ_{4} = −0.03, θ_{5} = 0.04 |

Station | ARIMA(p, d, q) × (P, D, Q)_{12} model | Nonseasonal AR (φ) | Nonseasonal MA (θ) | Seasonal AR (Φ) | Seasonal MA (Θ) |
---|---|---|---|---|---|
Malakal | ARIMA(1,0,2) × (2,1,3)_{12} | φ_{1} = 0.85 | θ_{1} = −0.02, θ_{2} = 0.11 | Φ_{1} = 0.18, Φ_{2} = 0.27 | Θ_{1} = 0.84, Θ_{2} = 0.34, Θ_{3} = −0.19 |
Eldiem | ARIMA(1,0,1) × (1,1,1)_{12} | φ_{1} = 0.64 | θ_{1} = −0.04 | Φ_{1} = −0.02 | Θ_{1} = 0.94 |
Khartoum | ARIMA(1,0,2) × (1,1,3)_{12} | φ_{1} = 0.75 | θ_{1} = 0.10, θ_{2} = 0.11 | Φ_{1} = −0.33 | Θ_{1} = 0.55, Θ_{2} = 0.32, Θ_{3} = −0.01 |
Atbara | ARIMA(1,0,1) × (3,1,1)_{12} | φ_{1} = 0.39 | θ_{1} = −0.15 | Φ_{1} = 0.03, Φ_{2} = −0.02, Φ_{3} = −0.06 | Θ_{1} = 0.89 |
Dongola | ARIMA(1,0,1) × (2,1,1)_{12} | φ_{1} = 0.65 | θ_{1} = −0.14 | Φ_{1} = −0.01, Φ_{2} = 0.04 | Θ_{1} = 0.87 |

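Model | Station | Portmanteau Q | Portmanteau χ^{2}_{table} | Portmanteau decision | Ljung-Box Q(r) | Ljung-Box χ^{2}_{table} | Ljung-Box decision |
---|---|---|---|---|---|---|---|
DARMA(p, q) | Malakal | 58.73 | 60.48 | residuals are independent | 59.67 | 60.48 | residuals are independent |
DARMA(p, q) | Eldiem | 34.72 | 61.65 | residuals are independent | 36.4 | 61.65 | residuals are independent |
DARMA(p, q) | Khartoum | 54.25 | 58.12 | residuals are independent | 56.14 | 58.12 | residuals are independent |
DARMA(p, q) | Atbara | 56.38 | 59.3 | residuals are independent | 58.26 | 59.3 | residuals are independent |
DARMA(p, q) | Dongola | 50.25 | 54.57 | residuals are independent | 53.35 | 54.57 | residuals are independent |
SARIMA(p, d, q) × (P, D, Q)_{12} | Malakal | 45.64 | 55.75 | residuals are independent | 54.57 | 55.75 | residuals are independent |
SARIMA(p, d, q) × (P, D, Q)_{12} | Eldiem | 51.39 | 60.48 | residuals are independent | 52.32 | 60.48 | residuals are independent |
SARIMA(p, d, q) × (P, D, Q)_{12} | Khartoum | 60.99 | 61.78 | residuals are independent | 61.43 | 61.78 | residuals are independent |
SARIMA(p, d, q) × (P, D, Q)_{12} | Atbara | 59.65 | 61.78 | residuals are independent | 61.49 | 61.78 | residuals are independent |
SARIMA(p, d, q) × (P, D, Q)_{12} | Dongola | 48.78 | 58.16 | residuals are independent | 55.92 | 58.16 | residuals are independent |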

Plotting the observed and estimated data series for each model gives an indication of the reliability of the models at the validation stage. The scatter plots of observed monthly flow and one-month-ahead forecasts of all models for the period from 1998 to 2002 are given in

To validate the SARIMA and DARMA models described in the previous sections, one-step-ahead forecasts for the test portion of the time series were generated using the selected set of calibrated models. The forecasting performance of all the models at the validation stage was compared based on the mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R^{2}), which indicates the strength of fit between observed and forecasted streamflow. The model with the lowest MAE and RMSE and the highest R^{2} values can be assumed to be the most accurate for flow forecasting in the study area.
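A one-step-ahead forecast from a fitted DARMA model can be sketched as below. Several things here are assumptions rather than the authors' procedure: the Box-Jenkins sign convention z_t = Σ φ_i z_{t−i} + є_t − Σ θ_j є_{t−j}, pre-sample values and residuals set to zero, the function names, and the toy numbers. The logarithmic or square root transformation applied to the data before fitting is omitted for brevity.

```python
def one_step_forecasts(z, phi, theta):
    """One-step-ahead ARMA forecasts of a deseasonalized series z.

    Assumes z_t = sum(phi_i * z_{t-i}) + e_t - sum(theta_j * e_{t-j}),
    with all pre-sample values and residuals taken as zero.
    """
    forecasts, resid = [], []
    for t in range(len(z)):
        ar = sum(p * z[t - i] for i, p in enumerate(phi, start=1) if t - i >= 0)
        ma = sum(q * resid[t - j] for j, q in enumerate(theta, start=1) if t - j >= 0)
        z_hat = ar - ma
        forecasts.append(z_hat)
        resid.append(z[t] - z_hat)     # forecast error feeds the MA terms
    return forecasts, resid

def reseasonalize(z_hat, means, stds, period=12):
    """Undo standardization: multiply by seasonal std and add seasonal mean."""
    return [zh * stds[t % period] + means[t % period]
            for t, zh in enumerate(z_hat)]

# Toy ARMA(1, 1) example with made-up parameters and data.
z = [1.0, 2.0, 4.0]
z_hat, resid = one_step_forecasts(z, phi=[0.5], theta=[0.25])
x_hat = reseasonalize(z_hat, means=[10, 20], stds=[2, 4], period=2)
```

Each forecast uses only observations up to the previous time step, so the same loop can be run over the 1998 to 2002 test portion with the parameters fixed at their calibrated values.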

Model | Station | Shapiro-Wilk test (p-value) | Anderson-Darling test (p-value) | Lilliefors test (p-value) | Jarque-Bera test (p-value) | Skewness test (skew) |
---|---|---|---|---|---|---|
DARMA(p, q) | Malakal | 0.49 | 0.26 | 0.17 | 0.45 | −0.07 |
DARMA(p, q) | Eldiem | 0.73 | 0.64 | 0.48 | 0.63 | 0.09 |
DARMA(p, q) | Khartoum | 0.23 | 0.51 | 0.8 | 0.29 | −0.07 |
DARMA(p, q) | Atbara | 0.82 | 0.33 | 0.1 | 0.96 | 0.09 |
DARMA(p, q) | Dongola | 0.15 | 0.52 | 0.56 | 0.41 | −0.04 |
SARIMA(p, d, q) × (P, D, Q)_{12} | Malakal | 0.15 | 0.11 | 0.14 | 0.15 | 0.07 |
SARIMA(p, d, q) × (P, D, Q)_{12} | Eldiem | 0.85 | 0.6 | 0.61 | 0.67 | 0.06 |
SARIMA(p, d, q) × (P, D, Q)_{12} | Khartoum | 0.07 | 0.23 | 0.46 | 0.58 | −0.03 |
SARIMA(p, d, q) × (P, D, Q)_{12} | Atbara | 0.33 | 0.4 | 0.46 | 0.27 | 0.04 |
SARIMA(p, d, q) × (P, D, Q)_{12} | Dongola | 0.09 | 0.06 | 0.19 | 0.19 | −0.08 |

where Y_{i} is the observed flow, F_{i} is the forecasted flow, n is the number of data points,
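The three performance criteria can be computed as in the sketch below, where `obs` plays the role of the observed flows Y_{i} and `fc` the forecasted flows F_{i}. MAE and RMSE follow their usual definitions; taking R^{2} as the squared Pearson correlation between observed and forecasted flows is an assumption of this sketch, and the data are made up.

```python
import math

def mae(obs, fc):
    """Mean absolute error."""
    return sum(abs(y - f) for y, f in zip(obs, fc)) / len(obs)

def rmse(obs, fc):
    """Root mean square error."""
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(obs, fc)) / len(obs))

def r_squared(obs, fc):
    """Squared Pearson correlation (assumed definition of R^2)."""
    y_bar = sum(obs) / len(obs)
    f_bar = sum(fc) / len(fc)
    cov = sum((y - y_bar) * (f - f_bar) for y, f in zip(obs, fc))
    var_y = sum((y - y_bar) ** 2 for y in obs)
    var_f = sum((f - f_bar) ** 2 for f in fc)
    return cov * cov / (var_y * var_f)

# Made-up observed and forecasted flows in m^3/s.
obs = [100, 200, 300, 400]
fc = [110, 190, 310, 390]
scores = (mae(obs, fc), rmse(obs, fc), r_squared(obs, fc))
```

MAE and RMSE carry the units of the flow (m^{3}/s), whereas R^{2} is dimensionless, so error magnitudes are not directly comparable between stations with very different mean flows.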

The one-month-ahead forecasting performances of all models for the calibration and testing periods are reported as R^{2}, RMSE and MAE values for all stations.

Station | DARMA forecasting R^{2} | DARMA forecasting RMSE (m^{3}/s) | DARMA forecasting MAE (m^{3}/s) | DARMA predicted (1998-2002) R^{2} | DARMA predicted (1998-2002) RMSE (m^{3}/s) | DARMA predicted (1998-2002) MAE (m^{3}/s) | SARIMA forecasting R^{2} | SARIMA forecasting RMSE (m^{3}/s) | SARIMA forecasting MAE (m^{3}/s) | SARIMA predicted (1998-2002) R^{2} | SARIMA predicted (1998-2002) RMSE (m^{3}/s) | SARIMA predicted (1998-2002) MAE (m^{3}/s) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Malakal | 0.96 | 64.47 | 45.75 | 0.87 | 108.4 | 80.32 | 0.92 | 95.98 | 63.59 | 0.86 | 162.5 | 130.8 |
Eldiem | 0.96 | 232.5 | 80.05 | 0.92 | 628.4 | 383.5 | 0.94 | 276.8 | 92.24 | 0.91 | 788.5 | 470.4 |
Khartoum | 0.93 | 532.9 | 301.9 | 0.91 | 637.2 | 366.6 | 0.92 | 593.3 | 321.6 | 0.89 | 859.8 | 530.4 |
Atbara | 0.9 | 212.4 | 98.77 | 0.88 | 238.4 | 119.8 | 0.9 | 214.4 | 111.5 | 0.88 | 284.8 | 136.2 |
Dongola | 0.95 | 563.7 | 339.8 | 0.86 | 1058 | 602.2 | 0.95 | 574.3 | 341.9 | 0.86 | 1355 | 719.6 |

This study aims to select a suitable stochastic model for forecasting monthly streamflow in rivers. Four rivers were selected: the White Nile, Blue Nile, Atbara River and main Nile. DARMA and SARIMA models, which are among the most popular models for stochastically generating synthetic data, were compared using monthly streamflow data at key stations on these rivers. The independence of the residuals was examined using the portmanteau lack-of-fit test and the Ljung-Box Q (LBQ) test. To determine whether the residuals are normally distributed, the Shapiro-Wilk, Anderson-Darling, Lilliefors and Jarque-Bera tests and the skewness test were used. The selected DARMA and SARIMA models for each data set fulfilled the diagnostic checks. Furthermore, observed and predicted monthly values from the DARMA and SARIMA models were compared based on the mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R^{2}). The DARMA models have lower MAE and RMSE values at all stations, and R^{2} values at least as high as those of the SARIMA models, so they can be regarded as the more accurate models for monthly streamflow forecasting in these rivers.

Mohammed A. Elganiny, Alaa Esmaeil Eldwer (2016) Comparison of Stochastic Models in Forecasting Monthly Streamflow in Rivers: A Case Study of River Nile and Its Tributaries. Journal of Water Resource and Protection, 08, 143-153. doi: 10.4236/jwarp.2016.82012