ARMA Modelling of Benue River Flow Dynamics: Comparative Study of PAR Model

The seemingly complex nature of river flow, and the significant variability it exhibits in both time and space, have largely led to the development and application of the stochastic process concept for its modelling, forecasting, and other ancillary purposes. To this end, this study attempted stochastic modelling of the daily streamflow process of the Benue River. Autoregressive Moving Average (ARMA) models and their derivative, the Periodic Autoregressive (PAR) model, were developed and used for forecasting. Comparative forecast performances of the different models indicate that, despite the shortcomings associated with univariate time series, reliable forecasts can be obtained for lead times of 1 to 5 days ahead. The forecast results also showed that the traditional ARMA model, unlike the periodic AR (PAR) model, could not robustly simulate high flow regimes. Thus, in the light of this study, the traditional ARMA models may not be suitable for proper understanding of the dynamics of the river flow and its management, especially flood defense, since they do not allow for real-time appraisal. To account for seasonal variations, PAR models should be used in forecasting the streamflow processes of the Benue River. However, since almost all mechanisms involved in river flow processes present some degree of nonlinearity, how appropriate the stochastic process concept might be for every flow series may be called into question.


Introduction
Time series modelling for either data generation or forecasting of hydrologic variables is an important step in planning and operational analysis of water resource systems. But providing good forecast functions for time dependent data has become a common problem. It is particularly acute in environmental and ecologic studies in which the ability to predict is closely allied to the successful allocation of the resources needed to control the environment. Operational hydrological forecasting and water resource management require efficient tools to provide accurate estimates of future river level conditions and meet real world demand. The use of stochastic time-series models for hydrologic forecasting has evolved greatly. Although many studies have given flow simulation considerable attention, it is important to recognise that simulation is not an end in itself but rather a means to an end, the end being an optimal water resource design. Despite some notable applications and case studies [1], relatively few studies have reported on the use of flow simulation or stochastic modelling in general in solving engineering problems [2]. More attention needs to be given to the uses of synthetic data, such as using the data with optimizing techniques to obtain optimal operating policies for storage or a set of storages.
In the context of the above, stochastic linear models are fitted to hydrologic data for two main reasons: to enable forecasts of the data one or more time periods ahead, and to allow for the generation of sequences of synthetic data. In the same context, deterministic models are of importance in forecasting flows over very short time intervals such as hours or even days. But even in these situations, if the parameters of a deterministic model have no physical interpretation and cannot be measured in the field, a deterministic model offers few advantages over a stochastic model [3]; since probability limits for forecasts may readily be obtained, there may be advantages in using a stochastic model. Stochastic streamflow models are often used in simulation studies to evaluate the likely future performance of water resources systems. Several stochastic models have been proposed for modelling hydrological time series and generating synthetic streamflows. These include Autoregressive Moving Average (ARMA) models [4], disaggregation models [5], and models based on the concept of pattern recognition [6]. Most of the time-series modelling procedures fall within the framework of multivariate autoregressive moving average (ARMA) models.
Generally, Autoregressive (AR) and Autoregressive Integrated Moving Average (ARIMA) models have an important place in the stochastic modelling of hydrologic data. Such models are of value in handling what might be described as the short-run problem, that of modelling the seasonal variability in a stochastic flow series. In recent years, their importance to practical water resource problems has been overshadowed by more sophisticated types of models that are designed to preserve long-run dependencies, perhaps of the order of decades, in hydrologic series. In most of these models, the Hurst coefficient h has been used to characterize the long-run dependencies. Although the long-run problem is important, the short-run problem, perhaps of the order of months to a few years, is important too. Moreover, as river flow dynamics at some time-space scales are not as irregular and complex as those at other time-space scales, the need for, and appropriateness of, the stochastic process concept for 'every' river flow, hydrologic, and geomorphic series calls for a second look. In view of these conflicting paradigms, the question of whether a given river (or any hydrologic or geomorphic) series can be modelled appropriately by stochastic methods forms the premise of this study. Considering all of this, the focus of the study is to model the daily flow sequence of the Benue River using Autoregressive Integrated Moving Average (ARIMA) and its two derivatives, the ARMA and the Periodic Autoregressive (PAR) models. Here, the feature of interest is the suitability of each model on the basis of selected forecast performance criteria.

Hydrology of the Benue River
The Benue River is the major tributary of the Niger River. It is approximately 1,400 km long and almost entirely navigable during the rainy season (between July and October). Hence, it is an important transportation route in the regions it flows through. It rises in the Adamawa Plateau of northern Cameroon and flows into Nigeria south of the Mandara Mountains, then through the east-central part of Nigeria before entering the Niger River at Lokoja (Figure 1(a)). The wide flood plain is used for agriculture, with the main crops being sugar cane and rice. There is only one high-water season because of its southerly location; this normally occurs from May to October, while the low-water period is from December to June. Figure 1(b) illustrates the hydrological flow regime of the Benue River in line with the general climatic pattern. There are definite wet and dry seasons which give rise to changes in river flow and salinity regimes. The flood of the Benue River (upper, middle, and downstream) lasts from July to October, and sometimes up to early November.

Data Base Management
In this study, the historical time series for the gauging station at the base of the Benue River (i.e., the Lower Benue River Basin) at Makurdi (7°44′N, 8°32′E) was used. A total of 26 years of water stage and discharge data were collected. A shorter time scale was considered for the modelling and forecasting. To this end, only daily average discharges were used.

ARMA Modelling of the Daily Flows
The procedure of fitting deseasonalised ARMA models to daily streamflow as used in this study involves two basic steps, namely deseasonalisation and ARMA model construction. To do this, the flow series was logarithmically transformed and deseasonalised by subtracting the seasonal mean values and dividing by the seasonal standard deviations of the log-transformed series. To alleviate the stochastic fluctuations of both the daily and monthly means and standard deviations, they were smoothed with the first 8 Fourier harmonics before being used for standardisation. To broaden the choice of models in the modelling exercise, the possibility of the traditional autoregressive integrated moving average (ARIMA) model was also examined. Unlike the deseasonalisation pre-processing method, the log-transformed flow series was differenced before fitting the appropriate ARIMA model. The objective is to appraise the impact the pre-processing may have on the overall forecasting results for the respective models adopted.
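The deseasonalisation step described above can be sketched in a few lines. This is a minimal illustration, not the code used in the study: the function names `fourier_smooth` and `deseasonalise` are hypothetical, the series is assumed to contain whole 365-day years of positive flows (leap days removed), and the smoothing is done by truncating the discrete Fourier transform of the daily statistics to the first 8 harmonics.

```python
import numpy as np

def fourier_smooth(values, n_harmonics=8):
    """Smooth a periodic sequence by keeping its mean and first n harmonics."""
    values = np.asarray(values, dtype=float)
    coeffs = np.fft.rfft(values)
    coeffs[n_harmonics + 1:] = 0.0           # discard harmonics above n_harmonics
    return np.fft.irfft(coeffs, n=len(values))

def deseasonalise(flow, period=365, n_harmonics=8):
    """Log-transform a daily flow series and standardise it day by day.

    Assumption: `flow` holds whole years of positive daily values, so its
    length is a multiple of `period`.
    """
    log_flow = np.log(np.asarray(flow, dtype=float))
    by_day = log_flow.reshape(-1, period)     # rows: years, columns: day of year
    mean = fourier_smooth(by_day.mean(axis=0), n_harmonics)
    std = fourier_smooth(by_day.std(axis=0, ddof=1), n_harmonics)
    z = (by_day - mean) / std                 # standardised log flows
    return z.ravel(), mean, std
```

The returned `mean` and `std` arrays are what would be needed to transform forecasts of the standardised series back to discharge units.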

PAR Model Building
A number of difficulties are usually encountered in the development of different types of PAR models; for instance, the choice of model order, loss of generality, and the burdensome and practically infeasible computation of compatibility between neighbouring days. Because of this, the method for fitting a PAR model based on cluster analysis, as espoused by Wang [7] and Otache [8], was adopted. The fuzzy clustering method was applied to partition the days over an annual cycle in order to build the PAR model. The Fuzzy Clustering Method (FCM) approach partitions a set of n vectors $x_j$, $j = 1, \ldots, n$, into c fuzzy clusters; this implies that each data point belongs to a cluster to a degree specified by a membership grade $u_{ij}$ between 0 and 1. Thus, a matrix U consisting of the elements $u_{ij}$ can be defined based on the assumption that the summation of degrees of belonging for a data point is equal to 1, i.e.,

$$\sum_{i=1}^{c} u_{ij} = 1, \quad j = 1, \ldots, n.$$

The objective of the FCM algorithm is to find c cluster centers such that the cost function of dissimilarity (or distance measure) is minimized. The cost function is defined by

$$J(U, v_1, \ldots, v_c) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2} \tag{1}$$

where $v_i$ is the cluster center of the fuzzy group i; $d_{ij} = \lVert v_i - x_j \rVert$ is the Euclidean distance between the ith cluster center and the jth data point; and m ≥ 1 is a weighting exponent, taken as 2 here. The necessary conditions for Equation (1) to reach a minimum are:

$$v_i = \frac{\sum_{j=1}^{n} u_{ij}^{m} x_j}{\sum_{j=1}^{n} u_{ij}^{m}} \tag{2}$$

and

$$u_{ij} = \left[ \sum_{k=1}^{c} \left( \frac{d_{ij}}{d_{kj}} \right)^{2/(m-1)} \right]^{-1} \tag{3}$$

In partitioning the days over the year with the clustering approach, the raw average daily discharge data and the autocorrelation values at different lag times, say 1 to 10 days, were used. The discharge data and the autocorrelation coefficients were organized as a matrix X of size $365 \times (N + 10)$, where N is the number of years and 10 is the number of autocorrelation values (one for each of the 10 lags). To eliminate the influence of large differences among data values on the cluster result, the daily discharges were logarithmically transformed before carrying out the cluster analysis.
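As an illustration of how the two necessary conditions are used in practice, the sketch below alternates between the cluster-center update and the membership update until the memberships stop changing. It is a generic fuzzy c-means loop under stated assumptions, not the implementation used in the study: the function name `fuzzy_c_means`, the random initialisation, the stopping tolerance, and the iteration cap are all choices made here, and the code requires m > 1.

```python
import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Generic fuzzy c-means sketch; X has one row per data point.

    Returns U (c x n membership matrix) and V (c cluster centers).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0)                        # memberships of each point sum to 1
    for _ in range(n_iter):
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)       # center update
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)                 # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0)         # membership update
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V
```

For the partitioning described in the text, each row of `X` would correspond to one day of the year, and the day would be assigned to the cluster with the largest membership grade, e.g. `U.argmax(axis=0)`.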
Figure 4 shows the FCM clustering result. The daily discharge over the annual cycle was partitioned into three clusters, basically conforming to the flow dynamics, which is made up of low, medium, and high flows. The medium flow regime marks the transition between the low and high flows. Based on the partitioning results, one AR model was fitted to each partition. Before fitting the respective AR models, the daily streamflow series was deseasonalised. The orders of the AR models were determined according to the AIC [9], with the PACF acting as a basis for the model choices. The partitioning of the daily streamflow in terms of days over an annual cycle is shown in Table 1. Based on the minimum AIC, Table 2 shows the orders of the AR models for each partition.

Forecast Performance Measures
Since forecast accuracy is best assessed by retrospective comparison of the forecasts made with the values observed during the forecast period, the following measures were used to evaluate model performance in the respective cases.
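Mean Absolute Error:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| Q_i - \hat{Q}_i \right| \tag{4}$$

Mean Absolute Percentage Error:

$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{Q_i - \hat{Q}_i}{Q_i} \right| \times 100 \tag{5}$$

Root Mean Squared Error:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( Q_i - \hat{Q}_i \right)^2} \tag{6}$$

Mean Squared Relative Error:

$$\mathrm{MSRE} = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{Q_i - \hat{Q}_i}{Q_i} \right)^2 \tag{7}$$

Coefficient of Efficiency:

$$\mathrm{CE} = 1 - \frac{\sum_{i=1}^{n} (Q_i - \hat{Q}_i)^2}{\sum_{i=1}^{n} (Q_i - \bar{Q})^2} \tag{8}$$

Coefficient of Determination:

$$r^2 = \left[ \frac{\sum_{i=1}^{n} (Q_i - \bar{Q})(\hat{Q}_i - \bar{\hat{Q}})}{\sqrt{\sum_{i=1}^{n} (Q_i - \bar{Q})^2 \sum_{i=1}^{n} (\hat{Q}_i - \bar{\hat{Q}})^2}} \right]^2 \tag{9}$$

Seasonally Adjusted Coefficient of Efficiency:

$$\mathrm{SACE} = 1 - \frac{\sum_{i=1}^{n} (Q_i - \hat{Q}_i)^2}{\sum_{i=1}^{n} (Q_i - \bar{Q}_m)^2} \tag{10}$$

where $Q_i$ and $\hat{Q}_i$ are the observed and forecast discharges; $\bar{Q}$ and $\bar{\hat{Q}}$ are their respective means; n is the number of forecasts; $\bar{Q}_m$ is the mean observed discharge in season m; $m = i \bmod S$ (mod is the modulus, an operator used for calculating the remainder) is the season, ranging from 0 to S − 1; and S is the total number of seasons.

The forecast exercise was done by using the models developed. In all the cases, the forecast horizon covers a two-year period, i.e., the last two years. It suffices to note that model building was on a rolling-forward basis.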
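For concreteness, four of the measures named above can be computed as in the following sketch. The function names and the arguments `obs`, `sim`, and `n_seasons` are illustrative; the formulas are the standard definitions of these statistics. One assumption is made loudly here: the seasonal means in `sace` are computed from the observations being evaluated, with the season of observation i taken as `i % n_seasons`.

```python
import numpy as np

def mae(obs, sim):
    """Mean absolute error."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return np.mean(np.abs(obs - sim))

def rmse(obs, sim):
    """Root mean squared error."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return np.sqrt(np.mean((obs - sim) ** 2))

def ce(obs, sim):
    """Coefficient of efficiency: benchmark is the overall observed mean."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def sace(obs, sim, n_seasons):
    """Seasonally adjusted CE: benchmark is the observed mean of each season.

    Assumption: seasonal means are taken from `obs` itself.
    """
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    season = np.arange(len(obs)) % n_seasons
    seasonal_mean = np.array([obs[season == s].mean() for s in range(n_seasons)])
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - seasonal_mean[season]) ** 2)
```

A forecast that simply repeats the seasonal means scores zero on `sace` by construction, yet can score close to one on `ce` when the seasonal cycle dominates the variance.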

Results and Discussion
The forecast results were evaluated based on the stated measures of performance for each model under differing flow regimes as appropriate; namely, Wet (April-October) and Dry (November-March). The evaluation results for 1 to 10-day ahead forecasts with the ARMA (20,1) are listed in Tables 4-6, with the ARIMA (8,2,3) in Tables 7-9, and with the PAR in Tables 10-12. Based on the performance statistics, the following observations can be made.
In terms of the values of CE, the ARMA (20,1), ARIMA (8,2,3), and PAR models all indicate that satisfactory forecasts can be achieved for lead times up to 10 days considering the whole year; that is, there is a possibility of making long-term forecasts of the streamflow process with the respective models. However, this shows how much the CE statistic can exaggerate the forecast accuracy of a model; the SACE statistic, in contrast, indicates otherwise. Using a threshold of ≥ 0.9 [10], the SACE statistic shows that with the ARMA (20,1), satisfactory forecasts can be obtained for up to a 10-day lead time, and with the ARIMA (8,2,3), for 5 days; whereas with the PAR model, it is around 7 days. Realistically, barring model uncertainty resulting from external problems (say, data quality, data size, non-stationarity, and seasonality issues), based on the SACE statistic, reliable forecasts can plausibly be made up to a lead time of 7 days.
It is important to note that there is significant seasonal variation in forecast accuracy. The forecast accuracy for the dry season is much higher than that for the wet season. Using the MAE statistic (threshold value, say, ≤ 150), with the ARMA (20,1), satisfactory forecasts can be made on average for up to 3 to 5 days, for both the wet and dry seasons. On the same basis, the performance of the ARIMA (8,2,3) is abysmal for the wet season; in the dry season, reliable forecasts are possible for at most 4 days, while with the PAR model they are possible for around 3 to 6 days for both wet and dry seasons. When assessing the performance of a streamflow forecasting model, it is important to evaluate not only the average prediction error but also the distribution of prediction errors, as shown by the results here. It is important to know whether the model is predicting higher flows badly or the lower magnitude flows badly, which may help in further refining the model. Generally, the statistical performance criteria RMSE, r², and CE are global statistics and do not provide any robust information on the distribution of errors; precisely, CE, MSRE, RMSE, MAE, and r² are all measures that incorporate both systematic and random errors. For instance, a CE value of zero indicates that the observed mean is as good a predictor as the model, while a negative value implies that the observed mean is a better predictor than the model [11]. But for hydrological time series that often exhibit strong seasonality, the general concern is whether the model is better than the seasonal mean values of the series rather than the overall observed mean. This cannot adequately be addressed by CE as in Equation (8); similarly, the value of CE calculated for a whole year is higher than the average of CE values calculated for separate seasons, which illogically implies that the model performance for the whole year is better than for most separate seasons [7]. These problems arise from the inadequacy of the definition of CE in dealing with seasonal processes. Basically, CE's definition is premised on the assumption that the process of interest is stationary [12], but hydrological time series usually exhibit strong seasonality. It is interesting to note that when strong seasonality exists, especially when the mean value changes with season, for most of the seasons (such as days or months) in a year, the value of the overall standard deviation is larger than the values of the seasonal standard deviations [7,8]. Figure 5 illustrates this disparity resulting from the existence of strong seasonality. As shown by Figure 5, the computed overall standard deviation (using the overall mean) is about 3721.97 m³·s⁻¹, whereas the average of daily standard deviations (calculated for the average discharges in each day over the year) is 977.38 m³·s⁻¹. Thus Equation (10) (i.e., SACE), the seasonally adjusted coefficient of efficiency espoused by Wang [7], which requires the use of seasonal mean values, can overcome the shortcomings of the traditional CE.
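The gap between CE and SACE can be made explicit with a short decomposition. As a sketch, assume (the notation is introduced here for illustration) that $Q_i$ are the observed flows, $\bar{Q}$ is their overall mean, and $\bar{Q}_s$ is the mean of the $n_s$ observations falling in season $s$, with $s(i)$ denoting the season of observation $i$ and all means computed from the same sample. The total sum of squares then splits into a within-season part and a between-season part:

$$\sum_{i} (Q_i - \bar{Q})^2 = \sum_{i} (Q_i - \bar{Q}_{s(i)})^2 + \sum_{s} n_s (\bar{Q}_s - \bar{Q})^2$$

The second term on the right is non-negative and grows with the spread of the seasonal means, so the sum of squares about the overall mean is never smaller than the sum of squares about the seasonal means. For the same forecast errors, a statistic normalised by the former is therefore at least as large as one normalised by the latter, and the difference widens as seasonality strengthens.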
To a large extent, the performance of a hydrologic model depends on several factors, among which are the quality and information content of the data used and the form of pre-processing or transformation adopted. Most univariate time series models are developed under the assumption of second-order stationarity; thus, if a strong seasonal component causes a series to be non-stationary, the traditional approach is to pass it through a linear time-invariant filter, the output of which is assumed to be stationary. But there are many instances of hydrologic time series that cannot be filtered or standardised to achieve second-order stationarity because the entire correlation structure of the series may depend on season. It is therefore important to look at the forecast accuracy of the ARMA (20,1), ARIMA (8,2,3), and PAR models against the backdrop of the pre-processing strategy adopted here, since there is evidence of strong seasonality in the flow series. As reported in Kavvas and Delleur [13], McKerchar and Delleur [3], and Delleur et al. [14], on the basis of both analytical and empirical results, seasonal and/or non-seasonal differencing, although very effective in the removal of hydrologic periodicities, distorts the original spectrum of the time series, thus making it impractical or impossible to fit an ARMA model for hydrologic simulation or synthetic generation. As a result, the forecasting capabilities of either seasonally differenced or non-seasonally differenced models may be impaired, since they do not take into account the seasonal variation in the standard deviations or the seasonal structure inherent in the time series. A similar argument may be made for the deseasonalisation pre-processing approach; deseasonalised modelling has some associated theoretical difficulties. The principal setback is the stationarity assumption usually made for deseasonalised series, which is not likely to be satisfied; this agrees with the findings of Moss and Bryson [15]. These difficulties can be overcome by employing periodic models, which allow the model parameters, as well as model orders, to vary depending on the season of the year.
Thus, considering all the issues highlighted, the inability of the ARMA (20,1), ARIMA (8,2,3), and PAR models to adequately capture the dynamics of the flow process here can be understood. Despite this, considering the defects of both the ARMA (20,1) and ARIMA (8,2,3) resulting from their respective pre-processing styles, in the context of realistic forecasting the PAR model as used here performs comprehensively better, as it has a higher potential to account for the variability in both seasonal deviations and seasonal correlation structures.

Conclusions
Data-driven models based on univariate time series were used for forecasting in this study, namely the traditional ARMA-type and the periodic AR (PAR) models. Comparative forecast performances of the various models show that, despite the limitation associated with univariate time series, reliable forecasts can be obtained for lead times of 1 to 5 days ahead on average for all the models used. The forecast results also brought to the fore the inadequacy of the traditional ARMA model. Unlike the periodic AR (PAR) model, it was unable to robustly simulate high flow regimes. Because of this, in order to account for seasonal variations, PAR models should be used in forecasting the daily streamflow process of the Benue River. However, the stochastic modelling does show that the ARMA-type models could be used as preliminary models for understanding the dynamics of the streamflow process.
In the light of the results obtained, it should be noted that one limitation of this study is the small size of the data set used for modelling the streamflow process. Thus, to enhance the performance of the models and establish the generality of the conclusions drawn, it is strongly recommended that a larger data set be used and that explanatory exogenous variables (e.g., precipitation) be included in the modelling exercise, i.e., multivariate modelling. To this end, in order to improve the accuracy of long-range forecasts, investigation of the linkage between streamflow processes and ancillary hydroclimatic factors would be inevitable. In addition, since predictability is an important aspect of the dynamics of hydrological processes, though not considered in this study, a definition of the predictability of the streamflow processes is a necessity, at least to put forward a predictability horizon for the respective flow dynamics.

Figure 1. (a) Map of Nigeria showing Benue River and its traverse; (b) General hydrological year flow regime.

Figure 4. Membership grades of the days over the year for the daily streamflow based on fuzzy clustering.

Figure 5. Comparison between the overall standard deviation and seasonal standard deviation over an annual cycle.

Table 3 indicates the concise definition of each partition in terms of the intrinsic flow pattern. Collectively, the respective AR models constitute the PAR model. During the forecasting process, a specific AR model is applied depending on which season's partition the date to be forecast falls in.
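The way a set of partition-specific AR models acts as one PAR model can be sketched as follows. This is an illustrative outline under stated assumptions rather than the procedure used in the study: the function names, the least-squares estimation, the Gaussian form of the AIC, and the `max_order` cap are all choices made here. The input `z` is assumed to be the deseasonalised series and `partition` an array giving the partition label of each observation.

```python
import numpy as np

def fit_ar(z, idx, p):
    """Least-squares AR(p) fit using only the target positions in idx (all >= p)."""
    lags = np.column_stack([z[idx - k] for k in range(1, p + 1)])
    phi, *_ = np.linalg.lstsq(lags, z[idx], rcond=None)
    resid = z[idx] - lags @ phi
    aic = len(idx) * np.log(np.mean(resid ** 2)) + 2 * p   # Gaussian AIC up to a constant
    return phi, aic

def fit_par(z, partition, max_order=5):
    """For every partition label, keep the AR coefficients with the lowest AIC."""
    z = np.asarray(z, dtype=float)
    partition = np.asarray(partition)
    models = {}
    for label in np.unique(partition):
        idx = np.flatnonzero(partition == label)
        idx = idx[idx >= max_order]           # same targets for every candidate order
        fits = [fit_ar(z, idx, p) for p in range(1, max_order + 1)]
        models[label.item()] = min(fits, key=lambda f: f[1])[0]
    return models

def forecast_one_step(z_history, next_label, models):
    """One-step forecast with the AR model of the partition of the next date."""
    phi = models[next_label]
    p = len(phi)
    recent = np.asarray(z_history, dtype=float)[-1:-p - 1:-1]   # z[t-1], ..., z[t-p]
    return float(phi @ recent)
```

Multi-step forecasts would follow by feeding each forecast back into the history and looking up the partition of each successive date; the result would then be transformed back from the deseasonalised scale.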