The choice of a particular Artificial Neural Network (ANN) structure is a seemingly difficult task, not least because there is no systematic way of establishing a suitable architecture. In view of this, the study examined the effects of ANN structural complexity and data pre-processing regime on forecast performance. To address this aim, two ANN structural configurations were employed: 1) a single-hidden-layer and 2) a double-hidden-layer feed-forward back-propagation network. The results obtained revealed, in general, that: a) an ANN comprised of double hidden layers tends to be less robust and to converge with less accuracy than its single-hidden-layer counterpart under identical conditions; b) for a univariate time series, phase-space reconstruction using the embedding dimension, which is based on dynamical systems theory, is an effective way of determining the appropriate number of ANN input neurons; and c) data pre-processing via the scaling approach excessively limits the output range of the transfer function. In specific terms, considering extreme-flow prediction capability on the basis of effective correlation, the percent maximum and minimum correlation coefficients (R_{max}% and R_{min}%), averaged over one-day-ahead forecasts during the training and validation phases respectively, for the adopted network structures 8 7 5 (i.e., 8 input nodes, 7 nodes in the hidden layer, and 5 nodes in the output layer), 8 5 2 5 (8 input nodes, 5 and 2 nodes in the first and second hidden layers, and 5 output nodes), and 8 4 3 5 (8 input nodes, 4 and 3 nodes in the first and second hidden layers, and 5 output nodes) were: 101.2, 99.4; 100.2, 218.3; and 93.7, 95.0, in all instances irrespective of the training algorithm (i.e., pooled).
On the other hand, in terms of percent of correct event prediction, the respective performances of the models for low and high flows during the training and validation phases were: 0.78, 0.96 and 0.65, 0.87; 0.76, 0.93 and 0.61, 0.83; and 0.79, 0.96 and 0.65, 0.87. Thus, on the basis of coherence or regularity of prediction consistency, the 8 4 3 5 ANN model performed better. This implies that although the adoption of large hidden layers, with their correspondingly large neuronal signatures, may provide additional representational power, it could be counter-productive because of network over-fitting. Based on the findings, it is imperative to note that the ANN model is by no means a substitute for conceptual watershed modelling; therefore, exogenous variables should be incorporated into streamflow modelling and forecasting exercises because of their hydrologic evolutions.
It is imperative to note as reported in Chibanga et al. [
According to Abrahart [
The selection of an appropriate architecture is usually problematic. In the views of Abrahart and See [
In light of the preceding sections, and considering that the ANN model structure is ideally suited to modelling highly nonlinear input-output relationships, the central thrust of this study is to assess the implications of some of the latent issues in the adoption of the neural network modelling paradigm. Specifically, the emphasis is on model structural complexity, while implicitly bringing to the fore its interplay with the optimisation algorithm as well as the data pre-processing regime.
For this study, the daily streamflow sequence of the River Benue at the Makurdi hydrometric station was obtained from the Benue State Water Works and the National Inland Waterways Authority (Makurdi Office); the data sequence spanned a period of thirty (30) years. Consistency and continuity tests were carried out; based on these tests, non-continuous data years were removed, reducing the record length to 26 years (i.e., 9490 data elements). The entire time series of 9490 daily values was then partitioned into two subsets of 8760 and 730 data points, corresponding to the training and validation phases, respectively (i.e., a split-sampling approach).
The mean daily discharges are as shown in
Similarly, the autocorrelation of the first difference signal (
whole range of frequency components. The fact that the spectrum is continuous with a pronounced and wide base underscores the aperiodicity of the series; the question, however, is to what extent this complex aperiodicity and irregularity translates into complex nonlinearity. This therefore portends the need to investigate whether the dynamics of the discharges of a river could have a dominant chaotic signature on which high-dimensional linear and nonlinear dynamics may be grafted.
The first step in the search for deterministic behaviour is to attempt to reconstruct the dynamics in phase space. With the time series of only one of the variables in the phenomenon available, namely the discharge x(t_i), the delay-time method proposed by Takens [
\mathbf{X}(t_i) = \left\{ x(t_i),\, x(t_i + \tau),\, \cdots,\, x\left[t_i + (m-1)\tau\right] \right\} (1)
where, m ( m = 2 , 3 , ⋯ ) is called the embedding dimension.
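As a sketch of Equation (1), a delay-embedding routine (illustrative Python; the study itself worked in MATLAB) might look like:

```python
import numpy as np

def delay_embed(x, m, tau):
    """Phase-space reconstruction per Equation (1): each row is the
    vector [x(t_i), x(t_i + tau), ..., x(t_i + (m - 1) * tau)]."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau          # number of reconstructed vectors
    if n <= 0:
        raise ValueError("series too short for this (m, tau)")
    return np.column_stack([x[k * tau : k * tau + n] for k in range(m)])

# toy series: for m = 3 and tau = 2 the first vector is [x_0, x_2, x_4]
X = delay_embed(np.arange(10.0), m=3, tau=2)
```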
To construct a well-behaved phase space by delay time, a careful choice of τ is critical. The delay time τ is commonly selected using the autocorrelation function (ACF) method, taking the lag at which the ACF first attains zero or drops below a small value, say 1/e^{4}, or the mutual information (MI) method according to Fraser and Swinney [
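The ACF-based choice of τ described above can be sketched as follows (illustrative Python; the first zero crossing is the default criterion, with the threshold left as a parameter):

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation function at lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    c0 = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / c0
                     for k in range(max_lag + 1)])

def delay_from_acf(x, max_lag=100, threshold=0.0):
    """Smallest lag at which the ACF drops to or below `threshold`
    (threshold = 0 corresponds to the first zero crossing)."""
    r = acf(x, max_lag)
    for k in range(1, max_lag + 1):
        if r[k] <= threshold:
            return k
    return max_lag  # no crossing within max_lag

# a sine of period 40 samples decorrelates near a quarter period
t = np.arange(400)
tau = delay_from_acf(np.sin(2 * np.pi * t / 40), max_lag=50)
```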
The time delay coordinate method (Packard, et al. [
R_i = \frac{\left| Y_{i+1} - Y_{j+1} \right|}{\left\| \mathbf{X}_i - \mathbf{X}_j \right\|} (2)
If R_i exceeds a given threshold R_T (a suitable value is 10 ≤ R_T ≤ 50), the point X_i is marked as having a False Nearest Neighbour. Consequently, the embedding dimension p is high enough if the fraction of points that have False Nearest Neighbours is zero, or sufficiently small, say smaller than a criterion R_f. In this case, the False Nearest Neighbour threshold R_T was set to 10 (as reported in Wang [
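A minimal False Nearest Neighbour check along the lines of Equation (2) can be sketched as below (illustrative Python with a brute-force neighbour search; the threshold R_T is as in the text):

```python
import numpy as np

def fnn_fraction(x, m, tau, R_T=10.0):
    """Fraction of points whose nearest neighbour in the m-dimensional
    reconstruction is 'false', i.e. the ratio of Equation (2) exceeds R_T."""
    x = np.asarray(x, dtype=float)
    n = len(x) - m * tau                 # ensure x[i + m*tau] exists
    X = np.column_stack([x[k * tau : k * tau + n] for k in range(m)])
    false = 0
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                    # exclude the point itself
        j = int(np.argmin(d))
        if d[j] == 0.0:                  # coincident points: skip
            continue
        if abs(x[i + m * tau] - x[j + m * tau]) / d[j] > R_T:
            false += 1
    return false / n

# a noiseless sine unfolds in two dimensions, so the FNN fraction
# should drop sharply from m = 1 to m = 2
x = np.sin(np.linspace(0.0, 20.0 * np.pi, 500))
f1 = fnn_fraction(x, m=1, tau=12)
f2 = fnn_fraction(x, m=2, tau=12)
```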
Following from the analysis, eight lagged input values were used when fitting the ANN model to the series; specifically, based on the phase-space reconstruction, the discharges Q_{t−7}, Q_{t−6}, ⋯, Q_t of day t − 7 to day t served as inputs. These eight lagged values were used to forecast the discharge from time t + 1 (i.e., the next day) to t + 5 (i.e., five values ahead), using a multiple-output rather than a single-output approach. The idea here is to explore the ANN model's forecast behaviour over a high lead time (
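The construction of the eight-input, five-output training pairs described above can be sketched as follows (illustrative Python):

```python
import numpy as np

def make_io_pairs(q, n_in=8, n_out=5):
    """Split a discharge series into inputs Q_{t-7}..Q_t (n_in lagged
    values) and multiple-output targets Q_{t+1}..Q_{t+5} (n_out values)."""
    q = np.asarray(q, dtype=float)
    rows = len(q) - n_in - n_out + 1
    X = np.array([q[i : i + n_in] for i in range(rows)])
    Y = np.array([q[i + n_in : i + n_in + n_out] for i in range(rows)])
    return X, Y

# toy series 0..19: the first input row is 0..7, its target is 8..12
X, Y = make_io_pairs(np.arange(20.0))
```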
To address the thrust of the study, two model architectural variants with different nodal configurations were considered; precisely, single and double hidden layers were examined to assess the implications of model structural complexity. The ANN models adopted, chosen after several trials in an attempt to obtain comparable network structures, were: 1) 8 7 5, a single hidden layer with 7 nodes (i.e., 8 input nodes, 7 nodes in the hidden layer, and 5 nodes in the output layer); 2) 8 5 2 5, double hidden layers with 5 and 2 nodes, respectively; and 3) 8 4 3 5, double hidden layers with 4 and 3 nodes, respectively.
For the stated aim of the study, the multi-layer feed-forward back-propagation network was used. Specifically, network training was implemented using the trainbr (Bayesian regularisation: Br), traingdm (gradient descent with momentum: gdm), and trainlm (Levenberg-Marquardt: lm) functions in the MATLAB Neural Network Toolbox. Since the transfer function is of critical relevance in neural network training, and predictability of future behaviour is a direct consequence of its correct identification, the tan-sigmoid (tansig) and purelin transfer functions were used in the hidden and output layers, respectively, for each identified network structure. The purelin transfer function was adopted for the output layer because it allows the network outputs to take on any value, whereas a last layer of sigmoid neurons constrains the outputs of a multi-layer network to a small range.
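The networks themselves were built and trained in MATLAB; purely to illustrate the tansig/purelin arrangement, an untrained forward pass of an assumed 8 7 5 layout can be sketched in numpy as:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, W1, b1, W2, b2):
    """Single-hidden-layer forward pass: tan-sigmoid (tanh) hidden
    layer, purelin (identity) output layer."""
    h = np.tanh(W1 @ x + b1)   # hidden activations confined to (-1, 1)
    return W2 @ h + b2         # linear output: can take any real value

# random (untrained) weights for an 8-7-5 structure, illustration only
W1, b1 = rng.standard_normal((7, 8)), rng.standard_normal(7)
W2, b2 = rng.standard_normal((5, 7)), rng.standard_normal(5)
y = forward(rng.standard_normal(8), W1, b1, W2, b2)
```

Had the output layer also used tanh, every forecast would be confined to (−1, 1) in the scaled domain, which is the motivation stated above for purelin at the output.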
Before applying the ANN, both input and output data were pre-processed and normalised to the range [−1, 1]. The scaling strategy was adopted based on the findings of Wang [
x_t = \frac{(U_x - L_x)\, x'_t + (M_x L_x - m_x U_x)}{M_x - m_x} (3)
where x'_t is the original input data, x_t is the input data scaled to the network range, M_x and m_x are respectively the maximum and minimum of the original input data, and U_x and L_x are respectively the upper and lower network ranges for the network input. Similarly, the original output, say y'_t, is scaled to the network range by
y_t = \frac{(U_y - L_y)\, y'_t + (M_y L_y - m_y U_y)}{M_y - m_y} (4)
where y_t is the system output scaled to the network range, M_y and m_y are respectively the maximum and minimum values of the original output data y'_t, and U_y and L_y are respectively the upper and lower network ranges for the network output. After scaling the inputs and outputs, the resulting output, say \hat{y}_t, is in the scaled domain. Hence, there is a need to rescale \hat{y}_t back to its original domain; this is done by inverting Equation (4) to obtain \hat{y}'_t as
\hat{y}'_t = \frac{(M_y - m_y)\, \hat{y}_t - (M_y L_y - m_y U_y)}{U_y - L_y} (5)
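Equations (3)-(5) can be sketched as a scale/rescale pair (illustrative Python; `lo` and `hi` play the roles of L and U, and the discharge values are made up):

```python
import numpy as np

def scale(x, lo=-1.0, hi=1.0):
    """Equations (3)/(4): map data with range [m, M] onto the network
    range [lo, hi]; returns the scaled series and (m, M) for later use."""
    x = np.asarray(x, dtype=float)
    m, M = x.min(), x.max()
    return ((hi - lo) * x + (M * lo - m * hi)) / (M - m), m, M

def rescale(y, m, M, lo=-1.0, hi=1.0):
    """Equation (5): invert the scaling back to the original domain."""
    return ((M - m) * np.asarray(y, dtype=float) - (M * lo - m * hi)) / (hi - lo)

q = np.array([12.0, 250.0, 980.0, 55.0])   # hypothetical discharges
qs, m, M = scale(q)                        # qs lies in [-1, 1]
q_back = rescale(qs, m, M)                 # round trip recovers q
```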
In order to draw conclusions on the ANN models, attention is focused on their performance in terms of extreme events, that is, maximum and minimum flows. In this regard, the coefficient of correlation R in Equation (6) was employed.
R = \frac{\frac{1}{v} \sum_{t=1}^{v} \left( y_t - \mu_y \right)\left( \tilde{y}_t - \tilde{\mu}_y \right)}{\left[ \frac{1}{v} \sum_{t=1}^{v} \left( y_t - \mu_y \right)^2 \right]^{1/2} \left[ \frac{1}{v} \sum_{t=1}^{v} \left( \tilde{y}_t - \tilde{\mu}_y \right)^2 \right]^{1/2}} (6)
where, v = the number of output data points, y_t = the observed flow, \tilde{y}_t = the predicted flow, \mu_y = the mean of observed flows, and \tilde{\mu}_y = the mean of predicted flows. In terms of forecast accuracy with respect to extreme values, the ratio of the forecasted maximum to the observed maximum (peak) was determined as
R_{max} = \frac{\tilde{y}_t}{\max\{y_t\}} \times 100 (7)
where \max\{y_t\} = \max\{y_1, \cdots, y_v\} and \tilde{y}_t is the forecast corresponding to that maximum; R_{max} = 100% means that the observed peak is perfectly reproduced by the model. Forecasts with R_{max} close to 100% are considered very accurate, while R_{max} < 100% indicates that the model underestimates the peak value and R_{max} > 100% indicates overestimation. Similarly, the ratio of the forecasted to the observed minimum,
R_{min} = \frac{\tilde{y}_t}{\min\{y_t\}} \times 100 (8)
where \tilde{y}_t represents the forecast corresponding to the minimum observed value, was used to judge the forecasting capability of the model. In addition, specific-event prediction was considered by examining low and high flows.
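The three measures in Equations (6)-(8) can be sketched together (illustrative Python, with made-up observed and predicted values):

```python
import numpy as np

def corr_R(y, y_hat):
    """Equation (6): correlation between observed and predicted flows."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    dy, dyh = y - y.mean(), y_hat - y_hat.mean()
    return (dy * dyh).mean() / (np.sqrt((dy ** 2).mean()) * np.sqrt((dyh ** 2).mean()))

def R_max(y, y_hat):
    """Equation (7): forecast at the observed peak as a % of that peak."""
    i = int(np.argmax(y))
    return y_hat[i] / y[i] * 100.0

def R_min(y, y_hat):
    """Equation (8): forecast at the observed minimum as a % of it."""
    i = int(np.argmin(y))
    return y_hat[i] / y[i] * 100.0

obs = np.array([10.0, 50.0, 200.0, 80.0, 20.0])   # hypothetical observed flows
pred = np.array([12.0, 48.0, 190.0, 85.0, 18.0])  # hypothetical forecasts
# R_max(obs, pred) -> 95.0 (peak underestimated); R_min(obs, pred) -> 120.0
```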
Figures 9-12 and
Based on the results obtained, structural complexity is defined here to connote the size of the hidden layers relative to the traditional single hidden layer commonly employed. It is imperative to state that the selection of the optimal number of hidden units (nodes) for a hidden layer is often considered problem dependent. Intuition suggests that more is better; but as reported by Abrahart and See [
Model Architecture: (8 7 5)

| Optimisation algorithm | Training R_max (%), 1-day | Training R_max (%), 5-day | Training R_min (%), 1-day | Training R_min (%), 5-day | Validation R_max (%), 1-day | Validation R_max (%), 5-day | Validation R_min (%), 1-day | Validation R_min (%), 5-day |
|---|---|---|---|---|---|---|---|---|
| Br | 99.8 | 96.8 | 98.5 | 66.0 | 98.5 | 97.3 | 82.2 | 22.1 |
| LM | 98.9 | 96.4 | 125.8 | 0.0 | 100.6 | 100.9 | 84.9 | 60.8 |
| GDM | 104.8 | 101.2 | 0.0 | 0.0 | 99.2 | 88.6 | 16.4 | 0.0 |
| Average | 101.2 | 98.1 | 74.8 | 22.0 | 99.4 | 95.6 | 61.2 | 27.6 |

Model Architecture: (8 5 2 5)

| Optimisation algorithm | Training R_max (%), 1-day | Training R_max (%), 5-day | Training R_min (%), 1-day | Training R_min (%), 5-day | Validation R_max (%), 1-day | Validation R_max (%), 5-day | Validation R_min (%), 1-day | Validation R_min (%), 5-day |
|---|---|---|---|---|---|---|---|---|
| Br | 95.9 | 94.4 | 10.9 | 54.9 | 95.5 | 94.6 | 104.6 | 34.9 |
| LM | 99.9 | 98.1 | 84.8 | 0.0 | 489.5 | 423.0 | 0.0 | 0.0 |
| GDM | 104.8 | 101.2 | 0.0 | 0.0 | 69.8 | 68.5 | 0.0 | 0.0 |
| Average | 100.2 | 97.9 | 31.9 | 18.3 | 218.3 | 195.4 | 34.9 | 11.6 |

Model Architecture: (8 4 3 5)

| Optimisation algorithm | Training R_max (%), 1-day | Training R_max (%), 5-day | Training R_min (%), 1-day | Training R_min (%), 5-day | Validation R_max (%), 1-day | Validation R_max (%), 5-day | Validation R_min (%), 1-day | Validation R_min (%), 5-day |
|---|---|---|---|---|---|---|---|---|
| Br | 98.5 | 95.5 | 110.36 | 66.2 | 97.0 | 96.1 | 115.0 | 89.7 |
| LM | 101.2 | 101.2 | 107.2 | 0.0 | 94.6 | 94.4 | 110.3 | 87.1 |
| GDM | 81.5 | 75.8 | 70.9 | 30.7 | 94.6 | 94.4 | 110.3 | 87.1 |
| Average | 93.7 | 90.8 | 96.1 | 32.3 | 95.4 | 95.0 | 111.8 | 88.0 |
Network Topology (8 7 5: Single hidden layer): % correct event prediction

| Optimisation algorithm | Training Low flow, 1-day | Training Low flow, 5-day | Training High flow, 1-day | Training High flow, 5-day | Validation Low flow, 1-day | Validation Low flow, 5-day | Validation High flow, 1-day | Validation High flow, 5-day |
|---|---|---|---|---|---|---|---|---|
| Br | 0.82 | 0.73 | 0.98 | 0.89 | 0.65 | 0.69 | 0.87 | 0.86 |
| LM | 0.81 | 0.72 | 0.98 | 0.88 | 0.65 | 0.72 | 0.87 | 0.87 |
| GDM | 0.72 | 0.68 | 0.91 | 0.83 | 0.65 | 0.72 | 0.87 | 0.87 |
| Average | 0.78 | 0.71 | 0.96 | 0.87 | 0.65 | 0.71 | 0.87 | 0.87 |

Network Topology (8 5 2 5: Double hidden layer): % correct event prediction

| Optimisation algorithm | Training Low flow, 1-day | Training Low flow, 5-day | Training High flow, 1-day | Training High flow, 5-day | Validation Low flow, 1-day | Validation Low flow, 5-day | Validation High flow, 1-day | Validation High flow, 5-day |
|---|---|---|---|---|---|---|---|---|
| Br | 0.81 | 0.73 | 0.97 | 0.89 | 0.65 | 0.70 | 0.88 | 0.87 |
| LM | 0.81 | 0.73 | 0.98 | 0.89 | 0.65 | 0.69 | 0.88 | 0.86 |
| GDM | 0.67 | 0.70 | 0.84 | 0.84 | 0.53 | 0.56 | 0.73 | 0.78 |
| Average | 0.76 | 0.72 | 0.93 | 0.87 | 0.61 | 0.65 | 0.83 | 0.84 |

Network Topology (8 4 3 5: Double hidden layer): % correct event prediction

| Optimisation algorithm | Training Low flow, 1-day | Training Low flow, 5-day | Training High flow, 1-day | Training High flow, 5-day | Validation Low flow, 1-day | Validation Low flow, 5-day | Validation High flow, 1-day | Validation High flow, 5-day |
|---|---|---|---|---|---|---|---|---|
| Br | 0.82 | 0.73 | 0.98 | 0.89 | 0.65 | 0.70 | 0.88 | 0.87 |
| LM | 0.81 | 0.72 | 0.98 | 0.88 | 0.65 | 0.72 | 0.87 | 0.87 |
| GDM | 0.73 | 0.68 | 0.91 | 0.83 | 0.65 | 0.72 | 0.87 | 0.87 |
| Average | 0.79 | 0.71 | 0.96 | 0.87 | 0.65 | 0.71 | 0.87 | 0.87 |
It is paramount not only to evaluate model forecast performance on the basis of statistical parameters, but also to consider the impact data pre-processing may have on ANN model forecasts. It is recognised that data pre-processing can have a significant effect on model performance (e.g. Maier and Dandy, [
The behaviour as depicted by
Based on the results obtained, it could be inferred in all instances that the adoption of large hidden layers could be counter-productive: an excessive number of free parameters encourages over-fitting of the network, even though it may provide additional representational power. In the same context, rescaling of the ANN input regime adversely limits the output of the transfer function. From the conclusions drawn, it suffices to note that the ANN model is by no means a substitute for conceptual watershed modelling; therefore, exogenous variables should be incorporated into streamflow modelling and forecasting exercises because of their hydrologic evolutions. Furthermore, effort should be geared towards using hybrid models, such as fuzzy-neural network and wavelet models, in a coupling strategy with the ANN for streamflow modelling; similarly, because of volatility and nonlinear deterministic problems, ARMA-GARCH models should be considered a viable complement in this regard.
The authors declare no conflicts of interest regarding the publication of this paper.
Otache, M.Y., Musa, J.J., Kuti, I.A. Mohammed, M. and Pam, L.E. (2021) Effects of Model Structural Complexity and Data Pre-Processing on Artificial Neural Network (ANN) Forecast Performance for Hydrological Process Modelling. Open Journal of Modern Hydrology, 11, 1-18. https://doi.org/10.4236/ojmh.2021.111001