A Comparison of ANN and HSPF Models for Runoff Simulation in Balkhichai River Watershed, Iran
1. Introduction
Streamflow is one of the most important processes in the hydrological cycle, and its prediction is vital for water resources management and planning [1]. Computer simulation models of watershed hydrology and artificial intelligence techniques are widely used for runoff simulation and forecasting. The use of watershed models is increasing because of the growing demand for improved estimates of runoff quantity.
Over recent decades, artificial intelligence techniques such as the Artificial Neural Network (ANN) [2]-[6] and the fuzzy inference system (FIS) [7]-[9] have been introduced and widely applied in hydrological studies as powerful alternative modelling tools. In addition, Shamseldin [10], Kumar et al. [11] and Mutlu et al. [12] compared ANNs with different input variables for runoff simulation. The comparisons showed that ANN models using both rainfall and discharge as input variables gave better results than models using rainfall alone. When a model uses only rainfall values as inputs, the simulated hydrographs do not match the measured hydrographs as well [13] [14]. However, better fits between simulated and measured hydrographs have been reported in other studies in which additional variables such as temperature [15], evaporation [16] and soil moisture [17] were included as inputs to the ANN model.
HSPF is a semi-distributed, conceptual model that combines spatially distributed physical attributes into hydrologic response units. In this model, surface runoff is simulated primarily as an infiltration-excess process. HSPF has been used for simulation of various hydrological conditions [18] [19], non-point source pollutants including contaminated sediment [20], and land use management and flood control scenarios [21]. Abdullah et al. used the HSPF model for runoff simulation of a watershed in Jordan. Monthly calibration and verification produced a good fit, with correlation coefficients of 0.928 and 0.923, respectively, whereas daily simulation gave a lower correlation coefficient of 0.785 [22]. Al-Abed and Whiteley applied the HSPF model for runoff simulation in the Grand River watershed, located in Southern Ontario, Canada, with a drainage area of about 6965 km² [23]. Their study reported very satisfactory calibration results, with the percentage error between yearly simulated and yearly observed discharge values ranging from 4% to 16%.
ANNs and HSPF have both been widely used for runoff, pollution and sediment simulation, and several studies have compared mathematical models with ANNs [24]-[27]. Wang et al. applied both HSPF and ANN models for runoff and pollution simulation of eleven watersheds in the north of Bremerton, WA. The ANN model and the HSPF model gave similar levels of accuracy [28]. Considering the cost/product factor, their study showed that the ANN modeling approach provides a cost-effective alternative tool for predicting rainfall-runoff relationships. In the present study, ANN and HSPF are evaluated for the simulation of Balkhichai River streamflow in Iran.
2. The Study Area
The study area is the Balkhichai River watershed (Figure 1), located in the northwest of Iran between 47°50'E and 48°18'E, and 37°50'N and 38°16'N. The watershed covers about 1214 km² and plays an important role in regional agriculture. The Balkhichai River is about 40 km long and is the most important river in the region. The watershed outlet is Ardebil station, located in the northern part of the watershed. The region consists of mountainous areas in the west and northwest, and the watershed elevation varies from 1280 m to 4732 m. Land use is mainly open grass (former pasture and hay fields) and mountains, with agricultural, rural and urban residential areas. The mean annual precipitation in this watershed (stations are presented in Table 1) is very low compared with the world average of 800 mm. The watershed slope varies between 0 and 60 percent; the gentlest slopes, between 0 and 2 percent, are located in the center of the watershed. The satellite images were taken by the Landsat 5 satellite in 2006. Figure 1 shows the location of the 2 synoptic meteorological stations in the watershed, and Table 1 shows their coordinates and the averages of the observed temperature and precipitation values.
3. Methodology
3.1. Data
Temperature and precipitation are the two basic variables measured at meteorological stations. The training input dataset includes a total of 2557 data records between 2004 and 2010. The testing input dataset consists of a total of 729 data records, which were observed during the last 2 years (2011-2012). Hourly precipitation and temperature data were used as inputs to the HSPF model. Data on the different land use classes in the Balkhichai watershed were collected during a 3-day field survey in the watershed. Land use classifications for the Balkhichai River watershed were then derived from satellite image processing using maximum likelihood classification.
Table 1. Properties of the synoptic stations.
3.2. HSPF Hydrological Model
In this study, the Hydrological Simulation Program-FORTRAN (HSPF) was used for simulation of Balkhichai River runoff. HSPF is a set of computer codes developed by the US Environmental Protection Agency. It is based on the Stanford Watershed Model IV [29]. HSPF was created by combining the Stanford Watershed Model IV with the Agricultural Runoff Management Model (ARM) [30], the Non-point Source Runoff Model (NPS) [31], and the Hydrological Simulation Program (HSP) [32]-[34]. The model can simulate hydrological processes on permeable and impermeable land surfaces and in streams [35]. It has been widely used in Asia and other parts of the world, including in climate change studies [22] [36].
HSPF is a semi-distributed, deterministic, continuous and physically based model. PERLND, IMPLND, and RCHRES are the three main modules of HSPF; they simulate permeable land segments, impermeable land segments, and free-flow reaches, respectively. Detailed information about these modules can be found in the literature [34] [38] [39]. Figure 2 shows the hydrological cycle processes in the HSPF model. HSPF uses a storage routing technique to route water in each stream branch. Infiltration in permeable land is calculated based on Richards' equation [34]. Actual evapotranspiration (ET) is calculated by the Penman or Jensen formulas. Table 2 shows the key HSPF parameters, which should be adjusted during the calibration process. LZSN is the lower zone nominal storage capacity and strongly affects infiltration capacity, which HSPF controls through the INFILT parameter. AGWRC is defined as the ratio of today's flow rate to yesterday's flow rate and depends on topography, climate, soil properties and land use. UZSN is influenced by LZSN [35]. Parameters not presented in Table 2 are estimated using the BASINS (Better Assessment Science Integrating Point and Nonpoint Sources) software based on topographic, soil property and land use data. The estimated parameters are then supplied to HSPF.
Figure 2. HSPF conceptual hydrological model [37].
Table 2. The parameters of HSPF model in simulation process [40].
BASINS was developed to promote better assessment and integration of point and nonpoint sources in watershed and water quality management. It integrates several key environmental data sets with improved analysis techniques. Several types of environmental programs can benefit from such an integrated system at various stages of environmental management planning and decision making [34]. The data from 2004 to 2010 were used for HSPF model calibration and the data from 2011 to 2012 were used for validation.
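The definition of AGWRC given in this section, today's flow divided by yesterday's flow, can be illustrated with a short sketch. It assumes a rain-free recession period in which flow is dominated by groundwater discharge. The flow values and the function name `estimate_agwrc` are hypothetical and are not taken from the study or from the BASINS software.

```python
import numpy as np

def estimate_agwrc(recession_flows):
    """Estimate AGWRC as the median of the day-to-day ratios Q(t) / Q(t-1).

    The input is assumed to be a daily flow series from a dry period,
    so that each ratio reflects groundwater recession only.
    """
    q = np.asarray(recession_flows, dtype=float)
    if q.size < 2 or np.any(q <= 0):
        raise ValueError("need at least two strictly positive flow values")
    ratios = q[1:] / q[:-1]          # today's flow over yesterday's flow
    return float(np.median(ratios))  # median is robust to a few noisy days

# Hypothetical daily flows (m3/s) that decay by 2% per day.
flows = 5.0 * 0.98 ** np.arange(10)
agwrc = estimate_agwrc(flows)
print(f"Estimated AGWRC: {agwrc:.3f}")
```

A value obtained this way would only be a starting point; the parameter is still adjusted during calibration, as described above.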
3.3. Artificial Neural Networks (ANN)
An ANN, inspired by studies of biological neural systems, is composed of processing elements called neurons or nodes [41]. In the literature, different types of ANN are used for forecasting and modeling engineering problems [12]. ANNs with one hidden layer are commonly used in hydrological modeling [42], since these networks are considered to provide enough complexity to simulate the nonlinear properties of hydrological processes accurately.
A Feed Forward Neural Network (FFNN) consists of at least three layers: an input layer, one or more hidden layers, and an output layer. The input signals presented to the system in the input layer are processed and forwarded to the hidden layer. The weighted sum of the input signals is passed through a nonlinear activation function. The response of the network is compared with the actual observations and the network error is calculated. This error is propagated backwards through the system and the weight coefficients are updated (Figure 3).
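The forward pass and backward error propagation described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the network used in the study: the sigmoid hidden activation, linear output, mean squared error loss, learning rate, number of hidden nodes and the synthetic data are all choices made here for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, X):
    """Forward pass: weighted sums, nonlinear hidden layer, linear output."""
    W1, b1, W2, b2 = params
    hidden = sigmoid(X @ W1 + b1)
    output = hidden @ W2 + b2
    return hidden, output

def train_ffnn(X, y, n_hidden=4, learning_rate=0.1, epochs=2000, seed=0):
    """Train a three-layer FFNN with batch gradient descent (back-propagation)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, size=(n_hidden, 1))
    b2 = np.zeros(1)
    n = X.shape[0]
    history = []
    for _ in range(epochs):
        hidden, output = forward((W1, b1, W2, b2), X)
        error = output - y                      # network error vs. observations
        history.append(float(np.mean(error ** 2)))
        # Propagate the error backwards, layer by layer.
        grad_output = 2.0 * error / n
        grad_W2 = hidden.T @ grad_output
        grad_b2 = grad_output.sum(axis=0)
        grad_hidden = (grad_output @ W2.T) * hidden * (1.0 - hidden)
        grad_W1 = X.T @ grad_hidden
        grad_b1 = grad_hidden.sum(axis=0)
        # Update the weight coefficients.
        W1 -= learning_rate * grad_W1
        b1 -= learning_rate * grad_b1
        W2 -= learning_rate * grad_W2
        b2 -= learning_rate * grad_b2
    return (W1, b1, W2, b2), history

# Synthetic inputs scaled to [0, 1] and a smooth nonlinear target (hypothetical data).
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(200, 3))
y = (0.5 * X[:, 0] + 0.3 * X[:, 1] ** 2 + 0.2 * X[:, 2]).reshape(-1, 1)
params, history = train_ffnn(X, y)
print(f"MSE first epoch: {history[0]:.4f}, last epoch: {history[-1]:.4f}")
```

Plain batch gradient descent is used only to keep the sketch short; practical applications usually rely on faster training algorithms.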
3.4. Evaluation Criteria
Table 3 shows the evaluation criteria used in this study. The root-mean-square error (RMSE) evaluates how closely predictions match observations. Values may range from 0 (perfect fit) to +∞ (no fit), depending on the relative range of the data. The coefficient of determination (denoted R in this study), known as the square of the sample correlation coefficient, ranges from 0 to 1 and describes the amount of observed variance explained by the model. A value of 0 implies no correlation, while a value of 1 suggests that the model can explain all of the observed variance. The Nash-Sutcliffe coefficient of efficiency, ENS, measures the model's ability to predict values that differ from the mean and gives the proportion of the initial variance accounted for by the model [43]. The peak-weighted root-mean-square error (PWRMSE) is implicitly a measure of how well the magnitudes of the peaks, the volumes, and the times of peak of the simulated and measured hydrographs agree.
Table 3. List of evaluation criteria.
: observed;: mean observed;: simulated;: mean simulated; n: number of data.
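The evaluation criteria described above can be sketched as follows. The formulas in Table 3 are not reproduced in this text, so the definitions below are assumptions based on common usage: ME is taken as the mean of simulated minus observed values, and PWRMSE follows the peak-weighted form commonly used in HEC-HMS, in which each squared error is weighted by (observed + mean observed) / (2 × mean observed). The function names are hypothetical.

```python
import numpy as np

def rmse(obs, sim):
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

def mean_error(obs, sim):
    # Assumed sign convention: positive means the model overestimates.
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.mean(sim - obs))

def nash_sutcliffe(obs, sim):
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))

def coefficient_of_determination(obs, sim):
    # Square of the sample correlation coefficient.
    return float(np.corrcoef(obs, sim)[0, 1] ** 2)

def pwrmse(obs, sim):
    # Assumed peak-weighted form: larger observed flows get larger weights.
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    weights = (obs + obs.mean()) / (2.0 * obs.mean())
    return float(np.sqrt(np.mean((obs - sim) ** 2 * weights)))

# Hypothetical flows (m3/s): the simulation is biased high by 1 m3/s.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]
print("RMSE  :", rmse(observed, simulated))
print("ME    :", mean_error(observed, simulated))
print("ENS   :", nash_sutcliffe(observed, simulated))
print("R     :", coefficient_of_determination(observed, simulated))
print("PWRMSE:", pwrmse(observed, simulated))
```

The example shows why several criteria are reported together: a constant bias leaves the correlation-based measure at 1 while ENS drops sharply.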
Figure 3. A three-layered FFNN with a back-propagation training algorithm.
4. Results and Discussion
4.1. Daily Streamflow Simulation by HSPF Model
The daily discharge data from 2004 to 2010 were used for calibration (training) and the data from 2011 to 2012 for validation (testing). Table 4 shows the values of the calibrated parameters in this study. For example, LZSN in Table 4 has an average value of 56.23 mm, estimated according to the Linsley equation [44], LZSN = 100 + 0.25 × (yearly mean precipitation). The other parameters were estimated using BASINS Technical Note 6 [40]. Figure 4 and Figure 5 show the observed and simulated hydrographs for the calibration and validation periods of the HSPF model, respectively. These figures show good agreement between the daily observed and simulated runoff values in both periods. The correlation coefficients for the calibration and validation periods are 0.83 and 0.91, respectively, which implies that the HSPF simulation is acceptable. Moreover, the Nash-Sutcliffe coefficient (model efficiency) is 0.75 in the calibration period and 0.72 in the validation period. Nash-Sutcliffe efficiency values less than 0.5 are considered unacceptable, values greater than 0.6 are considered good, and values greater than 0.8 are considered excellent. Therefore, HSPF gave a good daily runoff simulation and can be used in this research.
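The Nash-Sutcliffe thresholds quoted above can be written as a small helper. The thresholds come from the text; the function name `classify_ens` and the label "unclassified" for the interval from 0.5 to 0.6, which the text does not rate, are assumptions made for this sketch.

```python
def classify_ens(ens):
    """Rate a Nash-Sutcliffe efficiency using the thresholds quoted in the text."""
    if ens > 0.8:
        return "excellent"
    if ens > 0.6:
        return "good"
    if ens < 0.5:
        return "unacceptable"
    return "unclassified"  # 0.5 to 0.6 is not rated in the text

# HSPF efficiencies reported for the two periods.
ratings = {
    "calibration": classify_ens(0.75),
    "validation": classify_ens(0.72),
}
for period, rating in ratings.items():
    print(period, rating)
```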
4.2. Daily Stream Flow Simulation by ANN Model
The most common ANN is the feed-forward network, which uses the back-propagation algorithm for training [45]. The numbers of neurons in the input and output layers are determined by the number of input and output variables of a given system. The number of neurons in a hidden layer is an important consideration when solving problems using multilayer feed-forward networks. If there are too few neurons in a hidden layer, the neural network may not be able to capture the intricate relationships between the indicator parameters and the computed output parameters.
Here, we use the three-layer FFNN with one hidden layer and the common trial and error method to select the number of hidden nodes. Too many hidden layer neurons not only require a large computational time for accurate training, but may also result in overtraining. A neural network is said to be “over-trained” when the network focuses on the characteristics of individual data points rather than just capturing the general patterns presented in the entire training set.
Understanding the temporal relationships between climatic drivers and streamflow is fundamental to model development. Some studies use time-series correlation analysis to determine the temporal lag (number of time steps) between climate and flow variables [46]. Similar to Moradkhani et al. (2004), cross-correlation analyses were used in this study to determine the temporal relationships between precipitation, temperature and streamflow [47]. Time-series analysis found the daily streamflow autoregressive and moving average components to be of order 3, while the exogenous variables (precipitation and temperature) were of orders 3 and 1, respectively. Therefore, a total of seven variables were identified as inputs (Equation (1)).
(1)
Figure 4. Analysis plot for daily flow in calibration period with HSPF model.
Figure 5. Analysis plot for daily flow in validation period with HSPF model.
Table 4. Values of parameters, used in simulation in HSPF model.
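The seven-variable input vector described above (three lags of streamflow, three of precipitation and one of temperature) can be assembled as in the following sketch. Equation (1) is not reproduced in this text, so the exact lag indexing is an assumption: the lags are taken to start at t-1 and the target is the flow at time t. The function name `build_lagged_inputs` and the toy series are hypothetical.

```python
import numpy as np

def build_lagged_inputs(flow, precip, temp, q_lags=3, p_lags=3, t_lags=1):
    """Return (X, y): lagged predictors and the flow at time t.

    Column order: Q(t-1)..Q(t-q_lags), P(t-1)..P(t-p_lags), T(t-1)..T(t-t_lags).
    """
    flow = np.asarray(flow, dtype=float)
    precip = np.asarray(precip, dtype=float)
    temp = np.asarray(temp, dtype=float)
    if not (len(flow) == len(precip) == len(temp)):
        raise ValueError("all series must have the same length")
    max_lag = max(q_lags, p_lags, t_lags)
    n = len(flow)
    columns = []
    for series, n_lags in ((flow, q_lags), (precip, p_lags), (temp, t_lags)):
        for lag in range(1, n_lags + 1):
            # Value at time (t - lag) for every t from max_lag to n - 1.
            columns.append(series[max_lag - lag : n - lag])
    X = np.column_stack(columns)
    y = flow[max_lag:]
    return X, y

# Toy series chosen so that each column is easy to recognise.
q = np.arange(10.0)          # flow
p = np.arange(10.0) * 10.0   # precipitation
t = np.arange(10.0) * 100.0  # temperature
X, y = build_lagged_inputs(q, p, t)
print(X.shape, y.shape)
print(X[0], y[0])
```

The first `max_lag` time steps are dropped because their lagged values are not available.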
After the appropriate input vector was identified, the network was trained to predict future data based on past and present data. In the present study, the input and output variables are first normalized linearly to the range 0 to 1. The normalization is done using the following equation:
$$X_n = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \qquad (2)$$
where $X_n$ is the standardized value of the input, $X$ is the original data set, and $X_{\min}$ and $X_{\max}$ are the minimum and maximum of the actual values in all observations, respectively. The main reason for standardizing the data matrix is that the variables are usually measured in different units. By standardizing the variables and recasting them in dimensionless units, the arbitrary effect of the measurement units on the similarity between objects is removed.
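The linear normalization to the range 0 to 1 can be sketched as below. The function names are hypothetical, the sample values are invented, and the guard for constant columns is an addition made here to avoid division by zero; it is not part of the study.

```python
import numpy as np

def minmax_fit(X):
    """Column-wise minimum and maximum of the data matrix."""
    X = np.asarray(X, dtype=float)
    return X.min(axis=0), X.max(axis=0)

def minmax_transform(X, x_min, x_max):
    """Scale each column to [0, 1] as (X - min) / (max - min)."""
    span = np.where(x_max > x_min, x_max - x_min, 1.0)  # guard constant columns
    return (np.asarray(X, dtype=float) - x_min) / span

def minmax_inverse(X_n, x_min, x_max):
    """Map normalized values back to the original units."""
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return np.asarray(X_n, dtype=float) * span + x_min

# Hypothetical records: precipitation (mm) and temperature (degrees C).
data = np.array([[0.0, 10.0],
                 [5.0, 20.0],
                 [10.0, 30.0]])
x_min, x_max = minmax_fit(data)
normalized = minmax_transform(data, x_min, x_max)
restored = minmax_inverse(normalized, x_min, x_max)
print(normalized)
```

The inverse transform is needed to express the network output in discharge units again before computing the evaluation criteria.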
Figure 6 and Figure 7 show the discharge outputs of the ANN model against observed data in the calibration and validation periods, respectively. A very good match is obtained between the observed runoff values and those computed by the ANN model for the training data for all the inputs. The performance of the ANN model during high flows, such as in summer, autumn and winter, is considered excellent and better than that of the HSPF model. However, during low flows, the observed values are slightly higher than the estimates. Overall, this indicates the robustness of the ANN model and confirms its capability for runoff simulation within acceptable accuracy.
Figure 6. Analysis plot for daily flow in calibration period with ANN model.
Figure 7. Analysis plot for daily flow in validation period with ANN model.
4.3. Comparison of HSPF and ANN Models
In comparing the results, the terms "calibration" and "validation" used for the HSPF model are treated as equivalent to "training" and "testing" of the ANN model, respectively. To estimate the relative performance of the models in runoff simulation, the values of the evaluation criteria obtained from the ANN and HSPF models were compared. The evaluation criteria of the ANN model obtained during calibration were compared with the corresponding criteria obtained during HSPF calibration. The values of the statistical evaluation criteria ENS, R, ME, RMSE and PWRMSE are shown in Table 5. Similarly, the validation results of the ANN model were compared with the validation results of the HSPF model.
The ENS for the HSPF model ranged from 0.64 to 0.80 for the calibration period and from 0.70 to 0.74 for the validation period. Similarly, the R for the HSPF model ranged from 0.73 to 0.88 for the calibration period and from 0.82 to 0.92 for the validation period. The ME ranged from −1.58 to 1.89 for the calibration period and from −1.15 to −0.84 for the validation period. Also, the RMSE ranged from 1.97 to 4.39 for the calibration period and from 2.09 to 3.86 for the validation period. The PWRMSE for HSPF model ranged from 2.37 to 4.52 for the calibration period and from 2.71 to 3.27 for the validation period.
The ENS for the ANN model ranged from 0.72 to 0.89 for the calibration period and from 0.78 to 0.85 for the validation period. Similarly, the R for the ANN model ranged from 0.86 to 0.93 for the calibration period and from 0.91 to 0.94 for the validation period. The ME ranged from −1.39 to 1.72 for the calibration period and from −0.97 to −0.41 for the validation period. Also, the RMSE ranged from 1.07 to 3.75 for the calibration period and from 1.89 to 2.56 for the validation period. The PWRMSE for ANN model ranged from 1.16 to 3.23 for the calibration period and from 1.23 to 2.67 for the validation period.
The results indicated that both models were generally able to simulate streamflow well during the calibration and validation periods. However, the streamflows simulated by the ANN were better than those predicted by HSPF in both periods, as revealed by the values of the evaluation criteria. There was a considerable difference between the values of ENS obtained from the ANN and HSPF models for the year 2004 (Table 5). Similar results were obtained during the model validation period. In this study, the Nash-Sutcliffe coefficients of the HSPF model were lower than those of the ANN model. This confirms that the ANN model is well capable of describing the non-linear relationship between the input and output.
Figure 8 and Figure 9 show the scatter plots of observed and computed runoff values for the calibration and validation periods for the HSPF model. In the calibration period, the points are well spread around the ideal line for this watershed (Figure 8). In the validation period, the plot is shifted towards one side, and this shift from the ideal line indicates possible systematic errors. Similar scatter plots for the ANN model, shown in Figure 10 and Figure 11, exhibit a closer scatter around the ideal line, indicating good runoff simulation for the Balkhichai River watershed. The scatter for the ANN model is clearly better than that of the HSPF model. These scatter plots support the application of the ANN model, as shown by the relatively more symmetrical scatter in the figures. The ANN model was also more successful than HSPF in forecasting peak flow. In general, the results of this study show that ANNs can be powerful tools in runoff simulation.
Table 5. Performances of HSPF and FFNN model.
Figure 8. Scatter plot for daily flow in calibration period with HSPF model (m³/s).
Figure 9. Scatter plot for daily flow in validation period with HSPF model (m³/s).
Figure 10. Scatter plot for daily flow in calibration period with ANN model (m³/s).
Figure 11. Scatter plot for daily flow in validation period with ANN model (m³/s).
One known advantage of the HSPF model is that it can provide reliable flow simulation at ungauged sites when climate and soil data are available. The rainfall-runoff relation is affected by climatic parameters and by various physical characteristics, e.g. slope, elevation, vegetation, soil moisture and groundwater. Together, these parameters make the relation between rainfall and runoff non-linear and complex. In addition, complete data are not available in many watersheds. Many physical models such as HSPF have been developed, but because they cannot take all the necessary parameters into account, they are not as efficient as needed. The increasing use of ANNs, despite their short history, together with the reliable results they produce, points to their growing popularity and bright future.
5. Conclusion
This paper reports the results of a comparison between two different models for runoff simulation in the Balkhichai River watershed in Iran during the period 2004-2012. The performances of the models in the "calibration and training" and "validation and testing" stages were compared with the observed runoff values to identify the best-fitting forecasting model based on a number of selected performance criteria. The comparison shows that the ANN model performs better than HSPF in forecasting runoff. With a good training process and a suitable choice of algorithm and number of nodes, the prediction becomes more accurate. Once the architecture of the network is defined, the weights are calculated through a learning process in which the ANN is trained to reproduce the desired output. The neural network predicted runoff accurately, with better agreement between the observed and predicted values than the HSPF model. ANNs are capable of daily simulation of runoff. However, at low flows, the observed values are slightly higher than the estimates. Unlike hydrological models, the ANN does not require watershed information and other physical parameters in the modeling process, which reduces the complexity of modeling the system. The time required to calibrate the ANN is much less than for HSPF, and ANN calibration needs less expertise and experience. Compared with the HSPF model, less data are required for simulation using ANNs. If a number of scenarios are to be evaluated to investigate the response of the catchment, HSPF may prove advantageous in comparison with ANNs. Another advantage of the HSPF model is that it can provide reliable runoff simulation at ungauged sites when climate and soil data are available.
In Iran, it is relatively easy to obtain flow and precipitation records through governmental online resources compared with physical characteristics of river basins such as soil moisture, soil classes, groundwater level, infiltration and evaporation. Black-box models might therefore emerge as a faster tool for flow forecasting. The results of this study can inform future studies comparing HSPF and ANN in daily simulations in general, and runoff prediction performance in particular.
NOTES
*Corresponding author.