Analysis of Mean Monthly Rainfall Runoff Data of Indian Catchments Using Dimensionless Variables by Neural Network

This paper focuses on the concept of using dimensionless variables as input and output to an Artificial Neural Network (ANN) and discusses the improvement in the results in terms of various performance criteria, as well as the simplification of the ANN structure, for modeling the rainfall-runoff process in certain Indian catchments. In the present work, runoff is taken as the response (output) variable, while rainfall, slope, area of catchment and forest cover are taken as input parameters. The data used in this study are taken from six drainage basins in the Indian provinces of Madhya Pradesh, Bihar, Rajasthan, West Bengal and Tamil Nadu, located in different hydro-climatic zones. Standard statistical performance evaluation measures, namely root mean square error (RMSE), Nash-Sutcliffe efficiency and correlation coefficient, were employed to evaluate the performance of the various models developed. The results obtained in this study indicate that the ANN models using dimensionless variables were able to provide a better representation of the rainfall-runoff process than the ANN models using process variables investigated in this study.


Introduction
The rainfall-runoff relationship is one of the most complex hydrological phenomena due to the tremendous spatial and temporal variability of watershed characteristics and rainfall patterns, as well as the number of variables involved in the physical processes. This process is also non-linear in nature, and explicit solutions are therefore difficult to obtain [1,2]. The runoff needs to be estimated for efficient utilization of water resources. Rainfall-runoff models play a significant role in water resource management planning and hydraulic design. Several attempts have been made to model the non-linearity of the rainfall-runoff process, which arises from the intrinsic non-linearity of the process and from seasonality. These rainfall-runoff models generally fall into three broad categories, namely, black box or system theoretical models, conceptual models and physically-based models [3-5]. Black box models normally contain no physically-based input and output transfer functions and are therefore considered to be purely empirical models. Conceptual rainfall-runoff models usually incorporate interconnected physical elements in simplified form, and each element is used to represent a significant or dominant constituent hydrologic process of the rainfall-runoff transformation [6,7]. A dimensional analysis technique has also been developed and used to obtain mean annual flood estimates in several Indian catchments [8].
In recent years, Artificial Neural Networks (ANNs) have become increasingly popular in water resources and have been used in various fields for the prediction and forecasting of complex nonlinear processes, including the rainfall-runoff phenomenon. Many studies have demonstrated that ANNs are excellent tools to model the complex rainfall-runoff process and can perform better than conventional modeling techniques [1,9-12]. However, less attention is often given to simplifying the ANN structure.
To the best of our knowledge, the use of dimensionless variables as input and output to an ANN in rainfall-runoff modeling has not been reported in the literature, although dimensionless variables have been used with ANNs for the estimation of downstream scour [13] and for heat problems [14]. Swamee used dimensionless variables to compute annual flood estimates, and the same dimensionless variables are therefore used in the present study in the context of the rainfall-runoff process [8].
In view of the above, the objectives of the present study are to 1) evaluate the dimensional analysis technique of Swamee et al.; 2) investigate ANNs using process variables as well as dimensionless variables for modeling the complex rainfall-runoff process; and 3) achieve simplifications in the ANN structure. The paper begins with a brief introduction to the computing techniques of ANN and the study area, followed by the details of the model development, before discussing the results and making concluding remarks. The techniques are applied to the data of all river basins used in the present study, and the Damodar river basin is used as an example of an individual river basin to examine the effects on an individual catchment.

Artificial Neural Network
The Artificial Neural Network represents an alternative computational paradigm in which the solution to a problem is learned from a set of samples. An artificial neural network consists of simple synchronous elements, called neurons, which are analogous to the biological neurons in the human brain [7,15]. These neurons are arranged in layers in a network. The neurons in one layer are connected to those in the adjacent layers, and the strength of the connection between two neurons in adjacent layers is called the "weight". These weights are altered during the training process, using an appropriate training rule and the data presented to the network, so that the inputs produce an output that is close to the desired value. An ANN normally consists of three layers: an input layer, a hidden layer, and an output layer. In a feed-forward network, the weighted connections feed activations only in the forward direction, from the input layer to the output layer. Each node in a layer receives and processes weighted input from the previous layer and transmits its output to the nodes in the following layer through links. A typical three-layer feed-forward network is shown in Figure 1. There are many optimization techniques for training neural networks with the backpropagation algorithm. Recently, Levenberg-Marquardt learning algorithms have been used increasingly because of their better performance and learning speed with a simple structure [15,16].
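The forward pass through a three-layer feed-forward network of the kind described above can be sketched as follows. This is a minimal illustration only: it assumes a log-sigmoid transfer function in both layers, and the 4-6-1 layer sizes, the random weights and the function names `logsig` and `forward` are hypothetical choices, not values taken from the study.

```python
import numpy as np

def logsig(x):
    """Log-sigmoid transfer function, mapping any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Propagate one input vector through a single-hidden-layer network."""
    hidden = logsig(w_hidden @ x + b_hidden)   # hidden-layer activations
    return logsig(w_out @ hidden + b_out)      # output-layer activation

# Illustrative 4-6-1 network with random (hypothetical) weights and biases.
rng = np.random.default_rng(0)
w_hidden, b_hidden = rng.normal(size=(6, 4)), rng.normal(size=6)
w_out, b_out = rng.normal(size=(1, 6)), rng.normal(size=1)

x = np.array([0.3, 0.1, 0.5, 0.2])             # inputs already scaled to [0, 1]
y = forward(x, w_hidden, b_hidden, w_out, b_out)
```

Training then consists of adjusting `w_hidden`, `b_hidden`, `w_out` and `b_out` so that `y` approaches the desired output.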

Figure 1. Three layer feed-forward neural network
This learning algorithm is briefly described as follows. The Levenberg-Marquardt algorithm is designed to approach second-order training speed without having to compute the Hessian matrix [17]. It approximates the Hessian matrix as $H \approx J^{T}J$ and the gradient as $g = J^{T}e$, and uses this approximation to obtain the revised weights through the following Newton-like update:

$$X_{k+1} = X_k - \left[J^{T}J + \mu I\right]^{-1} J^{T} e \qquad (1)$$

where $X_k$ is the vector of weights and biases at iteration $k$; J is the Jacobian matrix that contains the first derivatives of the network errors with respect to the weights and biases; e is a vector of network errors and I is an identity matrix [15,18,19]. When μ is large, this becomes gradient descent with a small step size, and when μ is small, the algorithm approximates Newton's method.
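As an illustration of the Levenberg-Marquardt update in its standard form, $X_{k+1} = X_k - [J^{T}J + \mu I]^{-1}J^{T}e$, the sketch below applies it to a toy least-squares problem. The straight-line fit, the fixed value of `mu` and the function names are assumptions made for illustration; a full training routine would also adapt μ between iterations and would compute J for the network weights.

```python
import numpy as np

def lm_step(x, residual_fn, jacobian_fn, mu):
    """One Levenberg-Marquardt update: x - (J^T J + mu*I)^-1 J^T e."""
    e = residual_fn(x)                      # vector of errors
    J = jacobian_fn(x)                      # derivatives of e w.r.t. x
    A = J.T @ J + mu * np.eye(x.size)       # damped Hessian approximation
    return x - np.linalg.solve(A, J.T @ e)

# Toy problem (hypothetical): recover a and b in y = a*t + b.
t = np.array([0.0, 1.0, 2.0, 3.0])
y_obs = 2.0 * t + 1.0

def residuals(x):
    return x[0] * t + x[1] - y_obs          # e = computed - observed

def jacobian(x):
    return np.column_stack([t, np.ones_like(t)])

x = np.zeros(2)
for _ in range(20):
    x = lm_step(x, residuals, jacobian, mu=1e-3)
```

With a small `mu` the step is close to a Gauss-Newton step; increasing `mu` shortens the step and turns it toward the gradient-descent direction.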

Study Area
The data used in this study are from 31 sub-catchments of six large drainage basins in the Indian provinces of Madhya Pradesh, Bihar, Rajasthan, West Bengal and Tamil Nadu. The locations of the various catchments and sub-catchments taken for the analysis are shown in Figure 2. The values of monthly runoff were determined by summing the daily observed discharges for the month. The monthly rainfall for each catchment was averaged using the Thiessen polygon method. The hydrological data used in the present study are taken from Pooja Jain and Rama Raju [20,21]. These data were originally taken from the reports of the Soil and Water Conservation Division (1984, 1987) published by the Water Conservation Division of the Ministry of Agriculture, Government of India. The periods for which data are available vary from 10 to 17 years. Some data points in the published hydrological data were excluded because the runoff exceeded the precipitation, which is not physically possible. Mean values of several years of data are given in Table 1, and the ranges of the data used in the present study are given in Table 2.
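The data preparation steps mentioned above (Thiessen-weighted rainfall, monthly runoff totals, and exclusion of records with runoff greater than rainfall) can be sketched as follows. All gauge values, polygon areas and function names are hypothetical and serve only to show the arithmetic.

```python
import numpy as np

def thiessen_mean_rainfall(gauge_rainfall, polygon_areas):
    """Area-weighted mean rainfall: sum(P_i * A_i) / sum(A_i)."""
    p = np.asarray(gauge_rainfall, dtype=float)
    a = np.asarray(polygon_areas, dtype=float)
    return float(np.sum(p * a) / np.sum(a))

def monthly_runoff(daily_values):
    """Monthly runoff as the sum of the daily observed values."""
    return float(np.sum(daily_values))

def drop_inconsistent(rainfall, runoff):
    """Keep only records where runoff does not exceed rainfall."""
    rainfall = np.asarray(rainfall, dtype=float)
    runoff = np.asarray(runoff, dtype=float)
    keep = runoff <= rainfall
    return rainfall[keep], runoff[keep]

# Hypothetical example: three gauges with Thiessen polygon areas in km^2.
mean_rain = thiessen_mean_rainfall([100.0, 200.0, 150.0], [10.0, 30.0, 10.0])
total_runoff = monthly_runoff([1.0, 2.0, 3.5])
rain_ok, runoff_ok = drop_inconsistent([120.0, 80.0, 60.0], [40.0, 95.0, 20.0])
```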

Dimensional Analysis
For dimensional analysis, Buckingham's π theorem can be used to obtain the various dimensionless groups [8]. Swamee investigated the influence of including four dimensionless groups in mean flood flow estimation. These dimensionless groups were formed using variables such as discharge Q, area A, average rainfall p of duration D and recurrence interval T, slope S and forest cover $F_A$ [8].
Based on the available data for Indian catchments, the following variables were identified: rainfall (P), runoff (R), slope (S) and forest cover ($F_A$). Adopting A as the repeating variable, the following nondimensional groups were formed:

$$R^{*} = A^{-0.5} R \qquad (2)$$

where R is the runoff in mm, P is the rainfall in mm and A is the drainage area in km².
Using the above dimensionless groups, the following empirical equation was proposed:

$$R^{*} = a_0 \, {P^{*}}^{a_1} \, (S + a_2)^{a_3} \, (F_A + a_4)^{a_5}$$

Here $a_0$-$a_5$ are empirical constants, S is the slope (percent) and $F_A$ is the forest cover.
The computed value of $R^{*}$ for the ith data set, $R^{*}_{ci}$, was obtained as

$$R^{*}_{ci} = a_0 \, {P^{*}_{i}}^{a_1} \, (S_i + a_2)^{a_3} \, (F_{Ai} + a_4)^{a_5}$$

Here the suffix i stands for the ith data set and $a_0$-$a_5$ are fitted coefficients.
Using Equation (5), the observed value of $R^{*}$ for the ith data set, $R^{*}_{oi}$, was obtained. To calibrate the model, the coefficients were chosen to minimize the average percentage error Ea.
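The calibration objective can be illustrated with a short sketch. It rests on three assumptions that are not confirmed by the text: that P* is formed from P in the same way as R* is formed from R, that the empirical equation has the power-law form of the fitted DAAM1 expression (Equation (16)), and that Ea is the mean absolute relative error expressed in percent. The input values are hypothetical, and the minimization itself (steepest descent in the paper) is not shown.

```python
import numpy as np

def to_dimensionless(depth_mm, area_km2):
    """Scale a depth by A**-0.5, following R* = A^-0.5 R."""
    return np.asarray(depth_mm, dtype=float) / np.sqrt(area_km2)

def power_law(coeffs, p_star, slope, forest):
    """R* = a0 * P*^a1 * (S + a2)^a3 * (F_A + a4)^a5."""
    a0, a1, a2, a3, a4, a5 = coeffs
    return a0 * p_star**a1 * (slope + a2)**a3 * (forest + a4)**a5

def average_percentage_error(observed, computed):
    """Assumed definition: Ea = 100/N * sum(|obs - comp| / obs)."""
    observed = np.asarray(observed, dtype=float)
    return float(100.0 * np.mean(np.abs(observed - computed) / observed))

# Hypothetical catchment data: rainfall (mm), area (km^2), slope (%), forest cover.
p_star = to_dimensionless([120.0, 250.0, 80.0], [400.0, 900.0, 100.0])
slope = np.array([1.5, 3.0, 0.8])
forest = np.array([0.2, 0.45, 0.1])

daam1 = (0.41, 0.89, 0.052, 0.112, 0.049, -0.001)   # coefficients of Equation (16)
r_obs = power_law(daam1, p_star, slope, forest)      # synthetic "observations"

ea_exact = average_percentage_error(r_obs, power_law(daam1, p_star, slope, forest))
perturbed = (0.45,) + daam1[1:]                      # change a0 only
ea_perturbed = average_percentage_error(r_obs, power_law(perturbed, p_star, slope, forest))
```

A calibration routine would search over the six coefficients for the set that makes this error as small as possible on the calibration data.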

ANN Model Development Using Process Variables
Before the data are presented to the ANN for training, they must be standardized to restrict their range to the interval [0, 1]. Because the actual observed outputs lie outside the bounded range of the neuron transfer function, they need to be normalized so that they fall within the bounded output range. To develop a model, it is important to establish the correlation between the dependent variable and the independent variables. For this purpose, a correlation matrix has been prepared and is given in Table 3.
Using the information drawn from the correlation matrix analysis, runoff models have been defined as functions of different input variables, with rainfall considered as a common input variable in all of them. Here ANNPAM represents Artificial Neural Network Process variables All river basins Model.
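A minimal sketch of these two preparation steps is given below. It assumes simple min-max scaling to [0, 1] (the exact scaling used in the study is not stated), and the small data array and function names are hypothetical.

```python
import numpy as np

def minmax_scale(x):
    """Scale each column of x linearly to the interval [0, 1]."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / (hi - lo), lo, hi

def minmax_inverse(x_scaled, lo, hi):
    """Map scaled values back to the original units."""
    return x_scaled * (hi - lo) + lo

# Hypothetical records; columns are rainfall (mm) and runoff (mm).
data = np.array([[120.0, 35.0],
                 [250.0, 90.0],
                 [80.0, 20.0],
                 [300.0, 140.0]])

scaled, lo, hi = minmax_scale(data)
restored = minmax_inverse(scaled, lo, hi)

# Correlation matrix between the variables (columns).
corr = np.corrcoef(data, rowvar=False)
```

The stored `lo` and `hi` values are needed later to convert the network output back to runoff units before computing the performance statistics.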
The development of rainfall-runoff models using ANNs involves the following steps: 1) selection of the data sets for training, cross-validation and validation of the model; 2) identification of the input and output variables; 3) normalization of the data; 4) selection of the network architecture; 5) determination of the number of neurons in the hidden layer; 6) training of the ANN models; and 7) validation and cross-validation of the ANN model using the selected performance evaluation statistics.
The Back Propagation Learning Network (BPLN) was first calibrated using about 60 percent of the data, and 20 percent of the data were used for validation of the model. The remaining 20 percent were used for cross-validation of the model. The momentum coefficient was set to 0.9 and the learning rate was fixed at 0.05 for neural network training. The number of epochs was set to 3000. The log-sigmoid function was used as the transfer function. The input combinations that produced the minimum RMSE were adopted for further analysis.
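The data partition and training settings described above can be written down as a short sketch. The random shuffling, the seed and the names `split_60_20_20` and `TRAINING_SETTINGS` are assumptions for illustration; the text does not state how records were assigned to the three subsets.

```python
import numpy as np

# Settings quoted in the text, collected in one place.
TRAINING_SETTINGS = {
    "momentum": 0.9,
    "learning_rate": 0.05,
    "epochs": 3000,
    "transfer_function": "logsig",
}

def split_60_20_20(n_samples, seed=0):
    """Return index arrays for training (60%), validation (20%) and cross-validation (20%)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)        # assumed: random assignment of records
    n_train = int(round(0.6 * n_samples))
    n_val = int(round(0.2 * n_samples))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, cross_idx = split_60_20_20(100)
```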

ANN Model Development Using Dimensionless Variables
The following model, ANNDAM1, has been developed using the dimensionless variables of rainfall (P*), slope (S), forest cover ($F_A$) and runoff (R*). These dimensionless variables were discussed previously. Here ANNDAM represents Artificial Neural Network Dimensionless All river basin Model.

Model-Performance Criteria
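For identification of the best combination of input variables, different models are tested using various performance criteria [22]. The root mean square error (RMSE) has been calculated for the training, validation and testing data of these models. The RMSE is defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(Y_{o,i} - Y_{c,i}\right)^{2}}$$

In addition, the Nash-Sutcliffe efficiency (η), which is widely used in the water resources sector, was used to assess the performance of a model [23]:

$$\eta = 1 - \frac{\sum_{i=1}^{N}\left(Y_{o,i} - Y_{c,i}\right)^{2}}{\sum_{i=1}^{N}\left(Y_{o,i} - \bar{Y}_{o}\right)^{2}}$$

The correlation coefficient (CC) was also used as a performance criterion and is computed using the following relationship [22]:

$$\mathrm{CC} = \frac{\frac{1}{N}\sum_{i=1}^{N}\left(Y_{o,i} - \bar{Y}_{o}\right)\left(Y_{c,i} - \bar{Y}_{c}\right)}{\sigma_{o}\,\sigma_{c}}$$

where $Y_o$ is the observed output, $Y_c$ is the computed output, $\bar{Y}_o$ is the mean of the observed output, $\bar{Y}_c$ is the mean of the computed output, σ is the standard deviation and N is the total number of samples.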
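The three performance criteria can be computed as in the sketch below, which uses their standard definitions (RMSE, Nash-Sutcliffe efficiency, and the Pearson correlation coefficient with population standard deviations). The sample values and function names are hypothetical.

```python
import numpy as np

def rmse(obs, comp):
    """Root mean square error between observed and computed values."""
    obs, comp = np.asarray(obs, dtype=float), np.asarray(comp, dtype=float)
    return float(np.sqrt(np.mean((obs - comp) ** 2)))

def nash_sutcliffe(obs, comp):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean."""
    obs, comp = np.asarray(obs, dtype=float), np.asarray(comp, dtype=float)
    return float(1.0 - np.sum((obs - comp) ** 2) / np.sum((obs - obs.mean()) ** 2))

def correlation_coefficient(obs, comp):
    """Pearson correlation using population (ddof=0) standard deviations."""
    obs, comp = np.asarray(obs, dtype=float), np.asarray(comp, dtype=float)
    cov = np.mean((obs - obs.mean()) * (comp - comp.mean()))
    return float(cov / (obs.std() * comp.std()))

# Hypothetical observed and computed runoff values.
observed = [1.0, 2.0, 3.0, 4.0]
computed = [1.5, 2.5, 3.5, 4.5]

print(rmse(observed, computed))
print(nash_sutcliffe(observed, computed))
print(correlation_coefficient(observed, computed))
```

In this example the computed series is offset from the observed one by a constant, so the correlation is perfect while the RMSE and the efficiency still register the bias, which is why the three criteria are reported together.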

Training, Validation and Cross-Validation of Data

Training, Validation and Cross-Validation of All River Basins Data
Data have been analyzed in this section using dimensional analysis and ANN using process variables as well as using dimensionless variables.

Using Dimensional Analysis Model
The dimensional analysis model DAAM1 was developed, and the fitted coefficients $a_0$-$a_5$ were calculated by minimizing Ea using the steepest descent technique. DAAM represents Dimensional Analysis All river basins Model. The optimum values of $a_0$-$a_5$ were obtained, for which Ea was 39.74. This yielded the fitted form of Equation (5) for DAAM1, given as Equation (16).

Using ANN with BPLVM Using Process Variables
1) Model ANNPAM1: The performance of this model has been presented in Table 5(a) and the RMSE of the results for training, validation and cross-validation is shown in Figure 4(a). For this model, RMSE was in the range of 21.17-37.17 and Nash-Sutcliffe efficiency was in the range of 0.632-0.815 for different NN architectures. The best identified NN architecture was 1-4-5-1, for which RMSE was in the range of 21.17-30.54 and Nash-Sutcliffe efficiency was in the range of 0.718-0.798. The NN architecture 1-1-1 performed the worst, for which RMSE was in the range of 22.76-37.17 and Nash-Sutcliffe efficiency was in the range of 0.632-0.767.
2) Model ANNPAM2: The performance of this model has been presented in Table 5(b). The best identified NN architecture was 2-1-1, for which RMSE was in the range of 19.99-36.49 and Nash-Sutcliffe efficiency was in the range of 0.645-0.819. The NN architecture 2-6-7-1 performed the worst, for which RMSE was in the range of 24.45-31.05 and Nash-Sutcliffe efficiency was in the range of 0.694-0.840.
3) Model ANNPAM3: The performance of this model has been presented in Table 5(c) and the RMSE of the results for training, validation and cross-validation is shown in Figure 4(c). RMSE was in the range of 18.89-36.26 for different NN architectures for this model. The best identified NN architecture was 3-4-5-1, for which RMSE was in the range of 18.89-27.42 and Nash-Sutcliffe efficiency was in the range of 0.762-0.856. The NN architecture 3-1-1 performed the worst, for which RMSE was in the range of 21.07-36.26 and Nash-Sutcliffe efficiency was in the range of 0.650-0.800.

4) Model ANNPAM4: The performance of this model has been presented in Table 5(d).
Based on these results, it can be inferred that NN architecture 4-6-1 performs the best, for which RMSE was 18.70, 31.30 and 23.87, Nash-Sutcliffe efficiency was 0.907, 0.689 and 0.743, and CC was 0.95, 0.83 and 0.86 for the training, validation and cross-validation sets respectively.

ANN with BPLVM Using Dimensionless Variable
Using the input dimensionless variables defined in models ANNDAM1 and ANNDAM2, the ANN models have been trained using the Levenberg-Marquardt algorithm (BPLVM) for different ANN architectures. The performance statistics of the results for all the models with different architectures have been summarized in Tables 6(a) and (b). The trends of the RMSE for different architectures are shown in Figures 5(a) and (b).
1) Model ANNDAM1: The performance of this model has been presented in Table 6(a) and the RMSE of the results for training, validation and cross-validation is shown in Figure 5(a). For this model, RMSE was in the range of 2.13-6.88 and Nash-Sutcliffe efficiency was in the range of (-0.65)-0.927 for different NN architectures. For NN architecture 3-1-1, RMSE was 2.86, 4.86 and 3.85 and Nash-Sutcliffe efficiency was 0.762, 0.209 and 0.489 for the training, validation and cross-validation sets respectively. For NN architecture 3-3-1, RMSE was 3.10, 6.60 and 6.93 and Nash-Sutcliffe efficiency was 0.845, -0.460 and -0.657 for the training, validation and cross-validation sets respectively.
2) Model ANNDAM2: The performance of this model has been presented in Table 6(b) and the RMSE of the results for training, validation and cross-validation is shown in Figure 5(b). For this model, RMSE was in the range of 2.63-5.30 and Nash-Sutcliffe efficiency was in the range of (-0.08)-0.88 for different NN architectures. For NN architecture 4-1-1, RMSE was 4.11, 4.52 and 2.92 and Nash-Sutcliffe efficiency was 0.722, 0.304 and 0.338 for the training, validation and cross-validation sets respectively. For NN architecture 4-3-1, RMSE was 2.63, 5.30 and 3.76 and Nash-Sutcliffe efficiency was 0.88, 0.043 and -0.0.093 for the training, validation and cross-validation sets respectively. For NN architecture 3-5-1, RMSE was 2.96, 4.16 and 3.74 and Nash-Sutcliffe efficiency was 0.856, 0.411 and -0.082 for the training, validation and cross-validation sets respectively.
Based on these overall results, it can be inferred that model ANNDAM1 with NN architecture 3-1-1 performs the best, for which RMSE was 2.86, 4.86 and 3.85, Nash-Sutcliffe efficiency was 0.762, 0.209 and 0.489, and CC was 0.873, 0.645 and 0.905 for the training, validation and cross-validation sets respectively.

Training, Validation and Cross Validation of Damodar River Basin Data
Data of the Damodar river basin have been analyzed in this section using dimensional analysis, ANN models using process variables and ANN models using dimensionless variables.

Using Dimensional Analysis Model
The dimensional analysis model DAINM1 was developed, and the fitted coefficients $a_0$-$a_5$ were calculated by minimizing Ea using the steepest descent technique. DAINM represents Dimensional Analysis Individual river basin Model. The optimum values of $a_0$-$a_5$ were obtained, for which Ea was 20.54, yielding a fitted form of Equation (5) for this basin. Using this fitted expression for model DAINM1, RMSE was 1.72, 2.43 and 1.044; Nash-Sutcliffe efficiency was 0.85, 0.65 and -0.21; and CC was 0.950, 0.970 and 0.945 for the training, validation and cross-validation sets respectively. The performance statistics in terms of RMSE, Nash-Sutcliffe efficiency and CC of the results for this model have been summarized in Table 7. The trends of the RMSE for different models are shown in Figure 6.

Using ANN with BPLVM Using Process Variables
Using the input process variables defined for the all-river-basin models (i.e., ANNPAM1 through ANNPAM4), the ANN models (ANNINPM1 to ANNINPM4) have been trained using the Levenberg-Marquardt algorithm (BPLVM) for different ANN architectures for the Damodar river basin. ANNINPM represents Artificial Neural Network Individual river basin Process variables Model. The performance statistics of the results for all the models with different architectures have been summarized in Tables 8(a)-(d). The trends of the RMSE for different architectures are shown in Figures 7(a)-(d).
1) Model ANNINPM1: The performance of this model has been presented in Table 8(a) and the RMSE of the results for training, validation and cross-validation is shown in Figure 7(a). For this model, RMSE was in the range of 7.01-51.12 and Nash-Sutcliffe efficiency was in the range of (-1.31)-0.999 for different NN architectures. The best identified NN architecture was 1-6-1, for which RMSE was in the range of 7.01-32.57.
Based on these results, it can be inferred that NN architecture 4-6-1 of model ANNINPM4 performs the best, for which RMSE was 8.31, 6.60 and 11.22, Nash-Sutcliffe efficiency was 0.974, 0.961 and 0.948, and CC was 0.988, 0.989 and 0.976 for the training, validation and cross-validation sets respectively.

ANN with BPLVM Using Dimensionless Variable
Using the dimensionless variables defined as input in model ANNDAM1, the ANN model ANNINDM1 has been trained using the Levenberg-Marquardt algorithm (BPLVM) for different ANN architectures. ANNINDM represents Artificial Neural Network Individual river basin Dimensionless variables Model. The performance statistics of the results for the different architectures have been summarized in Table 9. The trends of the RMSE for different architectures are shown in Figure 8.
For this model ANNINDM1, RMSE was in the range of 0.344-3.36, Nash-Sutcliffe efficiency was in the range of 0.198-0.995 and CC was in the range of 0.73-0.99 for different NN architectures.
Based on these results, it can be inferred that NN architecture 3-1-1 performs the best, for which RMSE was 1.16, 1.95 and 2.5; Nash-Sutcliffe efficiency was 0.943, 0.198 and 0.786; and CC was 0.971, 0.737 and 0.946 for the training, validation and cross-validation sets respectively.

Results and Discussion
This section summarizes the results for the data of all river basins as well as for the Damodar river basin data, using the different techniques.

All River Basins
ANN models using process variables have been developed using all river basin data, and the best identified NN architecture was 4-6-1 of model ANNPAM4, for which RMSE was in the range of 18.70-31.30 and Nash-Sutcliffe efficiency was in the range of 0.689-0.907. For model DAAM1 using the dimensional analysis technique, RMSE was in the range of 2.79-5.11, Nash-Sutcliffe efficiency was in the range of 0.45-0.73 and CC was in the range of 0.729-0.910. Hence, it can be concluded that the dimensional analysis technique performed better than the ANN models using process variables for all river basins data.
Based on the performance evaluation of the ANN models using dimensionless variables, ANNDAM1 performed better than model ANNPAM4 on all river basin data in terms of the performance criteria. For model ANNDAM1, RMSE was in the range of 2.13-6.88, while RMSE was in the range of 18.70-31.30 for ANNPAM4 using process variables. For the best identified structure 3-1-1 of model ANNDAM1, RMSE was in the range of 2.86-4.86, Nash-Sutcliffe efficiency was in the range of 0.20-0.90 and CC was in the range of 0.64-0.90. Hence, it can be concluded that ANN models using dimensionless variables performed better than ANN models using process variables for all river basins data. The comparisons of observed and computed runoff for models ANNPAM4 and ANNDAM1 are shown in Figure 9 and Figure 10 respectively.
It is important to note here that the ANN architecture of the best identified model using process variables, ANNPAM4, was 4-6-1, while the ANN architecture of the best identified model using dimensionless variables, ANNDAM1, was 3-1-1. Hence it can be concluded that the ANN structure can be simplified by using dimensionless variables.
In this analysis of the given data set, it has been found that using P(t-1) as an input process variable did not improve the performance criteria much. For the best identified model ANNPAM5 with NN architecture 5-2-2-1, which uses P(t-1) as one of the input variables, RMSE was in the range of 19.40-28.61, while RMSE was in the range of 18.70-31.30 for the best identified model ANNPAM4 with NN architecture 4-6-1, which does not use P(t-1) as an input process variable. Similarly, for ANN model ANNDAM2 with NN architecture 4-1-1, which uses P(t-1) as one of the input dimensionless variables, RMSE was in the range of 2.92-4.52, while for ANN model ANNDAM1 with NN architecture 3-1-1, which does not use P(t-1), RMSE was in the range of 2.86-4.86.

Damodar River Basin
ANN models using process variables have been developed using Damodar river basin data, and NN architecture 4-6-1 of model ANNINPM4 performed the best, for which RMSE was in the range of 6.6-11.22 and Nash-Sutcliffe efficiency was in the range of 0.948-0.974. For the dimensional analysis technique, RMSE was in the range of 1.044-2.43 and Nash-Sutcliffe efficiency was in the range of (-0.21)-0.85 for this basin. Hence, it can be concluded that the dimensional analysis technique performed better than the ANN models using process variables for individual river basin data.
For model ANNINDM1, RMSE was in the range of 0.344-3.36 and Nash-Sutcliffe efficiency was in the range of 0.198-0.995, while RMSE was in the range of 6.6-11.22 and Nash-Sutcliffe efficiency was in the range of 0.948-0.974 for model ANNINPM4 using process variables. Hence, ANN model ANNINDM1 using dimensionless variables performed better than ANN model ANNINPM4 using process variables.
The best identified structure for ANN model ANNINDM1 using dimensionless variables was 3-1-1, for which RMSE was in the range of 1.16-2.50, Nash-Sutcliffe efficiency was in the range of 0.19-0.97 and CC was in the range of 0.73-0.97. Hence it can be concluded that the ANN structure can be simplified by using dimensionless variables.

Conclusions
This paper presents the findings of a study comparing the use of process variables and dimensionless variables, with dimensional analysis and ANN, for rainfall-runoff modeling in certain Indian catchments, for a group of river basins as well as for an individual basin. The performance of each model structure was evaluated using common performance criteria. The salient findings of this study are as follows: 1) ANN models using dimensionless variables performed better than ANN models using process variables for all river basin data as well as individual river basin data; 2) ANN models using dimensionless variables simplified the ANN architecture for all river basins as well as the individual river basin; 3) the dimensional analysis approach can be used effectively in rainfall-runoff modeling.

Figure 2. Geographical locations of different catchments
DAAM1: $R^{*} = 0.41\,{P^{*}}^{0.89}\,(S + 0.052)^{0.112}\,(F_A + 0.049)^{-0.001}$ (16)
Using the above expression for model DAAM1, RMSE was 5.11, 4.05 and 2.79, Nash-Sutcliffe efficiency was 0.58, 0.45 and 0.73, and CC was 0.837, 0.729 and 0.910 for the training, validation and cross-validation sets respectively. The performance statistics in terms of RMSE, Nash-Sutcliffe efficiency and CC of the results for this model have been summarized in Table 4. The trends of the RMSE for different models are shown in Figure 3.
Using ANN with BPLVM Using Process Variables
Using the same input process variables defined for the models (i.e., ANNPAM1 through ANNPAM5), the ANN models have been trained using the Levenberg-Marquardt algorithm (BPLVM) for different ANN architectures. The performance statistics of the results for all the models with different architectures have been summarized in Tables 5(a)-(d). The trends of the RMSE for different architectures are shown in Figures 4(a)-(d).

Figure 3. RMSE of dimensional analysis for DAAM1
Model ANNPAM2 (Table 5(b)): The RMSE of the results for training, validation and cross-validation is shown in Figure 4(b). For this model, RMSE was in the range of 19.99-36.49 and Nash-Sutcliffe efficiency was in the range of 0.413-0.840 for different NN architectures.
Model ANNPAM4 (Table 5(d)): The RMSE of the results for training, validation and cross-validation is shown in Figure 4(d). RMSE was in the range of 18.70-36.25 for different NN architectures for this model. The best identified NN architecture was 4-6-1, for which RMSE was in the range of 18.70-31.30 and Nash-Sutcliffe efficiency was in the range of 0.689-0.907. The NN architecture 4-1-1 performed the worst, for which RMSE was in the range of 20.19-36.25 and Nash-Sutcliffe efficiency was in the range of 0.650-0.803.
5) Model ANNPAM5: The performance of this model has been presented in Table 5(e) and the RMSE of the results for training, validation and cross-validation is shown in Figure 4(e). RMSE was in the range of 8.69-147.64 for different NN architectures for this model. The best identified NN architecture was 5-2-2-1, for which RMSE was in the range of 19.40-28.61 and Nash-Sutcliffe efficiency was in the range of 0.23-0.68.

Figure 4. (a) RMSE of different ANN architecture using BPLVM for ANNPAM1; (b) RMSE of different ANN architecture using BPLVM for ANNPAM2; (c) RMSE of different ANN architecture using BPLVM for ANNPAM3; (d) RMSE of different ANN architecture using BPLVM for ANNPAM4; (e) RMSE of different ANN architecture using BPLVM for ANNPAM5

Figure 8. RMSE of different ANN architecture using BPLVM for ANNINDM1

Figure 9. Comparison of observed and computed runoff using ANNPAM4 model

Table 5(e). Summary of ANN application to ANNPAM5 using BPLVM with process variables
The performance of this model has been presented in Table 5(a) and RMSE of the results for training, validation and cross-validation are shown in Figure

Table 6 (b). Summary of ANN application to ANNDAM2 using BPLVM with dimensionless variables
The performance of this model has been presented in Table 6(b) and RMSE of the results for training, validation and cross-validation are shown in Figure