Sensitivity analysis of neural networks to input variation is an important research area as it goes some way to addressing the criticisms of their black-box behaviour. Such analysis of RBFNs for hydrological modelling has previously been limited to exploring perturbations to both inputs and connecting weights. In this paper, the backward chaining rule that has been used for sensitivity analysis of MLPs is applied to RBFNs, and it is shown how such analysis can provide insight into physical relationships. A trigonometric example is first presented to demonstrate the effectiveness and accuracy of this approach for first order derivatives, alongside a comparison of the results with an equivalent MLP. The paper then presents a real-world application in the modelling of river stage, showing how such approaches can help to justify and select such models.

Rapid developments in artificial intelligence in recent years, and ever more powerful computing resources, have led to a plethora of machine learning techniques being used to solve all kinds of hydrological problems—for example, multi-layer perceptrons, radial basis function networks, support vector machines, deep learning, etc. Numerous studies have compared different techniques in different regions for different hydrological problems (for example, [

Although MLPs are arguably the most recognised feed forward neural network models, Radial Basis Function networks (RBFNs) have proved to be a popular alternative as they can be trained relatively quickly and perform comparably well against equivalent MLPs. RBFNs [

However, with such a diverse choice of tools open to the hydrologist, the challenge of deciding which technique to use can be daunting. The choice of which machine learning technique to use can be broken down into three fundamental issues:

1) Difficulty—how difficult is it to calibrate the chosen model—how long will it take; and how complex is the solution?

2) Accuracy—how accurate is the model we have produced?

3) Confidence—how confident are we that the model has some physical interpretability of its behaviour so we believe in the predictions it is making and can justify these predictions in a rational sense?

While studies have shown the MLP and RBFN can address the difficulty and accuracy issues defined above (they are relatively easy to implement and understand; and studies have shown them to be accurate), there has been little work addressing the third concern—that of physical interpretability. The objective of this paper is thus to contrast these two common machine learning variants within this context by exploring an approach to sensitivity analysis which provides some physical interpretation.

Sensitivity analysis of FFNNs is an important research issue as it allows us to explore relationships, provide physical and/or mechanical rationality in certain practical applications and justify model behaviour. Chen [

The RBFN has also attracted interest in terms of sensitivity analysis—most notably by Ng et al. [

This paper applies the backward chaining rule of Hashem [

The remainder of this paper is structured as follows. Section 2 discusses the derivation of the first order partial derivative of the RBFN (and, briefly, the MLP for background) and includes a simple trigonometric example to show this working in practice. Section 3 introduces a real-world example and explains how rainfall-stage RBFN and MLP models were developed to model river levels in the River Ouse, UK. Section 4 presents the results of the sensitivity analysis using the partial derivative equations outlined in Section 2, before Section 5 concludes the paper with some thoughts and further work.

In this section, we will focus on the calibration and the derivation of first order partial derivatives of the RBFN. The equivalent equation for the MLP is presented for completeness. A simple trigonometric function is presented by way of example to show the derived equations working in practice.

Both the MLP and RBFN are structured in a similar way—shown in

For an MLP, the number of hidden nodes is typically chosen through a trial and error approach; for an RBFN, the hidden nodes represent the basis functions employed.

The RBFN consists of basis functions in the hidden layer and (typically) a linear activation function in the output layer. The basis functions in the hidden layer are typically represented by a Gaussian function (although it is acknowledged that other functions are sometimes used: [

h(x) = e^{-(x - c)^{2} / 2\sigma^{2}}  (1)

in which c represents the centre of the Gaussian function and σ represents its width (sphere of influence).

When calibrating an RBFN the first step involves establishing the centres of the m basis functions. If m is set to the number of training samples in the data set, the centres can be set to the values of the inputs of each training pair. However, for large data sets, this can lead to networks that are unwieldy and over-fitted. An alternative is to use some form of data-clustering such as k-means to identify the centres of a much smaller number of basis functions. These centres are represented by the connection weights between Layer 0 and Layer 1 in

While the basis centres are calculated, the widths of the basis functions, σ_{j} (j = 1 to m basis functions), are also determined. There are a number of ways this can be achieved, but the most common is to set each basis width to the Euclidean distance between the basis function's centre and that of its nearest neighbour.
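As a concrete sketch of the nearest-neighbour width heuristic described above, the widths can be computed from the centres in a few lines (`basis_widths` is an illustrative name, not a function from the paper):

```python
import numpy as np

def basis_widths(centres):
    """Set each basis width sigma_j to the Euclidean distance between
    its centre and the nearest other centre (a common heuristic)."""
    centres = np.asarray(centres, dtype=float)        # shape (m, n)
    # pairwise Euclidean distances between all centres
    d = np.linalg.norm(centres[:, None, :] - centres[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                       # ignore self-distance
    return d.min(axis=1)                              # sigma_j for j = 1..m
```

For three one-dimensional centres at 0, 1 and 3, for example, the widths come out as 1, 1 and 2 respectively.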

With the basis functions established it remains for the weights connecting Layer 1 to Layer 2 to be calculated. With a linear activation function in the output layer the output O can be calculated as:

O = wH  (2)

in which w is the vector of weights connecting Layer 1 to Layer 2 and H is the matrix of basis function outputs, arranged as:

H = \begin{bmatrix} h_{1}(X_{1}) & h_{2}(X_{1}) & \cdots & h_{m}(X_{1}) \\ \vdots & \vdots & & \vdots \\ h_{1}(X_{p}) & h_{2}(X_{p}) & \cdots & h_{m}(X_{p}) \end{bmatrix}  (3)

in which X_{i} represents the input vector of data sample i (i = 1 to p data samples).

Thus, the weights w can be calculated as:

w = H^{-1}O  (4)

Unfortunately, because H is not necessarily square, the pseudo-inverse of H must be used instead:

w = (H^{T}H)^{-1}H^{T}O  (5)
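In practice, Equation (5) is usually solved with a least-squares routine rather than by forming the inverse explicitly, which is numerically safer when H^{T}H is ill-conditioned. A minimal sketch (`rbf_weights` is a hypothetical helper name; a bias column of ones could be appended to H if required):

```python
import numpy as np

def rbf_weights(H, O):
    """Solve w = (H^T H)^-1 H^T O (Equation (5)) via least squares,
    avoiding the explicit matrix inverse."""
    w, *_ = np.linalg.lstsq(H, O, rcond=None)
    return w
```

For a consistent overdetermined system the least-squares solution reproduces the targets exactly.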

With the RBFN calibrated, the output from the network, O, can be calculated as follows. The output from each node (basis function) in the hidden layer, h_{j} (j = 1 to m), is calculated from:

h_{j} = e^{-S_{j}^{1}}  (6)

in which:

S_{j}^{1} = \left( (I_{1} - w_{1j}^{0})^{2} + (I_{2} - w_{2j}^{0})^{2} + \cdots + (I_{n} - w_{nj}^{0})^{2} \right) / 2\sigma_{j}^{2}  (7)

The output from the network, O, can then be calculated as (w_{bO}^{1} is the bias):

O = S_{O}^{2} = h_{1}w_{1O}^{1} + h_{2}w_{2O}^{1} + \cdots + h_{m}w_{mO}^{1} + w_{bO}^{1}  (8)

The partial derivative of the output O with respect to input I_{i} is calculated as:

\frac{\partial O}{\partial I_{i}} = \sum_{j} \frac{\partial O}{\partial h_{j}} \frac{\partial h_{j}}{\partial S_{j}^{1}} \frac{\partial S_{j}^{1}}{\partial I_{i}}  (9)

for all hidden nodes, j.

In the case of the RBF (from (8)):

\frac{\partial O}{\partial h_{j}} = w_{jO}^{1}  (10)

From (6):

\frac{\partial h_{j}}{\partial S_{j}^{1}} = -e^{-S_{j}^{1}} = -h_{j}  (11)

From (7):

\frac{\partial S_{j}^{1}}{\partial I_{i}} = (I_{i} - w_{ij}^{0}) / \sigma_{j}^{2}  (12)

Thus, by substitution, (9) becomes:

\frac{\partial O}{\partial I_{i}} = \sum_{j} w_{jO}^{1} h_{j} (w_{ij}^{0} - I_{i}) / \sigma_{j}^{2}  (13)
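Equation (13) can be evaluated directly once the centres, widths and output weights are known. The sketch below assumes the Gaussian basis of Equations (6) and (7); the function name and argument layout are illustrative:

```python
import numpy as np

def rbfn_partial(I, centres, sigmas, w_out):
    """dO/dI_i for a Gaussian RBFN, Equation (13):
    sum_j w_jO * h_j * (c_ij - I_i) / sigma_j^2."""
    I = np.asarray(I, float)                 # input vector, shape (n,)
    centres = np.asarray(centres, float)     # basis centres, shape (m, n)
    sigmas = np.asarray(sigmas, float)       # basis widths, shape (m,)
    w_out = np.asarray(w_out, float)         # output weights, shape (m,)
    diff = I[None, :] - centres              # (m, n): I - c_j
    S = (diff ** 2).sum(axis=1) / (2.0 * sigmas ** 2)   # Equation (7)
    h = np.exp(-S)                                      # Equation (6)
    # sum over basis functions j for each input dimension i
    return ((w_out * h)[:, None] * (-diff) / (sigmas ** 2)[:, None]).sum(axis=0)
```

For a single unit-weight Gaussian centred at zero with unit width, the derivative at x = 1 is −e^{−0.5}, matching the analytic derivative of e^{−x²/2}.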

An MLP is typically trained using the error back propagation algorithm (although many other training algorithms exist)—the detail of which is beyond the scope of this paper as it is covered in many texts elsewhere. The connection weights between input i (i = 1, ⋯, n) and hidden node j (j = 1, ⋯, m) are denoted by w_{ij}^{0} (0 representing Layer 0), while the connection weights between hidden node j and the output node are denoted by w_{jO}^{1} (1 representing Layer 1 and O representing the single output node). By convention, a bias is also added as an additional input to each node, which enables the network to model more complex relationships. The biases are represented as w_{bj}^{0} for nodes in the hidden layer and w_{bO}^{1} for the output node.

By using the backward chaining rule of Hashem [

\frac{\partial O}{\partial I_{i}} = \sum_{j} O(1 - O) w_{jO}^{1} h_{j}(1 - h_{j}) w_{ij}^{0}  (14)

For j = 1 to m hidden nodes.
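Equation (14) assumes sigmoid activations in both the hidden and output layers. A minimal sketch, with illustrative names and weights laid out as matrices, which can be verified against a finite-difference approximation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_partial(I, W0, b0, w1, b1):
    """dO/dI_i for a one-hidden-layer sigmoid MLP, Equation (14):
    sum_j O(1-O) * w_jO * h_j(1-h_j) * w_ij.
    I: inputs (n,); W0: input-hidden weights (n, m); b0: hidden biases (m,);
    w1: hidden-output weights (m,); b1: output bias (scalar)."""
    h = sigmoid(I @ W0 + b0)        # hidden-layer outputs h_j
    O = sigmoid(h @ w1 + b1)        # network output
    return O * (1.0 - O) * (W0 @ (w1 * h * (1.0 - h)))
```

A central-difference check on arbitrary fixed weights confirms the analytic gradient.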

As an example, take the simple trigonometric function as used by Hashem [

y = \sin(4x)  (15)

Thus,

\frac{dy}{dx} = 4\cos(4x)  (16)

Data were generated for these two single-input, single-output functions for values between [−2, 2] in steps of 0.01. An MLP was trained with seven hidden units for 50,000 epochs and an RBFN was created with seven basis functions using these data. The first derivative for each model was then calculated using Equation (13) (for the RBFN—referred to as RBFN′) and Equation (14) (for the MLP—referred to as MLP′).
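The experiment can be reproduced in outline as follows. This sketch fits a Gaussian RBFN to sin(4x) by least squares (Equation (5)) and differentiates it with Equation (13); the equally spaced centres, their number, and the omission of an output bias are assumptions of the sketch, not the paper's configuration:

```python
import numpy as np

# Fit a Gaussian RBFN to y = sin(4x) on [-2, 2] (step 0.01) and compare
# its Equation (13) derivative with the analytic 4*cos(4x).
x = np.arange(-2.0, 2.0 + 1e-9, 0.01)
y = np.sin(4.0 * x)

m = 30                                        # more centres than the paper's 7
c = np.linspace(-2.0, 2.0, m)                 # equally spaced basis centres
sigma = np.full(m, c[1] - c[0])               # width = neighbour spacing

# design matrix of basis outputs, Equations (6)-(7)
H = np.exp(-((x[:, None] - c[None, :]) ** 2) / (2.0 * sigma ** 2))
w, *_ = np.linalg.lstsq(H, y, rcond=None)     # Equation (5)

fit = H @ w                                   # network output (no bias term)
# Equation (13): sum over basis functions of w_j * h_j * (c_j - x) / sigma^2
dy_model = ((H * w) * ((c[None, :] - x[:, None]) / sigma ** 2)).sum(axis=1)
dy_true = 4.0 * np.cos(4.0 * x)
```

With this many centres the fit is tight and the Equation (13) derivative tracks 4cos(4x) closely, mirroring the behaviour reported in the table below.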

\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (\hat{Q}_{i} - Q_{i})^{2}}{n}}  (17)

\mathrm{RSqr} = \left[ \frac{\sum_{i=1}^{n} (Q_{i} - \bar{Q})(\hat{Q}_{i} - \bar{\hat{Q}})}{\sqrt{\sum_{i=1}^{n} (Q_{i} - \bar{Q})^{2}} \sqrt{\sum_{i=1}^{n} (\hat{Q}_{i} - \bar{\hat{Q}})^{2}}} \right]^{2}  (18)
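The two error measures in Equations (17) and (18) can be computed directly; `rmse` and `rsqr` are illustrative names, with RSqr implemented as the squared Pearson correlation:

```python
import numpy as np

def rmse(obs, pred):
    """Root mean squared error, Equation (17)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return np.sqrt(np.mean((pred - obs) ** 2))

def rsqr(obs, pred):
    """Squared Pearson correlation coefficient, Equation (18)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.corrcoef(obs, pred)[0, 1] ** 2)
```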

| Model | RMSE | RSqr |
|---|---|---|
| RBFN | 0.0495 | 0.9952 |
| MLP | 0.0452 | 0.9960 |
| RBFN′ | 0.4575 | 0.9743 |
| MLP′ | 0.5737 | 0.9573 |

Both models show “good” accuracy in modelling the underlying equation with RSqr values of over 99%. The first derivative also shows “good” accuracy with the RBFN’ model, achieving an RSqr score of 97.4% while the MLP’ achieves a score of 95.7%. Although the MLP’ is marginally worse than the RBFN’ according to both error measures, the results confirm the accuracy of the partial derivative equations presented in Equations (13) and (14).

Although the partial derivative equations presented for the RBF in the previous section are not new, their application within a hydrological context has yet to be explored. With this in mind, we present here a rainfall-stage model for the River Ouse in the UK. This has been used in previous studies such as [

Six-hourly stage data for the River Ouse were available for this study. This catchment, situated in North Yorkshire, UK, covers an area of 3315 km^{2} and contains an assorted mix of urban and rural land uses. It embraces three main rivers—the Swale, Ure and Nidd. It has a base flow index of 0.439; an average annual rainfall of 899 mm and its longest drainage path is 149.96 km.

| Name | River | Station Code | Grid Ref | Latitude | Longitude | Catchment Area (km^{2}) |
|---|---|---|---|---|---|---|
| Crakehill | Swale | 27071 | SE 425733 | 54.153767 | −1.3507459 | 1363 |
| Skelton | Ouse | 27009 | SE 568553 | 53.990631 | −1.1351792 | 3315 |
| Skip Bridge | Nidd | 27062 | SE 482560 | 53.997793 | −1.2662224 | 516 |
| Westwick | Ure | 27007 | SE 355670 | 54.097678 | −1.4586607 | 915 |

| Name | Station Code | Grid Ref | Latitude | Longitude | Elevation (m) |
|---|---|---|---|---|---|
| Arkengarthdale | 051684 | NY 999030 | 54.4223 | −2.00154 | 294 |
| East Cowton | 054261 | NZ 308041 | 54.4313 | −1.52516 | 48 |
| Malham Tarn | 073420 | SD 893671 | 54.0996 | −2.16228 | 381 |
| Osmotherly | 055223 | SE 457967 | 54.3644 | −1.29557 | 147 |
| Snaizeholme | 047282 | SD 829866 | 54.2752 | −2.26166 | 290 |

In this study, we focus on modelling stage at the downstream site of Skelton (Skelton_{t}) using lagged water level (m) data at that location (Skelton_{t−1}); upstream stage at Crakehill (Crakehill_{t−1}); and a moving average (over 12 time steps) of rainfall (mm) at Tow Hill (Tow Hill_{t−1}). These three predictors were chosen as they allowed us to explore the sensitivity of the derived models with a lagged input, an upstream driver, and a rainfall component. Their selection is based on results from previous studies [

The data cover two winter periods between 1993 and 1996 (1 October to 31 March) and were split so that the winter of 1993-1994 was used for training (716 data points); while the winter of 1995-1996 was used for evaluation (720 data points). Note that in normal circumstances of ANN modelling three data sets are used—a training set, a validation set, and a test set. However, as the intention of these experiments was not to derive and test an independent model, but to explore the use of partial derivatives for sensitivity analysis, only two data sets were required. Thus, a training set was used to calibrate the models and the evaluation set was used to select the “best” models and perform the sensitivity analyses.

Several RBFNs were created using the training data set with 2, 4, 6, 8, 10, 12, 14, 16, 18, and 20 radial basis functions. Several MLPs were trained using error back propagation for 1000 to 20,000 epochs (in steps of 1000) with 2 - 10 hidden units. A learning rate of 0.1 and a momentum term of 0.9 were used. The RBFNs and MLPs were then evaluated (using RMSE) against the evaluation data set to choose the optimum configurations. The best RBFN had 10 basis functions; the best MLP had 9 hidden nodes and was trained for 2000 epochs.

| Model | RMSE | RSqr |
|---|---|---|
| RBFN | 0.2041 | 0.9149 |
| MLP | 0.0976 | 0.9772 |

The MLP clearly outperforms the RBFN on the evaluation data, achieving an RSqr value of 97.72% compared with the RBFN's 91.49%.

In this section, we apply Equations (13) and (14) to evaluate the sensitivity of the RBFN and MLP models to the three predictors. We present the results of the MLP first as these provide more meaningful insight than the RBFN which are presented later.

Note that, because the data were standardised to [0.1, 0.9] for MLP training, the outputs from Equation (14) must be adjusted by multiplying by (range y)/(range x) to recover the actual dy/dx values. However, because the predictors have different ranges, it would be misleading to compare raw dy/dx values, as a given change in one predictor is not equivalent to the same change in another. For example, if one predictor (say, x_{1}) ranges from 1 to 10 while another (say, x_{2}) ranges from 1 to 1000, a unit change in x_{2} would have comparatively little effect on y, yet in practice x_{2} will undergo much larger changes because we are dealing with real data. Therefore, Equation (14) is further adjusted by multiplying by the range of each predictor, giving the change in output per proportionate change in that predictor. This provides a fairer comparison between the inputs.
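The two-step adjustment described above can be written explicitly (the helper name is illustrative; note that the predictor range cancels, so the net effect is a scaling by the output range alone):

```python
def rescaled_sensitivity(dy_dx_scaled, y_range, x_range):
    """Adjust a derivative computed on [0.1, 0.9]-standardised data.
    Step 1 recovers actual units; step 2 expresses the change per full
    range of the predictor so inputs with different ranges compare fairly."""
    dy_dx_actual = dy_dx_scaled * y_range / x_range   # back to real units
    return dy_dx_actual * x_range                     # per predictor range
```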

Figures 7-9 show the sensitivity of the MLP model to each of the three inputs. As expected, the MLP is not particularly sensitive to rainfall (Tow Hill_{t−1})—exhibiting a gradually increasing level of sensitivity across the range of rainfall data. This is to be expected, as the catchment will not respond to low levels of rainfall, which have little impact on base flow, but will react to higher levels.

The MLP exhibits similar sensitivity levels to both upstream flow at Crakehill and antecedent flow at Skelton. What is interesting is how these sensitivities seem to tail off—particularly for the Skelton_{t−1} predictor. The answer to this may lie in the content of the training data set. The available data for Skelton_{t−1} steadily decrease at higher values, which broadly follows the sensitivity of this driver. With fewer data points available at higher flows, the MLP has not been able to capture the behaviour of the catchment as well as it did at lower flows, where data were more abundant and more differentiation between values could be made.

A similar scenario has occurred for the Crakehill_{t−1} predictor, but to a lesser extent. In this case, above values of around 1 m, the frequency of data does not tail off as much as it did for Skelton_{t−1}. This has probably led to a less severe decline in the sensitivity of the model to Crakehill_{t−1}, which decreases only slightly for values over 1 m.

In conclusion, Equation (14) does provide a way of exploring the sensitivity of a real-world model to different inputs. Investigation of the sensitivities gives insight into model behaviour and provides the user with a better understanding of why the model performs as it does and how data can affect the training and performance of the final model.

Figures 12-14 show the sensitivity of the RBFN (10 basis functions) to each of the three inputs, calculated using Equation (13). A best fit polynomial line of order three has been added to these plots to highlight the general shape of the sensitivity. The sensitivities of the model to Crakehill_{t−1} and Skelton_{t−1} are similar in nature—rising to a peak before falling away again, with similar sensitivity values. The sensitivity of the model to Tow Hill_{t−1} is less clear, appearing to rise and fall across the range of values. The sensitivities of Crakehill_{t−1} and Skelton_{t−1} do not really provide any meaningful physical interpretation of how the model is behaving—for example, why would the model have low sensitivity for small Crakehill_{t−1} values, high sensitivity for values of around 1.5 m, and low sensitivity again for values of Crakehill_{t−1} above around 3 m? Equally puzzling are the fluctuating sensitivity values of the model with respect to the rainfall input of Tow Hill. The conclusion would be that these sensitivities, rather than representing a physical relationship within the RBFN model, are presenting the behaviour of the RBFN itself, in which the model is distributed throughout the basis centres.

As these sensitivities may be peculiar to the RBFN chosen (i.e. the one with 10 basis functions), the sensitivities of all other RBFNs (from 2 to 20 basis functions in steps of 2) were explored for each of the three inputs. These sensitivities are presented in Figures 15-17 (for Skelton_{t−1}, Tow Hill_{t−1} and Crakehill_{t−1} respectively). Although no obvious physical pattern emerges, the figures do show how the sensitivities become more pronounced as more basis centres are used. The conclusion in this case is that the models are representing the physical relationship in the catchment as a relationship distributed across the basis functions themselves.

The lack of any physical meaning in the interpretation of the sensitivities of the RBFN models to their inputs led to an additional question: whether they were representing the physical nature of the model within the basis functions themselves. With this idea in mind, the RBFNs were analysed further by determining which of their basis functions was aligned most closely to each individual data point. In other words, for each data point, we identified (by Euclidean distance) its nearest basis function centre in each model.
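The nearest-centre assignment described above can be sketched as follows (`nearest_basis` is an illustrative helper, assuming numeric input vectors of equal dimension):

```python
import numpy as np

def nearest_basis(points, centres):
    """For each data point, return the index of its closest
    basis-function centre by Euclidean distance."""
    p = np.asarray(points, float)[:, None, :]   # (p, 1, n)
    c = np.asarray(centres, float)[None, :, :]  # (1, m, n)
    return np.linalg.norm(p - c, axis=-1).argmin(axis=1)
```

Counting how often each centre is "nearest" then reveals which parts of the hydrograph each basis function is responsible for.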

This paper has applied, for the first time, a means of directly calculating the partial derivatives of the output with respect to the inputs in RBFNs for a rainfall-stage model. The derived solution is shown to work in a simple case and, for the MLP, in a complex, real-world example. Such analysis is particularly important as it allows modellers to explore the behaviour of their models more thoroughly, justify their performance and determine their ability to generalise. While such an analysis appears to work for an MLP, the results are less conclusive in the case of RBFNs. This leads us to the conclusion that such a technique does not work well for real-world applications of RBFNs and that alternative analyses should be performed. Such an analysis can involve exploring the relationship between each of the basis functions and the data points themselves. A preliminary analysis with the data set used in this study shows promise—in that there seems to be a clear relationship between aspects of the hydrograph and individual basis functions.

The results presented in this paper suggest several further investigations. First, it would be prudent to derive partial derivatives for RBFNs that use alternative basis functions to the popular Gaussian, as these may provide more meaningful physical rationality. Second, it would be worth examining how the information garnered from such analysis can help to explain, justify, evaluate, and generalise from such models. There is currently no framework whereby sensitivity results can be used to derive such information; such a framework would prove beneficial in the evaluation of neural networks and go a long way to addressing the criticisms levelled at such models for their black-box behaviour. Finally, the relationship between basis functions and components of the hydrograph shows promise from this work and warrants further exploration.

The author declares no conflicts of interest regarding the publication of this paper.

Dawson, C.W. (2020) Sensitivity Analysis of Radial Basis Function Networks for River Stage Forecasting. Journal of Software Engineering and Applications, 13, 327-347. https://doi.org/10.4236/jsea.2020.1312022