Generalized Additive Mixed Modelling of River Discharge in the Black Volta River

River discharge data offer a rich source of information for reservoir management and flood control, provided that modelling can separate the effects of rainfall, land use, soil type, relief, and weather conditions. In this paper, we model river discharge data from the Black Volta River using Generalised Additive Mixed Models (GAMMs) with a space-time interaction represented via a tensor product of continuous time and discrete space. River discharge data from January 2000 to December 2009 for the four gauge stations along the Black Volta River, namely Lawra, Chache, Bui and Bamboi, were obtained from the Hydrological Services Department of Ghana and used for model fitting. Four GAMMs were explored, two with space-time interactions and two without. A comparison of the models based on the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) suggests that, in this application, the models with space-time interactions are better overall, and in particular for modelling local variations. Further, a model with space and time main effects performed better than one without them. After model selection, checking and validation, there is evidence that river discharge increased from the most upstream gauge station to the most downstream gauge station over the study period.


Introduction
Most important tasks in problem solving in hydrology have been taken over by mathematical models [1]. According to [2], modelling in environmental science is the representation of a complex natural system in a simplified form through the use of logical mathematical statements. Most hydrologic systems are extremely complex, and we cannot hope to understand them in detail without modelling [3].
Hydrologic models for a catchment are developed for many different reasons and therefore take many different forms, although they are in general developed to meet at least one of two primary objectives [3]. One objective is to gain a better understanding of the hydrologic phenomena operating in a catchment and of how changes in the catchment may affect these phenomena, while the other is to generate synthetic sequences of hydrologic data for facility design or for use in forecasting [3].
River discharge and other components of the hydrologic system are affected by many variables. Key among them is rainfall and its variation in space and time in response to various climatic factors. Other variables that potentially can affect river discharge include rock and soil type, land use, relief and weather conditions such as temperature and humidity. Establishing a relationship among these variables is the central focus of hydrological modelling from its simple form of unit hydrograph to rather complex models based on fully dynamic flow equations [4].
Hydrologic models can be classified into two broad classes, namely physical and abstract models [5]. Physical models can be further divided into two categories, namely scale and analog models. A scale model is a scaled-down version of a real system, while an analog model is a physical system with the same characteristics as the original. Abstract models, on the other hand, represent a system in mathematical form. The model is operated with a set of equations, input data and output data. These models are data-driven in nature, as they do not require prior knowledge of the underlying process and are based solely on empirical equations calibrated to field data [6].
Quite recently, [7] argued that hydrologic models may be seen as black-box, conceptual or deterministic models. Black-box models describe the relationship between the input and output data mathematically [8] and are often suitable when data for a specific catchment are available and have been analysed. Deterministic models rest on complex physical theory and require a large amount of data and computational time. Conceptual models are formulated with a number of conceptual elements which are simple representations of a reference system [9]. A significant number of physically based and data-driven models have been developed and implemented. Examples include [10]-[21]. Although it is easier to understand the separate hydrological processes that govern the whole system using physically based models, on many occasions the input data may be unavailable, expensive or time-consuming to collect [22]. Also, a number of vari- data-driven models [23].
According to [24], the various physical mechanisms governing river discharge dynamics act on a wide range of spatial and temporal scales. However, an important observation that can be made from the studies conducted thus far on the application of both physically based models and black-box models to river discharge forecasting is that none of these studies has looked at the influence of spatial and temporal variability on river discharge forecasting simultaneously. This forms the basis of the present study. Given the peculiar location of the Black Volta River, quantifying changes in river discharge in both space and time is fundamental to addressing issues of flooding, power generation and the survival of ecosystems downstream [25].
In this study, we propose generalized additive mixed models (GAMMs) [26] [27] [28] incorporating a smooth interaction of space and time for modelling space-time variations in river discharge in the Black Volta River, in order to extract space-time signals for the entire study area. GAMMs are appealing for their flexibility and for the straightforward way in which smooth effects of covariates can be incorporated alongside the smooth space-time effect and random effects [29].
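To make the idea of a tensor product of continuous time and discrete space concrete, the following minimal Python sketch builds a design matrix as the row-wise Kronecker product of a station indicator matrix and a time basis. It is an illustration under stated assumptions, not the fitting procedure of the study: the low-order polynomial time basis stands in for a penalized spline basis, no smoothing penalty is included, and the names row_kron and space_time_design are hypothetical.

```python
import numpy as np

def row_kron(a, b):
    """Row-wise Kronecker product: row i of the result is kron(a[i], b[i])."""
    n = a.shape[0]
    return np.einsum("ij,ik->ijk", a, b).reshape(n, -1)

def space_time_design(t, station, n_stations, degree=2):
    """Design matrix for a station-specific function of time.

    t       : continuous time values, one per observation
    station : integer station index (0 .. n_stations - 1), one per observation
    degree  : degree of the illustrative polynomial time basis (an assumption;
              a penalized spline basis would normally be used instead)
    """
    t = np.asarray(t, dtype=float)
    station = np.asarray(station, dtype=int)
    # Marginal basis for continuous time: columns 1, t, t^2, ...
    time_basis = np.vander(t, degree + 1, increasing=True)
    # Marginal basis for discrete space: one indicator column per station
    space_basis = np.eye(n_stations)[station]
    # Each station receives its own block of time-basis columns
    return row_kron(space_basis, time_basis)

# Made-up example: two stations observed at three time points each
t = np.array([0.0, 0.5, 1.0, 0.0, 0.5, 1.0])
station = np.array([0, 0, 0, 1, 1, 1])
X = space_time_design(t, station, n_stations=2)
print(X.shape)
```

Because the space basis is a set of indicators, each row of X is non-zero only in the block of columns belonging to its own station, so the fitted time effect is allowed to differ from station to station.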

Study Area
The Black Volta river basin ( The hottest month is March and the coolest is August [30]. Four gauge stations in the Black Volta basin, namely Lawra, Chache, Bui and Bamboi, were all used for modelling and analysis.

Data and Variables
The data contains information on the four gauge stations along the Black Volta    Ochrosols and patches of Savannah Ochrosols-Lithosols [30].
The response variable for these analyses was river discharge (disch), measured in cubic metres per second, while the independent variables were time (month and year) and space (loc), the latter being the gauge stations along the Black Volta River considered in this study, namely Lawra, Chache, Bui and Bamboi. The covariates included rainfall (rain) measured in millimetres, relative humidity (humid), elevation (elev) measured in metres, soil type (soil) and land use (luse), which was treated as a random effect. Interactions between some of these variables were also considered, especially the space-time interactions.

Models and Analyses
After checking the relationship between river discharge (disch) and all predictors, independent models were constructed for each covariate to determine its effect on disch and, if the effect proved significant, its nature (linear or nonlinear) was

Parameter Estimation
The GAMMs in Table 1 can be expressed as generalized linear mixed models

Model Selection and Validation
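Model selection was based on the Akaike information criterion (AIC), the Bayesian information criterion (BIC), adjusted R-squared, the root mean squared prediction error (RMSPE), and the Nash-Sutcliffe efficiency (NSE). However, the key indicators of performance were the RMSPE, which is independent of the likelihood, and the NSE. The RMSPE and NSE are calculated using Equations (2) and (3), respectively:
$$\mathrm{RMSPE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \qquad (2)$$
$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \qquad (3)$$
where $y_i$ and $\hat{y}_i$ are the observed and predicted river discharges for $n$ months, and $\bar{y}$ is the mean of the observed data.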
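As a hedged illustration of the two key validation measures, the sketch below computes the RMSPE and NSE in Python, assuming their standard definitions (root of the mean squared difference between observed and predicted values, and one minus the ratio of the residual sum of squares to the sum of squares about the observed mean). The function names and the discharge values are made up for the example and are not taken from the study.

```python
import math

def rmspe(observed, predicted):
    """Root mean squared prediction error of predicted against observed values."""
    n = len(observed)
    squared_errors = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return math.sqrt(squared_errors / n)

def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual_ss = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    total_ss = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - residual_ss / total_ss

# Made-up monthly discharges (cubic metres per second), for illustration only
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print(round(rmspe(observed, predicted), 3))  # sqrt(18 / 4), about 2.121
print(round(nse(observed, predicted), 3))    # 1 - 18 / 500 = 0.964
```

An NSE close to 1 together with a small RMSPE indicates predictions that track the observed discharge closely, while an NSE at or below 0 means the model predicts no better than the observed mean.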

Software
Data analysis was done in the R programming environment, version 3.2.4 [33], and models were fitted using the mgcv package [28].

Descriptive Statistics
In generalized regression models, it is necessary to study the distribution of the response variable (disch) in order to select both the response distribution and the link function. Boxplots for disch and log(disch) are reported in the upper panels of Figure 4, while the normal QQ-plots for disch and log(disch) are reported in the lower panels. We observe from the figure that log(disch) is well approximated by the normal distribution. Hence, the Gaussian distribution with a log link function was taken as the response distribution for disch in the GAMMs.
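The visual comparison of QQ-plots described above can be complemented by a simple numerical summary. The Python sketch below computes the correlation between the sorted sample and the theoretical normal quantiles, before and after a log transformation. It is an illustration only: the data are synthetic right-skewed values generated for the example, not the Black Volta discharge series, and the function name qq_correlation and the plotting positions (i + 0.5) / n are assumptions.

```python
import math
from statistics import NormalDist, mean

def qq_correlation(values):
    """Correlation between the sorted sample and theoretical normal quantiles.

    Values near 1 mean the points of a normal QQ-plot lie close to a straight line.
    """
    n = len(values)
    sample = sorted(values)
    theory = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    ms, mt = mean(sample), mean(theory)
    cov = sum((s - ms) * (q - mt) for s, q in zip(sample, theory))
    var_s = sum((s - ms) ** 2 for s in sample)
    var_t = sum((q - mt) ** 2 for q in theory)
    return cov / math.sqrt(var_s * var_t)

# Synthetic right-skewed values (exact log-normal quantiles), for illustration only
n = 200
skewed = [math.exp(NormalDist(mu=3.0, sigma=1.2).inv_cdf((i + 0.5) / n))
          for i in range(n)]

r_raw = qq_correlation(skewed)
r_log = qq_correlation([math.log(v) for v in skewed])
print(r_raw < r_log)  # the log-transformed values are closer to normality
```

For strongly right-skewed data such as these, the correlation on the raw scale falls well below 1, while on the log scale it is essentially 1, which is the numerical counterpart of a straight QQ-plot.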
The time series plots of river discharge at the various gauge stations are shown in Figure 5, which indicates an obvious seasonality in discharge at all four gauge stations. This suggests representing the smooth functions of month with cyclic cubic regression splines, whose ends match so that the fitted seasonal effect is continuous from the end of one year to the start of the next.
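The key property of a cyclic smooth is that the fitted seasonal effect repeats from one year to the next. The Python sketch below illustrates this property with a harmonic (sine and cosine) basis. This is a deliberately simpler stand-in and not the cyclic cubic regression spline basis itself; the function name harmonic_basis and the choice of two harmonics are assumptions made for the example.

```python
import numpy as np

def harmonic_basis(month, n_harmonics=2, period=12.0):
    """Periodic basis in month: an intercept plus sine/cosine pairs.

    Every column has the given period, so any linear combination of the
    columns takes the same value at month m and at month m + period.
    """
    month = np.asarray(month, dtype=float)
    columns = [np.ones_like(month)]
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * np.pi * k * month / period
        columns.append(np.sin(angle))
        columns.append(np.cos(angle))
    return np.column_stack(columns)

months = np.arange(1, 13)
B = harmonic_basis(months)
print(B.shape)  # 12 months by (1 + 2 * n_harmonics) basis columns

# January of one year and January of the next share the same basis row
print(np.allclose(harmonic_basis([1]), harmonic_basis([13])))
```

A cyclic cubic regression spline achieves the same wrap-around behaviour with piecewise cubic segments and a smoothness penalty, which generally gives more local flexibility than a small number of harmonics.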

Model Selection
The

Parameter Estimates of the Selected Model
Parameter estimates of the selected GAMM are reported in Table 3 and Table 4.
We observe from these tables that all parameter estimates (for both smooth and non-smooth terms) were significant at the 0.05 level.

Diagnostic Checks
Basic diagnostic plots of the selected GAMM are reported in Figure 6. The QQ-plot of residuals shows that the residual quantiles closely follow the theoretical normal quantiles, and the histogram of residuals is nearly symmetric. The scatter-plot of residuals versus the linear predictor indicates

Conclusions
In this paper, we have used GAMMs to model space-time river discharge data. GAMMs provide a flexible framework which allows for smooth effects of covariates and smooth effects of space and time. In other applications, such as repeated observations of weather station data, the use of spatio-temporal dynamic models or state-space models has been proposed. Four GAMMs were explored, two with space-time interactions and two without. A comparison of the models based on AIC and BIC suggests that, in this application, the models with space-time interactions are better overall, and in particular for modelling variations in river discharge data. Further, a model with space and time main effects performed better than one without them.