Discharge Simulation in a Data-Scarce Basin Using Reanalysis and Global Precipitation Data: A Case Study of the White Volta Basin

Basins in many parts of the world are ungauged or poorly gauged, and in some cases existing measurement networks are declining. The purpose of this study was to examine the utility of reanalysis and global precipitation datasets for river discharge simulation in a data-scarce basin. The White Volta basin in Ghana, an international river basin, was selected as the study basin. NCEP1, NCEP2, ERA-Interim, and GPCP datasets were compared with corresponding observed precipitation data. Annual variations were not reproduced by NCEP1, NCEP2, or ERA-Interim. However, GPCP data, which are based on satellite and observed data, had good seasonal accuracy and reproduced annual variations well. Five input datasets were then supplied to a hydrologic model consisting of HYMOD, a water balance model, and WTM, a river model; the hydrologic model was calibrated for each dataset with a global optimization method, and river discharge was simulated. The results were evaluated using the root mean square error, relative error, and water balance error. The combination of GPCP precipitation and ERA-Interim evaporation data was the best on most measures, with relative errors of 43.1% and 46.6% in the calibration and validation periods, respectively. Moreover, the results for GPCP precipitation with ERA-Interim evaporation were better than those for observed precipitation with ERA-Interim evaporation. In conclusion, GPCP precipitation data and ERA-Interim evaporation data are very useful for water balance analysis in a data-scarce basin.


Introduction
A large number of hydrologic models have been developed to date. Some models can simulate water use associated with human activities as well as natural water cycles (e.g., [1]-[3]). Furthermore, physically sophisticated hydrologic models such as SWAT (Soil and Water Assessment Tool), developed by [4], are released as public domain models, so a sophisticated hydrologic model is now readily available to anyone. These hydrologic models are used to assess the impact of climate change on river discharge and to establish water resources plans based on the simulations.
However, in developing countries and international river basins, it is often very difficult to obtain the meteorological and hydrological data required as input to hydrologic models. The quality of observation data is also often extremely poor, and meteorological and hydrological observations are quite often discontinued. Drainage basins in many parts of the world are ungauged or poorly gauged, and in some cases existing measurement networks are declining [5]. Therefore, even when a hydrologic model is available, it is sometimes impossible to apply it because observations are lacking.
On the other hand, datasets from general circulation models and satellite-based products have advanced rapidly in recent years, and their spatial and temporal resolution is increasing quickly. A representative example is reanalysis data, which are frequently used for water balance analysis at global or continental scales and for climate studies (e.g., [6]-[9]). Biemans et al. [10] compared seven global gridded precipitation datasets at river basin scale in terms of mean annual and seasonal precipitation. Getirana et al. [11] assessed different precipitation datasets, including reanalysis data, in the Negro River basin, which is the most important tributary of the Amazon basin. Kotsuki and Tanaka [12] discussed four precipitation datasets for estimating runoff in Southeast Asia. However, runoff simulated with these datasets has not been examined in African basins, where studies are sometimes hampered by the fact that only little hydro-meteorological information is available [13].
In this study, the utility of reanalysis and global precipitation datasets is examined for rainfall-runoff analysis in the Volta River basin, which is one of the representative basins in Africa. NCEP1 (National Centers for Environmental Prediction and National Center for Atmospheric Research Reanalysis 1), NCEP2 (NCEP and Department of Energy Reanalysis 2), ERA-Interim (European Reanalysis-Interim), and GPCP (Global Precipitation Climatology Project) datasets are compared with corresponding observed precipitation data. The datasets are then used as input to a hydrologic model, river flows are simulated, and the simulation results are compared with the observed data.

Study Area
The Volta River basin, an international river basin, was selected as the study basin. The basin is shared between five riparian countries: Benin, Burkina Faso, Ghana, Ivory Coast and Mali [14]. The Volta River basin is very flat. The predominant land use types are Guinea savannah in the southern part and Sudan savannah in the northern part. The main geological systems of the basin are a Precambrian platform and a sedimentary layer, the Voltaian sandstone basin [13]. The predominant soil types are lixisols in the southern part and arenosols in the northern part.
The Volta River basin is situated in the semi-arid to sub-humid climate zone, with mean annual temperatures between 27˚C and 36˚C in the northern part and between 24˚C and 30˚C in the southern part [13]. Mean annual precipitation ranges from less than 300 mm in the north to more than 1500 mm in the south, of which around 80% falls between July and September. Evapotranspiration is a very important factor in this region. The mean annual potential evaporation lies between 2500 mm in the north and 1500 mm in the south [13].

Data
Reanalysis data are consistent, high-quality historical analysis datasets spanning the past several decades, produced with the latest data assimilation systems, numerical prediction models, and high-performance supercomputers (e.g., [15]). Several numerical prediction centers have carried out reanalysis projects and distribute the data through their websites. In this study, NCEP1 [16], produced by NCEP (National Centers for Environmental Prediction) and NCAR (National Center for Atmospheric Research), was selected. NCEP2 [17], produced by NCEP and DOE (Department of Energy), and ERA-Interim (European Reanalysis-Interim) [18], developed by ECMWF (European Centre for Medium-Range Weather Forecasts), were also selected (Table 1). Daily precipitation and evaporation data are available up to the present. However, the collection period was set to 1997 to 2009 to match the other meteorological data.
In addition to reanalysis data, high-resolution global precipitation datasets have been developed from satellite data and ground observation data. Among these, the GPCP (Global Precipitation Climatology Project) one-degree daily product [19] was selected (Table 1). It is based on multiple passive microwave and infrared satellite observations and on gauge observations. Its temporal resolution is daily and its spatial resolution is a 1˚ × 1˚ latitude/longitude grid. Since GPCP provides precipitation only and no other meteorological variables, it is paired with ERA-Interim evaporation data, because ERA-Interim has the highest spatial resolution among the reanalysis datasets. The data period is 1997 to 2009, the same as for the reanalysis datasets.
In order to validate the accuracy of the reanalysis and global precipitation datasets, daily rainfall data at Tamale city were collected from the Ghana Meteorological Agency (GMet). Data from 1975 to 2009 were available. For the comparison with the reanalysis and global precipitation datasets, the data from 1997 to 2009 (13 years) are used, which is the same period as those datasets.
River flow data were also collected from GRDC (Global Runoff Data Centre). The hydrologic model described in 3.2 is run with the reanalysis and global precipitation datasets, and the simulated runoff is then compared with the GRDC river flow data to validate the accuracy of the datasets. Considering the observation period and data quality, NAWUNI station on the White Volta River was selected as the target point (Figure 1). The basin area upstream of NAWUNI is about 92,950 km². River flow data from 1975 to 2006 were available. However, considering the period of the reanalysis and global precipitation datasets, the river flow data from 1997 to 2006 are used to compare simulation and observation.

Hydrologic Model and Its Application
The whole study area extends from 6˚W to 3˚E and from 5˚N to 15˚N, which covers the whole Volta River basin. This domain (9˚ × 10˚) is divided at 0.5˚ latitude-longitude spatial resolution (18 × 20 grid cells). In each grid cell, the water balance model explained in 3.2.2 and 3.2.3 is applied to calculate runoff, which is subsequently routed through a grid-based flow network to simulate streamflow at selected points within the basin. The grid-based flow network is based on STN-30 (Simulated Topological Network) made by [20], with its flow directions modified using a map indicating actual river flow routes and locations.
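The grid dimensions follow directly from the domain extent and the cell size. The following is a minimal sketch of that arithmetic, assuming the longitude range is read as −6˚ to 3˚ (6˚W to 3˚E); the constant and function names are illustrative and do not come from the study.

```python
# Illustrative grid set-up. The bounds are an assumption read from the text:
# longitude -6 to 3 degrees, latitude 5 to 15 degrees, 0.5-degree cells.
LON_MIN, LON_MAX = -6.0, 3.0
LAT_MIN, LAT_MAX = 5.0, 15.0
RES = 0.5

n_lon = round((LON_MAX - LON_MIN) / RES)  # number of columns
n_lat = round((LAT_MAX - LAT_MIN) / RES)  # number of rows

# Coordinates of the cell centres along each axis
lon_centres = [LON_MIN + RES * (i + 0.5) for i in range(n_lon)]
lat_centres = [LAT_MIN + RES * (j + 0.5) for j in range(n_lat)]


def cell_index(lon, lat):
    """Return (column, row) of the grid cell that contains a point."""
    if not (LON_MIN <= lon < LON_MAX and LAT_MIN <= lat < LAT_MAX):
        raise ValueError("point lies outside the model domain")
    return int((lon - LON_MIN) // RES), int((lat - LAT_MIN) // RES)
```

Such an index lookup is one way to pick the grid cell of a gridded product that contains a rain gauge or a discharge station.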
HYMOD (Hydrology Model), developed by [21], was adopted as the water balance model (Figure 2). The model assumes that the soil moisture storage capacity ($c$) varies across the catchment and, therefore, that the proportion of the catchment with saturated soils varies over time. The spatial variability of soil moisture capacity is described by the following distribution function:
$$F(c) = 1 - \left(1 - \frac{c}{c_{\max}}\right)^{\beta}, \quad 0 \le c \le c_{\max}$$
where $c_{\max}$: maximum soil moisture storage (mm), $\beta$: degree of spatial variability of the stores (-). Evaporation from the soil moisture store occurs at the rate of the potential evaporation. Following evaporation, the remaining rainfall is used to fill the soil moisture stores. Excess rainfall is sent to the routing module. The routing module divides the excess rainfall using a split parameter ($\alpha$) and routes the two parts through parallel conceptual linear reservoirs meant to simulate the quick and slow flow response of the system. The flow from each reservoir is controlled by the quick flow residence time ($K_q$) and the slow flow residence time ($K_s$). The simulated streamflow is therefore the sum of the outputs from these reservoirs. The model has five parameters. Their minimum and maximum values (Table 2) were set based on past research [22].
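The description above can be sketched as a single time step of a HYMOD-like model. This is only an illustration under stated assumptions, not the implementation used in the study: it assumes the commonly used capacity distribution $F(c) = 1 - (1 - c/c_{\max})^{\beta}$, one quick and one slow linear reservoir, and residence times expressed in time steps. The function name and state layout are hypothetical.

```python
def hymod_step(state, precip, pet, c_max, beta, alpha, k_q, k_s):
    """Advance the sketch by one time step.

    Depths are in mm; k_q and k_s are residence times in time steps (>= 1).
    state = (soil moisture storage, quick reservoir storage, slow reservoir storage)
    Returns (new state, simulated flow, actual evaporation).
    """
    soil, quick, slow = state
    s_max = c_max / (1.0 + beta)  # storage when every store is full (assumed F(c))

    # 1) Evaporation at the potential rate, limited by the water available.
    evap = min(pet, soil)
    soil -= evap

    # 2) Rainfall fills the stores. c_star is the capacity below which all
    #    stores are currently full; it is the inverse of the storage curve.
    c_star = c_max * (1.0 - max(0.0, 1.0 - soil / s_max) ** (1.0 / (1.0 + beta)))
    c_new = min(c_star + precip, c_max)
    soil_new = s_max * (1.0 - (1.0 - c_new / c_max) ** (1.0 + beta))
    excess = max(precip - (soil_new - soil), 0.0)  # rainfall that cannot be stored

    # 3) Split the excess and route it through two parallel linear reservoirs.
    quick += alpha * excess
    slow += (1.0 - alpha) * excess
    q_quick = quick / k_q
    q_slow = slow / k_s
    return (soil_new, quick - q_quick, slow - q_slow), q_quick + q_slow, evap
```

Calling the function once per day with that day's precipitation and potential evaporation, and carrying the returned state forward, yields a runoff series for one grid cell. The five arguments `c_max`, `beta`, `alpha`, `k_q`, and `k_s` correspond to the five calibrated parameters.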
WTM (Water Transport Model), proposed by [23], was adopted for river flow simulation. This model is a quasi-linear reservoir model that computes discharge through each grid cell of the simulated river basin based on runoff inputs from HYMOD, a river network, channel transfer rates, and the timing and extent of floodplain inundation. For a single grid cell, the flow and continuity equations are written in terms of the following quantities: $S_c$: channel storage (m³), $S_f$: floodplain storage (m³), $K$: downstream transfer coefficient (-), $A$: grid cell area (m²), $n$: number of upstream donor cells, $R$: runoff simulated by HYMOD (mm), $Q_u$: upriver inflow (m³), $Q_d$: discharge exported downstream (m³), $Q_g$: runoff generated locally within the grid cell considered (m³), $Q_f$: exchange between channel and floodplain (m³), $Q_{dma}$: long-term mean annual downstream discharge (m³). The coefficient $r_f$ determines the fraction (0.0 to 1.0) of potential volume change that is assigned to floodplain storage, and $c_f$ is the flood initiation parameter, giving the proportion (0.0 to 1.0) of long-term mean annual flow required to invoke floodplain exchanges. The minimum and maximum parameter values (Table 3) were set based on past research.
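To illustrate the channel part of such a routing scheme, the sketch below routes runoff down a single chain of grid cells with a linear reservoir per cell. It is a deliberately simplified assumption, not the WTM formulation itself: floodplain exchange ($Q_f$, $r_f$, $c_f$) is omitted, discharge is taken as $Q_d = K S_c$, and water released by a cell reaches the next cell within the same time step. The function name is hypothetical.

```python
def route_chain(runoff_mm, cell_area_m2, k):
    """Route runoff down one chain of cells (index 0 is the headwater cell).

    runoff_mm    -- list of time steps, each a list of per-cell runoff depths (mm)
    cell_area_m2 -- list of cell areas (m^2)
    k            -- transfer coefficient per time step, 0 < k <= 1 (dimensionless)
    Returns the outlet discharge volume (m^3) per time step and the final
    channel storage (m^3) of every cell.
    """
    n_cells = len(cell_area_m2)
    storage = [0.0] * n_cells  # channel storage S_c of each cell
    outlet = []
    for step in runoff_mm:
        upstream = 0.0  # Q_u entering the headwater cell
        for i in range(n_cells):
            local = step[i] / 1000.0 * cell_area_m2[i]  # Q_g: mm over the cell -> m^3
            storage[i] += upstream + local              # continuity
            discharge = k * storage[i]                  # assumed Q_d = K * S_c
            storage[i] -= discharge
            upstream = discharge                        # becomes Q_u of the next cell
        outlet.append(upstream)
    return outlet, storage
```

In a full grid-based network, each cell would sum the discharge of all its upstream donor cells rather than of a single neighbour, with cells processed from headwaters to outlet.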
Reanalysis data contain both precipitation and evaporation data. GPCP, however, provides precipitation only. Therefore, GPCP is combined with ERA-Interim evaporation data, because ERA-Interim has the highest spatial resolution among the reanalysis datasets used in this study. The observed meteorological data are also precipitation only and are likewise combined with ERA-Interim evaporation data (Table 4). These five input datasets are supplied to the hydrologic model, and the model parameters are calibrated for each dataset and validated in terms of accuracy. The period 1997 to 2000 (4 years) is used for calibration and the period 2001 to 2006 (6 years) for validation.
ES (Evolution Strategy) was used to calibrate the five parameters of HYMOD (Table 2) and the three parameters of WTM (Table 3). ES is one of the global optimization methods [24]-[26] and is reported to be more powerful or efficient than the SCE-UA (Shuffled Complex Evolution, University of Arizona) method, which is the method most frequently used in parameter calibration. The objective function expressing the difference between observations and simulations is the RMSE (Root Mean Square Error), which tends to balance water volume and emphasizes the error at high flows.
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(Q_{oi} - Q_{ci}\right)^{2}}$$
where $Q_{oi}$: observed runoff, $Q_{ci}$: simulated runoff, and $N$: number of data.
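The calibration step can be illustrated with a generic evolution strategy that minimises the RMSE within parameter bounds. The ES variant used in the study is not specified here, so this sketch assumes a simple elitist (mu + lambda) scheme with log-normal step-size adaptation; the function names, default settings, and the toy linear model in the usage lines are all hypothetical.

```python
import math
import random


def rmse(observed, simulated):
    """Root mean square error between two equally long sequences."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("sequences must be non-empty and equally long")
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / len(observed))


def evolution_strategy(objective, bounds, mu=5, lam=20, generations=100, seed=0):
    """Minimise `objective` inside the box `bounds` with a (mu + lambda)-ES."""
    rng = random.Random(seed)
    tau = 1.0 / math.sqrt(len(bounds))  # learning rate for the step sizes

    def clip(x):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    # Initial parents: random points, step sizes set to 10% of each range.
    parents = []
    for _ in range(mu):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        sigma = [0.1 * (hi - lo) for lo, hi in bounds]
        parents.append((objective(x), x, sigma))
    parents.sort(key=lambda ind: ind[0])

    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            _, x, sigma = rng.choice(parents)
            factor = math.exp(tau * rng.gauss(0.0, 1.0))  # mutate the step sizes
            new_sigma = [s * factor for s in sigma]
            child = clip([v + s * rng.gauss(0.0, 1.0) for v, s in zip(x, new_sigma)])
            offspring.append((objective(child), child, new_sigma))
        # Elitist selection: keep the best mu of parents and offspring together.
        parents = sorted(parents + offspring, key=lambda ind: ind[0])[:mu]

    best_value, best_x, _ = parents[0]
    return best_x, best_value


# Toy usage: recover two coefficients of a hypothetical linear "model".
rain = [0.0, 5.0, 12.0, 3.0, 0.0, 8.0]
observed_flow = [0.6 * r + 1.0 for r in rain]


def toy_objective(p):
    return rmse(observed_flow, [p[0] * r + p[1] for r in rain])


best_params, best_rmse = evolution_strategy(toy_objective, [(0.0, 1.0), (0.0, 5.0)])
```

In an actual calibration, the objective would run the hydrologic model over the calibration period with the candidate parameter vector and return the RMSE against observed discharge, with the bounds taken from the parameter tables.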

Precipitation
The accuracy of the precipitation datasets is compared. Precipitation from the NCEP1, NCEP2, ERA-Interim, and GPCP grid cells that contain Tamale city is compared with observed rainfall data (Figure 3). The correlation coefficient is 0.64 to 0.66 for NCEP1 and NCEP2, whereas that of ERA-Interim is 0.79, the best among the reanalysis datasets. GPCP, which is derived from satellite and ground rainfall data, has a higher correlation coefficient (0.86) than the reanalysis data. GPCP is therefore the best dataset in terms of monthly rainfall accuracy. Annual rainfall from each dataset was also compared with observed annual rainfall (Figure 4). NCEP1 and NCEP2 considerably overestimate the observations, and their annual variability does not match the observations. Although ERA-Interim tends to underestimate in the later period (2005-2009), its annual variability matches the observations fairly well and its accuracy is much better than that of NCEP1 and NCEP2. The annual variability of observed rainfall is reproduced quite well by GPCP, whose accuracy is the best among the datasets. Scatter plots of annual precipitation are shown in Figure 5. For NCEP1 and NCEP2, the correlation coefficient is between −0.18 and 0.15, indicating almost no correlation at the annual scale. ERA-Interim is much better than NCEP1 and NCEP2, but its correlation coefficient is 0.46, which is still not very good. GPCP has a correlation coefficient of 0.70 and is thus much better than the other datasets.

Table 3. WTM parameters (columns: Parameter, Minimum, Maximum).
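The correlation coefficients quoted in this section compare two rainfall series of equal length, for example monthly or annual totals from a gridded product and from the gauge. A minimal sketch follows, assuming the Pearson product-moment coefficient is the statistic meant; the function name is illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

Note that a high correlation only indicates that the two series vary together; a product can correlate well with the gauge and still be biased in total amount, which is why the annual totals are also compared directly.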

River Flow
Model parameters are calibrated for each dataset (Table 4) using ES, and the simulated and observed river flows are shown in Figure 6. In addition to the RMSE used as the objective function, the relative error and water balance error are reported in Tables 5-7.
Both the relative error and the water balance error are expressed in percent. Figure 6 shows that the simulations driven by NCEP1, NCEP2, and ERA-Interim reproduce seasonal variations well, even though the annual precipitation pattern is not well reproduced (Figure 5). However, the relative error is 66.0% to 157.9% and the water balance error is −2.4% to 32.7% even in the calibration period (Table 6 and Table 7), so it is difficult to use these datasets in practical applications (e.g., water resources planning, construction, and future prediction). In contrast, the simulation driven by GPCP agrees quite well with the observed hydrograph shape. Across the three measures (RMSE, relative error, and water balance error), GPCP gives the best performance in all cases except the water balance error in the validation period. Its relative error is 43.1% in calibration and 46.6% in validation. Although local observed data are not used directly in the GPCP simulation, its accuracy is remarkably good. Moreover, the simulation driven by OBS (observed precipitation at Tamale city and ERA-Interim evaporation) gives the worst results among all datasets used in this study, which indicates that a single precipitation station cannot represent the spatial variability of rainfall in the basin. The correlation coefficient between GPCP and observed data is 0.86 at the monthly scale (Figure 3) and 0.70 at the annual scale (Figure 5). Moreover, the time series of annual rainfall is reproduced quite well by GPCP (Figure 4), and the time series of river flow is reproduced quite well by the combination of GPCP and ERA-Interim (Figure 6). These results show that GPCP precipitation and ERA-Interim evaporation perform well in rainfall-runoff analysis for a data-scarce basin where observed meteorological data cannot be used at all.
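The two percentage measures can be sketched as follows. The exact formulas used in the study are not recoverable from the text, so the definitions below are assumptions based on common usage: relative error as the mean absolute deviation relative to each observed value, and water balance error as the signed difference in total volume relative to the observed total. The function names are hypothetical.

```python
def relative_error(observed, simulated):
    """Mean of |Q_obs - Q_sim| / Q_obs in percent (assumed definition).

    Time steps with zero observed flow are skipped to avoid division by zero.
    """
    terms = [abs(o - s) / o for o, s in zip(observed, simulated) if o > 0.0]
    if not terms:
        raise ValueError("no time step with positive observed flow")
    return 100.0 * sum(terms) / len(terms)


def water_balance_error(observed, simulated):
    """Signed difference in total volume in percent (assumed definition).

    Positive values mean the simulation produces more water than observed.
    """
    total_obs = sum(observed)
    if total_obs == 0.0:
        raise ValueError("observed total is zero")
    return 100.0 * (sum(simulated) - total_obs) / total_obs
```

Under these definitions the two measures respond to different failures: the relative error weights every time step equally and is therefore sensitive to low-flow periods, while the water balance error can be close to zero even when individual time steps are badly simulated, as long as over- and underestimates cancel.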

Conclusion
The utility of reanalysis data and global precipitation data was examined in rainfall-runoff analysis for an ungauged basin. NCEP1, NCEP2, ERA-Interim, and GPCP data were compared with corresponding observed precipitation data. Although the reanalysis data (NCEP1, NCEP2, and ERA-Interim) had fair seasonal accuracy, annual variations were not reproduced by these data. However, GPCP data, which are based on remote sensing data, had good seasonal accuracy and reproduced annual variations, making GPCP the best among the datasets examined. Furthermore, five input datasets were supplied to a hydrologic model consisting of HYMOD, a water balance model, and WTM, a river model; the hydrologic model was calibrated for each dataset, and river flows were simulated. The results were evaluated using the root mean square error, relative error, and water balance error. The results indicate that the combination of GPCP precipitation and ERA-Interim evaporation data was the best on most measures. The relative errors in the calibration and validation periods were 43.1% and 46.6%, respectively. Moreover, the results for GPCP precipitation with ERA-Interim evaporation were better than those for observed precipitation with ERA-Interim evaporation. It was found that GPCP precipitation data and ERA-Interim evaporation data are very useful for water balance analysis in an ungauged basin.

Figure 1. Map of the Volta River basin.

Figure 3. Scatter plot of monthly rainfall for each data set.

Figure 4. Time series of annual rainfall for each data set.

Figure 5. Scatter plot of annual rainfall for each data set.

Figure 6. Time series of river flow for each data set.

Table 1. Data sets used in this study.

Table 4. Input data sets used in this study.