Evaluation of Daily Gridded Meteorological Datasets over the Niger Delta Region of Nigeria and Implication to Water Resources Management

Hydro-climatological studies are difficult in most developing countries owing to the paucity of monitoring stations. Gridded climatological data provide an opportunity to extrapolate climate to areas without monitoring stations, provided they can replicate the spatio-temporal distribution and variability of observed datasets. Simple correlation and error analyses are not enough to characterise the variability and distribution of precipitation and temperature. In this study, the coefficient of correlation (R), root mean square error (RMSE), mean bias error (MBE) and mean wet and dry spell lengths were used to evaluate the performance of three widely used daily gridded precipitation, maximum temperature and minimum temperature datasets available over the Niger Delta part of Nigeria: the Climatic Research Unit (CRU), Princeton University Global Meteorological Forcing (PGF) and Climate Forecast System Reanalysis (CFSR) datasets. The Standardised Precipitation Index (SPI) was used to assess the confidence with which gridded precipitation products can be used for water resource management. Results of the correlation, error and spell length analyses revealed that the CRU and PGF datasets performed much better than the CFSR datasets. SPI values also indicate a good association between station and CRU precipitation products. Compared with the other data products, the CFSR datasets overestimated or underestimated the SPI in many years. This indicates weak predictive accuracy, so CFSR is not reliable for water resource management in the study area. CRU data products, however, performed much better in most of the statistical assessments conducted. This makes the methods used in this study useful for the as
How to cite this paper: Hassan, I., Kalin, R.M., White, C.J. and Aladejana, J.A. (2020) Evaluation of Daily Gridded Meteorological Datasets over the Niger Delta Region of Nigeria and Implication to Water Resources Management. Atmospheric and Climate Sciences, 10, 21-39. https://doi.org/10.4236/acs.2020.101002
Received: November 22, 2019. Accepted: December 20, 2019. Published: December 23, 2019.
Copyright © 2020 by author(s) and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY 4.0). http://creativecommons.org/licenses/by/4.0/ Open Access


Introduction
The accuracy and reliability of climate datasets are crucial for scientific research and hydrologic studies related to climate change impact assessment, numerical weather prediction, flood forecasting, drought monitoring and water resources management [1]. Paucity of data remains a challenge, especially in developing countries and in remote parts of the world where ground-based precipitation measurements, such as radar networks and rain gauges, are either sparse or non-existent owing to the high cost of establishing and maintaining the infrastructure [2].
Recently, climate impact studies have become increasingly detailed as global industrialisation results in an unprecedented increase in greenhouse gas (GHG) concentrations and the associated impact on the changing climate [3]. Climate change impact assessment requires climate data at various spatial and temporal scales [4]. Obtaining observations at an acceptable spatial resolution is challenging in developing countries; where data are available, they are often of poor quality or expensive, and may poorly represent a study area with large hydroclimatic gradients [5] [6]. To overcome these challenges, researchers resort to multilayer global gridded representations of meteorological data as inputs to climate and hydrological modelling studies [6].
Climate datasets are typically obtained in three ways, viz. gauge-based observations, satellite estimates and reanalysis datasets [7]. Gauge observations provide relatively accurate and trusted measurements at a single point but are subject to limitations such as reporting time delays, sparse gauge networks, data gaps, unavailability over many sparsely populated and oceanic areas, and limited access to available data [8]. Satellite observations provide broad coverage of global atmospheric parameters with adequate spatial and temporal resolution in un-gauged regions, such as the oceans, complex mountain areas and deserts. They provide information at regular intervals with uniform spatial coverage but contain non-negligible biases and random errors owing to the complicated nature of the relationship between the observations, sampling, and deficiencies in the algorithms. Reanalysis datasets merge random observations and models that encompass many physical and dynamical processes to generate a synthesised estimate of the state of the system across a regular grid, with spatial homogeneity, temporal continuity and a multidimensional hierarchy [7]. The performance of gridded datasets varies with time and regional climate [7]. This makes it necessary to evaluate the capability of gridded data before applying it to specific locations.
Recent studies show an increase in the use of station gauge-based datasets for validating estimates from different interpolated datasets, reanalysis datasets, satellite products and climate models. For example, [5] used gauge-based datasets to compare four gridded datasets, namely the Asian Precipitation Highly Resolved Observational Data Integration towards Evaluation (APHRODITE), Global Precipitation Climatology Centre (GPCC), Centre for Climatic Research-University of Delaware (UDel) and Climatic Research Unit (CRU) datasets, at stations located in the arid, semi-arid and hyper-arid regions of Balochistan province of Pakistan. [20] compared unified rain gauge data with three daily gauge-based gridded rainfall datasets over India, namely the Indian Meteorological Department (IMD), APHRODITE and Climate Prediction Centre (CPC) datasets. [21] used the IMD dataset to compare two satellite datasets, namely GPCP and the Tropical Rainfall Measuring Mission (TRMM). [22] compared station datasets with two gridded rainfall datasets, namely the Climatic Research Unit (CRU) and Hydro-Sciences Montpellier (SIEREM) datasets, for Burkina-Faso in West Africa.
Considering such diverse use, it is necessary to carefully examine and compare the characteristics and patterns of the gridded datasets. An inter-comparison should analyse prominent precipitation spell characteristics, such as wet and dry spells, among the different gridded datasets and, importantly, the implications of using these gridded products for water resource management. Variations in the frequency, length and intensity of dry and wet spells within predicted datasets often lead to faulty hydrological and agricultural decisions, such as poor estimation of extreme events, improper selection of crops and incorrect estimation of sowing and harvesting times [23]. Wet spells are prolonged runs of wet days and serve as an indicator of flood conditions, while dry spells are prolonged runs of dry days and serve as an indicator of drought conditions [20]. This information is of prime importance to hydrologists, agronomists, hydrogeologists and water resource managers [24] [25] [26] [27]. The Standardised Precipitation Index (SPI) is used to identify meteorological wet and drought events from precipitation time series data and serves as a useful tool in water resource management [28].
Several studies [1] [2] [5] [8] [21]-[30] compared gridded datasets with station measurements. However, no study has been conducted in any part of Nigeria to determine which gridded climate product best matches the observed station datasets and can thus serve as a substitute for them in this highly data-scarce region. Most previous studies use correlation and error analysis to evaluate the performance of gridded products against observed station datasets, but performance assessment based on correlation and error analysis alone can be misleading [5]. This is because the coefficient of correlation is highly sensitive to outliers and hence may not fully explain the model capabilities [31]. The RMSE also varies with the variability of the squared errors, which makes it hard to determine to what degree it reflects average error and to what extent it reflects variability in the distribution of squared errors [31] [32]. To overcome these drawbacks, indices such as the mean bias error (MBE) and spell length analysis can be used. MBE measures the mean overestimation or underestimation of a model. Wet and dry spells are extended periods of wet and dry days, respectively [20]; spell analysis is used in this study to determine which of the gridded datasets best matches the observed datasets in estimating wet and dry spells.
This study aims to: 1) evaluate daily precipitation, maximum temperature and minimum temperature data from three gauge-based gridded datasets (CRU, PGF and CFSR) available for the Niger Delta and compare them with observed station datasets to identify their fundamental differences; 2) analyse different spell characteristics and identify prominent differences in the spells of the datasets; and 3) use the Standardised Precipitation Index (SPI) to assess the implications of using the gridded precipitation products for water resource management in Nigeria. The results will help water resource management practitioners to select the appropriate gridded precipitation and temperature dataset for the scope and application at hand. This study provides insights into the spatio-temporal behaviour of these three datasets for the estimation of extreme events, which in turn will benefit hydrological management over un-gauged or sparsely gauged regions.

Description of Study Area
The study area, located in the Niger Delta part of Nigeria, comprises Bayelsa and Rivers States and is presented in Figure 1. The area is low-lying and is drained by the Rivers Imo, Aba, Kwa-Ibo and Bonny and their respective tributaries. The topography of the area, under the influence of high coastal tides, results in flooding, mostly during the rainy season [33]. The climate of the region comprises a wet season (March to October) and a dry season (November to February), and is characterised by high temperatures and high relative humidity throughout the year.
A short dry spell, often referred to as the "August break" and caused by the deflection of the moisture-laden current, is usually experienced in August and sometimes occurs in July or September [33] owing to variations in the weather.

Data
The gridded datasets used in this study are daily precipitation, maximum temperature and minimum temperature datasets obtained from CRU, PGF and CFSR for the years 1980 to 2005. The observed station datasets are from Warri and Port Harcourt (PH) international airport, in Delta and Rivers States respectively, and were obtained from the Nigerian Meteorological Agency (NIMET).
The CRU gridded datasets are extracted from the CRU version 4.01 global climate dataset [14] and downloaded from http://www.cru.uea.ac.uk/. The PGF datasets were developed by the Princeton University Global Meteorological Forcing centre [18] and downloaded from http://hydrology.princeton.edu/data/pgf/.
The CFSR datasets were developed by the National Centers for Environmental Prediction (NCEP). These daily datasets were downloaded as NetCDF files and extracted at 0.5° × 0.5° resolution, resulting in an equal number of grid cells (22) for each dataset, spatially distributed across the study area as shown in Figure 1. The observed station data comprise only one station within the study area, with two other contributing stations outside it.

Methods
Two approaches are generally used to compare gridded datasets with station observations: 1) computing the average areal precipitation for each grid box from available station data and comparing grid-to-grid [34]; 2) interpolating the gridded data to station level and comparing the result with the observed station data [35]. Several methods, such as the arithmetic mean, Thiessen polygons, the isohyetal method, and gridding or distance weighting, are used to estimate areal precipitation from point data. Distance weighting methods are used to grid the observed data to the same spatial resolution as the gridded datasets that are to be compared with the gauges [36]. For sparse and unevenly distributed gauges, a simple average of all the station data within the grid box is preferred for computing the average areal precipitation [37] [38]. Previous studies [37] [39] [40] [41] reported that, when gridded data are evaluated against in situ measurements with only one observed station within a grid box, pairwise statistical analyses between the grid point rainfall estimates and the rain gauge estimates are carried out on the assumption that the station rainfall is the average observed rainfall for the grid box. This method has been used by [35] [37] [40] [41] to evaluate gridded datasets against observed station measurements. In this study, the performance of the three gridded precipitation datasets was compared with the data of the single station located within the grid box, because no grid box was found to contain more than one observed station.
This study is conducted in three steps. The first evaluates the performance of the gridded datasets using statistical analysis and visual inspection. The second analyses spells among the datasets. The third assesses the implication of using each dataset for water resource management.
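The pairing of a single station with the grid box that contains it can be sketched as follows. This is a minimal illustration, not the procedure used in the study: it assumes grid boxes whose edges fall on multiples of the resolution (so that 0.5° boxes have centres at .25 and .75), and the function names, the dictionary layout and the coordinates in the usage note are hypothetical.

```python
import math


def grid_cell_centre(lat, lon, res=0.5):
    """Centre of the res x res grid box containing (lat, lon).

    Assumes box edges lie on multiples of `res` (an assumption about
    the grid layout, not something stated in the source).
    """
    centre_lat = (math.floor(lat / res) + 0.5) * res
    centre_lon = (math.floor(lon / res) + 0.5) * res
    return centre_lat, centre_lon


def pair_station_with_grid(station_series, grid_series_by_centre, lat, lon, res=0.5):
    """Pair daily station values with the values of the enclosing grid box.

    `grid_series_by_centre` maps (centre_lat, centre_lon) to a daily series
    aligned with `station_series`. Days where either value is missing
    (None) are dropped so that the statistics use matching pairs only.
    """
    centre = grid_cell_centre(lat, lon, res)
    grid_series = grid_series_by_centre[centre]
    return [
        (obs, est)
        for obs, est in zip(station_series, grid_series)
        if obs is not None and est is not None
    ]
```

For a hypothetical station at 4.85°N, 7.02°E, `grid_cell_centre(4.85, 7.02)` selects the box centred on (4.75, 7.25), and the station record is then treated as the observed average for that box.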

Performance Evaluation of the Datasets
The performance of the gridded datasets was evaluated statistically and by comparing the seasonal variation plots of the observed and gridded datasets for the whole study period. Performance of the datasets was tested using the coefficient of correlation (R²), the root mean square error (RMSE) and the mean bias (MB), and summarised in Table 1 and
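The three statistical indicators can be sketched with the standard library as below. This is an illustrative sketch rather than the authors' code: the function names are hypothetical, and the sign convention for the bias (gridded estimate minus observation, so that positive values mean overestimation) is an assumption.

```python
import math


def pearson_r(obs, est):
    """Pearson coefficient of correlation between observed and gridded values."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_est = sum(est) / n
    cov = sum((o - mean_obs) * (e - mean_est) for o, e in zip(obs, est))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_est = sum((e - mean_est) ** 2 for e in est)
    return cov / math.sqrt(var_obs * var_est)


def rmse(obs, est):
    """Root mean square error of the gridded values against the observations."""
    return math.sqrt(sum((e - o) ** 2 for o, e in zip(obs, est)) / len(obs))


def mbe(obs, est):
    """Mean bias error; positive means overestimation (assumed convention)."""
    return sum(e - o for o, e in zip(obs, est)) / len(obs)
```

A gridded series that is uniformly 1 mm too wet shows why correlation alone can mislead: R is perfect while RMSE and MBE both report the 1 mm offset.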

Analysis of Spells Characteristics among Datasets
The analysis of the characteristics of mean dry and wet spell lengths was con- [24] [44] also defined spells as the number of consecutive rainy days with rainfall > 2.5 mm. A wet day, in general, is a rainy day, while a dry day is a non-rainy day. Wet and dry spells are defined as extended periods of wet and dry days, respectively [20].
In this study, the dw.spell routine of the R Multi-Site Rainfall Generator (RMRAINGEN) [45] was used for the spell length analysis. Precipitation thresholds of 1 to 3 mm were selected, with a spell length of at least 1 day as recommended in [43], which ensures that no spell with extreme rainfall magnitude is missed. This matters because rainfall of higher magnitude over shorter durations may be disastrous [27].
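The idea of a spell length analysis can be sketched in a few lines. The sketch below is not the RMRAINGEN dw.spell routine and does not reproduce its interface; it is a hypothetical stand-in that assumes a day counts as wet when its precipitation is at or above the chosen threshold, and that a spell may be as short as one day.

```python
def spell_lengths(precip, threshold=1.0):
    """Split a daily precipitation series into wet and dry spell lengths.

    A day is treated as wet when precip >= threshold (in mm); this
    comparison is an assumption made for illustration. Returns two lists:
    lengths of wet spells and lengths of dry spells, in order of occurrence.
    """
    wet, dry = [], []
    run_length, run_is_wet = 0, None
    for value in precip:
        is_wet = value >= threshold
        if run_length and is_wet != run_is_wet:
            # The state changed, so the current spell has ended.
            (wet if run_is_wet else dry).append(run_length)
            run_length = 0
        run_is_wet = is_wet
        run_length += 1
    if run_length:
        # Close the spell that is still open at the end of the series.
        (wet if run_is_wet else dry).append(run_length)
    return wet, dry


def mean_length(lengths):
    """Mean spell length, or 0.0 when no spell of that kind occurred."""
    return sum(lengths) / len(lengths) if lengths else 0.0
```

Running the same function on the station series and on each gridded series, month by month, gives mean wet and dry spell lengths that can be compared directly. Raising the threshold from 1 to 3 mm turns light-rain days into dry days, so the choice affects both kinds of spell.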

Standardised Precipitation Index (SPI)
In this study, SPI was used as a water resource management tool to investigate extreme events at the selected stations. This gives an insight into the implication of a wrong choice of dataset for water resource management. SPI was developed for monitoring and defining meteorological drought and wet events from precipitation time series data [28]. SPI computes the precipitation deficit for multiple time steps and therefore facilitates the temporal analysis of drought [46]. It has also been reported that SPI provides better spatial standardisation than other indices. Positive SPI values show higher than median precipitation, while negative values indicate less than median precipitation [47]. SPI is calculated by taking the difference between the monthly precipitation (x_i) and the monthly mean (x̄) and then dividing by the standard deviation (σ) [48].
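The standardisation described above, SPI_i = (x_i - x̄) / σ, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, the sample standard deviation is used (the source does not say whether the sample or population form was applied), and the input is taken to be the precipitation totals for one calendar month across the years of record. Operational SPI implementations commonly fit a probability distribution to the totals before standardising; that step is not shown here.

```python
import statistics


def spi_simple(monthly_precip):
    """Standardise monthly precipitation totals: (x_i - mean) / std.

    `monthly_precip` holds the totals for one calendar month across the
    years of record (an assumed layout). Uses the sample standard
    deviation, which is an assumption.
    """
    mean = statistics.mean(monthly_precip)
    std = statistics.stdev(monthly_precip)
    if std == 0:
        raise ValueError("SPI is undefined when all totals are identical")
    return [(x - mean) / std for x in monthly_precip]
```

Computing the index separately for the station series and for each gridded series makes it possible to see, year by year, whether a gridded product places an event in the same wet or dry class as the station record.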

Performance of Gridded Datasets
The graphs comparing the distribution of the monthly means of the station, CRU, PGF and CFSR daily precipitation, maximum temperature and minimum temperature datasets of the study area are shown in Figure 2. Descriptive statistics describing the characteristics of the datasets are summarised in Table 1, and the statistical indicators are summarised in Table 2. Based on the tables (Table 1 and Table 2) and the visual comparison of the graphs, there is better agreement between the monthly means of the observed station datasets and the CRU datasets, which perform better than the PGF and CFSR datasets for precipitation, maximum temperature (Tmax) and minimum temperature (Tmin) (Figure 2), even though some of the other datasets perform better in the correlation and error analysis (Table 2). This shows that the results of correlation and error analysis can often be misleading in the evaluation of gridded data products, as reported by [5].

Wet Spell and Dry Spell
Results of the spell analysis for the distribution of mean monthly dry and wet spell lengths are shown in Figure 3. The distribution of the CRU and PGF rainfall datasets is similar to that of the station rainfall at both study locations. However, in February and December, the PGF tends to show a much longer dry spell length. The CFSR precipitation datasets consistently overestimated the mean monthly wet spell length and underestimated the mean monthly dry spell length. The spell length correlation was then used to assess the annual performance of the spell indices among the datasets.
The correlation results summarised in Figure 4(a) and Figure 4(b) for the Port Harcourt and Warri stations show that the CRU datasets correlated better with the observed station precipitation datasets than all the remaining datasets. Therefore, CRU can be considered the most reliable precipitation dataset in terms of temporal characteristics.

The Implication for Water Resource Management
The to an extremely wet event based on [28] classification. In 1991, the extreme wet event estimated by the station, CRU and PGF datasets was also grossly underestimated by the CFSR as a near-normal event.
The evaluation and validation of different gridded datasets is necessary for any region in order to determine the best performing datasets in comparison

Discussions
Several statistical techniques have been used in this study to assess the performance of three available daily gridded datasets commonly used in the Niger Delta part of Nigeria. The study suggests that the CRU datasets replicated the observed station datasets most closely when the different characteristics of the datasets are considered together. In Nigeria, no study has so far been conducted to assess the performance of daily gridded precipitation and temperature data products that can serve as an alternative to the unavailable station datasets. In West Africa, the only study conducted so far was in Burkina-Faso by Mahe et al. [22].
That study evaluated the ability of CRU and SIEREM (Hydro Sciences Montpellier, France) data to reproduce the observed station precipitation characteristics of Burkina-Faso and analysed the consequences of choosing each of them for the simulated river flows in five basins across the country. Mahe et al. [22] reported the superiority of the SIEREM precipitation over the CRU precipitation in replicating the observed station precipitation. As the SIEREM datasets were developed largely for Francophone African countries [49], they are not available for Nigeria. However, comparison of the CRU, SIEREM and Burkina-Faso meteorological station datasets shows that the distributions of annual rainfall, mean, standard deviation and minimum values are very similar [22].
Several factors contribute to the performance of gridded data products in a particular area, including the topography of the area, the method of interpolation, and the quality, number and distribution of the observed meteorological stations used in developing the datasets [50] [51]. The availability of good-quality, long-term raw data for constructing the gridded product is crucial, but in many cases the data are not good enough, especially in developing countries where they are often found to be non-uniform [52]. Observed meteorological time series are generally included in the preparation of gauge-based gridded databases after their quality has been checked [5]. These quality control processes differ between gridded products. In CRU, all the datasets pass through an extensive two-stage quality control process conducted manually and using a semi-automatic approach.
The data are checked for consistency in the first stage, followed in the second stage by the removal of stations or months that give a large error during the interpolation process [53] [54]. CRU also considers essential factors such as elevation during interpolation, which enhances its ability to produce accurate estimates [54]. The CRU data were developed from more than 4000 weather stations distributed around the world [14] [54]. In the southern part of Nigeria alone, more than fifteen station datasets were used as contributing stations which
Although PGF was also found to perform very well compared with CFSR in terms of various characteristics, it was found to underestimate the mean and median of precipitation, minimum temperature and maximum temperature. This may cause uncertainty in the impact estimated using gridded datasets. It should be noted that no gridded dataset can correctly represent the actual station datasets [52].
A relevant dataset for a particular study can be selected based on its ability to reconstruct the specific property of the data required for that study.

Conclusion
The performance of three gridded daily precipitation and temperature products available for the study area and widely used as an alternative to observed precipitation and temperature, namely CRU, PGF and CFSR, was evaluated in this paper. The data were evaluated using statistical indicators, comparison of time-series graphs and spell length analysis among datasets. The Standardised Precipitation Index (SPI) was used to determine the implication of using poor gridded precipitation data for water resource management. RMSE and MBE indicate that the CRU dataset consistently gives the lowest errors in predicted precipitation, and the PGF dataset the lowest errors in predicted maximum and minimum temperatures. Results of the spell analysis also show a good association between the station, CRU and PGF datasets in estimating the mean monthly distribution of wet and dry spell lengths. SPI values indicate a good association between observed precipitation and the CRU and PGF precipitation, while the CFSR overestimates or underestimates the SPI in many years. The results revealed a clear superiority of CRU daily precipitation and temperature predictions over the other gridded products in replicating the mean monthly distribution of the station precipitation, maximum temperature and minimum temperature in the study area. It can, therefore, be concluded that the CRU datasets are the best performing datasets in this region and hence can confidently be used in place of observed datasets for further hydrological studies and future climate projection in the study area. The approach of the present study can be extended to other regions in Nigeria with different climate and topography. It is expected that the statistical methods used in the present study can be used in other parts of the world to select the better performing data products before any gridded product is applied to hydrological and climate studies in areas with a paucity of data.