Validation of Three Satellite Precipitation Products in Two South-Western African Watersheds: Bandama (Ivory Coast) and Mono (Togo)

Satellite precipitation products are widely used in many domains, particularly in areas where observations are lacking. These products have different spatio-temporal resolutions and consequently give different precipitation amounts. The present study validates three satellite products, namely the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS), the Climatic Research Unit (CRU) and the Global Precipitation Climatology Project (GPCP), over the Bandama and Mono river basins for 1981-2005 and 1981-2016, respectively, by comparing them with the observed precipitation of each basin. Existing studies focus on the regional scale rather than on the watershed scale needed for hydrological studies. The analysis reveals that CHIRPS followed by GPCP gives the lowest mean absolute error (MAE) at annual and seasonal time scales, while CHIRPS is followed by CRU at the monthly scale. Overall, all products overestimate precipitation over the Bandama basin and underestimate it over the Mono river basin. A comparison over the 1981-2017 period shows that total annual precipitation increases southward (from the Sahel to the coastal zone) for all three products, ranging from 300 mm to 2400 mm/year. The three products are not significantly different from one another, and they all highlight the same rainfall hotspots in the region. The same conclusion holds at monthly and seasonal scales. Therefore, any of these products, especially CHIRPS because of its lowest bias and MAE, can be used for studies in this region.


Introduction
Precipitation is the main component of the global water cycle and energy balance, a major contributor to extreme climate events and a crucial factor in water resources management [1]. It is a vital part of how water moves through Earth's water cycle, connecting the ocean, land and atmosphere. Knowing where it rains, how much it rains and the character of the falling rain, snow or hail allows scientists to better understand the impact of precipitation on streams, rivers, surface runoff and groundwater. Frequent and detailed measurements help scientists model and detect changes in Earth's water cycle. All this is possible thanks to observations. Observations are essential to climate monitoring since they are the basis for: 1) assessing century-scale trends; 2) validating climate models; 3) detecting and attributing changes in climate at the regional scale [1]. However, obtaining these observations is sometimes a challenge.
Observed precipitation data are provided by installed rain gauges. In areas that are difficult to access, whether because of complex orography, sparse human settlement or social conflict, rain gauges are too few to provide data covering the area for simulation. This makes it challenging to characterize and understand current changes in precipitation, especially in the western region, which has the lowest gauge density in sub-Saharan Africa [2]. Satellite retrievals and climate reanalysis data are used to address this lack of data and to cover ungauged areas [3]. But how are those data built?
A climate reanalysis gives a numerical description of the recent climate, produced by combining models with observations [1] for different purposes [4]-[9].
Reanalyses are produced via data assimilation, a process that relies on both observations and model-based forecasts to estimate conditions. The estimates are produced for all locations on Earth, and they span a long time period that can extend over several decades or more, for atmospheric parameters such as air temperature, pressure and wind at different altitudes, and surface parameters such as rainfall, soil moisture content, ocean-wave height and sea-surface temperature [10]. Their usage is increasing for weather and climate studies. For instance, reanalysis data can be used to calculate climate trends [11], especially temperature and precipitation trends [2], and extremes [12]-[14]. Some of these products are Tropical Applications of Meteorology using SATellite data and ground-based observations (TAMSAT) [5], Tropical Rainfall Measuring Mission (TRMM) [6], CHIRPS [7], CRU [8] and GPCP [9]. How accurate are those data, given their hybrid sources?
It has been recommended that these data be validated before use because of their differing spatial and temporal resolutions.
This validation relies on the computation of metrics and on statistical analysis, mainly at seasonal and annual time scales. Metrics such as the MAE and the correlation coefficient are sufficient to compare reanalysis data with observations. Additionally, other measures, namely the bias, the mean error, the root mean square error (RMSE) and the Nash-Sutcliffe efficiency coefficient, are often computed. For instance, Caroletti et al. [1] used the MAE and STD as well as the Pearson correlation to validate satellite-based precipitation datasets, while statistical metrics such as correlation (R), bias, root mean square difference (RMSD) and the ratio of standard deviations (STD) were used to validate multiple satellite-based products by Ullah et al. [4] and Beck et al. [15]. It is worth noting that this type of study is widely done at the regional scale but not at the basin scale needed for hydrological studies.
This study aims to evaluate the skill of three satellite and reanalysis datasets. [...] four (04) rain gauges for the Bandama river basin (with outlet at Taabo) and the Mono river basin, respectively, were used. The areal average of total observed and reanalysis precipitation was computed for each of the two basins. The study area lies in one of the sub-Saharan regions with a low rain gauge density, which makes it a challenging area for rainfall studies. All these datasets were validated using the same metrics as in Caroletti et al. [1], namely 1) a three-metric set consisting of the dimensionless, relative MAE, bias and standard deviation error (STD) and 2) the Pearson correlation.

Study Area
The Bandama and Mono river basins are in the southern part of West Africa. The Bandama river basin is located in Ivory Coast and stretches between 5°-10°N latitude and 4°-7°W longitude, while the transboundary Mono river basin is shared between Togo and Benin and lies between 6.5°-9.5°N latitude and 1°-2°E longitude (see Figure 1). The area of interest has an average altitude of 440 m and a maximum altitude of 985 m above sea level. The West African region typically has a tropical and equatorial climate. The climate of the region is controlled by the movement of the Intertropical Convergence Zone (ITCZ), guided by the monsoon. Precipitation increases southward (from the Sahel to the Guinean zone), while potential evapotranspiration (PET) increases northward. The lowest rainfall, about 300 mm/year, is recorded in the Sahel, and the highest, about 2400 mm/year, is recorded in mountainous areas. The climate systems of the Mono and Bandama river basins are presented by Lamboni et al. [16] and Soro et al. [17], respectively. The northern part has a unimodal rainfall regime, while the central and southern parts have a bimodal regime. The Sahelian area is arid and receives the lowest annual rainfall, while the Sudano-Sahelian zone is semi-arid. The Sudanian zone is dry sub-humid, while the Guinean zone, which records the highest rainfall, is humid.

Data Processing
The following datasets were evaluated and validated:
1) CHIRPS data: CHIRPS is a 30+ year quasi-global rainfall dataset. Spanning 50°S-50°N (and all longitudes) and running from 1981 to near-present, CHIRPS combines 0.05° resolution satellite imagery with in-situ station data to create gridded rainfall time series for trend analysis and seasonal drought monitoring.
2) CRU data: The CRU time-series (TS) datasets describe month-by-month variations in climate over the last century or so and are produced by the Climatic Research Unit (CRU) at the University of East Anglia. They are calculated on high-resolution (0.5 × 0.5 degree) grids from 1901 to the present and are intended for climate variation studies [8].
3) GPCP data: The Global Precipitation Climatology Project (GPCP) was established by the World Climate Research Programme to quantify the distribution of precipitation around the globe over many years. GPCP version 2 has a spatial resolution of 2.5 degrees and a monthly temporal resolution, and covers the period from January 1979 to the present, with a delay of two to three months for data reception and processing [9] [19] [20].
The anomaly (difference) between the reanalysis datasets was computed for the period 1981-2017 at annual and seasonal time scales to determine how close the products are to one another.
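The text does not detail how the inter-product anomaly is computed. One simple reading, sketched below under that assumption, is the mean pairwise difference between basin-averaged annual (or seasonal) totals. The function name and the sample values are hypothetical and serve only as an illustration.

```python
from itertools import combinations


def pairwise_mean_difference(products):
    """Return the mean difference (a - b) for every pair of products.

    `products` maps a product name to a list of annual (or seasonal)
    totals in mm, all covering the same years in the same order.
    """
    diffs = {}
    for a, b in combinations(sorted(products), 2):
        xa, xb = products[a], products[b]
        if len(xa) != len(xb) or not xa:
            raise ValueError("series must be non-empty and cover the same years")
        diffs[(a, b)] = sum(p - q for p, q in zip(xa, xb)) / len(xa)
    return diffs


# Hypothetical annual totals (mm/year) for three years, for illustration only.
example = {
    "CHIRPS": [1200.0, 1100.0, 1300.0],
    "CRU": [1180.0, 1120.0, 1290.0],
    "GPCP": [1210.0, 1090.0, 1310.0],
}
differences = pairwise_mean_difference(example)
```

A positive value for the pair (a, b) means that product a is wetter than product b on average over the period.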

Validation Metrics
The common period between observation and reanalysis data was chosen for each basin: 1981-2005 for the Bandama and 1981-2016 for the Mono river basin. The comparison was made at monthly, seasonal, annual and interannual scales at basin level, and the metrics were computed with in-situ precipitation as the independent variable and the reanalysis products as dependent variables. The validation was conducted for monthly precipitation because of its importance for monthly climatological normals, the standardized precipitation index and hydrological modelling [1]. Additionally, the seasonal and annual time scales were used because of their usefulness for studying seasonal correlations between precipitation and other climatological events and for precipitation trends. The metrics used for validation were the MAE, the absolute standard deviation error and the bias, computed at monthly, seasonal and annual scales. The MAE and standard deviation are among the commonly used metrics for data validation [1] [21]. Lastly, the Pearson correlation, commonly used in climate science to evaluate data dependency [15], was computed to detect the relationship among the dependent variables as well as their link with the independent variable. Metrics that require daily data, such as the number of wet days, could not be computed since daily data are not available for any product except CHIRPS.
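Moving from the monthly scale to the seasonal and annual scales amounts to summing monthly totals. The sketch below illustrates this aggregation. The season definition (DJF, MAM, JJA, SON) is an assumption, since the text does not state which seasons were used, and the function names are hypothetical.

```python
# Assumed season definition (not given in the text): standard three-month seasons.
SEASONS = {
    "DJF": (12, 1, 2),
    "MAM": (3, 4, 5),
    "JJA": (6, 7, 8),
    "SON": (9, 10, 11),
}


def annual_totals(monthly):
    """Sum monthly precipitation (mm) into annual totals.

    `monthly` maps (year, month) to a basin-averaged monthly total.
    Years with any missing month are skipped rather than partially summed.
    """
    totals = {}
    for year in sorted({y for y, _ in monthly}):
        values = [monthly.get((year, m)) for m in range(1, 13)]
        if all(v is not None for v in values):
            totals[year] = sum(values)
    return totals


def seasonal_totals(monthly, season):
    """Sum the months of one season for each year.

    Simplification: December is grouped with January and February of the
    same calendar year, so DJF does not straddle two years here.
    """
    months = SEASONS[season]
    totals = {}
    for year in sorted({y for y, _ in monthly}):
        values = [monthly.get((year, m)) for m in months]
        if all(v is not None for v in values):
            totals[year] = sum(values)
    return totals


# Hypothetical single year in which month m receives 10 * m mm.
example = {(2000, m): 10.0 * m for m in range(1, 13)}
```

Skipping incomplete years avoids comparing a partial total from one dataset with a full total from another.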

Mean Error and Standard Deviation Error
The mean error evaluates how well the reanalysis corresponds to the observed values, indicating the degree to which the products over- or underestimate total precipitation over the basin. The standard deviation error assesses the average magnitude of the estimation errors and the capability of the products to reproduce variability [1].
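As a minimal sketch of the mean error described above, the following computes the bias in mm and as a percentage of the observed total. The relative form is one common normalization and is an assumption here, as are the function names and the sample values.

```python
def mean_error(estimated, observed):
    """Mean of (estimated - observed); positive values indicate overestimation."""
    if len(estimated) != len(observed) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    return sum(e - o for e, o in zip(estimated, observed)) / len(observed)


def relative_bias_percent(estimated, observed):
    """Bias as a percentage of the observed total (assumed normalization)."""
    if len(estimated) != len(observed) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    total_obs = sum(observed)
    if total_obs == 0:
        raise ValueError("observed total is zero")
    return 100.0 * (sum(estimated) - total_obs) / total_obs


# Hypothetical basin-averaged totals (mm), for illustration only.
obs = [100.0, 100.0, 100.0]
est = [110.0, 90.0, 130.0]
bias_mm = mean_error(est, obs)
bias_pct = relative_bias_percent(est, obs)
```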
The following equations are adapted from Caroletti et al. [1]. Let $P_b(m, y)$ be the monthly precipitation for month $m$ and year $y$ of a dependent dataset $b$ to be validated, and let $N_y$ be the number of years of monthly averaged precipitation $P_b(m, y)$, starting with year $y_0$. These data are basin-averaged values over the Bandama and Mono river basins. The $N_y$-year average of the monthly precipitation for each month in the annual cycle, $\mu_b(m)$, is:
$$\mu_b(m) = \frac{1}{N_y} \sum_{y=y_0}^{y_0+N_y-1} P_b(m, y) \quad (1)$$
The standard deviation for month $m$ is given by:
$$\sigma_b(m) = \sqrt{\frac{1}{N_y} \sum_{y=y_0}^{y_0+N_y-1} \left[P_b(m, y) - \mu_b(m)\right]^2} \quad (2)$$
For the observation data, $P_b(m, y)$ is replaced by $P_o(m, y)$ in equations (1) and (2), and the mean and standard deviation become $\mu_o(m)$ and $\sigma_o(m)$, respectively, where $P_o(m, y)$ is the observed monthly precipitation over the basin.
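The multi-year monthly mean and standard deviation described above can be sketched as follows. The population form of the standard deviation (division by the number of years rather than by one less) is an assumption, and the function name and data layout are hypothetical.

```python
import math


def monthly_climatology(p):
    """Return (mu, sigma), two 12-value lists for the annual cycle.

    `p[y][m]` is the basin-averaged precipitation (mm) for year index y
    and month index m (0 = January, 11 = December).
    """
    n_years = len(p)
    if n_years == 0 or any(len(row) != 12 for row in p):
        raise ValueError("expected at least one year of 12 monthly values")
    # Multi-year mean for each calendar month.
    mu = [sum(row[m] for row in p) / n_years for m in range(12)]
    # Assumption: population standard deviation (divide by n_years).
    sigma = [
        math.sqrt(sum((row[m] - mu[m]) ** 2 for row in p) / n_years)
        for m in range(12)
    ]
    return mu, sigma


# Hypothetical two-year record: 10 mm every month, then 30 mm every month.
mu_b, sigma_b = monthly_climatology([[10.0] * 12, [30.0] * 12])
```

Applying the same function to the observed series gives the observed mean and standard deviation used as the reference.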
The MAE and the mean absolute error on the STD were computed following Caroletti et al. [1].
These two metrics, computed between the dependent and independent variables, were introduced by Deidda et al. [22] to assess the ability of regional climate models (RCMs) to reproduce temperature and precipitation.
Caroletti et al. [1] used the approach to validate reanalysis data and RCMs at the monthly time scale.
1) The mean absolute error on the monthly (seasonal/annual) mean
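A minimal sketch of the mean absolute error on the annual cycle follows. It applies equally to the monthly means and to the monthly standard deviations. The relative form divides by the mean of the observed statistic, which is only one possible way to make the metric dimensionless and is an assumption, as are the function names and the sample values.

```python
def mae_annual_cycle(stat_b, stat_o):
    """Mean absolute difference between a product's and the observed annual cycle.

    `stat_b` and `stat_o` hold one value per month (means or standard
    deviations) for the product and the observations, respectively.
    """
    if len(stat_b) != len(stat_o) or not stat_o:
        raise ValueError("cycles must be non-empty and of equal length")
    return sum(abs(b - o) for b, o in zip(stat_b, stat_o)) / len(stat_o)


def relative_mae_annual_cycle(stat_b, stat_o):
    """Dimensionless MAE, normalized by the mean observed value (assumption)."""
    mean_obs = sum(stat_o) / len(stat_o)
    if mean_obs == 0:
        raise ValueError("mean observed value is zero")
    return mae_annual_cycle(stat_b, stat_o) / mean_obs


# Hypothetical monthly means (mm), for illustration only.
mu_o = [100.0] * 12
mu_b = [110.0] * 6 + [80.0] * 6
mae_mu = mae_annual_cycle(mu_b, mu_o)
rel_mae_mu = relative_mae_annual_cycle(mu_b, mu_o)
```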

Pearson Correlation Coefficient
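Correlation coefficients are used in statistics to measure the strength of the relationship between two variables [23]. For each season (s) and each dataset (b), the coefficient is defined as:
$$R_b(s) = \frac{\sum_{y} \left[P_b(s, y) - \bar{P}_b(s)\right]\left[P_o(s, y) - \bar{P}_o(s)\right]}{\sqrt{\sum_{y} \left[P_b(s, y) - \bar{P}_b(s)\right]^2}\,\sqrt{\sum_{y} \left[P_o(s, y) - \bar{P}_o(s)\right]^2}}$$
where $P_b(s, y)$ and $P_o(s, y)$ are the dataset and observed precipitation for season $s$ and year $y$, $\bar{P}_b(s)$ and $\bar{P}_o(s)$ are their averages over the $N_y$ years, and the sums run over those years.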
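The Pearson coefficient between a product and the observations for one season can be sketched in a few lines. The function name is hypothetical, and the inputs are assumed to be two series of seasonal totals covering the same years.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)
```

A constant series raises an error instead of returning a value, because the coefficient is undefined when one variable has no variance.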

Interannual Variability of Reanalysis Products and Observed Data over Bandama and Mono River Basin
Over the Bandama and Mono river basins, precipitation estimated from the reanalysis data shows the same interannual variability, with similar amplitude and magnitude, as the observed data. Over the Mono river basin, however, the reanalysis data did not estimate precipitation well for the 2008-2011 period. The interannual variability of all datasets is displayed in Figure 2.

1) Monthly time scale
The mean error (bias), mean and standard deviation error are presented in

1) Annual
The Annual time series dataset (observed and reanalysis) over Bandama and

Interannual Precipitation Anomaly and Trend Comparison
The standardized precipitation anomaly was computed using the Lamb coefficient [25]. The index classes are defined by the World Meteorological Organization (WMO) [26]. This index is commonly used in climate extremes assessment to study drought or water deficit. The analysis reveals that all the reanalysis products give an index of the same sign as the observed data over both river basins, but with a magnitude that depends on the product. Overall, the reanalysis products indicate a deficit (surplus) in precipitation whenever the observed data show a negative (positive) index (Figure 11).
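A standardized anomaly of this kind is usually obtained by subtracting the long-term mean from each annual total and dividing by the standard deviation. The sketch below follows that form. The use of the population standard deviation and the function name are assumptions.

```python
import math


def standardized_anomaly(annual):
    """Return (P_y - mean) / std for each annual total in `annual`.

    Negative values indicate a precipitation deficit relative to the
    reference period and positive values indicate a surplus.
    """
    n = len(annual)
    if n < 2:
        raise ValueError("need at least two years")
    mean = sum(annual) / n
    # Assumption: population standard deviation over the reference period.
    std = math.sqrt(sum((p - mean) ** 2 for p in annual) / n)
    if std == 0:
        raise ValueError("index is undefined for a constant series")
    return [(p - mean) / std for p in annual]


# Hypothetical annual totals (mm/year), for illustration only.
index = standardized_anomaly([900.0, 1000.0, 1100.0])
```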
All the evaluated reanalysis datasets show the same interannual variability and almost the same amplitude and magnitude (Figure 12). None of the three products shows a significant trend over either of the river basins considered.
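The text does not name the test used to assess trend significance. As an illustration only, the sketch below uses the Mann-Kendall test, a common non-parametric choice for precipitation series. The choice of test is an assumption, the function name is hypothetical, and the variance formula omits the correction for tied values.

```python
import math


def mann_kendall(series):
    """Return (S, Z, two-sided p-value) of the Mann-Kendall trend test."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least three values")
    # S counts increasing pairs minus decreasing pairs.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S without the correction for ties (a simplification).
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return s, z, p_value
```

Under this test, a trend is considered significant at the 95% confidence level when the p-value is below 0.05.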
Generally, all three products show total precipitation ranging from below 300 to above 2400 mm/year, increasing southward over the West African region (Figure 13). Moreover, the area with the highest rainfall is highlighted by all three products (Figure 13). The differences between products are not significant, and their size depends on geographical location. The difference between products is around ±20 mm over most of the area, except at some precipitation hotspots where CHIRPS gives higher estimates than the other products (CRU and GPCP) and where CRU gives higher or lower estimates than GPCP (Figure 13).

Discussion
Multiple error metrics, in addition to standardized precipitation index performance, were used to validate the three reanalysis datasets and to provide a deeper understanding of their skills and limitations. There is no significant difference among the three reanalysis datasets at annual and seasonal time scales over the 1981-2017 period. All the products capture the north-south rainfall gradient. This was also found in Burkina Faso using satellite-based precipitation [27]. However, all products overestimate (underestimate) precipitation over the Bandama (Mono) river basin, which is in accordance with the study of Dembélé et al. [27].
The underestimation of rainfall over the Mono basin could be because the performance of the different satellite products exhibits high spatial variability, with weak performance over coastal and mountainous regions [28]. Satellite products generally overestimate precipitation [29].
Comparing the reanalysis datasets with observations over the Bandama (1981-2005) and Mono (1981-2016) river basins, CHIRPS has a high correlation (R ≥ 0.8) and a low MAE and bias compared with the other products. This was also found by Caroletti et al. [1] over southern Italy and by Ullah et al. [4] over Pakistan. CHIRPS adequately estimates precipitation, probably because of its fine spatial resolution (0.05°).
The performance of the CHIRPS product was demonstrated in Cyprus, where it correlated very well spatially with the available station data [12], and over the Tekeze-Atbara Basin in Ethiopia, where it performed well and was able to capture the rainfall measured by rain gauges [30]. Over Eastern Africa, CHIRPS was shown to perform better than ARC2 and TAMSAT2 [28]. The differences among satellite-based products could be due to the coarser spatial and temporal resolution of some of them.
Based on the statistical metrics and the additional analysis, CHIRPS best reproduces precipitation at monthly, seasonal and annual time scales and captures the occurrence of extreme events well. This was also shown by Wu et al. [21] over Yunnan Province in China, who demonstrated that CHIRPS performs well in terms of monthly precipitation estimation and adequately captures the spatial distribution of precipitation. This is confirmed by other research elsewhere [31] [32].
Moreover, the CHIRPS-based SPI was shown to capture the occurrence and characteristics of drought events, suggesting that the CHIRPS dataset could be used as an alternative precipitation source for drought monitoring. The magnitudes of the anomaly are very different across years, which is confirmed by the study of Lamptey [29]. Over sub-Saharan Africa, CHIRPS [33] and TRMM products have ranked highest on multiple indices for identifying changes in precipitation extremes [34].

Conclusions
The comparison of three reanalysis products, namely CHIRPS, CRU and GPCP, reveals that they give almost the same average total annual (seasonal) precipitation over the 1981-2017 period across the West African region. The difference among products varies with geographical location (generally around ±20 mm) and is not significant. All the products exhibit the same interannual variability, trend, amplitude and magnitude at basin level (Bandama, Mono). Generally, none of the products shows a trend for the 1981-2017 period over either river basin (not statistically significant at the 95% confidence level, p-value > 0.05).
Comparing the reanalysis products with the observed data, and considering spatial distribution, trend analysis, error metrics, Pearson correlation and standardized precipitation index performance, the best dataset is the satellite-based CHIRPS over the Bandama river basin, and CHIRPS and GPCP over the Mono river basin, although GPCP has a coarse spatial resolution of 2.5°.