A Novel Approach for the Joint Use of Rainfall Monthly and Daily Ground Station Data with TRMM Data to Generate IDF Estimates in a Poorly Gauged Arid Region

In poorly gauged regions, rainfall records are often short or absent altogether, which hinders the estimation of Intensity-Duration-Frequency (IDF) relations with operationally acceptable accuracy. In this research, a novel idea is presented for the use of three separate rainfall datasets (maximum annual daily data, monthly data and Tropical Rainfall Measuring Mission (TRMM) satellite data) to develop robust IDF curves for Namibe, in the south of Angola. TRMM data are used to derive relations between maximum monthly and maximum daily rainfall and between sub-daily and daily rainfall depths. Frequency analysis is undertaken on the merged daily record using several distributions, and the best-fitting distribution is selected based on discriminant plots of the distribution tails, the moment ratio diagram and Bayesian criteria. The IDF curves are derived from the estimates of daily rainfall at various return periods, combined with the derived sub-daily rainfall duration ratios. Robust IDFs are thus developed for a data-scarce region in Africa.


Introduction
Frequency analysis is a statistical procedure based on studying past events, which are representative of the characteristics of a given process (hydrological or other), in order to estimate the probability of occurrence of rare events. This estimation is based on the definition and fitting of a frequency model, which is an equation describing the statistical behavior of the process. These models describe the probability of occurrence of an event of a given value. The validity of the results of a frequency analysis depends on the choice of the frequency model and particularly its type. Various pathways can facilitate this choice, but unfortunately there is no universal method that ensures the correctness of the results. In fact, the estimation of the probability of occurrence of extreme rainfall is an extrapolation based on limited data. The large sampling variability associated with small sample sizes can make the estimates unrealistic. In practice, data may be limited or, in some cases, not available at all for a site.
This paper presents the approach and results of developing Intensity-Duration-Frequency (IDF) curves for a region where ground rainfall station data are scarce. To complement the old daily dataset, available from 1937 to 1952, monthly rainfall values, available for the recent period from 1998 to 2010, were used along with corrected Tropical Rainfall Measuring Mission (TRMM) satellite data, available from 1998 to 2011. TRMM is a joint mission between NASA and the Japan Aerospace Exploration Agency (JAXA) designed to measure rainfall for weather and climate research. It is a research satellite intended to improve the understanding of rainfall distribution and variability within the tropical and subtropical region (40˚S to 40˚N). Several studies have compared TRMM data with ground station data throughout the world, and particularly in Africa [1-6]. Very few studies have used TRMM data in the process of developing IDF relations [7,8]. The main aim of this study is to continue this line of investigation into the joint use of TRMM data with ground station data to produce IDFs.
The first step of the research methodology is to disaggregate the monthly data to derive the annual maximum daily data using the characteristics of the TRMM data. The second step is to assess whether the daily data derived from monthly data are significantly different from the ground station daily data with respect to the mean and variance of both datasets.
Once the possibility of jointly using the daily data derived from monthly data and the available maximum daily rainfall is established, the frequency analysis is undertaken on the merged record of maximum daily rainfall, using several distributions. The best-fitting distribution is selected based on discriminant plots of the distribution tails, the moment ratio diagram and Bayesian criteria.
The IDF curves are then derived from the estimates of the daily rainfall at various return periods, together with sub-daily rainfall duration ratios derived from a combination of published storm distributions and the TRMM sub-daily rainfall duration ratios. IDFs are thus developed for a region that is poorly gauged for rainfall. The approach is applied to Namibe city, located in the south of Angola (15˚12'S, 12˚09'E) at an altitude of 3 m.

Rainfall Data Available from Ground Stations (Daily and Monthly) and TRMM
To develop the Intensity-Duration-Frequency (IDF) curves for Namibe, Angola, data from ground rain gauging stations were collected. Unfortunately, recent daily records from ground stations were not available. Old data were retrieved from "Elementos Meteorológicos e Climatológicos" of the "Serviços de Marinha, Repartição Técnica de Estatística Geral". The reports available cover the period from 1937 to 1952. They were retrieved from the National Oceanic and Atmospheric Administration (NOAA) in the USA, which keeps scanned copies of these reports on its data rescue website [9]. Table 1 shows the maximum daily rainfall depths recorded, in millimeters. Furthermore, monthly data are available for the period from 1998 to 2010. However, no daily data were available for the same period, so Tropical Rainfall Measuring Mission (TRMM) data were used. TRMM data give rainfall depths every 3 hours and are downloadable from the following website [10]: http://gdata1.sci.gsfc.nasa.gov/daac-bin/G3/gui.cgi?instance_id=TRMM_3B42_Daily
Analyzing the ratio between the monthly data and the maximum annual daily data of TRMM, it was found that the maximum rainfall amount that fell in one day is at most about 0.6 times the respective monthly total (this ratio is an envelope line for the scatterplot, as shown in Figure 1). This ratio is used to derive the maximum annual daily data from the available monthly data for storms greater than 20 mm/day. For smaller storms, it was decided to take the daily annual maximum as equal to the monthly maximum, as a conservative option. It is worth mentioning that the correlation coefficient between the monthly data of the ground stations and those of TRMM is only 0.4, which is not high enough to use the TRMM data as they are. In this research, we assume that the TRMM data are internally consistent; we derive the envelope ratio between monthly and maximum daily TRMM data and apply this ratio to the ground station rainfall data. This approach can only be valid in an arid region with few rainy days per month, which is the main limit of applicability of the presented methodology.
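The disaggregation rule described above can be illustrated with a minimal Python sketch. The function name, the parameter names and the input values are hypothetical. The sketch also assumes that the 20 mm threshold is compared against the monthly total, which is only one possible reading of the rule stated in the text.

```python
def daily_max_from_monthly(monthly_mm, ratio=0.6, threshold_mm=20.0):
    """Estimate the maximum daily depth (mm) from a monthly total (mm).

    Assumption: the threshold is applied to the monthly total. Above it,
    the TRMM envelope ratio is used; at or below it, the whole monthly
    total is conservatively attributed to a single day.
    """
    if monthly_mm < 0:
        raise ValueError("rainfall depth cannot be negative")
    if monthly_mm > threshold_mm:
        return ratio * monthly_mm
    return monthly_mm


if __name__ == "__main__":
    # Hypothetical wettest-month totals (mm), one value per year.
    wettest_month_mm = [12.0, 50.0, 35.5, 8.2]
    print([round(daily_max_from_monthly(m), 1) for m in wettest_month_mm])
```

Keeping the ratio and the threshold as parameters makes it easy to test how sensitive the derived daily maxima are to these two choices.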

Homogeneity Check
The frequency analysis of rainfall records is affected by the number of records available for each station. Carrying out the analysis on all the available records therefore gives higher confidence in the results. However, the daily rainfall derived from monthly data should first be tested to determine whether it can be combined with the old ground daily data in one set. Several tests are available to test the homogeneity of the means and variances of two datasets. In this paper we follow the approach presented in Awadallah et al. [7], using the Mann-Whitney U [11], the Wilcoxon W [12], the Moses extreme reactions [13], the Kolmogorov-Smirnov Z and the Wald-Wolfowitz runs tests [14] to check the homogeneity of the two means, and Levene's test [15] to check the homogeneity of the two variances. For details about these tests, the reader is referred to [16]. Tables 2-6 present the test results. All tests confirm that there is no statistical evidence, at the 5% level of significance, of a difference in the means of the two samples (Tables 2-5). The application of Levene's test (Table 6) also shows that the variance of the old ground daily data is not significantly different (at the 5% level of significance) from the variance of the recent daily data derived from the monthly data.
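As an illustration of one of these tests, the following is a minimal pure-Python sketch of the Mann-Whitney U statistic with a normal approximation that is not corrected for ties. The function name and the two samples are hypothetical and are not the Namibe records. For samples as short as these, exact tables or a statistics package would normally be preferred over the normal approximation.

```python
import math


def mann_whitney_u(x, y):
    """Return (U, z, two-sided p) using a normal approximation, no tie correction."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values and give them the mid-rank.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = mid_rank
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u_x, n1 * n2 - u_x)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return u, z, p


if __name__ == "__main__":
    old_daily = [5.0, 18.2, 30.1, 7.4, 12.0, 44.5]      # hypothetical values, mm
    derived_daily = [9.3, 21.0, 15.6, 3.8, 27.9, 11.1]  # hypothetical values, mm
    u, z, p = mann_whitney_u(old_daily, derived_daily)
    print(f"U = {u:.1f}, z = {z:.3f}, p = {p:.3f}")
    print("No evidence of a difference at 5%" if p > 0.05 else "Samples differ at 5%")
```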

Frequency Analysis of the Maximum Annual Daily Rainfall
Distributions used in frequency analysis can be grouped into three categories [17], which contain the distributions most widely used in hydrology to represent annual maximum series (Figure 2). These categories include Class D, to which the Gumbel, Pearson type III and Gamma distributions belong, and Class E (the Exponential distribution) [17].
Other distributions not mentioned in the above classification include the three-parameter Lognormal distribution and the Generalized Pareto distribution.
To choose between distributions, a visual comparison of the fits, although necessary, is highly subjective and can be misleading. To overcome this subjectivity, several methods are available for choosing between distributions. One can use moment ratio diagrams, based on either ordinary or linear moments. Another methodology is the one proposed by El-Adlouni et al. [18]. A third approach is the use of the Akaike and Bayesian Information Criteria. The results of the three approaches are described hereafter.
Based on the ordinary moment ratios diagram (Figure 3), the Generalized Pareto distribution, the Pearson type III distribution and consequently the Gamma distribution (which is identical to the Pearson type III as far as the moment diagram is concerned) are three candidate distributions for fitting the Namibe data.
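To illustrate how a sample can be placed on a moment ratio diagram, the sketch below computes the sample skewness and kurtosis and compares them with two theoretical references. It assumes that the diagram plots skewness against kurtosis, which the text does not state explicitly. On such a diagram the Gamma and Pearson type III distributions share the curve kurtosis = 3 + 1.5 × skewness², and the Gumbel distribution is a single point at approximately (1.14, 5.4). The function names and data are hypothetical.

```python
def sample_skew_kurt(data):
    """Return (skewness, kurtosis) from the biased central moments of the sample."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def pearson3_kurtosis(skew):
    """Theoretical kurtosis of the Gamma / Pearson type III family for a given skewness."""
    return 3.0 + 1.5 * skew ** 2


GUMBEL_POINT = (1.1396, 5.4)  # theoretical (skewness, kurtosis) of the Gumbel distribution

if __name__ == "__main__":
    # Hypothetical annual maximum daily depths (mm), not the Namibe record.
    sample = [5.0, 7.4, 9.3, 12.0, 15.6, 18.2, 21.0, 30.1, 44.5, 61.0]
    skew, kurt = sample_skew_kurt(sample)
    print(f"sample skewness = {skew:.2f}, kurtosis = {kurt:.2f}")
    print(f"vertical distance to Gamma/PIII curve = {abs(kurt - pearson3_kurtosis(skew)):.2f}")
    gumbel_dist = ((skew - GUMBEL_POINT[0]) ** 2 + (kurt - GUMBEL_POINT[1]) ** 2) ** 0.5
    print(f"distance to Gumbel point = {gumbel_dist:.2f}")
```

With short records the sample skewness and kurtosis are strongly biased, so such distances are only indicative.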
On the other hand, the approach proposed by El-Adlouni et al. [18] allows the identification of the most adequate class of distributions to fit a given sample through two discriminant plots. Applying this approach to the Namibe data, it was found that the data belong to Class D, to which the Gumbel (EV1), Pearson type III (PIII) and Gamma (G) distributions belong. It should be noted that the three-parameter lognormal and the Generalized Pareto distributions should also be tested, as they do not belong to any of the classes (refer to Figure 2). The fits of the Namibe data to the Gumbel, Pearson type III, Gamma, three-parameter lognormal and Generalized Pareto distributions are shown in Figures 4-8. Zero values (2 out of 28) were excluded, to be on the safe side and to allow for log transformations.
To choose between the tested distributions, the Akaike Information Criterion (AIC) [19,20] and the Bayesian Information Criterion (BIC) [21] can be used. Both criteria are based on the deviation between the fitted distribution and the empirical probability, with a penalty that is a function of the number of parameters of the distribution and the sample size. The distribution with the smallest BIC and AIC is the one that best fits the data. Calculating the BIC and AIC for all tested distributions, it was found that the Gamma distribution has the smallest BIC and AIC (Table 7). In any case, the difference between the top three candidates (Gamma, Pareto and three-parameter lognormal) is minor, even for high return periods such as the 100-year (Table 7). The frequency analysis results of the Gamma distribution fitted to the daily Namibe rainfall are shown in Table 8.
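The comparison by information criteria can be sketched as follows, using the standard definitions AIC = -2 ln L + 2k and BIC = -2 ln L + k ln n, where L is the likelihood, k the number of parameters and n the sample size. This is a minimal illustration under stated assumptions: the parameters are fitted by the method of moments for simplicity (the fitting method is not specified in the text, and maximum likelihood would be the stricter choice), only the Gamma and Gumbel distributions are shown, and the data and function names are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649


def fit_gamma_moments(data):
    """Method-of-moments estimates (shape, scale) of a two-parameter Gamma."""
    mean, var = statistics.fmean(data), statistics.variance(data)
    return mean * mean / var, var / mean


def fit_gumbel_moments(data):
    """Method-of-moments estimates (location, scale) of a Gumbel distribution."""
    scale = statistics.stdev(data) * math.sqrt(6) / math.pi
    return statistics.fmean(data) - EULER_GAMMA * scale, scale


def gamma_loglik(data, shape, scale):
    # Requires strictly positive data (zero values must be excluded).
    return sum((shape - 1) * math.log(x) - x / scale
               - shape * math.log(scale) - math.lgamma(shape) for x in data)


def gumbel_loglik(data, loc, scale):
    total = 0.0
    for x in data:
        z = (x - loc) / scale
        total += -math.log(scale) - z - math.exp(-z)
    return total


def aic_bic(loglik, n_params, n_obs):
    """Return (AIC, BIC) from a log-likelihood value."""
    return -2 * loglik + 2 * n_params, -2 * loglik + n_params * math.log(n_obs)


if __name__ == "__main__":
    # Hypothetical annual maximum daily depths (mm), not the Namibe record.
    sample = [5.0, 7.4, 9.3, 12.0, 15.6, 18.2, 21.0, 30.1, 44.5, 61.0]
    n = len(sample)
    results = {
        "Gamma": aic_bic(gamma_loglik(sample, *fit_gamma_moments(sample)), 2, n),
        "Gumbel": aic_bic(gumbel_loglik(sample, *fit_gumbel_moments(sample)), 2, n),
    }
    for name, (aic, bic) in sorted(results.items(), key=lambda kv: kv[1][1]):
        print(f"{name:7s} AIC = {aic:.2f}  BIC = {bic:.2f}")
```

The distribution printed first has the smallest BIC for the sample supplied.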

Intensity Duration Frequency Curves Development
A theoretical ratio of 1.13 to 1.14 is adopted to transform daily rainfall values into 24-hr values [22]. In the absence of short-duration records or any similar information, ratios can be assumed between the 24-hr rainfall and the rainfall of the 12-, 6-, 3-, 2- and 1-hr and the 30-, 15- and 5-min durations (refer to Table 9). Such ratios were first proposed for durations from 2 hr down to 5 min by Bell [23], based on studies in the USA, and were extended to the 24-hr rainfall depth by the Soil Conservation Service of the USA through its SCS type II dimensionless rainfall curve [24]. Using the TRMM data from 3 hours to one day, we found that the ratios between X-hr storms and one-day storms are 0.7, 0.84 and 0.92 for the 3-hr, 6-hr and 12-hr durations, respectively. These ratios are nearly equal to the ratios of Bell with the SCS type II extension shown in Table 9. The Bell ratios were modified to conform to the TRMM-derived ratios, as shown in the same table. It is well known that ratios for durations from 2 hours down to 5 minutes are fairly constant across different climates because of the similarity of convective storm patterns ([23,25] and references therein).
By multiplying the rainfall depths of Table 8 by 1.13 and then by the ratios of Table 9, the intensity-duration rainfall values are calculated. IDF curves for return periods of 100, 50, 25, 10, 5 and 2 years are shown for Namibe, Angola (Figure 9 and Table 10).
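One way of extracting such duration ratios from 3-hourly data is sketched below. For a single rainy day it takes the largest depth accumulated over any consecutive 3-, 6- or 12-hr window and divides it by the daily total. The function name and the storm values are hypothetical, and how the ratios of individual days were aggregated in this study is not stated in the text, so no aggregation step is shown.

```python
def duration_ratios(three_hourly_mm, durations_hr=(3, 6, 12)):
    """Ratio of the maximum X-hr depth to the daily depth for one day.

    three_hourly_mm: the eight consecutive 3-hourly depths (mm) of the day.
    """
    if len(three_hourly_mm) != 8:
        raise ValueError("expected eight 3-hourly values for one day")
    daily_total = sum(three_hourly_mm)
    if daily_total <= 0:
        raise ValueError("the day must have a positive rainfall total")
    ratios = {}
    for duration in durations_hr:
        steps = duration // 3  # number of 3-hourly steps in the window
        max_window = max(sum(three_hourly_mm[i:i + steps])
                         for i in range(8 - steps + 1))
        ratios[duration] = max_window / daily_total
    return ratios


if __name__ == "__main__":
    storm_day = [0.0, 0.0, 5.0, 10.0, 3.0, 2.0, 0.0, 0.0]  # hypothetical, mm
    print(duration_ratios(storm_day))
```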
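The calculation can be sketched in Python as follows. Only the TRMM-derived ratios quoted in the text (0.7, 0.84 and 0.92 for 3, 6 and 12 hr) are used, since the values of Tables 8 and 9 are not reproduced here. The ratios are assumed to be depth ratios, so that the intensity is the depth divided by the duration. The daily depth in the usage example is hypothetical and the function name is illustrative.

```python
DAILY_TO_24H = 1.13  # daily-to-24-hr conversion factor adopted in the text

# Ratios of X-hr depth to 24-hr depth (TRMM-derived values quoted in the text).
DEPTH_RATIOS = {3: 0.70, 6: 0.84, 12: 0.92, 24: 1.00}


def idf_intensities(daily_depth_mm):
    """Return {duration_hr: intensity in mm/hr} for one return period.

    Assumption: the ratios apply to depths, so intensity = depth / duration.
    """
    depth_24h = DAILY_TO_24H * daily_depth_mm
    return {duration: ratio * depth_24h / duration
            for duration, ratio in DEPTH_RATIOS.items()}


if __name__ == "__main__":
    hypothetical_daily_depth = 100.0  # mm, placeholder for a Table 8 quantile
    for duration, intensity in idf_intensities(hypothetical_daily_depth).items():
        print(f"{duration:2d} hr: {intensity:6.2f} mm/hr")
```

Repeating the call for the daily depth of each return period gives one IDF curve per return period.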

Conclusion
This research presents an approach based on the joint use of the available ground data (both daily and monthly) with TRMM satellite data to generate Intensity-Duration-Frequency (IDF) estimates for a poorly gauged region, where a long and reliable record of daily maximum rainfall is not available. First, daily rainfall data are derived from monthly ground station data using TRMM characteristics for the same period. Then, the homogeneity of the means and variances of the two datasets is checked. Frequency analysis is carried out and the best distribution is selected based on several criteria, including the moment ratio diagram, discriminant plots of the distribution tails and Bayesian criteria. TRMM data from 3 hours to one day are used with the Bell/SCS type II short-duration ratios to derive location-corrected sub-daily rainfall duration ratios. Daily rainfall depths at various return periods and short-duration rainfall ratios are combined to obtain IDF curves for the location of interest.

Figure 4. Fitting of the Gumbel distribution.

Figure 5. Fitting of the Gamma distribution.

Figure 6. Fitting of the Pearson type III distribution.

Figure 8. Fitting of the Pareto distribution.


Table 2. Mann-Whitney U and Wilcoxon W statistics.
a Not corrected for ties.

Table 5. Wald-Wolfowitz test statistics.
a No inter-group ties encountered.

Table 7. Bayesian criteria for tested distributions.
*** Log-likelihood function is not available for this distribution.

Table 9. Ratios between 24-hr duration intensity and other storm duration intensities.