Generic Method for Merging Satellite and Historical Ground Station Data to Design Rainfall Intensity Duration Frequency (IDF) Curves in Recordless Sub-Saharan Countries

Long-term rainfall records are essential for a sound frequency analysis aimed at estimating the effective precipitation depth. Developing IDF (Intensity-Duration-Frequency) curves for a given location requires precise rainfall records with at least daily resolution. This study presents a method for merging historical precipitation data with recent satellite data in order to derive IDF curves in places where rainfall records are scarce. The homogeneity of the data is analyzed to guarantee its statistical utility, and the frequency analysis is performed with the Extreme Value Type I (Gumbel), Gamma, Pearson Type III and Log-Pearson Type III distributions, in order to verify which of them best fits the sites chosen for this analysis: the cities of Benguela and Lobito in the south of Angola. Daily rainfall data from the TRMM mission and historical daily data were used to derive the relationships between the maximum daily precipitation and the sub-daily precipitation values. The observed daily data were disaggregated to generate a synthetic sequence of extreme precipitation for durations shorter than one day, with statistical properties similar to those of the recorded data. IDF equations are then established, from which the storm depth is calculated for various return periods and durations, and the IDF curves are drawn for these two locations.


Introduction
After the end of the colonial era and the advent of independence, some countries in sub-Saharan Africa have had more pressing priorities than collecting and recording scientific data, including rainfall. In many of them, war had a disastrous effect on the archives of scientific data. Angola was in a state of civil war from its independence in 1975 until 2002, and during this period many climate records were lost and no new records were obtained in most cities of the territory.
Engineering projects usually have to take rainfall into consideration. Roads, bridges, sewage networks, small and large dams, sewage treatment plants, stadiums, large buildings, etc., need drainage systems for rainwater which, if not properly collected and conveyed, may contribute to the deterioration or ruin of the structure itself. The absence of proper storm water drainage usually leads to heavy losses. Heavy rain produces a rainfall depth above normal, and the resulting sheet of water can cause flooding, runoff and water erosion.
The IDF (intensity-duration-frequency) curves result from joining the points that represent the average intensity over intervals of different duration, all corresponding to the same frequency or return period. They are a graphic representation of how intense a precipitation event is, depending on its duration and on how likely it is to be exceeded. The IDF curves of a given zone or city allow predicting the maximum intensity of precipitation over a certain duration (usually in minutes) and for a given frequency, usually expressed as a return period, which may be 2, 5, 10, 20, 50, 100 or 200 years.
With these curves established (and knowing the mathematical expression that describes them), engineering projects can be designed for a given maximum rainfall intensity, for a return period of 10, 20, 50 or more years, depending on the importance of the project.
In Angola, a vast country with an area of 1,246,700 km², there are not many studies of this kind. The only known IDF curves are those of the capital city of Luanda, studied in colonial times by Lobo de Azevedo [1], those of the city of Namibe in the south of the country, and those of three homogeneous rainfall regions in northern Angola (Cabinda, Soyo and M'Banza Congo, the latter site recently added to UNESCO's World Heritage List). These most recent studies were conducted by Awadallah A. G. [2] and [3]. The present work presents for the first time the IDF curves of the city of Benguela, which is the second largest city in Angola, and of the neighboring city of Lobito, where the second largest seaport of the country is located and the country's largest oil refinery is under construction.
The frequency of heavy rainfall has been studied in almost all regions of countries that have rainfall records, and for different types of climate. Bell [4] and Chen [5] developed IDF formulas for some regions of the United States. In monsoon zones, Nhat et al. [6] presented IDF equations for Vietnam. In arid regions, several authors presented their studies and IDF equations (see [7]-[12]).
For sites without ground rain gauges, several authors have also presented solutions using various techniques. Raiford et al. [13] updated the IDF curves of the eastern United States using new frequency analysis techniques with isohyetal maps. El-Sayed [11] also developed a set of regional IDF curves for ungauged zones, using isohyetal maps of the Sinai Peninsula in northeastern Egypt. In Malaysia, the application of IDF curves was extended to remote locations without rain gauges using the meteorological records of the nearest stations, with an adjustment within the typical interval of the zone (Liew et al. [14]).
More recently, several studies have used data from the TRMM (Tropical Rainfall Measuring Mission) satellite to analyze precipitation in areas with scarce surface gauge records. Nicholson et al. [15], together with several meteorologists from 11 African countries, compared the data collected by this satellite with the data collected in situ by their national meteorological services during the year 1998. This analysis made it possible to correct the satellite algorithm in order to make the supplied data more realistic. Angola did not participate in this study group.
The TRMM satellite data were used by Naumann G. et al. [16] to analyze the uncertainties in the calculation of the Standardized Precipitation Index (SPI) and their impact on confidence levels in drought monitoring in four river basins in Africa: Limpopo, Niger, Oumer-Rbia and West Nile. They concluded that TRMM data were preferable to GPCC data because of their higher spatial resolution.
Several other authors have already tested TRMM data in Africa [17] [18] [19] [20] [21], demonstrating their suitability for scientific use in countries that do not have enough field records to support valid studies.
This study presents a method of calculating the IDF equations using old ground station data joined with the most recent TRMM data in order to establish reliable IDF curves for Benguela (12°58'S and 13°40'E; altitude 5 m) and Lobito (12°35'S and 13°55'E; altitude 7 m), in Angola, whose location can be seen in Figure 1.
These two cities are only 30 km apart. However, the city of Lobito and its inhabitants have suffered more from floods caused by heavy rains. In 2015, large floods caused the collapse of homes and the deaths of dozens of people. There is a longstanding perception among the population that rainfall is heavier in Lobito, despite its geographical proximity to Benguela. This study intends to examine this hypothesis and, at the same time, to prepare IDF curves that will serve future projects.

Precipitation Data
For this study, historical precipitation records and recent satellite data were used. Open Journal of Modern Hydrology Elements" yearbook, found online on the NOAA website (US National Oceanic and Atmospheric Administration) [22], which, as of 2017, maintains photocopies of the original reports.
For the most recent data (2000 to 2015), the TRMM mission satellite was chosen. Despite its limited accuracy in Africa, this satellite is the one that has produced the most reliable results; on this subject see [15], [16], and [21]. Rainfall data obtained by this satellite are available to the public on NASA's GIOVANNI website (https://giovanni.gsfc.nasa.gov/giovanni). The TRMM 3B43 v7 product was used for monthly data and TRMM 3B42 v7 for daily data, both with a spatial resolution of 0.25°. The data were downloaded in 2016 and, by choice, cover the first 15 years of this century. Table 1 shows the maximum daily and annual rainfall values for each year of this study.
Figures 2-5 show a graphical representation of the annual precipitation in the two cities, as well as the trend lines of each era. The historical data (from the 1940s) and the recent satellite data were kept separate in order to observe the general averages and the statistical behavior of the two samples.
For the city of Lobito, we can observe that in the 1940s, although the annual values were generally smaller, the tendency was for a gradual increase in rainfall (see Figure 2). Only in 1944 did rainfall exceed 400 mm/year. In the first 15 years of this century (Figure 3), the trend is for annual rainfall to decrease, although precipitation exceeded 400 mm/year in 5 of those years. In the city of Benguela, we observe almost the opposite trend. In the 1940s the annual total rainfall was decreasing, and there were two years in which rainfall exceeded 400 mm (Figure 4). In this century it has remained at an almost constant level, around 230 mm/year (Figure 5).
The descriptive statistics of these four groups of annual rainfall data, shown in Table 2, indicate that in the last 15 years it has rained more in Lobito than in Benguela, unlike what happened 50 years ago.
On the other hand, over these 50 years the average annual rainfall decreased in Benguela and increased substantially in Lobito.

Statistical Validation of Data Samples
However, the IDF curves are not built from the annual totals but from the maximum daily value of each year. The frequency analysis of precipitation records is affected by the number of records available for each station. Therefore, it is necessary to perform some statistical tests on the available records to give greater confidence in the results. As the data sets for each of the stations relate to different years, more than 50 years apart, no correlation between the daily data of the ground stations and those of the TRMM was analyzed. However, the TRMM data should be tested to see whether they can be combined with the data collected on the ground.
Several tests are available to check the homogeneity of the means and variances of the four sets of data. To test the homogeneity of the means, non-parametric tests were used. In general, non-parametric tests assume no particular data distribution, reduce the effects of outliers and of heterogeneity of variance, and do not involve confidence intervals. Rainfall data sets are known not to follow a normal distribution and, in addition, the box plots of the data reveal the asymmetry of the samples. Thus non-parametric tests are the most appropriate. Figure 6 and Figure 7 show the box plots of the initial data. In this paper, the approach presented in [2] was followed, using the Mann-Whitney [23], Wilcoxon Rank-Sum [24], Kolmogorov-Smirnov [25] and Kruskal-Wallis [26] tests to verify the homogeneity of the two averages and the Levene test [27] to verify the homogeneity of the two variances.
Table 3 presents the results of the statistical tests carried out; all of them confirm that, at the 5% significance level, there is no statistical evidence of a significant difference between the averages of the two samples.
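As a rough illustration of the rank-based comparison described above, the following sketch implements a two-sided Mann-Whitney U test in plain Python, using a normal approximation for the p-value. The two samples (`gauge_max`, `trmm_max`) are invented placeholder values, not the data of Table 1, and the function names are hypothetical. The tie correction of the variance is omitted for brevity, so the result is only approximate for small or heavily tied samples.

```python
import math

def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p_value

# Placeholder annual maximum daily rainfall (mm); NOT the values of Table 1.
gauge_max = [38.0, 52.5, 21.0, 64.2, 47.1, 30.3, 55.0, 41.8]
trmm_max = [44.6, 33.9, 58.7, 27.4, 49.0, 61.3, 36.2, 40.5]

u, p = mann_whitney_u(gauge_max, trmm_max)
print(f"U = {u:.1f}, p = {p:.3f}")
print("no evidence of a difference at 5%" if p > 0.05 else "samples differ at 5%")
```

A p-value above 0.05 would be read, as in the text, as no statistical evidence against merging the two samples.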
To verify the homogeneity of the variances, the Levene test was used; it evaluates the equality of the variances of different samples when the distribution is not normal. There are three versions of the Levene test: one using the means of the samples; one using the medians instead of the means; and one using the trimmed means (trimmed by 10%, that is, by averaging the truncated values of the highest 10%). The choice among these three versions determines the validity of the test. One should choose the right version to prevent the test from detecting false inequalities in the sample variances when the data are not normally distributed.
The original Levene test used only the mean of the data. Brown and Forsythe [28] extended the Levene test to use the median or the 10% trimmed mean. They found that the trimmed mean gave better results when the data followed a Cauchy distribution (i.e., with long tails), and that the median gave the best results when the data had an asymmetric distribution.
With the mean (the original test), better results are obtained for symmetric distributions with moderate tails. Although the choice of which Levene test to use depends on the distribution of the data, the test based on the medians is regarded as the one that gives the most robust results for many types of non-normal distributions. Since the distributions of rainfall data are never normal, the medians were used in the Levene test to check the homogeneity of the variances.
The results are shown in Table 4.
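These results were obtained by applying an ANOVA table to the absolute residuals of each sample, computed with Expression (1):
$z_{ij} = \left| x_{ij} - \tilde{x}_i \right| \quad (1)$
where $x_{ij}$ is the jth observation of the ith subgroup and $\tilde{x}_i$ is the median of the ith subgroup.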
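A minimal sketch of the median-based Levene test (the Brown-Forsythe variant) may help make this procedure concrete: the absolute deviations of each observation from its group median are computed, and a one-way ANOVA F statistic is formed on those deviations. The function name and the sample values are hypothetical placeholders. The p-value would come from an F distribution with (k - 1, N - k) degrees of freedom, which is not available in the Python standard library; SciPy's `scipy.stats.levene` with `center='median'` performs the same test.

```python
import statistics

def brown_forsythe(*groups):
    """Levene statistic on absolute deviations from each group's median.

    Returns (W, df_between, df_within); W is compared with F(df_between, df_within).
    """
    # Absolute residuals of every observation from its own group median.
    z = [[abs(v - statistics.median(g)) for v in g] for g in groups]
    k = len(z)
    n_total = sum(len(g) for g in z)
    grand_mean = sum(sum(g) for g in z) / n_total
    group_means = [sum(g) / len(g) for g in z]
    # One-way ANOVA sums of squares on the residuals.
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(z, group_means))
    within = sum((v - m) ** 2 for g, m in zip(z, group_means) for v in g)
    w = (n_total - k) / (k - 1) * between / within
    return w, k - 1, n_total - k

# Placeholder annual maximum daily rainfall (mm); NOT the values of Table 1.
gauge_max = [38.0, 52.5, 21.0, 64.2, 47.1, 30.3, 55.0, 41.8]
trmm_max = [44.6, 33.9, 58.7, 27.4, 49.0, 61.3, 36.2, 40.5]

w, df1, df2 = brown_forsythe(gauge_max, trmm_max)
print(f"W = {w:.3f} with ({df1}, {df2}) degrees of freedom")
```

A small W relative to the critical F value indicates no evidence against equal variances.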
This test demonstrated the homogeneity of variance, which means that the populations from which the data were collected can be considered to have equal variance. Once

Choice of the Theoretical Distribution That Best Fits
The analysis of precipitation frequencies to derive extreme precipitation curves starts from the study of the theoretical statistical distribution that best suits the study site. Several statistical distributions have been used for the elaboration of IDF curves in Africa. In this study, given that the cities of Benguela and Lobito are located at a latitude for which no studies of this type are known, the widely used distributions pointed out by Chow [29] were chosen: Extreme Value Type I (EVI), according to Gumbel [30]; Gamma, according to Abramowitz and Stegun [31]; Pearson Type III, according to Foster [32]; and Log-Pearson Type III, according to Benson [33]. The fit of these distributions to the data can be seen in Figure 8 and Figure 10 (for Benguela) and in Figure 9 and Figure 11 (for Lobito). To choose carefully which of the distributions should be used for the IDF curves, two tests were performed: the Chi-square goodness-of-fit test, according to Pearson [34], and the Kolmogorov-Smirnov test [35], which tests the hypothesis that a sample comes from a continuous distribution with specified parameters against the alternative that it does not come from that distribution.
The Chi-square test compares the observed data, ranked with an empirical distribution (in this case that of Cunnane [36]), with the values expected from the chosen theoretical distribution. Making this comparison for each distribution, at the same return periods as the recorded precipitation data and assuming a significance level of α = 0.05, gives the results that can be seen in Table 5.
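The sketch below shows one way such a comparison could be set up, under stated assumptions. The Cunnane plotting position assigns the i-th largest of n values the exceedance probability (i - 0.4) / (n + 0.2). How observed and expected values were paired or binned is not stated in the text, so pairing each ranked observation with the theoretical value at the same probability is an assumption, and both the function names and the numbers are placeholders.

```python
def cunnane_exceedance(n):
    """Cunnane plotting positions: exceedance probability of the i-th largest value."""
    return [(i - 0.4) / (n + 0.2) for i in range(1, n + 1)]

def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over paired observed and expected values."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Placeholder annual maxima (mm), ranked from largest to smallest.
observed = sorted([38.0, 52.5, 21.0, 64.2, 47.1, 30.3], reverse=True)
probabilities = cunnane_exceedance(len(observed))
return_periods = [1.0 / p for p in probabilities]

# Placeholder values a fitted distribution might give at the same probabilities.
expected = [66.0, 51.0, 45.5, 39.0, 31.5, 22.0]

for rain, p, t in zip(observed, probabilities, return_periods):
    print(f"{rain:6.1f} mm  P(exceed) = {p:.3f}  T = {t:5.2f} years")
print(f"chi-square = {chi_square_statistic(observed, expected):.3f}")
```

The candidate distribution giving the smallest statistic would then be preferred for that site.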
The distribution with the smallest value of χ² is the one that best fits the data of each locality. For the city of Benguela the best-fitting curve is Log-Pearson T-III, with χ² = 6.92, whereas for the Taking the Kolmogorov-Smirnov test and analyzing the samples two by two, one of which is always the observed sample and the other the distribution under analysis, it is concluded that, for the city of Benguela, the distribution that fits best is that of Gumbel, since D_max = 0.126 is the smallest of all the values for that location. As for the city of Lobito, the analysis of the D_max values leads to the conclusion that both the Gamma and the Log-Pearson T-III distributions can be adopted to predict heavy rains, since they give the lowest values and these are of the same order of magnitude. The results of these two tests are presented in Table 5.
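To illustrate how a D_max value of this kind can be obtained, the following sketch fits a Gumbel distribution by the method of moments and computes the Kolmogorov-Smirnov distance between the empirical and fitted cumulative distributions. The fitting method is an assumption (the text does not say how the parameters were estimated), the sample is a placeholder, and the function names are hypothetical. Critical values of D are only approximate when the parameters are estimated from the same sample.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329

def fit_gumbel_moments(sample):
    """Method-of-moments estimates (location mu, scale beta) of a Gumbel distribution."""
    beta = statistics.stdev(sample) * math.sqrt(6) / math.pi
    mu = statistics.mean(sample) - EULER_GAMMA * beta
    return mu, beta

def gumbel_cdf(x, mu, beta):
    """Non-exceedance probability F(x) = exp(-exp(-(x - mu) / beta))."""
    return math.exp(-math.exp(-(x - mu) / beta))

def ks_statistic(sample, cdf):
    """Largest distance between the empirical step function and a continuous CDF."""
    xs = sorted(sample)
    n = len(xs)
    d_max = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        d_max = max(d_max, i / n - f, f - (i - 1) / n)
    return d_max

# Placeholder annual maximum daily rainfall (mm); NOT the values of Table 1.
annual_max = [38.0, 52.5, 21.0, 64.2, 47.1, 30.3, 55.0, 41.8, 44.6, 33.9]

mu, beta = fit_gumbel_moments(annual_max)
d = ks_statistic(annual_max, lambda x: gumbel_cdf(x, mu, beta))
print(f"mu = {mu:.2f} mm, beta = {beta:.2f} mm, D_max = {d:.3f}")
```

Repeating the last step with each candidate distribution and keeping the smallest D_max mirrors the selection described in the text.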
Analyzing the results of these two tests and the visual appearance of the fitted curves in Figures 8-11, one distribution was selected for each city. Once the distribution that most closely approximates reality has been carefully selected, a forecast can be made of the maximum rainfall that can occur for the most usual return periods (T) in this type of study. In this case, the return periods of 2, 5, 10, 25, 50, 100, 200, and 500 years were chosen. The results for the two cities, obtained with two different theoretical distributions, clearly show that the precipitation forecasts are very similar, as can be seen in Table 6, which lists the precipitation values expected for the return periods (T) under study.
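For the Gumbel case, the forecast for a return period T follows directly from inverting the cumulative distribution at the non-exceedance probability 1 - 1/T. The sketch below assumes hypothetical location and scale parameters (`mu`, `beta`); they are placeholders and do not reproduce the values of Table 6.

```python
import math

def gumbel_return_level(mu, beta, return_period):
    """Rainfall depth exceeded on average once every `return_period` years (T > 1)."""
    non_exceedance = 1.0 - 1.0 / return_period
    return mu - beta * math.log(-math.log(non_exceedance))

# Placeholder Gumbel parameters in mm; NOT fitted to the paper's data.
mu, beta = 40.0, 15.0

for t in (2, 5, 10, 25, 50, 100, 200, 500):
    print(f"T = {t:3d} years -> {gumbel_return_level(mu, beta, t):6.1f} mm")
```

The other candidate distributions would need their own quantile functions, which generally have no closed form and are evaluated numerically or through frequency factors.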
The descriptive statistics of the two data sets, for the same return periods, show that these two cities, which are 30 km apart and practically at sea level, give homogeneous results and are therefore in the same rainfall region. Thus it would not make sense to produce separate IDF curves for each city, and it was decided to produce only one IDF equation and its graph. To do this, the averages of the values obtained with the theoretical distributions for each return period were used; these can also be seen in Table 6.

Determination of the Parameters of the IDF Equations
The maximum 24 h rainfall values are not necessarily equal to the daily maximum values, but the daily records make it possible to study the distribution of maximum precipitation with a duration of 24 h. To obtain the maximum precipitation values for a duration of 24 hours, it is customary to multiply the daily values by a coefficient greater than one: $P_{\max,24\,\mathrm{h}} = C \times P_{\mathrm{day}}$. The value C = 1.13 was adopted, as indicated by the World Meteorological Organization in 2009.
According to Bell [4] and Chen [5], in the absence of short-duration rainfall records it is necessary to assume precipitation values for rainfall durations of less than one day. Bell [4] proposed the use of ratios between the intensity of daily rainfall and that of shorter durations. These ratios were then used by the USA Soil Conservation Service in 1986 for the design of their rainfall frequency curves and have since been used to calculate rainfall of less than one day duration. Table 7 presents the Bell's ratios used. Thus, from the mean values of Table 6, multiplied by 1.13, and after applying the ratios of Table 7 for durations of less than one day, power-law regressions for each return period yielded the IDF Expression (2) for the cities of Benguela and Lobito in Angola, where I is the precipitation intensity in mm, T is the return period in years and t is the duration of the rainfall in minutes. Figure 12 presents the graph of the IDF curves for these cities.
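The chain of steps just described can be sketched as follows for a single return period: scale the daily depth by 1.13, apply duration ratios to obtain sub-daily depths, convert them to intensities, and fit a power law in log-log space. The ratios below are invented placeholders and are NOT the Bell ratios of Table 7. Expressing intensity in mm/h, the function names, and the simple form I = a · t^(-n) are all assumptions made for illustration; this is not the paper's Expression (2), which also depends on T.

```python
import math

C_24H = 1.13  # daily-to-24 h coefficient quoted in the text

def sub_daily_intensities(p_day_mm, ratios):
    """Turn a daily maximum depth into (duration_min, intensity_mm_per_h) pairs."""
    p_24h = C_24H * p_day_mm
    return [(t, ratio * p_24h / (t / 60.0)) for t, ratio in sorted(ratios.items())]

def fit_power_law(durations_min, intensities):
    """Least-squares fit of I = a * t**(-n), done on log(I) versus log(t)."""
    xs = [math.log(t) for t in durations_min]
    ys = [math.log(i) for i in intensities]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    a = math.exp(y_bar - slope * x_bar)
    return a, -slope

# Placeholder duration ratios (fraction of the 24 h depth); NOT the values of Table 7.
placeholder_ratios = {30: 0.30, 60: 0.40, 120: 0.50, 360: 0.70, 720: 0.85, 1440: 1.00}

# Placeholder daily depth (mm) for one return period; NOT a value from Table 6.
pairs = sub_daily_intensities(80.0, placeholder_ratios)
a, n = fit_power_law([t for t, _ in pairs], [i for _, i in pairs])
print(f"I(t) ~ {a:.1f} * t^(-{n:.3f})  (t in minutes, I in mm/h)")
```

Repeating the fit for every return period, and then regressing the coefficients on T, is one common route to a single equation in both T and t.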

Conclusions
Hydrologists and civil and environmental engineers need IDF curves for various types of engineering projects involving surface runoff caused by precipitation. Due to the lack of rainfall records, many African countries do not have studies of extreme rainfall forecasts that allow them to produce engineering designs capable of withstanding these events and, as a consequence, many recent works are prematurely damaged.
This study made it possible to evaluate and compare precipitation in two relatively close cities in Angola and to verify that, although rainfall in the city of Lobito has been more frequent than in Benguela in the first 15 years of this century, the two cities have the same IDF curves. Both are located in a coastal zone of Angola that lies on the desertification frontier, at the entrance of the Namib Desert, which extends to South Africa. However, in terms of extreme events, it has been shown that the two cities have very similar rainfall extremes, forming a pluviometrically homogeneous region, which can be described by a single IDF equation and its respective curves.
Despite the importance of this knowledge for these two cities, these curves are valid only for them because, as can be seen in Figure 1, the terrain rises steeply from the coast up to the central plateau of Angola, which has an average elevation of 1600 m.
However, the methodology is valid for rainfall studies in other cities of the country, and even in neighboring countries. This study will be continued in order to extend this knowledge to the main sites of the Benguela and Huambo provinces, which share the hydrographic basins of important rivers where better use of the watercourses should be made for hydroelectric production.

Figure 1. Location of the Angolan cities of Benguela and Lobito (source: author).

Figure 6. Box-plot graph of maximum daily precipitations with TRMM data.

Figure 7. Box-plot graph of maximum daily precipitations with gauge data.

In Figures 8 and 10 (for Benguela) and Figures 9 and 11 (for Lobito), the probabilities of exceedance are presented with linear and logarithmic vertical scales. Apparently, the Gamma distribution is the farthest from the data, and Pearson Type III and Log-Pearson Type III are the ones that best fit the data of the two cities. For the city of Lobito the Gumbel and Pearson Type III curves almost coincide, which does not happen for the city of Benguela.
Based on the two goodness-of-fit tests and on the visual fit of the curves, it was decided to forecast extreme rainfall in Benguela with the Gumbel distribution and in Lobito with the Pearson Type III distribution.

Figure 8. Distribution fitting curves (logarithmic scale) for the city of Benguela.

Figure 9. Distribution fitting curves (logarithmic scale) for the city of Lobito.

Figure 10. Probability of exceedance curves for the analyzed distributions, for the city of Benguela.

Figure 11. Probability of exceedance curves for the analyzed distributions, for the city of Lobito.

Figure 12. IDF curves for the cities of Benguela and Lobito in Angola.

Table 1. Maximum daily rainfall values per year.

Table 2. Comparison of the descriptive statistics of the annual precipitation values in the two cities.

Table 3. Results of the statistical tests performed to verify the homogeneity of the mean.

Table 4. Results of the Levene test to verify the homogeneity of the variance.

Table 5. Results of the Chi-Square and Kolmogorov-Smirnov tests between each theoretical distribution and the collected data.

Table 6. Comparison of extreme precipitation forecasts for the two cities with different distributions.

Table 8 presents the precipitation intensity values deduced from the IDF expression.

Table 7. Bell's ratios for rainfall lasting less than one day.

Table 8. Precipitation intensity forecasts (mm) according to the duration of the rainfall and the return period.