Regionalization of Rainfall Intensity-Duration-Frequency (IDF) Curves in Botswana

A regional analysis of design storms, defined as the expected rainfall intensity for a given storm duration and return period, was conducted to determine storm rainfall Intensity-Duration-Frequency (IDF) relationships. The ultimate purpose was to determine IDF curves for homogeneous regions identified in Botswana. Three homogeneous regions were identified from topographic and rainfall characteristics using the Fuzzy C-Means clustering algorithm, a variant of K-Means clustering. Using the mean annual rainfall and the 24 hr annual maximum rainfall as indicators of rainfall intensity for each homogeneous region, IDF curves and maps of rainfall intensities for durations of 1 to 24 hr and longer were produced. The Gamma and Lognormal probability distribution functions provided estimates of rainfall depths for low and medium return periods (up to 100 years) at any location in each homogeneous region of Botswana.


Introduction
The rainfall Intensity-Duration-Frequency (IDF) relationship is one of the most commonly used tools in water resources engineering. It is used as an input in the planning, design and operation of water resources projects, and in models for flood protection and flood risk management of engineering works such as dams, roads and urban infrastructure. A typical problem in many developing countries is the absence, or very sparse network, of recording stations, whose data are the natural basis for calculating IDF relationships.
As a solution to this problem, additional information from the denser network of non-recording stations can be used. This in turn requires an appropriate methodology for incorporating data from non-recording stations.
Design storms, usually defined as the expected rainfall intensity for a given storm duration and return period, are needed in many hydrological studies, especially for the indirect estimation of the design flood. To this end, depth-duration-frequency curves are often employed. These curves allow one to estimate the design storm, provided that records of historical rainfall extremes are available. When observed rainfall data are lacking, the design storm may be estimated by regional frequency analysis [1].
In this study, regional depth-duration-frequency relations for estimating rainfall extremes in Botswana are derived by combining simple rainfall depth-duration and depth-frequency relationships. The proposed formulation allows one to estimate the expected rainfall depth for durations from 1 to 24 hours and for low and medium return periods (up to 100 years) at any location in the study area. A simulation experiment is developed to assess the reliability of the proposed formulation, and its performance is also compared with other regionalization approaches recently proposed in the literature.
Design storm values are needed in many hydrological studies, especially for the indirect estimation of the design flood. To this end, intensity-duration-frequency (IDF) curves are often employed. Where historical rainfall extremes are available, these curves allow one to estimate the design storm using scaling and stochastic models [2] [3] [4]. When observed at-site rainfall data are lacking, the design storm for infrastructure design may be estimated by regional frequency analysis [5] [6] [7]. IDF models have been tested for infrastructure design as well as for climate projections under the non-stationary conditions arising from climate change [8] [9].
In Botswana there are few continuous rainfall recording stations and a larger number of non-recording stations with daily records, so stochastic modeling can be applied only to series of extreme daily rainfall. Botswana receives most of its rainfall from convective processes such as instability showers and thunderstorms. A detailed account of the occurrence and distribution of rainfall in Botswana is available in [10]. Rainfall incidence in Botswana is therefore highly variable from place to place, from time to time, and sometimes in both space and time. The rainfall is generally well distributed but highly localized [10]. Exceptionally high-intensity rainfalls recorded at Gaborone, Francistown and Maun were the basis for developing earlier rainfall IDF curves [11].
In this study we develop a regional parametric Intensity-Duration-Frequency (IDF) model for Botswana that can be used to estimate design storm depths. The proposed formulation allows one to estimate the expected rainfall depth for durations ranging from 5 minutes to 2 hours and beyond, and for low and medium return periods (up to 50 or more years), at any location in the country.
The IDF model relies on a few optimized parameters and uses recorded annual rainfall as its only input. Annual rainfall, a measure of aridity in Botswana, is fairly reliable and widely available information, and its long-term spatial distribution over most regions of Botswana is reasonably well known, as documented in [12]. The proposed model reproduced the observed IDF relationships both qualitatively and quantitatively, with high values of the Nash-Sutcliffe (1970) model efficiency criterion. Given the lack of recorded continuous and short-duration rainfall, the model can assist hydrologists in designing drainage and other water control structures at ungauged sites in Botswana. The approach can be replicated in other data-scarce environments, so that practicing drainage engineers can carry out complete analysis and design of drainage and flood control structures. Furthermore, we demonstrate the regional application of the model to IDF estimation for each homogeneous region in Botswana.

Study Area and Data
Botswana is a country in Southern Africa where the average annual rainfall over most of its territory is 300 to 600 mm. The distribution of mean annual rainfall and of the stations used in the study, including their record lengths, is shown in Figure 1.
Daily precipitation data for stations within Botswana were collected. Most of the records span from 1950 to the early 2010s, which is adequate for rainfall intensity-duration-frequency (IDF) studies. A total of 76 stations were considered in this study.

Figure 1. Distribution of rain gauge stations used in the study and mean annual rainfall map of Botswana. The distribution of sites with respective record lengths is also shown.
The raw daily rainfall data were processed, and the 24 hr annual maximum rainfall (AMR) series and the mean annual rainfall (MAR) were extracted for each station over its period of record. The characteristics of the rain gauge sites are summarized in Table 1.
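The extraction step can be sketched as follows, assuming each station's daily record is available as (date, depth in mm) pairs with missing days already screened out. The function names are hypothetical, and the `start_month` switch for a hydrological year is an assumption, since the year convention is not stated here; incomplete years would normally be excluded before taking maxima.

```python
from collections import defaultdict
from datetime import date


def annual_series(daily, start_month=1):
    """Return (annual maximum daily rainfall, annual total) per year, both in mm.

    start_month=1 uses calendar years; start_month=7 (an assumption) would label
    a July-June hydrological year by the calendar year in which it starts.
    """
    by_year = defaultdict(list)
    for day, depth_mm in daily:
        year = day.year if day.month >= start_month else day.year - 1
        by_year[year].append(depth_mm)
    amr = {y: max(v) for y, v in sorted(by_year.items())}     # 24 hr AMR series
    totals = {y: sum(v) for y, v in sorted(by_year.items())}  # annual rainfall
    return amr, totals


def mean_annual_rainfall(totals):
    """MAR as the average of the annual totals (mm/a)."""
    return sum(totals.values()) / len(totals)


# Hypothetical daily records for one station, not study data.
daily = [
    (date(2000, 1, 5), 12.0), (date(2000, 2, 10), 40.5), (date(2000, 11, 3), 8.0),
    (date(2001, 1, 20), 25.0), (date(2001, 12, 1), 31.0),
]
amr, totals = annual_series(daily)
print(amr, mean_annual_rainfall(totals))
```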

Regionalization of IDF
Homogeneous IDF regions were determined by clustering, using the Fuzzy C-Means (FCM) algorithm. FCM is a modification of the K-means algorithm, which groups sites so as to minimize intra-cluster variance [12]. The factors affecting rainfall extremes and storm characteristics that were used for clustering are elevation, longitude, latitude and mean annual rainfall. The algorithm assumes that the attributes lie in a vector space and seeks to minimize the total intra-cluster variance $D_v$, given as [13] [14]:
$$D_v = \sum_{k=1}^{N} \sum_{j \in S_k} \lVert x_j - c_k \rVert^2$$
where $c_k$ is the centroid of all the points in cluster $k$; $N$ is the total number of clusters; $S_k$ is the set of points in the $k$-th cluster; and $x_j$ is the standardized vector for site $j$.
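A minimal NumPy sketch of the objective $D_v$, assuming the four attributes (elevation, longitude, latitude, mean annual rainfall) are z-score standardized before clustering. The function names and the site values are hypothetical.

```python
import numpy as np


def standardize(features):
    """Z-score each attribute column so that no single unit dominates the distance."""
    X = np.asarray(features, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)  # assumes no constant column


def intra_cluster_variance(X, labels, centroids):
    """D_v: sum over clusters k of ||x_j - c_k||^2 for the sites j assigned to k."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    centroids = np.asarray(centroids, dtype=float)
    return float(sum(np.sum((X[labels == k] - c) ** 2) for k, c in enumerate(centroids)))


# Hypothetical sites: elevation (m), longitude, latitude, MAR (mm/a).
sites = [[1010, 25.9, -24.7, 520], [940, 23.4, -20.0, 460],
         [1000, 27.5, -21.2, 470], [880, 20.0, -26.8, 240]]
Z = standardize(sites)
labels = np.array([0, 1, 1, 2])
centroids = np.array([Z[labels == k].mean(axis=0) for k in range(3)])
print(intra_cluster_variance(Z, labels, centroids))
```

Standardizing first matters because elevation (hundreds of metres) would otherwise dominate latitude and longitude (tens of degrees) in the Euclidean distance.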
The FCM algorithm starts from an initial set of $k$ groups and calculates the centroid of each. A new partition is then constructed by associating each point with the closest centroid. The centroids are recalculated for the new clusters, and these two steps are applied alternately until convergence is achieved [14]. As in K-means, the algorithm minimizes intra-cluster variance, and convergence is reached when points no longer switch clusters. In contrast to K-means, which assigns each site to only one cluster, FCM permits partial membership: each point has a degree of membership in each of the clusters, so points on the edge of a cluster belong to it to a lesser degree than points at its center. The degree to which site $i$ belongs to the $k$-th cluster is the inverse of the distance from site $i$ to the centroid of that cluster, as defined in [15]:
$$b_k(i) = \frac{1}{d(x_i, c_k)}$$
where $b_k(i)$ is the degree to which site $i$ belongs to the $k$-th cluster and $d(x_i, c_k)$ is the distance of site $i$ to the centroid of cluster $k$. The coefficients are normalized so that the memberships of a site across all clusters sum to one [16], that is
$$U_k(i) = \frac{b_k(i)}{\sum_{l=1}^{N} b_l(i)}, \qquad \sum_{k=1}^{N} U_k(i) = 1$$
where $U_k(i)$ is the normalized coefficient of site $i$ in the $k$-th cluster. Each station is assigned to the cluster with which it has the highest degree of membership.

Table 1. List of ten of the 76 rain gauge sites and their characteristics.
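The loop and membership rule described above can be sketched in NumPy as follows. This is an illustrative reading of the text, not the study's code: centroids are updated with K-means-style hard assignments, and the inverse-distance memberships $b_k(i)$ and their normalized form $U_k(i)$ are computed from the converged centroids. Standard FCM implementations instead update centroids as membership-weighted means with a fuzzifier exponent, so results may differ. The function name `cluster_sites`, the random initialization and the sample points are assumptions.

```python
import numpy as np


def cluster_sites(X, n_clusters, n_iter=100, seed=0):
    """Alternate assignment and centroid update, then compute normalized memberships."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=n_clusters, replace=False)]  # initial groups
    for _ in range(n_iter):
        # distance of every site to every centroid, shape (sites, clusters)
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)  # associate each point with the closest centroid
        new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
                        for k in range(n_clusters)])
        if np.allclose(new, centroids):  # points no longer switch clusters
            break
        centroids = new
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    b = 1.0 / np.maximum(d, 1e-12)         # b_k(i) = 1 / distance (guard against d = 0)
    U = b / b.sum(axis=1, keepdims=True)   # U_k(i): each site's memberships sum to one
    return centroids, U, U.argmax(axis=1)  # assign to the highest membership


# Hypothetical standardized attributes for six sites forming two groups.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centroids, U, labels = cluster_sites(X, n_clusters=2)
print(labels, U.round(2))
```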

IDF Curve Determination
The main contribution of this study is the determination of IDF curves using a parsimonious modeling approach that relies on a few parameters of a robust frequency model to describe maximum rainfall intensity, duration and frequency, and to construct the IDF curves. The IDF curves were constructed following the method proposed in [17], which aimed at an IDF model applicable to any location within Botswana and which was subsequently able to reproduce the IDF charts for three locations reported in [11], themselves based on the rainfall extreme records of [12]. Two types of empirical IDF equations relating intensity and duration were considered in this study. The first is a simple relationship of the form given in Equation (1), in which $R$ is the mean annual rainfall (mm/a); $I$ is the average rainfall intensity (mm/hr); $t_d$ is the storm duration, taken as the time of concentration (minutes); $n$ is the return period (years); and $a$, $b$, $c$ are constants that depend on the units employed. In Equation (1) the constants $a$, $b$ and $c$ do not depend on the return period; however, they vary significantly with location and are estimated for each specific region. Given observed rainfall intensities $I_o$, the constants are estimated by minimizing the sum of squared deviations (SSE):
$$\mathrm{SSE} = \sum \left(I_o - I\right)^2 \rightarrow \min \tag{2}$$
Equations (1) and (2) are used to compute the required intensities for the respective stations and durations. Formulas of the form of Equation (1) have been used in the region [17], including Zambia and Zimbabwe, and have since been widely applied.
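Because Equation (1) is not reproduced in this section, the sketch below illustrates only the least-squares step of Equation (2), using a hypothetical Sherman-type form $I = a/(t_d + b)^c$ as a stand-in; the study's own Equation (1), which also involves $R$ and $n$, would replace `idf_model`. It relies on SciPy's `curve_fit`, which minimizes the sum of squared residuals; the starting values `p0` and the synthetic data are assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit


def idf_model(t_d, a, b, c):
    """HYPOTHETICAL stand-in for Equation (1): I = a / (t_d + b)**c, t_d in minutes."""
    return a / (t_d + b) ** c


def fit_idf(t_d, intensity, p0=(1000.0, 10.0, 0.8)):
    """Estimate (a, b, c) by minimizing SSE = sum((I_o - I)^2), as in Equation (2)."""
    t = np.asarray(t_d, dtype=float)
    i_obs = np.asarray(intensity, dtype=float)
    params, _ = curve_fit(idf_model, t, i_obs, p0=p0, maxfev=10000)
    sse = float(np.sum((i_obs - idf_model(t, *params)) ** 2))
    return params, sse


# Synthetic intensities generated from known constants, to check the fit recovers them.
durations = np.array([5, 10, 15, 30, 60, 120, 360, 720, 1440], dtype=float)
observed = idf_model(durations, 1200.0, 12.0, 0.85)
params, sse = fit_idf(durations, observed)
print(params.round(3), sse)
```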

Construction of IDF Curves
IDF curves were developed for each region. Because storm intensities for durations shorter than 24 hr are of great importance for drainage design and water resource management, durations such as 0.5 hr, 1 hr and 3 hr were selected as the time scales for constructing the IDF curves of each homogeneous region.
In a previous study, the three storm-profile mass curves shown in Figure 2 were used to estimate rainfall intensities for durations shorter than 24 hr [17]. These storm profiles were developed and tested to account for regional differences, and Profile II was found to be comparable with the earlier records of [11].
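The profiles of Figure 2 are not tabulated here, so the sketch below uses a hypothetical hourly mass curve (cumulative fraction of the 24 hr depth) to show how such a profile converts a 24 hr depth into maximum depths and intensities for shorter durations: for each duration, it takes the largest rise of the mass curve over any window of that length. Hourly resolution and the profile shape are assumptions; sub-hourly durations such as 0.5 hr would need a finer curve.

```python
import numpy as np


def short_duration_intensities(mass_curve, p24_mm, durations_hr):
    """Max depth (mm) and intensity (mm/hr) for each integer duration from 1 to 24 hr.

    mass_curve: cumulative fraction of the 24 hr depth at hourly steps 0..24
    (25 values rising from 0 to 1).
    """
    m = np.asarray(mass_curve, dtype=float)
    result = {}
    for d in durations_hr:
        window_fractions = m[d:] - m[:-d]        # fraction of P24 in every d-hour window
        depth = p24_mm * window_fractions.max()  # largest depth in any such window
        result[d] = (depth, depth / d)
    return result


# HYPOTHETICAL centre-peaked profile: hourly weights normalized to a mass curve.
weights = np.array([1] * 10 + [2, 4, 10, 4, 2] + [1] * 9, dtype=float)
mass = np.concatenate([[0.0], np.cumsum(weights) / weights.sum()])
for d, (depth, intensity) in short_duration_intensities(mass, 96.0, [1, 3, 6, 24]).items():
    print(f"{d:>2} hr: {depth:5.1f} mm, {intensity:5.2f} mm/hr")
```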
In this study, the hydrological annual maximum daily (24 hr) rainfall intensities are used for frequency analysis. These rainfall intensities were fitted to the following candidate probability distributions: Lognormal (LN), Generalized Extreme Value (GEV), Generalized Pareto (GP), Exponential (EXP) and 2-Parameter Gamma (G2). After model fitting and performance evaluation, the best model was selected as the regional IDF model, based on the comparison between observed and model-estimated quantiles of 24 hr rainfall, as discussed in the next section.
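As an illustration of fitting two of the candidate models, the sketch below estimates 2-Parameter Gamma parameters by the method of moments and Lognormal parameters from the mean and standard deviation of the log-transformed sample, then evaluates $T$-year quantiles from the non-exceedance probability $F = 1 - 1/T$. The choice of estimator, the function names and the sample of annual maxima are assumptions; the study's own estimation method may differ.

```python
import numpy as np
from scipy import stats


def fit_candidates(amr):
    """Moment-based parameters: G2 (shape, scale) and LN (mu, sigma of ln x)."""
    x = np.asarray(amr, dtype=float)
    mean, var = x.mean(), x.var(ddof=1)
    logs = np.log(x)
    return {"G2": (mean ** 2 / var, var / mean),
            "LN": (logs.mean(), logs.std(ddof=1))}


def quantiles(params, T):
    """T-year quantiles from F = 1 - 1/T for each fitted distribution."""
    F = 1.0 - 1.0 / T
    shape, scale = params["G2"]
    mu, sigma = params["LN"]
    return {"G2": stats.gamma.ppf(F, shape, scale=scale),
            "LN": stats.lognorm.ppf(F, sigma, scale=np.exp(mu))}


amr_mm = [45.2, 62.0, 38.5, 80.1, 55.3, 47.9, 101.4, 66.7, 52.0, 71.8]  # hypothetical
p = fit_candidates(amr_mm)
for T in (2, 10, 50, 100):
    print(T, {k: round(float(v), 1) for k, v in quantiles(p, T).items()})
```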

IDF Quantile Estimation
The relationship between the return period $T$ and the non-exceedance probability $F$ of a rainfall quantile is
$$T = \frac{1}{1 - F} \tag{3}$$
where $F$ is the probability that an event has a magnitude equal to or less than $P_T$, the $T$-year magnitude. $P_T$ is determined from the parent probability distribution using the method of moments, the method of maximum likelihood or the method of L-moments [22]. Alternatively, for some probability distributions it can be expressed in terms of a frequency factor:
$$P_T = \mu + K_T\,\sigma \tag{4}$$
where $K_T$ is the frequency factor, a function of the return period and the distribution parameters, and $\mu$ and $\sigma$ are the location and scale parameters of the probability distribution. Equivalently, the quantile can be estimated from the Extreme Value Type I (EV1) distribution as a linear function of the EV1 reduced variate $y_T$:
$$y_T = -\ln\left[-\ln\left(1 - \frac{1}{T}\right)\right] \tag{5}$$
$$P_T = u + a\,y_T \tag{6}$$
where $u$ and $a$ are the parameters of the EV1 distribution. Once the parameters are estimated, this linear relation can be used to derive $P_T$ from $y_T$. Typical plots for three stations are presented in Figure 3.
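Equations (3) to (6) can be checked numerically for the EV1 case. The sketch below assumes method-of-moments EV1 parameters ($a = \sqrt{6}\,s/\pi$, $u = \bar{x} - 0.5772\,a$) and the corresponding Gumbel frequency factor $K_T = -(\sqrt{6}/\pi)\,[0.5772 + \ln\ln(T/(T-1))]$, so that the reduced-variate route of Equations (5) and (6) and the frequency-factor route of Equation (4) give the same quantile. Function names and sample data are hypothetical.

```python
import math
import statistics

EULER = 0.5772156649015329  # Euler-Mascheroni constant (the 0.5772 above)


def ev1_moments(sample):
    """Method-of-moments EV1 (Gumbel) parameters (u, a)."""
    s = statistics.stdev(sample)
    a = math.sqrt(6.0) * s / math.pi
    u = statistics.fmean(sample) - EULER * a
    return u, a


def reduced_variate(T):
    """Equation (5): y_T = -ln(-ln(1 - 1/T))."""
    return -math.log(-math.log(1.0 - 1.0 / T))


def ev1_quantile(u, a, T):
    """Equation (6): P_T = u + a * y_T."""
    return u + a * reduced_variate(T)


def gumbel_frequency_factor(T):
    """K_T for EV1 with moment estimates, for use in Equation (4)."""
    return -(math.sqrt(6.0) / math.pi) * (EULER + math.log(math.log(T / (T - 1.0))))


amr_mm = [45.2, 62.0, 38.5, 80.1, 55.3, 47.9, 101.4, 66.7, 52.0, 71.8]  # hypothetical
u, a = ev1_moments(amr_mm)
mean, sd = statistics.fmean(amr_mm), statistics.stdev(amr_mm)
for T in (2, 10, 50, 100):
    print(T, round(ev1_quantile(u, a, T), 1), round(mean + gumbel_frequency_factor(T) * sd, 1))
```

Plotting observed annual maxima against $y_T$ of their plotting positions gives the straight-line EV1 frequency plot; the fitted line is Equation (6).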
Figure 3. Frequency plot of 24 hr rainfall quantiles on an EV1 reduced variate scale.

The goodness of fit between the parent probability distribution and the sample frequencies was investigated, using the Kolmogorov-Smirnov (K-S) test, to assess the descriptive ability of the candidate frequency models. In addition, the predictive ability of each candidate probability distribution needs to be established. The estimates of the 24 hr quantiles of rainfall intensity can be evaluated using the standard error of the estimated quantile for a given return period, $SE$, given by Equation (7).
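A sketch of the K-S check, assuming a 2-Parameter Gamma model fitted by moments and a significance level of 0.10 (matching the 90% level reported for this study); the function name `ks_gamma` and the data are hypothetical. Because the parameters are estimated from the same sample, the standard K-S p-value is conservative, a known limitation of the test in this setting.

```python
import numpy as np
from scipy import stats


def ks_gamma(sample, alpha=0.10):
    """Two-sided K-S test of the sample against a moment-fitted 2-parameter Gamma."""
    x = np.asarray(sample, dtype=float)
    mean, var = x.mean(), x.var(ddof=1)
    shape, scale = mean ** 2 / var, var / mean
    # args = (shape, loc, scale); loc fixed at 0 for the 2-parameter Gamma
    result = stats.kstest(x, "gamma", args=(shape, 0.0, scale))
    return result.statistic, result.pvalue, result.pvalue > alpha


amr_mm = [45.2, 62.0, 38.5, 80.1, 55.3, 47.9, 101.4, 66.7, 52.0, 71.8]  # hypothetical
D, p, accepted = ks_gamma(amr_mm)
print(f"D = {D:.3f}, p = {p:.3f}, fit not rejected at 0.10: {accepted}")
```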
The standard error of estimate accounts for the error due to a small sample, but not for the error due to an inappropriate choice of distribution. The most efficient parameter estimation method is the one that gives the smallest standard error of estimate.
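Equation (7) is not reproduced in this section. As a generic alternative that makes the sampling-error idea concrete, the sketch below estimates the standard error of an EV1 (method-of-moments) quantile by nonparametric bootstrap resampling. This is a swapped-in technique, not the study's formula, and the function names, resample count and seed are assumptions. Like Equation (7), it reflects sampling variability only, not the error of choosing the wrong distribution.

```python
import math
import numpy as np

EULER = 0.5772156649015329


def gumbel_quantile(sample, T):
    """EV1 quantile with method-of-moments parameters (Equations (5) and (6))."""
    x = np.asarray(sample, dtype=float)
    a = math.sqrt(6.0) * x.std(ddof=1) / math.pi
    u = x.mean() - EULER * a
    return u + a * (-math.log(-math.log(1.0 - 1.0 / T)))


def bootstrap_se(sample, T, n_boot=2000, seed=1):
    """Standard deviation of the T-year quantile over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    x = np.asarray(sample, dtype=float)
    reps = [gumbel_quantile(rng.choice(x, size=x.size, replace=True), T)
            for _ in range(n_boot)]
    return float(np.std(reps, ddof=1))


amr_mm = [45.2, 62.0, 38.5, 80.1, 55.3, 47.9, 101.4, 66.7, 52.0, 71.8]  # hypothetical
for T in (2, 10, 100):
    print(T, round(gumbel_quantile(amr_mm, T), 1), round(bootstrap_se(amr_mm, T), 1))
```

The standard error grows with the return period, which is why long-return-period quantiles from short records carry wide uncertainty.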

Identified Homogeneous IDF Regions in Botswana
There was marked variability in the 24 hour annual maximum rainfall across the stations used in the study. Because annual rainfall differs between heterogeneous regions, the homogeneous regions in this study were formed by identifying Fuzzy C-Means (FCM) clusters in the space of site characteristics: mean annual rainfall, latitude, longitude and elevation. A few stations were initially assigned to cluster regions geographically far away from them; these stations were scrutinized further and reassigned to the appropriate nearby cluster regions. The analysis showed that Botswana can be divided into three homogeneous rainfall regions (Table 2, Table 3).
The homogeneous regions formed through the identification of FCM clusters, and the stations contained in each cluster region, are presented in Table 2 and shown spatially in Figure 4.

Developed Regional IDF Curves
The 24 hr annual maximum rainfall series were extracted for all 76 stations, and the IDF parameters for each station in each region were then used to derive regional IDF curves. Storm intensities for short, medium and long durations are of great importance for different flood drainage and water resource management strategies. IDF curves for each homogeneous region were therefore constructed for durations both below and above 24 hr. All of the candidate distributions adequately represented the annual maximum 24 hr rainfall samples at all stations. The descriptive ability of the candidate frequency models was assessed with the Kolmogorov-Smirnov (K-S) test at the 90% confidence level, and the fit was found to be adequate for each homogeneous region.
The IDF curve parameters that gave the lower SSE of estimates of the 24 hr rainfall depths for each region are summarized in Table 4.
The predictive ability of the four frequency models in each region was further investigated using the standard error (Equation (7)). Among the four models, the 2-Parameter Gamma distribution, followed by the Lognormal distribution, gave comparatively high accuracy in predicting rainfall intensities in all regions, and these models could be adopted.
The quantile estimates agreed well with the observed 24 hr rainfall intensity, with low standard errors (SE) for the various return periods. For instance, the four probability distributions are compared in terms of quantiles of rainfall intensity ($P_T$) and return periods for the station at Maun in Region 1 in Table 5, for the station at Gaborone in Region 2 in Table 6, and for the station at Bokspits in Region 3 in Table 7.
Table 5. Quantile estimates and their standard error of 24 hr rainfall intensity at Maun station (Region 1).

Rainfall IDF Characteristics
The quantile estimates of the 24 hr rainfall intensities agreed well with the observed IDF relationships, with low standard errors (SE) for the various return periods. Figure 5 compares observed and simulated 24 hr rainfall intensity based on the Lognormal frequency model. The corresponding frequency plots of 24 hr rainfall quantiles on an EV1 reduced variate scale, based on Equation (6), are shown in Figure 3. The performance of the IDF model was judged against a number of model performance indices, which gave a very high model efficiency ($R^2$), well above 90%, and a correlation coefficient ($r$) close to unity, as illustrated in Figure 5.
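A minimal sketch of the two performance indices, assuming the model efficiency is the Nash-Sutcliffe criterion cited in the Introduction (one minus the ratio of squared model errors to the variance of the observations) and $r$ is the Pearson correlation coefficient. The function name and the paired intensities are hypothetical, not study data.

```python
import numpy as np


def performance(observed, simulated):
    """Return (Nash-Sutcliffe efficiency, Pearson r) for paired values."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    if obs.shape != sim.shape:
        raise ValueError("observed and simulated must have the same shape")
    sse = np.sum((obs - sim) ** 2)         # squared model errors
    sst = np.sum((obs - obs.mean()) ** 2)  # spread of observations about their mean
    nse = 1.0 - sse / sst
    r = np.corrcoef(obs, sim)[0, 1]
    return float(nse), float(r)


# Hypothetical observed vs. simulated 24 hr intensities (mm/hr).
obs = [2.1, 3.4, 4.0, 4.9, 5.6]
sim = [2.0, 3.5, 4.1, 4.7, 5.8]
print(performance(obs, sim))
```

Unlike $r$, the efficiency also penalizes systematic bias, so a model can have $r$ near 1 but a lower efficiency if it consistently over- or under-predicts.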
Regional IDF curves were developed for each homogeneous region and are illustrated for three typical stations, Maun, Gaborone and Bokspits, representing Regions 1, 2 and 3, respectively. These IDF curves are shown in Figure 6. The spatial distribution of the 24 hr rainfall intensity quantiles from the frequency model is also mapped: Figure 7 portrays the regional distribution of 24 hr rainfall intensities for return periods of 5, 10, 25 and 50 years in Botswana.

Conclusions
We have developed homogenized and regionalized Intensity-Duration-Frequency (IDF) curves for Botswana using a regional storm IDF modeling approach. The approach can be used as a generalized IDF model for producing IDF relationships in Botswana. Three homogeneous regions were identified: the northern region (Region 1), Region 2, and the south-western region (Region 3) of Botswana. Each cluster region corresponds to a distinct climatic region of Botswana. Region 1 is a relatively humid region with a mean annual rainfall (MAR) above 450 mm, Region 2 is the semi-arid part of Botswana with a MAR of 300 to 450 mm, and Region 3 is the arid sub-region with a MAR of 300 mm or less. The performance of the model was judged against a number of model performance indices, which gave a very high model efficiency ($R^2$) [23], well above 90%, and a correlation coefficient ($r$) close to unity, as illustrated in Figure 5.

Figure 2. Three storm profiles tested in Botswana for the distribution of 24-hour rainfall.

Figure 4. Homogeneous regions for IDF curve regionalization of Botswana.

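Figure 5. Comparison between observed and simulated 24 hr rainfall intensity based on the Lognormal frequency model.

Figure 6. Regional IDF curves for the typical stations of Maun (Region 1), Gaborone (Region 2) and Bokspits (Region 3).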

Figure 7. Spatial distribution of regional IDF quantiles of 24 hr rainfall intensity for various return periods.

Table 2. The identified homogeneous regions and corresponding stations in each cluster.

Table 3. Details of final cluster centers and number of cases in each cluster.

Table 4. Parameters of the design storm model.

Table 6. Quantile estimates and their standard error of 24 hr rainfall intensity at Gaborone station (Region 2).

Table 7. Quantile estimates and their standard error of 24 hr rainfall intensity at Bokspits station (Region 3).