Sensitivity of Statistical Models for Extreme Rainfall Adjustment Regarding Data Size: Case of Ivory Coast

The objective of this study is to analyze the sensitivity of statistical models to sample size. The study, carried out in Ivory Coast, is based on annual maximum daily rainfall data collected at 26 stations. The methodological approach is based on the statistical modeling of maximum daily rainfall. Fits were made for several sample sizes and several return periods (2, 5, 10, 20, 50 and 100 years). The main results show that the 30-year series (1931-1960; 1961-1990; 1991-2020) are best fitted by the Gumbel (26.92%-53.85%) and Inverse Gamma (26.92%-46.15%) models. The 60-year series (1931-1990; 1961-2020) are best fitted by the Inverse Gamma (30.77%), Gamma (15.38%-46.15%) and Gumbel (15.38%-42.31%) models. The full 1931-2020 record (90 years) shows a clear dominance of the Gumbel model (50%) over the Gamma (34.62%) and Inverse Gamma (15.38%) models. Overall, Gumbel is the dominant model, particularly in wet periods, whereas data from periods with normal and dry trends were better fitted by the Gamma and Inverse Gamma models.


Introduction
Extreme value theory was developed to estimate the probabilities of occurrence of rare events [1]. It is a branch of statistics concerned with the asymptotic characterization of the maxima or minima of a random variable, and it establishes the limiting behavior of the tails of probability distributions. Once the parameters describing this behavior have been estimated, the probability of a high-magnitude event can be calculated [2]. The asymptotic nature of these results, however, calls for caution in drawing conclusions, since the number of available data is always finite. Applying extreme value theory to estimate the recurrence of extreme rainfall provides essential inputs for the design of infrastructure, such as dikes and sanitation works, that effectively protects the population and their property [3]. Researchers and planners generally prefer modeling annual maximum daily rainfall to modeling daily rainfall above a threshold, because it is easier to apply and often statistically more effective [4]. Accordingly, several authors have used annual maximum daily rainfall to model extreme rainfall [3]-[10]. Frequency analysis is the most widely used statistical approach to quantify rainfall hazards [3] [4] [6] [10] [11] [12] [13]. Authors who have worked with annual maximum daily rainfall series of different sizes have reached different conclusions. A first group [6] [8] [9] [10] [11] [14], working with series of 47 to 81 years, concluded that Gumbel's law best fits the annual maximum rainfall. However, in a study of the long rainfall series of Athens (136 years), [15] found that the Gumbel law is not suited to the annual maxima of the full 136-year series, whereas it appeared appropriate when, for example, only the last 34 years are considered.
Other authors, such as [4] and [16], reached conclusions similar to those of [15]. According to [16], no single conclusion can easily be drawn, hence the need to test several laws at each station and on samples of different sizes in order to assess how sensitive the laws are to sample size. The question guiding this research is therefore: are the statistical laws of extremes used to model annual maximum daily rainfall, on which the design flows of hydraulic structures are based, sensitive to the size of the data samples?
The objective of this work is therefore to analyze the sensitivity of the statistical laws of extremes according to the size of the samples of the data used.

Data
The data used in this study come from the national meteorological measurement network of Côte d'Ivoire. The annual maximum daily rainfall data cover the period 1931-2020 and come from twenty-six (26) rainfall stations distributed throughout the country (Figure 3). They were provided by SODEXAM (Aeronautical, Airport and Meteorological Development and Exploitation Company). The stations were grouped according to the main climatic zones of Ivory Coast (Table 1). Stations were chosen on the basis of the availability and quality of their records (gaps below a threshold of 5%).

Statistical Modeling of Annual Maximum Daily Rainfall
The aim is to analyze how the size of the data series influences the choice of the statistical law that best represents the extreme rainfall data. To this end, the size of the series was varied to assess its possible impact on the statistical laws. The series sizes considered are 30 years (1931-1960; 1961-1990; 1991-2020), 60 years (1931-1990; 1961-2020) and 90 years (1931-2020). The methodological approach consisted first of verifying the statistical hypotheses required for frequency analysis, namely the independence, homogeneity and stationarity of the data. Then, the distribution laws retained after the choice of the best classes of laws were fitted to the annual maximum daily rainfall over the different periods defined. Finally, the validity of the preselected models was evaluated.
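As an illustration, the 30-, 60- and 90-year sub-periods analyzed in this study (1931-1960, 1961-1990, 1991-2020, 1931-1990, 1961-2020 and 1931-2020) can be extracted from a single station record as follows. This is a minimal Python sketch, assuming the annual maxima of one station are held in a dictionary mapping year to rainfall (mm); the names `WINDOWS` and `split_into_windows` are hypothetical.

```python
# Sub-periods used in the study, grouped by series length
WINDOWS = {
    "30-year": [(1931, 1960), (1961, 1990), (1991, 2020)],
    "60-year": [(1931, 1990), (1961, 2020)],
    "90-year": [(1931, 2020)],
}


def split_into_windows(annual_max):
    """Return {(start, end): [annual maxima]} for every sub-period in WINDOWS.

    annual_max is assumed to be a dict {year: rainfall_mm}. Years missing
    from the record are skipped, so a list may be shorter than
    end - start + 1 when the station has gaps.
    """
    samples = {}
    for periods in WINDOWS.values():
        for start, end in periods:
            samples[(start, end)] = [
                annual_max[year] for year in range(start, end + 1) if year in annual_max
            ]
    return samples
```

Each resulting sample is then tested and fitted independently, which is what allows the best law to be compared across series lengths.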

Assumptions of Frequency Analysis
The frequency analysis first requires assessing whether the series are suitable for fitting a distribution function, using hypothesis tests of stationarity (Kendall test), independence (Wald-Wolfowitz test) and homogeneity. These tests all operate on the same principle: a hypothesis is stated about the parent population, and the observations are checked for plausibility under that hypothesis. The hypothesis to be tested is called H0 (null hypothesis) and is always accompanied by an alternative hypothesis, H1. The test leads either to accepting or to rejecting H0 (and consequently to drawing the opposite conclusion for H1). If H0 is not rejected, the observations are consistent with a random distribution; rejecting H0 means that the distribution of the values conceals non-random information and that the analysis should be deepened. The Wald-Wolfowitz independence test is used to check whether the observations show a serial dependence which, if present, must be characterized (type and level) before the frequency analysis can proceed. Its hypotheses are: H0: the observations are independent; H1: the observations are dependent.
Journal of Water Resource and Protection
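The two tests named above can be sketched in Python as follows. This is an illustrative implementation, not the Hyfran code: we assume the Wald-Wolfowitz statistic is computed on deviations from the mean (circular lag-1 product sum with its exact permutation mean and variance), and that the Kendall stationarity test is the Mann-Kendall trend test with a normal approximation. The function names are hypothetical.

```python
import math


def _two_sided_p(z):
    # P(|Z| > |z|) for a standard normal variable Z
    return math.erfc(abs(z) / math.sqrt(2.0))


def wald_wolfowitz(x):
    """Wald-Wolfowitz independence test (H0: independent observations).

    Needs at least 4 values (for n = 3 the statistic has zero variance).
    Returns the standardized statistic U and its two-sided p-value.
    """
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    # Circular lag-1 sum of products
    r = sum(d[i] * d[i + 1] for i in range(n - 1)) + d[0] * d[-1]
    s1, s2, s3, s4 = (sum(v ** k for v in d) for k in (1, 2, 3, 4))
    e_r = (s1 ** 2 - s2) / (n - 1)
    var_r = ((s2 ** 2 - s4) / (n - 1) - e_r ** 2
             + (s1 ** 4 - 4 * s1 ** 2 * s2 + 4 * s1 * s3 + s2 ** 2 - 2 * s4)
             / ((n - 1) * (n - 2)))
    u = (r - e_r) / math.sqrt(var_r)
    return u, _two_sided_p(u)


def mann_kendall(x):
    """Mann-Kendall trend test (H0: no monotonic trend, i.e. stationarity)."""
    n = len(x)
    s = sum((x[j] > x[i]) - (x[j] < x[i])
            for i in range(n - 1) for j in range(i + 1, n))
    # Variance of S, corrected for groups of tied values
    counts = {}
    for v in x:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t * (t - 1) * (2 * t + 5) for t in counts.values())
    var_s = (n * (n - 1) * (2 * n + 5) - ties) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return z, _two_sided_p(z)
```

For both tests, H0 is rejected at the 5% (or 1%) threshold when the returned p-value falls below 0.05 (or 0.01).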

Choice and Estimation of Statistical Model Parameters
The statistical models retained for fitting the annual maximum daily rainfall were chosen on the basis of theoretical considerations and the recommendations of previous work in this area. After the hypotheses were checked, the frequency analysis was carried out using several statistical tests and diagnostics (Jarque-Bera test, log-log plot, mean excess function (MEF), Hill's ratio and Jackson's statistic) [18]. With these tools, the decision support system (DSS) of the Hyfran software distinguishes three main classes into which the ten distributions most used in hydrology for maximum values can be grouped (Table 3): class C, class D (sub-exponential distributions) and class E (exponential law).
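For the lognormal check, the Jarque-Bera test can be applied to the logarithms of the data, since a lognormal sample has normally distributed logarithms. The Python sketch below assumes this is how the test is used in the decision support system (an assumption, not confirmed by the text) and relies on the asymptotic chi-square distribution with 2 degrees of freedom, which is only approximate for 30-year samples. The function name is hypothetical.

```python
import math


def jarque_bera_log(x):
    """Jarque-Bera normality test on ln(x) (H0: x follows a lognormal law).

    JB = n/6 * (S^2 + (K - 3)^2 / 4), with S and K the sample skewness and
    kurtosis of the log values. Under H0, JB is asymptotically chi-square
    with 2 degrees of freedom, whose survival function is exp(-JB / 2).
    """
    y = [math.log(v) for v in x]  # rainfall values must be strictly positive
    n = len(y)
    mean = sum(y) / n
    m2 = sum((v - mean) ** 2 for v in y) / n
    m3 = sum((v - mean) ** 3 for v in y) / n
    m4 = sum((v - mean) ** 4 for v in y) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)
```

A p-value below the chosen threshold means the lognormal law is rejected for that series.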
The parameters u, α and k denote the location, scale and shape parameters of the different laws, respectively. The location parameter u characterizes the order of magnitude of the extreme rainfall series. The shape parameter k governs the behavior of the extremes, i.e. the shape of the distribution tail. Depending on its sign: k = 0 gives a light-tailed law (Gumbel distribution); k < 0 a heavy-tailed law (Fréchet distribution); and k > 0 a bounded-tail law (Weibull distribution).
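For reference, these three tail behaviors correspond to the generalized extreme value (GEV) distribution written in the sign convention used above (k < 0 for a heavy tail). We assume this is the parameterization behind the u, α, k notation:

$$F(x) = \exp\left\{-\left[1 - \frac{k\,(x-u)}{\alpha}\right]^{1/k}\right\}, \qquad k \neq 0$$

$$F(x) = \exp\left\{-\exp\left[-\frac{x-u}{\alpha}\right]\right\}, \qquad k = 0$$

For k < 0 (Fréchet type) the support has no upper bound, whereas for k > 0 it is bounded above by u + α/k.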
The parameters μ and σ of the lognormal distribution denote the mean and the standard deviation of the logarithms of the data, respectively. These parameters were determined by the method of weighted moments.
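Assuming that the "method of weighted moments" refers to probability weighted moments (PWM), the Gumbel parameters follow directly from the first two sample L-moments, since for a Gumbel law λ1 = u + γα and λ2 = α ln 2 (γ ≈ 0.5772 is Euler's constant). A minimal Python sketch with a hypothetical function name:

```python
import math

EULER_GAMMA = 0.5772156649015329


def gumbel_pwm(x):
    """Gumbel (EV1) location u and scale alpha by probability weighted moments."""
    xs = sorted(x)  # ascending order
    n = len(xs)
    b0 = sum(xs) / n
    # b1 = (1/n) * sum over ranks j = 1..n of (j - 1)/(n - 1) * x_(j); here i = j - 1
    b1 = sum(i / (n - 1) * xs[i] for i in range(n)) / n
    l1, l2 = b0, 2.0 * b1 - b0  # first two sample L-moments
    alpha = l2 / math.log(2.0)
    u = l1 - EULER_GAMMA * alpha
    return u, alpha
```

Other laws (Gamma, Inverse Gamma, GEV) have their own PWM relationships, which are not reproduced here.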

Calculation of Empirical Frequencies
The experimental (empirical) frequencies were determined following [14]. After ranking a sample of n maximum rainfall values in ascending order, the Hazen empirical frequency of non-exceedance of the value x_i of rank i is written (Equation (1)):

$$F(x_i) = \frac{i - 0.5}{n} \qquad (1)$$

where n is the size of the sample considered (see also [9]).
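The Hazen formula is straightforward to compute; a minimal Python sketch (the function name is hypothetical):

```python
def hazen_frequencies(x):
    """Return (values sorted ascending, Hazen frequencies F_i = (i - 0.5) / n)."""
    xs = sorted(x)
    n = len(xs)
    return xs, [(i - 0.5) / n for i in range(1, n + 1)]
```

Plotting these frequencies against the fitted theoretical F(x) gives the empirical points used in the graphical assessment of the fits.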

Validation of the Statistical Model
The chi-square test was applied as follows. Consider a sample of n values ranked in ascending (or descending) order, for which a distribution law F(x) has been fitted. The sample is divided into k classes, class i containing n_i observed values. The theoretical number ν_i of values assigned to class i by the distribution law is given by (Equation (2)):

$$\nu_i = n \int_{x_{i-1}}^{x_i} f(x)\,dx = n\left[F(x_i) - F(x_{i-1})\right] \qquad (2)$$

where f(x) is the probability density function of the theoretical law and x_{i-1}, x_i are the bounds of class i.
The experimental χ² statistic is (Equation (3)):

$$\chi^2 = \sum_{i=1}^{k} \frac{(n_i - \nu_i)^2}{\nu_i} \qquad (3)$$

The exceedance probability (p-value) of this statistic is then obtained for λ degrees of freedom, with λ = k − 1 − p, p being the number of parameters of the law F(x). If this probability is greater than 0.05, the fit is considered satisfactory; otherwise, the law is rejected.
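The chi-square procedure can be sketched in Python as follows. Two choices are assumptions rather than details given in the text: the classes are taken as equiprobable under the fitted law (so every class has the same theoretical count n/k), and k = 5 classes by default. The chi-square survival function is computed exactly for integer degrees of freedom from the regularized incomplete gamma function, so only the standard library is needed. Function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable X with integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        # Q(m, y) = exp(-y) * sum_{j<m} y^j / j!  for integer m = df / 2
        return math.exp(-y) * sum(y ** j / math.factorial(j) for j in range(df // 2))
    # Half-integer shape: start from Q(1/2, y) = erfc(sqrt(y)) and step up
    total = math.erfc(math.sqrt(y))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def chi_square_gof(x, cdf, n_params, k=5):
    """Chi-square goodness-of-fit test with k equiprobable classes.

    cdf is the fitted distribution function F(x). Returns
    (chi-square statistic, degrees of freedom, p-value).
    """
    df = k - 1 - n_params
    if df < 1:
        raise ValueError("need at least one degree of freedom: increase k")
    n = len(x)
    observed = [0] * k
    for v in x:
        # Class index from the fitted non-exceedance probability
        observed[min(int(cdf(v) * k), k - 1)] += 1
    expected = n / k
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    return chi2, df, chi2_sf(chi2, df)
```

With a two-parameter law and k = 5, there are 2 degrees of freedom; the fit is kept when the returned p-value exceeds 0.05.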
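The statistical distribution best fitted to each sample was selected using the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) [9] [14], expressed as follows (Equations (4) and (5)):

$$\mathrm{AIC} = -2\ln L + 2k \qquad (4)$$

$$\mathrm{BIC} = -2\ln L + k\ln n \qquad (5)$$

where L is the likelihood, k the number of parameters and n the sample size.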
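As an illustration, the Akaike (AIC = −2 ln L + 2k) and Bayesian (BIC = −2 ln L + k ln n) information criteria can be computed for a Gumbel (EV1) fit as follows; the law with the lowest criterion is preferred. This is a Python sketch with hypothetical function names; the Gumbel log-likelihood follows from its density f(x) = (1/α) exp[−z − exp(−z)] with z = (x − u)/α.

```python
import math


def gumbel_loglik(x, u, alpha):
    """Log-likelihood of a Gumbel (EV1) law with location u and scale alpha."""
    ll = 0.0
    for v in x:
        z = (v - u) / alpha
        ll += -math.log(alpha) - z - math.exp(-z)
    return ll


def aic(loglik, k):
    """Akaike information criterion for a law with k fitted parameters."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and sample size n."""
    return -2.0 * loglik + k * math.log(n)
```

Because ln n > 2 as soon as n ≥ 8, BIC penalizes extra parameters more than AIC for all the series sizes used here (30 to 90 years).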

Characterization of Return Periods
The law identified as best fitting the extreme rainfall was applied to the daily rainfall depths to characterize the return periods of extreme rainfall events, in order to verify whether the rainy episodes identified as causes of flooding could be qualified as extreme events or not. According to [10], the return period, or return time, is the statistical time between two occurrences of a natural event of a given intensity. This term is widely used to characterize natural hazards. Calculating the frequency of occurrence of extreme rainfall provides useful information for managers (Equation (6)):

$$T = \frac{1}{1 - F} \qquad (6)$$

where T is the return period (years) and F the frequency of non-exceedance.
A rainfall event is qualified as very exceptional if its return period exceeds 100 years; exceptional if it is between 30 and 100 years; very abnormal if it is between 10 and 30 years; abnormal if it is between 6 and 10 years; and normal if it is less than 6 years [10].
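The return period T = 1/(1 − F) and the classification above can be combined with a fitted Gumbel law, whose quantile for return period T is x_T = u − α ln[−ln(1 − 1/T)]. The Python sketch below uses hypothetical function names; how values falling exactly on 6, 10, 30 or 100 years are classified is our assumption, since the text does not specify it.

```python
import math


def return_period(f):
    """T = 1 / (1 - F) for a non-exceedance frequency F in [0, 1)."""
    return 1.0 / (1.0 - f)


def gumbel_return_level(u, alpha, t):
    """Daily rainfall with a return period of t years under a Gumbel (EV1) law."""
    return u - alpha * math.log(-math.log(1.0 - 1.0 / t))


def classify_event(t):
    """Qualify a rainfall event from its return period t (years).

    Assumption: a value on a class boundary goes to the higher class,
    except exactly 100 years, which is counted as "exceptional".
    """
    if t > 100:
        return "very exceptional"
    if t >= 30:
        return "exceptional"
    if t >= 10:
        return "very abnormal"
    if t >= 6:
        return "abnormal"
    return "normal"
```

For instance, `[gumbel_return_level(u, alpha, t) for t in (2, 5, 10, 20, 50, 100)]` gives the rainfall depths for the return periods retained in this study, once u and α have been estimated.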

R. A.-K. Nassa et al.

The independence test is validated at 92% of the stations, including seventeen (17) at the 5% threshold and seven (7) at the 1% threshold. For the stationarity test, 92% of the stations validated it, including thirteen (13) at the 5% threshold and eleven (11) at the 1% threshold. The homogeneity test is validated at 96% of the stations, including sixteen (16) at the 5% threshold and nine (9) at the 1% threshold. In sum, 81% of the stations satisfy all the hypothesis tests, so the frequency analysis can proceed.

Identification of the Best Classes of Laws
The different classes selected are summarized in Table 4. For the 30-year series, the best classes are C, with a frequency of occurrence of 57.47%, and E, with 42.53%. Class C appears twenty-three (23) times alone as the best class, against four (4) times for class E. For the 60-year series, the best classes are C, D and E: class C has a frequency of occurrence of 46.84%, class E 29.57% and class D 23.59%. Class C appears two (2) times alone as the best class, against four (4) times for each of classes E and D. The 1931-2020 series also presents class D (sub-exponential distributions) in addition to classes C and E as best classes: class C occurs in 42.86% of cases, against 35.71% for class E and 21.43% for class D. Classes C, D and E appear alone as best classes two (2), six (6) and two (2) times, respectively. According to the Jarque-Bera test, the lognormal law is not applicable to any of the series.

Graphical Analysis of Adjustments
Identifying the best laws requires fitting the laws belonging to the different selected distribution classes. The preliminary

Numerical Analysis of Adjustments
After fitting, the chi-square goodness-of-fit test was applied to assess the relative quality of the fits. All fitted laws passed this test for the annual maximum daily rainfall at the 5% significance level. The laws were then ranked on the basis of the AIC and BIC criteria. Over all the 30-year series, the Gumbel and Inverse Gamma laws fit best. Moreover, the closer one gets to the recent periods, the more the representativeness of Gumbel's law decreases and that of the Inverse Gamma law increases. The Gumbel and Inverse Gamma laws thus predominate over the last two 30-year periods (1961-1990 and 1991-2020). The three best laws are unstable across the series of normal length (n = 30 years): the WMO historical normal 1931-1960 tends to be well fitted by Gumbel's law, whereas the past normal 1961-1990 is best fitted by the Inverse Gamma law, followed by Gumbel's law.
Over the periods of the 60-year series, the Gumbel, Inverse Gamma and Gamma laws fit best (Figure 8). The Inverse Gamma law shows a certain stability. The closer one gets to recent data, the more the representativeness of Gumbel's law decreases while that of the Gamma law increases, so the two laws evolve in opposite directions. For the 60-year series, the best law is either Gumbel's law (EV1) or the Gamma law (G2), and the Inverse Gamma law (IG) remains the second-best law. This indicates a certain stability of the laws for series of medium length (n = 60 years).
For the entire series, the Inverse Gamma law is the best fit at four (4) stations (15.38%), the Gamma law at nine (9) stations (34.62%) and Gumbel's law at thirteen (13) stations (50%) (Figure 9). The complete 1931-2020 record thus shows a clear dominance of the type 1 extreme value law (Gumbel) over the Gamma and Inverse Gamma laws. The Weibull, Fréchet, exponential, log-Pearson type 3 and Pearson type 3 laws are rarely retained.
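The percentages above are shares of the 26 stations at which a law ranks first. A minimal Python sketch of this tally, assuming the best law at each station is the one with the lowest criterion value (e.g. BIC); the function names and the station labels in the usage lines are hypothetical, the labels simply reproducing the counts reported for 1931-2020.

```python
from collections import Counter


def best_law(criteria):
    """Return the law with the smallest criterion value (dict: law -> AIC or BIC)."""
    return min(criteria, key=criteria.get)


def best_law_shares(best_law_by_station):
    """Percentage of stations (rounded to 2 decimals) at which each law ranks first."""
    counts = Counter(best_law_by_station.values())
    n = len(best_law_by_station)
    return {law: round(100.0 * c / n, 2) for law, c in counts.items()}


# Hypothetical station labels reproducing the 1931-2020 counts: 13 EV1, 9 G2, 4 IG
laws = ["EV1"] * 13 + ["G2"] * 9 + ["IG"] * 4
stations = {f"S{i + 1:02d}": law for i, law in enumerate(laws)}
print(best_law_shares(stations))  # {'EV1': 50.0, 'G2': 34.62, 'IG': 15.38}
```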

Discussion
The main results showed that the 30-year series are best fitted by the Gumbel and Inverse Gamma laws. The analysis of the sensitivity of the statistical laws applied to extreme rainfall showed that the distributions of annual maximum daily rainfall in Ivory Coast cannot be represented by a single law independently of the size and climatic context of the data series. Using the decision support system (DSS) for frequency analysis brought out laws, such as the Gamma and Inverse Gamma laws, that are rarely used to fit rainfall maxima. These results reflect the sensitivity of the statistical models for fitting extreme values to the size of the data samples and to the climatic context. Authors who have worked with annual maximum daily rainfall series of different sizes have reached conflicting conclusions. A first group, working with long series (47-81 years), concluded that Gumbel's law prevails over other laws (lognormal, Fréchet, Weibull, GEV, etc.). A second group, however, found that the Gumbel law does not prevail over the other laws for long series.
Several studies fall into the first group. The work of [6] [14], based on frequency analysis of annual maximum daily rainfall at 34 Ivorian rainfall stations over the period 1947-1995 (49 years) and at 47 Ivorian stations over the period 1947-1993 (47 years), concluded that Gumbel's law and the lognormal law best fit the annual maximum rainfall. The work of [11], carried out in Ivory Coast on annual maximum daily rainfall over the period 1942-2002 (61 years), retained, in order, Gumbel's law (34.1%), Fréchet's law (29.5%), the lognormal law (22.7%) and the Weibull law (13.6%). According to the study by [10], based on annual maximum daily rainfall from 1961 to 2014 (54 years) at the Abidjan station (Port-Bouët), Gumbel's law fits these data best. The work of [9], using data from 35 stations over the period 1921-2001 (81 years), showed an overall predominance of the Gumbel (51.43%) and lognormal (28.57%) laws. The work of [10] in Benin (Sota basin), based on daily rainfall from eight (8) stations over the period 1965-2008 (44 years), showed that Gumbel's law and the log-Pearson type III law are the predominant laws.
Several studies also fall into the second group. According to [4], a frequency analysis of annual maximum daily rainfall series was carried out on data from 27 rainfall stations of the Chott Chergui basin (Algeria) over the period 1970-2005 (35 years); the GEV law showed a good fit to these series. [16] reached the same conclusion as Habibi et al. (2013) regarding the supremacy of the GEV law. In a study of the long rainfall series of Athens (136 years), [15] found that the Gumbel law is not suited to the annual maxima of the full 136-year series, whereas it seemed appropriate when, for example, only the last 34 years are considered. The work of [3] in the Cheliff region, a comparative analysis of the GEV and Gumbel methods based on four data samples covering periods of 21 to 30 years, showed that the two methods provided similar results.
The results of this work agree more closely with those of the first group. Beyond sample size, the differences in results could be due to the climatic context. These results also feed the debate on the presumed predominance of the Gumbel model for estimating annual maximum daily rainfall: the Gumbel law is not always the best model for fitting extreme rainfall.

Conclusions
The objective of this study is to analyze the sensitivity of the statistical laws of extremes as a function of the size of the data samples. The methodological approach is based on the statistical modeling of annual maximum daily rainfall. The adjustments were made on several sample sizes, namely samples of 30 years (1931-1960; 1961-1990; 1991-2020), 60 years (1931-1990; 1961-2020) and 90 years (1931-2020). Several return periods (2, 5, 10, 20, 50 and 100 years) were retained.
The results of the tests prior to the frequency analysis indicated that the hypotheses required for frequency analysis were verified for almost all the series: the series of annual maximum daily rainfall consist of independent, homogeneous and stationary values. Indeed, the independence test is validated at 92% of the stations, including seventeen (17) at the 5% threshold and seven (7) at the 1% threshold. For the stationarity test, 92% of the stations validated it, including thirteen (13) at the 5% threshold and eleven (11) at the 1% threshold. The homogeneity test is validated at 96% of the stations, including sixteen (16) at the 5% threshold and nine (9) at the 1% threshold. In sum, 81% of the stations satisfy all the hypothesis tests. Thus, it is therefore