Fitting a Probability Distribution to Extreme Precipitation for a Limited Mountain Area in Vietnam

In this paper, 20 adapted extreme precipitation indices are calculated and analysed for a limited mountain area in southern Vietnam. Daily precipitation data from four stations covering a period of more than 30 years are used. The statistical characteristics (maximum, minimum, mean, standard deviation, skewness, and kurtosis) of each index are also analysed. A variety of distributions (Normal, Lognormal, Beta, Gamma, Exponential, Loglogistic, and Johnson) are compared to find the best fit probability distribution for this area on the basis of the highest score. The scores are based on the ranking of the statistical goodness of fit tests, namely the Anderson-Darling and Shapiro-Wilks tests. The best fit distribution for each extreme precipitation index at each station is identified. Results revealed that the Johnson distribution is the best fit distribution for the number of very heavy precipitation days (greater than 50 mm). Over a limited mountain area, it is difficult to fit a probability distribution to the precipitation fraction due to extremely wet days, the number of extremely wet days, and the number of extremely wet days when precipitation is greater than the 99th percentile. The Lognormal, Johnson, and Loglogistic distributions are the best choices to fit most of the extreme precipitation indices over this area.

1. Introduction

In climate research, precipitation is considered one of the key terms in balancing the energy budget and one of the most challenging aspects of climate modeling, especially for convective precipitation parameterization schemes. Therefore, high-quality estimates of precipitation distribution, amount, and intensity are important for fully interpreting the climate regime and the effects of climate on other fields (e.g., water and agriculture) at a variety of scales from global to local. For daily precipitation, a Markov chain was first suggested for representing the sequence of wet and dry days [1]. This approach has since been widely used to model the occurrence of wet and dry days (e.g., [2] [3]). Building on the Markov chain, a chain-dependent process was also proposed to compute the distribution of the maximum daily precipitation amount and of the total precipitation amount, using State College, Pennsylvania, as an example [4]. In the Markov process, the precipitation state on the next day is assumed to depend on the precipitation state on a number of previous days. The precipitation amount on a wet day is commonly generated using the Gamma distribution [5] [6] [7] [8], an Exponential or mixed Exponential distribution [9] [10] [11], a skewed Normal distribution [12], or a truncated power of Normal distribution [13] [14]. First-order Markov chain-dependent exponential, gamma, mixed-exponential, and lognormal distributions were also applied to model daily precipitation observed at 10 stations in the Yishu River [15]. Reviews of precipitation modeling can be found in [9] [16] [17] [18] [19].
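The occurrence-and-amount structure described above can be illustrated with a minimal sketch: a first-order, two-state Markov chain decides whether each day is wet, and a Gamma distribution supplies the wet-day amount. The function name, the transition probabilities, and the Gamma parameters are hypothetical values chosen for illustration; they are not estimates from any of the cited studies.

```python
import random


def simulate_daily_precipitation(n_days, p_wet_after_dry, p_wet_after_wet,
                                 gamma_shape, gamma_scale, seed=0):
    """Simulate daily precipitation (mm) with a first-order two-state Markov chain.

    Occurrence: today's wet/dry state depends only on yesterday's state.
    Amount: on wet days the depth is drawn from a Gamma distribution.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    series = []
    wet = False  # assumption: the day before the series starts was dry
    for _ in range(n_days):
        p_wet = p_wet_after_wet if wet else p_wet_after_dry
        wet = rng.random() < p_wet
        series.append(rng.gammavariate(gamma_shape, gamma_scale) if wet else 0.0)
    return series


# One synthetic year with hypothetical parameters.
series = simulate_daily_precipitation(365, 0.3, 0.7, 0.8, 12.0)
```

Setting `p_wet_after_wet` higher than `p_wet_after_dry` makes wet days cluster into spells, which is the persistence the Markov chain is meant to capture.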

Wilks [20] used daily precipitation data from 30 locations across the USA to investigate the effects of different formulations for the occurrence and amount components of stochastic daily precipitation models. The study showed that a mixed exponential distribution offered an improvement only when the extremes were not very high (less than 100 mm). Extreme daily precipitation amounts were underestimated by the Gamma model in comparison with the mixed exponential model.

Semenov [21] used weather data from 20 locations with diverse climates, ranging from Baker Lake, Canada, to the desert at Boise, USA, to investigate how well extreme weather events were simulated by weather generators. Annual maximum daily precipitation was modeled using the generalized extreme value distribution, and confidence intervals were computed. The results showed that extreme events within the range of the observations were simulated well.

Olofintoye [22] used a variety of distributions (i.e., Gumbel, Log-Gumbel, Normal, Log-Normal, Pearson, and Log-Pearson) to study peak daily precipitation in Nigeria. Weather data from 20 stations covering a period of 54 years (1952-2005) were used. The chi-square test, Fisher test, correlation coefficient, and coefficient of determination were used to assess the quality of the fits. The results showed that the log-Pearson type III distribution performed best, followed by the Pearson type III and log-Gumbel distributions, which were the best fit at 50%, 40%, and 10% of the stations considered, respectively.

Su [23] simulated extreme precipitation using the Wakeby distribution. Data from 147 stations in the Yangtze River Basin were selected. To quantify the characteristics of high and low extreme precipitation, several probability distributions were applied (i.e., Generalized Extreme Value, Generalized Pareto, Generalized Logistic, and Wakeby). Results showed that the Wakeby distribution simulated the probability distribution of extreme precipitation well, in both observed and projected data.

In most hydrological models, a long series of precipitation data is of great importance for assessing the effects of the precipitation regime on water resources and on environmental and agricultural planning. A temporally and spatially continuous precipitation record can make the computed results more robust. However, it is quite difficult to find observations of weather variables such as precipitation that are dense and evenly spread enough, in both space and time, to cover an entire nation in general and the area of interest in particular. The reason is the limitations of the observation system (e.g., the density of the precipitation measuring network). Furthermore, insufficient terrestrial meteorological observations are considered the most important source of uncertainty in different studies (e.g., [7]). To overcome this problem, a stochastic weather generator (e.g., [24]) can be applied to produce synthetic time series of weather data of unlimited length based on the statistical characteristics of observed weather data for a given location [25].

Vietnam, located on the eastern margin of the Indochinese peninsula, has a climate strongly dominated by the tropical monsoon climate typical of a peninsula in the southeast of the Eurasian continent. The geography of Vietnam is characterized by a long coastline of 3260 kilometers, mountain ranges running from northwest to southeast (in the northeast mountain regions) and from west to east (in the central regions), the central highlands, and a limited low mountain area in southeastern Vietnam. In addition, because of the long and narrow shape of the country, the climate of Vietnam varies considerably from place to place and from time to time. Under global warming, changes in extreme weather events (e.g., heavy precipitation) are uneven, especially for a limited mountain area in southern Vietnam where the weather patterns are governed by a variety of natural conditions (e.g., oceanic air masses and elevation). Thus, it is a crucial task to analyse and characterize the variability of extreme precipitation and its distribution in order to obtain meaningful information for a limited area. This information will be very useful for different purposes, especially in studies of flood events, urban flooding, flash floods, or environmental planning.

The Expert Team on Climate Change Detection and Indices of the World Meteorological Organization and Climate Variability and Predictability suggested 11 extreme precipitation indices [26] [27], later updated to 15, for use in climate change projects. In this study, however, 20 adapted extreme precipitation indices are analysed for a limited mountain area located in southern Vietnam. Daily precipitation from four precipitation measuring stations has been selected to find a well-fitting probability distribution for mountain extreme precipitation in this area.

2. Data and Methods

2.1. Data

A limited mountain area of southern Vietnam is selected to investigate extreme precipitation indices (Figure 1). Daily precipitation data were obtained from the Vietnam Hydro Meteorological Data Center (Table 1). The records from four precipitation measuring stations cover a period of more than 30 years (1981 to 2013). A tropical monsoon climate pattern with two distinguishable seasons (i.e., rainy season and dry season) dominates the climate regime over this area. The rainy season lasts from May to November and the dry season from December to April. Rainy-season precipitation contributes about 93% of the annual precipitation (2000 mm).

Table 2 lists the extreme precipitation indices. All precipitation characteristics are aggregated from daily data. A wet day is defined as a day with precipitation greater than or equal to 0.5 mm.
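As an illustration of how such indices can be aggregated from a daily series, the sketch below computes four of the indices named in this paper: the maximum number of consecutive wet days, the maximum number of consecutive dry days, the number of very heavy precipitation days, and the highest 7-day precipitation amount. The 0.5 mm wet-day threshold comes from the text; the function and key names, the use of "greater than or equal to 50 mm" for very heavy days, and the sample values are assumptions made for the example.

```python
WET_DAY_MM = 0.5  # wet-day threshold stated in the text


def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def highest_k_day_total(daily_mm, k):
    """Largest precipitation total over any k consecutive days."""
    if len(daily_mm) < k:
        return sum(daily_mm)
    window = sum(daily_mm[:k])
    best = window
    for i in range(k, len(daily_mm)):
        window += daily_mm[i] - daily_mm[i - k]  # slide the window by one day
        best = max(best, window)
    return best


def extreme_indices(daily_mm):
    """Compute a few extreme precipitation indices from daily totals (mm)."""
    wet = [p >= WET_DAY_MM for p in daily_mm]
    return {
        "max_consecutive_wet_days": longest_run(wet),
        "max_consecutive_dry_days": longest_run([not w for w in wet]),
        # Assumption: "very heavy" is counted as >= 50 mm.
        "very_heavy_days_ge_50mm": sum(p >= 50.0 for p in daily_mm),
        "highest_7day_amount_mm": highest_k_day_total(daily_mm, 7),
    }


# Hypothetical ten-day record (mm); not data from the study stations.
sample = [0.0, 0.2, 12.0, 55.0, 3.0, 0.0, 0.0, 0.0, 60.5, 1.0]
indices = extreme_indices(sample)
```

In practice the function would be applied to each calendar year separately, giving one value per index per year, and those annual values would form the sample to which a distribution is fitted.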

2.2. Methods

In this study, the normal, lognormal, gamma, loglogistic, beta, Johnson, and exponential probability distributions were considered in order to determine the best fit probability distribution for extreme precipitation. The probability distribution and density functions, their ranges, and the parameters involved are described in Table 3.

Figure 1. Topography map of the study area.

Table 1. Lists of the precipitation measuring stations.

Table 2. Extreme precipitation indices.

Table 3. Description of various probability distribution functions.

A goodness of fit test measures how compatible a random sample is with a theoretical probability distribution. The goodness of fit tests are applied to test the following hypotheses:

H0: the model of extreme precipitation parameter fits the specified distribution

HA: the model of extreme precipitation parameter does not fit the specified distribution.

The Anderson-Darling and Shapiro-Wilks tests were used to determine whether a sample comes from a population with a specific distribution. The chi-square test at the α = 0.05 level of significance was applied for the selection of the best fit probability distribution. Several studies related to these tests can be found in [28] [29] [30] or [31]. According to Anderson and Darling [28], the Anderson-Darling statistic (AD) is defined as follows:

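$$\mathrm{AD} = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln F(x_i) + \ln\left(1 - F(x_{n+1-i})\right)\right] \qquad (1)$$

where $F$ is the cumulative distribution function of the specified distribution, $n$ is the sample size, and $x_1 \le x_2 \le \dots \le x_n$ is the ordered sample.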

This test allows comparing the fit of an observed cumulative distribution function to an expected cumulative distribution function.
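A minimal sketch of the Anderson-Darling computation is given here, using the standard form AD = -n - (1/n) Σ (2i-1)[ln F(x_i) + ln(1 - F(x_{n+1-i}))] over the ordered sample. The helper that fits a lognormal distribution by taking the mean and standard deviation of the log-values is an assumption made for the example and is not necessarily the estimation method used in this study; the data values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling(sample, cdf):
    """Anderson-Darling statistic of `sample` against a hypothesised CDF.

    `cdf` must return values strictly between 0 and 1 for every sample point.
    """
    x = sorted(sample)
    n = len(x)
    total = 0.0
    for i in range(1, n + 1):
        f_low = cdf(x[i - 1])   # F(x_i), with 1-based index i
        f_high = cdf(x[n - i])  # F(x_{n+1-i})
        total += (2 * i - 1) * (math.log(f_low) + math.log(1.0 - f_high))
    return -n - total / n


def lognormal_cdf_from_sample(sample):
    """Fit a lognormal via the log-values and return its CDF (illustrative)."""
    logs = [math.log(v) for v in sample]
    fitted = NormalDist(mean(logs), stdev(logs))
    return lambda v: fitted.cdf(math.log(v))


# Hypothetical annual maximum daily precipitation (mm); not data from the paper.
annual_max_mm = [62.1, 75.4, 81.0, 88.3, 94.7, 101.2, 110.5, 123.9, 140.6, 171.8]
ad_lognormal = anderson_darling(annual_max_mm,
                                lognormal_cdf_from_sample(annual_max_mm))
```

A smaller statistic indicates closer agreement between the empirical and hypothesised cumulative distribution functions. Turning the statistic into a p-value requires critical values that depend on the distribution and on whether its parameters were estimated from the same sample, which this sketch does not cover.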

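The Shapiro-Wilks (SW) test, suggested in [29], calculates a SW statistic. It is defined as follows:

$$\mathrm{SW} = \frac{\left(\sum_{i=1}^{n} a_i x_{(i)}\right)^2}{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2} \qquad (2)$$

where

$x_{(i)}$ are the ordered sample values ($x_{(1)}$ is the smallest) and $\bar{x}$ is the sample mean.

$a_i$ are constants generated from the means, variances and covariances of the order statistics of a sample of size $n$ from a normal distribution:

$$(a_1, \dots, a_n) = \frac{m^{\mathsf{T}} V^{-1}}{\left(m^{\mathsf{T}} V^{-1} V^{-1} m\right)^{1/2}} \qquad (3)$$

where $m$ is the vector of expected values of the order statistics of a standard normal sample of size $n$ and $V$ is the covariance matrix of those order statistics.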

3. Results

Statistical characteristics of the extreme precipitation indices for a limited mountain area in Vietnam are shown in the Supplementary (Tables 5-8). They are the mean, standard deviation, skewness, kurtosis, maximum, and minimum values. It was observed that the maximum and minimum of very heavy precipitation days and extreme heavy precipitation days can reach sixteen and thirteen days in a year at Long Thanh station, respectively. The largest value of the maximum number of consecutive wet days is found at Long Thanh and Cam My stations (49 days), followed by Xuan Loc (36 days). Meanwhile, the largest value of the maximum number of consecutive dry days is found at Cam My station (126 days), followed by Long Thanh station (114 days). The largest value of the highest 7-day precipitation amount is found at Thong Nhat station (508.9 mm), followed by Xuan Loc station (442.2 mm). The maximum coefficient of skewness was observed at Xuan Loc station for the maximum number of consecutive wet days.
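For reference, the summary statistics listed above can be computed as in the following sketch. The paper does not state which estimators were used, so the moment-based skewness, the excess kurtosis (normal distribution equals zero), and the sample standard deviation with an n - 1 denominator are assumptions; the function name `describe` is also hypothetical.

```python
import math


def describe(values):
    """Summary statistics of one index (one value per year)."""
    n = len(values)
    mean = sum(values) / n
    # Central moments with denominator n.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "std": math.sqrt(m2 * n / (n - 1)),  # sample standard deviation
        "skewness": m3 / m2 ** 1.5,          # moment coefficient of skewness
        "kurtosis": m4 / m2 ** 2 - 3.0,      # excess kurtosis
        "max": max(values),
        "min": min(values),
    }


stats = describe([1.0, 2.0, 3.0, 4.0, 5.0])
```

A strongly positive skewness, as reported for the maximum number of consecutive wet days at Xuan Loc, is one reason right-skewed candidates such as the lognormal and loglogistic distributions are worth testing alongside the normal distribution.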

In this study, each test statistic was evaluated at the α = 0.05 level of significance. For both the Anderson-Darling and Shapiro-Wilks tests, the probability distributions were ranked from 1 to 7 based on the minimum test statistic value. Where the test value is not significant at the α level, the distribution is marked as zero. To find the best fit distribution, the scores of each probability distribution were totaled from the cumulative ranking, and the distribution with the highest total score was selected as the best fit. Where two probability distributions have the same score, the test value is additionally considered as a criterion. The p-values of the statistical tests are presented in the Supplementary (Tables 9-12).
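One possible reading of this scoring scheme is sketched below. It assumes that a distribution whose fit is rejected (p-value below α) scores zero, that the remaining distributions are ranked by ascending test statistic with the smallest statistic earning the most points, and that scores are summed over the tests. The function names and the numbers in the example are hypothetical, and the tie-breaking rule mentioned above is not implemented.

```python
ALPHA = 0.05


def rank_scores(results, alpha=ALPHA):
    """Score distributions for one test.

    `results` maps distribution name -> (test_statistic, p_value).
    Rejected fits (p_value < alpha) score 0. The others are ranked by
    ascending statistic; the smallest statistic earns len(results) points.
    """
    scores = {name: 0 for name in results}
    accepted = sorted(
        (name for name, (_, p_value) in results.items() if p_value >= alpha),
        key=lambda name: results[name][0],
    )
    top = len(results)
    for position, name in enumerate(accepted):
        scores[name] = top - position
    return scores


def best_fit(per_test_results, alpha=ALPHA):
    """Sum the scores over all tests; return (best name or None, totals)."""
    totals = {}
    for results in per_test_results:
        for name, score in rank_scores(results, alpha).items():
            totals[name] = totals.get(name, 0) + score
    best = max(totals, key=totals.get)
    if totals[best] == 0:
        return None, totals  # no candidate distribution was accepted
    return best, totals


# Hypothetical (statistic, p-value) pairs for two tests and three candidates.
test_a = {"lognormal": (0.31, 0.52), "gamma": (0.45, 0.25), "normal": (1.20, 0.01)}
test_b = {"lognormal": (0.30, 0.30), "gamma": (0.35, 0.41), "normal": (0.90, 0.02)}
best, totals = best_fit([test_a, test_b])
```

Returning `None` when every total is zero corresponds to the case where no candidate distribution fits an index at the α level. For a statistic where larger values indicate a better fit, the ordering would need to be reversed before ranking.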

Based on the total test score calculated for each index across the seven probability distributions, the best probability distribution selected for each data set is presented in Table 4. These distributions were identified using the maximum total score from the selected goodness of fit tests. As shown in Table 4, it is noteworthy that none of these probability distributions fits, at the α level, the precipitation fraction due to extremely wet days, the number of extremely wet days, the number of extremely wet days when precipitation is greater than the 99th percentile, or the extreme heavy precipitation days (≥100 mm). The one exception is the precipitation fraction due to extremely wet days at Xuan Loc station, for which the lognormal distribution is the best fit.

Table 4. Best fit probability distributions for extreme precipitation indices.

The lognormal, Johnson, and loglogistic distributions were shown to fit most of the extreme precipitation indices over this area. In particular, the Johnson distribution was found to be the best fit distribution for the number of very heavy precipitation days (greater than 50 mm) for a limited mountain area, as presented in Table 4.

4. Conclusions

The results indicated a large range of fluctuation during the study period for the maximum number of consecutive wet days, from 4 days (minimum) to 49 days (maximum), and for the maximum number of consecutive dry days, from 13 days (minimum) to 126 days (maximum). The number of heavy precipitation days could reach up to 25% of the days in a year (e.g., at Xuan Loc station). The maximum annual precipitation (nearly 2900 mm) was seen at Long Thanh station. The analysis revealed a high precipitation potential over this area. Given the large recorded amounts, the highest 1-, 3-, 5-, and 7-day precipitation amounts could contribute significantly to potential extreme flood events.

The best probability distributions were found to differ between extreme precipitation indices. In general, the Johnson, Loglogistic, and Lognormal distributions are the best choices for most of the extreme precipitation indices for a limited mountain area. Over this area, the best probability distributions for the highest 3-, 5-, and 7-day precipitation amounts are Lognormal and Loglogistic. Therefore, the author recommends that the Lognormal, Loglogistic, and Johnson distributions be investigated first in studies dealing with extreme precipitation indices for other limited mountain areas, where data are usually difficult to gather.

Supplementary

Table 5. Summary of statistics for extreme precipitation indices at Xuan Loc station.

Table 6. Summary of statistics for extreme precipitation indices at Long Thanh station.

Table 7. Summary of statistics for extreme precipitation indices at Cam My station.

Table 8. Summary of statistics for extreme precipitation indices at Thong Nhat station.

Table 9. Summary of Anderson-Darling and Shapiro-Wilks tests of probability distributions statistics for extreme precipitation indices at Thong Nhat station.

Table 10. Summary of Anderson-Darling and Shapiro-Wilks tests of probability distributions statistics for extreme precipitation indices at Long Thanh station.

Table 11. Summary of Anderson-Darling and Shapiro-Wilks tests of probability distributions statistics for extreme precipitation indices at Cam My station.

Table 12. Summary of Anderson-Darling and Shapiro-Wilks tests of probability distributions statistics for extreme precipitation indices at Xuan Loc station.

Table 13. Total score of probability distributions for extreme precipitation indices at Thong Nhat station.

Table 14. Total score of probability distributions for extreme precipitation indices at Cam My station.

Table 15. Total score of probability distributions for extreme precipitation indices at Long Thanh station.

Table 16. Total score of probability distributions for extreme precipitation indices at Xuan Loc station.

Conflicts of Interest

The authors declare no conflicts of interest.

Cite this paper

Thanh, N. (2017) Fitting a Probability Distribution to Extreme Precipitation for a Limited Mountain Area in Vietnam. Journal of Geoscience and Environment Protection, 5, 92-107. doi: 10.4236/gep.2017.55007.

[1] Gabriel, K.R. and Neumann, J. (1962) A Markov Chain Model for Daily Rainfall Occurrence at Tel Aviv. Quarterly Journal of the Royal Meteorological Society, 88, 90-95. https://doi.org/10.1002/qj.49708837511
[2] Haan, C.T., Allen, D.M. and Street, J.O. (1976) A Markov Chain Model for Daily Rainfall. Water Resources Research, 12, 443-449. https://doi.org/10.1029/WR012i003p00443
[3] Todorovic, P. and Woolhiser, D.A. (1976) Stochastic Structure of the Local Pattern of Precipitation. In: Shen, H.W., Ed., Stochastic Approaches to Water Resources, Vol. II, Fort Collins, 15.1-15.37, Vol. 49, 765-769.
[4] Katz, R.W. (1977) Precipitation as a Chain-Dependent Process. Journal of Applied Meteorology, 16, 671-676. https://doi.org/10.1175/1520-0450(1977)016<0671:PAACDP>2.0.CO;2
[5] Aksoy, H. (2000) Use of Gamma Distribution in Hydrological Analysis. Turkish Journal of Engineering and Environmental Sciences, 24, 419-428.
[6] Jones, J.W., Colwick, R.E. and Threadgill, E.D. (1972) A Simulated Environmental Model of Temperature, Evaporation, Rainfall and Soil Moisture. ASAE, 15, 366-372. https://doi.org/10.13031/2013.37909
[7] Nguyen, T.T. (2016) Improved Meteorological Data for Hydrological Modeling in the Tropics under Climate Change.
[8] Wilks, D.S. (1998) Multisite Generalization of a Daily Stochastic Precipitation Generation Model. Journal of Hydrology, 210, 178-191.
[9] Richardson, C.W. (1981) Stochastic Simulation of Daily Precipitation, Temperature, and Solar Radiation. Water Resources Research, 17, 182-190. https://doi.org/10.1029/WR017i001p00182
[10] Woolhiser, D.A. and Pegram, G.G.S. (1979) Maximum Likelihood Estimation of Fourier Coefficients to Describe Seasonal Variations of Parameters in Stochastic Daily Precipitation Models. Journal of Applied Meteorology, 18, 34-42. https://doi.org/10.1175/1520-0450(1979)018<0034:MLEOFC>2.0.CO;2
[11] Woolhiser, D.A. and Roldan, J. (1982) Stochastic Daily Precipitation Models: 2. A Comparison of Distributions of Amounts. Water Resources Research, 18, 1461-1468. https://doi.org/10.1029/WR018i005p01461
[12] Nicks, A.D. and Lane, L.J. (1989) Weather Generators. USDA-Water Erosion Prediction Project: Hillslope Profile Version. Report No. 2, West Lafayette, IN, Washington DC.
[13] Bárdossy, A. and Plate, E. (1992) Space-Time Model for Daily Rainfall Using Atmospheric Circulation Patterns. Water Resources Research, 28, 1247-1259. https://doi.org/10.1029/91WR02589
[14] Hutchinson, M.F. (1995) Stochastic Space-Time Weather Models from Ground-Based Data. Agricultural and Forest Meteorology, 73, 237-264.
[15] Liu, Y., Zhang, W., Shao, Y. and Zhang, K. (2011) A Comparison of Four Precipitation Distribution Models Used in Daily Stochastic Models. Advances in Atmospheric Sciences, 28, 809-820. https://doi.org/10.1007/s00376-010-9180-6
[16] Buishand, T.A. (1978) Some Remarks on the Use of Daily Rainfall Models. Journal of Hydrology, 36, 295-308.
[17] Chapman, T.G. (1997) Stochastic Models for Daily Rainfall in the Western Pacific. Mathematics and Computers in Simulation, 43, 351-358.
[18] Richardson, C.W. (1985) Weather Simulation for Crop Management Models. Transactions of the ASAE, 28, 1602-1606. https://doi.org/10.13031/2013.32484
[19] Srikanthan, R. and McMahon, T.A. (2001) Stochastic Generation of Annual, Monthly and Daily Climate Data: A Review. Hydrology and Earth System Sciences, 5, 653-670. https://doi.org/10.5194/hess-5-653-2001
[20] Wilks, D.S. (1999) Interannual Variability and Extreme-Value Characteristics of Several Stochastic Daily Precipitation Models. Agricultural and Forest Meteorology, 93, 153-169.
[21] Semenov, M.A. (2008) Simulation of Extreme Weather Events by a Stochastic Weather Generator. Climate Research, 35, 203-212. https://doi.org/10.3354/cr00731
[22] Olofintoye, O.O., Sule, B.F. and Salami, A.W. (2009) Best-Fit Probability Distribution Model for Peak Daily Rainfall of Selected Cities in Nigeria. New York Science Journal, 2, 1-12.
[23] Su, B., Kundzewicz, Z.W. and Jiang, T. (2009) Simulation of Extreme Precipitation over the Yangtze River Basin Using Wakeby Distribution. Theoretical and Applied Climatology, 96, 209-219. https://doi.org/10.1007/s00704-008-0025-5
[24] Semenov, M.A. and Barrow, A. (2002) LARS-WG: A Stochastic Weather Generator for Use in Climate Impact Studies. Hertfordshire, UK.
[25] IPCC (2007) Climate Change 2007: The Physical Science Basis. Contribution of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge University Press, Cambridge, UK and New York, USA.
[26] Karl, T.R., Nicholls, N. and Ghazi, A. (1999) Clivar/GCOS/WMO Workshop on Indices and Indicators for Climate Extremes: Workshop Summary. Climatic Change, 42, 3-7. https://doi.org/10.1023/A:1005491526870
[27] Peterson, T.C. (2005) Climate Change Indices. WMO Bulletin, 54, 83-86.
[28] Anderson, T.W. and Darling, D.A. (1954) A Test of Goodness of Fit. Journal of the American Statistical Association, 49, 765-769. https://doi.org/10.1080/01621459.1954.10501232
[29] Shapiro, S.S. and Wilk, M.B. (1965) An Analysis of Variance Test for Normality (Complete Samples). Biometrika, 52, 591-611. https://doi.org/10.1093/biomet/52.3-4.591
[30] D’Agostino, R.B. and Stephens, M.A. (1986) Goodness-of-Fit Techniques. Marcel Dekker, New York.
[31] Royston, J.P. (1982) An Extension of Shapiro and Wilk’s W Test for Normality to Large Samples. Journal of the Royal Statistical Society. Series C (Applied Statistics), 31, 115-124. https://doi.org/10.2307/2347973