1. Introduction
The earthquake magnitude probability distribution is one of the underlying input data for certain earthquake potential assessments, such as probabilistic seismic hazard analysis (PSHA) that is aimed at estimating the site-specific annual rate of a ground motion exceedance; say, the annual rate of PGA > 0.1 g at a study site is equal to 0.01 per year (Cornell, 1968 [1] ; Roshan and Basu, 2010 [2] ; Liu et al., 2013 [3] ; Yazdani et al., 2012 [4] ; Peñarubia et al., 2020 [5] ; Anbazhagan et al., 2009 [6] ). Figure 1 shows the observed earthquake magnitude probability distribution based on more than 50,000 earthquakes occurring around Taiwan from 1980 to 2021. Figure 2 shows the epicenters of the earthquakes.
Nonetheless, the observed distribution is not used in PSHA, but the simulation is. To the best of our knowledge, the method proposed by McGuire and Arabasz (1990) [7] is the “standard” approach for obtaining the simulated magnitude distribution used by PSHA (e.g., Wang et al., 2013 [8] ). The method is based on simple probability calculations that would be elaborated in one following section. Note that the conventional method (McGuire and Arabasz, 1990 [7] ) was referred to as the M-A method hereafter.
Figure 1. The observed magnitude distribution based on the earthquakes around Taiwan from 1980 to 2021.
Figure 2. The epicenters of the earthquakes around Taiwan from 1980 to 2021.
Using the M-A method, Figure 3(a) shows the simulated magnitude probability distribution along with the observed one around the region of Taiwan (Figure 2). Compared to the observation, the two look like in good agreement, but actually the normalized differences (i.e., (observation – simulation)/observation) are quite substantial, in the range of –12% - 35% as shown in Figure 3(b).
(a) (b)
Figure 3. (a) the observed and simulated magnitude probability distributions using the M-A method; (b) the normalized difference, i.e., (observation – simulation)/observation, between the observation and simulation.
Like the normal distribution that is commonly used in statistical and probabilistic studies, the gamma distribution has also been utilized in a variety of studies (e.g., Okagbue et al., 2020 [9] ). For example, in medical science, it was found that the sum of basal and menstrual iron losses in women can be modeled by the gamma distribution satisfactorily (Yokoi, 2020 [10] ). In archaeology, the age-at-death profile of a caprine was considered a random variable following the gamma distribution, which helped reconstruct the herd management strategies (Timpson et al., 2018 [11] ).
In addition, Xie and Wu (2017) [12] found that using the gamma distribution in finance engineering is also advantageous. Specifically, they found that the gamma distribution can better model the volatility of financial assets. In ocean engineering, the joint probability density function of the steepness and height of deep water waves can be well modeled by a bivariate gamma distribution (Antão and Soares, 2016 [13] ), and in operational research, Snyder (1984) [14] considered that using the gamma distribution can help manage the inventory more effectively than using other probability distributions (e.g., normal distribution). Such “gamma” examples can go on and on.
This study introduces a new method to simulate the magnitude probability distributions, which is based on one of the commonly used probability distributions in statistical studies, namely, the gamma distribution. Based on the seismicity in four different regions, we statistically obtained the observed magnitude distribution, and used the current M-A method and the proposed “gamma” method to generate the respective simulations. Accordingly, the conclusion was drawn from the data and the analysis.
2. Methodology
2.1. G-R Law and M-A Method
The aforementioned M-A method for simulating the observed earthquake magnitude probability distribution was based on the Gutenberg-Richter (G-R) recurrence law (Gutenberg and Richter, 1944 [15] ). Therefore, before introducing the technical details of the M-A method, the G-R law was first introduced in the following.
The G-R law was discovered by Gutenberg and Richter (1944) [15] when they examined the relationship between “the logarithm of earthquake numbers” and “the magnitude of exceedance” in California. Mathematically, it can be expressed as follows:
(1)
where
denotes the number of earthquakes with magnitude above
, and a and b are the model parameters, also known as the a-value and b-value in engineering seismology. Figure 4 shows the Gutenberg-Richter relationship based on the earthquake data around Taiwan (Figure 2), and indeed “the logarithm of earthquake numbers” and “the magnitude of exceedance” were well correlated with R2 > 0.99 (coefficient of determination). Specifically, the slope1 of the line is called the b-value, which is equal to 1.06 in this case.
Therefore, using the G-R law, we can estimate the number of earthquakes with magnitude above m*:
(2)
As a result, the number of earthquakes with magnitude between m1 and m2 (m1 < m2) becomes:
(3)
Similarly, given the cutoff magnitude and the maximum magnitude as m0 and mmax, respectively, the earthquake number with magnitude within the boundary is:
(4)
Therefore, within the given range of m0 and mmax, the probability of the earthquake magnitude between m1 and m2 is equal to Equation (3) divided by Equation (4) based on the fundamental of probability calculation:
(5)
Because 10a is a constant, it can be canceled out; then Equation (5) becomes:
(6)
Figure 4. The G-R law and the b-value for the earthquake data around Taiwan (Figure 2).
Essentially, such a derivation (from Equation (2) to Equation (6)) based on the G-R law (Equation (1)) is the methodology of the M-A method for obtaining a simulated earthquake magnitude probability. As expressed in Equation (6), the simulation is governed by the b-value of the G-R law. In other words, as long as the b-value of the study region is available, we can develop the simulated earthquake magnitude probability distribution using the M-A method. For instance, the simulated magnitude probability distribution around Taiwan (shown in Figure 4) was based on the b-value of 1.06 using the M-A method.
2.2. Skewness and Gamma Distribution
In probability and statistics, skewness is the third central moment of a random variable (Ang and Tang, 2007 [16] ), which is an indicator to the level of symmetry of the variable. For instance, for a random variable following the normal distribution, its skewness is equal to 0, because the normal distribution is perfectly symmetrical by definition. By contrast, the observed magnitude probability distribution (shown in Figure 1) is highly skewed, or the distribution is very asymmetrical.
Therefore, for modeling a highly-skewed observation, the perfectly symmetrical probability distributions (such as the normal distribution and uniform distribution) can be ruled out. Instead, a more “versatile” probability distribution, which can be either symmetrical or asymmetrical, will be more suitable for the simulation, such as the gamma distribution.
The probability density function (PDF) of the gamma distribution is as follows (Ang and Tang, 2007 [16] ):
(7)
where α and β are the model parameters of the gamma distribution;
denotes the gamma function as:
(8)
Originally, the “gamma variable” is in the space of [0, ∞]. Nevertheless, if a gamma variable is in the space of [x0, ∞], the gamma distribution is still applicable to modeling the variable by “shifting” or modifying Equation (8) as:
(9)
where x0 is a real value.
2.3. Model Calibration
As the gamma distribution is chosen and used for the simulation, the next step is to calibrate the two model parameters, i.e., α and β. In the old days, the task could be very difficult with hand calculation, but it becomes so easy nowadays by using a personal computer along with the software like Excel. Specifically, we used Excel in this study for the model calibration.
More technical details concerning the model calibration are as follows: 1) choosing a proper pair of initial values of α and β; 2) substituting the initial value into Equation (9) to calculate the interim gamma distribution; 3) by setting the objective as the minimum difference between model and observation in Excel Solver, obtain the parameter that can best-fit the observation.
2.4. Chi-Square Test
The chi-square test is one of the commonly used goodness-of-fit tests for statistical analyses, and its essence is to calculate the difference between observational and theoretical frequencies in the histogram; then based on the difference, the hypothesis (e.g., earthquake magnitude follows the gamma distribution) would be rejected or accepted with statistical significance. In particular, the difference is referred to as the chi-square value (
) in the chi-square test, which was defined as (Ang and Tang, 2007 [16] ):
(10)
where ti and oi denote the theoretical and observational frequency; n is the number of data bins in the histogram. Understandably, when the chi-square value is 0, the inference is that the model and observation are in perfect fit. On the contrary, the larger the chi-square value is, the greater the difference between model and observation is. In other words, when the difference between model and observation is too large, it is very unlikely that the observation would follow the theoretical distribution with statistical significance.
Therefore, like any other statistical goodness-of-fit tests, the chi-square test is subject to a given level of significance for determining the threshold or critical value. Usually, a 5% level of significance is used in statistical studies (Wang et al., 2011 [17] ). After the critical value is determined based on the given level of significance, the practitioner will compare the critical value to the calculated difference or the chi-square value between model and observation. When the difference is greater than the critical value, the hypothesis is rejected with statistical significance; otherwise it will be accepted.
2.5. USGS Earthquake-Catalog Database
For “reinforcing” this study, we conducted more case studies in addition to the study using the earthquake data around Taiwan. However, since we do not own the earthquake catalogs in different regions than Taiwan, we utilized to the open database available on the website of the U.S. Geological Survey (https://earthquake.usgs.gov/earthquakes/search/) (USGS, 2023 [18] ).
Figure 5 shows the interface of the online USGS earthquake database (USGS, 2023 [18] ). Basically, users simply need to input the cutoff earthquake magnitude, the target searching region and the target time span, and then the earthquake data will be compiled in an Excel file for the users to download. In addition, the USGS earthquake database features the user-friendly functionality: users can “draw rectangle on map” (see the bottom right corner in Figure 5) to “input” the searching area of interest by this very simple act, without looking up their longitudes and latitudes in advance.
Nevertheless, it must be noted that although most earthquakes in the USGS database are in moment magnitude (Mw), some are in local magnitude (ML), surface wave magnitude (Ms), body wave magnitude (Mb), and duration/code magnitude (Md). As a result, we had to convert those into moment magnitudes for consistency purpose in order to obtain a more meaningful result. Therefore, the following relationships were used (Kadirioğlu and Kartal, 2016 [19] ):
(11)
(12)
(13)
(14)
(15)
3. Results
3.1. Taiwan
Using Excel Solver, Figure 6 shows the best-fit gamma distribution with α = 1.29 and β = 0.37, in comparison to the observed magnitude probability distribution in Taiwan. It shows that the two are in very good agreement, so the inference is that the earthquake magnitude could be a gamma variable, or the observed magnitude distribution could follow the gamma distribution.
Figure 5. The interface of the USGS online earthquake database.
Figure 6. The observed earthquake magnitude probability distribution around Taiwan and the simulation using the gamma distribution.
To quantitatively validate the presumption, we conducted the chi-square test, and the numbers we obtained are as follows: 1) the critical value with 5% level of significance is equal to 9.5; 2) the difference between the observed and simulated frequencies (based on Equation (10)) is equal to 5.9. As a result, with the difference less than the critical value, the hypothesis—earthquake magnitude distribution follows gamma distribution—was accepted with statistical significance.
3.2. Japan
Figure 7 shows the study area around Japan, which is also the searching area for collecting the earthquake data from the USGS database. More than 10,000 earthquakes with cutoff magnitude 4.5 were collected from 1980 to 2021. Accordingly, Figure 8 shows the G-R law for the areas, and the b-value is equal to 1.07.
Next, also based on the observed earthquake data, we conducted the statistical calculation to develop the observed earthquake magnitude distribution. Then with the b-value = 1.07, we used the M-A method to develop the simulated distribution. Finally, based on the observed magnitude distribution, we calibrated the best-fit gamma distribution for simulating the observation. The three are shown in Figure 9 altogether.
As shown in Figure 9, the same outcome was found: 1) the gamma distribution can better simulate the observation than the conventional M-A method; 2) based on the chi-square test, the hypothesis was accepted: earthquake magnitude is a gamma variable or the earthquake magnitude probability distribution follows the gamma distribution.
Figure 8. The G-R law for the region of Japan.
3.3. California
Figure 10 shows the searching area around California. From 1980 to 2021, more than 10,000 earthquakes with cutoff magnitude of 4.5 were also collected from the USGS database for this study area. Accordingly, Figure 11 shows the G-R law for this area, and the b-value is equal to 0.92.
Also based on the observed earthquake data, we conducted the statistical calculation to develop the observed earthquake magnitude distribution. Next, with the b-value = 0.92, we used the M-A method to develop the simulated distribution. Finally, based on the observed magnitude distribution, we calibrated the best-fit gamma distribution to model the observation. The three are shown in Figure 12 altogether.
As shown in Figure 12, the same outcome was also found: 1) the gamma distribution can better simulate the observation; 2) the hypothesis (magnitude is a gamma variable) was also accepted.
Figure 9. The observed earthquake probability distribution for the area of Japan, along with the two simulations generated with the conventional and proposed methods.
Figure 11. The G-R law for the region of California.
Figure 12. The observed earthquake probability distribution for the area of California, along with the two simulations generated with the conventional and proposed methods.
3.4. Greece and Turkey
Figure 13 shows the searching area around Greece and Turkey within the same time span as the previous cases. Accordingly, Figure 14 shows the G-R law for the area, and the b-value is equal to 1.09.
With the observed earthquake data, we conducted the statistical calculation to develop the observed earthquake magnitude distribution. Next, with the b-value = 1.09, we used the M-A method to develop the simulated distribution as the previous cases. Finally, based on the observed magnitude distribution, we calibrated the best-fit gamma distribution to model the observation. The three are shown in Figure 15 altogether.
As shown in Figure 15, the same outcome was also found from this case study: 1) the gamma distribution can better simulate the observation than the conventional M-A method; 2) the hypothesis was accepted based on the chi-square test: the earthquake magnitude distribution follows the gamma distribution.
Figure 13. The study area around Greece and Turkey.
Figure 14. The G-R law for the region of Greece and Turkey.
Figure 15. The observed earthquake probability distribution for the area of Greece and Turkey, along with the two simulations generated with the conventional and proposed methods.
3.5. Summary
The four case studies all demonstrated that the proposed “gamma” method can better model the observed earthquake magnitude distribution than the conventional M-A method. Besides, the four case studies also show that the hypothesis—earthquake probability distribution follows the gamma distribution—was accepted by the chi-square test. Table 1 summarizes the results of the four case studies, including model parameters, chi-square values, etc.
4. Discussions
4.1. Log-Delog Transformation
Given that the Gutenberg-Richter relationships (e.g., Figure 4) are evident, and the earthquake magnitude probability derived by the M-A method is flawless (i.e., Equation (2) to Equation (6)), it is puzzling why the method cannot model the observed earthquake magnitude histogram satisfactorily (e.g., Figure 3). We consider the cause is attributed to the “log-delog” transformation, which is elaborated in the following.
As mentioned previously, the Gutenberg-Richter recurrence law describes a strong linear relationship between “the logarithm of earthquake number” and “the magnitude of exceedance” (i.e., m*). Nevertheless, as shown in Figure 4, although the regression relationship is indeed very strong with R2 greater than 0.99, it is not a perfect fit with R2 = 1 nonetheless. In other words, the difference between the prediction and observation is still in existence. Taking Figure 4 for example, the observed value of logNM≥4.5 is equal to 3.16, and the predicted value of logNM≥4.5 (on the regression line) is equal to 3.24; the difference is “only” 0.08.
Table 1. Summary of the result of using the gamma distribution to model the observed magnitude probability distribution.
* On 5% level of significance. ** Earthquake magnitude follows gamma distribution.
However, we care to realize and use “earthquake number” in certain earthquake assessments, not “logarithm of earthquake numbers.” As a result, the “de-log” calculation is necessary in the process. Using the example above: given logNM≥4.5 = 3.16, thus the observed earthquake number with magnitude above 4.5 (i.e., NM≥4.5) is equal to 1447 (=103.16) after the “de-log” calculation. Similarly, given the predicted value of logNM≥4.5 = 3.24, the predicted earthquake number is 1717 (=103.24) after de-log. Suddenly, the difference of 0.08 (=3.24 – 3.16) in the log scale is amplified to 270 (=1717 – 1447) in the “normal” scale. Therefore, this is the fundamental cause that the conventional M-A method, based on
, cannot well capture the observed magnitude probability distribution. By contrast, the proposed “gamma” method does not utilize the G-R law, and the log-delog transformation is not involved in the process, either.
4.2. User-Friendliness
Indeed, it is easy or user-friendly to use the conventional M-A method to generate the (simulated) earthquake magnitude probability distribution, as long as the practitioners have obtained the b-value from the Gutenberg-Richter relationship. This might be the reason that the M-A method is widely used nowadays for obtaining a simulated earthquake magnitude probability distribution.
Nevertheless, the proposed method is as user-friendly as the M-A method. All the practitioners need to do is calibrate the two model parameters with Excel Solver (or other similar software), as we just have done it for this study.
4.3. “Try-And-True”
As other statistical fitting, the finding of this study is a result of “try-and-true.” In fact, this study tried several asymmetrical probability distributions for this study, including the lognormal distribution, beta distribution, Weibull distribution, and obviously, the gamma distribution.
Actually, we found that the Weibull distribution can also better model the observed magnitude probability distributions than the M-A method. However, the performance of the Weibull distribution is about 12% less than the gamma distribution, in terms of the calculated difference between the observed and simulated frequencies.
4.4. Earthquake Prediction
The reason we developed different seismic hazard mitigations, such as seismic hazard analysis, is that the earthquake prediction seems still impractical as we speak (Geller et al., 1997 [20] ). Geller et al. (1997) [20] attributed it to the fact that the stress and strain conditions of “the material” several kilometers deep where earthquakes are initialized cannot be monitored by modern instrumentations because they cannot be installed there with our current technology.
However, recent studies show some promising perspectives toward earthquake prediction from different “angles” than focusing on the law of physics about material failure. The studies include the use of the very-low-frequency electromagnetic emission as the precursor of earthquakes (Kachakhidze and Kachakhidze-Murphy, 2022 [21] ), and the observation of the cloud anomaly as the observed value exceeds 2 - 3 times of the standard deviation (Guo, 2021 [22] ). As a result, with more collective efforts spent on earthquake prediction continuously with various “unconventional” methods in addition to the “conventional” mechanics perspective, we believe that we can keep making progress in earthquake prediction, and one day the human being can predict a major earthquake with our knowledge.
4.5. PSHA “Controversy”
PSHA seems a controversial subject in engineering seismology. Although many PSHA case studies have been conducted, a few studies commenting on PSHA’s robustness were also reported. We suggested the proponents of PSHA could review the paper by Mulargia et al. (2017) [23] and provide some rebuttal, in an effort to make the analysis more objective, transparent, and robust.
5. Conclusion
Based on four case studies, this paper shows that using the gamma distribution to model the observed earthquake magnitude probability distribution is superior to the conventional M-A method. The reason that the conventional method cannot well capture the observed magnitude probability distribution is due to the log-delog transformation in the process, while it is not needed in the newly proposed (gamma) method. Besides, the new method is as user-friendly as the conventional approach, with the model parameters can be easily calibrated with Excel Solver.
Acknowledgements
The authors appreciate the unbiased and constructive comments from the Editors and Reviewers. The financial support on the research from National Science and Technology Council (Taiwan) is also appreciated (project no. 110-2221-E-008-017-MY2).
NOTES
1Actually, b = the slope × (−1).