Ultimate Olympics Records in Athletics Using Extreme Value Theory

Extreme value theory provides methods to analyze the most extreme parts of data. We used the generalized extreme value (GEV) distribution to predict the ultimate 100 m, 200 m, 400 m, 4 × 100 m relay, and long jump records of male gold medalists at the Olympics. Diagnostic plots for the GEV model fitted to each event's records supported the validity of the model. The 100 m, 200 m, 400 m, 4 × 100 m relay, and long jump records had negative shape parameters and calculated upper limits of 9.58 s, 19.18 s, 42.97 s, 36.71 s, and 9.03 m, respectively. The calculated upper limit in the 100 m (9.58 s) was equal to the record of Usain Bolt (August 16, 2009). The 100 m and 200 m world records were close to the calculated upper limits, and achieving the calculated limits was difficult. The 400 m and 4 × 100 m relay world records were almost equal to the calculated upper limits and the 500-year return level estimates, and slight improvement was possible in both. For the winning records at the Tokyo Olympics in August 2021, the probability of occurrence in one year was about 1/30 in the 100 m, 200 m, and 4 × 100 m relay, and about 1/20 in the 400 m and long jump. In the 100 m, 200 m, and 4 × 100 m relay, the more difficult records show that a fierce battle took place.


Introduction
Extreme value theory (EVT) has emerged as one of the most important statistical disciplines in applied science. Extreme value techniques are also widely used in other disciplines, such as financial market risk assessment and telecommunications traffic prediction [1]. EVT deals with statistical problems concerning the far tail of a probability distribution and is unique as a statistical tool in that it develops models and techniques to describe the unusual event rather than the usual. Using EVT, the theoretical distribution that the maximum value follows, together with its population parameters, is estimated from long-term observation data. The maximum or large value that occurs in a given period can then be predicted from the calculated results. EVT was developed in the 1920s and has been used to predict events such as droughts and flooding [2] and financial risk [3]. Extreme value modeling has also been applied in biomedical data processing [4], the thermodynamics of earthquakes [5], and public health [6]. The ultimate records in athletics have been calculated using extreme value theory [7] [8] [9]. This study predicts the 100 m, 200 m, 400 m, 4 × 100 m relay, and long jump records for men at the Olympics using extreme value theory.

Data
We used the records of male gold medalists at the Olympics in the 100 m, 200 m, 400 m, 4 × 100 m relay, and long jump [10].

Method
EVT is concerned with phenomena of extreme data. We used the block maxima method for modeling the extremes of a stationary time series, in which consecutive observations are grouped into non-overlapping blocks of length n, generating a series of m block maxima, M_{n,1}, …, M_{n,m}, to which the generalized extreme value (GEV) distribution can be fitted for some large value of n. The usual approach considers blocks of a given time length, thus yielding maxima at regular intervals [1]. Here a block was taken to be a year, i.e., annual maximum values were used. Although the block maxima method is suitable for analyzing maximum value data, it has the disadvantages of being easily affected by a single realized value and of giving estimators with large variance.
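The grouping step of the block maxima method can be sketched in a few lines of Python. This is only an illustration: the function name `block_maxima`, the choice to drop an incomplete final block, and the sample observations are assumptions made here and are not taken from the paper, which used R.

```python
def block_maxima(series, block_length):
    """Split `series` into consecutive non-overlapping blocks and
    return the maximum of each complete block.

    A trailing partial block is dropped so that every maximum
    summarizes exactly `block_length` observations.
    """
    if block_length <= 0:
        raise ValueError("block_length must be positive")
    n_blocks = len(series) // block_length
    return [
        max(series[i * block_length:(i + 1) * block_length])
        for i in range(n_blocks)
    ]


# Hypothetical observations, not data from the paper.
observations = [3.1, 4.7, 2.2, 5.0, 1.9, 4.4, 6.3, 0.8, 2.5, 3.3]
maxima = block_maxima(observations, block_length=3)
print(maxima)
```

The resulting list plays the role of the series M_{n,1}, …, M_{n,m} to which a GEV distribution would then be fitted.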
When data are taken to be the maxima (or minima) over certain blocks of time (such as annual maximum precipitation), it is appropriate to use the GEV distribution:
$$G(z) = \exp\left\{-\left[1 + \xi\left(\frac{z-\mu}{\sigma}\right)\right]^{-1/\xi}\right\} \qquad (1)$$
where z are extreme values from blocks, μ is a location parameter, σ is a scale parameter, and ξ is a shape parameter. G(z) is defined for all z such that 1 + ξ(z − μ)/σ > 0 for ξ ≠ 0, and for all z for ξ = 0, where Equation (1) is interpreted as its limit as ξ → 0. Three families of GEV distributions are defined depending on the value of ξ. We get the Fréchet distribution with a heavy tail for ξ > 0, the Gumbel distribution with a lighter tail for ξ = 0, and the Weibull distribution with a finite upper endpoint for ξ < 0.
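As an illustration of the distribution function and its support, the following Python sketch evaluates the standard GEV form G(z) = exp{−[1 + ξ(z − μ)/σ]^(−1/ξ)}, with the Gumbel limit used for ξ = 0. The function name `gev_cdf` and the parameter values are assumptions chosen for illustration; the paper itself used the evd package in R.

```python
import math


def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function with location mu, scale sigma > 0, shape xi."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    s = (z - mu) / sigma
    if xi == 0.0:
        # Gumbel case: the xi -> 0 limit of the general formula.
        return math.exp(-math.exp(-s))
    t = 1.0 + xi * s
    if t <= 0.0:
        # Outside the support: below the lower endpoint when xi > 0,
        # above the upper endpoint when xi < 0.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


# Hypothetical parameters, chosen only to illustrate the three families.
for xi in (0.2, 0.0, -0.2):
    print(xi, round(gev_cdf(1.0, mu=0.0, sigma=1.0, xi=xi), 4))
```

With ξ = −0.2, μ = 0, and σ = 1, the support ends at μ − σ/ξ = 5, so the function returns 1 for any z beyond that point.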
Because we are interested in the smallest value, i.e., the fastest time, the 100 m, 200 m, 400 m, and 4 × 100 m relay data must be multiplied by −1 to fit the framework of extreme value statistics, which considers maxima.
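The sign flip can be checked on a toy example. The times below are hypothetical and are not the Olympic data used in the paper; `to_time` is a helper name introduced here for illustration.

```python
# Hypothetical winning times in seconds, not data from the paper.
times = [10.06, 9.95, 9.99, 9.92, 9.96, 9.84]

# Negate so that the fastest (smallest) time becomes the largest value.
negated = [-t for t in times]

fastest = min(times)
largest_negated = max(negated)

# The maximum of the negated series is the negated minimum of the original.
assert largest_negated == -fastest


def to_time(negated_level):
    """Map a level on the negated scale (e.g. a return level) back to seconds."""
    return -negated_level


print(fastest, to_time(largest_negated))
```

Any quantity estimated on the negated scale, such as a return level or an upper limit, is mapped back to a time in the same way, so an upper limit for the negated data becomes a lower limit for the running time.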
If a GEV distribution is fitted to observations, it becomes possible to estimate the probability of an event that has not yet been observed. Estimates of extreme quantiles of the annual maximum distribution are obtained by inverting Equation (1):
$$z_p = \begin{cases} \mu - \dfrac{\sigma}{\xi}\left[1 - \{-\log(1-p)\}^{-\xi}\right], & \xi \neq 0, \\ \mu - \sigma \log\{-\log(1-p)\}, & \xi = 0, \end{cases}$$
where G(z_p) = 1 − p. z_p is the return level associated with the return period 1/p, since, to a reasonable degree of accuracy, z_p is expected to be exceeded on average once every 1/p years. More precisely, z_p is exceeded by the annual maximum in any particular year with probability p [1].
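The inversion can be sketched in Python using the standard GEV quantile formula, and checked by confirming that G(z_p) = 1 − p. The function names and the parameter values are assumptions for illustration only; in particular, MU, SIGMA, and XI below are hypothetical values on the negated-time scale, not the estimates reported in the paper's tables.

```python
import math


def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function for xi != 0, inside the support only."""
    t = 1.0 + xi * (z - mu) / sigma
    return math.exp(-t ** (-1.0 / xi))


def return_level(p, mu, sigma, xi):
    """Level z_p satisfying G(z_p) = 1 - p, for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    y = -math.log(1.0 - p)
    if xi == 0.0:
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


# Hypothetical parameters on the negated-time scale; NOT the paper's estimates.
MU, SIGMA, XI = -10.2, 0.15, -0.3

for period in (10, 100, 500):
    p = 1.0 / period
    z_p = return_level(p, MU, SIGMA, XI)
    # Round trip: plugging z_p back into G should recover 1 - p.
    assert abs(gev_cdf(z_p, MU, SIGMA, XI) - (1.0 - p)) < 1e-9
    print(f"{period}-year return level: {-z_p:.2f} s")
```

Because the hypothetical ξ is negative, the return levels increase with the return period but stay below μ − σ/ξ on the negated scale.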
The GEV modeling was performed using the evd package in R.
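For orientation, the objective that such a fit maximizes can be written out. The expression below is the standard GEV log-likelihood for block maxima z_1, …, z_m with ξ ≠ 0; it assumes the parameters are estimated by maximum likelihood, which is what the fitting function `fgev` in evd does, and it is a sketch of the general method rather than a description of the author's exact settings.

```latex
\ell(\mu, \sigma, \xi)
  = -m \log \sigma
    - \left(1 + \frac{1}{\xi}\right) \sum_{i=1}^{m}
        \log\left[1 + \xi\left(\frac{z_i - \mu}{\sigma}\right)\right]
    - \sum_{i=1}^{m}
        \left[1 + \xi\left(\frac{z_i - \mu}{\sigma}\right)\right]^{-1/\xi},
\qquad
1 + \xi\left(\frac{z_i - \mu}{\sigma}\right) > 0 \ \text{for all } i .
```

The estimates of μ, σ, and ξ are the values that maximize this function subject to the positivity constraint.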
We also tried a non-stationary GEV model, but it did not work.

100 m Run
The GEV parameter estimates obtained from the GEV modeling of the 100 m records of male gold medalists using the block maxima method are shown in Table 1. The GEV parameters were estimated using the maximum likelihood method. Estimates of the maximum return levels for the return periods of 10, 20, 50, 100, and 500 years, along with their respective 95% confidence intervals (CI), are shown in Table 2. We estimated the 10-year return level to be 10.03 s, with a 95% CI of (9.88, 10.18). We estimated the 100-year return level to be 9.69 s, with a 95% CI of (9.60, 9.78). In other words, there is approximately a 1% chance (1/100) each year that the 100 m record will be 9.69 s or faster. Likewise, there is approximately a 10% chance (1/10) each year that the 100 m record will be 10.03 s or faster.
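The yearly probabilities quoted above can be converted into probabilities over a longer horizon. The sketch below assumes that years are independent and that the fitted model does not change over time; both are simplifying assumptions made here for illustration, and `prob_at_least_once` is a hypothetical helper name.

```python
def prob_at_least_once(return_period, years):
    """Probability that a level with the given return period (in years)
    is exceeded at least once during `years` independent years."""
    if return_period < 1:
        raise ValueError("return_period must be at least 1 year")
    if years < 0:
        raise ValueError("years must be non-negative")
    p = 1.0 / return_period          # exceedance probability in one year
    return 1.0 - (1.0 - p) ** years  # complement of "never exceeded"


for period in (10, 100, 500):
    one_year = prob_at_least_once(period, 1)
    century = prob_at_least_once(period, 100)
    print(period, round(one_year, 4), round(century, 4))
```

Under these assumptions, a 100-year return level has a 1% chance of being exceeded in any single year but roughly a 63% chance of being exceeded at least once in 100 years.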

200 m Run
Estimates of the GEV parameters obtained from the GEV modeling of the 200 m records using the block maxima method are shown in Table 3.
Because ξ was negative, the 200 m records of male gold medalists had a finite upper limit. The maximum return levels are shown in Table 4. The diagnostic plots for assessing the accuracy of the GEV model fitted to the 200 m records are shown in Figure 5. The return level curve is nonlinear because ξ is negative. The diagnostic plots supported the fitted GEV model.
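Why a negative ξ implies a finite upper limit can be seen from the standard GEV return level formula, written here in the notation of the Method section. The shorthand y_p is introduced only for this sketch.

```latex
z_p = \mu - \frac{\sigma}{\xi}\left[1 - y_p^{-\xi}\right],
\qquad y_p = -\log(1 - p).
% As p -> 0 we have y_p -> 0. For \xi < 0 the exponent -\xi is positive,
% so y_p^{-\xi} -> 0 and the return level converges to a finite value:
\lim_{p \to 0} z_p = \mu - \frac{\sigma}{\xi} \qquad (\xi < 0).
```

For running events, where the times were multiplied by −1 before fitting, this limit on the negated scale corresponds to a fastest attainable time of −(μ − σ/ξ).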

400 m Run and 4 × 100 m Relay
Estimates of the GEV parameters obtained from the GEV modeling of the 400 m records of male gold medalists using the block maxima method are shown in Table 5. Since ξ was negative, the 400 m records of male gold medalists had a finite upper limit. The maximum return levels are shown in Table 6. The diagnostic plots for assessing the accuracy of the GEV model fitted to the 400 m records of male gold medalists are shown in Figure 6. The return level curve is nonlinear because ξ is negative. The diagnostic plots supported the validity of the GEV model.
Estimates of the GEV parameters obtained from the GEV modeling of the 4 × 100 m relay records of male gold medalists using the block maxima method are shown in Table 7. Since ξ was negative, the 4 × 100 m relay records of male gold medalists had a finite upper limit. The maximum return levels are shown in Table 8. The diagnostic plots for assessing the accuracy of the GEV model fitted to the 4 × 100 m relay records of male gold medalists are shown in Figure 7. The return level curve is nonlinear because ξ is negative. The diagnostic plots supported the fitted GEV model.

Long Jump
Estimates of the GEV parameters obtained from the GEV modeling of the long jump records of male gold medalists using the block maxima method are shown in Table 9. Since ξ was negative, the long jump records of male gold medalists had a finite upper limit. The predicted maximum return levels are shown in Table 10. The diagnostic plots for assessing the accuracy of the GEV model fitted to the long jump records of male gold medalists are shown in Figure 8. The return level curve is nonlinear because ξ is negative. The diagnostic plots supported the fitted GEV model.

Discussion
The return level plots for the 100 m and 200 m records of male gold medalists are shown on log-log axes in Figure 9, together with approximately straight lines. The 200 m record shows a larger decrease and a larger margin for improvement. The calculated upper limit was 9.58 s in the 100 m, which is equal to the record of Usain Bolt (2009). Einmahl (2011) estimated the ultimate world record for men to be 9.51 s [8]. The return level plots for the 400 m and 4 × 100 m relay are shown on log-log axes in Figure 10, together with approximately straight lines. The 400 m record shows a larger decrease and a larger margin for improvement. The slope of the approximate straight line in Figure 9 and Figure 10 ("a" in Table 11) is largest in magnitude for the 200 m (−0.01130) and smallest in magnitude for the 400 m (−0.009324). The larger the magnitude of the slope, the greater the improvement in future records. The calculated upper limits were 43.03 s and 36.71 s in the 400 m and 4 × 100 m relay, respectively. In the 400 m, the 500-year return level for men was 43.03 s, with a 95% CI of (42.86, 43.24). Hence, the probability of occurrence in one year of Wayde van Niekerk's world record of 43.03 s (2016) was 1/500. In the 4 × 100 m relay, the 500-year return level for men was 36.85 s, with a 95% CI of (36.54, 37.14). Hence, the probability of occurrence in one year of Jamaica's world record of 36.84 s (2012) was 1/500.
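A straight line on log-log axes corresponds to a power law y = bx^a, whose exponent a is the slope of the line. A minimal sketch of such a fit by ordinary least squares on the logarithms follows. The fitting procedure used for Table 11 is not stated in the text, so this is an assumed method; `fit_power_law` is a hypothetical helper, and the points are synthetic, generated from a known power law rather than taken from the paper.

```python
import math


def fit_power_law(x, y):
    """Least-squares fit of log(y) = log(b) + a * log(x); returns (a, b)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    if sxx == 0.0:
        raise ValueError("x values must not all be equal")
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    a = sxy / sxx                       # slope on log-log axes
    b = math.exp(mean_y - a * mean_x)   # prefactor from the intercept
    return a, b


# Synthetic points generated from y = 10.3 * x ** (-0.01); not the paper's data.
periods = [10, 20, 50, 100, 500]
levels = [10.3 * t ** -0.01 for t in periods]

a, b = fit_power_law(periods, levels)
print(round(a, 5), round(b, 3))
```

A more negative a means that the return level, here a time in seconds, falls faster as the return period grows.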
The return level plot for the long jump records is shown on log-log axes in Figure 11, together with an approximately straight line. The calculated upper limit in the long jump at the Olympics was 9.03 m. The long jump world record (8.95 m) was close to the calculated upper limit. Growth was small beyond the 100-year return period. Table 11 shows the results when the approximate straight line is written as y = bx^a.
Denny (2008) followed an extreme value theory-related approach [11]. Denny's predicted records for the 100 m, 200 m, and 400 m are 9.48, 18.63, and 42.73 s, respectively. These results agree with our 9.58, 19.18, and 42.97 s for the 100 m, 200 m, and 400 m, respectively. In our previous paper [12], the calculated upper limit in the 100 m for men using the world records for 1970-2010 was 9.46 s, which agrees closely with Denny's result.
The slopes of the regression lines of the records are shown in Table 13. The slopes of the regression lines were roughly proportional to the running distance for the 100 m, 200 m, and 4 × 100 m relay. In the 100 m, the slope of the regression line was the smallest, and records were difficult to improve. In the density plot (Figure 4), the 100 m showed almost a single peak, and the differences between records were small. The tendency was similar in the 100 m and 200 m, but the record improvements and the differences between records were smaller in the 100 m. In the 400 m, the slope of the regression line was the largest, and the record growth was large. The density plot (Figure 6) showed two peaks, and the records were widely distributed. The 4 × 100 m relay had smaller record improvements and smaller differences between records. In the 100 m, an individual's innate ability is thought to have a significant influence.
At the Tokyo Olympics in August 2021, the records of male gold medalists were 9.

Conclusions
Extreme value theory provides methods to analyze the most extreme parts of data. We used the generalized extreme value (GEV) distribution to predict the ultimate 100 m, 200 m, 400 m, 4 × 100 m relay, and long jump records of male gold medalists at the Olympics. The results are summarized as follows:
1) Diagnostic plots for the GEV model fitted to each event's records supported the validity of the model.
2) The 100 m, 200 m, 400 m, 4 × 100 m relay, and long jump records had negative shape parameters and calculated upper limits of 9.58 s, 19.18 s, 42.97 s, 36.71 s, and 9.03 m, respectively. The calculated upper limit in the 100 m (9.58 s) was equal to the record of Usain Bolt (August 16, 2009).
3) The 100 m and 200 m world records were close to the calculated upper limits, and achieving the calculated limits was difficult. The 400 m and 4 × 100 m relay world records were almost equal to the calculated upper limits and the 500-year return level estimates, and slight improvement was possible in both.
4) For the winning records at the Tokyo Olympics in August 2021, the probability of occurrence in one year was about 1/30 in the 100 m, 200 m, and 4 × 100 m relay, and about 1/20 in the 400 m and long jump. In the 100 m, 200 m, and 4 × 100 m relay, the more difficult records show that a fierce battle took place.
5) The relationship between the return level of the records and the return period follows a power law.

Conflicts of Interest
The author declares no conflicts of interest regarding the publication of this paper.