Optimal Threshold Determination for the Maximum Product of Spacing Methodology with Ties for Extreme Events

Extreme events are values of a variable that fall below or above a certain value called the threshold. A well-chosen threshold helps to identify the extreme levels. Several methods have been used to determine the threshold in order to analyze and model extreme events. One of the most successful is the maximum product of spacing (MPS) method. However, the method breaks down when there is a tie in the exceedances. This study offers a way to model data even when they contain ties. To do so, an optimal threshold that gives more optimal parameters for extreme events was determined. The study achieved its main objective by deriving a method that improves the MPS method for determining an optimal threshold for extreme values in a data set containing ties, estimating the Generalized Pareto Distribution (GPD) parameters for the optimal threshold derived, and comparing these GPD parameters with those determined through the standard MPS model. The study used the GPD and peaks-over-threshold (POT) methods as the basis for identifying extreme values. It will help statisticians in different sectors of the economy to model extreme events involving ties. To statisticians, the structure of the extreme levels in the tails of ordinary distributions is very important in analyzing, predicting and forecasting the likelihood of occurrence of an extreme event.


Introduction
Certain values in the tails of any distribution represent extreme events, and they are pointers to eventualities. The values in the tails are rare and few, but they can have a great impact on the conclusions reached by analysts. Different sectors of life experience extreme events, and we mention just a few here. According to [1] and [2], extremely low agricultural production results in famine if the agriculture depends on rainfall. This means either that the amount of rain experienced in the region was so low that crops dried up [3], or that rainfall was so high that it destroyed all the crops that had been planted. [4], studying extreme rainfall in a mountainous region, and [5], studying extreme rainfall in West Africa, observed that how low or high the amount of rainfall is depends on the threshold attached to the rainfall in that region. In the insurance industry, [6], while discussing tools in finance and insurance, noted that extremely high claims by customers can be very dangerous for a company, while extremely low claims can be very beneficial for the company's profit. This means that there is a critical level that the insurance company would wish not to be surpassed and, if it is, according to [7], the company must be prepared for this eventuality. Very high emissions of waste products from manufacturing industries are detrimental to the environment and the ozone layer. However, countries must continue to industrialize or expand their industries for economic prosperity. A certain level of emissions must not be exceeded; otherwise the environment and the ozone layer would be destroyed. The critical value which, if exceeded, causes an eventuality to occur is called a threshold. The events beyond this threshold are called extreme events, and they lie in the tails of the distribution. Extreme value theory (EVT) is a tool that attempts to provide the best possible estimate of the tail area of the distribution.
According to [8], in work on the importance of tail dependence in bivariate frequency analysis, there are two principal kinds of model for extreme values. The oldest group is the block maxima models; these are models for the largest observations collected from large samples of identically distributed observations. According to [9] and [10], the block maxima/minima methods are fitted with the generalized extreme value (GEV) distribution. A more modern group is the peaks-over-threshold (POT) models; these are models for all large observations that exceed a high threshold. The POT models are generally considered the most useful for practical applications, for a number of reasons. First, by taking all exceedances over a suitably high threshold into account, they use the data more efficiently. Second, they are easily extended to situations where one wants to study how the extreme levels of a variable Y depend on some other variable X; for instance, Y may be the level of tropospheric ozone on a particular day and X a vector of meteorological variables for that day. This kind of problem is almost impossible to handle through the annual maxima/minima method. POT methods are used where the exceedances are modeled to understand the behavior of the data in the tails.
Many methods of determining an optimal threshold have been developed. The most common is the graphical method proposed by [11]. This method is, however, subjective and requires experts to determine the threshold. The most successful method is the maximum product of spacing (MPS) method, also called the maximum spacing estimation (MSE) method, which was proposed by Cheng [12] and Ranneby [13] as an alternative to the maximum likelihood estimation (MLE) method. A threshold approach for peaks over threshold using MPS was carried out by [14], who noted that the selection of a threshold is an important and challenging problem. [15], while studying traditional estimation methods and MPS for the Generalized Inverted Exponential Distribution, found that MPS outperformed the MLE and least squares (LSE) methods on the basis of the K-S distance and the Akaike Information Criterion (AIC). The MPS method, however, encounters a problem whenever the exceedances contain a tie. This study intended to offer a solution to this problem.
Open Journal of Modelling and Simulation

Improved MPS Methodology
The MPS method yields efficient estimators in non-regular cases where the MLE may not exist. This is especially relevant to the GEV distribution, for which the MLE does not exist when $\varepsilon < -1$. According to [12], maximum spacing estimators are sensitive to closely spaced observations, and especially to ties. In cases of ties, some scholars have suggested that only one value of each tie be retained [16], [17]. Let [...] This leads to the modified MPS method as [...] In case 1 [...]
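The sensitivity to ties can be seen in a minimal Python sketch. It uses the standard mean log-spacing objective and the standard GPD distribution function, not the modified method derived in this paper. The function names, the sample values and the parameter values are hypothetical, and the workaround shown is only the "keep one value of each tie" suggestion attributed above to [16] and [17].

```python
import numpy as np

def gpd_cdf(x, theta, sigma, eps):
    """Standard GPD distribution function (location theta, scale sigma, shape eps)."""
    z = np.maximum((np.asarray(x, dtype=float) - theta) / sigma, 0.0)
    if abs(eps) < 1e-12:
        return 1.0 - np.exp(-z)                # zero-shape (exponential) case
    base = np.maximum(1.0 + eps * z, 0.0)      # zero beyond the upper endpoint when eps < 0
    return 1.0 - np.power(base, -1.0 / eps)

def mean_log_spacing(sample, theta, sigma, eps):
    """Standard MPS objective: mean of log spacings of the fitted CDF."""
    x = np.sort(np.asarray(sample, dtype=float))
    cdf = np.concatenate(([0.0], gpd_cdf(x, theta, sigma, eps), [1.0]))
    spacings = np.diff(cdf)                    # a tie gives a spacing of exactly zero
    with np.errstate(divide="ignore"):
        return float(np.mean(np.log(spacings)))

# Hypothetical exceedances containing one tie (1.1 appears twice).
tied = np.array([0.4, 1.1, 1.1, 2.5, 3.0])
score_tied = mean_log_spacing(tied, theta=0.0, sigma=1.0, eps=0.1)

# Workaround from the literature: keep one value of each tie.
score_unique = mean_log_spacing(np.unique(tied), theta=0.0, sigma=1.0, eps=0.1)

print(score_tied, score_unique)
```

With the tie present the objective is minus infinity for every parameter value, so there is nothing to maximize; after dropping the duplicate it is finite, at the cost of discarding an observation.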

Estimation of Generalized Pareto Distribution Using the Modified MPS Method
To estimate the parameters, we substitute the GPD into the MPS method. This leads to two cases of estimating the GPD parameters.

Case 1: When $\varepsilon \neq 0$
In this case: [...] and [...] The estimation of the parameters involves taking partial derivatives of Equation (10) with respect to each of the parameters and setting the results to zero. For the estimation of $\varepsilon$, the first term on the R.H.S. is worked out as: [...] implying that [...] Working out the second term of Equation (10): [...] Therefore, [...] And the last term of Equation (10): [...] Therefore, [...] Similarly, the next parameter to estimate is $\sigma$, from [...] Finally, we estimate $\theta$ as follows: [...] and [...] Therefore, after partially differentiating Equation (10) with respect to the parameters, we get the normal Equations (23), (24) and (25), where the terms [...] (12), (14) and (16) respectively.
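For orientation, the standard textbook quantities behind this case can be written out. This is not a reconstruction of the paper's numbered equations: Equation (10) and the tie modification may take a different form. The symbols $D_i$, $S$ and the ordered observations $x_{(i)}$ are notation assumed here.

```latex
% GPD distribution function for shape \varepsilon \neq 0, on x \ge \theta
F(x;\theta,\sigma,\varepsilon) = 1 - \left(1 + \varepsilon\,\frac{x-\theta}{\sigma}\right)^{-1/\varepsilon}

% Spacings of the ordered observations x_{(1)} \le \dots \le x_{(n)},
% with the conventions F(x_{(0)}) = 0 and F(x_{(n+1)}) = 1
D_i = F(x_{(i)}) - F(x_{(i-1)}), \qquad i = 1,\dots,n+1

% Interior spacings, 2 \le i \le n
D_i = \left(1 + \varepsilon\,\frac{x_{(i-1)}-\theta}{\sigma}\right)^{-1/\varepsilon}
    - \left(1 + \varepsilon\,\frac{x_{(i)}-\theta}{\sigma}\right)^{-1/\varepsilon}

% Standard MPS objective and its stationarity conditions
S(\theta,\sigma,\varepsilon) = \frac{1}{n+1}\sum_{i=1}^{n+1}\ln D_i, \qquad
\frac{\partial S}{\partial\varepsilon} = \frac{\partial S}{\partial\sigma} = \frac{\partial S}{\partial\theta} = 0
```

In this standard form a tie $x_{(i)} = x_{(i-1)}$ gives $D_i = 0$, so $\ln D_i$ is undefined; this is the breakdown that the modified method is meant to avoid.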

Case 2: When $\varepsilon = 0$
Parameters under this case are estimated here. When $\varepsilon = 0$, the spacings become: [...] Therefore, [...] The equations for estimating $\theta$ can also be derived from [...] Next, let $K_2^*$ and $P$ be defined as in Equation (32), so that $K$ is defined as in Equation (15), implying that: [...] Therefore, after partially differentiating Equation (29) with respect to the parameters, we get the normal Equations (39) and (40), where the terms [...] (33) and (35) respectively.
where the terms [...] (36), (37) and (38) respectively. The parameters were obtained from Equations (39) and (40) using numerical optimization procedures. Suitable values of $k$ and $\theta$ for which the gamma distribution would produce a long tail were selected. In this case, a simulation using $k = 2.6$ and $\theta = 1:1000$ was made, and the density of the simulation is shown in Figure 1.
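The numerical step can be illustrated for the zero-shape case. The sketch below is not the paper's procedure: it fixes the threshold at the 95% sample quantile (a hypothetical choice), assumes a gamma sample of size 1000 with shape 2.6 and unit scale (an assumed reading of the simulation settings), and estimates only the scale $\sigma$ by maximizing the standard mean log-spacing with SciPy's `minimize_scalar`. The names `neg_mean_log_spacing` and `sigma_hat` are illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def neg_mean_log_spacing(sigma, exceedances):
    """Negative standard MPS objective for the GPD with zero shape (exponential tail)."""
    y = np.sort(np.asarray(exceedances, dtype=float))
    cdf = np.concatenate(([0.0], 1.0 - np.exp(-y / sigma), [1.0]))
    spacings = np.diff(cdf)
    with np.errstate(divide="ignore"):
        return -float(np.mean(np.log(spacings)))

rng = np.random.default_rng(0)
sample = rng.gamma(shape=2.6, scale=1.0, size=1000)    # assumed simulation settings

threshold = np.quantile(sample, 0.95)                  # hypothetical fixed threshold
exceedances = sample[sample > threshold] - threshold   # peaks over the threshold

# One-dimensional bounded search over the scale parameter.
result = minimize_scalar(
    neg_mean_log_spacing,
    bounds=(1e-3, 50.0),
    args=(exceedances,),
    method="bounded",
)
sigma_hat = result.x
print(threshold, exceedances.size, sigma_hat)
```

A bounded one-dimensional search is enough here because only the scale is free; estimating the location or the shape as well would call for a multivariate optimizer and constraints that keep every spacing positive.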

Simulation Study
The x-axis of Figure 1 represents the number of values generated for the specified parameters of the gamma distribution. The y-axis represents the density of those values. Figure 1 indicates that the majority of the values are concentrated [...] When the two-parameter standard and improved models were used, the results in Table 1 were obtained from the simulation.
The threshold (location parameter) from the improved MPS model was higher than that obtained through the standard MPS model. [...] To assess the performance of the parameters in the standard and improved models, the GPD parameters determined in Table 1 and Table 2 were back-tested on the simulated data. Table 3 contains a summary of the values obtained. From [...]

Effects of Number of Repetitions on Threshold
In this section, the effect of number of repetitions on threshold was investigated.
The gamma distribution with parameters [...] According to [16], some values of each tie have to be dropped, leaving only one value of each tie, for the standard MPS to work. Therefore, samples of size 300 with ungrouped ties of 0, 20, 40 and 60 repetitions were each subjected to the two-parameter standard MPS model. The results obtained are shown in Table 4.
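How the tied samples were built is not recoverable from the text, so the following Python sketch shows one hypothetical construction: a sample of size 300 in which a given number of entries repeat values already present, followed by the "one value of each tie" reduction that the standard MPS needs. The helper `sample_with_ties`, the gamma parameters and the seed are assumptions made for illustration.

```python
import numpy as np

def sample_with_ties(n, repetitions, rng, shape=2.6, scale=1.0):
    """Gamma sample of size n in which `repetitions` entries duplicate earlier values."""
    distinct = rng.gamma(shape, scale, size=n - repetitions)
    if repetitions == 0:
        return distinct
    repeats = rng.choice(distinct, size=repetitions, replace=True)
    return np.concatenate((distinct, repeats))

rng = np.random.default_rng(1)
sizes = {}
for repetitions in (0, 20, 40, 60):
    sample = sample_with_ties(300, repetitions, rng)
    reduced = np.unique(sample)            # keep one value of each tie
    sizes[repetitions] = (sample.size, reduced.size)
    print(repetitions, sample.size, reduced.size)
```

Under this construction the reduced sample shrinks by exactly the number of repetitions, which shows how much data the standard model gives up as ties increase.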
The threshold (location) value increased as ties increased. However, the scale parameter increased from 5.993496 to 9.99465 and then decreased as the number of ties increased. The samples with grouped ties were subjected to the two-parameter improved MPS model, and the results obtained are shown in Table 5. It was observed that the location parameter (threshold value) increased as the number of ties increased. The scale parameter increased from 4.097801 to [...]

Location Parameter
When there was no tie, Figure [...] The trend observed in the two-parameter models (Figure 3) was also observed in the performance of the three-parameter models (Figure 4). However, the drop at 32 repetitions in the three-parameter models was not as large as in the two-parameter models. From the two plots above, the improved MPS model performed better than the standard MPS model as ties increased.

Shape Parameter
The two-parameter standard and improved models have location and scale parameters only. These models have a zero shape parameter. For the three-parameter standard and improved models, the shape parameter performed as shown in Figure 7.
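The link between the two-parameter models and a zero shape parameter follows from a standard limit of the GPD distribution function, written here in assumed notation with location $\theta$, scale $\sigma$ and shape $\varepsilon$:

```latex
\lim_{\varepsilon \to 0}\left[\,1 - \left(1 + \varepsilon\,\frac{x-\theta}{\sigma}\right)^{-1/\varepsilon}\right]
  = 1 - \exp\!\left(-\frac{x-\theta}{\sigma}\right), \qquad x \ge \theta
```

The limiting form is an exponential distribution shifted to $\theta$, which is why only the location and scale remain to be estimated in the two-parameter case.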
The two models had the same shape parameter when the data had no [...]

Back Testing Two Parameter Models
The thresholds determined from the different samples were back-tested on the sample data from which they were obtained, to assess their performance.
For the two-parameter models at 0 repetitions, both models, the standard and the improved MPS model [...]

Back Testing Three Parameter Model
To assess the performance of the threshold obtained when the GPD had the shape parameter, back testing was done on the samples through the two models, and the results are shown in Table 9.
For 0 repetitions, the number of observations above the threshold was 16 in both models (Table 9).
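The counting step of such a back test can be sketched in Python. Only the count of observations above a threshold is described in the text; the function name `backtest_threshold`, the reported share and the sample values below are hypothetical.

```python
import numpy as np

def backtest_threshold(sample, threshold):
    """Return the number and the share of observations strictly above the threshold."""
    x = np.asarray(sample, dtype=float)
    n_above = int(np.sum(x > threshold))
    return n_above, n_above / x.size

# Hypothetical data and threshold, for illustration only.
sample = np.array([1.2, 3.4, 0.7, 5.9, 2.2, 8.1, 4.4, 6.3])
count, share = backtest_threshold(sample, threshold=5.0)
print(count, share)
```

Applying the same function to thresholds from the standard and improved models, on the same sample, gives directly comparable exceedance counts.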

Conclusion
This study helped to improve the MPS model by introducing the concept of f to both the two-parameter and three-parameter models (Equations (23), (24), (25), (39) and (40)). Through simulation, the improved MPS two-parameter and three-parameter models yielded a higher threshold than the two standard MPS models (Table 1). The improved model helps to yield a more optimal threshold, which in turn would help different sectors of the country's economy to be adequately prepared for any eventuality.