Estimating the Components of a Mixture of Extremal Distributions under Strong Dependence
1. Introduction
In many applications of Statistics, finite mixture models have been widely used to describe the distribution of data. A finite mixture model is a distribution that may be written as a finite convex combination of distributions belonging to parametric classes. For instance, a mixture of k normal distributions, each one with its own mean and variance, is a basic example. Its parameters are $k-1$ non-negative weights (only $k-1$, because the weights sum to one) and the $2k$ means and variances, making a total of $3k-1$ parameters. Finite mixture models, and techniques for estimating their unknown parameters, have been studied in depth both in theory and in specific applications. These developments include the expectation-maximization (EM) algorithm and its variants [1] [2] [3] [4] [5] .
It should be noted that the parametric classes of distributions involved in the mixture may be different. For instance, one may consider a mixture of a normal distribution and an exponential one.
In a general, abstract framework, one of the first questions to answer about a finite mixture model is whether it is identifiable, that is, whether a given distribution is expressed by a unique combination of the parameters involved. If the finite mixture model is not identifiable, estimation is seriously affected, because different sets of parameters lead to the same distribution.
More recently, finite mixtures of extremal distributions have received increasing attention in both theoretical and applied work [6] [7] [8] [9] . Proposition 2.3.3 of [6] shows that finite mixtures of extremal distributions are identifiable. This makes it possible to estimate the weights and the parameters of the extremal components from a random iid sample.
A recent paper [10] gives another reason to pay attention to finite mixtures of extremal distributions. Its Theorem 1 mimics Fisher-Tippett-Gnedenko theory to study the asymptotic distribution of the maximum of a large sample. It shows that, if the data are non-stationary and strongly dependent, then under very mild assumptions the limit distribution is a finite mixture of extremal distributions instead of a single extremal one.

Consider a sample consisting of the maxima of blocks of a large number of continuous measurements, fitted to a Generalized Extreme Value (GEV) distribution. If testing or diagnostic analysis rejects the fit, the rejection may be due to undetected strong dependence and non-stationarity in the data. Moreover, in many real data sets, in particular in environmental studies, non-stationarity and strong dependence should be expected.

Consider the case mentioned in [11] , where each observation is the maximum wind speed registered by an online anemometer in a 10-minute period; such data may be well fitted by a mixture of extremal distributions. Suppose several years of data are available. One year contains 52,560 periods of 10 minutes, and wind speed is affected by global phenomena that induce dependence across years. As a result, one finds significant correlation between observations at lags of the order of $10^5$ (or more), and non-stationarity is often evident.
Therefore, we need a method for estimating the components of a finite mixture of extremal distributions from large samples of strongly dependent and non-stationary data. Such a method would substantially improve the statistical analysis of large samples of complex environmental data.
This is the focus of the paper. More precisely, we first recall the strongly dependent and non-stationary models presented in [10] and propose an estimation method for the components of a mixture of $k$ extremal distributions. We focus on mixtures of Fréchet distributions, for the sake of simplicity and because they correspond to the heaviest-tailed data. Further, we prove the consistency of our estimators and illustrate their performance on data simulated from the models presented in [10] . We check the quality of the fit of the estimated model to the data with the test for this type of models provided in [11] .
Therefore, the method introduced here is a new and effective tool for the statistical analysis of strongly dependent data, as required in several environmental applications.
2. Preliminary Results
We first recall, in condensed form, the main result of [10] . We assume that the reader is familiar with classical Fisher-Tippett-Gnedenko theory, in particular with concepts such as the maximum domain of attraction (MDA, in what follows). For references on the topic, as well as examples of its wide range of applications to real data, see [12] [13] [14] [15] .
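Our data will be
$$X_i = X_i^{(Y_i)}, \quad i \ge 1,$$
with $Y_i \in \{1, \dots, k\}$. Here, for $j = 1, \dots, k$, $X^{(j)} = (X_i^{(j)})_{i \ge 1}$ is a real-valued process, and $Y = (Y_i)_{i \ge 1}$ indicates which component is observed at each time i. We will assume the following hypotheses:
(H1) For any j,
$$\frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{Y_i = j\} \xrightarrow[n \to \infty]{\text{a.s.}} c_j,$$
where $c_j$ is a positive random variable. More precisely, let $\mathcal{T} = \bigcap_{n \ge 1} \sigma(Y_m : m \ge n)$ be the tail $\sigma$-algebra of Y. For any j, $c_j$ is $\mathcal{T}$-measurable. Hence, if $\mathcal{T}$ is trivial (which means weak dependence of the process Y), the $c_j$ are deterministic. If $\mathcal{T}$ is not trivial (which means strong dependence of the process Y), then for some j, $c_j$ may be non-deterministic.
(H2) For any j, $c_j$ assumes a finite number of values.
(H3) The three following conditions are fulfilled.
1) For any j, $X^{(j)}$ is iid.
2) Y satisfies (H1) and (H2).
3) The processes $X^{(1)}, \dots, X^{(k)}$ and Y are independent.
(H4) For any $j = 1, \dots, k$, the process $X^{(j)}$ belongs to the MDA of the GEV $G_j$. Here $G_1$ is the most heavy-tailed of them and corresponds to a Fréchet distribution of order $\alpha > 0$. We denote by $\Phi_\alpha(x) = \exp(-x^{-\alpha})$, $x > 0$, the standard Fréchet distribution of order $\alpha$.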
We are now in a position to state the main result of [10] .
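Theorem 1 of [10] .
Under (H3) and (H4) there exists a random variable Z such that
$$\frac{\max(X_1, \dots, X_n)}{a_n} \xrightarrow[n \to \infty]{d} Z,$$
where $(a_n)$ is a normalizing sequence for the heaviest-tailed component, that is, $P(X_1^{(1)} \le a_n x)^n \to \Phi_\alpha(x)$ for all $x > 0$.
In addition:
1) If $\mathcal{T}$ is trivial, then the distribution of Z is $\Phi_\alpha(x)^{c_1} = \exp(-c_1 x^{-\alpha})$, $x > 0$.
2) If $\mathcal{T}$ is not trivial and $c_1$ assumes the values $v_1, \dots, v_m$ with probabilities $p_1, \dots, p_m$, then the distribution of Z is
$$P(Z \le x) = \sum_{j=1}^{m} p_j \exp\left(-v_j x^{-\alpha}\right), \quad x > 0$$
(mixture of Fréchet distributions).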
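The limit law in part 2) is simple to evaluate numerically. The following Python sketch assumes the parametrization $P(Z \le x) = \sum_j p_j \exp(-v_j x^{-\alpha})$ stated above. The function names are ours (hypothetical), not taken from [10]. The tail is computed with `math.expm1` to avoid cancellation for large x, where $1 - P(Z \le x)$ is tiny.

```python
import math
from typing import Sequence


def mixture_frechet_cdf(x: float, alpha: float,
                        weights: Sequence[float], scales: Sequence[float]) -> float:
    """P(Z <= x) = sum_j p_j * exp(-v_j * x**(-alpha)) for x > 0, and 0 otherwise."""
    if x <= 0:
        return 0.0
    t = x ** (-alpha)
    return sum(p * math.exp(-v * t) for p, v in zip(weights, scales))


def mixture_frechet_tail(x: float, alpha: float,
                         weights: Sequence[float], scales: Sequence[float]) -> float:
    """1 - P(Z <= x), assuming the weights sum to one.

    Uses expm1 so that the small terms 1 - exp(-v_j * t) keep full
    precision for large x, where t = x**(-alpha) is close to zero.
    """
    if x <= 0:
        return 1.0
    t = x ** (-alpha)
    return -sum(p * math.expm1(-v * t) for p, v in zip(weights, scales))


if __name__ == "__main__":
    # Two components of order 1 (illustrative values, not taken from the text).
    print(mixture_frechet_cdf(1.0, 1.0, [0.5, 0.5], [1.0, 2.0]))
    print(mixture_frechet_tail(1e6, 1.0, [0.5, 0.5], [1.0, 2.0]))
```

With a single component ($m = 1$, $v_1 = 1$) the function reduces to the standard Fréchet distribution $\Phi_\alpha$.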
Remark 1
Part 2) of Theorem 1 of [10] means the following: finite mixtures of Fréchet distributions of the same order, but with different scale parameters, appear when one approximates the distribution of the maximum of a large sample of strongly dependent data. As mentioned in the Introduction, this situation arises in practice when dealing with environmental data. Therefore, from now on, we assume that such a mixture applies to our data and provide statistical procedures to estimate the order $\alpha$, the weights $p_j$ and the scale parameters $v_j$. We then validate (or not) the fit by means of the test provided in [11] . Finally, for the sake of simplicity, and because the estimates will be tested, we use only an exploratory estimator for the order $\alpha$. Even if the results presented in this paper are encouraging, a deeper approach clearly requires a refined estimation of the order $\alpha$.
We now recall some classical probabilistic tools that allow us to prove the consistency of estimators.
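First, recall the following for a sequence $(\xi_i)_{i \ge 1}$ of independent, centered random variables bounded by a constant M, that is, $|\xi_i| \le M$. An exponential bound (e.g., Hoeffding's inequality) gives, for any $\varepsilon > 0$,
$$P\left(\left|\frac{1}{n}\sum_{i=1}^{n} \xi_i\right| > \varepsilon\right) \le 2\exp\left(-\frac{n\varepsilon^2}{2M^2}\right).$$
Let us also recall that this implies complete convergence of $\frac{1}{n}\sum_{i=1}^{n} \xi_i$ to zero as n tends to infinity, i.e.,
$$\sum_{n \ge 1} P\left(\left|\frac{1}{n}\sum_{i=1}^{n} \xi_i\right| > \varepsilon\right) < \infty \quad \text{for any } \varepsilon > 0,$$
which in turn implies almost sure convergence, i.e.,
$$\frac{1}{n}\sum_{i=1}^{n} \xi_i \xrightarrow[n \to \infty]{\text{a.s.}} 0.$$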
Then we have the following consistency result.
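Theorem 1: If $(X_i)_{i \ge 1}$ satisfies (H3) of [10] and $f$ is a bounded function, then
$$\frac{1}{n}\sum_{i=1}^{n} f(X_i) \xrightarrow[n \to \infty]{\text{a.s.}} \sum_{j=1}^{k} c_j \mu_j,$$
where $\mu_j = E\left[f(X_1^{(j)})\right]$, $j = 1, \dots, k$.
Proof
First, consider the decomposition
$$\frac{1}{n}\sum_{i=1}^{n} f(X_i) = \sum_{j=1}^{k} \frac{1}{n}\sum_{i=1}^{n} f(X_i^{(j)})\,\mathbf{1}\{Y_i = j\}.$$
For each j, the variables $f(X_i^{(j)}) - \mu_j$, $i \ge 1$, are clearly iid and centered, and they are bounded by $2\|f\|_\infty$. Call $Z_i^{(j)} = f(X_i^{(j)}) - \mu_j$, and let $y = (y_i)_{i \ge 1}$ be a fixed element of $\{1, \dots, k\}^{\mathbb{N}}$. By the independence of Y and the processes $X^{(j)}$, we have, for any $\varepsilon > 0$,
$$P\left(\left|\frac{1}{n}\sum_{i=1}^{n} Z_i^{(j)}\mathbf{1}\{Y_i = j\}\right| > \varepsilon \,\middle|\, Y = y\right) = P\left(\left|\frac{1}{n}\sum_{i=1}^{n} W_i\right| > \varepsilon\right),$$
where $W_i = Z_i^{(j)}\mathbf{1}\{y_i = j\}$. These are clearly independent, centered and bounded variables, and therefore
$$P\left(\left|\frac{1}{n}\sum_{i=1}^{n} W_i\right| > \varepsilon\right) \le 2\exp\left(-\frac{n\varepsilon^2}{8\|f\|_\infty^2}\right).$$
Since this bound does not depend on y, integrating over the distribution of Y gives
$$\sum_{n \ge 1} P\left(\left|\frac{1}{n}\sum_{i=1}^{n} Z_i^{(j)}\mathbf{1}\{Y_i = j\}\right| > \varepsilon\right) < \infty,$$
which in turn implies that $\frac{1}{n}\sum_{i=1}^{n} Z_i^{(j)}\mathbf{1}\{Y_i = j\} \to 0$ almost surely for each j.
Therefore
$$\frac{1}{n}\sum_{i=1}^{n} f(X_i) - \sum_{j=1}^{k} \mu_j\,\frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\{Y_i = j\} \xrightarrow[n \to \infty]{\text{a.s.}} 0.$$
But, by (H1), $\frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\{Y_i = j\} \to c_j$ almost surely for every j, and hence we conclude that
$$\frac{1}{n}\sum_{i=1}^{n} f(X_i) \xrightarrow[n \to \infty]{\text{a.s.}} \sum_{j=1}^{k} c_j \mu_j.$$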
Remark 2
Theorem 1, applied with $f = \mathbf{1}_{(-\infty, x]}$, has a clear consequence: the empirical distribution of a large sample satisfying (H3), where the data are identically distributed, converges to the theoretical distribution at any given point. That is, the empirical distribution is a consistent estimator of the theoretical one at any given point. Denote by F the theoretical distribution and by $F_n(x) = \frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\{X_i \le x\}$ the empirical one. When F is continuous, since $F_n$ is non-decreasing, well-known elementary arguments (as in the proof of the Glivenko-Cantelli theorem) show that consistency is uniform, that is
$$\sup_{x \in \mathbb{R}} \left|F_n(x) - F(x)\right| \xrightarrow[n \to \infty]{\text{a.s.}} 0.$$
This result is consistent with Theorem 1 of [11] (which is, in fact, slightly more general).
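Remark 2 involves the empirical distribution function and the uniform distance $\sup_x |F_n(x) - F(x)|$. This distance is also the Kolmogorov-Smirnov statistic used in Sections 5 and 8. A minimal Python sketch of both quantities follows (the helper names are hypothetical). It computes only the statistic. The critical values of the adapted test of [11], which account for the dependence structure, are not reproduced here.

```python
import bisect
from typing import Callable, Sequence


def ecdf(sample: Sequence[float]) -> Callable[[float], float]:
    """Return the empirical distribution F_n(x) = #{i : X_i <= x} / n."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("empty sample")
    return lambda x: bisect.bisect_right(xs, x) / n


def sup_distance(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """sup_x |F_n(x) - F(x)| for a continuous distribution function F.

    F_n is a step function and F is continuous and non-decreasing, so the
    supremum is reached at the order statistics. The loop compares F with
    the ECDF value just before each jump, (i - 1)/n, and at the jump, i/n.
    """
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("empty sample")
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d
```

For example, take the sample (1, 2, 3) and the uniform distribution on [0, 4]. The distance is 0.25, attained just below x = 1 and at x = 3.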
3. Mixture of Two Components - Simulation of Data
We now consider the case of a mixture of $k = 2$ extremal distributions. The procedure used to simulate our data follows very closely the one proposed in [10] , but we explain it here for ease of reading and comprehension.
Example I:
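Let U be a random variable such that $P(U = 1) = p$, $P(U = 2) = 1 - p$, with $0 < p < 1$. Let $(\xi_i^{(1)}, \xi_i^{(2)})$, $i = 1, 2, \dots$, be an iid sequence of random vectors on $\{1, 2\}^2$, independent of U, such that $P(\xi_i^{(1)} = 1) = a$, $P(\xi_i^{(1)} = 2) = 1 - a$, $P(\xi_i^{(2)} = 1) = b$, $P(\xi_i^{(2)} = 2) = 1 - b$. Here $\xi_i^{(1)}$ and $\xi_i^{(2)}$ are independent of each other for any i, and $0 < a < 1$, $0 < b < 1$, $a \ne b$. Set $Y_i = \xi_i^{(U)}$.
Thus, conditionally on $U = 1$, $\frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\{Y_i = 1\}$ has the same distribution as $\frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\{\xi_i^{(1)} = 1\}$. By the Strong Law of Large Numbers, this converges almost surely to a.
On the other hand, conditionally on $U = 2$, it has the same distribution as $\frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\{\xi_i^{(2)} = 1\}$, which converges almost surely to b. Therefore, we have that
$$c_1 = \begin{cases} a & \text{if } U = 1, \\ b & \text{if } U = 2, \end{cases} \qquad \text{so that } P(c_1 = a) = p, \; P(c_1 = b) = 1 - p.$$
Hence, $c_1$ is not deterministic and $\mathcal{T}$ is not trivial. A similar treatment applies to $c_2 = 1 - c_1$.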
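The mechanism of Example I can be checked by simulation. The observed frequency of label 1 settles near a or near b depending on the single draw of U, so its limit $c_1$ is random. This is a sketch under the parametrization assumed here: $P(U = 1) = p$, $P(\xi_i^{(1)} = 1) = a$ and $P(\xi_i^{(2)} = 1) = b$. The function name and the numerical values are illustrative.

```python
import random
from typing import List, Tuple


def simulate_labels(n: int, p: float, a: float, b: float,
                    rng: random.Random) -> Tuple[int, List[int]]:
    """Draw U and Y_1, ..., Y_n with Y_i = xi_i^{(U)}, as in Example I.

    Assumed parametrization: P(U = 1) = p, P(xi_i^{(1)} = 1) = a and
    P(xi_i^{(2)} = 1) = b. Only the coordinate xi^{(U)} is drawn, since
    the other coordinate does not affect Y.
    """
    u = 1 if rng.random() < p else 2
    prob_one = a if u == 1 else b
    labels = [1 if rng.random() < prob_one else 2 for _ in range(n)]
    return u, labels


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        u, y = simulate_labels(100_000, p=0.4, a=0.3, b=0.8, rng=rng)
        # The frequency of label 1 approximates c_1, whose value depends on U.
        print(f"U = {u}: frequency of label 1 = {y.count(1) / len(y):.3f}")
```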
Example II:
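Now, let $Y = (Y_i)_{i \ge 1}$ be the process of Example I. Then Y fulfills (H1), (H2) of Section 2, with $c_1$, $c_2$ random variables such that
$$P(c_1 = a, \, c_2 = 1 - a) = p, \qquad P(c_1 = b, \, c_2 = 1 - b) = 1 - p.$$
Assume $\alpha > 0$ and consider two independent sequences $(X_i^{(1)})_{i \ge 1}$, $(X_i^{(2)})_{i \ge 1}$, independent of Y. The $X_i^{(1)}$ are iid with distribution $\Phi_\alpha$, and the $X_i^{(2)}$ are iid with a distribution in the MDA of a GEV with a lighter tail than $\Phi_\alpha$. We set:
1) If $Y_i = 1$, then $X_i = X_i^{(1)}$.
2) If $Y_i = 2$, then $X_i = X_i^{(2)}$.
Then, $(X_i)_{i \ge 1}$ fulfills (H3), (H4) of Section 2. Therefore, Theorem 1 of [10] applies and
$$\frac{\max(X_1, \dots, X_n)}{n^{1/\alpha}} \xrightarrow[n \to \infty]{d} Z, \quad \text{with } P(Z \le x) = p\,e^{-a x^{-\alpha}} + (1-p)\,e^{-b x^{-\alpha}}, \; x > 0,$$
a mixture of Fréchet distributions of order $\alpha$. We use this algorithm to simulate our data for the evaluation of the estimation methods in the case $k = 2$.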
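A possible implementation of the simulation algorithm of Example II is sketched below. Several choices are our own assumptions rather than details taken from [10]:

- the heaviest-tailed component is standard Fréchet of order α;
- the other component is Fréchet of order 2α, one convenient lighter-tailed choice compatible with (H4);
- the normalization is $n^{1/\alpha}$;
- U is redrawn independently for each block.

Under these assumptions, a collection of blocks yields approximate draws from $p\,e^{-a x^{-\alpha}} + (1-p)\,e^{-b x^{-\alpha}}$. The function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def frechet_draw(alpha: float, rng: random.Random) -> float:
    """Standard Frechet draw: if E ~ Exp(1), then E**(-1/alpha) has CDF exp(-x**(-alpha))."""
    e = rng.expovariate(1.0)
    while e == 0.0:  # guard: zero cannot be raised to a negative power
        e = rng.expovariate(1.0)
    return e ** (-1.0 / alpha)


def normalized_block_maximum(n: int, alpha: float, mix_probs: Sequence[float],
                             heavy_probs: Sequence[float], rng: random.Random) -> float:
    """max(X_1, ..., X_n) / n**(1/alpha) for one simulated block.

    mix_probs[l]   = P(U = l + 1), drawn once per block (source of strong dependence);
    heavy_probs[l] = P(Y_i = 1 | U = l + 1), i.e. the limit frequency c_1 given U.
    Label 1 gives Frechet(alpha); any other label gives Frechet(2 * alpha),
    a hypothetical lighter-tailed choice compatible with (H4).
    """
    u = rng.choices(range(len(mix_probs)), weights=mix_probs)[0]
    a = heavy_probs[u]
    m = 0.0
    for _ in range(n):
        shape = alpha if rng.random() < a else 2.0 * alpha
        m = max(m, frechet_draw(shape, rng))
    return m / n ** (1.0 / alpha)


def simulate_maxima(n_blocks: int, n: int, alpha: float, mix_probs: Sequence[float],
                    heavy_probs: Sequence[float], seed: int = 0) -> List[float]:
    """Independent blocks, each with its own U: approximate draws from the limit Z."""
    rng = random.Random(seed)
    return [normalized_block_maximum(n, alpha, mix_probs, heavy_probs, rng)
            for _ in range(n_blocks)]
```

In the limit, only the frequency of the heaviest-tailed label matters. Passing three mixing probabilities and three label probabilities therefore gives, under the same assumptions, a sketch of the three-component construction of Section 6.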
4. A Method for Estimation of Parameters
As explained in Remark 1, we only provide a very rough estimation procedure for the order $\alpha$.
4.1. An Exploratory Estimation for α
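In our model:
$$MF(x) = p\,e^{-v_1 x^{-\alpha}} + (1-p)\,e^{-v_2 x^{-\alpha}}, \quad x > 0,$$
where $0 < p < 1$, $v_1 > 0$, $v_2 > 0$ and $v_1 \ne v_2$. We may assume, without loss of generality, that $v_1 < v_2$. Since $p + (1 - p) = 1$, we then have
$$1 - MF(x) = p\left(1 - e^{-v_1 x^{-\alpha}}\right) + (1-p)\left(1 - e^{-v_2 x^{-\alpha}}\right).$$
For x large enough, $v_1 x^{-\alpha}$ and $v_2 x^{-\alpha}$ are close to zero. Since $1 - e^{-u} \approx u$ for u close to zero, we then have, for x large enough,
$$1 - MF(x) \approx \left(p v_1 + (1-p) v_2\right) x^{-\alpha}$$
and, therefore,
$$\frac{\log\left(1 - MF(x)\right)}{\log x} \approx \frac{\log\left(p v_1 + (1-p) v_2\right)}{\log x} - \alpha,$$
which tends to $-\alpha$ as x goes to infinity.
By Theorem 1 (see Remark 2), the empirical distribution $F_n$ is a uniformly consistent estimator of MF. Hence $\alpha$ will be estimated by the values of
$$\hat{\alpha}_n(x) = -\frac{\log\left(1 - F_n(x)\right)}{\log x}$$
for x large enough.
In Section 5 we simulate a mixture of two Fréchet distributions of order 1, and Figure 1 shows that this estimation procedure behaves consistently in that case.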
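The exploratory estimator can be coded directly. The sketch below (hypothetical function names) evaluates $-\log(1 - F_n(x))/\log x$ for a given threshold x. It requires x > 1 and at least one observation above x. The derivation above gives $-\log(1 - MF(x))/\log x \approx \alpha - \log(p v_1 + (1-p) v_2)/\log x$, so the bias decays only like $1/\log x$. This is one more reason to regard the estimator as exploratory.

```python
import bisect
import math
from typing import Sequence


def alpha_from_tail(tail_prob: float, x: float) -> float:
    """-log(tail_prob) / log(x): the quantity whose limit, as x grows, is alpha."""
    if x <= 1.0:
        raise ValueError("x must be larger than 1")
    if not 0.0 < tail_prob <= 1.0:
        raise ValueError("tail probability must lie in (0, 1]")
    return -math.log(tail_prob) / math.log(x)


def alpha_hat(sample: Sequence[float], x: float) -> float:
    """Exploratory estimate -log(1 - F_n(x)) / log(x) of the Frechet order alpha."""
    xs = sorted(sample)
    exceedances = len(xs) - bisect.bisect_right(xs, x)  # n * (1 - F_n(x))
    if exceedances == 0:
        raise ValueError("no observations above x, so 1 - F_n(x) = 0")
    return alpha_from_tail(exceedances / len(xs), x)
```

As an illustration, take the exact tail of a mixture with $\alpha = 1$, $p = 1/2$, $v_1 = 1$, $v_2 = 2$. The value at $x = 10^4$ is about 0.956, which shows the slow approach to $\alpha$.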
4.2. Estimation of p, v1, v2
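From now on, we assume $\alpha$ known, and we focus on the estimation of p ($0 < p < 1$) and of $v_1$, $v_2$ ($0 < v_1 < v_2$).
Let us consider three particular values: 1, $2^{-1/\alpha}$, $3^{-1/\alpha}$. It is clear that $(2^{-1/\alpha})^{-\alpha} = 2$ and that $(3^{-1/\alpha})^{-\alpha} = 3$. We have:
$$MF(1) = p\,e^{-v_1} + (1-p)\,e^{-v_2}, \quad MF(2^{-1/\alpha}) = p\,e^{-2v_1} + (1-p)\,e^{-2v_2}, \quad MF(3^{-1/\alpha}) = p\,e^{-3v_1} + (1-p)\,e^{-3v_2}. \tag{1}$$
Calling:
$$u_1 = e^{-v_1}, \quad u_2 = e^{-v_2}, \tag{2}$$
and since $0 < v_1 < v_2$, we have that $0 < u_2 < u_1 < 1$, and we get:
$$v_1 = -\log u_1, \quad v_2 = -\log u_2, \tag{3}$$
Thus, the estimation of $u_1$, $u_2$ leads to the estimation of $v_1$, $v_2$. Further observe that $e^{-m v_j} = u_j^m$ for $m = 1, 2, 3$. Therefore, (1) may be rewritten as:
$$MF(1) = p\,u_1 + (1-p)\,u_2, \quad MF(2^{-1/\alpha}) = p\,u_1^2 + (1-p)\,u_2^2, \quad MF(3^{-1/\alpha}) = p\,u_1^3 + (1-p)\,u_2^3. \tag{4}$$
As usual in Statistics, we take into account that Theorem 1 (see Remark 2) gives the uniform consistency of $F_n$ as an estimator of MF for our model. Hence, if we replace MF by $F_n$ in (4) and manage to solve the equations in p, $u_1$, $u_2$, we obtain a consistent estimation of p, $v_1$, $v_2$. For the sake of simplicity, we denote the estimated values by p, $u_1$, $u_2$ (instead of $\hat{p}_n$, $\hat{u}_{1,n}$, $\hat{u}_{2,n}$). Therefore, we will solve:
$$F_n(1) = p\,u_1 + (1-p)\,u_2, \quad F_n(2^{-1/\alpha}) = p\,u_1^2 + (1-p)\,u_2^2, \quad F_n(3^{-1/\alpha}) = p\,u_1^3 + (1-p)\,u_2^3. \tag{5}$$
The first two equations of (5) can clearly be rewritten in matrix terms as:
$$\begin{pmatrix} u_1 & u_2 \\ u_1^2 & u_2^2 \end{pmatrix}\begin{pmatrix} p \\ 1-p \end{pmatrix} = \begin{pmatrix} F_n(1) \\ F_n(2^{-1/\alpha}) \end{pmatrix}. \tag{6}$$
Call $A = \begin{pmatrix} u_1 & u_2 \\ u_1^2 & u_2^2 \end{pmatrix}$. We have $\det A = u_1 u_2 (u_2 - u_1) \ne 0$, since $u_1, u_2 > 0$ and $u_1 \ne u_2$. Hence A is invertible, with inverse matrix
$$A^{-1} = \frac{1}{u_1 u_2 (u_2 - u_1)}\begin{pmatrix} u_2^2 & -u_2 \\ -u_1^2 & u_1 \end{pmatrix},$$
and therefore we have
$$\begin{pmatrix} p \\ 1-p \end{pmatrix} = A^{-1}\begin{pmatrix} F_n(1) \\ F_n(2^{-1/\alpha}) \end{pmatrix}. \tag{7}$$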
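Remark 3
The following will be used later on. More generally, let $u_1, \dots, u_k$ be positive and pairwise distinct, and consider the $k \times k$ matrix
$$A_k = \begin{pmatrix} u_1 & u_2 & \cdots & u_k \\ u_1^2 & u_2^2 & \cdots & u_k^2 \\ \vdots & \vdots & & \vdots \\ u_1^k & u_2^k & \cdots & u_k^k \end{pmatrix}.$$
Then $A_k$ is invertible, since $\det A_k = \prod_{j=1}^{k} u_j \prod_{1 \le i < j \le k} (u_j - u_i) \ne 0$ (a Vandermonde determinant).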
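The invertibility claimed in Remark 3 rests on a Vandermonde-type determinant. The short sketch below (hypothetical helper names) checks this in exact rational arithmetic. The determinant of the matrix with entries $u_j^i$, $i, j = 1, \dots, k$, equals $\prod_j u_j \prod_{i<j}(u_j - u_i)$, which is nonzero when the $u_j$ are positive and pairwise distinct.

```python
from fractions import Fraction
from itertools import combinations
from math import prod
from typing import List, Sequence


def det(matrix: Sequence[Sequence[Fraction]]) -> Fraction:
    """Exact determinant by Gaussian elimination with row swaps."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def power_matrix(us: Sequence[Fraction]) -> List[List[Fraction]]:
    """The k x k matrix of Remark 3: row i (i = 1..k) is (u_1**i, ..., u_k**i)."""
    k = len(us)
    return [[u ** i for u in us] for i in range(1, k + 1)]


def closed_form_det(us: Sequence[Fraction]) -> Fraction:
    """prod_j u_j * prod_{i<j} (u_j - u_i), nonzero for distinct nonzero u_j."""
    return prod(us) * prod(uj - ui for ui, uj in combinations(us, 2))
```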
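Thus, call $r_1$, $r_2$ the first and second rows of $A^{-1}$, and let $b = (F_n(1), F_n(2^{-1/\alpha}))^{T}$. We get the non-linear system:
$$p = r_1 b = \frac{u_2 F_n(1) - F_n(2^{-1/\alpha})}{u_1 (u_2 - u_1)}, \qquad 1 - p = r_2 b = \frac{F_n(2^{-1/\alpha}) - u_1 F_n(1)}{u_2 (u_2 - u_1)}, \tag{8}$$
with $u_1$, $u_2$ as variables. We add to this system the only equation of (5) that we have not used yet, $F_n(3^{-1/\alpha}) = p\,u_1^3 + (1-p)\,u_2^3$, which can be rewritten as
$$F_n(3^{-1/\alpha}) = u_1^3\, r_1 b + u_2^3\, r_2 b. \tag{9}$$
We also impose the restriction that the two weights given by (8) add up to one, which after simplification reads
$$u_1 u_2 - (u_1 + u_2)\, F_n(1) + F_n(2^{-1/\alpha}) = 0. \tag{10}$$
Solving (10) for $u_2$ gives
$$u_2 = \frac{u_1 F_n(1) - F_n(2^{-1/\alpha})}{u_1 - F_n(1)}. \tag{11}$$
By (8) and (11), both p and $u_2$ depend only on $u_1$, subject to the constraints
$$0 < u_2 < u_1 < 1, \qquad 0 < p < 1. \tag{12}$$
Replacing (8) and (11) in (9), we arrive at the following non-linear equation in the single unknown $u_1$:
$$u_1^3\, r_1 b + u_2^3\, r_2 b - F_n(3^{-1/\alpha}) = 0, \tag{13}$$
under the constraints (12). Equation (13) is solved by the Newton-Raphson method or any other method for non-linear equations. Then, using (11), (8) and (3), from the estimator of $u_1$ we get the estimators of $u_2$, p, $v_1$ and $v_2$.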
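In the text, Equation (13) is solved by Newton-Raphson. For two components, system (5) also admits a closed-form solution. We sketch it here as an alternative; it is our illustration, not the procedure used for the results reported in Section 5.

The approach is Prony's method applied to $b_m = F_n(m^{-1/\alpha})$. With $b_0 = 1$, the relations $b_{m+2} = (u_1 + u_2)\,b_{m+1} - u_1 u_2\, b_m$, $m = 0, 1$, are linear in $u_1 + u_2$ and $u_1 u_2$. The relation with $m = 0$ is equivalent to requiring that the two weights obtained from (7) add up to one. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def evaluation_points(alpha: float, count: int) -> List[float]:
    """Points m**(-1/alpha), m = 1..count, at which x**(-alpha) equals m."""
    return [m ** (-1.0 / alpha) for m in range(1, count + 1)]


def solve_two_component(b: Sequence[float]) -> Tuple[float, float, float]:
    """Solve system (5) for (p, v1, v2), given b = (F_n(1), F_n(2**(-1/alpha)), F_n(3**(-1/alpha))).

    With b_0 = 1 and u_j = exp(-v_j), the numbers b_m = p*u1**m + (1-p)*u2**m satisfy
    b_{m+2} = s*b_{m+1} - P*b_m, where s = u1 + u2 and P = u1*u2 (u1, u2 are the roots
    of t**2 - s*t + P). For m = 0, 1 this is a linear system in (s, P).
    """
    b1, b2, b3 = b
    denom = b2 - b1 * b1  # equals p*(1-p)*(u1-u2)**2 for an exact two-component mixture
    if denom <= 0:
        raise ValueError("values not consistent with two distinct components")
    s = (b3 - b1 * b2) / denom
    prod_u = s * b1 - b2
    disc = s * s - 4.0 * prod_u
    if disc <= 0:
        raise ValueError("no two distinct real roots")
    root = math.sqrt(disc)
    u1, u2 = (s + root) / 2.0, (s - root) / 2.0  # u1 > u2, i.e. v1 < v2
    if not 0.0 < u2 < u1 < 1.0:
        raise ValueError("roots outside (0, 1): constraints (12) violated")
    p = (b1 - u2) / (u1 - u2)  # from the first equation of (5)
    if not 0.0 < p < 1.0:
        raise ValueError("weight outside (0, 1)")
    return p, -math.log(u1), -math.log(u2)  # (3): v_j = -log(u_j)
```

With empirical values of $F_n$, the discriminant may become non-positive or the roots may leave (0, 1). The function then raises an error instead of returning estimates that violate the constraints.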
Remark 4
As mentioned before, the estimation procedure leads to consistent estimators of the parameters. One may then ask about their asymptotic distribution, in order to build confidence intervals, etc. This is not among the main goals of this work, because, as pointed out in the Introduction, we validate the estimates by suitable testing. Nevertheless, we briefly explain how this asymptotic distribution is obtained. By the Implicit Function Theorem, the solutions of the non-linear Equation (13) may be expressed in the following way
$$(\hat{p}, \hat{v}_1, \hat{v}_2) = h\left(F_n(1), F_n(2^{-1/\alpha}), F_n(3^{-1/\alpha})\right), \tag{14}$$
with h a differentiable function.
The asymptotic distribution of the empirical process is derived in the preliminaries of Theorem 2 of [11] . A standard application of the Delta Method ( [16] ) then leads to the asymptotic distribution of the estimators $\hat{p}$, $\hat{v}_1$, $\hat{v}_2$. Of course, the same applies to the case of three components, whose estimation is treated later on; this remark also applies in that context.
5. Testing the Estimated Model
As a concrete example of the method, as well as a validation procedure, we now simulate a large sample with strong dependence, where the common distribution of all the data is (approximately) a mixture of two Fréchet distributions. We test whether the data fit a single Fréchet distribution, and rejection is expected. Further, we use our method to estimate the parameters of a mixture of two Fréchet distributions; in this case, the goodness-of-fit test is expected not to reject the estimated model.
We then choose as the true model a mixture of two Fréchet distributions of order $\alpha = 1$, with weight p and scale parameters $v_1 < v_2$, that is
$$MF(x) = p\,e^{-v_1/x} + (1-p)\,e^{-v_2/x}, \quad x > 0.$$
We computed 4000 maxima, each one coming from a sample of size 500 generated by the simulation procedure described in Section 3, with the parameters p, a and b of Examples I and II chosen accordingly. As mentioned in Section 3, by Theorem 1 of [10] , these maxima should follow a distribution very close to our choice of MF.
Remark 5
It should be noticed that we are not, in fact, simulating data that follow the distribution MF. Instead, following Theorem 1 of [10] , the data follow a distribution that is very close to MF. There are two reasons for this choice. First, it makes the task of the estimation procedure harder, because the real model is not exactly a mixture of two extremal distributions. Second, this kind of data is of particular interest because, as pointed out in the Introduction and in [10] , it appears in many applications.
With our simulated sample of 4000 maxima, we first proposed for fitting (i.e., as H0 in our test) a single Fréchet model (F1). In this example and in all the following ones, we used the adaptation of the Kolmogorov-Smirnov (KS) test for this type of models provided in [11] . For F1, the KS statistic was 0.1928443, which corresponds to a very small p-value and implies a clear rejection.
Figure 2 shows the difference between the empirical distribution of our sample and the theoretical distribution of the proposed model (F1). The distribution of the proposed model (red curve) is clearly below the empirical distribution (black curve). This reflects the much heavier tail of the proposed model with respect to the true model.
Therefore, we turn our attention to the estimation of a mixture of two components. The exploratory estimation of $\alpha$ corresponds to Figure 1 and leads to taking $\alpha = 1$. Then, following the procedure of the previous section, we obtain the estimates $\hat{p}$, $\hat{v}_1$ and $\hat{v}_2$. We then perform the KS test, proposing as H0 the mixture of two Fréchet distributions of order 1 with the estimated parameters (M2). This gives a KS statistic of 0.01380997, which does not lead to the rejection of H0 (Figure 3).
In conclusion, the simulated data fit the estimated two-component mixture and do not fit a single extremal distribution.
6. Mixture of Three Components - Simulation of Data
We now turn our attention to the case of a mixture of $k = 3$ extremal distributions.
Again, the models presented here are based on [10] , but we explain them for the sake of clarity.
Example III:
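Let U be a random variable such that $P(U = 1) = p$, $P(U = 2) = q$, $P(U = 3) = 1 - p - q$, with $p > 0$, $q > 0$, $p + q < 1$. Let $(\xi_i^{(1)}, \xi_i^{(2)}, \xi_i^{(3)})$, $i = 1, 2, \dots$, be an iid sequence of random vectors on $\{1, 2, 3\}^3$, independent of U, with independent coordinates such that:
$P(\xi_i^{(1)} = 1) = a_1$, $P(\xi_i^{(1)} = 2) = a_2$, $P(\xi_i^{(1)} = 3) = a_3$, with $a_1, a_2, a_3 > 0$, $a_1 + a_2 + a_3 = 1$;
$P(\xi_i^{(2)} = 1) = b_1$, $P(\xi_i^{(2)} = 2) = b_2$, $P(\xi_i^{(2)} = 3) = b_3$, with $b_1, b_2, b_3 > 0$, $b_1 + b_2 + b_3 = 1$;
$P(\xi_i^{(3)} = 1) = d_1$, $P(\xi_i^{(3)} = 2) = d_2$, $P(\xi_i^{(3)} = 3) = d_3$, with $d_1, d_2, d_3 > 0$, $d_1 + d_2 + d_3 = 1$.
Set $Y_i = \xi_i^{(U)}$. By the Strong Law of Large Numbers, conditionally on $U = 1$ (respectively, $U = 2$, $U = 3$), $\frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\{Y_i = j\}$ converges almost surely to $a_j$ (respectively, $b_j$, $d_j$), $j = 1, 2, 3$. Thus, $(c_1, c_2, c_3)$ equals $(a_1, a_2, a_3)$, $(b_1, b_2, b_3)$ or $(d_1, d_2, d_3)$ with probabilities p, q and $1 - p - q$, respectively.
Therefore, if we assume that $a_1$, $b_1$, $d_1$ are pairwise distinct, $c_1$ is not deterministic, with
$$P(c_1 = a_1) = p, \quad P(c_1 = b_1) = q, \quad P(c_1 = d_1) = 1 - p - q.$$
Figure 2. The difference between the empirical distribution (ECDF) and the theoretical distribution of the proposed model F1.
Figure 3. The difference between the empirical distribution (ECDF) and the theoretical distribution of the proposed model M2.
Example IV:
Now, we take $Y_i = \xi_i^{(U)}$, as in Example III, for any $i \ge 1$. Then Y fulfills (H1), (H2) of Section 2, with $c_1$, $c_2$, $c_3$ random variables as in Example III.
Assume $\alpha > 0$ and consider three independent sequences $(X_i^{(1)})_{i \ge 1}$, $(X_i^{(2)})_{i \ge 1}$, $(X_i^{(3)})_{i \ge 1}$, independent of Y. The $X_i^{(1)}$ are iid with distribution $\Phi_\alpha$. The $X_i^{(2)}$ and $X_i^{(3)}$ are iid with distributions in the MDA of GEVs with lighter tails than $\Phi_\alpha$. For any i we set:
1) If $Y_i = 1$, then $X_i = X_i^{(1)}$.
2) If $Y_i = 2$, then $X_i = X_i^{(2)}$.
3) If $Y_i = 3$, then $X_i = X_i^{(3)}$.
Then, $(X_i)_{i \ge 1}$ fulfills (H3), (H4) of Section 2. Therefore, Theorem 1 of [10] applies and
$$\frac{\max(X_1, \dots, X_n)}{n^{1/\alpha}} \xrightarrow[n \to \infty]{d} Z, \quad \text{with } P(Z \le x) = p\,e^{-a_1 x^{-\alpha}} + q\,e^{-b_1 x^{-\alpha}} + (1-p-q)\,e^{-d_1 x^{-\alpha}}, \; x > 0.$$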
7. A Method for Estimation of Parameters, Case k = 3
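As pointed out in Remark 1, we only use an exploratory method to estimate $\alpha$. Therefore, we concentrate on the estimation of the weights and the scale parameters.
Estimation of p, q, v1, v2, v3
Let us now consider
$$MF(x) = p\,e^{-v_1 x^{-\alpha}} + q\,e^{-v_2 x^{-\alpha}} + (1-p-q)\,e^{-v_3 x^{-\alpha}}, \quad x > 0, \tag{15}$$
with
$$p > 0, \quad q > 0, \quad p + q < 1, \quad 0 < v_1 < v_2 < v_3. \tag{16}$$
Following the ideas of Section 4.2, with $u_j = e^{-v_j}$, $j = 1, 2, 3$, and the five points $m^{-1/\alpha}$, $m = 1, \dots, 5$, we write down
$$F_n(m^{-1/\alpha}) = p\,u_1^m + q\,u_2^m + (1-p-q)\,u_3^m, \quad m = 1, \dots, 5. \tag{17}$$
Setting $A_3 = (u_j^m)_{m, j = 1, 2, 3}$ and using Remark 3, we get
$$\begin{pmatrix} p \\ q \\ 1-p-q \end{pmatrix} = A_3^{-1}\begin{pmatrix} F_n(1) \\ F_n(2^{-1/\alpha}) \\ F_n(3^{-1/\alpha}) \end{pmatrix}.$$
We require that the three components of the right-hand side add up to one. From this equation we may express $u_3$ as a function of $u_1$, $u_2$. Call $r_1$, $r_2$, $r_3$ the first, second and third (respectively) rows of $A_3^{-1}$, with $u_3$ replaced by its expression as a function of $u_1$, $u_2$. Let $b = (F_n(1), F_n(2^{-1/\alpha}), F_n(3^{-1/\alpha}))^{T}$. We then have the non-linear system in $(u_1, u_2)$:
$$F_n(m^{-1/\alpha}) = u_1^m\, r_1 b + u_2^m\, r_2 b + u_3^m\, r_3 b, \quad m = 4, 5. \tag{18}$$
Solving this system, we get the estimates of $u_1$, $u_2$ and therefore of $u_3$. As in the case of two components, we denote these estimates omitting their dependence on the sample size n. From the estimates of $u_1$, $u_2$, $u_3$, we finally get the estimates of p and q (through $A_3^{-1}$) and of $v_j = -\log u_j$, $j = 1, 2, 3$.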
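Prony's method extends to any number of components and offers one way to cross-check solutions of (18). The sketch below uses NumPy and is our own alternative rather than the authors' procedure. It assumes evaluation points $m^{-1/\alpha}$, $m = 1, \dots, 2k-1$ (five points for $k = 3$). The $u_j$ are obtained as roots of a polynomial whose coefficients solve a $k \times k$ Hankel system. The weights are then obtained from the system with the matrix of Remark 3. The function name is hypothetical.

```python
import numpy as np


def solve_k_components(b_values, k):
    """Estimate (weights, scales) of the mixture sum_j w_j * exp(-v_j * x**(-alpha)).

    b_values[m - 1] = F_n(m**(-1/alpha)) for m = 1, ..., 2k - 1, so that
    b_m = sum_j w_j * u_j**m with u_j = exp(-v_j) (and b_0 = 1).
    Prony's method:
      1. the u_j are the roots of t**k + c_{k-1} t**(k-1) + ... + c_0, whose
         coefficients solve sum_i c_i b_{m+i} = -b_{m+k}, m = 0, ..., k - 1;
      2. the weights solve the system with matrix (u_j**m), m = 1..k (Remark 3).
    """
    b = np.concatenate(([1.0], np.asarray(b_values, dtype=float)))
    if b.size != 2 * k:
        raise ValueError("expected exactly 2k - 1 values of F_n")
    hankel = np.array([[b[m + i] for i in range(k)] for m in range(k)])
    coeffs = np.linalg.solve(hankel, -b[k:2 * k])  # c_0, ..., c_{k-1}
    roots = np.roots(np.concatenate(([1.0], coeffs[::-1])))  # highest degree first
    if np.max(np.abs(np.imag(roots))) > 1e-8:
        raise ValueError("complex roots: values not consistent with k components")
    u = np.sort(np.real(roots))[::-1]  # u_1 > ... > u_k, i.e. v_1 < ... < v_k
    if u[-1] <= 0.0 or u[0] >= 1.0:
        raise ValueError("roots outside (0, 1)")
    powers = np.array([u ** m for m in range(1, k + 1)])
    weights = np.linalg.solve(powers, b[1:k + 1])
    return weights, -np.log(u)
```

For $k = 3$, the returned weights are $(p, q, 1-p-q)$ in the order of increasing $v_j$, as in (15)-(16). With noisy values of $F_n$ the Hankel system may be ill-conditioned. In that case, the output may serve as a starting point for the iterative solution of (18).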
8. Testing the Estimated Model
Now, as another concrete example of the method and as a validation procedure, we simulate a large sample with strong dependence, where the common distribution of all the data is (approximately) a mixture of three Fréchet distributions. In this case, we first estimate the parameters of a mixture of two Fréchet distributions; the estimated model is tested, and rejection is expected. Then we use our method again, this time to estimate the parameters of a mixture of three Fréchet distributions. In this case, the goodness-of-fit test is expected not to reject the estimated model.
Therefore, here we consider as the true model a mixture of three Fréchet distributions of order 1, with parameters p, q, $v_1$, $v_2$, $v_3$, that is:
$$MF(x) = p\,e^{-v_1/x} + q\,e^{-v_2/x} + (1-p-q)\,e^{-v_3/x}, \quad x > 0.$$
We computed 4000 maxima, each one coming from a sample of size 1000 generated by the simulation procedure described in Section 6, with the parameters of Examples III and IV chosen accordingly. As mentioned in Section 6, by Theorem 1 of [10] , these maxima should follow a distribution very close to our choice of MF.
We first proposed for fitting (i.e., as H0 in our test) a mixture of two Fréchet distributions of order 1 (M2). For M2, the parameters p, $v_1$ and $v_2$ were estimated by the procedure of Section 4. The corresponding KS statistic was 0.0326095, which corresponds to a very small p-value and implies a clear rejection.
Then, we proposed for fitting (as H0) a mixture of three Fréchet distributions of order 1 (M3). For M3, the parameters p, q, $v_1$, $v_2$, $v_3$ were estimated by the procedure of Section 7. The KS statistic was 0.01936157, with a p-value that does not lead to the rejection of H0.
Figure 4. The difference between the empirical distribution (ECDF) and the theoretical distribution of the proposed model M2.
Figure 5. The difference between the empirical distribution (ECDF) and the theoretical distribution of the proposed model M3.
In Figure 4, we observe a moderate deviation of the proposed model M2 from the empirical distribution. This discrepancy is systematic: most of the time the proposed model lies above the empirical distribution. This means that the real data have heavier tails, which is consistent with the very small p-value.
In Figure 5, the proposed model M3 and the empirical distribution are almost identical, which is consistent with the test not rejecting H0.
9. Discussion & Conclusions
Finite mixtures of extremal distributions appear in practice when dealing with environmental data (as well as data from other fields) with a strong dependence structure. Therefore, one needs to be able to estimate the parameters of such a mixture under strong dependence, and to test whether the data fit the estimated mixture.
In this paper, we accomplish this task for mixtures of two or three extremal distributions of the Fréchet type. The results obtained on simulated data show that the new estimation procedure developed here performs well.
Therefore, this work completes a line of research that includes [10] and [11] , and it naturally raises new questions and subjects of interest.
10. Further Work
As pointed out, the estimation of the order $\alpha$ should be improved along the lines of the classical methods for the iid case [17] [18] . In addition, the asymptotic distribution of the weights and scale parameters, and the corresponding confidence regions, can be derived more precisely following the ideas mentioned in Remark 4.
When methods based on moments (instead of quantiles) are applicable [19] , an alternative method should be developed, and its performance should be compared with that of the estimation procedure of this paper.
Another direction of work is the study of mixtures of different types of extremal distributions, mixtures of extremal and non-extremal distributions, or more general finite mixture models under strong dependence, as has been done in the iid case [4] [5] [6] [8] [9] .
In a forthcoming paper, we deal with the estimation of the components of mixtures with a large number of components (k large). There we use other techniques (e.g., Machine Learning) for faster estimation of k.
Acknowledgements
This work was partially supported by Proyecto CSIC-VUSP “Análisis de eventos climáticos extremos y su incidencia sobre la producción hortifrutícola en Salto” (Uruguay). The authors also thank an anonymous reviewer for valuable suggestions.