Peaks over Manifold (POM): A Novel Technique to Analyze Extreme Events over Surfaces

Gonzalo Perera, Angel M. Segura

CURE, Rocha, Uruguay.

**DOI:** 10.4236/apm.2022.121004

We present a novel method to analyze extreme events of flows over manifolds, called Peaks Over Manifold (POM). We show that, under general and realistic hypotheses, the distribution of affectation measures converges to a Generalized Pareto Distribution (GPD). The method is applicable to floods, ice cover extent, extreme rainfall or marine heatwaves. We present an application to a synthetic data set on tide height and to real ice cover data in Antarctica.

Share and Cite:

Perera, G. and Segura, A. (2022) Peaks over Manifold (POM): A Novel Technique to Analyze Extreme Events over Surfaces. *Advances in Pure Mathematics*, **12**, 48-62. doi: 10.4236/apm.2022.121004.

1. Introduction

Extreme events such as tsunamis or floods are crucial because of their impact on societies and ecosystems, on human developments such as financial markets, industrial methods of production and telecommunication systems and, last but not least, on natural phenomena and their relationship with anthropogenic factors [1] - [9]. The appropriate characterization of the impact of these events requires novel theoretical developments. Techniques such as peaks over threshold (POT) for univariate time series, which describe the distribution of the excess of a time series over a given (*large*) threshold, have been well known for decades [10] [11] [12]. The change from univariate to n-dimensional data implies substantial additional complexity. In particular, an unsolved problem is to characterize the distribution of excesses of the progressive development of mass or of physical, biological or chemical properties on affected areas (tides, floods, ice cover, etc.) over surfaces or manifolds. We will call this kind of phenomenon a *flow*. Here, we derive a theoretical model for the distribution of extremes of flows over a surface or manifold. We also test the proposed model using synthetic data on tide heights and a real data set containing time series of ice cover over Antarctica.

The paper is organized as follows. First we set out the framework, providing a precise model for flow data. We shall always consider the case of flows over surfaces, but with no particular effort everything can be extended to Riemannian manifolds. Then we introduce two preliminary indexes of impact of a set of flow data, in Theorems 1 and 2, which lead to the main result of this paper, Theorem 3. In our main result, we consider an impact measure whose density may be chosen according to a subjective valuation of risks. We then compute the impact of the flow expansion over the region at an (intrinsic) distance greater than *u* from the base line of the flow. In terms of tides, we measure affectation of areas at a distance greater than *u* meters from the tide base line. We then show that, for *u* tending to infinity, the distribution of this impact, conditioned on the impact being positive, converges to a Generalized Pareto Distribution. We finally show the application of this result to simulated data on tide height and beach cover and to a real data set of ice cover in Antarctica.

2. POM Derivation

Let *S* be a smooth (*differentiable*) surface with intrinsic distance *d* and intrinsic area measure
$\sigma $. Let us assume that *S* is simply connected and that *C* is a finite-length differentiable curve. We will say that *C* *splits* *S* in the following sense:

1) *C* defines two disjoint sets, namely
${C}_{int}$ and
${C}_{ext}$.

2) ${C}_{ext}$ is defined at each point *x* of *C* by $n\left(x\right)$, the unit vector in the tangent plane of *S* at *x* that is perpendicular to the tangent vector $v\left(x\right)$ of the curve *C* at *x*, with $n\left(x\right)$ taking the anti-clockwise direction (Figure 1). Notice that $S={C}_{int}\cup C\cup {C}_{ext}$ and recall that *C* is included neither in ${C}_{int}$ nor in ${C}_{ext}$.

Given two differentiable curves $\tilde{C}$ and ${C}^{\ast}$ that split *S*, we will say that they define a *coherent splitting* if either of the following conditions holds (Figure 2):

1) ${\tilde{C}}_{ext}\subset {C}^{\ast}_{ext},\ {C}^{\ast}_{int}\subset {\tilde{C}}_{int}$

2) ${\tilde{C}}_{int}\subset {C}^{\ast}_{int},\ {C}^{\ast}_{ext}\subset {\tilde{C}}_{ext}$

Figure 1. Schematic representation of a surface *S* and a curve *C* that splits the space into two regions: an external region (${C}_{ext}$) and an internal region (${C}_{int}$).

Figure 2. Schematic example of a coherent splitting of the space.

Given any $r>0$, we define

$C\left(r\right)=\left\{x\in {C}_{ext}\mathrm{:}d\left(x\mathrm{,}C\right)\le r\right\}$

We then have that

$\partial C\left(r\right)=C\cup {C}_{r}$

where ${C}_{r}=\left\{x\in {C}_{ext}:d\left(x,C\right)=r\right\}$ that we assume to be a differentiable curve. We will assume that $C\mathrm{,}{C}_{r}$ define a coherent splitting (Figure 3).

We will now consider a function *a* that quantifies, in terms of density, the impact of the flow at each point. More precisely, we will assume:

i) $a\left(x\right)\ge 0$ $\forall x\in S$

ii) $a\left(x\right)>0$ $\forall x\in {C}_{ext}$

iii) $a\left(x\right)=0$ $\forall x\in {C}_{int}$

iv) *a* is continuous in ${C}_{ext}\cup C$

v) ${\int}_{{C}_{ext}}a\left(x\right)\,\text{d}\sigma \left(x\right)=+\infty $

vi) ${\int}_{C\left(r\right)}a\left(x\right)\,\text{d}\sigma \left(x\right)$ is finite $\forall r>0$

Let us explain this in intuitive terms. *C* is the baseline for the flow.
${C}_{int}$ represents the “normal status”, where there is no danger or impact.
${C}_{ext}$ is the region exposed to risk by the flow. Given any measurable set *A* on *S*:

$\mu \left(A\right)={\displaystyle {\int}_{A}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}a\left(x\right)\text{d}\sigma (\; x\; )$

quantifies the impact produced by the flow over the region *A*.

Thus, condition (v) above means that if the flow covers ${C}_{ext}$ there is a total disaster, whereas condition (vi) implies that if the flow covers $C\left(r\right),\ r>0$, there is a serious event but not a total disaster.

Finally, it is straightforward to check that if

$I\left(r\right)=\mu \left(C\left(r\right)\right)={\displaystyle {\int}_{C\left(r\right)}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}a\left(x\right)\text{d}\sigma \left(x\right)$

then,
$I\left(r\right)$ is strictly increasing with *r* and that

$\underset{r\to \infty}{lim}I\left(r\right)={\displaystyle {\int}_{{C}_{ext}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}a\left(x\right)\text{d}\sigma \left(x\right)=+\infty $

Figure 3. Coherent splitting of *C*, *C*(*r*).

Remark 1: The density is subjective as it should be, since different criteria may be applied to different contexts or by different agents. For instance, if *a* is constant and equals 1, the measure of affectation is nothing but the affected area. But if we are considering tides on a coast with human inhabitants, and *d*(*x*) denotes the population density at point *x*, if
$a=d$, the affectation measure is no longer homogeneous on the coast and it will give greater values as more human beings are affected by tides. Similarly, if we have a measure of the relevance for ecosystem conservation at each point *x*, *c*(*x*), due to relevant natural areas that may be seriously affected by marine water, taking a linear combination of *d* and *c* we will try to protect human populations and natural areas as well, with relative weights depending on the coefficients of the linear combination.
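As a minimal numerical sketch of Remark 1, the snippet below approximates $\mu(A)=\int_A a\,\text{d}\sigma$ on a flat rectangular patch with a midpoint Riemann sum. The densities `pop_density` and `conservation`, the weights 0.7 and 0.3, and the rectangle itself are hypothetical choices made only for illustration; they do not come from the data used in this paper.

```python
import math

def impact(a, x_range, y_range, n=200):
    """Midpoint Riemann sum of the density a(x, y) over a flat rectangle."""
    (x0, x1), (y0, y1) = x_range, y_range
    hx, hy = (x1 - x0) / n, (y1 - y0) / n
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * hx
        for j in range(n):
            y = y0 + (j + 0.5) * hy
            total += a(x, y)
    return total * hx * hy

def area_density(x, y):
    # a = 1: the impact is just the affected area
    return 1.0

def pop_density(x, y):
    # hypothetical population density, decaying away from the line x = 0
    return 10.0 * math.exp(-x)

def conservation(x, y):
    # hypothetical conservation score: a protected band where y > 0.5
    return 1.0 if y > 0.5 else 0.0

def combined(x, y):
    # linear combination with hypothetical weights 0.7 and 0.3
    return 0.7 * pop_density(x, y) + 0.3 * conservation(x, y)

region = ((0.0, 2.0), (0.0, 1.0))
area = impact(area_density, *region)
people = impact(pop_density, *region)
mixed = impact(combined, *region)
print(area, people, mixed)
```

The same region yields three different impact values, which is the sense in which the density *a* encodes the valuation of risk rather than a property of the flow.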

Consider now
${C}_{1}\mathrm{,}\cdots \mathrm{,}{C}_{n}$ *iid* differentiable curves on *S*, such that:

i) ${C}_{i}\subset {C}_{ext}$ $\forall i$

ii) $C\mathrm{,}{C}_{i}$ define a coherent splitting $\forall i$

We will think of ${C}_{1}\mathrm{,}\cdots \mathrm{,}{C}_{n}$ as *n* empirical realizations of the border of the flow.

For any $u>0$, let us define ${O}_{i}\left(u\right)={\left({C}_{i}\right)}_{int}\cap C{\left(u\right)}_{ext}$ (Figure 4).

Based on the previous concepts and as a first auxiliary tool, we are now able to define a first impact index ${X}_{i}$ as:

${X}_{i}=\mu \left({\left({C}_{i}\right)}_{int}\right)={\displaystyle {\int}_{{\left({C}_{i}\right)}_{int}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}a\left(x\right)\text{d}\sigma \left(x\right)$ (1)

Let us define

${D}_{i}=\mathrm{max}\left(d\left(x,C\right):x\in {C}_{i}\right)$ (2)

which is assumed to be a finite random variable.

In this first and auxiliary step, we will be interested in the asymptotic behaviour of the tail probability:

$P\left({X}_{i}\mathrm{\text{\hspace{0.17em}}}>\mathrm{\text{\hspace{0.17em}}}t+I\left(u\right)/{D}_{i}\mathrm{\text{\hspace{0.17em}}}>\mathrm{\text{\hspace{0.17em}}}u\right)\mathrm{\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}}\forall \mathrm{\text{\hspace{0.17em}}}t\ge 0$ (3)

The question we are interested in here is: if the flow exceeds ${C}_{u}$, what is the probability that the impact exceeds by *t* the impact of a total covering of $C\left(u\right)$?

Observe that:

$P\left({X}_{i}\mathrm{\text{\hspace{0.17em}}}>\mathrm{\text{\hspace{0.17em}}}t+I\left(u\right)/{D}_{i}\mathrm{\text{\hspace{0.17em}}}>\mathrm{\text{\hspace{0.17em}}}u\right)=P\left({X}_{i}\mathrm{\text{\hspace{0.17em}}}>\mathrm{\text{\hspace{0.17em}}}t+I\left(u\right),{D}_{i}>u\right)/P\left({D}_{i}>u\right)$ (4)

Figure 4. Affected region ${O}_{i}\left(u\right)$ exceeding the curve ${C}_{u}$.

But if ${D}_{i}\le u$ it is easy to check that $\sigma \left({O}_{i}\left(u\right)\right)=0$, therefore ${X}_{i}\le I\left(u\right)$ and ${X}_{i}>t+I\left(u\right)$ does not hold. Thus we conclude that:

$\left\{{X}_{i}>t+I\left(u\right)\right\}\subset \left\{{D}_{i}>u\right\}$ (5)

and that

$P\left({X}_{i}>t+I\left(u\right),{D}_{i}>u\right)=P\left({X}_{i}>t+I\left(u\right)\right)$ (6)

taking into account that

$P\left({X}_{i}>t+I\left(u\right)\right)=P\left({X}_{i}>t+I\left(u\right)/{X}_{i}>I\left(u\right)\right)\times P\left({X}_{i}>I\left(u\right)\right)\mathrm{\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}}\forall t\ge 0$ (7)

and since as *u* goes to infinity,
$I\left(u\right)$ increases to infinity (recall that
$I\left(u\right)$ is deterministic), we conclude that if we assume:

(*H*_{1}):
${X}_{i}$ belongs to the domain of attraction of an extremal distribution, then the Pickands-Balkema-De Haan Theorem [10] [11] applies to
${X}_{i}$ and

$\underset{u\to \infty}{lim}P\left({X}_{i}\mathrm{>}t+I\left(u\right)/{X}_{i}\mathrm{>}I\left(u\right)\right)=Q\left(t\right)$ (8)

with *Q* the tail of a Generalized Pareto Distribution (GPD).
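For reference, the tail of a GPD has the standard closed form below. The symbols $\sigma>0$ (scale) and $\xi$ (shape) follow the usual parametrization, which is also the one reported for the fitted models in the examples; here $\sigma$ should not be confused with the area measure on *S*.

```latex
Q(t) =
\begin{cases}
\left(1 + \dfrac{\xi\, t}{\sigma}\right)_{+}^{-1/\xi}, & \xi \neq 0,\\[2ex]
\exp\left(-\dfrac{t}{\sigma}\right), & \xi = 0,
\end{cases}
\qquad t \ge 0 .
```

The case $\xi = 0$ is the exponential tail, $\xi > 0$ gives a polynomially decaying (Pareto-type) tail, and $\xi < 0$ gives a tail with a finite upper end point.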

Coming back to (4), (5), (6), (7) and (8) we finally find out that if we assume:

(*H*_{2}): There exists a positive constant *M* such that:

$\underset{u\to \infty}{\mathrm{lim}}P\left({X}_{i}>I\left(u\right)\right)/P\left({D}_{i}>u\right)=M$ (9)

then,

$\underset{u\to \infty}{\mathrm{lim}}P\left({X}_{i}>t+I\left(u\right)/{D}_{i}>u\right)=MQ\left(t\right)$ (10)

where *Q* is the tail of a GPD. We then have the following first result:

Theorem 1:

Under *H*_{1} and *H*_{2}, as *u* goes to infinity, $P\left({X}_{i}\le t+I\left(u\right)/{D}_{i}>u\right)$ approaches $1-MQ\left(t\right)$, where *Q* is the tail of a GPD.
$\diamond $
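The argument behind Theorem 1 can be condensed into a single chain of equalities, written here with the same "/" notation for conditioning that is used in (3) to (10); nothing beyond (4), (6), (7), (8) and (9) is assumed.

```latex
\begin{aligned}
P\left(X_i > t + I(u) \,/\, D_i > u\right)
  &= \frac{P\left(X_i > t + I(u)\right)}{P\left(D_i > u\right)}
     && \text{by (4) and (6)} \\
  &= P\left(X_i > t + I(u) \,/\, X_i > I(u)\right)\,
     \frac{P\left(X_i > I(u)\right)}{P\left(D_i > u\right)}
     && \text{by (7)} \\
  &\xrightarrow[u \to \infty]{} Q(t)\, M
     && \text{by (8) and (9).}
\end{aligned}
```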

3. Example 1

Let
$S=\left\{\left(x\mathrm{,}y\right)\in {\mathbb{R}}^{2}\mathrm{;\text{\hspace{0.17em}}0}\le y\le 1\right\}$ and let *C* be a curve of length *L* splitting *S*. Take
$\sigma $ the Lebesgue measure in the plane and

$a\left(x\right)=\begin{cases}1 & \forall x\in {C}_{ext}\cup C\\ 0 & \forall x\in {C}_{int}\end{cases}$

Then $I\left(u\right)=\sigma \left(C\left(u\right)\right)=Lu$. If ${\alpha}_{i}\left(t\right)\mathrm{,\text{\hspace{0.17em}}}t\in \left[\mathrm{0,1}\right]$ is a parametrization of ${C}_{i}$,

${X}_{i}={\displaystyle {\int}_{{\left({C}_{i}\right)}_{int}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}1\text{d}x\text{d}y={\displaystyle {\int}_{0}^{1}}\left({\displaystyle {\int}_{0}^{{\alpha}_{i}\left(t\right)}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}\text{d}y\right)\text{d}t={\displaystyle {\int}_{0}^{1}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{\alpha}_{i}\left(t\right)\text{d}t$ (11)

and

${D}_{i}=\underset{t\in \left[0,1\right]}{\mathrm{max}}{\alpha}_{i}\left(t\right)$

then *H*_{1} reads: ${X}_{i}={\displaystyle {\int}_{0}^{1}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{\alpha}_{i}\left(t\right)\text{d}t$ belongs to an extremal domain of attraction. On the other hand, *H*_{2} becomes: ${\mathrm{lim}}_{u\to \infty}\frac{P\left({X}_{i}>Lu\right)}{P\left({D}_{i}>u\right)}=M$ where *M* is a positive constant.

Since ${X}_{i}\le {D}_{i}$, then $P\left({X}_{i}>Lu\right)\le P\left({D}_{i}>Lu\right)$. If we assume that ${\alpha}_{i}$ has derivative bounded by *K* and attains its maximum at ${t}_{i}^{\ast}\in \left[0,1\right]$, then $\left|{\alpha}_{i}\left(t\right)-{D}_{i}\right|\le K\left|t-{t}_{i}^{\ast}\right|$ and, since ${\int}_{0}^{1}\left|t-{t}_{i}^{\ast}\right|\text{d}t\le \frac{1}{2}$, we get ${X}_{i}\ge {D}_{i}-\frac{K}{2}$ and hence

$P\left({X}_{i}>Lu\right)\ge P\left({D}_{i}>Lu+\frac{K}{2}\right)$ (12)
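The two-sided bound $D_i - K/2 \le X_i \le D_i$ used here is easy to check numerically. The sketch below does so for one hypothetical front, $\alpha(t) = 5 + \sin(2\pi t)$, whose derivative is bounded by $K = 2\pi$; the front and the grid size are illustrative assumptions, not part of the original example.

```python
import math

def curve_stats(alpha, n=10_000):
    """Return (X, D): midpoint-rule integral and grid maximum of alpha on [0, 1]."""
    values = [alpha((k + 0.5) / n) for k in range(n)]
    return sum(values) / n, max(values)

def front(t):
    # hypothetical front; |front'(t)| <= 2*pi for all t
    return 5.0 + math.sin(2.0 * math.pi * t)

K = 2.0 * math.pi
X, D = curve_stats(front)
lower, upper = D - K / 2.0, D
print(lower, X, upper)
```

For this front the bound is far from tight, which is expected: the constant $K/2$ covers the worst case in which the curve drops at the maximal rate away from its maximum.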

Therefore, *H*_{2} is satisfied if for any positive constants ${\mathbb{C}}_{1}\mathrm{,}{\mathbb{C}}_{2}$ there exists a positive constant *M* such that the following holds true:

$\underset{u\to \infty}{\mathrm{lim}}\frac{P\left({D}_{i}>{\mathbb{C}}_{1}u\right)}{P\left({D}_{i}>u\right)}=\underset{u\to \infty}{\mathrm{lim}}\frac{P\left({D}_{i}>{\mathbb{C}}_{1}u+{\mathbb{C}}_{2}\right)}{P\left({D}_{i}>u\right)}=M>0$ (13)

For instance, if for *t* tending to infinity $P\left({D}_{i}>t\right)\approx \frac{1}{{t}^{\beta}},\beta >1$, then

$\underset{u\to \infty}{\mathrm{lim}}\frac{P\left({D}_{i}>{\mathbb{C}}_{1}u\right)}{P\left({D}_{i}>u\right)}=\underset{u\to \infty}{\mathrm{lim}}\frac{{u}^{\beta}}{{\mathbb{C}}_{1}^{\beta}{u}^{\beta}}=\frac{1}{{\mathbb{C}}_{1}^{\beta}}$ (14)

and

$\underset{u\to \infty}{\mathrm{lim}}\frac{P\left({D}_{i}>{\mathbb{C}}_{1}u+{\mathbb{C}}_{2}\right)}{P\left({D}_{i}>u\right)}=\underset{u\to \infty}{\mathrm{lim}}\frac{1}{{\left(\frac{{\mathbb{C}}_{1}u+{\mathbb{C}}_{2}}{u}\right)}^{\beta}}=\frac{1}{{\mathbb{C}}_{1}^{\beta}}$ (15)

and *H*_{2} is fulfilled.
$\diamond $
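The limits (14) and (15) can be illustrated numerically. The sketch below assumes an exact Pareto tail $P(D_i > t) = t^{-\beta}$ for $t \ge 1$, which is one particular instance of the asymptotic condition above; the values of `beta`, `c1` and `c2` are hypothetical.

```python
def pareto_tail(t, beta):
    """Exact Pareto survival function P(D > t) = t**(-beta) for t >= 1."""
    return t ** (-beta) if t >= 1.0 else 1.0

def tail_ratio(u, c1, c2, beta):
    """P(D > c1*u + c2) / P(D > u), the ratio appearing in (13) and (15)."""
    return pareto_tail(c1 * u + c2, beta) / pareto_tail(u, beta)

beta, c1, c2 = 2.5, 3.0, 4.0          # hypothetical constants
limit = c1 ** (-beta)                  # the value of M predicted by (14) and (15)
thresholds = (1e1, 1e3, 1e5, 1e7)
ratios = [tail_ratio(u, c1, c2, beta) for u in thresholds]
errors = [abs(r - limit) for r in ratios]
for u, r in zip(thresholds, ratios):
    print(u, r, limit)
```

The additive constant `c2` only matters at moderate thresholds; its effect on the ratio vanishes as *u* grows, which is why (14) and (15) share the same limit.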

We now consider our main index of impact, $\mu \left({O}_{i}\left(u\right)\right)$. First, however, we provide an intermediate result concerning the asymptotic behaviour of

$P\left(\mu \left({O}_{i}\left(u\right)\right)>t/{D}_{i}>u\right)\mathrm{\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}}\forall t>0$ (16)

for *u* tending to infinity.

Observe that $\left\{{D}_{i}>u\right\}=\left\{\sigma \left({O}_{i}\left(u\right)\right)>0\right\}=\left\{\mu \left({O}_{i}\left(u\right)\right)>0\right\}$, and thus, (16) may be rewritten as

$P\left(\mu \left({O}_{i}\left(u\right)\right)>t/\mu \left({O}_{i}\left(u\right)\right)>0\right)\mathrm{\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}}\forall t>0$ (17)

Before going to (16), we will discuss the asymptotic behaviour, for *u* tending to infinity, of:

$P\left(\mu \left({O}_{i}\left(u\right)\right)>t/{d}_{i}>u\right)\mathrm{\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}}\forall t>0$ (18)

where

${d}_{i}=\mathrm{min}\left\{d\left(x,C\right):x\in {C}_{i}\right\}\le {D}_{i}$ (19)

We will assume:

(*H*_{3}): ${d}_{i}>0$ *a.s.*

If ${d}_{i}>u$ it is then clear that $C\left(u\right)\subset C\left({d}_{i}\right)$ and that

$C\left({d}_{i}\right)\cap C{\left(u\right)}_{ext}\subset {\left({C}_{i}\right)}_{int}\cap C{\left(u\right)}_{ext}\subset C\left({D}_{i}\right)\cap C{\left(u\right)}_{ext}$ (20)

Taking into account (19) and (20) and the basic properties of measures we get,

$\begin{array}{l}\mu \left(C\left({d}_{i}\right)\right)-\mu \left(C\left(u\right)\right)\\ =\mu \left(C\left({d}_{i}\right)\cap C{\left(u\right)}_{ext}\right)\le \mu \left({O}_{i}\left(u\right)\right)\\ \le \mu \left(C\left({D}_{i}\right)\cap C{\left(u\right)}^{c}\right)=\mu \left(C\left({D}_{i}\right)\right)-\mu \left(C\left(u\right)\right)\end{array}$ (21)
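The sandwich (21) can be checked on the flat strip of Example 1 with $a = 1$ and $L = 1$, where $\mu(C(r)) = r$ and $\mu(O_i(u))$ is the area of the front lying above the level *u*. The front $\alpha(t) = 5 + \sin(2\pi t)$ and the thresholds are hypothetical values chosen for this sketch.

```python
import math

def excess_measure(alpha, u, n=10_000):
    """mu(O_i(u)) on the unit strip with a = 1: area of the front above level u."""
    return sum(max(alpha((k + 0.5) / n) - u, 0.0) for k in range(n)) / n

def front(t):
    # hypothetical front with minimum distance 4 and maximum distance 6
    return 5.0 + math.sin(2.0 * math.pi * t)

d_min, d_max = 4.0, 6.0

# Case d_i > u: both bounds of (21) apply.
u = 3.0
mu_O = excess_measure(front, u)
lower = d_min - u        # mu(C(d_i)) - mu(C(u))
upper = d_max - u        # mu(C(D_i)) - mu(C(u))

# Case d_i <= u < D_i: the excess is still positive, only the upper bound holds.
u_partial = 5.5
mu_partial = excess_measure(front, u_partial)
print(lower, mu_O, upper, mu_partial)
```

The second case is the event $\{D_i > u \ge d_i\}$, whose probability has to be negligible relative to $P(d_i > u)$ for the bounds to determine the limit.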

Therefore,

$\begin{array}{l}P\left(\mu \left(C\left({d}_{i}\right)\right)-\mu \left(C\left(u\right)\right)>t/{d}_{i}>u\right)\\ \le P\left(\mu \left({O}_{i}\left(u\right)\right)>t/{d}_{i}>u\right)\le P\left(\mu \left(C\left({D}_{i}\right)\right)-\mu \left(C\left(u\right)\right)>t/{d}_{i}>u\right)\end{array}$ (22)

Hence

$\begin{array}{l}P\left(\mu \left(C\left({d}_{i}\right)\right)>t+\mu \left(C\left(u\right)\right)/{d}_{i}>u\right)\\ \le P\left(\mu \left({O}_{i}\left(u\right)\right)>t/{d}_{i}>u\right)\le P\left(\mu \left(C\left({D}_{i}\right)\right)>t+\mu \left(C\left(u\right)\right)/{d}_{i}>u\right)\end{array}$ (23)

Let us focus first on the left-hand term of (23). Taking into account that:

$\left\{{d}_{i}>u\right\}=\left\{C\left(u\right)\subset C\left({d}_{i}\right)\right\}=\left\{\mu \left(C\left(u\right)\right)<\mu \left(C\left({d}_{i}\right)\right)\right\},$ (24)

that term may be written as

$\begin{array}{l}P\left(\mu \left(C\left({d}_{i}\right)\right)>t+\mu \left(C\left(u\right)\right)/{d}_{i}>u\right)\\ =P\left(\mu \left(C\left({d}_{i}\right)\right)>t+\mu \left(C\left(u\right)\right)\mathrm{\text{\hspace{0.17em}}}/\mu \left(C\left({d}_{i}\right)\right)>\mu \left(C\left(u\right)\right)\right)\end{array}$ (25)

Therefore, if we assume:

(*H*_{4})
$\mu \left(C\left({d}_{i}\right)\right)$ belongs to an extremal domain of attraction, then by the Pickands-Balkema-De Haan Theorem, since
${\mathrm{lim}}_{u\to +\infty}\mu \left(C\left(u\right)\right)=+\infty $, we have that

$\underset{u\to \infty}{\mathrm{lim}}P\left(\mu \left(C\left({d}_{i}\right)\right)>t+\mu \left(C\left(u\right)\right)/\mu \left(C\left({d}_{i}\right)\right)>\mu \left(C\left(u\right)\right)\right)=Q\left(t\right),$ (26)

where $Q\left(t\right)$ is the tail of a GPD. Thus the left-hand term of (23) satisfies that:

$\underset{u\to \infty}{\mathrm{lim}}P\left(\mu \left(C\left({d}_{i}\right)\right)>t+\mu \left(C\left(u\right)\right)/{d}_{i}>u\right)=Q\left(t\right)$ (27)

with $Q\left(t\right)$ the tail of a GPD.

Let us now consider the right-hand term of (23):

$\begin{array}{l}P\left(\mu \left(C\left({D}_{i}\right)\right)>t+\mu \left(C\left(u\right)\right)/{d}_{i}>u\right)\\ =\frac{P\left(\mu \left(C\left({D}_{i}\right)\right)>t+\mu \left(C\left(u\right)\right),{d}_{i}>u\right)}{P\left({d}_{i}>u\right)}\\ =\frac{P\left(\mu \left(C\left({D}_{i}\right)\right)>t+\mu \left(C\left(u\right)\right),{D}_{i}>u\right)}{P\left({D}_{i}>u\right)}\frac{P\left({D}_{i}>u\right)}{P\left({d}_{i}>u\right)}\\ \quad -\frac{P\left(\mu \left(C\left({D}_{i}\right)\right)>t+\mu \left(C\left(u\right)\right),\,{D}_{i}>u\ge {d}_{i}\right)}{P\left({d}_{i}>u\right)}\\ =\left(I\right)-\left(II\right)\end{array}$ (28)

Assume that:

(*H*_{5})
${\mathrm{lim}}_{u\to +\infty}P\left({D}_{i}>u\right)/P\left({d}_{i}>u\right)=1$

This implies that:

$\mu \left(C\left({D}_{i}\right)\right)\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{and}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mu \left(C\left({d}_{i}\right)\right)$ (29)

belong to the same extremal domain of attraction.

Therefore, by (28), *H*_{5}, (29) and the Pickands-Balkema-De Haan Theorem we get:

$\begin{array}{l}\underset{u\to \infty}{lim}\left(I\right)=\underset{u\to \infty}{lim}P\left(\mu \left(C\left({D}_{i}\right)\right)>t+\mu \left(C\left(u\right)\right)/\mu \left(C\left({D}_{i}\right)\right)>\mu \left(C\left(u\right)\right)\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\times \mathrm{}\underset{u\to \infty}{\mathrm{lim}}\frac{P\left({D}_{i}>u\right)}{P\left({d}_{i}>u\right)}=Q\left(t\right)\times 1=Q\left(t\right)\end{array}$ (30)

where *Q* is the same GPD as in (27).

On the other hand,

$\begin{array}{c}0\le \left(II\right)\le \frac{P\left({D}_{i}>u>{d}_{i}\right)}{P\left({d}_{i}>u\right)}=\frac{P\left({D}_{i}>u\right)}{P\left({d}_{i}>u\right)}-\frac{P\left({D}_{i}>u,\mathrm{\text{\hspace{0.17em}}}{d}_{i}>u\right)}{P\left({d}_{i}>u\right)}\\ =\frac{P\left({D}_{i}>u\right)}{P\left({d}_{i}>u\right)}-\frac{P\left({d}_{i}>u\right)}{P\left({d}_{i}>u\right)}=\frac{P\left({D}_{i}>u\right)}{P\left({d}_{i}>u\right)}-1\end{array}$ (31)

and it turns out from *H*_{5} that

$\underset{u\to \infty}{\mathrm{lim}}\left(II\right)=0$ (32)

therefore, by (28), (30) and (32), we have

$\underset{u\to \infty}{lim}P\left(\mu \left(C\left({D}_{i}\right)\right)\mathrm{>}t+\mu \left(C\left(u\right)\right)/{d}_{i}\mathrm{>}u\right)=Q\left(t\right)$ (33)

Then, combining (23), (27) and (33), we obtain the following result.

Theorem 2

Under *H*_{3}, *H*_{4} and *H*_{5},

$\underset{u\to \infty}{\mathrm{lim}}P\left(\mu \left({O}_{i}\left(u\right)\right)>t/{d}_{i}>u\right)=Q\left(t\right)$ (34)

with *Q* the tail of a GPD.
$\diamond $

Now let us come back to (16). We have that:

$\begin{array}{l}P\left(\mu \left({O}_{i}\left(u\right)\right)>t/\mu \left({O}_{i}\left(u\right)\right)>0\right)\\ =P\left(\mu \left({O}_{i}\left(u\right)\right)>t/{D}_{i}>u\right)=\frac{P\left(\mu \left({O}_{i}\left(u\right)\right)>t,{D}_{i}>u\right)}{P\left({D}_{i}>u\right)}\\ =\frac{P\left(\mu \left({O}_{i}\left(u\right)\right)>t,{D}_{i}>u,\mathrm{\text{\hspace{0.17em}}}{d}_{i}>u\right)}{P\left({d}_{i}>u\right)}\frac{P\left({d}_{i}>u\right)}{P\left({D}_{i}>u\right)}\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}+\frac{P\left(\mu \left({O}_{i}\left(u\right)\right)>t,{D}_{i}>u\ge {d}_{i}\right)}{P\left({d}_{i}>u\right)}\frac{P\left({d}_{i}>u\right)}{P\left({D}_{i}>u\right)}\\ =\left(III\right)+\left(IV\right)\end{array}$ (35)

But,

$\begin{array}{c}\left(III\right)=\frac{P\left(\mu \left({O}_{i}\left(u\right)\right)>t,{D}_{i}>u,\mathrm{\text{\hspace{0.17em}}}{d}_{i}>u\right)}{P\left({d}_{i}>u\right)}\frac{P\left({d}_{i}>u\right)}{P\left({D}_{i}>u\right)}\\ =P\left(\mu \left({O}_{i}\left(u\right)\right)>t/{d}_{i}>u\right)\frac{P\left({d}_{i}>u\right)}{P\left({D}_{i}>u\right)}\end{array}$ (36)

Then using Theorem 2, and *H*_{5} we get

$\underset{u\to \infty}{\mathrm{lim}}\left(III\right)=Q\left(t\right)\times 1=Q\left(t\right)$ (37)

On the other hand,

$0\le \left(IV\right)\le \frac{P\left({D}_{i}>u\ge {d}_{i}\right)}{P\left({d}_{i}>u\right)}\frac{P\left({d}_{i}>u\right)}{P\left({D}_{i}>u\right)}$ (38)

By (31) and (*H*_{5}), the right-hand term of (38), as *u* tends to infinity, tends to
$0\times 1=0$ and therefore:

$\underset{u\to \infty}{\mathrm{lim}}\left(IV\right)=0$ (39)

and we then have the main result of this paper:

Theorem 3

Under *H*_{3}, *H*_{4} and *H*_{5}

$\underset{u\to \infty}{\mathrm{lim}}P\left(\mu \left({O}_{i}\left(u\right)\right)>t/\mu \left({O}_{i}\left(u\right)\right)>0\right)=Q\left(t\right)\mathrm{\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}}\forall t>0$ (40)

with $Q\left(t\right)$ the tail of a GPD. $\diamond $

Remark 2: This result is a clear extension of the POT method, since it gives the conditional distribution of the measure of affectation given that the area ${O}_{i}\left(u\right)$ is reached by the flow. We will derive a statistical application of this Theorem, which we call Peaks Over Manifold (POM).
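A minimal Monte Carlo sketch of the statement of Theorem 3, on the flat strip of Example 1 with $a = 1$ and $L = 1$, may help fix ideas. Everything specific in it is an assumption made for illustration: fronts of the form level + amp·sin(2πt + phase), a Pareto-distributed level with index `beta`, and the values of `u`, `beta` and `amp`. With these choices $P(D_i>u)/P(d_i>u) = ((u+\text{amp})/(u-\text{amp}))^{\beta}$, which tends to 1 as required by *H*_{5}, and the natural reference tail is a GPD with shape $1/\beta$ and scale $u/\beta$.

```python
import math
import random

def excess_area(level, u, amp, phase, n_grid=200):
    """mu(O_i(u)) on the unit strip with a = 1: area of the front above level u."""
    total = 0.0
    for k in range(n_grid):
        t = (k + 0.5) / n_grid
        front = level + amp * math.sin(2.0 * math.pi * t + phase)
        total += max(front - u, 0.0)
    return total / n_grid

def simulate_excesses(n_fronts, u, beta, amp, seed=0):
    """Keep mu(O_i(u)) for the simulated fronts whose maximum distance exceeds u."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_fronts):
        level = rng.paretovariate(beta)        # P(level > r) = r**(-beta), r >= 1
        phase = rng.uniform(0.0, 2.0 * math.pi)
        if level + amp > u:                    # D_i > u: the front crosses the threshold
            kept.append(excess_area(level, u, amp, phase))
    return kept

u, beta, amp = 10.0, 3.0, 0.1                  # hypothetical settings
excesses = simulate_excesses(500_000, u, beta, amp)

t = 3.0
empirical_tail = sum(x > t for x in excesses) / len(excesses)
gpd_tail = (1.0 + t / u) ** (-beta)            # GPD tail with shape 1/beta, scale u/beta
print(len(excesses), empirical_tail, gpd_tail)
```

At a finite threshold the two tails differ slightly, because a fraction of the retained fronts only partially exceeds *u*; that fraction is what *H*_{5} forces to vanish in the limit.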

3.1. Example 1, Synthetic Data

In the first example, we simulate a data set of tide height along 100 km of beach monitored for three months under different tidal conditions. We then evaluate the probability of the sea exceeding a fixed threshold distance (*u*) over the sand. The threshold was set as the height exceeding the maximum average tide by 20% ($u=3.5\text{\hspace{0.17em}}\text{m}$). We estimated the tidal height using the function *waves* (https://beckmw.wordpress.com/2017/04/12/predicting-tides-in-r/), which models a sinusoidal function with a defined period (*T* in hours) and amplitude (*A* in meters). We chose to combine functions with three different tidal configurations, $T=\left[24,48,12\right]$ and $A=\left[1,0.5,2\right]$ respectively. This yields a cyclic pattern (Figure 5) of average tide ($\overline{\text{tide}}$). For each time, we simulated the distance from the shoreline over the beach for each of the 100 kilometers along the beach ($s=1,\cdots ,100$ in km). The distance from the shore (*d* in m) was simulated by combining a sinusoidal function with variable spatial amplitude (${A}_{s}$ in meters) and fixed spatial period (${T}_{s}=20$ in km) with *iid* white noise ($e\sim \text{Normal}\left(0,0.5\right)$) as follows:

$d\left(s\right)=\overline{\text{tide}}+{A}_{s}\mathrm{sin}\left(\frac{2\pi}{{T}_{s}}s\right)+e$
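The simulation just described can be sketched as follows. This is not the R code used for the paper: `mean_tide` is a plain sum of sinusoids standing in for the *waves* function, the value 0.5 is assumed to be the standard deviation of the noise, `spatial_amp` is a hypothetical value of $A_s$, and each kilometer of beach is treated as a 1000 m wide segment when converting excess distance into area.

```python
import math
import random

PERIODS_H = (24.0, 48.0, 12.0)      # T in hours, from the text
AMPLITUDES_M = (1.0, 0.5, 2.0)      # A in meters, from the text
SPATIAL_PERIOD_KM = 20.0            # T_s in km, from the text

def mean_tide(hour):
    """Average tide as a sum of three sinusoids (stand-in for the waves function)."""
    return sum(a * math.sin(2.0 * math.pi * hour / p)
               for p, a in zip(PERIODS_H, AMPLITUDES_M))

def shoreline_distance(hour, spatial_amp, rng, noise_sd=0.5):
    """d(s) in meters for s = 1..100 km at a given time."""
    tide = mean_tide(hour)
    return [tide
            + spatial_amp * math.sin(2.0 * math.pi * s / SPATIAL_PERIOD_KM)
            + rng.gauss(0.0, noise_sd)
            for s in range(1, 101)]

def excess_area_m2(distances, u, segment_m=1000.0):
    """Area above the threshold u, summing (d - u) over segments where d > u."""
    return sum(max(d - u, 0.0) * segment_m for d in distances)

rng = random.Random(1)
profile = shoreline_distance(hour=6.0, spatial_amp=0.5, rng=rng)
area = excess_area_m2(profile, u=3.5)
print(mean_tide(6.0), area)
```

Repeating the last three lines over a grid of hours gives one excess area per time step; the positive ones are the sample to which a GPD is fitted.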

Figure 5. Temporal trend in average tidal height in a week. Red dots indicate examples of points where detailed spatial patterns in tidal height are shown: I) an example where the tide did not exceed the threshold (*u*) and II) an example where the tide exceeds the predefined threshold. The dashed area is the observed excess (${O}_{i}\left(u\right)$).

We registered the dates when, at any point, the distance from the shore exceeded the threshold ($d>u$). For those dates, the area exceeding the threshold (${O}_{i}\left(u\right)$) was calculated (dashed vertical lines in Figure 5, II). Excesses ($d>u$) were registered in 16% (*N* = 685) of the sampled events. We assumed affectation was independent of the point of the beach reached by high tide, and thus the affected area was used as the indicator of effect. We fitted a GPD by maximum likelihood estimation, using the *fitgpd* function from the {*POT*} package in R, to affected areas larger than 1000 m^{2}. The distribution of areas exceeding the threshold (${O}_{i}\left(u\right)$) followed a GPD, as expected on theoretical grounds (Figure 6).

The scale parameter (average [95% CI]; $\sigma =0.007\left[0.006,0.009\right]$ ) and the shape parameter (average [95% CI]; $\xi =1\times {17}^{-13}\left[-0.14,0.11\right]$ ) suggested an Exponential distribution (Figure 6). Notice that a confidence interval of the shape parameter overlapping zero is useful to detect an exponential distribution.

Figure 6. Simulated area exceeding a threshold ( ${O}_{i}\left(u\right)$ ) and fitted Generalized Pareto Distribution (GPD). The values of the fitted parameters are also shown.

3.2. Example 2: Ice Cover in Antarctica

We explored ice cover data in Antarctica using reanalysis data, second version [13], from the Environmental Modeling Center of the National Centers for Environmental Prediction (NCEP) of the National Oceanic and Atmospheric Administration (NOAA). Data on monthly sea ice concentration (Figure 7) were downloaded from http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/.EMC/.CMB/.GLOBAL/.Reyn_SmithOIv2/.monthly/T/722.5/VALUE/X/150.0/360.0/RANGEEDGES/Y/-90/-60/RANGEEDGES/.sea_ice/index.html. Sea ice concentration data were obtained from −89.5 to −60.5 degrees South in latitude and −149.5 to 0.5 degrees West in longitude for cells of 1 × 1 degree. Monthly data consisted of 461 months, from October 1981 to February 2020. Sea ice concentration was expressed as a percentage (0 - 100%) of cell area.

We calculated the average monthly ice cover for the whole period (*i.e.* climatology, Figure 8) and defined a threshold ($u=10\%$). The cells with a climatological ice cover below 10% were selected to evaluate the dynamics of ice cover in the extreme regions. Then, for each month, we explored whether the fraction of ice cover exceeded the climatological threshold (*u*) previously estimated. For the months in which the percentage of ice cover exceeded 10%, we recorded the area of ice cover exceeding the threshold. The area of a 1 × 1 degree cell changes with latitude. Thus we used the function *area* from the {raster} package to convert from percentage of ice cover to area of ice cover (in km^{2}) for each cell. Although this function is not precise for high latitudes, it represents a first-order approximation. Moreover, most of the points were located in the lower latitudinal regions and thus this does not introduce a relevant bias.

Figure 7. Ice cover (in %) in the studied region of Antarctica.
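To see how strongly the area of a 1 × 1 degree cell depends on latitude, the sketch below uses the spherical formula $R^2\,\Delta\lambda\,(\sin\varphi_N - \sin\varphi_S)$. It is an illustration under a spherical Earth assumption with mean radius 6371 km and is not a reimplementation of the *area* function of the {raster} package; the helper names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean radius, spherical approximation (assumption)

def cell_area_km2(lat_south, lat_north, dlon_deg=1.0, radius_km=EARTH_RADIUS_KM):
    """Area of a latitude/longitude cell on a sphere, in km^2."""
    dlon = math.radians(dlon_deg)
    band = math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    return radius_km ** 2 * dlon * band

def ice_area_km2(concentration_pct, lat_center):
    """Convert an ice concentration (0-100 % of a 1 x 1 degree cell) to km^2."""
    return concentration_pct / 100.0 * cell_area_km2(lat_center - 0.5, lat_center + 0.5)

equator = cell_area_km2(-0.5, 0.5)
northern_edge = cell_area_km2(-61.0, -60.0)     # cell centered at -60.5
southern_edge = cell_area_km2(-90.0, -89.0)     # cell centered at -89.5
sphere = 360 * sum(cell_area_km2(lat, lat + 1) for lat in range(-90, 90))
print(equator, northern_edge, southern_edge, ice_area_km2(50.0, -60.5))
```

Within the studied band the cell area shrinks by more than an order of magnitude from the northern edge to the pole, so a percentage of ice cover cannot be compared across latitudes without this conversion.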

For each month where some ice cover exceeded the climatological ice cover, we estimated the area of ice cover exceeding the threshold and stored it in a separate vector. We fitted a GPD by maximum likelihood estimation to the area exceeding the threshold, using the *fitgpd* function from the {*POT*} package in R, for areas larger than 1000 km^{2}. We found a good fit to a GPD (Figure 9) with a positive shape parameter (average [95% CI]
$\xi =0.158\left[0.07,0.26\right]$ ).
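The extraction of monthly exceedance areas can be sketched as below. This is one possible reading of the procedure, in which the whole ice area of a cell counts whenever its climatology is below the threshold and its monthly concentration is above it. The function name, the four-cell grid, the cell areas and the month labels are toy values invented for illustration, not the reanalysis data.

```python
def exceedance_area_km2(monthly_pct, climatology_pct, cell_area_km2, u_pct=10.0):
    """Ice area in cells with climatology below u_pct and monthly cover above u_pct."""
    total = 0.0
    for obs, clim, area in zip(monthly_pct, climatology_pct, cell_area_km2):
        if clim < u_pct and obs > u_pct:
            total += obs / 100.0 * area
    return total

# Toy grid of four cells (hypothetical numbers)
climatology = [2.0, 5.0, 8.0, 60.0]            # long-term mean ice cover, %
cell_areas = [6000.0, 6000.0, 5000.0, 5000.0]  # km^2
months = {
    "month_1": [0.0, 15.0, 40.0, 80.0],
    "month_2": [0.0, 0.0, 12.0, 90.0],
    "month_3": [1.0, 3.0, 9.0, 70.0],
}

areas = {name: exceedance_area_km2(obs, climatology, cell_areas)
         for name, obs in months.items()}
# Only areas above 1000 km^2 are passed on to the GPD fit
to_fit = [a for a in areas.values() if a > 1000.0]
print(areas, to_fit)
```

The fourth cell never contributes because its climatological cover is above the threshold, so it does not belong to the extreme region.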

4. Discussion & Conclusions

This work introduces a novel technique to evaluate extreme processes over a surface or manifold. The reasoning extends previous univariate approaches to the topic (POT) by demonstrating, under realistic hypotheses, that the resulting distribution of the area exceeded by a given phenomenon follows a Generalized Pareto Distribution (GPD). The applicability of this novel approach is demonstrated on simulated data on tidal height and on ice cover in a relevant region of Antarctica. The results support the use of a GPD to explore problems of extremes over surfaces. This opens a new avenue in the analysis of extreme flows, which can be applied to ice cover at the poles [14], to areas covered by floods and to the incidence of heatwaves, among other examples. Under reasonable conditions, the distribution converges to a GPD, but it would be interesting to explore whether other distributions can emerge and to compare the method using more data sets, as has been done in recent articles [15]. In real applications, the evaluation over surfaces (2D) will be common, but the theorem allows the results to be extended to a more general N-dimensional manifold. This avenue has to be further explored.

Figure 8. Monthly climatological values of ice cover extent in Antarctica.

Figure 9. Area covered by ice in the extreme range of ice cover. Superimposed is the Generalized Pareto Distribution fitted to the data. The values of the shape ( $\xi $ ) and scale ( $\sigma $ ) parameters are shown.

Acknowledgements

Sincere thanks to the participants of the “Jornadas de Estadística Aplicada 2019” in La Paloma-Uruguay, for suggestions on a previous version of these results. This is a contribution of the project entitled “Mecanismos determinantes de la estructura comunitaria y la variabilidad espacio-temporal en comunidades acuáticas” funded by ANII-Uruguay (FCE_3_2020_1_162710).

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

[1] Batt, T.D., Carpenter, S.R. and Ives, A.R. (2017) Extreme Events in Lake Ecosystem Time Series. Limnology and Oceanography Letters, 2, 63-69. https://doi.org/10.1002/lol2.10037

[2] Bellanger, L. and Perera, G. (2003) Compound Poisson Limit Theorems for High-Level Exceedances of Some Non-Stationary Processes. Bernoulli, 9, 497-515. https://doi.org/10.3150/bj/1065444815

[3] Bousquet, N. and Bernardara, P. (2021) Extreme Value Theory with Applications to Natural Hazards: From Statistical Theory to Industrial Practice. Springer, Berlin.

[4] Hannesdóttir, Á., Kelly, M.C. and Dimitrov, N.K. (2019) Extreme Wind Fluctuations: Joint Statistics, Extreme Turbulence, and Impact on Wind Turbine Loads. Wind Energy Science, 4, 325-342. https://doi.org/10.5194/wes-4-325-2019

[5] Jiménez, E., Cabañas, B. and Lefebvre, G. (2015) Environment, Energy and Climate Change I: Environmental Chemistry of Pollutants and Wastes. Springer, Berlin. https://doi.org/10.1007/978-3-319-12907-5

[6] Pechiar, J., Perera, G. and Simon, M. (2002) Effective Bandwidth Estimation and Testing for Markov Sources. Performance Evaluation, 48, 157-175. https://doi.org/10.1016/S0166-5316(02)00035-4

[7] Katz, R.W., Brush, G.S. and Parlange, G.S. (2005) Statistics of Extremes: Modeling Ecological Disturbances. Ecology, 86, 1124-1134. https://doi.org/10.1890/04-0606

[8] Reiss, R.-D. and Thomas, M. (2007) Statistical Analysis of Extreme Values: With Applications to Insurance, Finance, Hydrology and Other Fields. Birkhäuser, Basel.

[9] Embrechts, P., Klüppelberg, C. and Mikosch, T. (2014) Modelling Extremal Events: For Insurance and Finance. Springer, Berlin.

[10] Balkema, A.A. and de Haan, L. (1974) Residual Life Time at Great Age. The Annals of Probability, 5, 13-15. https://doi.org/10.1214/aop/1176996548

[11] Pickands, J. (1975) Statistical Inference Using Extreme Order Statistics. The Annals of Statistics, 3, 119-131. https://doi.org/10.1214/aos/1176343003

[12] Far, S.S. and Wahab, A.K.A. (2016) Evaluation of Peaks-over-Threshold Method. Ocean Science Discussions, 47, 1-25.

[13] Reynolds, R.W., Rayner, N.A., Smith, T.M., Stokes, D.C. and Wang, W. (2002) An Improved in Situ and Satellite SST Analysis for Climate. Journal of Climate, 15, 1609-1625. https://doi.org/10.1175/1520-0442(2002)015<1609:AIISAS>2.0.CO;2

[14] Segura, A., Crisci, A. and Perera, G. (in prep.) A Statistical Approach to Quantify the Impact of Melting Areas in Polar Regions under Different Greenhouse Emission Scenarios.

[15] Arshad, M., Iqbal, M. and Ahmad, M. (2018) Transmuted Exponentiated Moment Pareto Distribution. Open Journal of Statistics, 8, 939-961. https://doi.org/10.4236/ojs.2018.86063


Copyright © 2023 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.