Analysis of Observed and Modelled Near-Surface Wind Extremes over the Sub-Arctic Northeast Pacific

Wind speed extremes in the sub-Arctic realm of the North-East Pacific region were investigated through extreme value analysis of wind speeds obtained from simulations with the COSMO-CLM (Consortium for Small-scale Modelling, climate version) mesoscale model, as well as from observed data. The analysis showed that the set of wind speed extremes obtained from observations is a mixture of two different subsets, each neatly described by the Weibull distribution. Using special metaphoric terminology, they are labelled as “Black Swans” and “Dragons”. The “Dragons” are responsible for the strongest extremes. It has been shown that both reanalysis and GCM (general circulation model) data have no “Dragons”. This means that such models underestimate wind speed maxima, and that the important circulation process generating the anomalies is not simulated. The COSMO-CLM data have both “Black Swans” and “Dragons”. This evidence suggests that an atmospheric model with a detailed spatial resolution (in this work we used the data from the domain with 13.2 km spatial resolution) does reproduce the special mechanism responsible for the generation of the largest wind speed extremes. However, a more thorough analysis shows that the differences in the parameters of the cumulative distribution functions are still significant. The ratio between the modelled Dragons and Black Swans reaches only up to 10%, which is much less than the 30% level established for observations.

Extreme value theory supposes that the data selected for analysis are independent and identically distributed. However, independence very often does not hold, and extreme value theory can be extended to dependent time series [1] [2]. As far as identical distribution is concerned, our analysis showed that the set of wind speed extremes obtained from observations in the European and Siberian Arctic is a mixture of two different subsets, each neatly described by the Weibull distribution [7]. Representatives of each population were marked as "Black Swans" (hereafter BSs) and "Dragons" (hereafter Ds), based on the terminology introduced by N. N. Taleb [8] and D. Sornette [9].
Formally, the Ds were introduced to describe outliers located far beyond any extrapolation of a power-law distribution (see [9] and [10]), and from this point of view the proposal that Ds follow a power-law distribution (e.g. Weibull) makes the use of this metaphor not strictly correct. Moreover, D. Sornette introduces the concept of Ds (as an alternative to BSs) from the point of view of their improved predictability. We do not follow him in these details; we use the terms only to mark the differences between samples belonging to different groups.
Statistical analysis of extreme wind speeds is important not only for its practical value but also because it permits, in some cases, the detection of the origin of extreme winds, as identical statistical distributions sometimes suggest common originating mechanisms. We proposed that the two groups of extreme wind events mentioned above result from different circulation mechanisms. This hypothesis motivated our interest in detecting, analysing, understanding and modelling such different extremes and their nature. From this point of view, it is important to understand the extent to which atmospheric models are able to reproduce wind speed extremes.
In a previous paper [7] we concluded that the collection of wind extremes modelled by a general circulation model (GCM) consisted only of samples conforming to the BSs group. The same conclusion was reached after analysing reanalysis datasets, confirming that they do not contain the observed exceptional outliers. Therefore, the output of a coarse-resolution model cannot be directly applied to tasks in which extreme wind assessments are needed.
The next step of our analysis is to investigate how well a mesoscale atmospheric model (with a fine spatial resolution) simulates the aforementioned peculiarities of wind extremes. We use the climate version of the COSMO model, COSMO-CLM (Consortium for Small-scale Modelling (http://www.cosmo-model.org), climate version), which was developed by the CLM-Community (http://clm-community.eu) in the framework of the Consortium for Small-scale Modelling. We analysed wind velocity archives obtained from a 30-year simulation of the COSMO-CLM model over the North-East Pacific Ocean region [11]. The same approach to comparing mesoscale model-simulated and observed variables was applied to the analysis of tropical cyclone maximum wind speeds [12].
In the next section, we describe the study area and briefly summarize the COSMO-CLM model together with the production of the datasets. Subsequently, we describe the modelled near-surface wind velocities and compare them with the observations. The following sections describe the evidence for a Weibull distribution in the station observation data as well as in the model data. The last section concludes the paper.

Study Area, Dataset and the Problem of Statistical Independence in the COSMO-CLM Model Data Set
The study was performed in the sub-Arctic realm of the North-East Pacific Ocean region from the Kamchatka Peninsula to Hokkaido Island, including both the coastal area and inland Asian territory (Figure 1). Wind speeds of more than 30 m·s⁻¹ were observed in the cold season during the passage of synoptic storms. We have taken into account that in the sub-Arctic region of the Far East, summer includes not only the traditional months of June, July and August but also September, owing to the summer monsoon circulation. The winter season covers the whole interval from November to March.
A dataset of 10-minute mean wind speeds (covering the period 1985-2014) from all stations on Sakhalin Island, which is located in the central area of the domain (Figure 1), was used. As mentioned above, the samples selected for examination have to be independent. We estimated the dead time between consecutive wind fluctuations using the autocorrelation function; it was assessed as 72 hours. Similar values were obtained previously [7] [13] [14] [15].
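The independence step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text says that the autocorrelation function was used, but not which criterion defined the dead time, so the 1/e threshold below is an assumption, and the function names (`autocorrelation`, `decorrelation_lag`, `block_maxima`) are hypothetical. For 10-minute data, a dead time of 72 hours corresponds to blocks of 432 samples.

```python
import math

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative integer lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / var

def decorrelation_lag(series, threshold=1 / math.e, max_lag=None):
    """Smallest lag at which the autocorrelation drops below `threshold`.

    The 1/e threshold is an assumed convention, not taken from the paper.
    Returns `max_lag` if the threshold is never crossed.
    """
    max_lag = max_lag or len(series) // 2
    for lag in range(1, max_lag + 1):
        if autocorrelation(series, lag) < threshold:
            return lag
    return max_lag

def block_maxima(series, block_len):
    """Maximum of each consecutive, non-overlapping block of `block_len` samples.

    A trailing incomplete block is dropped.
    """
    return [max(series[i:i + block_len])
            for i in range(0, len(series) - block_len + 1, block_len)]
```

With a real record, `block_maxima(wind_speed, 432)` would give one maximum per 72-hour step, which is the kind of sample analysed in the following sections.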
As a source of model data, we used a wind simulation dataset from the COSMO-CLM regional model for the same period of 1985-2014 [11]. Note that the special data assimilation system adopted in COSMO-CLM was not used. Therefore, the station data can be used to assess the consistency between simulation products and observations. Dynamic downscaling was performed during the COSMO-CLM simulation through three domains in a one-way nesting scheme. The outer domain (13.2 km grid spacing) covers Sakhalin Island, the Sea of Okhotsk, the Kamchatka Peninsula and the surrounding regions of the Pacific Ocean and the Asian continent. The model outputs on the outer domain provide meteorological fields at the lateral boundaries of the intermediate domain (6.6 km grid spacing), and the latter in turn provides its outputs at the lateral boundaries of the inner domain (2.2 km grid spacing) (Figure 1). Note that we do not consider here the results of the numerical experiments for the inner domain.
The COSMO-CLM model is based on the fully compressible, non-hydrostatic equations of fluid motion (Reynolds equations) solved on a staggered Arakawa C-grid [16]. We used 40 model levels in the vertical direction for the outer and intermediate domains and 50 model levels for the inner domain. We used the Tiedtke scheme for the convection parametrization [19], the two-stream Ritter and Geleyn radiative transfer model [20], and the Mellor-Yamada 2.5-level planetary boundary layer and complementary surface layer schemes. More detailed documentation is available online (http://www.cosmo-model.org/content/model/documentation/core/default.htm).
The initial and lateral boundary conditions for the outer domain in COSMO-CLM were provided by the ERA-Interim reanalysis [21] every 6 hours during 1985-2014. To avoid model drift in the atmospheric circulation, the spectral nudging technique [22] [23] was applied to temperature, geopotential height and wind components above 850 hPa on the outer domain. We used wavenumber 11 to affect only the large-scale synoptic conditions (wavelengths > 1000 km).
The computer experiment was conducted in portions typically lasting several months. This duration was chosen because of limitations of computing resources and data storage volumes, as well as the technical risk of the run "crashing" over longer time intervals.

Comparison of Modelled Near-Surface Wind Velocities for Different Grid Spacing and Observations
The quality of the COSMO-CLM data should be assessed before they are used for research purposes. In presenting the results of our runs, we compared the seasonal totals of temperature, wind velocity and wind gusts with those of the observed variables. Let us consider, for example, the 2014 results grouped into four seasons as follows: January-March, April-June, July-September and October-December.
The model evaluation procedure (using the Sakhalin Island station data) was performed taking into account the high resolution of the model data. For the comparison (see Table 1), among the model grid points nearest to each station we used the one with the minimal root-mean-square error.
We observed good agreement for all seasons examined. For example, the temperature bias was no more than ~0.5 °C, although the RMS error was 2-3 °C.
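The evaluation statistics described above can be illustrated with a minimal Python sketch. It assumes that the model series at each candidate grid point has already been matched in time to the station series; the function names and the dictionary layout are hypothetical and are not part of the COSMO-CLM tooling.

```python
import math

def bias(model, obs):
    """Mean difference (model minus observation) over paired values."""
    return sum(m - o for m, o in zip(model, obs)) / len(obs)

def rmse(model, obs):
    """Root-mean-square error over paired values."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))

def best_grid_point(candidates, obs):
    """Pick the neighbouring grid point whose series has the minimal RMSE.

    `candidates` maps a grid point label to its model series.
    Returns a (label, rmse) pair.
    """
    return min(((label, rmse(series, obs)) for label, series in candidates.items()),
               key=lambda pair: pair[1])
```

For example, `best_grid_point({"a": [2, 3, 4], "b": [1, 2, 4]}, [1, 2, 3])` selects point `"b"`, whose series differs from the observations in only one value.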
In general, the COSMO-CLM model reproduces the spatial distribution, seasonal and synoptic variability of temperature and wind velocity adequately.
Unexpectedly, the COSMO-CLM wind gust data agreed well with the observed wind gust data despite the simple calculation technique [24]. The comparison of different spatial resolutions shows somewhat better agreement for the domain with 6.6-km grid spacing. However, this advantage is insignificant and does not matter in practice: increasing the horizontal resolution from 13.2 km to 6.6 km gives practically the same results, including for extremes. Considering this conclusion, we use the data from the domain with 13.2-km spacing in the further analysis, because they occupy a smaller volume and are much easier to work with.

The Weibull Distributions in Station Observation Data
As was mentioned above, the Weibull distribution is a good model for the distribution of extreme wind speed $U$. In Figure 2, we plot several cumulative distribution functions (cdfs). The images are "Weibull plots" (see details in [7]), which represent a specific transformation of the data such that a straight line is recovered if the sample has a Weibull distribution:
$F(U) = 1 - \exp(-A U^{k})$, $\ln(-\ln(1 - F(U))) = k \ln U + \ln A$.
As was established earlier [7], and as can be seen again (Figure 2), the tail diverges from the Weibull model starting from a certain large threshold value $U_{th}$. Extremes exceeding $U_{th}$ (the so-called Ds) were much more powerful than the values predicted by extrapolating the Weibull distribution into its tail. Note that Ds are not chaotic outliers. Moreover, they can be detected visually from obvious breaks in the tail of the distribution functions of wind speed extremes. Therefore, special statistical methods for detecting the Ds (see [25] [26] [27]) were not required.
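The Weibull-plot transformation can be sketched as follows. The sketch assumes the two-parameter form F(U) = 1 - exp(-(U/V)^k), so that a plot of ln(-ln(1 - F)) against ln U has slope k and intercept -k ln V. The plotting position i/(n + 1) for the empirical cdf is an assumption made here for illustration, and the function names are hypothetical.

```python
import math

def weibull_plot_points(sample):
    """Return (ln U, ln(-ln(1 - F))) pairs for a sample of positive values.

    The empirical cdf uses the plotting position F_i = i / (n + 1),
    which is an assumed convention.
    """
    ordered = sorted(sample)
    n = len(ordered)
    return [(math.log(u), math.log(-math.log(1.0 - i / (n + 1))))
            for i, u in enumerate(ordered, start=1)]

def fit_line(points):
    """Ordinary least-squares slope, intercept and R^2 for (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    syy = sum((y - my) ** 2 for _, y in points)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept, sxy ** 2 / (sxx * syy)

def weibull_parameters(sample):
    """Estimate (V, k, R^2) from the straight line on the Weibull plot."""
    slope, intercept, r2 = fit_line(weibull_plot_points(sample))
    return math.exp(-intercept / slope), slope, r2
```

A visible break in the upper part of the point cloud, where a second line with a smaller slope fits better, is the signature that the text associates with the Ds.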
Table 1. Statistical parameters of the model-data comparison for temperature (T, °C), wind velocity (W, m/s) and wind gust (Wg, > 10 m/s) for four seasons: January-March (I-III), April-June (IV-VI), July-September (VII-IX) and October-December (X-XII).
To describe the wind speed variability when the Weibull model is not sufficiently accurate, different approaches are possible. For example, we can use the Pareto distribution [7] or the so-called Weibull-like distribution [10] [28], or choose other methods (e.g., recently by [29] and [30]). However, we interpreted these results in another way, supposing that the data under consideration are not identically distributed.
In such a situation, we can separate wind speed extremes into two groups: one for values below a certain high threshold $U_{th}$ (BSs) and another for values above this threshold (Ds) (Figure 3). Note that, traditionally, time series of extreme wind events (e.g., in the Arctic region) are not divided into different sets [31] [32] [33] [34]. In both cases the Weibull distribution fits well, but with different parameters A and k. To decide whether samples originate from a population with a Weibull distribution, special statistical tests can be used. For the task discussed, the most suitable is the Kolmogorov-Smirnov test (see details in [7]).
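A minimal sketch of the split and of the Kolmogorov-Smirnov check is given below. It is not the procedure of [7]: the function names are hypothetical, the Weibull form with scale V is assumed, and the critical value 1.36/sqrt(n) is the usual large-sample 5% value for a fully specified distribution, which is only approximate (and conservative) when V and k are estimated from the same sample.

```python
import math

def weibull_cdf(u, V, k):
    """Two-parameter Weibull cdf with scale V and shape k (assumed form)."""
    return 1.0 - math.exp(-((u / V) ** k))

def split_at_threshold(sample, u_th):
    """Separate a sample into values <= u_th (BSs) and values > u_th (Ds)."""
    below = [u for u in sample if u <= u_th]
    above = [u for u in sample if u > u_th]
    return below, above

def ks_statistic(sample, cdf):
    """Largest distance between the empirical cdf of `sample` and `cdf`."""
    ordered = sorted(sample)
    n = len(ordered)
    d = 0.0
    for i, u in enumerate(ordered, start=1):
        f = cdf(u)
        # The empirical cdf jumps from (i - 1)/n to i/n at u; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def ks_critical_value(n, coeff=1.36):
    """Asymptotic critical value; coeff = 1.36 corresponds to the 5% level."""
    return coeff / math.sqrt(n)
```

A group is treated as compatible with a fitted Weibull law when `ks_statistic(group, lambda u: weibull_cdf(u, V, k))` stays below `ks_critical_value(len(group))`.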
The Weibull distribution parameters calculated for all stations are shown in Figure 4. Here, instead of A, we equivalently use $V = A^{-1/k}$, and the Weibull distribution is rewritten as $F(U) = 1 - \exp(-(U/V)^{k})$. The value of $V$ determines the specific scale of wind velocity. The parameters for the two sets of extremes coming from the BSs and Ds populations are evidently separated (Figure 4). The value of $V$ lies between approximately 4 and 14 m/s; however, in the case of Ds, there are often much smaller values. This result indicates that $F(U)$ sharply increases with increasing $U$. The range of parameter k for Ds is located between ~1 and 2.5, whereas the range of k for BSs covers the interval between ~3 and 5.5. Under the same values of $V$, $F(U)$ tends to 1 much faster in the case of BSs compared with Ds.
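The last statement can be checked with a short worked example. It assumes the Weibull form written with the scale $V$ and uses illustrative shape values, k = 4 (inside the BSs range) and k = 2 (inside the Ds range), which are not fitted values from this study. The exceedance probability at $U = 2V$ is then:

```latex
\begin{align*}
1 - F(U) &= \exp\left(-(U/V)^{k}\right), \\
k = 4,\ U = 2V: \quad 1 - F &= e^{-16} \approx 1.1 \times 10^{-7}, \\
k = 2,\ U = 2V: \quad 1 - F &= e^{-4} \approx 1.8 \times 10^{-2}.
\end{align*}
```

For equal $V$, an extreme of twice the scale value is therefore about five orders of magnitude more probable under the smaller shape parameter.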
Thus, the extremes that belong to Ds are more frequent than those of an analogous magnitude that belong to the family of BSs.
The ratio of the quantile wind speed values for the Ds and the BSs (see Figure 5) reaches up to 30% (the same values were established in [7]). The most striking example (within the area of investigation) occurs at the Mys Terpenia and Mys Krillion stations (see Figure 1), where wind speeds of more than 35 m·s⁻¹ were detected.
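Quantile values such as U(0.99) follow directly from the fitted parameters. The sketch below assumes the Weibull form with scale V and shape k, for which the quantile is U(p) = V (-ln(1 - p))^(1/k). Reading the reported percentage as a relative difference between the two quantiles is also an assumption, and the function names are hypothetical.

```python
import math

def weibull_quantile(p, V, k):
    """Wind speed U such that F(U) = p for a Weibull law with scale V, shape k."""
    return V * (-math.log(1.0 - p)) ** (1.0 / k)

def relative_excess(u_dragons, u_swans):
    """Relative difference of two quantile values, as a fraction of the BSs value."""
    return (u_dragons - u_swans) / u_swans

def quantile_comparison(p, params_dragons, params_swans):
    """Compare the p-quantiles of two fitted laws, each given as a (V, k) pair."""
    u_d = weibull_quantile(p, *params_dragons)
    u_s = weibull_quantile(p, *params_swans)
    return u_d, u_s, relative_excess(u_d, u_s)
```

Calling `quantile_comparison(0.99, (V_d, k_d), (V_s, k_s))` with the parameters fitted for one station would return the two U(0.99) values and their relative difference.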

The Weibull Distributions in Data from Regional Mesometeorological Model Simulations
As was mentioned, coarse-resolution GCMs cannot reproduce wind speed extremes well; in particular, representatives of the Ds are absent in the modelling data [7]. We hypothesize that extreme wind speeds are governed by the mesoscale circulation of the atmosphere [24] [35] [36]. Therefore, the next step of the analysis is to investigate the 30-year dataset of wind simulations of the COSMO-CLM model. In Figure 6 we plot several cdfs for the COSMO-CLM grid points located near the stations, for comparability with the observed data. Again, as noted earlier in the analysis of the observed data, we propose that these curves are approximated by two different Weibull distributions (Figure 7). We conclude that the velocity extremes reproduced by the model include both the BSs and the Ds. The regions on the $(V, k)$ plane filled with extreme model values that belong to the family of BSs and the family of Ds are approximately the same as those determined in the analysis of the observational data (Figure 4). This result indicates an important similarity of the statistical properties of the observed and modelled extremes, for both the BSs and the Ds. However, a more thorough analysis shows that the differences in the parameters of the cdfs remain significant. The consequences of these differences are clearly seen when comparing the quantile wind speed values. The observed extremes, characterized by the quantile value U(0.99), are almost two times greater than the modelled values, and the ratio between the modelled Ds and BSs reaches only 10% (see Figure 5), a much lower value than that established (see above) for observations.

Conclusions
An extreme value analysis was used to assess the statistical properties of extreme wind speeds over the sub-Arctic region using information taken from the COSMO-CLM model dataset and observation stations. We found two groups of samples that belong to different populations, which are interpreted as the BSs and the Ds. Despite large discrepancies in the absolute values of the modelled extremes, we stress the similarity of the statistical properties of the observed and modelled extremes concerning both the BSs and the Ds.
In the introduction, we formulated the hypothesis that the wind speed extremes below and above the threshold (the BSs and the Ds) are generated by different circulation mechanisms. So far, this hypothesis has been neither confirmed nor refuted. However, we can conclude that a high-resolution mesoscale atmospheric model markedly improves the modelled near-surface wind speed, which shows its ability to capture the specific physical mechanism that generates wind speed extremes.
In this work, we analysed the data from a domain with 13.2-km grid spacing and obtained positive and encouraging results. The higher-resolution data of our 30-year COSMO-CLM experiment and more computational effort will allow us to obtain even more promising results.

Figure 1. Locations of the outer, intermediate and inner domains used for the COSMO-CLM simulation and locations of the observation stations used for the model-data comparison (Image source: https://www.google.com/maps).

Figure 2. Cumulative distribution functions of wind speed maxima (station observations) for records with a 72-hour time step, linearized on the coordinate axes of the Weibull distribution, and the linear regression line corresponding to the Weibull function. (a), (b): Aleksandrovsk-Sakhalinsky (cold and warm seasons); (c), (d): Yuzhno-Sakhalinsk (cold and warm seasons).

Figure 4. The Weibull distribution parameters (k and V) calculated for all stations on Sakhalin Island (see Figure 1) ("black swans" = BS and "dragons" = D) and at the COSMO-CLM grid points corresponding to the stations; additionally, data from stations in the coastal zone of the Barents Sea, Kara Sea, Laptev Sea, East Siberian Sea and Chukchi Sea (S(ACS) and D(ACS)) (Kislov and Matveeva, 2016).

Figure 5. Quantile wind speed values U(0.99) in m·s⁻¹ for wind data from measurement stations and COSMO-CLM simulation data, calculated separately for the two groups of wind speed extremes coming from the BSs and Ds populations.

Figure 6. Cumulative distribution functions of wind speed maxima simulated near the surface by the COSMO-CLM at grid points corresponding to Aleksandrovsk-Sakhalinsky (a), (b) and Yuzhno-Sakhalinsk (c), (d) ((a), (c): cold season; (b), (d): warm season) for records with a 72-hour time step, linearized on the coordinate axes of the Weibull law, and the linear regression line corresponding to the Weibull function. In all cases R² > 0.94.

Figure 7. Cumulative distribution functions of wind speed maxima near the surface simulated by the COSMO-CLM at grid points corresponding to Aleksandrovsk-Sakhalinsky ((a) and (b): warm season; (e) and (f): cold season) and Yuzhno-Sakhalinsk ((c) and (d): warm season; (g) and (h): cold season) for records with a 72-hour time step, linearized on the coordinate axes of the Weibull law, and the linear regression line corresponding to the Weibull function. (a), (c), (e), (g) denote the Weibull distribution for the range $U \le U_{th}$ (so-called "swans", see text); (b), (d), (f), (h) denote the Weibull distribution for the range $U > U_{th}$ (so-called "dragons", see text). In all cases R² > 0.96.