Collecting Statistical Methods for the Analysis of Climate Data as Service for Adaptation Projects

The development of adaptation measures to climate change relies on data from climate models or impact models. In order to analyze these large data sets or an ensemble of these data sets, the use of statistical methods is required. In this paper, the methodological approach to collecting, structuring and publishing the methods, which have been used or developed by former or present adaptation initiatives, is described. The intention is to communicate achieved knowledge and thus support future users. A key component is the participation of users in the development process. Main elements of the approach are standardized, template-based descriptions of the methods including the specific applications, references, and method assessment. All contributions have been quality checked, sorted, and placed in a larger context. The result is a report on statistical methods which is freely available as printed or online version. Examples of how to use the methods are presented in this paper and are also included in the brochure.


Introduction
Climate change is ongoing, as is evident from the observations presented in the recent IPCC report of Working Group I [1]. The authors present the observed worldwide increase of land surface air temperature, which has been accelerating in the last decades. Global precipitation datasets also show an increase in globally averaged precipitation in the last century, but the regional distribution may vary widely [2]. The climatic changes also affect extreme values. The Special Report on Managing the Risks of Extreme Events and Disasters to Advance Climate Change Adaptation (SREX) [3] outlines the observational basis of extreme events, which might interact with vulnerability and exposure. Its authors list numerous auxiliary conditions that must be met in order to derive trustworthy results from the sparse data basis, particularly at the regional scale.
Climate simulations are necessary to estimate the trends of the recent past and projected future developments. This is done within the international initiatives CMIP5 (Coupled Model Intercomparison Project Phase 5, http://cmip-pcmdi.llnl.gov/cmip5/) for global modeling and CORDEX (Coordinated Regional Climate Downscaling Experiment [4]) for regional modeling over 14 domains worldwide. The model results clearly project further warming and changes in all components of the climate system under all scenarios.
Climatic changes affect all sectors, including societal and economic sectors (http://www.umweltbundesamt.de/en/topics/climate-energy/climate-change-adaptation/why-we-adapt-to-climatechange). Therefore, adaptation to climate change impacts is inevitable for infrastructure, economy, and administration. The basis for the development of adaptation strategies is the quantified outcome of climate model projections. All parties concerned expect not only high-quality observation data and model projections but also state-of-the-art methodological tools for data evaluation. Due to the huge amount of data, statistical methods are needed to achieve reliable and robust results.
In recent years not only climate researchers but also other users from various sectors, administrations and institutions have analyzed climate data, either from observations or from models, with the aim of understanding and quantifying climate impacts and implications for their specific sector. Similar topics have become relevant in research projects on adaptation to climate change all over the world, e.g. in the Pacific region (http://www.sprep.org/Adaptation/pacific-adaptation-to-climate-change-project), in the Caribbean region (http://caricom.org/jsp/projects/macc%20project/cpacc.jsp), in Australia (http://www.climatechange.gov.au/climate-change/adapting-climate-change/climate-change-adaptation-program) and on special topics like the water, energy and food security nexus (http://www.water-energy-food.org/).
Several institutions have provided information on statistical methods in recent years. A comprehensive guidebook is the WMO Guide to Climatological Practices [5] with chapter 4 "Characterizing climate from data sets" and chapter 5 "Statistical methods for analyzing datasets". Another informative website is the Climate Data Guide (https://climatedataguide.ucar.edu/) of NCAR (National Center for Atmospheric Research, Boulder, CO, USA) and UCAR (University Corporation for Atmospheric Research, USA), which is supported through sponsorship by the National Science Foundation. Its section "Climate Data Analysis Tools and Methods/Statistical & Diagnostic Methods Overview" describes and illustrates several useful statistical methods. Several compendia on statistical tools with application to climate data already exist, e.g. [6] [7]. At the international level, several guidelines also cover the issue of methodology and tools for adaptation development, e.g. [8] [9].
However, the main focus of these reports is not on the detailed description of methods, and often only very special topics such as downscaling or robustness are addressed. In most cases users of climate data need a wide variety of statistical methods that cover the whole range of questions relevant in adaptation development. Therefore, a brochure on statistical methods was planned and realized in recent years.
In the following chapters we describe the specific needs of users of climate data (chapter 2), explain the methodology of the collection and dissemination of statistical methods (chapter 3), present some selected examples of application (chapter 4) and summarize the benefits of the collection (chapter 5).

Statistical Methods in Climate Change Analysis
Dealing with large amounts of data has become a common task for all users who generate climate change information from climate model output data. The challenge of applying well-known or newly developed statistical methods was also tackled in the recent German adaptation project KLIMZUG. The Federal Ministry of Education and Research funded seven regional projects for five years each, over the period from 2008 to 2014, within the research priority "KLIMZUG - Managing climate change in the regions for the future". The aim of the integrative projects was "the development of innovative strategies for adaptation to climate change and related weather extremes in regions" (http://www.klimzug.de/en/160.php). It was evident that the challenge of dealing with climate data took a prominent place in the activities of the projects. The large variety of addressed topics (water management, agriculture, urban planning, tourism, coastal protection, harbor logistics and others) was met with tailored statistical methods. These methods must take into account the specific characteristics of the data and the questions to be answered, namely the amount and quality of the data, the complexity of the analysis, and the demands on the reliability of the result. Several publications show the variety of methods which were used or specifically developed for the analysis [10] [11].
In [12] [13] the authors used statistical methods to determine the future urban heat island in order to give advice to city planners with regard to adaptation strategies for the city of Hamburg. As a basis for this analysis, climate data are evaluated and compared with observation data and, if their agreement is not sufficient, bias corrected [14].
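The compare-and-correct step mentioned above can be illustrated with a minimal sketch. It assumes the simplest additive approach (a constant offset, often called linear scaling), which is not necessarily the method used in [14]; the function name and the sample temperatures are hypothetical.

```python
from statistics import mean


def additive_bias_correction(model_reference, observations, model_future):
    """Shift model values by the mean model-minus-observation difference.

    Assumption: the bias is a constant offset that is the same in the
    reference period and in the future period.
    """
    bias = mean(model_reference) - mean(observations)
    return [value - bias for value in model_future]


# Hypothetical temperatures in degrees Celsius.
obs_ref = [9.0, 10.0, 11.0]
model_ref = [10.5, 11.5, 12.5]   # the model is 1.5 degrees too warm
model_fut = [12.5, 13.5, 14.5]
corrected = additive_bias_correction(model_ref, obs_ref, model_fut)
```

More elaborate approaches correct the whole distribution rather than only the mean; which one is appropriate depends on the variable and the question at hand.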
Future projects and institutions will be confronted with similar challenges and should therefore profit from the KLIMZUG efforts. Merely listing the publications of such projects is not sufficient, because the statistical methods are sometimes not obvious from the title, are often not explained in detail because most publications are results-oriented, and many users outside science may not have access to scientific journals. Project reports and technical reports do not give a user-friendly overview either.
In the international context, too, various statistical methods are applied, either for climate model data analysis and processing (e.g. time series analysis, bias correction or statistical downscaling, as discussed in [15]) or for the assessment of results (e.g. indicator-based assessment [16]). The IPCC frequently refers to statistically plausible, robust and significant findings, e.g. in the special report on "Managing the Risks of Extreme Events and Disasters to Advance Climate Change Adaptation" (SREX) [3]. The growing interest in statistical methods in the field of climate change has already led to a special issue on "Advances in Statistical Methods for Climate Analysis" of the journal Environmetrics (http://dx.doi.org/10.1002/env.2129).
Some guidebooks present methods that go a step further, to the evaluation of adaptation measures. One example is the "Compendium on methods and tools to evaluate impacts of, vulnerability and adaptation to, climate change" [8] (http://unfccc.int/adaptation/nairobi_work_programme/knowledge_resources_and_publications/items/2674.php). To support the users, its authors document the methods according to a standard similar to the one we describe in the next chapter, indicating points like Appropriate Use, Scope, Key Output, Key Input, Ease of Use, Requirements, References and others.

Methodology
The presented collection of statistical methods is the result of a long-term process which involved the "producers" of the methods. It was accomplished by a "Working Group Statistical Methods" which formed in Hamburg at the Climate Service Center with members from several institutions who had experience in statistics and different perspectives on the purpose of the collection. The group at present consists of scientists from the Climate Service Center 2.0, the Federal Maritime and Hydrographic Agency, the German Meteorological Service, the Hamburg University of Technology, and the Institute for Water and Environment at Bochum University of Applied Sciences.
The first step of the process was to bring together users of climate observation and climate model data and to initiate an exchange of methods. This was accomplished by two workshops, which took place at the Climate Service Center in Hamburg. The participants of the first workshop were KLIMZUG project researchers, and the overall agreement was to publish their methods as guidance for future projects. After the launch of a first version of the brochure with mainly "KLIMZUG methods", a second workshop brought together developers and potential users of the methods to discuss the presented collection and come up with new suggestions. Thus, a broader community of climate model users was included in the development process.
All statistical methods were described in a template which was designed to enable an adequate description of each method (Table 1). This ensures that the collected statistical methods are described in a standardized way. For the same purpose, each method is sorted into a prescribed category in order to structure the collection. Besides necessary information like description, references and applicability, some important topics are "Requirements for application", "Assessment", and "Example". These three points in particular help users to assess the method with regard to their own demands and to understand what an outcome may look like; they thus make a crucial difference to other collections, statistics books, and guidelines.
Statistical methods were delivered by users belonging to a large variety of sectors, such as water management, coastal research, geoscience, ecology, forestry, agriculture, and by different kinds of institutions like universities and other research institutes, federal and regional offices, as well as consulting agencies.
Each method sent to the Working Group (WG) underwent a review process by the members of the working group and an external statistical expert. This comprised a formal and a content-oriented check. If the description did not meet the criteria, the draft was sent back with clear advice on what should be added, changed, or clarified. Nevertheless, all contributions were to be changed as little as possible, so the WG did not correct "the last detail", since the collection was understood as a product "from users to users". Hence, there may still be some shortcomings in the descriptions of the methods, and the descriptions differ in level of detail and style. Two external statistics experts reviewed the brochure structuring process. One of them also reviewed each individual method. Examples of the methods are presented in the Annex.
In the present collection, the order of categories follows the anticipated order of use when evaluating climate data, i.e. the first look at data will probably be via histograms and/or time series. The comparison of model data with observations may lead to the need for a bias correction. Calculated trends (with methods from time series analysis) must be checked by a suitable significance test. Within each category the methods are ordered according to their complexity, from simple to complex methods.
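As an illustration of checking a trend with a significance test, the following sketch implements the Mann-Kendall test, one common non-parametric choice for climate time series. The choice of this test, the omission of a tie correction, and the function name are assumptions made here for illustration and are not prescribed by the collection.

```python
import math


def mann_kendall(values):
    """Return (S, z, two-sided p-value) of the Mann-Kendall trend test.

    Simplification: no correction for tied values or autocorrelation.
    """
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)        # sign of each pairwise change
    var_s = n * (n - 1) * (2 * n + 5) / 18.0     # variance of S without ties
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided, normal approximation
    return s, z, p_value
```

A small p-value (for example below 0.05) indicates that a monotonic trend is unlikely to be due to chance alone; for short or strongly autocorrelated series the normal approximation used here should be treated with caution.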
Users who may apply the presented statistical methods are provided with additional information concerning the aims and limits of the collection. The editors clearly state that it remains the responsibility of the users to acquire basic knowledge of statistics and to check whether the selected method is the right one for the specific question.

Selected Examples of Applications
The statistics brochure does not claim to be a recipe book on "how to combine statistical methods", although users would highly appreciate one; it only gives some hints on the combination of single methods. Since each problem is unique because of its specific requirements, it is not advisable to suggest certain methods or method combinations in general. Nevertheless, four common examples of combining methods for a certain application are described in the introductory chapter of the brochure. Two examples of rather complex analysis methods are described in the following sections.

Example "Extreme Value Analysis"
In adaptation measures most of the identified risks are due to extreme events such as flooding events due to storm surges and/or heavy rainfall, and low water levels due to droughts or heat waves. The development of adaptation measures should therefore in particular include an analysis of possible changes of extreme events. To assist spatial planning processes and the development of adaptation measures in different fields (e.g. water management, coastal protection, tourism, agriculture, harbor and marine infrastructure, renewable energy, nature protection and landscape conservation, health and ecosystem services etc.), the changes of extreme events induced by regional climate change can be analyzed by applying methods of extreme value statistics as described in textbooks [17] [18]. These methods need to be understood and applied under specific requirements.
An example for possible ways of extreme event analyses is shown in Figure 1.
In the following, an example for analyzing future extreme wave events in the Baltic Sea is presented. Information about possible future changes of the wave conditions is needed because the wave conditions serve as important input parameters for the design of coastal and flood protection structures. The results are moreover used by local authorities for the assessment of the future safety and effectiveness of the structures. Significant increases of the future extreme wave heights can lead to increasing loads on the structures; thus different adaptation measures have to be assessed depending on the preferred adaptation strategy (for more details see [20]). The wave dataset that has been analyzed was calculated using a newly developed hybrid approach that consists of both empirical and numerical methods for the calculation of the wave conditions, based on future projections of wind conditions from a regional climate model (COSMO-CLM [21] [22]) and considering different future IPCC greenhouse gas emission scenarios (SRES [23]). The associated methods and results have been published in [10] [24].
The wave data were derived within the climate adaptation project RAdOst (Regional Adaptation Strategies for the German Baltic Sea Coast, http://klimzug-radost.de/en) and are available for the 20th century (1960-2000) and the 21st century (2001-2100) on an hourly basis.
Figure 2 shows, as an example, the time series of calculated significant wave heights for the 1st realization of the SRES emission scenario A1B near the location of Warnemünde at the German Baltic Sea Coast.
At the beginning of the analyses the time series was tested for linear trends as well as jumps in order to ensure the stationarity and consistency of the underlying data. From the corrected time series (cf. Figure 2) annual maximum values (cf. Block Maxima Method in Figure 1) were selected for different time periods, each consisting of 30 years of data. Afterwards, the independence of the sample members was tested and different extreme value distribution (EVD) functions like the Generalized Extreme Value (GEV) distribution, Weibull, Gumbel and Log-normal function were fitted to the samples by parameter estimation using the Maximum Likelihood Method (cf. Figure 1).
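The two central steps above, selecting block maxima and fitting a distribution by maximum likelihood, can be sketched as follows. The sketch is restricted to the Gumbel distribution because its likelihood equations can be solved with the standard library alone; the function names and the bisection solver are choices made here for illustration and do not reproduce the implementation used in the study.

```python
import math


def annual_maxima(values_by_year):
    """Block maxima: keep the largest value of each year (dict year -> values)."""
    return [max(values_by_year[year]) for year in sorted(values_by_year)]


def fit_gumbel_mle(sample, iterations=200):
    """Return maximum likelihood estimates (location, scale) of a Gumbel fit."""
    n = len(sample)
    smallest = min(sample)
    shifted = [x - smallest for x in sample]       # keeps exp() arguments <= 0
    mean_shifted = sum(shifted) / n
    if mean_shifted == 0.0:
        raise ValueError("all sample values are identical")

    def score(scale):
        # The root of this increasing function is the ML estimate of the scale.
        weights = [math.exp(-x / scale) for x in shifted]
        weighted_mean = sum(x * w for x, w in zip(shifted, weights)) / sum(weights)
        return scale - mean_shifted + weighted_mean

    low, high = 1e-9 * mean_shifted, mean_shifted   # score(low) < 0 <= score(high)
    for _ in range(iterations):                     # plain bisection
        mid = 0.5 * (low + high)
        if score(mid) < 0.0:
            low = mid
        else:
            high = mid
    scale = 0.5 * (low + high)
    mean_weight = sum(math.exp(-x / scale) for x in shifted) / n
    location = smallest - scale * math.log(mean_weight)
    return location, scale
```

With hourly data grouped per year, `fit_gumbel_mle(annual_maxima(data))` would yield one parameter pair per 30-year sample. The GEV, Weibull and Log-normal fits named in the text follow the same principle but need a general-purpose numerical optimizer.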
Figure 3 shows, as an example, the fitted EVD functions for a sample of 30 years of annual maximum significant wave heights from 2071-2100 and for the 1st realization of the SRES emission scenario A1B near the location of Warnemünde.
For the assessment of the goodness of fit, maximum and root-mean-square differences between the empirical cumulative distribution function (CDF) and the theoretical CDFs have been calculated using a modified Kolmogorov-Smirnov (K-S) test, hereafter referred to as the Lilliefors test. The distribution with the lowest root-mean-square difference was assumed to describe the sample best. In Figure 4 the calculated maximum and root-mean-square differences between the empirical CDF and the theoretical CDFs are shown for the samples consisting of 30 years of annual maximum significant wave heights in the 1st realization of the SRES emission scenario A1B (2001-2100) and in the past century C20 (1971-2000) near the location of Warnemünde at the German Baltic Sea Coast.
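The difference measures described above can be sketched in a few lines. The sketch assumes that the empirical CDF is evaluated at the sorted sample values with the Hazen plotting position (i - 0.5)/n, as named in the caption of Figure 3; whether the study used exactly this formula for the difference measures is an assumption. The critical values of the Lilliefors test are not reproduced, and all function names are hypothetical.

```python
import math


def hazen_positions(n):
    """Empirical non-exceedance probabilities (i - 0.5) / n for ranks i = 1..n."""
    return [(i - 0.5) / n for i in range(1, n + 1)]


def cdf_differences(sample, cdf):
    """Return (maximum, root-mean-square) absolute difference between the
    empirical CDF at the sorted sample values and a theoretical CDF."""
    ordered = sorted(sample)
    empirical = hazen_positions(len(ordered))
    diffs = [abs(e - cdf(x)) for e, x in zip(empirical, ordered)]
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return max(diffs), rms


def gumbel_cdf(x, location, scale):
    return math.exp(-math.exp(-(x - location) / scale))


def lognormal_cdf(x, mu_log, sigma_log):
    z = (math.log(x) - mu_log) / (sigma_log * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def best_fit(sample, candidates):
    """Name of the candidate CDF with the lowest root-mean-square difference."""
    return min(candidates, key=lambda name: cdf_differences(sample, candidates[name])[1])
```

Here `candidates` is a dictionary that maps a label to a one-argument CDF with already fitted parameters, for example `{"Gumbel": lambda x: gumbel_cdf(x, 2.0, 0.5)}` with hypothetical parameter values.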
From Figure 4 it can be concluded that the Log-normal and Gumbel distributions show the lowest root-mean-square differences and thus describe the samples best. In the overall assessment, regarding all emission scenarios used in this study (two realizations each of the scenarios A1B and B1, not shown), the Log-normal function showed the lowest root-mean-square differences in the majority of the comparisons. Another outcome of the overall assessment of the goodness of fit is that the stability or robustness of the results improves when time periods of more than 30 years are taken into account for the selection of samples.
In general, it appears that the application of several statistical methods enhances the plausibility of the results.

Example "Ensemble Analysis"
A second challenging task is the analysis and visualization of climate model data using an ensemble of models. The differences between the models account for a prominent part of the spread of climate projections, besides the spread due to different scenarios and climate variability. These sources of spread are an inherent part of the uncertainty of climate projections. Presenting the results of an ensemble analysis, e.g. to decision makers or other clients in climate service, requires assessment parameters that can be understood. The IPCC reports introduce the term likelihood [25] to interpret the bandwidth of results. Other publications define and use a robustness parameter as a measure of the reliability of the results [26]. Innovative maps which show the degree of consensus on the significance of future changes and the sign of the change are presented in [27]. Visualizing uncertainty is a current topic not only in climate change research and service but also in other disciplines which deal with social, demographic, or medical data. Several examples are shown in [28].
In the statistics brochure one chapter is dedicated to the analysis of ensembles with a subchapter presenting their visualization.The presentation of projected climatic changes is an essential part of the communication to stakeholders and decision makers.The message must be clear and comprehensible and include the complexity of climate modeling and the inherent uncertainty.
Two possible ways are shown in Figure 5 and Figure 6, which have been developed for Climate-Fact-Sheets in cooperation with a client (http://www.climate-service-center.de/036238/index_0036238.html.en). They originate from the document "How to read a Climate-Fact-Sheet: Instructions for reading and interpretation of the Climate-Fact-Sheets" (Climate Service Center, Hamburg, May 2013).
Here, the ensemble analysis is mainly percentile and likelihood analysis. Figure 5 illustrates the scenario influence on the annual cycle of a climate change signal of a parameter, i.e. the difference between projected values in a future time slice and the values in the reference time slice. Presented is the median of all simulations following a specific scenario, the whole bandwidth based on all scenarios, and the "likely" bandwidth (>66%).
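The quantities named above can be computed with a short sketch. It assumes that the "likely" bandwidth is the central range containing 66% of the simulations, i.e. the interval between the 17th and 83rd percentiles; this reading, the linear percentile interpolation, and all function names are assumptions made for illustration.

```python
import math


def relative_change(future_mean, reference_mean):
    """Climate change signal in percent relative to the reference time slice."""
    return (future_mean - reference_mean) / reference_mean * 100.0


def percentile(sorted_values, q):
    """Percentile q (0..100) of sorted values with linear interpolation."""
    position = (len(sorted_values) - 1) * q / 100.0
    lower, upper = math.floor(position), math.ceil(position)
    fraction = position - lower
    return sorted_values[lower] * (1.0 - fraction) + sorted_values[upper] * fraction


def ensemble_summary(changes):
    """Median, full bandwidth and assumed 'likely' bandwidth of an ensemble."""
    ordered = sorted(changes)
    return {
        "median": percentile(ordered, 50.0),
        "full": (ordered[0], ordered[-1]),
        "likely": (percentile(ordered, 17.0), percentile(ordered, 83.0)),
    }
```

Calling `ensemble_summary` once per month and per group of simulations (for example all simulations of one scenario) gives the values needed for a plot of the annual cycle of the kind shown in Figure 5.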
The second example compares the range between the minimum and maximum of different ensembles (all available global models for all scenarios, all global models for the A1B scenario, and all available regional models for all scenarios) in different time slices.
Since the topic is highly complex, the presentations cannot be understood at first glance, but working with colors, grey areas and info boxes helps users to understand the context intuitively. Note that the global model data were not available for the near future period 2006-2035.

Conclusions
The Climate Service Center 2.0 and other institutions which support society in adaptation to climate change are recognizing a demand for a compendium on statistical methods among their clients. KLIMZUG offered the great opportunity of testing and applying statistical methods in the process of developing adaptation measures. This effort formed the basis for the collection, which is dedicated "from users to users". The broad scope of KLIMZUG is reflected in the broad range of the statistical methods presented in the compendium. However, only the editorial work of categorizing and compiling the methods makes them available to other users. This is a new approach and complements statistical textbooks as well as statistical software tools. The strengths are standardized presentations of methods, user participation, and editors from different institutions.
The collection and publication of currently used statistical methods is a contribution to the quality of the achieved results. The reason is two-fold: On the one hand, the methods are tested for their usability for certain data sets, the requirements for application are specified and examples are shown. On the other hand, often several methods are presented, so that the user has the possibility to apply different methods and can estimate the bandwidth of the results. In recent years it has become standard to use several models in order to achieve more robust results. This is, for example, one of the aims of AgMIP (The Agricultural Model Intercomparison and Improvement Project): "Include multiple models, scenarios, locations, crops and participants to explore uncertainty and impact of data and methodological choices" (http://www.agmip.org/about-us/objectives/).
The demand for the collection and publication of statistical methods is large, as indicated by the access numbers of the online brochure (http://www.climate-service-center.de/009940/index_0009940.html.de): we count 100 to 200 clicks per month. The website of the "Climate Data Guide" of NCAR (National Center for Atmospheric Research) and UCAR (University Corporation for Atmospheric Research), which is funded by the US National Science Foundation, also links to the brochure (https://climatedataguide.ucar.edu/climate-data-tools-and-analysis/statistical-diagnostic-methods-overview/).
The brochure is conceived as a "living document" and will be updated regularly. The plans for the next update cover the following topics: 1) extension of the chapter "Combination of methods", 2) comparison of different methods and assessment of the performance and the differences in the results, 3) extension of the chapter "Ensembles analysis" including visualization.

Figure 1. Possible combinations of methods for analyzing extreme events. The procedures used in the example below are marked in black colour (modification of Figure 2 from [19]).

Figure 2. Time series of calculated significant wave heights (meter) for the 1st realization of the SRES emission scenario A1B near the location of Warnemünde at the German Baltic Sea Coast.

Figure 3. Sample of annual maximum significant wave heights (meter) for the time period 2071-2100 and the 1st realization of the SRES emission scenario A1B near the location of Warnemünde at the German Baltic Sea Coast. The values were plotted log-normally using the plotting position formula (empirical distribution function) by HAZEN.

Figure 4. Maximum and root-mean-square differences (dashed and solid lines, respectively) between the empirical cumulative distribution function (CDF) and the theoretical CDFs (Weibull, Gumbel and Log-normal) for the 1st realization of the SRES emission scenario A1B (2001-2100) and the past century C20 (1971-2000) near the location of Warnemünde at the German Baltic Sea Coast.

Figure 5. This figure shows the projected change of the annual cycle in a parameter in % as averaged over the period 2071-2100 compared to the mean of the reference period 1961-1990. Presented are lines of the medians of all global models, of all global models of the scenario A1B, and of all regional models, as well as the regions of likelihood "likely" and the complete bandwidth.

Figure 6. This figure shows the range of the projected change of a parameter averaged over three periods of 30 years (2006-2035, 2036-2065, and 2071-2100) compared to the mean of the reference period 1961-1990. On the bars the median, the regions of likelihood "likely" and the complete bandwidth are marked.

Table 1. Template for description of statistical methods.