Enhancing Air Quality Forecasts over Catalonia (Spain) Using Model Output Statistics

Model Output Statistics (MOS) is a well-known technique for improving the outputs of numerical atmospheric models. In this contribution, we present the development of a MOS algorithm to improve air quality forecasts in Catalonia, a region in the northeast of Spain. These forecasts are obtained from an Eulerian coupled air quality modelling system developed by Meteosim. The pollutants considered are Nitrogen Dioxide (NO2), Particulate Matter (PM10) and Ozone (O3), and the methodology has been applied to statistical values of these pollutants according to regulatory levels. Four MOS algorithms have been developed, which differ in whether they stratify the data by season and by measurement station. The algorithms have been compared with one another in order to obtain a MOS that reduces the forecast uncertainties. The results show that the best MOS increases the accuracy of the NO2 maximum 1-h daily value forecast from 71% to 75%, that of the PM10 daily value from 68% to 81%, and that of the O3 maximum 1-h daily value from 79% to 87%.


Introduction
Air quality is one of the main concerns of current atmospheric research. Global air pollution has an impact on human health [1], climate change [2] and the physics and chemistry of the atmosphere [3]. Meteorological and air quality related environmental phenomena influence and limit regional and urban development and safety management, and often have severe negative impacts on the public health, economy and environment of polluted areas.
In Spain, annual average values of Nitrogen Dioxide (NO2) and Particulate Matter (PM10) are elevated at many urban air quality measurement stations with traffic influence [4], whereas high ozone levels are measured in rural or suburban areas located downwind of urban or industrial locations, where local ozone precursors are lacking [5] [6]. The area of study is Catalonia (41.82˚N, 1.47˚E), in north-eastern Spain, where six episodes with high levels of pollutants have been recorded in the last five years, leading the Catalan Government to adopt temporary mitigation measures. Figure 1 shows the topographical features of Catalonia. Black dots mark the four provinces into which the studied area is divided.
For these reasons, it is desirable that public administrations have tools that enable them to anticipate the potential risks caused by pollution and help them manage the health, economic and environmental impacts on the population of polluted areas. In this sense, models are a very useful tool for local administrations to plan and manage production, human resources, activities and emergency procedures, and to introduce air quality improvement plans in urban areas. Nowadays, coupled Eulerian air quality models are useful tools to manage air pollution, and they can even complement or replace air quality monitoring under the terms established by the European Directive.
Despite years of refinements and improvements, meteorological and air quality models still contain significant errors. Several statistical techniques have been designed and applied in the last few years [7]-[12] in order to reduce the uncertainty of their forecasts. In this work, we focus on the use of the Model Output Statistics (MOS) technique to improve forecasts obtained by an air quality modelling system. MOS has usually been applied in meteorology [13]-[17], but there are only a few applications to air quality [18]-[20]. Air quality forecasts have been obtained by a coupled Eulerian air quality modelling system developed by Meteosim [21]. This modelling system has been applied and tested successfully in urban areas (Madrid and Barcelona in Spain; Nice in France) and industrial areas (Ponferrada and Tarragona in Spain). It has been evaluated using the Maximum Relative Directive Error [22] referred to in the European Directive EC/2008/50. The results of this evaluation comply with the model uncertainty limits of the Directive for the pollutants O3, NO2, PM10, SO2 and CO, using measurements from more than 120 stations (urban, suburban and rural locations) over a period of three years.
The MOS technique has been applied to daily values of NO2 and PM10, maximum 1-h daily values of NO2 and O3, and maximum 8-h daily values of O3 forecasted over Catalonia (Spain). Air quality measurement data have been provided by the Air Quality Network of the Territory and Sustainability Department of the Catalan Government. The study includes a numerical deterministic evaluation that shows the accuracy of the air quality modelling outputs with and without the MOS technique.
The model output statistics technique and the uncertainty evaluation methodology are described in Section 2; a detailed analysis of the results is presented in Section 3; and, finally, some conclusions are reported in Section 4.

Model Output Statistics
The Model Output Statistics (MOS) technique computes regression equations between observed and model-forecasted variables. These equations are later applied to the raw model output to obtain a modified, statistically corrected output. The technique requires a set of variables from the modelling system to be corrected and a set of observed variables. These datasets cover a time period known as the training period. Its length is variable, but it is advisable to use the longest possible period in order to include potential cycles. In this work we have used the period between the 24th of October 2012 and the 31st of August 2013, which includes 720,000 hourly values.
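As an illustration of the two steps described above (fitting on the training period, then correcting the raw output), the following is a minimal sketch that assumes a simple linear regression of the form observed = intercept + slope × forecast. The function names `fit_mos` and `apply_mos` and the numerical values are hypothetical and are not taken from the operational system.

```python
import numpy as np

def fit_mos(forecast, observed):
    """Least-squares fit of observed = intercept + slope * forecast.

    Assumes a single predictor and a linear relationship.
    """
    slope, intercept = np.polyfit(forecast, observed, deg=1)
    return slope, intercept

def apply_mos(forecast, slope, intercept):
    """Apply the fitted regression to raw model output."""
    return intercept + slope * np.asarray(forecast, dtype=float)

# Synthetic training pairs (illustrative values only, not measured data).
train_forecast = [20.0, 35.0, 50.0, 65.0, 80.0]
train_observed = [30.0, 42.0, 55.0, 66.0, 79.0]

slope, intercept = fit_mos(train_forecast, train_observed)
corrected = apply_mos([40.0, 70.0], slope, intercept)
print(slope, intercept, corrected)
```

The coefficients are computed once on the training period and then reused for every new forecast, which is why the correction step adds very little computation.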
Within the MOS regression equations we distinguish between predictands and predictors. The predictand is the dependent variable to be forecasted or corrected. In developing our MOS, the predictands are the statistics described in the introduction: maximum 1-h value and daily value for NO2, daily value for PM10, and maximum 1-h and 8-h values for O3.
The predictors are the independent variables of the regression equations. In this work, we have used the concentration values forecasted by the modelling system. Since we have used only one predictor, we omit the discussion of predictor-selection algorithms [23]-[27]. It is worth mentioning that in a preliminary phase we tested the viability of using more predictors, such as hourly data and their derived statistics, as well as past observational data. However, we found that using the statistic of each pollutant as obtained by the modelling system produced the best results.
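The daily statistics used as predictands can be derived from hourly concentrations along the following lines. This is a minimal sketch: the function name `daily_statistics` is hypothetical, and the 8-h running means are restricted to windows that lie entirely within one day, which is a simplifying assumption (regulatory definitions may also count windows that start on the previous day).

```python
import numpy as np

def daily_statistics(hourly):
    """Daily mean, maximum 1-h value and maximum 8-h running mean.

    `hourly` holds the 24 hourly concentrations of a single day.
    """
    hourly = np.asarray(hourly, dtype=float)
    if hourly.shape != (24,):
        raise ValueError("expected exactly 24 hourly values")
    window = 8
    # Running 8-h means over the 17 windows fully contained in the day.
    running_8h = np.convolve(hourly, np.ones(window) / window, mode="valid")
    return {
        "daily_mean": hourly.mean(),
        "max_1h": hourly.max(),
        "max_8h": running_8h.max(),
    }

# Synthetic day with concentrations rising linearly from 0 to 23.
stats = daily_statistics(range(24))
print(stats)
```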
Often, the datasets employed in the regression equations are grouped in order to optimize the forecasting relationships and improve the precision of the MOS technique.This grouping is usually referred to as data stratification.

Error Statistics
In order to evaluate the potential improvements of the MOS technique over the modelling system, we use a numerical validation. Several statistics are used to measure the differences between the results obtained with and without MOS. Most of them are typical statistics employed in error evaluation of meteorological models [22] [28] and, in particular, of air quality models [29]-[31]. Reference [32] gives some recommendations on the maximum thresholds for some of these statistics in the analysis and forecast of pollutant concentrations, which we will use in this work.
In the following definitions, n is the sample size, and o_i and f_i are the observed and forecasted values, where i ∈ [1, n]:
• Mean normalized bias error (MNBE). Reference [32] recommendation: MNBE within ±15%.
• Mean normalized gross error (MNGE). Reference [32] recommendation: MNGE < 35%.
• Index of agreement (IOA). This statistic evaluates with a single value the goodness of fit of a modelling system with respect to the observations. The closer the value is to one, the better the fit; the fit worsens as the value approaches zero.
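As an illustration, these statistics can be computed as in the sketch below. It assumes the definitions commonly used in air quality model evaluation: MNBE as the mean of (f_i − o_i)/o_i, MNGE as the mean of |f_i − o_i|/o_i, both expressed in percent, and the index of agreement in Willmott's form. The function names are hypothetical, and the sample values are synthetic.

```python
import numpy as np

def mnbe(forecast, observed):
    """Mean normalized bias error in percent (positive = overestimation)."""
    f = np.asarray(forecast, dtype=float)
    o = np.asarray(observed, dtype=float)
    return 100.0 * np.mean((f - o) / o)

def mnge(forecast, observed):
    """Mean normalized gross error in percent."""
    f = np.asarray(forecast, dtype=float)
    o = np.asarray(observed, dtype=float)
    return 100.0 * np.mean(np.abs(f - o) / o)

def ioa(forecast, observed):
    """Index of agreement (Willmott form, assumed here); 1 is a perfect fit."""
    f = np.asarray(forecast, dtype=float)
    o = np.asarray(observed, dtype=float)
    o_mean = o.mean()
    numerator = np.sum((f - o) ** 2)
    denominator = np.sum((np.abs(f - o_mean) + np.abs(o - o_mean)) ** 2)
    return 1.0 - numerator / denominator

# Synthetic example (not measured data).
forecast = [12.0, 18.0, 33.0]
observed = [10.0, 20.0, 30.0]
print(mnbe(forecast, observed), mnge(forecast, observed), ioa(forecast, observed))
```

Because both normalized errors divide by the observed value, observations equal to zero must be excluded before calling these functions.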

Forecast Evaluation
Table 1 shows a summary of the errors of the modelling system in the training period using the above described statistics.
As can be seen, most of the statistics fall within the recommendations of [32], with the exception of the PM10 values. The presence of these errors, however, justifies the need to develop a MOS.

Description of the Proposed MOS Categories
From the observed and modelled data, we have proposed two kinds of stratification:
• A seasonal stratification. The analysis of observed concentrations exhibits a dependency on temperature, especially remarkable for ozone, which suggests splitting the training stage into 4 different periods. At first glance one could think that a natural division would follow the climatological seasons, but the study of the mean temperature and of the concentration of the considered pollutants shows that this division is not appropriate. We have finally chosen a division into training periods between which the values of temperature and pollutants change significantly (Figure 2 and Table 2). These periods turned out to be 23 Oct 2012 to 02 March 2013 (defined as Period 1), 03 March 2013 to 01 May 2013 (defined as Period 2), 02 May 2013 to 06 July 2013 (defined as Period 3) and 07 July 2013 to 31 August 2013 (defined as Period 4).
• A stratification according to the measurement station, in order to identify systematic errors and biases of each measurement point.
A total of 2 × 2 separations have been made, yielding 4 different categories. Table 3 shows a summary of the categories and the nomenclature that will be used.
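The 2 × 2 stratification can be sketched as follows, with one regression fitted per group and two flags selecting whether the data are split by station, by period, both, or neither. This is a hypothetical illustration: the names `period_of` and `fit_stratified`, the record layout and the linear single-predictor regression are assumptions, and only the period boundaries come from the period definitions given in the text.

```python
from collections import defaultdict
from datetime import date

import numpy as np

# Last day of each training period, from the period definitions in the text.
PERIOD_ENDS = [
    (1, date(2013, 3, 2)),
    (2, date(2013, 5, 1)),
    (3, date(2013, 7, 6)),
    (4, date(2013, 8, 31)),
]

def period_of(day):
    """Return the training period (1 to 4) that contains `day`.

    Dates earlier than the start of the training period are not checked
    and would fall into Period 1.
    """
    for period, last_day in PERIOD_ENDS:
        if day <= last_day:
            return period
    raise ValueError("date falls after the training period")

def fit_stratified(records, by_station=True, by_period=True):
    """Fit one linear regression per stratum.

    `records` is an iterable of (station, day, forecast, observed).
    Each stratum needs at least two distinct forecast values.
    """
    groups = defaultdict(list)
    for station, day, forecast, observed in records:
        key = (station if by_station else None,
               period_of(day) if by_period else None)
        groups[key].append((forecast, observed))
    coefficients = {}
    for key, pairs in groups.items():
        f, o = np.array(pairs, dtype=float).T
        slope, intercept = np.polyfit(f, o, deg=1)
        coefficients[key] = (slope, intercept)
    return coefficients

# Synthetic records for one hypothetical station "A".
records = [
    ("A", date(2012, 11, 1), 10.0, 12.0),
    ("A", date(2012, 11, 2), 20.0, 22.0),
    ("A", date(2013, 8, 1), 10.0, 5.0),
    ("A", date(2013, 8, 2), 20.0, 10.0),
]
print(fit_stratified(records))
```

The four combinations of the two flags give the four categories; with both flags disabled, a single regression is fitted to all the data.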

Results
In this section we analyze the errors resulting from applying a period-specific trained MOS to the output of the modelling system. We use the deterministic numerical validation, the comparison between mean daily evolutions, and the comparison of the frequency histograms of the average errors of each category, together with the parameters of a Gaussian fit. This will allow us to choose the best MOS for each pollutant and statistic.

NO2
Table 4 shows the error statistics of the different categories applied to the output of the modelling system for NO2. Figure 3 shows a comparison between the normalized mean absolute errors of each category.
Table 4 and Figure 3 show that MOS E C produces the best results for both the maximum hourly value and the daily value. Notice that MOS 0 and MOS C do not improve on the modelling system.
In the case of the maximum hourly value, MOS E C reduces the MNBE of the modelling system from 11.7% to 9.5%, reduces MNGE from 29.5% to 24.3%, and increases IOA from 0.86 to 0.91.
For the daily value, MOS E reduces MNBE from 11.8% to −0.9%, thus correcting the overestimation of the modelling system, reduces MNGE from 24.9% to 18.1%, and increases IOA from 0.91 to 0.95.
The frequency histograms in Figure 4(a) and Figure 4(b) display the error distributions of the modelling system and of each MOS category for maximum hourly values and daily values. Obviously, better results are associated with centred and peaked distributions.
As can be seen, the error distributions of both MOS E and MOS E C are the sharpest and narrowest, and therefore the best. For daily values, the best results are attained by MOS E C, closely followed by the rest.
Through a Gaussian fit, we have calculated the width (twice the standard deviation) of the distributions and their displacement with respect to the origin. The results are presented in Table 5. MOS E features a near-zero shift and the narrowest width, and MOS E C features the smallest displacement and width for daily values among all categories.
Looking at the behaviour of the errors at the measurement stations, we see that, in general, the improvement introduced by MOS E C is quite apparent in the maximum 1-h values, with generalized reductions in MNBE and MNGE and increases in IOA. MOS E C reduces MNBE in 78% of the measurement stations, and 88% of them meet the acceptance criteria of [32] for both MNBE and MNGE, compared with only 69% and 75%, respectively, with the raw modelling system. On the other hand, the improvement produced by MOS E C over the modelling system in daily values is also very clear, reducing MNBE in 88% of the measurement stations. Moreover, all stations meet the acceptance criteria of [32], whereas only 69% and 89%, respectively, did so with the raw modelling system.
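The two Gaussian parameters discussed here can be estimated as in the minimal sketch below. It assumes a maximum-likelihood fit, in which the displacement is the sample mean of the errors and the width is twice their standard deviation; a fit to the binned histogram could give slightly different values. The function name `gaussian_fit` is hypothetical.

```python
import numpy as np

def gaussian_fit(errors):
    """Displacement (mean) and width (twice the standard deviation).

    Uses the maximum-likelihood estimates of a Gaussian distribution.
    """
    errors = np.asarray(errors, dtype=float)
    displacement = errors.mean()
    width = 2.0 * errors.std()
    return displacement, width

# Synthetic forecast-minus-observation errors (not measured data).
displacement, width = gaussian_fit([-1.0, 1.0, -1.0, 1.0])
print(displacement, width)
```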

PM10
Table 6 shows the error statistics of the different categories applied to the output of the modelling system for PM10. Figure 6 displays the comparison between normalized mean absolute errors.
It can be seen that the best results are obtained with MOS E C. It decreases the underestimation produced by the modelling system, changing MNBE from −24.2% to 5.9%, reduces MNGE from 32.1% to 19.3%, and increases IOA from 0.75 to 0.91.
Figure 7 displays frequency histograms of the mean error distributions in forecasting daily values for the different MOS categories and the modelling system. Table 7 shows the parameters of the Gaussian fit.
The results are very similar for categories MOS E and MOS E C, for which the curves are narrower and better centred.
To further show the improvement introduced by MOS E C, Figure 8 compares the time evolution of the daily PM10 values observed, forecasted by the raw modelling system and corrected by MOS E C.
Focusing on the impact of MOS E C at the measurement stations, it can be seen that MOS E C diminishes MNGE in all of them. Moreover, all measurement stations meet the acceptance criteria of [32] for MNBE and MNGE, which is a very good achievement, since only 21% and 58% of the stations, respectively, satisfied the criteria with the modelling system alone.
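The station-level percentages quoted in this kind of analysis can be obtained as in the following sketch, which counts the stations whose MNBE lies within ±15% and whose MNGE is below 35%. The function name `share_meeting_criteria`, the station labels and the per-station values are hypothetical.

```python
def share_meeting_criteria(station_stats, mnbe_limit=15.0, mnge_limit=35.0):
    """Percentage of stations meeting each criterion.

    `station_stats` maps a station name to (MNBE, MNGE), both in percent.
    Returns (share meeting the MNBE criterion, share meeting the MNGE one).
    """
    n = len(station_stats)
    mnbe_ok = sum(abs(bias) < mnbe_limit for bias, _ in station_stats.values())
    mnge_ok = sum(gross < mnge_limit for _, gross in station_stats.values())
    return 100.0 * mnbe_ok / n, 100.0 * mnge_ok / n

# Synthetic per-station statistics (not the values of the study).
station_stats = {
    "A": (-24.2, 32.1),
    "B": (5.9, 19.3),
    "C": (10.0, 40.0),
    "D": (-3.0, 20.0),
}
print(share_meeting_criteria(station_stats))
```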

O3
Table 8 shows the error statistics of this pollutant for the different categories, and Figure 9 shows the comparison among the mean normalized errors.
As can be seen, MOS E C gets the best results for both the maximum hourly value and the maximum 8-hourly value.
For the maximum hourly value, MOS E C slightly increases the MNBE of the modelling system from 1.8% to 2.8%, reduces MNGE from 20.8% to 13.2%, and increases IOA from 0.62 to 0.91.
For the maximum 8-h value, MOS E C reduces the MNBE of the modelling system from 7.0% to 3.9%, reduces MNGE from 25.3% to 15.5%, and increases IOA from 0.68 to 0.92.
Figure 10 displays the mean error distributions of the forecasts for every MOS category and the modelling system, and Table 9 summarizes the parameters of the Gaussian fit.
For the maximum hourly value, the best results are attained by MOS C and MOS E C, especially the former. MOS C produces the narrowest curve, followed by MOS E C, though neither is quite centred. For the maximum 8-hourly value, the best results are attained by MOS E C, followed by MOS C. MOS E C has the smallest width but is not the best centred.
The analysis of errors by measurement station shows that the improvement introduced by MOS E C over the modelling system for maximum hourly values is apparent for most of the stations, with a generalized reduction of MNBE and MNGE and an increase of IOA. MOS C reduces MNGE in all stations and, moreover, all of them end up meeting the acceptance criteria of [32] for MNBE and MNGE.
Furthermore, the improvements produced by MOS E C over the modelling system for the maximum 8-hourly value are also generalized, with reductions in both MNBE and MNGE and increases in IOA. MOS E C reduces MNBE in all measurement stations, and the criteria of [32] are met in all of them for MNBE and MNGE. This compares with 76% and 81%, respectively, of the measurement stations that met the criteria with the raw modelling system.

Conclusions
In this paper, we have applied the well-known MOS methodology to air quality modelling, a field where this technique is rarely employed. The output of an Eulerian coupled air quality modelling system feeds 4 MOS categories for each of these 5 statistics: maximum 1-h daily value and daily value of NO2, daily value of PM10, and maximum 1-h and 8-h daily values of O3; this amounts to 20 MOS considered.
The best results for all statistics have been obtained with the category that stratifies by measurement stations and by periods, as expected.The stratification by measurement stations corrects systematic errors of the modelling system.The stratification by periods allows for the treatment of the seasonal dependency of concentration values.
In summary, the application of the MOS methodology increases the accuracy of the maximum 1-h daily value of NO2 from 71% to 75%, the daily value of NO2 from 75% to 82%, the daily value of PM10 from 68% to 81%, the maximum 1-h daily value of O3 from 79% to 87% and the maximum 8-h daily value of O3 from 75% to 85%.
Our results highlight the improvements achieved by a quite simple mathematical tool with a very low computational cost. The MOS methodology increases the total computational cost of the operational air quality modelling system by only 2%, and improves the air quality forecasts by between 3% and 10%. Nevertheless, one cannot forget that the MOS implementation should go in parallel with enhancements of the modelling system, such as the introduction of better pollutant emission values. Any change in the modelling system, however, implies recalculating the MOS regression coefficients.

Figure 2. Average daily time evolution of the concentration values of NO2 (a), PM10 (b) and O3 (c) for the considered periods.

Figure 4(a) and Figure 4(b) have been created to better analyze the improvements of the different categories. These histograms display the error distributions of the modelling system and of the MOS categories.

Figure 3. Comparison of MNGE and MNBE between the different MOS categories and the NO2 modelling system for maximum 1-h value [left] and daily value [right].

Figure 4. Histograms of error distributions in the evaluation of the modelling system and the MOS categories for maximum hourly values (a) and daily values (b) of NO2.

Figure 5(a) lets us compare the time evolution of the mean daily values of observed maximum hourly values of NO2 with those forecasted by the modelling system and those corrected by MOS E C. Figure 5(b) shows the same comparison for daily values.

Figure 5. Time evolution of mean daily values of observed maximum hourly values of NO2 (a) and observed daily values (b), forecasted by the modelling system and corrected by MOS E C in the training period.

Figure 6. Comparison of MNGE and MNBE between the different MOS categories and the PM10 modelling system.

Figure 7. Histogram of error distributions in the evaluation of the modelling system and the MOS categories for daily values of PM10.

Figure 8. Time evolution of daily PM10 values observed, forecasted by the modelling system and corrected by MOS E C in the training period.

Figure 9. Comparison of MNGE and MNBE between the different MOS categories and the O3 modelling system for maximum 1-h value [left] and maximum 8-h value [right].

Figure 11. Time evolution of averaged maximum 1-hourly values (a) and maximum 8-hourly values (b) of O3 observed, forecasted by the modelling system and corrected by MOS E C in the training period.

Table 1. Summary of the errors of the modelling system in the training period.

Training periods: 23 Oct 2012 to 02 March 2013 (defined as Period 1), 03 March 2013 to 01 May 2013 (defined as Period 2), 02 May 2013 to 06 July 2013 (defined as Period 3) and 07 July 2013 to 31 August 2013 (defined as Period 4). Stratification according to the measurement station is used in order to identify systematic errors and biases of each measurement point.

Table 2. Average maximum hourly and daily values of NO2, daily value of PM10 and maximum hourly and 8-hourly values of O3 within each period.

Table 3. Summary of the proposed MOS categories.

Table 4. Errors of all MOS categories as well as of the NO2 forecasting system.

Table 5. Parameters of the Gaussian fit of the different MOS categories with respect to the maximum hourly value and daily value of NO2.

Table 6. Errors of all MOS categories as well as of the PM10 forecasting system.

Table 7. Parameters of the Gaussian fit of the different MOS categories with respect to the daily value of PM10.

Table 8. Errors of all MOS categories as well as of the O3 forecasting system.

Table 9. Parameters of the Gaussian fit of the different MOS categories with respect to the maximum 1-hourly value and maximum 8-hourly value of O3.