
A data assimilation system combines all available information on the atmospheric state in a given time window to produce an estimate of atmospheric conditions valid at a prescribed analysis time. Nowadays, increased computing power coupled with greater access to real-time asynoptic data is paving the way toward a new generation of high-resolution (i.e. on the order of 10 km) operational mesoscale analyses and forecasting systems. Moreover, better initial conditions are increasingly considered of the utmost importance for Numerical Weather Prediction (NWP) at the short range (0 - 12 h). This paper presents a general-purpose data assimilation system, which is coupled with the Regional Atmospheric Modelling System (RAMS) to give the analyses for: zonal and meridional wind components, temperature, relative humidity, and geopotential height. In order to show its potential, the data assimilation system is applied to produce analyses over Central Europe. For this application the background field is given by a short-range forecast (12 h) of the RAMS and analyses are produced by 2D-Var with 0.25° horizontal resolution. Results show the validity of the analyses because they are closer to the observations than the background, consistently with the settings of the data assimilation system. To quantify the impact of improved initial conditions on the forecast, the analyses are then used as initial conditions of a short-range (6 h) forecast of the RAMS model. The results show that the RMSE is effectively reduced for the one- and two-hour forecasts, with some improvement for the three-hour forecast.

Modern NWP data assimilation systems use information from a range of sources to provide the best estimate, i.e. the analysis, at a given time. These systems combine information coming from the observations, an a priori estimate of the atmospheric state (the background or first-guess field), detailed error statistics, and the laws of physics.

Nowadays, increased computing power coupled with greater access to real-time asynoptic data is paving the way toward a new generation of high-resolution (i.e., on the order of 10 km or less) operational mesoscale analyses and forecasting systems [1-5]. Moreover, better initial conditions are increasingly considered vital for a range of NWP applications, in particular at the short range (0 - 12 h [6-7]).

This paper shows preliminary results of a data assimilation system, which is under development with the following two purposes: 1) to produce analyses of meteorological parameters; 2) to improve the short-term forecast of atmospheric fields.

The analyses are given for the following parameters: zonal and meridional wind components, temperature, relative humidity, and geopotential height.

The data assimilation system is a stand-alone package that can be used with different backgrounds. However, in this paper it is used in conjunction with the RAMS model [8,9]. So, the data assimilation system uses the RAMS fields as background and the analyses are used to initialize the RAMS model.

The observations used in the data assimilation system are the profiles of the variables of interest, and in particular those distributed through the Global Telecommunication System (GTS).

The main features of the analysis system (2D-Var) used in this work are:

1) Incremental formulation of the cost-function [

2) Preconditioning of the background cost function through a “control variable transformation” U defined as B = UU^{T}, where B is the background error covariance matrix, which is formulated in a simple way.

3) Background error covariances are estimated via the National Meteorological Center (NMC, [

The goals of this paper are the following two: 1) to quantify the performance of the analyses at improving the initial state of the RAMS model; 2) to show the impact of the data assimilation system on the short-range forecast of the RAMS model.

It is important to highlight that a two-dimensional solution (2D-Var) is used to solve a three-dimensional problem, which is a limitation of this work because the vertical correlation of the error is neglected. This causes a loss of information in the analyses, which are less accurate than those computed with three- and four-dimensional methods [2,4,5]. Moreover, the RAMDAS 4D-Var analysis system is also available for the RAMS model [5,12].

Nevertheless, the adoption of the 2D-Var system of this paper is motivated by the following three reasons: 1) the method is computationally faster, which is important from the operational point of view, and simpler to implement than three- and four-dimensional methods; 2) the 2D-Var solution may still produce analyses with a valuable impact on the short-term forecast; 3) a well-designed 2D-Var method provides the basis for the implementation of more advanced variational systems because many of the algorithms required by 3D-Var and 4D-Var methods (observation operators, minimization packages, background error covariances, etc.) are contained in 2D-Var.

The paper is organized as follows: Section 2 provides details about the method of solution used in this paper; Section 3 shows how the analysis system and the RAMS model are coupled and describes the strategy adopted to achieve the goals of this work; Section 4 gives the results; and Section 5 gives the conclusions.

The basic goal of the 2D-Var algorithm is to produce an optimal estimate of the true atmospheric state at analysis time through iterative solution of a prescribed cost-function [13,14]:

where J(x) is the cost-function, x_{b} is the background state, H is the forward observational operator, y^{o} is the vector of the observations, and B and R are the background and observational error covariance matrices, respectively.
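For reference, a cost function of the standard variational form, written with the symbols defined above, can be sketched as follows. The factor 1/2 and the ordering of the two terms follow the usual convention in the variational literature and are an assumption here, not a transcription of Equation (1).

```latex
J(x) = \frac{1}{2}\,(x - x_{b})^{T} B^{-1} (x - x_{b})
     + \frac{1}{2}\,\bigl(y^{o} - H(x)\bigr)^{T} R^{-1} \bigl(y^{o} - H(x)\bigr)
```

The first term penalizes departures of the analysis from the background, weighted by B^{-1}; the second penalizes departures from the observations, weighted by R^{-1}.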

The problem can be summarized as the iterative solution of Equation (1) to find the analysis state x that minimizes J(x). This solution represents the a posteriori maximum likelihood estimate of the true state of the atmosphere given the two sources of a priori data: the background x_{b} and observations y^{o} [

A preconditioning via a control variable v transform defined by x' = Uv is performed before the minimization of (1) where x' = x − x_{b}. The transform U is chosen to satisfy the relationship B = UU^{T}. Using the incremental formulation [

For the second term we assume that the background x_{b} gives a good estimate of the final state x and we notice that:

where y^{o'} = y^{o} − H(x_{b}) is the innovation vector and the linear operator H is the Jacobian of the potentially nonlinear observation operator H(·) used in the calculation of y^{o'}.

Considering the above results, Equation (1) may be rewritten as:

In this form, the background term is diagonalized, reducing the number of calculations required from O(n^{2}) to O(n), where n is the dimension of x.
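The steps of this transformation can be sketched as a short derivation. It assumes the standard quadratic cost function, a square and invertible U, and uses a bold \mathbf{H} for the Jacobian to distinguish it from the nonlinear operator H(·); this notation is introduced here only for illustration.

```latex
\begin{aligned}
J_{b} &= \tfrac{1}{2}\, x'^{T} B^{-1} x'
       = \tfrac{1}{2}\, v^{T} U^{T} (U U^{T})^{-1} U v
       = \tfrac{1}{2}\, v^{T} v, \\
y^{o} - H(x) &\approx y^{o} - H(x_{b}) - \mathbf{H} x'
       = y^{o'} - \mathbf{H} U v, \\
J(v) &= \tfrac{1}{2}\, v^{T} v
      + \tfrac{1}{2}\, \bigl(y^{o'} - \mathbf{H} U v\bigr)^{T} R^{-1} \bigl(y^{o'} - \mathbf{H} U v\bigr).
\end{aligned}
```

In the first line the matrix B^{-1} disappears, so the background term becomes a plain sum of squares of the components of v, which is the diagonalization referred to above.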

Another goal of the control variable transform is to represent spatial correlations in an accurate and simple form. In the implementation of the 2D-Var scheme of this paper, the transformation U is given by:

where E and L are defined by:

The background error matrix has a Gaussian shape whose length scale is derived by the NMC method, as shown in Appendix A, and its amplitude depends on the background error. In particular, B is an n × n matrix whose element ij is the value of the Gaussian for the distance between the grid points i and j multiplied by the background error variance. The background and observational errors are introduced in Appendix A and are taken from the literature.

The observational error covariance matrix R is a p × p diagonal matrix whose diagonal elements are all equal to the observational error, where p is the number of observations available at the analysis time for a given level.

The values of the observational and background errors, as well as the length scale for each parameter, depend on the vertical level, and the cost-function (1) is minimized separately for each vertical level.
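A single-level analysis of this kind can be illustrated with a minimal Python sketch. Several choices below are assumptions made for illustration and are not taken from the paper: the Gaussian is written as exp(-d^2 / (2 L^2)), its amplitude is the squared background error, R holds the squared observational error on its diagonal, U is built from the eigenpairs of B, the observation operator is linear, and the minimum is obtained by solving the zero-gradient linear system directly rather than by an iterative minimizer. All function and variable names are hypothetical.

```python
import numpy as np

def gaussian_background_covariance(coords_km, sigma_b, length_km):
    """Assumed form: B[i, j] = sigma_b**2 * exp(-d_ij**2 / (2 * length_km**2))."""
    diff = coords_km[:, None, :] - coords_km[None, :, :]
    dist2 = np.sum(diff**2, axis=-1)
    return sigma_b**2 * np.exp(-dist2 / (2.0 * length_km**2))

def analysis_one_level(x_b, B, H, y_o, sigma_o):
    """Minimize the preconditioned incremental cost function for a linear H."""
    # Control variable transform U with B = U U^T, built from the eigenpairs of B.
    eigval, eigvec = np.linalg.eigh(B)
    eigval = np.clip(eigval, 0.0, None)  # guard against round-off negatives
    U = eigvec * np.sqrt(eigval)
    R_inv = np.eye(len(y_o)) / sigma_o**2  # diagonal R with equal variances
    d = y_o - H @ x_b                      # innovation vector
    HU = H @ U
    # Setting the gradient of J(v) to zero gives (I + (HU)^T R^-1 HU) v = (HU)^T R^-1 d.
    A = np.eye(U.shape[1]) + HU.T @ R_inv @ HU
    v = np.linalg.solve(A, HU.T @ R_inv @ d)
    return x_b + U @ v

# Toy example: 6 x 6 grid with 25 km spacing and two observations at grid points.
xs, ys = np.meshgrid(np.arange(6) * 25.0, np.arange(6) * 25.0)
coords = np.column_stack([xs.ravel(), ys.ravel()])
B = gaussian_background_covariance(coords, sigma_b=1.0, length_km=50.0)
H = np.zeros((2, 36))
H[0, 7] = 1.0
H[1, 28] = 1.0
x_b = np.zeros(36)
y_o = np.array([1.0, -0.5])
x_a = analysis_one_level(x_b, B, H, y_o, sigma_o=1.0)
```

With equal background and observational errors, the analysis at each observed point lies roughly halfway between the background and the observation, and the increment decays with distance according to the length scale.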

The Numerical Experiment Set-Up

The background and the forecast are issued by the RAMS model (non-hydrostatic), version 6.0. Its physical setting is summarized in

An important issue in coupling the RAMS model with the data assimilation system is that they use different coordinate systems both in the horizontal and in the vertical. The data assimilation system uses a regularly spaced longitude-latitude grid, while the RAMS model uses a rotated polar stereographic projection, whose pole is rotated near the centre of the domain to minimize the distortion of the projection in the main area of interest.

In the vertical direction, RAMS uses sigma-z terrain following coordinates [

To cope with the differences between the analysis and forecast coordinate systems, two different RAMS settings are used: a “background run” and a “forecast run”. The background run has one domain with 10 km horizontal grid resolution (

Then analyses are performed on the analysis grid, whose domain spans most of Europe (

The analyses are used to initialize a new run of the RAMS model, i.e. the forecast run, whose domain is contained inside the analysis domain, both horizontally and vertically (

The analysis grid (rightmost column) is regularly spaced in longitude and latitude and uses pressure as vertical coordinate.

In the vertical, the RAMS model uses thirty-five levels for the background run and thirty-two levels for the forecast run. Levels are not equally spaced: layers within the Planetary Boundary Layer (PBL) are between 50 and 200 m thick, whereas layers in the middle and upper troposphere are 1000 m thick.

The analysis grid uses thirty-one pressure levels from 1000 hPa to 50 hPa. Pressure levels are spaced every 50 hPa between 800 and 300 hPa, and every 25 hPa below 800 hPa and between 300 hPa and 150 hPa. Above 150 hPa the vertical levels used are: 130, 110, 100, 80, 65, and 50 hPa. This choice enhances the resolution near the surface, and is a compromise between the computing time and the resolution of the analyses.
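The level spacing described above can be checked with a short sketch that builds the list of pressure levels and counts them. The function name is hypothetical; the values follow the spacing stated in the text.

```python
import numpy as np

def analysis_pressure_levels():
    """Pressure levels (hPa) of the analysis grid, from the surface upward."""
    lower = np.arange(1000, 800, -25)   # 1000 ... 825 hPa, every 25 hPa
    middle = np.arange(800, 300, -50)   # 800 ... 350 hPa, every 50 hPa
    upper = np.arange(300, 125, -25)    # 300 ... 150 hPa, every 25 hPa
    top = np.array([130, 110, 100, 80, 65, 50])
    return np.concatenate([lower, middle, upper, top])

levels = analysis_pressure_levels()
print(len(levels))  # 31
```

The count of thirty-one levels agrees with the figure given in the text.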

Observations used in this work are TEMP (both land and ship) reports over Europe and the European wind profiler network.

TEMP reports contain, among others, vertical soundings of relative humidity, temperature, wind speed and direction, and height. The European wind profiler network measures the wind speed and direction in the vertical above the instrument.

Observations were downloaded from MARS (Meteorological Archive and Retrieval System, see also http://www.ecmwf.int/publications/manuals/mars/) of ECMWF (European Centre for Medium-Range Weather Forecasts) and were available from 1 to 30 August 2008^{1}.

To perform analyses, measurements are interpolated onto the vertical levels of the analysis grid. Temperature and relative humidity are interpolated assuming they are linear in log-pressure. The velocity components are assumed linear in pressure. The same behaviour of the variables with height is assumed to interpolate the fields between the RAMS sigma-z levels and the pressure levels of the analysis grid and vice versa.
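The two interpolation rules can be sketched with `numpy.interp`. The function name, the argument names, and the choice of returning NaN outside the observed pressure range are assumptions made for this illustration.

```python
import numpy as np

def interp_to_levels(p_obs, values, p_target, log_pressure):
    """Interpolate a profile to target pressure levels.

    log_pressure=True  : linear in log(p), as for temperature and relative humidity.
    log_pressure=False : linear in p, as for the wind components.
    Targets outside the observed pressure range are returned as NaN (assumption).
    """
    p_obs = np.asarray(p_obs, dtype=float)
    values = np.asarray(values, dtype=float)
    p_target = np.asarray(p_target, dtype=float)
    order = np.argsort(p_obs)  # np.interp needs increasing abscissae
    coord = np.log if log_pressure else (lambda p: p)
    return np.interp(coord(p_target), coord(p_obs[order]), values[order],
                     left=np.nan, right=np.nan)

# Example: a sounding reported at four pressure levels (hPa).
p_sounding = [1000.0, 850.0, 700.0, 500.0]
temperature = [293.0, 284.0, 275.0, 255.0]   # K
u_wind = [2.0, 6.0, 9.0, 15.0]               # m/s
t_on_levels = interp_to_levels(p_sounding, temperature, [925.0, 600.0], True)
u_on_levels = interp_to_levels(p_sounding, u_wind, [925.0, 600.0], False)
```

The same routine can be applied in both directions, from the RAMS levels to the pressure levels and back, once the model pressure at each sigma-z level is known.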

Finally, only measurements whose difference with the background is under a fixed threshold are used in the analyses. The thresholds considered in this paper are equal for all levels and are the following: 25 m for geopotential height, 5 K for temperature, 10 m/s for zonal and meridional wind components and 30% for relative humidity. This is the only quality check adopted for the observations, and is used to discard measurements affected by gross errors.
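This gross-error check amounts to a simple threshold test on the departure from the background. A minimal sketch follows; the dictionary keys and function name are hypothetical, while the threshold values are those listed above.

```python
import numpy as np

# Maximum accepted |observation - background| for each variable.
THRESHOLDS = {
    "geopotential_height": 25.0,  # m
    "temperature": 5.0,           # K
    "u_wind": 10.0,               # m/s
    "v_wind": 10.0,               # m/s
    "relative_humidity": 30.0,    # %
}

def passes_gross_check(variable, obs, background_at_obs):
    """Boolean mask: True where the observation is kept for the analysis."""
    departure = np.abs(np.asarray(obs, dtype=float)
                       - np.asarray(background_at_obs, dtype=float))
    return departure < THRESHOLDS[variable]

# Example: the second temperature report departs by 7 K and is rejected.
mask = passes_gross_check("temperature", [280.0, 290.0, 275.5], [281.0, 283.0, 276.0])
print(mask)  # [ True False  True]
```

Because the thresholds are the same at all levels, a single table keyed by variable is sufficient.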

To quantify the impact of the analysis both in the improvement of the initial state and in the short-term forecast of the RAMS model, the following strategy is adopted (

After 12 h of background run, an analysis is made at 12 UTC. This hour was chosen because several TEMP and wind profiler reports are available at this time, which can be used to analyse the parameters considered in this paper.

Starting at the analysis time (12 UTC), a short-term RAMS forecast, lasting 6 h, is made using the forecast grid. For this run, the initial conditions are given by the analyses produced at 12 UTC, while the boundary conditions after 6 h are taken from the ECMWF operational analysis/240 h forecast cycle and are the same as the background run.

It is important to highlight that observations used at the analysis time are not used in the ECMWF 240 h forecast, which gives the boundary conditions for the background run and for the forecast run after the initialization time. The ECMWF 240 h forecast uses observations from a 6 h time window centered around the forecast initial time (see http://www.ecmwf.int/products/forecasts/guide/The_ECMWF_early_delivery_system.html). So the 12 UTC observations are used only in the initial conditions of the forecast run.

The root mean square error (RMSE) is computed between the background fields and observations, and between the forecast fields and observations for the whole period on the common forecast grid (

Finally, because the data for 10 August were not available, a total of thirty background runs, analyses, and forecast runs were collected for the whole period.

Hereafter the RMSE computed between the background run and the observations at a fixed time and for the whole period is referred to as the background error (RMSE_b). Similarly, the RMSE computed between the forecast run and the observations at a fixed time and for the whole period is referred to as the forecast error (RMSE_f). For the computation of both RMSEs, the grid point nearest to the observation is considered and the statistics are computed on the common forecast-grid domain (

It should also be emphasized that RMSE_f at the analysis time is computed after the analyses are used to initialize the RAMS model. So, the difference between the RMSE_b and RMSE_f accounts for the errors introduced by the interpolation between the RAMS and analysis grids.
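The RMSE computation with nearest-grid-point matching can be sketched as follows. The function names and the distance measure (squared degree differences with a cosine-latitude scaling on longitude) are assumptions for illustration; any nearest-neighbour search on the forecast grid would serve the same purpose.

```python
import numpy as np

def nearest_grid_index(grid_lon, grid_lat, lon, lat):
    """Index (row, column) of the grid point nearest to an observation location."""
    dlon = (grid_lon - lon) * np.cos(np.deg2rad(lat))
    dlat = grid_lat - lat
    return np.unravel_index(np.argmin(dlon**2 + dlat**2), grid_lon.shape)

def rmse(model_at_obs, obs):
    """Root mean square error between model values and observations."""
    diff = np.asarray(model_at_obs, dtype=float) - np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean(diff**2)))

# Example on a hypothetical 0.25 degree grid.
grid_lon, grid_lat = np.meshgrid(np.arange(5.0, 8.0, 0.25), np.arange(44.0, 47.0, 0.25))
idx = nearest_grid_index(grid_lon, grid_lat, lon=6.3, lat=45.1)
```

Applying `rmse` once to the background values and once to the forecast values, both sampled at the same indices and against the same observations, gives RMSE_b and RMSE_f for a given level and time.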

From

The error reduction of this simple ideal case is never attained because: 1) there is usually more than one observation per level, and the innovations of these measurements, i.e. the differences between the observations and the background, interact with each other in the analysis; 2) the difference between RMSE_f and RMSE_b of

It is important to note the decrease of the performance of the analysis with increasing height, as shown by the decrease of the difference between RMSE_b and RMSE_f with height. This occurs because the vertical resolution of the analysis grid decreases at higher levels^{2}, and the errors introduced by the vertical interpolation between the analysis and RAMS grids are larger.