Adaptive Filter for High Dimensional Inverse Engineering Problems: From Theory to Practical Implementation

Inverse engineering is a rapidly growing discipline. The inverse problems considered here concern determining the state and/or parameters of a physical system of interest from observed measurements. In this context, filtering algorithms constitute a key tool for improving our knowledge of the system state and its forecast, which is essential, in particular, for operational oceanographic and meteorological systems. The objective of this paper is to give an overview of how one can design a simple, computationally inexpensive Reduced-Order Adaptive Filter (ROAF) to solve inverse engineering problems with high forecasting performance in a very high dimensional environment.


Introduction
Filtering algorithms constitute a key tool for improving system forecasts in engineering, in particular for operational oceanographic and meteorological systems. Theoretically, optimal filtering algorithms provide the best estimate of the system state based on all available observations. As many engineering problems are expressed mathematically by a set of partial differential equations together with initial and/or boundary conditions, their numerical solution results in a system state of very high dimension. In this context, the adaptive filter (AF) for state and parameter estimation has been an attractive topic for decades [1]. Traditionally, the AF is a filter that self-adjusts the parameters of its transfer function to minimize an error signal; the corresponding nonadaptive filter has a structure obtained by conventional methods. The AF uses feedback, in the form of an error signal, to adjust its tuning parameters and optimize filter performance. It offers an attractive solution to filtering problems in an environment of unknown statistics and can provide a significant improvement in performance. The AF has been successfully applied in many fields such as communications, control, noise cancellation, signal prediction, radar, sonar, seismology and biomedical engineering. Another potential field of application of the AF is numerical weather prediction, in particular data assimilation, which uses mathematical models of the atmosphere and ocean to predict the weather from current estimates of the weather state [2]. With vast data sets and system states of very high dimension, data assimilation requires the most powerful supercomputers in the world. Even so, at present, operational systems must be satisfied with simplified assimilation methods; implementation of sophisticated optimal forecast techniques like the Kalman filter (KF) [3] remains out of reach in the near future because of the huge memory storage and computing time required to evolve the system of equations. The difficulties encountered here concern not only the very high dimension of the system state: factors affecting the accuracy of numerical predictions include uncertainty in the model error statistics, the more fundamental problem of the chaotic nature of the partial differential equations used, and the density and quality of observations. These difficulties call for another approach to the design of assimilation systems for improving weather forecasting skill. The objective of this paper is to give an overview of how one can design a simple, computationally inexpensive Reduced-Order AF (ROAF) with high forecasting performance to handle assimilation problems in a very high dimensional environment.
In the section that follows, a brief outline of the theory of the ROAF is given. The steps in the implementation of the ROAF are detailed in Section 3. Numerical experiments are described in Sections 4 and 5, where the problems of estimating the trajectory of the Lorenz system and of assimilating sea surface height (SSH) into an ocean model are exhibited. Section 6 presents the conclusions.

Why the Adaptive Filter
The AF, in the context of estimating the state of a very high dimensional system, originates from the work [4], where it is defined as a filter minimizing the mean prediction error (MPE) of the system output. One of the main features of the AF is that it is optimal among the members of a set of parametrized state-space innovation representations, with the vector of pertinent gain parameters acting as a control vector. Consider the standard filtering problem

x(k+1) = Phi x(k) + w(k),   (1)
z(k) = H x(k) + v(k),       (2)

where x(k) is the system state of dimension n, z(k) is the observation vector of dimension p, and w(k), v(k) are uncorrelated sequences of zero mean and time-invariant covariances Q and R respectively. For simplicity, in (2) the observation operator H is assumed to be linear and time-invariant. The optimal filter, in the mean squared sense, is then given by the KF:

x_e(k+1) = Phi x_e(k) + K(k+1) zeta(k+1),  zeta(k+1) = z(k+1) - H Phi x_e(k),  (4a)
K(k+1) = M(k+1) H^T [H M(k+1) H^T + R]^(-1),                                   (4b)
M(k+1) = Phi P(k) Phi^T + Q,                                                   (4c)
P(k+1) = [I - K(k+1) H] M(k+1),                                                (4d)

where M(k) and P(k) denote the prediction and filtered error covariance matrices (ECMs). For systems of very high dimension, even under the most favorable conditions (perfect knowledge of all system parameters and noise statistics), it is impossible to implement the filter (4) on the most powerful computer systems in the world.
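To fix notation, the predict-update cycle (4a)-(4d) can be sketched in a few lines. This is a minimal, generic Kalman filter step for the linear model (1)-(2), not code from the paper; it makes visible why (4) is impractical in high dimension, since the n x n covariance P must be stored and propagated.

```python
import numpy as np

def kalman_step(x, P, z, Phi, H, Q, R):
    """One cycle of the standard KF, Eqs. (4a)-(4d): prediction then update.

    All matrices are assumed perfectly known -- exactly the favorable
    condition under which the paper notes the filter is still infeasible
    for very large n, because P is n x n."""
    # Prediction step (4c)
    x_f = Phi @ x                       # state forecast
    M = Phi @ P @ Phi.T + Q             # forecast error covariance
    # Update step (4a)-(4b), (4d)
    S = H @ M @ H.T + R                 # innovation covariance
    K = M @ H.T @ np.linalg.inv(S)      # Kalman gain
    zeta = z - H @ x_f                  # innovation vector
    x_a = x_f + K @ zeta                # filtered (analysis) state
    P_a = (np.eye(len(x)) - K @ H) @ M  # filtered error covariance
    return x_a, P_a, K
```

The cost of the covariance recursion is O(n^3) per step, which motivates the reduced-order gain introduced below.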

In order to exploit the optimal structure of the filter (4a), notice that in (4a) all the variables are well defined except the gain K(k). Moreover, under mild conditions, the gain K(k) converges to a limit K_inf as k tends to infinity. Thus, if instead of all the equations in (4) we use only (4a) with the assumption that K(k) = K is constant, one can expect to design a filter close to the optimal one provided K is close to K_inf. Such an asymptotically optimal filter, however, is realizable only if the system dimension is relatively small. To be able to work with any system, independently of its dimensionality, in [4] the filter is assumed to be of the form (4a) with the gain well defined up to a vector of unknown parameters theta, i.e. K = K(theta). Then, instead of finding all the elements of K, the task is reduced to seeking an optimal theta of dimension n_theta which, as expected, is much less than the number of elements of K. The best way to carry out this task is to introduce the MPE objective function [4]

J(theta) = E[ || zeta(k+1, theta) ||^2 ],    (5a)
J_s(theta) = || zeta(k+1, theta) ||^2,       (5b)

where E is the mathematical expectation operator, ||.|| denotes the Euclidean vector norm, and zeta(k+1, theta) is computed from the one-step-ahead prediction of the state. Note that the gain in the KF minimizes the objective functions (5a) and (5b) too (under the condition of perfect knowledge of all noise statistics and system parameters), since the KF innovation is white.
Here zeta(k+1, theta) = z(k+1) - H Phi x_e(k) represents the innovation vector. The introduction of (5) gives a great advantage in the design of a ROAF, since its minimization can be performed by a simple stochastic approximation (SA) algorithm [5]: at each assimilation instant one has to compute only the gradient of the sample objective function (5b). This makes the filter design task much simpler compared to the KF, where the estimation error is minimized in the probability space and hence requires knowledge of the probability density functions of all the random variables involved. As to Four-Dimensional Variational assimilation (4D-Var) [6], the optimization is performed over the whole assimilation period by iterative algorithms which require, at each assimilation instant, about 20 iterations (each iteration includes one forward model integration and one backward adjoint integration) to approach an optimal control vector. The principal differences between the two approaches, 4D-Var and AF, are listed in Table 1.

Order Reduction
The ROAF described below was introduced in [4] in order to reduce the number of estimated parameters in the gain, which is given the structure

K = P_r K_e,   (6)

where P_r projects the correction from the reduced space of dimension n_r into the full space of dimension n, and K_e represents the gain in the reduced space. In [7,8] the choice of P_r and the parametrization of the gain K_e are studied from the point of view of filter stability. Whatever the filter, the question of ensuring its stability is of the first importance: instability causes error growth and can completely destroy the filter. It turns out that, under a detectability condition, stability of the filter can be guaranteed if the columns of P_r are constructed from a subspace of unstable and neutral eigenvectors (EVs) or singular vectors (SVs) of the system dynamics. From a computational point of view, it is inadvisable to construct P_r from leading eigenvectors: the eigenvectors may be complex and non-orthogonal, hence their computation suffers from numerical instability. A great advantage of the singular vector approach is that it does not suffer from numerical instability, and the efficient Lanczos algorithm [9] exists for computing the leading singular vectors of very large sparse systems. However, the definition of singular vectors depends on the chosen vector norm, and their computation requires the adjoint of the model dynamics. For many physical problems the latter represents a very hard task, not to mention the possible existence of discontinuities when obtaining the tangent linear model (TLM). The third approach is related to leading (or dominant) real Schur vectors (ScVs). As proved in [7], the real dominant ScVs (DScVs) play the same role in ensuring filter stability as the dominant EVs or SVs. The Schur approach enjoys all the advantages of the singular vector approach and, in addition, does not require the adjoint code. Computation of the DScVs is closely associated with searching for the directions of rapid growth of the prediction errors (PE). In [10] a PE sampling procedure (SP) based on DScVs is proposed, aimed at generating PE samples for the DScVs. These PE samples (referred to as DPE, dominant PE, samples) participate in the construction of P_r and in the estimation of the parameters of K_e to initialize the gain.
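The storage saving behind the structure (6) is easy to make concrete. The sketch below is illustrative only: a random orthonormal basis stands in for the dominant Schur vectors, and the toy sizes n, p, n_r are assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, n_r = 500, 40, 5   # state dim, obs dim, reduced dim (toy sizes)

# P_r: columns should span the unstable/neutral subspace; here a random
# orthonormal basis merely stands in for the dominant Schur vectors.
P_r, _ = np.linalg.qr(rng.standard_normal((n, n_r)))
K_e = 0.1 * rng.standard_normal((n_r, p))   # reduced-space gain (toy values)

zeta = rng.standard_normal(p)               # innovation vector
correction = P_r @ (K_e @ zeta)             # full-space correction via Eq. (6)

# Parameters to store: a full gain needs n*p entries, the factored form
# needs n*n_r + n_r*p.
full_cost = n * p
reduced_cost = n_r * (n + p)
```

With these toy sizes the factored gain needs 2700 numbers instead of 20000, and the gap widens rapidly as n grows.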

Gain Parametrization
In [7] different structures of K_e are proposed which ensure stability of the filter (6) when P_r is constructed from either dominant EVs or DScVs.

Numerical Approximation of P_r
The generated PE samples share the important property of being iteratively developed in the directions of rapid growth of the prediction error. The columns of P_r, constructed from DScVs, thus allow capturing the most rapidly growing PEs. In [10] the PE sampling procedure is proposed for generating the DScV samples. Let x_e(i) be some estimate of the true system state x(i); the estimation error is e(i) = x(i) - x_e(i). Integration of the model from x_e(i) produces a prediction, where the integration is performed over the interval dt between two assimilation instants. Integrating the model from a perturbed estimate as well, the difference of the two predictions yields a vector s(i+1) which can be considered as a PE pattern grown over the period of model integration dt. Applying this procedure to an ensemble of orthonormal perturbation columns X(i) yields an ensemble of patterns S(i+1), which is orthonormalized in turn to give X(i+1). The operator P_r can thus be approximated using the column vectors of X(i), which belong to the subspace generated by the columns of S(i). The filter with the gain constructed on the basis of PE samples will be referred to as a PE filter (PEF).
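One iteration of this sampling idea can be sketched as follows. This is a schematic reading of the procedure, not the exact algorithm of [10]: the function and variable names are illustrative, `model` is any one-step propagator, and the re-orthonormalization uses a plain QR factorization.

```python
import numpy as np

def pe_sampling(model, x_est, X, dt_steps):
    """One iteration of an (off-line) PE sampling sketch: each column of X
    is a normalized perturbation; the difference of two model integrations
    yields a prediction-error pattern grown over the interval dt_steps."""
    S = np.empty_like(X)
    for j in range(X.shape[1]):
        xa = x_est.copy()
        xb = x_est + X[:, j]           # perturbed estimate
        for _ in range(dt_steps):      # integrate both trajectories
            xa = model(xa)
            xb = model(xb)
        S[:, j] = xb - xa              # grown PE pattern s(i+1)
    # Re-orthonormalize: columns keep only the *direction* of growth
    X_new, _ = np.linalg.qr(S)
    return S, X_new
```

Iterating this map rotates the perturbation subspace toward the fastest-growing error directions, which is exactly the role the DScVs play in the construction of P_r.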

Optimization
From a practical point of view, it is often desirable to ensure more than stability for the designed filter. One of the advantages of the AF is that it is self-optimizing, i.e. it is designed to minimize the prediction error by tuning the gain parameters. In this sense the AF can improve its performance in a way that traditional filters cannot. The objective (5) is based on the idea that the best model should be able to predict the future real process with high precision. As uncertainty in the model always exists, even the KF cannot produce the best estimate in such a situation.
From the computational point of view, the choice of the criterion (5) allows us to greatly simplify the optimization procedure. Since optimality is understood in the mean probabilistic sense, simple optimization tools known as stochastic approximation (SA), and in particular the simultaneous perturbation SA (SPSA) [5], are quite appropriate for seeking the optimal parameters. With the SPSA, one can compute the gradient vector with only two additional integrations of the numerical model, independently of the dimension of the vector of tuning parameters. Thus no adjoint equation (AE) is needed for the optimization procedure; constructing an AE is a hard task, not to mention the computational time and cost of integrating it, the possible existence of discontinuities in the numerical model, non-linearities and so on. For more details, see [11].

Practical Implementation of ROAF
The following steps should be followed in order to construct a ROAF.
Step 1. Simulation of an ensemble of DPE samples: (i) choose L, the number of DScVs; (ii) apply the Sampling Procedure (SP) in [10] to generate an ensemble of PE samples for the L DScVs.
Step 2. Estimation of the filter gain: (i) choose the gain structure; (ii) estimate the gain elements from the generated PE samples; (iii) parametrize the gain with theta as a vector of unknown parameters.
Step 3. Optimization of filter performance: apply the SPSA algorithm to estimate the tuning parameters theta.
This algorithm is of the form (see [5])

theta(k+1) = theta(k) - a(k) g_e(theta(k)),                                               (9a)
g_e,i(theta(k)) = [J_s(theta(k) + c(k) Delta(k)) - J_s(theta(k) - c(k) Delta(k))] / [2 c(k) Delta_i(k)],  (9b)
a(k) = a / k^alpha,  c(k) = c / k^gamma,                                                   (9c)

where the components Delta_i(k) of the perturbation vector Delta(k) can be chosen as random variables having the symmetric Bernoulli (+/-1) distribution.
For sufficient conditions for convergence of the SPSA iterate theta(k), see [5]. In [11] a comparative study of the efficiency of the SPSA with respect to other standard optimization methods shows that the SPSA is more efficient when applied to optimizing nonlinear systems and/or when more and more observations are assimilated (the experiment on estimation of a vehicle's position and velocity).
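The recursion (9a)-(9c) is short enough to state in full. The sketch below is a generic SPSA loop in the spirit of [5]; the gain constants and decay exponents are illustrative assumptions (the classical recommendations alpha = 0.602, gamma = 0.101), and `J` stands for the sample objective (5b).

```python
import numpy as np

def spsa_minimize(J, theta0, n_iter=2000, a=0.1, c=0.1, seed=0):
    """Minimal SPSA loop, Eqs. (9a)-(9c): the gradient of the sample
    objective J is approximated from only TWO evaluations per iteration,
    whatever the dimension of theta."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    for k in range(1, n_iter + 1):
        a_k = a / k ** 0.602                  # step-size sequence (9c)
        c_k = c / k ** 0.101                  # perturbation-size sequence (9c)
        delta = rng.choice([-1.0, 1.0], size=theta.shape)  # Bernoulli +/-1
        # Two-sided simultaneous perturbation -> gradient estimate (9b)
        g_hat = (J(theta + c_k * delta) - J(theta - c_k * delta)) / (2 * c_k) * (1.0 / delta)
        theta = theta - a_k * g_hat           # update (9a)
    return theta
```

In the filter-tuning context each evaluation of J costs one model integration, so one SPSA iteration costs two integrations, regardless of how many gain parameters are tuned.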
Comment 3.1. Due to the orthonormalization X(i) = G[S(i)], the columns of X(i) are normalized; as a consequence they represent only the direction, but not the amplitude, of the PE. For the renormalization procedure, see Comment 2.1 in [10].
Comment 3.2. In the procedure described above, the SP generates PE samples using only the numerical model; we then have a so-called off-line SP (denoted SP1). In the so-called on-line SP (SP2), the samples are generated during the filtering process: SP2 simulates the PE by integrating the model from the filtered estimate and from its perturbed version at each assimilation instant. The renormalization process can then be performed more precisely if there is a possibility to obtain some information on the ECM P(k) of the filtered error. The PE patterns generated by SP2 can be used to correct the ECM obtained by the off-line SP1 and to improve the filter performance.
Comment 3.3. For systems with moderate state dimension (see the Lorenz system in Section 4), all the elements of the ECM M in the gain (4b) can be estimated from the ensemble of PE samples s(i), i = 1, ..., T, for example by

M = gamma (1/T) sum_{i=1..T} s(i) s(i)^T,   (10a)

where the re-normalization factor gamma is given by (10b).
For very high dimensional systems, the matrix M obtained from (10a) is rank deficient. It is therefore advisable to select a well-defined structure of the gain, with a small number of parameters to be estimated from the PE samples. See Section 5 for more details.
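The rank-deficiency point is immediate from the form of (10a): a sum of T outer products has rank at most T. A tiny sketch (toy sizes, gamma taken as 1 for simplicity):

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 50, 10                       # state dimension, number of PE samples

S = rng.standard_normal((n, T))     # ensemble of PE samples s(1), ..., s(T)
gamma = 1.0                         # re-normalization factor (see Comment 3.1)
M = gamma * (S @ S.T) / T           # sample ECM in the spirit of (10a)
```

With T much smaller than n, M cannot be full rank, which is why a structured, low-parameter gain is preferred in Section 5.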

Experiment on the Lorenz System [10]
The Lorenz attractor arises from a chaotic dynamical system whose state evolves over time in a complex, non-repeating pattern [12].

Problem Statement
The equations that govern the Lorenz attractor are

dx/dt = sigma (y - x),
dy/dt = x (rho - z) - y,     (11)
dz/dt = x y - beta z,

where sigma is called the Prandtl number and rho is called the Rayleigh number. All of sigma, rho, beta > 0, but usually sigma and beta are held fixed while rho is varied. The system exhibits chaotic behavior for rho = 28 but displays knotted periodic orbits for other values of rho. In the experiments to follow (see also [13]) the parameters sigma, rho, beta are chosen to have the values 10, 28 and 8/3, for which the "butterfly" attractor exists.
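An explicit Euler discretization of (11), as used in the experiment, can be written in a few lines. The step size dt below is an illustrative assumption (the paper's exact value is not recoverable from the text); the parameters are the classical butterfly values.

```python
import numpy as np

def lorenz_step(state, dt=0.001, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One explicit Euler step of the Lorenz system (11) with the
    classical 'butterfly' parameters sigma=10, rho=28, beta=8/3."""
    x, y, z = state
    return np.array([
        x + dt * sigma * (y - x),
        y + dt * (x * (rho - z) - y),
        z + dt * (x * y - beta * z),
    ])
```

Repeated application of `lorenz_step` between observation times gives the nonlinear transition operator of form (1) used in the twin experiment.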
The system (11) is discretized using the Euler method. It is assumed that the observations arrive at the instants k = 1, ..., 100, the interval between two observation arrivals corresponding to 200 steps of model integration. The dynamical system describing the transition of the states between two instants k and k+1 can thus be represented in the form (1), with a nonlinear transition operator and with w(k) simulating the model error.
The sequence w(k) is assumed to be a white noise whose components have variances 2, 12.13 and 12.13 respectively. As to the observation system, the observations are modeled as the output of (2) applied to the true solution of (1), contaminated by a Gaussian noise with zero mean and variance 2. The problem considered in this experiment is to apply the PEF and the EnKF to estimate the true system state from the observations and to compare their performances (see [10] for details).

The EnKF, PEF and Assimilation
The SP1 has been applied to generate the ensemble of patterns, each pattern being developed over an interval equal to that between two observation arrival times. After T iterations, the ECM M is estimated using Equations (10a) and (10b) on the basis of the patterns obtained. The gain of the PEF is computed according to the equation for K in (4b), subject to the resulting time-invariant M (for a fixed T). According to [14], the gain in the EnKF is updated using the associated Riccati equation and an ensemble of perturbations of size L at each assimilation instant. The evolution of the gain element (2,1) in the EnKF subject to L = 100 is shown in Figure 1. Note that for L = 3 and L = 30 (not shown here) the variations of the gain in the EnKF are much stronger than for L = 100. This happens because the perturbations in the EnKF are of random character; the variation becomes less important as the ensemble size increases. As reported in [15], the EnKF can yield an accurate performance if the ECM is estimated on the basis of an ensemble of 1000 samples.

The corresponding gain element in the PEF is also shown in Figure 1. The curves PEF-L1, PEF-L2, PEF-L3 correspond to three simulations subject to L = 1, 2, 3 respectively. They are sufficiently smooth after about 20 iterations and behave in a similar way. Generally speaking, the gain element (2,1) in the PEF is greater than that in the EnKF.
The time-averaged root-mean-square (RMS) of the filtered error (RMS-FE), resulting from the two filters EnKF and PEF subject to different gains, is presented in Figure 2. In the PEF the gains are time-invariant, taking the values reached at T = 100 (see Figure 1). At the end of the assimilation process, with the exception of the EnKF-3, all the filters yield almost the same error level. The RMS of the PE (RMS-PE) produced by the EnKF-3 is too high (RMS ~ 10.4), being 1.42 times higher than that of the PEF-L1 (with RMS ~ 7.3). It is important to stress that, due to the very slow convergence rate of the Monte-Carlo method, the performance of the EnKF can be improved only at an expensive computational cost, since ensembles of very large size must be simulated.
In the Lorenz system, for the PEF, the maximal ensemble size (at each time instant) is equal to the dimension of the system state, i.e. equal to 3. Implementation of the EnKF-30 requires, at each assimilation instant, 30 integrations of the numerical model, which is 10 times the dimension of the model state. As reported in [15], the EnKF with 19 ensemble members produces an estimation error 3 times higher than that based on 1000 members. In contrast, even with L = 1, the performance of the PEF is comparable with that of the EnKF-100.

Numerical Model
In this section the ROAF is constructed and applied to assimilate the sea surface height (i.e. the topography, or relief, of the ocean surface). The ocean model used here is the Miami Isopycnal Coordinate Ocean Model (MICOM); for a detailed description of the model, see [16]. The model configuration is a domain situated in the North Atlantic from 30.0 N to 60.0 N and 80.0 W to 44.0 W, with N_h = N_x x N_y = 25200 (N_x = 140, N_y = 180) horizontal grid points and N_z layers. The state of the model is composed, in each layer l, of the layer thickness h_l and the two meridional and zonal velocity components (u_l, v_l); the state of the model therefore has a very high dimension. In the twin experiments to follow it is assumed that we are given noise-free observations of the SSH every 10 days, available not at all horizontal grid points at the surface but only at a subset of observed points.

SSH Observations: Reduced-Order Filter
The assimilation problem can be formulated in the form

x_e(k+1) = F[x_e(k)] + K zeta(k+1),

where F represents the integration of the nonlinear model over the 10-day interval between two observation arrivals, K is the filter gain and zeta(k+1) is the innovation vector. The gain has the structure (6). The operator P_oi interpolates the missing observations from the observed points to all horizontal grid points.

Structure of ECM and Its Estimation
As mentioned in Comment 3.3 (Section 3), the formulas (10a) and (10b) are appropriate for estimating all the elements of the ECM M(k) only if the system has a moderate dimension (see the Lorenz system in the previous sections). For very high dimensional systems, instead of (10a) and (10b) we introduce the separable structure

M = M_v (x) M_h,   (12)

where (x) denotes the Kronecker product, M_v is the vertical ECM between layers and M_h the horizontal ECM. The elements of M_v can be chosen a priori from physical considerations or estimated from PE patterns. In the well-known Cooper-Haines filter (CHF, see [16,17]), the elements of M_h span all horizontal grid points, whose number is equal to N_h. In what follows we reserve the notation PEF for the filter subject to the ECM (12) with the vertical elements estimated from PE samples.
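The parameter saving of the separable structure (12) is easy to quantify. The sketch below uses toy sizes and random symmetric factors (assumptions, not model values); `numpy.kron` builds the full ECM, whose (l, m) block is alpha_{l,m} M_h with alpha_{l,m} the (l, m) entry of M_v.

```python
import numpy as np

rng = np.random.default_rng(2)
n_z, n_h = 4, 6                     # layers, horizontal points (toy sizes)

# Symmetric positive semi-definite vertical and horizontal covariances:
A = rng.standard_normal((n_z, n_z)); M_v = A @ A.T / n_z
B = rng.standard_normal((n_h, n_h)); M_h = B @ B.T / n_h

M = np.kron(M_v, M_h)               # separable ECM, Eq. (12)

# The separable form stores n_z^2 + n_h^2 numbers instead of (n_z*n_h)^2.
separable_cost = n_z**2 + n_h**2
full_cost = (n_z * n_h) ** 2
```

For the MICOM configuration (N_h = 25200), storing M_h alone is already the dominant cost, which is why the CHF and PEF only estimate the small set of vertical coefficients.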
For the present MICOM model, the gain in the CHF [2] admits a simple analytical expression. It is seen that the patterns generated by SP1 allow the gain coefficients to be well estimated and, as a consequence, the performance of the PEF to be significantly improved compared to that of the CHF: on average, the reduction is of order 50% for the SSH PE. As to the velocity estimation error, the reduction is of order 40% [11]. The SPSA algorithm (9a)-(9c) has been applied to minimize the objective function (5); the gain in the adaptive PEF (APEF) varies during adaptation, whereas the gain in the PEF is constant. As expected, self-adjusting the gain parameters allows the APEF to significantly reduce the estimation error in a way that the PEF cannot (see Figure 6).

On-Line SP2
In order to search for other opportunities to reduce the estimation errors, the SP2 has been applied along with a re-normalization procedure, without or with the use of a Riccati-like equation (see Comment 3.2). This allows us to utilize the PE samples generated during assimilation. First, P_r is assumed to be obtained from the patterns generated by SP1; the ECM of the FE is then estimated using the Riccati-like equation. In Figure 8 the curve "PEF-RIC" represents the SSH PE error produced by the PEF whose gain is updated on the basis of L = 10 PE samples simulated at each assimilation instant. It is clear that the PEF-RIC behaves better than the PEF, especially as assimilation progresses. The price to pay here (compared to the PEF) is that one needs, in addition, to integrate the numerical model L times at each assimilation instant k.
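The on-line refresh can be sketched schematically as follows. This is NOT the paper's Riccati-like recursion, whose exact form is not given in the text: it is a hedged stand-in that re-estimates the forecast ECM from the L on-line PE samples and recomputes the gain with the usual formula (4b); the function name and toy sizes are illustrative.

```python
import numpy as np

def update_gain(S, H, R):
    """Schematic on-line (SP2) gain refresh: the forecast ECM is
    re-estimated from the L PE samples generated at the current
    assimilation instant, then the gain is recomputed as in (4b),
    K = M H^T (H M H^T + R)^(-1)."""
    L = S.shape[1]
    M = S @ S.T / L                   # sample forecast ECM from PE samples
    S_inn = H @ M @ H.T + R           # innovation covariance
    return M @ H.T @ np.linalg.inv(S_inn)
```

The extra cost visible here matches the text: producing the L columns of S requires L additional model integrations per assimilation instant.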

Summary
The theory and practical implementation of the ROAF have been presented in this paper. This offers a unified approach to the design of an efficient ROAF with low computational cost and modest computer-memory requirements, developed to overcome the difficulties encountered in filtering problems with very high dimensionality of the system state, uncertainties in the system description, non-linearities and so on. The purpose of this paper is not only to give a comprehensive understanding of the basic ideas behind this theory, but also to provide potential practitioners with a guide for the implementation of the ROAF. The ROAF design essentially consists of: 1) choice of the filter gain; 2) application of a rather simple, but quite powerful, PE sampling procedure for generating the PE samples which participate in the construction of the projection subspace and/or the initialization of the filter gain; 3) parametrization of the filter gain and optimization of the filter performance by implementing the efficient and low-cost SPSA algorithm, which evaluates the gradient of the objective function with only two integrations of the numerical model.
We mention that, compared with traditional optimization methods, the SPSA has proved to be more efficient when working with nonlinear systems and/or when more and more observations are assimilated [11]. Due to the space limits of this paper, we have not presented here another important class of filters, based on a Markovian representation for the PE of the system output [18]. This class of filters is proposed by observing that when the designed filter is non-optimal (which is the case in very high dimensional applications), the innovation sequence is no longer white.
For adaptation purposes, a stabilizing gain structure depending linearly on the tuning parameters is of particular interest. In the structure used for the MICOM experiment, the velocity correction is computed from the SSH correction using the geostrophy hypothesis; this gain structure can also be obtained by assuming that the covariance M has the separable vertical-horizontal structure (12). By considering P_oi z(k) instead of z(k), the observation operator can be regarded as a unit matrix of dimension p x p, where p is the number of all horizontal grid points. Here N_z is the number of thickness layers in the model and alpha_{l,m} is a scalar representing the covariance of the PE between layers l and m. In the CHF, the elements alpha_{l,m} follow from several physical constraints (conservation of potential vorticity, no motion at the bottom layer and so on); in the PEF, the alpha_{l,m} are estimated from DPE patterns, obtained by applying the SP1 subject to L = 1.

Figure 3. Gain coefficients in the PEF as functions of the iteration number during application of SP1.
Figure 4. Variances of SSH innovation in CHF and PEF.
Figure 5. Evolution of a gain component during assimilation.
Figure 6. Sample cost functions in PEF and APEF.
Figure 7.
Figure 8. PEF with the gain updated during assimilation by the Riccati-like equation.

.
an uncorrelated random process and one natural way to improve the filter performance is lude a procedure for whitening the sequence   k This task can be done efficiently by assuming   k  to be a solution of a Markov equation, along with tunin nown parameters to minimize the variance of th system output.