Elastic Full Waveform Inversion Based on the Trust Region Strategy

In this paper, we investigate the elastic wave full-waveform inversion (FWI) based on the trust region method. The FWI is an optimization problem of minimizing the misfit between the observed data and simulated data. Usually, the line search method is used to update the model parameters iteratively. The line search method generates a search direction first and then finds a suitable step length along the direction. In the trust region method, it defines a trial step length within a certain neighborhood of the current iterate point and then solves a trust region subproblem. The theoretical methods for the trust region FWI with the Newton type method are described. The algorithms for the truncated Newton method with the line search strategy and for the Gauss-Newton method with the trust region strategy are presented. Numerical computations of FWI for the Marmousi model by the L-BFGS method, the Gauss-Newton method and the truncated Newton method are completed. The comparisons between the line search strategy and the trust region strategy are given and show that the trust region method is more efficient than the line search method and both the Gauss-Newton and truncated Newton methods are more accurate than the L-BFGS method.


Introduction
The physical properties of the earth can be inverted quantitatively by the reflected seismic waves. In modern seismology, traveltime inversion is the main olution for reconstructing media parameters.
In FWI, the Fréchet derivative is required to estimate explicitly before proceeding to the inversion of the linearized system [3]. However, the computational cost of calculating Fréchet or gradient is too large. So some other methods have been taken into account. In [1] [2], the authors proposed to compute the gradient by solving an adjoint problem, which is considered as the initial of timedomain FWI. Since then, the FWI has been an important research topic in exploration geophysics. In the 1990s, the time-domain FWI was extended to the frequency domain [4] [5]. The frequency-domain FWI has the advantage of high computational efficiency and flexibility of data selection [6].
The FWI is actually an optimization iterative process to solve a nonlinear optimization problem. Generally, the computational cost of global optimization methods is extensive large. For this reason, local optimization algorithms are adopted. To avoid iteration falling to an unreasonable local minimum point, the multiscale method was proposed [7] in 1995. The multiscale method performs the inversion from the low frequency to high-frequency information step by step. It includes the long-wavelength information of the model and thus it enhances the robustness of inversion effectively. It can overcome the problem of local minima caused by the poor initial model selection. This technique has been successfully applied to the time-domain acoustic FWI, see e.g. [8] [9]. In order to reconstruct the long wavelength of the macroscopic velocity model, the Laplace-domain FWI is proposed [10]. A hybrid domain method combining the Laplace-domain method and the frequency-domain method together is also developed [11]. This method can be regarded as a special case of frequency-domain damping inversion.
In elastic FWI, there are three different parameters, i.e., density and two Lamé parameters. In [12], the authors pointed out that different parameters have coupled effects on the seismic response which is called the trade-off effect or crosstalk. In order to tame the trade-off effect, several hierarchical inversion strategies are developed. In [13], Lamé parameters are inverted with fixed density first and then velocities are updated. In [14], all parameters are inverted simultaneously. However, the overall steplength is estimated by calculating an optimal steplength for every parameter individually.
The Hessian matrix plays an important role in FWI. It contains the information of multiple scattering wavefields and the trade-off effects among different parameters. The off-diagonal blocks of Hessian reflect the relationship between different parameters [12]. The Newton-type method such as the quasi-Newton method, the Gauss-Newton method [15] and the truncated Newton method [16] all contain the Hessian information and so they behave better than the steepest descent method in accuracy. In the calculation of the gradient vector, the adjoint method is usually used [17]. The adjoint method can be extended to the truncated Newton method [18]. For the elastic wave equations, the wave propagator operator is symmetric. The back propagator operator and the forward propaga-American Journal of Computational Mathematics tor operator are identical. Besides that, the Gauss-Newton approximation of Hessian matrix is positive and definite. This property can be used in the trust region strategy. Our investigations in this paper show that the trust region method can accelerate the convergence rate and improve the inversion accuracy.
In this paper, we investigate the application of the trust region strategy in FWI. The rest of our paper is organized as follows. In Section 2, the theoretical methods are described in detail. In Section 3, numerical computations and comparisons are presented. Finally, in Section 4, the conclusion is given.

Theory
In this section, we will introduce the forward problem first. Then we will describe FWI based on the trust region method. In order to solve the subproblem in the trust region algorithm, the two dimensional subspace method is described.

Forward Method
The source-free time-domain two dimensional (2D) elastic wave equations can be written as [19]  , where ρ is the density, λ and µ are the Lamé parameters, ( ) where x v and y v are the particle velocity components in the x and y directions respectively, xx σ , yy σ , xy σ are the stress tensor components. Here we have added the source, i.e., the body force x f and y f vector to x v and y v components respectively. We rewrite the system (2.3) in the following matrix form The symbol † denotes the adjoint operator. Note that the determinant of the matrix C is positive. So from (2.4) and (2.5), we have The compact form of the forward problem is is the model parameters, w v σ . In (2.7), the wave propagator ( ) A m is symmetric. Given the media parameters ρ , λ and µ , the forward problem is to solve system (2.6) numerically.
We have many numerical methods to solve system (2.6), for example, the finite volume method [20] and the staggered-grid method [21] and so on. In this paper, we use the staggered-grid method for its high computational efficiency and relatively easy code implementation. Since the computational domain is finite, the absorbing boundary conditions [22] are required to absorb the boundary reflections. In this paper, we adopt the perfectly matched layer (PML) method [23] [24]. The forward discrete schemes for solving system (2.3) are given in Appendix A.

Full Waveform Inversion
The FWI is a data-fitting procedure to find the model parameter m by minimizing the data misfit δ d between the simulated data cal w and the observed data obs w . The objective function can be formulated as where T is the total recording duration, s N and r N are the number of the sources and receivers respectively. Once the media parameters ρ , λ , and µ are inverted, we can yield the P-wave velocity and S-wave velocity by Neglecting the summation in (2.8) and introducing the restriction operator R at receiver points, then the gradient of the objective function with respect to the model parameter m is where † is the adjoint operator. Furthermore, the Hessian matrix can be obtained by taking derivative of (2.11) with respect to the model parameter The expression of Hessian H is divided into two parts, i.e., 1 H and 2 H . The first part 1 H is the Gauss-Newton approximation of Hessian [15], which just contains the 1st-order derivative of the wavefield. The second part

Gradient Calculation
The gradient of the objective function can be obtained by the Lagrangian formulation [17] [18] [28] or the perturbation theory [29]. Following the perturbation theory [29], the gradient of the objective function with respect to model parameters can be rewritten as where : We solve the forward problem based on (2.3) instead of (2.1)-(2.2) by the staggered-grid method. So it is necessary to express the gradient (2.13)-(2.15) with the particle velocity and the stress tensor. The results are the following σ , yy σ and xy σ are the solutions of corresponding adjoint problem. The derivation is given in Appendix B.

Trust Region Method
There are two strategies to solve a minimization problem. One is the line search method and the other is the trust region method. In order to describe the trustregion method more clearly and directly, we consider minimizing an abstract Then the line search strategy is The line search chooses a direction k p first and then finds the step length k α along this direction. The direction may be the steepest descent direction, the Newton direction and so on. After fixing the search direction, the search length Usually, the inexact line search methods such as the Wolfe method and the interpolation method are used [26] since solving (2.20) exactly is too expensive. A popular inexact method is the Wolfe method [30]. In this paper, we use the Wolfe method.
In the trust region method, information gathered from objective function ( ) f x is used to construct a model function k f  whose behavior near the current point k x is similar to that of the objective function k f . Because k f  may not be a good approximation of f when x is far from k x , we restrict the search for a minimizer of k f  to some region around k x . Namely, we find the step p by approximately solving the following subproblem where k ∆ is the radius. This is the trust region subproblem and we solve it by the two dimensional subspace method [26]. Note that the two dimensional subspace method requires the Hessian is symmetric and positive. In a sense, the line search method and the trust region method differ in that the line search method starts by fixing the direction k p and then searches the step length k α whereas in trust region the direction k p and the search step k α are determined together subject to the trust radius k ∆ . Theoretically, the trust region is easier to reach quadric convergence than the line search method. More theoretical details can be found in the references (e.g., [31] [32] [33]).
One key step in the trust region method is the choice of the trust region radius k ∆ at each iteration. The choice is based on the agreement between the objective function f and the model function k f  at previous iteration. Given a step k p , we define the ratio k ρ as where the numerator is the actual reduction, and the denominator is the predicted reduction. If k p reaches the boundary of the trust region or the ratio k ρ is satisfactory enough, the radius is increased by a constant time: If the ratio k ρ is too small, which means that the model function k f  is not a good approximation to the objective function f under the radius, then we reduce the trust region radius: In this paper, we choose 1 2 α = and 2 1 4 α = .

FWI Algorithms
In this section, we present the algorithm of the truncated Newton method with the line search strategy and the algorithm of the Gauss-Newton method with the trust-region strategy.

Numerical Computations
In this section, numerical computations are presented to illustrate the advantages of the trust region method. We also present the inversion results by the L-BFGS method for comparisons. We choose the L-BFGS method to make comparison as it is the most popular quasi-Newton method.
We consider FWI for the Marmousi model which is usually applied to test the ability of an imaging or inversion algorithm. The exact Marmousi model is shown in Figure 1. Figures 1(a) where For the elastic wave FWI, the density is more sensitive than the velocities.
Some hierarchical inversion strategies developed by [13] and [14] can be applied.
In our FWI computations, we don't apply these strategies. We invert the three media parameters simultaneously. However, a multiscale technique by implementing the inversion from low frequency to high frequency is still used. Moreover, the data of the next high frequency band include the data of the previous low frequency band which guarantees the robustness of inversion. In our computations, four groups of frequencies are used to complete inversion step by step, i.e., 2 Hz, 5 Hz, 10 Hz and 20 Hz. For every stage, the stopping criterion is where ( )       In Table 1, we present the iteration number at every stage (i.e., 2 Hz, 5 Hz, 10 Hz and 20 Hz) for the L-BFGS method, the Gauss-Newton (GN) method and the truncated Newton (TN) method by the line search (LS) strategy or the two dimensional subspace (Sub) strategy. From Table 1, we can see that the Gauss-Newton method has obvious advantage over other methods from point of total iteration number.
In Table 2, we compare the computational efficiency for different methods.
The total computational time and the time of each iteration are listed in Table 2.
The total iteration number is also listed for convenience. As we can see, the Gauss-Newton method with the two dimensional subspace method costs least computational time for each iteration.
In Figure 7, we present a profile comparison of the inversion results at    Table 3, we present 2 l errors and relative 2 l errors for the different methods. The relative error ε is defined by

Conclusions
In FWI, the line search strategy is usually applied. In this paper, we have investigated the application of the trust-region strategy to elastic wave multi-parameter inversion. The theoretical methods and algorithms are described in detail. The trust-region subproblem is solved by the two-dimensional subspace method. Numerical
From the relation of strain and stress [19], we have 2 ,