On the Generalization of Seismic Tomography Algorithms

The seismic tomography problem often leads to underdetermined and inconsistent system of equations. Solving these problems, care must be taken to keep the propagation of data errors under control. Especially, the nonGaussian nature of the noise distribution (for example outliers in the data sets) can cause appreciable distortions in the tomographic imaging. In order to reduce the sensitivity to outlier, some generalized tomography algorithms are proposed in the paper. The weighted version of the Conjugate Gradient method is combined with the Iteratively Reweighted Least Squares (IRLS) procedure leading to a robust tomography method (W-CGRAD). The generalized version of the SIRT method is introduced in which the (Cauchy-Steiner) weighted average of the ART corrections is used. The proposed tomography algorithms are tested for a small sized tomography example by using synthetic traveltime data. It is proved that—compared to their traditional versions—the outlier sensitivities of the generalized tomography methods are sufficiently reduced.


Introduction
The classic geophysical tomography problem is solved to reconstruct the velocity distribution for the investigated portion of the Earth such that the projected data (the traveltimes) should agree with measurements.In practice, this is done by solving a large, sparse, least-squares problem.Traditionally, least squares problems in tomography have been solved by row action methods such as Algebraic Reconstruction Technique (ART) or Simultaneous Iterative Reconstruction Technique (SIRT).It was proved by [1] that the Conjugate Gradient (CG) method can be used even in large-scale tomographic least squares inversion with taking the advantage of the sparsity of the matrix of the problem.This low cost CG algorithm of Scales was applied in solving the double trace (DT) tomography problem [2].
It is well-known that least squares methods (LSQ) give optimal results only when the data noises follow a Gaussian distribution, or in other words the least squares solution is very sensitive to sparsely distributed large errors, i.e. outliers in the data set.The practice shows that the distribution of the errors in the measured data is seldom Gaussian so that the use of the least squares method cannot be optimal.
There are various ways to address the question of the statistical robustness.One of the most frequently used methods in robust optimization is the Least Absolute Deviation (LAD) method.In this case, L1 norm is used to characterize the misfit between the observed and predicted data.The inversion with minimization of the L1 norm gives more reliable estimates when a small number of large errors contaminate the data.Another possibility is to use the Cauchy criterion.In this case, the Cauchy distribution of data noise is assumed.Cauchy inversion is also frequently used in the geophysical inversion as a robust optimization method [3].However, there is a problem with Cauchy inversion: the scale parameter of the weights should be a priori known.An improved algorithm was developed by Steiner (1988) in the framework of the method of Most Frequent Values (MFV) [4] in which the scale parameters are automatically derived from the data set.
As it is well-known, various tools of image processing are applicable in seismic tomography.It was proved by [5] that smoothing the tomograms (at the end of the reconstruction) by the use of the so-called alpha-trimmed mean, the distortions caused by the outliers can appreciably be reduced.In this paper we report on generalization of two items, in that we apply Cauchy-Steiner weights (Cauchy weights with scale parameters automatically determined by using Steiner's MFV method) in the original Conjugate Gradient tomography algorithm in order to minimize the weighted norm of the deviation vector.The generalization of the Simultaneous Iterative Reconstruction Technique (SIRT) is proposed and in the data space Cauchy-Steiner weights are used to calculate the weighted average of the ART corrections (W-SIRT).

The IRLS Technique Using Cauchy-Steiner Weights
The realization of robust inversion can be achieved in various ways.To handle the Cauchy inversion problem the Gradient Method was used by [3].To solve the least-absolute-deviation problem, the method of Iteratively Reweighted Least Squares (IRLS) was proposed by [6].Combining the two methods the IRLS algorithm can be applied using Cauchy weights, defined as , where 2 σ is the scale parameter and k r is k-th residual.The scale parameter 2 σ of the Cauchy distribution can- not be given a priori because the data residuals change from iteration to iteration.A condition for the lower bound of the scale parameter was derived by [3].
In the framework of Steiner's (MFV) method [4] the scale parameter 2 σ can be determined in an internal iteration.In the (j+1)-th step of this procedure the 2 1 j ε + (Steiner's scale factor) can be calculated in the know- ledge of 2 j ε as ( ) where in the 0-th step the 0 ε starting value is given as It can be seen that the above procedure derives the scale parameter from the data set (deviation between measured and calculated data).The stop criterion can be easily defined by experiences (for example by fixed number of iterations).After this the Cauchy-Steiner weights can be calculated by inserting the Steiner's scale parameter (given in the last step of the above internal iterations) into the Cauchy formula (1) which gives the form The same weights were used in [7] for the joint inversion of in-mine measured geoelectric and seismic data and recently in [8] for creating robust inversion based Fourier transformation.
In order to minimize the weighted norm ( ) the method of Iteratively Reweighted Least Squares is used.In the framework of this algorithm a 0-th order solution is derived by using the (non-weighted) LSQ method and the weights are calculated as where the ( ) traveltimes are calculated on the slowness field given by solving the LSQ problem.In the first iteration, the misfit function is minimized resulting in the linear set of normal equations of the weighted Least Squares method, where the Here D is the distance matrix with the kj D elements giving the length of the ray section in the j-th cell be- longing to the k-th ray, s  is the slowness, t  is the traveltime.This procedure is repeated giving the typical j-th iteration step ( ) ( ) with the ( ) .
(In these steps the normal equation is linear, because the weights are always calculated in a previous step.Here we note that each step of these iterations contain an internal loop for the determination of the Steiner's scale parameter.)This iteration is repeated until a proper stop criterion is met.
In order to solve the normal equations of the type [1] developed a tomographically very efficient variant of the Conjugate Gradient method.In order to solve the normal equations of the weighted least squares method, it should be modified as follows: Let o x  be an initial estimate of the slowness and compute the vectors and start the iteration where 0,1, 2, k =  refers to the iteration number.This procedure differs from the ordinary Conjugate Gradient algorithm only at two points beside the transpose of the D matrix, the W weight matrix appears also.
For the numerical experiments a rectangular test area of the size 100 × 100 cells was defined (Figure 1).In the model three anomalies of the velocity 5 km/s (red color) is located in a homogeneous background of 4 km/s velocity (blue color).Sources and receivers were positioned along the x and y axis in an arrangement fulfilling the requirement of full tomographic ray coverage, so the theoretical traveltime data were computed along 60,000 ray traces.
Two data sets were generated for the tests.In the data set I. random noises up to 1% of theoretical traveltimesfollowing the Gauss distribution-were added to the data.The second data set-containing outliers-(data set II.) was created from the first one so that 20% extra noise was added to a randomly selected 20% portion of the data.In our numerical tests 10 iterations were applied in CG reconstructions.
In order to characterize the accuracy of the reconstruction the (relative) model distance was used.Here j x and ( ) x denotes the slowness in the j-th cell of the reconstructed picture and the model, respectively, M is the number of cells.Similarly the relative data distance were also calculated Using the first data set, the ordinary CG algorithm was applied for reconstruction results the velocity distribution shown on Figure 2. We got reliable results with acceptable relative distances both in data space and in model space.
The weighted version of the conjugated gradients (W-CGRAD) gives similar result, as it is shown in Figure 3.The CG result in Figure 2 is slightly better, because this method solves the Gaussian least squares problem and the data set contains Gaussian noise.
In order to test the outlier-sensitivity of the ordinary (non-weighted) CG method let us now use non-Gaussian data (data set II.).The reconstructed velocity distribution is given by Figure 4.As it can be seen, the picture is highly distorted and very noisy demonstrating the fact, that the ordinary Conjugate Gradient method (giving a least squares solution) is very sensitive for the outlier.This is an obvious consequence of the fact that the ordinary CGRAD algorithm of [1] is a variant of the Least Squares method which gives optimal result only in the case of Gaussian data noises.
Note, that the relative distances are very large, which is in agreement with the fact, that the anomalies are almost unrealizable.
Let us now test the newly proposed weighted conjugate gradient algorithm (W-CGRAD) using the Cauchy-Steiner weights.Figure 5 shows the reconstructed picture where the weighted version of the CG algorithm was applied to process data set II. (containing the outliers).It can be seen, that the influence of outliers is highly reduced due to the weighting.Note, that the relative model distance (0.103) is approximately seven times smaller, than in the case of Figure 4, where the original (non-weighted) CG algorithm was used.This result shows that the weighting with Cauchy-Steiner weights is very efficient in reducing the influence of outliers.

A Cauchy-Steiner Weighted SIRT Method
Simultaneous Iterative Reconstruction Technique is one of the most frequently used methods in seismic tomography.In the typical step of the algorithm the arithmetic mean of the so-called ART correction belonging to the Q seismic rays crossing the j-th cell is calculated as If instead of this simple average, a weighted average is used  a new version of the SIRT algorithm can be defined.Using Cauchy-Steiner weights a robust SIRT method can be expected.
As it is well-known, the SIRT method is one of the best procedures in tomography when the noise distribution follows Gaussian statistics.This can also be proved by using data set I. in a SIRT reconstruction.The result is shown in Figure 6.It can be seen that the compared to CG, the SIRT method gives nearly two times better model distance.
The weighted version of the SIRT (W-SIRT) gives similar result, as it is shown in Figure 7.The SIRT result in Figure 6 is slightly better, but there is only negligible difference in the model distance.
Similarly to the CG procedure the traditional SIRT method produce relatively distorted picture in case of data sets containing outliers.This is demonstrated in Figure 8, where the SIRT reconstruction by using data set II. is shown.Though this picture is better than the CG result given in Figure 4 (the model distance is reduced by a factor of two), the reconstruction is not acceptable.
On the other hand, using the weighted version of the SIRT method, we obtained the velocity distribution shown by Figure 9.It can be seen, that the weighted SIRT algorithm with Cauchy-Steiner weights (W-SIRT) is highly resistant to outlier data.Comparing the model distances of Figure 6 and Figure 8 we can see that the introduction of Cauchy-Steiner weights results in sufficient rejection of the influence of outliers.Note it that the modification required by the weighting can easily be implemented into an ordinary SIRT algorithm and-due to the weighting-the computation time is increased by only a very small amount.So, the W-SIRT algorithm is computationally economic and noise resistant.

Conclusion
In order to make the Conjugate Gradient and the SIRT tomography methods more robust a new weight factor, the Cauchy-Steiner weights were introduced by calculating the scale parameter of the Cauchy weight using Steiner's method of the Most Frequent Values.Minimizing the (Cauchy-Steiner) weighted norm of deviations the Conjugate Gradient algorithm (W-CGRAD) was applied in the framework of the Iteratively Reweighted Least Squares method.Compared to the Gaussian Least Squares method, it was demonstrated in small sized to mographic experiment, that the new (W-CGRAD) algorithm is more robust and resistant for outliers.The SIRT method was also modified by using Cauchy-Steiner weights.Instead of arithmetic mean of the ART corrections, the W-SIRT algorithm uses weighted average in calculating the new slowness field.It was also proved thatcompared to its original version-the W-SIRT method is less sensitive for outlier data.Using the robust tomography methods proposed in the paper, reduced outlier sensitivity can be achieved resulting in better tomographic reconstruction.

Figure 1 .
Figure 1.The model used for numerical tests.