Seismic Data Quality Control and Interpolation Using Principal Component Analysis ()
1. Introduction
It is necessary to evaluate the quality of observed seismic records, and to detect and interpolate bad traces or shots [1] [2] , when the acquisition environment is challenging or when components of the sampling or recording system malfunction. Seismic trace/shot interpolation is also critically required in multi-channel seismic data processing when the acquired data are coarser than the required spatial sampling as discussed in [3] [4] . However, many of the existing algorithms for data interpolation face challenges when dealing with seismic data having bad trace/shot data. For example, the local-slant-stack methods [5] are some of the oldest interpolation methods specifically developed for seismic data interpolation. To make these methods work effectively, algorithms must detect coherent events along with a set of dips in traces, and stack the data along the dip angle having the best semblance to create new reflectance. These methods have the very desirable feature of not requiring evenly spaced traces, as required by many other methods. This means they can be used to interpolate aliased data and can handle irregularly recorded seismic data on complex landscapes, such as in mountainous areas. However, these methods encounter challenges in trying to identify complex reflection events and tend to have very large computing power requirements [3] .
A significant advancement in seismic data interpolation came with the appearance of the f-x method [6] and similar methods [7] [8] . The idea behind these methods is to use the low-frequency components of the data to predict the high-frequency components by assuming that reflection events are linear in the temporal-spatial domain and that these events are aliased and unwrapped in the frequency domain. Clearly, the limitations of the f-x and related methods come from their linear events assumptions such that they are not suitable for curved events and practical band-limited seismic recording (aliased and unwrapped assumption). There are some extended versions of the f-x method, such as f-x-y method [6] [9] [10] , which works on areal (x-y) data in contrast to profile (x) data. However, they have the similar working principal and linear events assumptions. Reference [11] proposes a minimum weighted norm method for curved seismic events but the method fails for aliased data.
The f-k method as discussed in [7] [12] estimates traces at half the trace interval by matching the f-k transforms of the odd traces to the transforms of the even traces, in contrast to the f-x method that works in frequency-offset (f-x) domain. Just like the f-x method, the f-k method can only interpolate uniformly recorded linear events.
The 2 dimensional (2D) prediction error filter (PEF) as discussed in [13] works in temporal-spatial domain for predicting traces at half the trace interval of the seismic data. Reference [3] comments that the PEF method works in a manner that could be considered similar to f-x and f-k methods, although PEF works entirely in the time-spatial domain and has the limitations of being suitable only for linear events and infinite bandwidth records. Similar to the PEF method are the half-step [14] and multistep and autoregressively adapted prediction filter methods [8] [15] [16] for seismic interpolation. The multistep method further uses autoregression, or recursive least square solution, and adds a predefined exponential attenuation coefficient in the high-frequency components of prediction equations to relax the strict requirement of linear events assumption.
Reference [3] reviews, compares, and discusses some of these methods with their assumptions such as the requirement for linear events, infinite bandwidth records, and even special spatial sampling schemes. The linear assumption remains unresolved in the method of Fourier transformation for unevenly sampled records [17] [18] [19] .
The method given by [20] predicts high frequency components from observed low frequency signals by solving the band-limited signal problem in the wavelet domain. Similarly, References [21] [22] use curvelet and local radon transform to solve this prediction problem. However, the prediction of high frequency components from low frequency components is challenging without a solid physical foundation.
References [23] [24] [25] [26] [27] integrate migration and interpolation procedures in wave equation-based imaging methods, such as reverse-time depth and Kirchhoff migration, for seismic interpolation in the Fourier or spatial domains. These methods have the advantage of noise attenuation while interpolating seismic data. However, the migration method itself requires adequately interpolated seismic traces, and the issue of how to balance interpolation and migration processes remains a challenge.
Reference [28] presents a nonlinear correlation technique to fill gaps in seismic surveys by replacing noisy traces using the correlation of adjacent traces. Unfortunately, it also assumes linear events (dip and amplitude) during the construction of synthetic (interpolated) traces.
Recently, References [29] [30] show promising results using artificial intelligence (AI), such as deep learning and convolution neural network, for seismic denoising and corrupted trace interpolation. However, it is difficult to understand why they succeed or fail because the coefficients of the trained network are not suitable for analysis.
Reference [31] released an inverse spatial PCA method for denoising and interpolating irregularly observed airborne magnetic survey data, with excellent results for separation of local anomalies during interpolation.
In this paper, we present the novel method, termed Eigenspace Seismic (E-S), for seismic data interpolation and quality evaluation, using PCA, without assuming linear seismic events, infinite bandwidth, or regular spatial sampling. Currently, there is no published example showing this method and the applications demonstrated in our study. Though many examples exist for seismic data processing using PCA [31] - [36] , they are not applied to trace or shot interpolation and quality control. For example, Reference [34] focuses on denoising by partial reconstruction of the seismic data image (array) using selected principal components (PCs) for vertical seismic profile data processing. Reference [36] use a small fraction of PCs from PCA results of a stacking collection to replace this collection then feed this into the stacking procedure to get better stacking results. To enhance the computing efficiency, Reference [36] also presents a fast PCA (FPCA) algorithm for practical seismic application. The 3 dimensional (3D) seismic data reconstruction [32] [34] is more similar to the pioneering work made by [35] on using PCA for partial seismic data reconstruction, but they focused on fine-tuning of PCs and singular spectrum analysis (SSA) for energy structure, respectively. Reference [33] presents an abstract mentioning seismic trace interpolation using PCA but do not apply it to shot interpolation or seismic data quality control.
In the E-S method, we treat a seismic trace as a trace vector and arrange the trace vectors of any type of trace gather (e.g. common shot, common receiver, common mid-point, common offset, etc.) a trace matrix with its traces as rows, without losing any generality. For a shot gather, traces of a shot are concatenated to be a shot vector, after interpolation, we decipher the shot vector in the same way with the concatenation procedure. We fulfill the PCA of trace or shot matrix among their rows using the singular value decomposition (SVD) method to preserve energy and have high calculation performance.
In the following sections, we will first introduce the method and its data processing procedures (Section 2), and then test its performance using both a synthetic example and a trace and shot gather of offshore seismic data (Section 3). We discuss the working principal, possible improvement, and conclusions in Section 4.
2. The E-S Method
The E-S method handles a whole trace, or shot, as an inseparable entity. A trace gather spans a vector feature space that originates from its correlation structure and has spatial metrics among these entities. However, this raw feature space is sparse and we can transform it into an eigenspace with linear properties using PCA. In the eigenspace, we can reconstruct an entity, such as a trace at any observation point, or construct a virtual entity, such as a virtual trace at an unknown location, using 1D interpolation of their coordinate trajectories in the eigenspace. Comparing the reconstructed entities with the observed ones enables us to evaluate seismic quality, and the construction of virtual entities is used to fill gaps for seismic trace/shot interpolation.
2.1. Constructing the Eigenspace
To simplify the descriptions, we will start from the E-S method using a common mid-point gather. Without losing any generality, we assume the seismic profile direction is x and acknowledge its m recorded traces with the trace sampling length (n). If we assume a trace at the line coordinate i (x = i) as a vector
(i = 0, m − 1), then we can express this trace gather as an m * n matrix S:
(1)
where T means transpose of a matrix. All traces or rows of matrix S will span a vector-space [37] and all non-zero normalized principal components (PCs) of these traces (S) are created using PCA [38] . The PCs consist of an orthogonal base of the spanned eigenspace because of their unit length and mutually orthogonal properties. However, a better and more efficient way to construct the eigenspace for eigenvector estimation and analysis of its energy structure is to decompose the matrix S using the SVD method [39] as expressed in Equation (2), because the size of the seismic matrix S is often large
(2)
where S, U, ∑, and V are the seismic trace matrix, left eigenvector matrix, middle diagonal singular value matrix, and right eigenvector matrix with dimensions m * n, m * r, r * r, and r * n (r is the rank of matrix S), respectively. The columns of
are the bases of the constructed eigenspace and S has been mapped into the eigenspace having the rows of U as its loading scores or coordinates in the eigenspace, and the diagonal entries of ∑ as its spectrum reflecting its energy contribution. Note that rows and columns of U and
are unit (length) and mutually orthogonal, we can write any trace si as a row of S as:
(3)
where
,
,
, ×, and
are ith row of S, ith entry of ∑, ith row of U, outer product, and ith column of
, respectively.
2.2. Entity Interpolation in Eigenspace
Rows of matrix U are the coordinates of the entities, such as PCs of traces, in the eigenspace. According to Equation (3), seismic traces can be reconstructed in the eigenspace using
as its base and U as its “loading scores” (coordinates in the eigenspace) and its spectral density ∑ when the trace number i (i = 0, m − 1) or row of S is limited to be an integer. If we relax i to be a real number x (distance along the seismic profile), such as 0.5 which means it locates in the middle between the first and second trace, then we can deduce the following equation from Equation (3):
(4)
We define the constructed sx as a virtual trace at position x. During this procedure,
, the base of eigenspace, and ∑, the energy distribution on every base of eigenspace, do not change. To calculate
, we estimate
along the coordinate trajectories of
(Equation (3)) using a 1D interpolation method. Practically, we use the simple linear, bilinear or cubic spline 1D interpolation method to estimate
(
) from U. Currently, we do not use the E-S method for extrapolation outside the range of traces.
2.3. Bad Entity Detection
We can revise the E-S method for detection and restoration of bad traces if we compare any observed trace with its reconstructed trace from any other observations using the E-S method. We use the residual mean square (rms) to evaluate the difference between the observed and reconstructed trace:
(5)
where
is the reconstructed trace at current trace location i (x = i) from the trace gather not including the ith trace.
Therefore, we start to detect bad traces from looping through all, except the first and last, traces in a trace gather by interpolating a new trace at the current looping trace location from the whole trace gather excluding the current one. Then we calculate and plot the rms of the observed and interpolated virtual ones. By examining the rms chart, we mark traces with rms bigger than our expected rms value, estimated from visual inspection of the rms chart, as bad traces and we replace them with the interpolated ones. To process noisy records with many bad traces, we repeat this procedure iteratively and pick only one trace as a bad trace in each iteration.
3. Numerical Tests
We test the E-S method with a simulated trace gather by convolving a synthetic reflectance coefficient profile, representing an unconformity and rift basin horizons, with a predefined Ricker wavelet that has frequency characteristics commonly used in offshore seismic surveys. We also test the E-S method on a stacked trace gather from an offshore seismic survey that has horizons with steep dip, curved, onlap, offlap, and chaotic reflection of basin floor fans.
3.1. Synthetic Reflection Interpolation
To simulate a synthetic seismic profile, we draw a simulated stratigraphic profile with horizontal and curved interfaces and assign a constant reflectance to these horizons. Then we convolve this reflection trace gather with a predefined Ricker wavelet having 0.064 s duration, 100 Hz center frequency, and 2 ms sampling interval. We display the final simulated profile of 0.4 s and 100 traces in Figure 1(a) and its sparse traces (by extracting one from every two but keep the first and last traces) in Figure 1(b).
From Figure 1(b), we interpolate the traces at every unknown trace position and display the results in Figure 1(c). Figure 1(d) is the residual between the original (Figure 1(a)) and interpolated one (Figure 1(c)). By visual inspection,
Figure 1. Interpolation of synthetic reflection section. (a) models seismic profile; (b) half traces of seismic models; (c) E-S results; (d) E-S residual.
we find the residual is close to zero and the E-S method has performed very well in the interpolation
3.2. Trace Interpolation
Next, we test the E-S interpolation method using a section of offshore seismic survey from the east coast of Canada, with 4 ms sampling interval, and show the results in Figures 2-8.
Figures 2-4 display the input traces (a), interpolation results (b), residuals (c), and rms (d) for interpolation using every second, third and fourth trace, respectively. Table 1 lists the mean errors and relative rms of trace interpolation using half of the original traces as shown Figure 2(d). The relative rms for others (Figure 3(d), Figure 4(d)) are all less than 30.0% of the standard deviation of the signal.
Figure 5 is the spectral density of input traces for singular spectrum analysis (SSA). It can outline the energy distribution structure of input traces in the eigenspace. From Figure 5, 80.9% of total energy accumulates on the first 22
Figure 2. Trace interpolation using 1/2 of the original traces. (a) 1/2 Traces input; (b) E-S interpolated; (c) residual traces; (d) rms.
Figure 3. Trace interpolation using 1/3 of the original traces. (a) 1/3 Traces input; (b) E-S interpolated; (c) residual traces; (d) rms.
Figure 4. Trace interpolation using 1/4 of the original traces. (a) 1/4 Traces input; (b) E-S interpolated; (c) residual traces; (d) rms.
Figure 5. Singular spectrum of a trace gather. Solid line is the singular values and the dotted line is the accumulated energy (%) of the test trace data.
Figure 6. Examples of coordinates of known traces and their 1D interpolation in the eigenspace. The dots (U) are known coordinates of the input traces in the eigenspace and solid lines are interpolated coordinates from these dots. The right column zooms to first twenty dots of corresponding coordinates of their left counterparts.
Figure 7. Eigen-traces of the test trace data set.
Figure 8. Five times expansion of the input traces by interpolation. (151 * 5 − 5 + 1) 751 traces are interpolated from the test 151 traces where every trace is 5 times interpolated between and including itself and next trace except the last trace where no extrapolation is made to avoid edge effects. (a) Input traces; (b) E-S interpolated traces.
Table 1. Calculated residual mean and rms values between original and interpolated traces using the E-S method.
eigenbases, 90.3% on the first 31, and 99.1% for the first 60. The stronger the correlation that exists among traces, the faster the energy for every eigenbase (thin solid line in Figure 5) will attenuate, and simultaneously the faster the accumulated energy will increase (dashed line in Figure 5). For white noise, the energy for every eigenbase is flat and there is no interpolation method that works under this circumstance.
We plot the spatial loading scores (and 1D interpolation trajectories) and eigenbase in Figure 6 and Figure 7 to help to constrain the E-S method. In Figure 6, we display the coordinates (rows of U) with solid dots and its 1D interpolation with solid lines. We find these coordinates are quite smooth and suitable to be interpolated using simple 1D spline interpolation methods. We also show their local details in the right column of Figure 6 where the local smooth features are clear.
Figure 7 displays the eigenbase of the eigenspace. Note that they are unit length and they are orthogonal (mutually not correlate) with some patterns that may be useful in eigenspace base understanding.
In Figure 8, we use E-S interpolation to produce 5 times the initial number of traces to illustrate the resolution power in identifying basin floor fans, onlap, offlap, and other structures of a rifted basin. In the input traces (Figure 8(a)), these sedimentary features of the rift basin are fuzzy and discontinuous. However, the rift basin structure is very clear in the interpolated section (Figure 8(b)), showing how the E-S method can be helpful in interpreting complex stratigraphic structures.
3.3. Bad Trace Detection
To test the performance of the E-S method for bad trace detection, we add normal distribution noise
, as shown in Figure 9(a), to some randomly selected traces, such as [3369, 3389, 3409, 3429, 3449, 3469, 3479] in a trace gather (collection), using Equation (6), and we display the trace gather with simulated bad traces in Figure 9(c).
(6)
Figure 9. Bad trace detection. From top to bottom, we order the inlet charts as: (a), (b), (c), and (d). The number of trace is the CMD trace number. (a) a trace collection of stacked and migrated 2-D marine seismic data from the east coast area of Canada used for testing the E-S method for bad trace detection and interpolation; (b) rms between current trace and the interpolated trace interpolated at current location from traces other than itself. The rms from the raw input trace collection and from the input trace collection with simulated noise are displayed with dots and solid line, respectively; (c) 70% normal distribution noise is added into [3369, 3389, 3409, 3429, 3449, 3469, 3479] traces of the test trace dataset; (d) interpolated trace collection by removing the noise polluted traces (with the simulated noise).
where s and
are the original seismic trace, and the seismic trace with added noise. The normal noise n has mean
and standard deviation
. We estimate
and
from the same trace where we add noise and 2.333 is a constant to control the signal to noise ratio.
We interpolate traces at any trace position, except the first and last, using the trace gather with added noises excluding current trace. Then we calculate the rms for each interpolated trace and display it as shown in Figure 9(b). From Figure 9(b), we can easily pick out bad traces and use the interpolated ones to fill corresponding traces to get Figure 9(d) for further processing or interpretation without bad traces.
3.4. Bad Shots Detection and Interpolation
We use 120 shots from a seismic survey offshore east coast of Canada to test the performance of the E-S method for bad shot detection and shot interpolation. Every shot has 16 traces (receivers) and each trace has 1751 samples with 2 ms recording rate. The procedure is similar to the above example, in that noise was added to some shots, interpolation was used to produce a virtual shot using other shots at every shot location, and rms was calculated and plotted to detect bad shots. We display the rms chart example of these operations in Figure 10. The peaks of the rms chart as shown in Figure 10 correspond very well to the shot numbers where we added noise. We omit the display of shot gather for it is too large.
Figure 10. Bad shots detection. Normal distribution noises are added into randomly selected shots, [6224, 6244, 6263, 6284, 6304], of the test shot gather using equation 5. The rms between original and interpolated shot using E-S method is used to pick out bad shot at their peaks, similar to the bad trace detection.
4. Discussion and Conclusions
In this paper, we propose a novel method termed Eigenspace Seismic (E-S) method to improve seismic data quality and restore bad recordings, using PCA to interpolate in eigenspace.
We demonstrate the effectiveness of the E-S method for trace interpolation by processing synthetic and offshore seismic survey data to restore original data from decimated data sets with 1/2, 1/3, and 1/4 of the original traces. We also show its possible usage to clarify stratigraphic structures such as onlapping units and basin floor fans, in the interpolated trace gather by augmenting the number of original traces by a factor of 5 using the E-S method. The method works best for sparsely recorded or densely recorded seismic data with many bad components.
By analyzing the features of the E-S method using SSA, we can evaluate its energy distribution, features of eigenbases and the smoothness of loading scores (or coordinates in the eigenspace) trajectories that are helpful in the success of E-S method. The major advantage of the E-S method comes from the fact that it handles a whole trace or shot as an inseparable entity and transforms the 2D interpolation into a 1D problem in the eigenspace.
The computational cost of the E-S method is not an issue for trace interpolation and trace quality control, but it does become an issue for shot-gather interpolation and quality control. The state-of-art fast PCA method [36] could be used to seamlessly speed up the E-S method.
The E-S method does not assume the signal is stationary so that it may be effective for nonstationary noises commonly associated with marine data acquisition, such as shark strikes, swell noise, or nonlaminar flow around cable instrumentation. Theoretically, the E-S method is based on coherency of a processed trace so that it will be efficient to process seismic data having coherent signals. The effectiveness of the E-S method depends on the distribution of these noises on the PCs. There are some methods specifically designed to solve the energy concentration problems, such as factor analysis [38] , that have possibility to address these requirements. These methods show great potential to improve the E-S method but their development is out of the range of this study.
Currently, we use simple 1D spline interpolation methods in the spatial loading score (U) interpolation in the E-S method. In the future, it should be possible to use other advanced 1D interpolation methods for the U estimation to get better E-S results. There are many potential applications of the E-S method to seismic data or other datasets, such as the construction of a 3D velocity volume from irregularly distributed 3D velocity observations [40] .
Acknowledgements
Natural Resources Canada through the UNCLOS Program of the Geological Survey of Canada, Lands and Minerals Sector of Natural Resources Canada supported this work. Natural Resources Canada through the Public Safety Geoscience Program partially supported this work. The authors wish to thank Mary-Lynn Dickson at the Geological Survey of Canada (Atlantic) for helpful discussions, Patrick Potter for preparing the offshore marine seismic reflection data and for constructive discussion, and John Shimeld for his critical internal review and suggestions. We also thank the Editors and two anonymous reviewers of the journal for their helpful suggestions for the improvement of the manuscript.