A New Method to Select Training Images in Multi-Point Geostatistics

Training images are an important modeling parameter in multi-point geostatistics and directly determine the quality of the resulting model. Candidate training images therefore need to be evaluated and selected before multi-point geostatistical modeling. The overall repetition probability alone is not sufficient to describe how single data events relate to the training image. Based on this observation, this paper presents a new method for selecting training images. The basic idea is to use the repetition probability distribution of single data events to characterize the type and stationarity of the sedimentary patterns in a training image. The mean and deviation of the single data event repetition probability reflect the stationarity of the geological patterns in the training image, and the data event mismatch rate reflects the diversity of those patterns. The optimal training image is selected by combining the single data event repetition probability with the overall repetition probability. Simulation tests show that a good training image has high compatibility in terms of overall repetition probability, a stable single data event repetition probability distribution, a low probability mean, a low probability deviation and a low mismatch rate. The method can select training images quickly and provides a basic guarantee for multi-point geostatistical simulation.


Introduction
Multi-point geostatistics was proposed by Guardiano and Srivastava in 1993 [1] to overcome the limitations of two-point statistics, whose insufficient use of spatial information made it difficult to reproduce the shape of the simulated target faithfully. A quantitative training image is established and scanned with a multi-point template, and the frequencies obtained from the scan are used to characterize the probability of occurrence of different data events. The objective of multi-point geostatistics is to recreate the geological patterns contained in the training images, so training images can be considered one of the key factors that determine the effect of simulation [2]-[9]. In recent years, in order to obtain effective training images, scholars have proposed different methods, including the target-based method [10] [11] [12], the method based on the deposition process [5] [13] [14], the method based on the process of imitation deposition [15] [16], and the method based on geological data transformation [17] [18].
At present, there are so many ways of creating training images that a large number of different training images can be produced for a given research area. Because each training image embodies a particular geological understanding, the modeler must decide, before multi-point modeling, how to select the one or more training images most suitable for the research area from multiple (groups of) candidates that differ in source, creation method, spatial structure and credibility. Yet optimal selection methods for training images are very limited. They include the method based on the variogram, the method based on conditional probability [3] [6] [19], and the method based on similarity distance [20].
The variogram-based selection method can effectively capture the two-point geostatistical information contained in the data volume, but it is limited by the two-point nature of the variogram. It can only compare second-order spatial structure and cannot analyze or compare higher-order geostatistical features. Ortiz and Deutsch first proposed a way to rank training images using higher-order geostatistical information [19]. In their method, data events composed of several grid points along a single well are extracted, and the training images are scanned to obtain the distribution of these condition data events in each training image. The training images are then ranked by comparing multiple distribution features. Boisvert further proposed a training image optimization method based on data event distribution and multi-point density equations [3]. Example tests showed that both methods can rank training images effectively.
However, these two methods can only analyze and compare one-dimensional data extracted from a single well and cannot capture higher-order geostatistical information in three-dimensional space. Pérez then proposed a training image optimization method based on the repetition probability statistics of three-dimensional data events [6]. In this method, a spiral search extracts condition data events from the condition data, the candidate training images are searched for training data events with the same spatial structure, and the number of repetitions in each training image is counted. The repetition counts of each condition data event are normalized, and the normalized values are averaged over all condition data events to obtain the compatibility between each training image and the condition data events. However, this method simplifies the calculation of data event dissimilarity during search and matching, and assigns the same weight to every point in the data event. In addition, it cannot reveal exactly how well the training image matches the condition data, and it cannot differentiate and analyze a large number of training data events. There is also no direct relationship between the overall compatibility of a training image with the data events and its compatibility with individual data events. Therefore, this method still cannot provide the absolute matching between the individual data events in the condition data and the training images.
Building on an analysis of Pérez's method, this paper considers the problem that, in some cases, the repetition of a single pattern in a training image can produce a high overall compatibility, because no direct relation exists between the overall compatibility of a training image with the data events and its compatibility with single data events. A new index is therefore proposed: the statistical characteristic parameters of single data event repetition. These two ideas were combined to sort and optimize the training images.

Method Based on Overall Repetition Probability
Pérez (2014) proposed to select training images by counting the repetitions of all data events and computing a relative compatibility and an absolute compatibility.
The relative compatibility is obtained by normalizing the repetition number of each data event across the training images, which gives the repetition probability $P_{i,j}$ of the i-th data event in the j-th training image,
$$P_{i,j} = \frac{R_{i,j}}{\sum_{k=1}^{t} R_{i,k}},$$
where $R_{i,j}$ is the repetition number of the i-th data event in the j-th training image and t is the number of candidate training images. The average repetition probability over the n data events is the relative compatibility $C_j$,
$$C_j = \frac{1}{n} \sum_{i=1}^{n} P_{i,j}.$$
The absolute compatibility records whether each data event occurs at all in the training image. If the i-th data event appears in the j-th training image, $Y_{i,j}$ is 1; otherwise $Y_{i,j}$ is 0. The proportion of data events contained in the training image is the absolute compatibility $M_j$,
$$M_j = \frac{1}{n} \sum_{i=1}^{n} Y_{i,j}.$$
A single data event repetition probability analysis, used together with the absolute compatibility and the relative compatibility, is proposed to make up for the shortcoming that the overall repetition probability does not reflect the distribution of individual data events within the training image.
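The relative and absolute compatibility described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program: the function name `compatibilities`, the toy matrix `R`, and the choice to let a data event with no repetition in any training image contribute zero to the relative compatibility are all assumptions made for the example.

```python
def compatibilities(R):
    """Return (relative, absolute) compatibility per training image.

    R[i][j] is the repetition number of data event i in training image j.
    """
    n = len(R)        # number of data events
    t = len(R[0])     # number of candidate training images
    rel = [0.0] * t   # running sums of P[i][j]
    abs_ = [0.0] * t  # running sums of Y[i][j]
    for row in R:
        total = sum(row)
        for j, r in enumerate(row):
            # Assumption: an event found in no image adds 0 to every image.
            if total > 0:
                rel[j] += r / total
            if r > 0:
                abs_[j] += 1
    return [c / n for c in rel], [m / n for m in abs_]


# Hypothetical counts: 4 data events, 3 candidate training images.
R = [[4, 0, 1],
     [2, 2, 0],
     [0, 0, 0],
     [3, 1, 1]]
C, M = compatibilities(R)
print("relative compatibility:", C)
print("absolute compatibility:", M)
```

In this toy case the first image has the highest value on both measures, yet the second and third share the same absolute compatibility while differing in relative compatibility, which shows that the two measures carry different information.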

Statistical Characteristic of Single Data Event Repetition
The single data event repetition probability is designed to reflect the distribution characteristics of data events within a training image. The repetition rate of a single data event is its share of the repetitions of all data events in a training image, that is,
$$PT_{i,j} = \frac{R_{i,j}}{\sum_{k=1}^{n} R_{k,j}}.$$
Data events with $PT_{i,j}$ equal to 0 have no matching event in the training image. A data event with no match in the training image is marked 1, otherwise 0, and the mismatch rate is then calculated,
$$UNR_{i,j} = \begin{cases} 1, & PT_{i,j} = 0 \\ 0, & PT_{i,j} > 0 \end{cases}, \qquad UNP_j = \frac{1}{n} \sum_{i=1}^{n} UNR_{i,j},$$
where $UNR_{i,j}$ is the mismatch indicator of the i-th data event and $UNP_j$ is the mismatch rate. When the statistical distribution of the single data event repetition probability $PT_{i,j}$ is established, data events without a match are excluded; the remaining effective repetition probabilities are binned by interval, and the mean and deviation of the distribution are calculated. Training images with a lower data event mismatch rate, a more even single data event repetition probability distribution, and a smaller single data event repetition probability average and deviation are closer to the real geological features.
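The single data event statistics for one training image can be sketched as follows. The text does not specify the interval scheme or the exact deviation formula, so this example assumes equal-width bins between 0 and the largest effective probability and a population standard deviation; the function name `single_event_stats` and the sample counts are hypothetical.

```python
import math


def single_event_stats(counts, n_bins=5):
    """Statistics of single data event repetition for one training image.

    counts[i] is the repetition number of data event i in that image.
    """
    total = sum(counts)
    pt = [c / total if total else 0.0 for c in counts]
    unr = [1 if p == 0 else 0 for p in pt]   # mismatch indicator per event
    unp = sum(unr) / len(counts)             # mismatch rate
    effective = [p for p in pt if p > 0]     # unmatched events are excluded
    if not effective:
        return {"unp": unp, "mean": None, "dev": None, "hist": []}
    mean = sum(effective) / len(effective)
    # Assumption: "deviation" is taken as the population standard deviation.
    dev = math.sqrt(sum((p - mean) ** 2 for p in effective) / len(effective))
    # Assumption: equal-width intervals on [0, max effective probability].
    top = max(effective)
    hist = [0] * n_bins
    for p in effective:
        k = min(int(p / top * n_bins), n_bins - 1)
        hist[k] += 1
    share = [h / len(effective) for h in hist]
    return {"unp": unp, "mean": mean, "dev": dev, "hist": share}


stats = single_event_stats([6, 3, 0, 1, 0, 2])
print(stats)
```

Because the effective probabilities sum to 1, their mean equals one over the number of matched events, so under these assumptions a lower mean corresponds to a larger number of distinct data events being matched.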

Process of the Method
The method combining the overall repetition probability with the single data event statistics was implemented as a program. The program counts the number of repetitions $R_{i,j}$ that exactly match each condition data event and calculates the normalized probability $P_{i,j}$ and the single-event repetition probability $PT_{i,j}$. From the normalized probability, the relative compatibility and absolute compatibility of the whole training image are calculated. From the single-event repetition probability, the distribution proportion, the distribution mean and the distribution deviation are calculated. The specific steps are as follows (Figure 4):
1) Determine the search template, rank the template positions by weight, and determine the pseudo-random path used to find data events according to the distribution of the condition data.
2) Scan the training images for patterns that match the data event. Each time the condition points of the data event find an exact match in the training image, the repetition number $R_{i,j}$ is incremented by 1, until the training image has been fully searched.
3) Jump to the next data event and repeat step 2) until all data events have been searched.
4) Use the normalized probability $P_{i,j}$ and the single-event probability $PT_{i,j}$ to calculate the relative compatibility $C_j$, the absolute compatibility $M_j$, the single data event repetition probability average and deviation, and the data event mismatch rate $UNP_j$.
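The scanning and counting steps above can be illustrated with a small two-dimensional sketch. It assumes that a data event is stored as a list of (offset, facies value) pairs relative to an anchor cell and that a match requires every point to fall inside the grid and agree exactly; the names `count_repetitions` and `repetition_matrix` and the toy images are hypothetical and do not come from the authors' program.

```python
def count_repetitions(ti, event):
    """Count exact matches of one data event in one 2D training image.

    ti: list of rows of facies codes.
    event: list of ((dy, dx), value) pairs relative to an anchor cell.
    """
    rows, cols = len(ti), len(ti[0])
    count = 0
    for y in range(rows):
        for x in range(cols):
            # Every point must lie inside the grid and carry the same value.
            if all(0 <= y + dy < rows and 0 <= x + dx < cols
                   and ti[y + dy][x + dx] == v
                   for (dy, dx), v in event):
                count += 1
    return count


def repetition_matrix(tis, events):
    """R[i][j]: repetitions of data event i in training image j."""
    return [[count_repetitions(ti, ev) for ti in tis] for ev in events]


# Two toy training images: vertical stripes and a single horizontal band.
ti_a = [[0, 1, 0, 1],
        [0, 1, 0, 1],
        [0, 1, 0, 1]]
ti_b = [[0, 0, 0, 0],
        [1, 1, 1, 1],
        [0, 0, 0, 0]]
vertical_pair = [((0, 0), 1), ((1, 0), 1)]
horizontal_pair = [((0, 0), 1), ((0, 1), 1)]
R = repetition_matrix([ti_a, ti_b], [vertical_pair, horizontal_pair])
print(R)
```

The resulting matrix is the input from which the normalized probability and the single-event repetition probability are derived. A brute-force scan like this one is only practical for small grids.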

Two-Dimensional Test
The two-dimensional test used the training images published by Pérez (2014), with a grid size of 100 × 100 × 1 (Figure 5). The maximum search range in the test was set to 31 × 31 × 1, and the upper limit on the number of condition points was set to 35. The absolute compatibility and the relative compatibility were calculated from the repetition counts obtained when searching with 5, 10, 15, 20, 25, 30 and 35 condition points within the search range (Figure 6). As the number of condition points increased, the relative compatibility of the training images close to the original geological model tended to increase, and their absolute compatibility was higher than that of the other training images. For the data events with 15 condition points, the single data event repetition probability distribution, average and deviation and the data event mismatch rate were calculated (Figure 7). Better training images show a more stable repetition probability distribution and a lower repetition probability average, deviation and mismatch rate.

Three-Dimensional Test
On a three-dimensional test grid of 60 × 60 × 10 cells, three fluvial facies (river phase) models of different dimensions (Table 1), TI4, TI5 and TI6, were built together with 900 corresponding condition data points, and three training images T1, T2 and T3 were selected (Figure 9). For each of the three sets of condition data, the test tried to find the appropriate training image. For multi-point modeling, the maximum number of conditional points is 35. The grid cell size is 20 × 20 × 4 meters. The width of T1 is the largest, the thickness of T3 is the smallest, and the thickness of T2 is the largest while its width is moderate.
distribution with the overall repetition probability. Therefore, when relative compatibility and absolute compatibility cannot single out a clearly better training image, training images with similar parameters can be ranked using the single-event repetition probability.
Multiple simulations were performed based on the three training images and the three sets of condition data (Figure 12). The differences between the three fluvial facies (river phase) models in terms of width and thickness were acceptable from the point of view of multi-point simulation. However, the training image selected as optimal gave the best result.

Conclusions
The training image is equivalent to a geological pattern library for multi-point simulation, in which data events embody the geological model. The quality of a training image depends on how well it matches the patterns in the condition data. Analyzing data events is an effective way to evaluate training images.
The overall repetition probability of data events evaluates the overall pattern of a training image through relative compatibility and absolute compatibility, which reflect how well the geological patterns in the training image as a whole match the condition data. Training images with higher relative and absolute compatibility are generally rated better. However, the overall measure gives no assurance about any single data event: an individual, heavily repeated data event has an additive effect on the overall repetition probability, so that training images that are not faithful to the condition data may also be selected. The single data event repetition probability makes up for this deficiency of the overall repetition probability in describing single data events and evaluates the stability of the distribution of individual data events.
In steady reservoir modeling, the training image selected by this method can
How to cite this paper: Wang, L.X., Yin, Y.S. and Feng, W.J. (2018) A New Method to Select Training Images in Multi-Point Geostatistics. Open Journal of Yangtze Gas and Oil, 3, 112-129. https://doi.org/10.4236/ojogas.2018.32010
The overall repetition probability and the single data event statistics were combined to sort and optimize the training images. The synthetic theoretical model showed that the new method achieves better sorting and optimization of training images. The research provides a new method for optimizing the training image, the core parameter of multi-point geostatistical modeling. It helps multi-point geological modeling better serve reservoir model building and lays a foundation for enhanced oil recovery. An accurate training image improves the effect of modeling and brings the multi-point model closer to the actual reservoir situation [9] [17] [21] [22] [23].
To characterize the data events within a given training image, the method uses the conditional probability as the evaluation data and selects a suitable search range and number of conditional points, weighting the grid points within the search range. It then finds the number of occurrences of each pattern in the training images and records the number of repetitions for each pattern. That is, for the t candidate training images, the set CE of n data events is obtained by scanning the condition data with the specified template, and the number of occurrences of the i-th data event $CE_i$ in the j-th training image is denoted $R_{i,j}$. The distribution statistics of data events in each training image are then calculated so as to select a better training image. These statistics are the single data event repetition probability distribution, average and deviation, and the data event mismatch rate. The distribution, average and deviation reflect the stability of data events in the training image, and the mismatch rate highlights the diversity of training image patterns.
A smaller single data event repetition probability average and deviation indicate a training image closer to the real geological features. To examine the training images that performed poorly above, the probabilistic characteristics of single data events were analyzed statistically with five conditional points (Figure 3). Figure 3 clearly shows that the single data event repetition probability deviation and the data event mismatch rate are markedly lower. Used together with the overall repetition probability indicators, the single data event indicators can more directly filter out the training images that are in line with the actual geological features.

Figure 3. Statistical characteristics of the single data event repetition probability. (a) Repetition probability deviation; (b) repetition probability average; (c) repetition probability distribution; (d) mismatch rate.
A method combining the overall repetition probability with the single data event statistical index is proposed to select the optimal training image. The work area with known condition data is gridded and a random search path is established. At the same time, the positions within the template search range are sorted by weight. For any node location, the search matches the condition data event exactly, proceeding sequentially from the nearest condition point to the farthest. Each time a perfect match is found, the number of repetitions for this pattern is incremented, until all data points in the data model have been searched across the training image; the search returns the number of repetitions $R_{i,j}$.
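The nearest-to-farthest ordering of condition points can be sketched as below. The weighting scheme is not given in the text, so this example assumes that weight decreases with Euclidean distance from the node and that a condition point located at the node itself is included in the data event; `build_data_event` and the sample points are hypothetical.

```python
import math


def build_data_event(node, cond, half_width, max_points):
    """Collect the condition points around a node, nearest first.

    node: (y, x) grid location being visited.
    cond: dict mapping (y, x) to the facies value of each condition point.
    half_width: half the template size (15 for a 31 x 31 search range).
    max_points: upper limit on the number of condition points kept.
    """
    ny, nx = node
    neighbours = []
    for (y, x), value in cond.items():
        dy, dx = y - ny, x - nx
        if abs(dy) <= half_width and abs(dx) <= half_width:
            # Assumption: ranking by distance stands in for the weight ranking.
            neighbours.append((math.hypot(dy, dx), (dy, dx), value))
    neighbours.sort()
    return [(offset, value) for _, offset, value in neighbours[:max_points]]


cond = {(5, 5): 1, (5, 7): 0, (2, 5): 1, (9, 9): 0, (20, 20): 1}
event = build_data_event((5, 5), cond, half_width=4, max_points=3)
print(event)
```

The returned list uses relative offsets, so the same data event can be compared against every location of a training image regardless of where the node lies in the work area.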

Figure 4. The flow chart of training image evaluation.
From the real images TI4, TI5 and TI6, 1091 conditional points were randomly selected to form the condition data sets TIC4, TIC5 and TIC6. Based on these condition data, the candidate training images T1, T2 and T3 were tested and ranked, and the best training image was selected. TIC4: the condition data from TI4; TIC5: the condition data from TI5; TIC6: the condition data from TI6; T1, T2, T3: the training images for MPS. For the condition data TIC4, T1, T2 and T3 are each used as the modeling parameter; the same applies to TIC5 and TIC6.

Figure 5. Training image and condition data.

Figure 6. Statistical characteristics of the overall repetition probability. (a) Absolute compatibility; (b) relative compatibility.

Figure 7. Statistical characteristics of the single data event repetition probability. (a) Repetition probability deviation; (b) repetition probability average; (c) repetition probability distribution; (d) mismatch rate.

Figure 11. Statistical characteristics of the single data event repetition probability. (a) Repetition probability deviation; (b) repetition probability average; (c) repetition probability distribution; (d) mismatch rate.
It is obvious that condition data TIC4 with training image T1, condition data TIC5 with training image T2, and condition data TIC6 with training image T3 produced the best simulation effect. The two-dimensional and three-dimensional tests show that the relative compatibility and the absolute compatibility of the overall repetition probability can rank training images that differ significantly. For training images whose structural features are close to each other, however, the overall repetition probability rates a training image more highly when some data events have a high number of repetitions. The single data event repetition probability instead starts from the distribution of the repetition numbers of single data events and uses the stability of data events, evaluated by the single data event repetition probability average, deviation and mismatch rate, as the selection index. Combined with the overall repetition probability of data events, it allows the training images to be ranked more fully.

Table 1. Original channel size and training image scale.