Cone Bearing Estimation Utilizing a Hybrid HMM and IFM Smoother Filter Formulation

Cone penetration testing (CPT) is a widely used geotechnical engineering in-situ test for mapping soil profiles and assessing soil properties. In CPT, a cone on the end of a series of rods is pushed into the ground at a constant rate and resistance to the cone tip is measured (qm). The qm values are utilized to characterize the soil profile. Unfortunately, the measured cone tip resistance is blurred and/or averaged which can result in the distortion of the soil profile characterization and the inability to identify thin layers. This paper outlines a novel and highly effective algorithm for obtaining cone bearing estimates qt from averaged or smoothed qm measurements. This qt optimal filter estimation technique is referred to as the qtHMM-IFM algorithm and it implements a hybrid hidden Markov model and iterative forward modelling technique. The mathematical details of the qtHMM-IFM algorithm are outlined in this paper along with the results from challenging test bed. The test bed simulations have demonstrated that the qtHMM-IFM algorithm can derive accurate qt values from challenging averaged qm profiles. This allows for greater soil resolution and the identification and quantification of thin layers in a soil profile.


Introduction
Cone Penetration Test (CPT) [1] [2] [3] [4] is extensively used in geotechnical engineering to determine the in-situ subsurface stratigraphy and to estimate geotechnical parameters of the soils present. Geotechnical engineers use CPT to characterize and quantify soil properties and groundwater conditions so that the infrastructure (e.g., bridges, roads, buildings) construction requirements can be determined. In CPT a cone penetration test rig pushes the steel cone vertically into the ground at a standard rate and data are recorded at regular intervals during penetration. The cone penetrometer has electronic sensors to measure penetration resistance at the tip (q m ) and friction in the shaft (friction sleeve) during penetration. A CPT probe equipped with a pore-water pressure sensor is called a piezo-cones (CPTU cones). CPT penetrometers with other sensors such as a seismic sensor are also used for in-situ site characterization. Figure 1 illustrates a schematic and the associated terminology of a cone penetrometer.
For piezo-cones with the filter element right behind the cone tip (i.e., the u2 position), it is standard practice to correct the recorded tip resistance for the impact of the pore pressure on the back of the cone tip. This corrected cone tip resistance is normally referred to as q t , but in this paper, we focus on an additional correction that should be made to address the averaging that takes place when performing CPT to obtain the actual cone tip resistance values.
Boulanger and DeJong [5] outlined the distortions which occur when obtaining q m measurements and proposed an "inverse" algorithm where the results of the distortion could be optimally removed. In their work Boulanger and DeJong incorrectly described the distortions as a convolution operation (Equations (1), (2), (10), (12), (13), and (15)). In fact, the tip-bearing distortions are an averaging process [2] [5] where cone tip values measured at a particular depth are affected by values above and below the depth of interest. This averaging or smoothing results in the inability to identify thin layers which is critical for liquefaction assessment and the reduction in soil layer resolution. The averaging/smoothing effect is subsequently described along with a proposed algorithm which combines the Bayesian recursive estimation Hidden Markov Model (HMM) filter with Iterative Forward Modelling (IFM) parameter estimation in a smoother formulation for optimal estimation of true q t cone bearing values.

Cone Penetration Testing Model
When performing CPT the layers above and below the cone tip affect the measured tip resistance as illustrated in Figure 2.
The measured cone penetration tip resistance q m can then be described as where d: the cone depth; d c : the cone tip diameter; Δ: the q t sampling rate; q m (d): the measured cone penetration tip resistance; q t (d): the true cone penetration tip resistance; w c (d): the q t (d) averaging function; v(d): additive noise, generally taken to be white with a Gaussian pdf. In Equation (1) it is assumed that w c averages q t over 60 cone diameters centered at the cone tip. Boulanger and DeJong [5] outline how to calculate w c using the equations shown below (after correcting the equation for w 1 ).
The cone penetration averaging function w c for varying , ratios is illustrated in Figure 3. As outlined out by Boulanger and DeJong [5], w c is highly nonlinear and depth variant. This proves to be a significant challenge in obtaining the optimal estimates of q t but this can be addressed by applying a BRE filter which incorporates smoothing.

Bayesian Recursive Estimation
Bayesian Recursive Estimation (BRE) is a filtering technique based on state-space, time-domain formulations of physical problems [6] [7]. Application of this filter type requires that the dynamics of the system and measurement model, which relates the noisy measurements to the system state equations, be describable in a mathematical representation and probabilistic form that uniquely define the system behaviour. The potentially nonlinear discrete stochastic equation describing the system dynamics is defined as follows: In Equation (3), the vector f k is a function of the state vector x k and the process or system noise u k . It is assumed that Equation In Equation (4), h k depends upon the index k, the state x k , and the measurement noise v k at each sampling time. The probabilistic state-space formulation described by Equation (3) and the requirement for updating the state vector estimate based upon the newly available measurements described by Equation (4) are ideally suited for the Bayesian approach to derive the optimal estimation. In this approach it is attempted to construct the posterior estimate of the state given all available measurements. In general terms, it is desired to obtain estimates of the discretized system equation states x k , based on all available measurements up to time k (denoted as z 1:k ) by constructing the posterior p(x k |z 1:k ). The posterior Probability Density Function (PDF) then allows the calculation of the conditional mean estimate of the state (E(x k |z 1:k )).
BRE is a two step process consisting of prediction and update. In the prediction step the system equation defined by Equation (3) is used to obtain the prior PDF of the state at time k using the Chapman-Kolmogorov equation, which is given as The update step then computes the posterior PDF from the predicted PDF and the newly available measurement as follows: The recurrence Equations (5) and (6) form the basis for the optimal Bayesian solution. The BRE of the posterior density can generate an exact solution when the state-space equations fit into a Kalman Filter (KF) formulation or a Hidden Markov Model (HMM). Otherwise, BRE will generate an estimation numerically using Particle Filters (PF) when deriving the posterior PDF.

Hidden Markov Model (HMM) Filter
The HMM filter (also termed a grid-based filter) has a discrete state-space representation and has a finite number of states. In the HMM filter the posterior PDF is represented by the delta function approximation as follows: , represent the fixed discrete states and associated conditional probabilities, respectively, at time index k − 1, and N s the number of particles utilized. The governing equations for the HMM filter are derived by substituting Equation (7) into Equations (5) and (6). This substitution results in the HMM prediction and update equations which are outlined in Table 1.
Obtain optimal minimum variance estimate of the state vector and corresponding error covariance.
In the above equations it is required that the likelihood pdf ( ) based upon iteratively adjusting the parameters until a user specified cost function is minimized. The desired parameter estimates are defined as those which minimize the user specified cost function. The IFM technique which is utilized within the q t estimation algorithm is the downhill simplex method (DSM) originally developed by Nelder and Mead [8]. The DSM in multidimensions has the important property of not requiring derivatives of function evaluations and it can minimize nonlinear-functions of more than one independent variable. Although it is not the most efficient optimization procedure, the DSM is versatile, robust and simple to implement. A simplex defines the most elementary geometric figure of a given dimension: a line in one dimension, the triangle in two dimensions, the tetrahedron in three, etc; therefore, in an N-dimensional space, the simplex is a geometric figure that consists of N + 1 fully interconnected vertices. For example, in determining the location of a seismic event, a threedimensional space is searched, so the simplex is a tetrahedron with four vertices. The DSM has been used in a variety of scientific applications such as obtaining seismic source locations [9] [10]) tomographic imaging [9], and blind seismic deconvolution [11].
The DSM starts at N + 1 vertices that form the initial simplex. The initial simplex vertices are chosen so that the simplex occupies a good portion of the solution space. In addition, it is also required that a scalar cost function be speci- space by variously reflecting, expanding, contracting, or shrinking. The simplex size is continuously changed and mostly diminished, so that finally it is small enough to contain the minimum with the desired accuracy. The DSM incorporates the following basic steps: 1) Specify initial simplex vertices.
2) Specify the cost function at each vertex of the simplex.
3) Compare the cost function for each vertex and determine the lowest error "best" and highest error "worst" vertices. 4) Sequentially locating first the reflected, then if necessary, the expanded, and then if necessary, the contracted vertices, and calculating for each the corresponding cost function and comparing it to the worst vertex; if at any step the cost function of the new trial point is less than the value at the worst vertex; then this vertex is substituted as a vertex in place of the current worst vertex. 5) If the process in step 4 does not yield a lower error value than the previous worst, then the other vertices are shrunken towards the best vertex. 6) At each stage of shrinking, the distances between vertices are calculated and compared to a set tolerance value to check if the simplex has become sufficiently small for termination of the estimation; when the test criterion is reached, the previous best vertex becomes the solution. 7) At each stage of shrinking, the cost function values at the vertices is compared to a set minimum value to check if the error residual has become sufficiently small for termination of the estimation; when the test criterion is reached, the previous best vertex becomes the solution.

qtHMM-IFM Algorithm
The q t optimal filter estimation technique is referred to as the q t HMM-IFM algorithm and it consist of a BRE smoother and an IFM component.

qtHMM Algorithm Formulation
The HMM portion of the q t HMM-IFM algorithm (so called q t HMM algorithm) implements a BRE smoothing filter. BRE smoothing is a non-real time filter that uses all measurements available to estimate the state of a system at a certain time or depth in the q t estimation case [6] [7]. The q t HMM algorithm smoother consists of two parts: the forward and the backward formulation. The forward-depth filter (ˆF k q ) processes measurement data (q m ) above the cone tip ( (1)). Next a backward-depth formulation (ˆB k q ) is implemented, where the filter recurses through the data below the cone tip ( 30 (1)) starting at the final q m value. The optimal estimate for q t is then defined as

E. Baziw, G. Verbeek International Journal of Geosciences
Since the structure of the backward-depth q t HMM filter is similar to that of the forward-depth q t HMM filter only the forward-depth formulation is outlined. In this case a HMM filter is utilized where a bank of discrete q t values (i = 1 to N) varying from low (q tL ) to high (q tH ) (e.g., 0 MPa to 120 MPa) and a corresponding q t resolution q tR (e.g., 0.2 MPa) are specified. The required number of fixed grid HMM states is given as N S = (q tH − q tL )/q tR . In Table 1 the notation of the states x i is mapped to q i to reflect the bank of q t values. The measurement equation given by Equation (1) is modified as outlined below for the forward-depth case: The transitional probabilities (i.e.,

( )
HMM state (i.e., discrete cone tip, q ,i ) is set equal due to the fact that there is equal probability of moving from a current cone tip value to any other value between the range q tL to q tH . The likelihood PDF ( ) Table 1 is calculated based upon an assumed Gaussian measurement error as follows: where σ 2 is the variance of the measurement noise. The HMM forward-depth estimated q t HMM cone tip values (ˆF k q ) are calculated as follows:

qtHMM Test Bed Example
The previously outlined q t HMM algorithm was subjected to extensive test bed simulations. Unfortunately, it was concluded that this formulation would not work due to the significant challenges in estimating the unknown q t values below the cone tip (e.g., 54 unknown q t values for the case of a 10 cm 2 cone tip (d c = 0.0357 m) and a sampling rate of 0.02 m (∆ = 0.02) for the forward-depth q t HMM Formulation. The same obviously applied for the backward-depth q t HMM Formulation, but in that case the 54 unknown q t values are above the cone tip. It should be noted, however, that with this q t HMM Formulation it was nearly possible to duplicate the results of Boulanger and DeJong shown in Figure 4. In this case, the forward-depth formulation was utilized and the maximum q t value was not allowed to exceed 100. This is illustrated below for the case outlined in Figure 4(b) and H sand /d c = 20. Figure 5 illustrates values of q m and the forward-depth q t estimates for varying maximum q t values specified for the case outlined in Figure 4(b) and H sand /d c = 20. As is shown in Figure 5  . Values of q t , q m and q inv (estimated q t value) for idealize profiles with interlayers of a strong soil with q t = 100 embedded in a weaker soil with: (a) q t = 1, (b) q t = 10, and (c) q t = 50. Note that q inv almost perfectly overlays q t [5]. Figure 5. Values of q m (black series) and q t HMM feed-forward q t (red series) estimates for varying q t maximum values specified for the case outlined in Figure 4  q t value and not a demonstration of algorithm performance. It should be noted that the output is shown in Figure 5(d) is similar to the results of Boulanger and DeJong illustrated in Figure 6. This figure illustrates the significant instability in the estimates of q t when Boulanger and DeJong used their inversion estimation algorithm for a situation where soil layers with a q t value of 12 were embedded in a uniform deposit with a q t value of 10. This instability caused Boulanger and DeJong to incorporate an ad-hoc smoothing filter followed by a low-pass spatial filter into their algorithm.

Incorporation of IFM into the qtHMM Algorithm
IFM is incorporated into the q t HMM algorithm to address the poor test bed results. In this case, initial estimates for q t are derived utilizing IFM. Instead of attempting to estimate all the unknown q t values (below the cone depth for the forward-depth analysis, and above the cone for the backward-depth analysis) IFM is utilized where only a fraction of the q t values are required to be estimated. In this process constant layer q t values and their corresponding depth extents are estimated for a maximum number of layers (specified by the user) within the next w c window.

qtHMM-IFM Test Bed Example
The performance of the q t HMM-IFM algorithm was evaluated by carrying out challenging test bed simulations. This section outlines two of these challenging test bed simulations. The first test bed simulation of which is illustrated in Figure 8. A soil profile was defined through q t values (light grey line in Figure 8(a)) and the resulting q m values were then calculated (black line in Figure 8(a)). Using the q t HMM-IFM algorithm the q t values were then estimated based on the q m values (black dotted line in Figure 8(a)). It shall be obvious that the algorithm performed well as the derived q t values closely matched the originally specified q t values (with the percentage difference shown by the black line in Figure   8(b)). Interesting to note in this simulation is that the layering identified by the black oval (i.e., variation in q t between 84 and 96) has been lost, and that when focusing on q m values thin soil layers are completely overlooked. In addition inserting these thin layers also results in large differences between the measured and the true tip resistance values for the entire interval where these thin layers occur, as shown by the black dotted line in Figure 8(b).
A second test bed simulation of the performance of the q t HMM-IFM algorithm is shown in Figure 9. In Figure 9 a soil profile was defined through q t values (grey line in Figure 9(a)) and the resulting q m values were then calculated   Figure 9(a)). Using the q t HMM-IFM algorithm the q t values were then estimated based on the q m values (black dotted line in Figure 9(a)). It shall be obvious that the algorithm performed well as the derived q t values closely matched the originally specified q t values (with the percentage difference shown by the black line in Figure 9(b)). Similar to test bed 1, the layering identified by the black oval has been lost and that when focusing on q m values thin soil layers are completely overlooked. The light grey ovals in Figure 9(a) clearly illustrate that by applying q m values the actual tip resistance values are masked and blurred. In addition inserting these thin layers also results in large differences between the measured and the true tip resistance values for the entire interval where these thin layers occur, as shown by the back dotted line in Figure 9(b).

Conclusions
Cone penetrometer testing (CPT) is an effective, fast and relatively inexpensive system for determining the in-situ subsurface stratigraphy and estimating geotechnical parameters of the soils present. When performing CPT the layers above and below the cone tip affect the measured tip resistance. The extent of this issue can be significant and is dependent upon variable in-situ soil properties (i.e., it is site specific). As a result, a specific soil (with a specific q t value) can generate significantly different tip resistance readings (q m value) based upon the properties of the bounding soils, especially in soil profiles with thin soil layers.
For this reason, it is recommended that an algorithm is implemented to generate the actual q t value from recorded q m values.
This paper has outlined an algorithm which utilizes a hybrid HMM and IFM filter for the purpose of obtaining CPT true cone tip bearing values from measured blurred measured values. The q t estimation algorithm is referred to as q t HMM-IFM. Challenging test bed simulations have demonstrated that the q t HMM-IFM algorithm can derive accurate q t values from a q m profile. This allows for the identification and quantification of thin layers in a soil profile. The authors will carry out further test bed simulations and subsequently apply the q t HMM-IFM algorithm on real data sets.