Rolling Gaussian Process Regression with Application to Regime Shifts

Abstract

Gaussian Process Regression (GPR) can be applied to the problem of estimating a spatially-varying field on a regular grid, based on noisy observations made at irregular positions. In cases where the field has a weak time dependence, one may desire to estimate the present-time value of the field using a time window of data that rolls forward as new data become available, leading to a sequence of solution updates. We introduce "rolling GPR" (or moving-window GPR) and present a procedure for implementing it that is more computationally efficient than solving the full GPR problem at each update. Furthermore, regime shifts (sudden large changes in the field) can be detected by monitoring the change in posterior covariance of the predicted data during the updates, and their detrimental effect is mitigated by shortening the time window as the variance rises, and then re-lengthening it as it falls (but within prior bounds). A set of numerical experiments demonstrates the viability of the procedure.


1. Introduction

Gaussian Process Regression (GPR) is a popular data assimilation method that can be used to reconstruct a field, $m(x_1, x_2, \dots)$, that varies with spatial coordinates, $x_i$ $(1 \le i \le K)$ [1] [2] [3] [4]. Observations of the field at particular positions, together with prior information on its value at these positions and its spatial correlation function, are used to estimate the field at arbitrary positions. GPR is conceptually similar to interpolation, but is not true interpolation, because the reconstructed field does not, in general, reproduce the observations (except for the special case called Kriging [1]). This behavior is advantageous in the case of noisy data, because the reconstruction contains fewer defects caused by data outliers.

In many real-world applications, the field that is being reconstructed also has time, $t$, dependence, that is, $m(t, x_1, x_2, \dots)$. Furthermore, observations often are not made synchronously, but rather in a sporadic and ongoing manner. When the goal is to estimate the current state of the field, older measurements become obsolete and only the most recent observations are relevant. The choice of the length of the observation window, $T$, affects both the resolution of the reconstruction (for resolution improves with the number of data) and the variance of the reconstruction (for the inclusion of new data decreases variance, but the inclusion of obsolete data increases it). The issue of window length is particularly acute when the field is relatively stable, except for undergoing sporadic reorganizations. These so-called regime shifts are common features of fields associated with the Earth's biosphere [5] [6] [7], climate system [8] [9] and geodynamo [10] [11].

We use a formulation of GPR [3] in which the model parameters, $\mathbf{m} = [\mathbf{m}^{(t)}; \mathbf{m}^{(c)}] \in \mathbb{R}^{M+N}$, and corresponding spatial coordinates, $\mathbf{x} = [\mathbf{x}^{(t)}; \mathbf{x}^{(c)}]$ (where the semicolon implies vertical concatenation), are divided into a target, $t$, group of length $M$ and a control, $c$, group of length $N$. Only the control group is observed by data, $\mathbf{d}$. The control points represent observations of the field at irregularly-spaced positions and the target points represent its values on a regularly-spaced grid. A prior estimate of the model parameters, $\hat{\mathbf{m}}$, their covariance, $\mathbf{C}_m$, and the covariance of the data, $\sigma_d^2\mathbf{I}$, are assumed to be known. The GPR estimate of the model parameters is then:

$$\Delta\mathbf{m} \equiv \begin{bmatrix} \Delta\mathbf{m}^{(t)} \\ \Delta\mathbf{m}^{(c)} \end{bmatrix} = \begin{bmatrix} \mathbf{C}_m^{(tc)} \\ \mathbf{C}_m^{(cc)} \end{bmatrix}\mathbf{A}^{-1}\Delta\mathbf{d} \quad\text{with}\quad \mathbf{A} \equiv \mathbf{C}_m^{(cc)} + \sigma_d^2\mathbf{I}$$
$$\text{and}\quad \mathbf{m}^{est} = \Delta\mathbf{m} + \hat{\mathbf{m}} \quad\text{and}\quad \Delta\mathbf{d} \equiv \mathbf{d} - \hat{\mathbf{m}}^{(c)} \quad\text{and}\quad \mathbf{d}^{pre} = [\mathbf{m}^{(c)}]^{est}$$
$$\mathbf{C}_m^{est(cc)} = \sigma_d^2\,\mathbf{C}_m^{(cc)}\mathbf{A}^{-2}\mathbf{C}_m^{(cc)} \quad\text{and}\quad \mathbf{C}_m^{est(tt)} = \sigma_d^2\,\mathbf{C}_m^{(tc)}\mathbf{A}^{-2}[\mathbf{C}_m^{(tc)}]^T \tag{1}$$

The formula for the solution, $\Delta\mathbf{m}$, should be evaluated from right to left for maximum computational efficiency. The matrix inverse, $\mathbf{A}^{-1}$, appears both in the estimated solution, $\mathbf{m}^{est}$, and in its posterior covariance, $\mathbf{C}_m^{est}$ [4], which is an important diagnostic of the solution's quality. In typical cases, $\mathbf{A}$ is far from being sparse, so when $N$ is large, $\mathbf{A}^{-1}$ can be computationally expensive to compute. Alternative methods that omit the calculation of $\mathbf{A}^{-1}$, such as solving the linear system, $\mathbf{A}\mathbf{u} = \Delta\mathbf{d}$, and then calculating $\Delta\mathbf{m} = [\mathbf{C}_m^{(tc)}; \mathbf{C}_m^{(cc)}]\mathbf{u}$, are nearly as expensive and do not provide a (simple) pathway for computing the posterior covariance.
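
For concreteness, the following is a minimal sketch of a direct evaluation of Equation (1) in Python/NumPy (chosen because the companion text [4] treats MATLAB and Python); the function name and argument layout are our own illustration, not code from the paper:

```python
import numpy as np

def gpr_solve(Cm_tc, Cm_cc, d, m_hat_t, m_hat_c, sigma_d2):
    """Direct GPR estimate, Equation (1).
    Cm_tc: (M,N) prior target-control covariance; Cm_cc: (N,N) control-control;
    d: (N,) data; m_hat_t, m_hat_c: prior means; sigma_d2: data variance."""
    N = Cm_cc.shape[0]
    A = Cm_cc + sigma_d2 * np.eye(N)            # A = Cm^(cc) + sigma_d^2 I
    A_inv = np.linalg.inv(A)                    # retained for the rolling updates below
    u = A_inv @ (d - m_hat_c)                   # right-to-left evaluation of A^{-1} (d - m_hat^(c))
    m_est_t = m_hat_t + Cm_tc @ u               # estimate at the target (grid) points
    d_pre = m_hat_c + Cm_cc @ u                 # predicted data = control estimate
    # posterior covariances of Equation (1)
    Cpost_tt = sigma_d2 * Cm_tc @ A_inv @ A_inv @ Cm_tc.T
    Cpost_cc = sigma_d2 * Cm_cc @ A_inv @ A_inv @ Cm_cc
    return m_est_t, d_pre, A, A_inv, Cpost_tt, Cpost_cc
```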

Suppose that we know the GPR solution when the control group has $N_A$ elements, ordered by time of observation. Our goal is to compute an updated solution after we delete the first $N_1$ observations from the group, leaving $N_2 = N_A - N_1$, and then append $N_3$ elements, so that it then has $N_B = N_2 + N_3$ elements. This process, which we term rolling GPR (or moving-window GPR), is repeated many times as new data become available and old data are deemed obsolete. The process ensures that the estimate of the field is kept up-to-date.

The structure of this paper is as follows: The process of performing rolling GPR is detailed in the Methodology section, and issues concerning its implementation are discussed. This process is divided into four conceptually distinct parts: discarding obsolete data from the GPR problem, appending new data, detecting and correcting error in the solution, and detecting and responding to regime shifts through changes in window length. The complete process of performing rolling GPR is outlined in the Procedure section. The Examples section provides a demonstration of the technique, applied to the reconstruction of a two-dimensional field with a regime shift, using an exemplary synthetic dataset and a Gaussian prior covariance function. Finally, a discussion of the efficiency of the discard-append process, together with summary remarks, is presented in the Discussion and Conclusions section.

2. Methodology

Discarding Obsolete Data. The quantities $\hat{\mathbf{m}}$, $\mathbf{x}^{(c)}$, $\mathbf{d}$, $\mathbf{C}_m^{(tc)}$, $\mathbf{C}_m^{(cc)}$, $\mathbf{A}$ and $\mathbf{A}^{-1}$ all need to be modified when data are discarded. The vectors $\hat{\mathbf{m}}^{(c)}$ and $\mathbf{d}$ are modified by retaining only their last $N_2$ elements:

$$\hat{\mathbf{m}}^{(c)} \equiv \begin{bmatrix} \hat{\mathbf{m}}^{(c_1)} \\ \hat{\mathbf{m}}^{(c_2)} \end{bmatrix} \;\text{is replaced with}\; \hat{\mathbf{m}}^{(c_2)} \quad\text{and}\quad \mathbf{d} \equiv \begin{bmatrix} \mathbf{d}^{(1)} \\ \mathbf{d}^{(2)} \end{bmatrix} \;\text{is replaced with}\; \mathbf{d}^{(2)} \tag{2}$$

Here, $c_1$ and $c_2$ refer to the deleted group and the retained group, respectively. The corresponding $\mathbf{x}$s must be modified in an identical manner. Similarly, only one block within the covariance matrices is retained:

$$\mathbf{C}_m^{(tc)} \equiv \begin{bmatrix} \mathbf{C}_m^{(tc_1)} & \mathbf{C}_m^{(tc_2)} \end{bmatrix} \;\text{is replaced with}\; \mathbf{C}_m^{(tc_2)}$$
$$\mathbf{C}_m^{(cc)} \equiv \begin{bmatrix} \mathbf{C}_m^{(c_1c_1)} & \mathbf{C}_m^{(c_1c_2)} \\ \mathbf{C}_m^{(c_2c_1)} & \mathbf{C}_m^{(c_2c_2)} \end{bmatrix} \;\text{is replaced with}\; \mathbf{C}_m^{(c_2c_2)} \tag{3}$$

The updating of $\mathbf{A}^{-1}$ requires more effort, but uses only well-known techniques. At the start of the process, the $N_A \times N_A$ matrix, $\mathbf{A}$, and its inverse, $\mathbf{A}^{-1}$, are known. These matrices are partitioned into submatrices:

$$\mathbf{A} \equiv \begin{bmatrix} \mathbf{X} & \mathbf{Z}^T \\ \mathbf{Z} & \mathbf{Y} \end{bmatrix} \quad\text{and}\quad \mathbf{A}^{-1} \equiv \begin{bmatrix} \mathbf{P} & \mathbf{R}^T \\ \mathbf{R} & \mathbf{Q} \end{bmatrix} \tag{4}$$

Here, $\mathbf{X}$ and $\mathbf{P}$ are $N_1 \times N_1$, $\mathbf{Y}$ and $\mathbf{Q}$ are $N_2 \times N_2$, and $\mathbf{Z}$ and $\mathbf{R}$ are $N_2 \times N_1$, with $N_1 + N_2 = N_A$. We mention an identity involving $\mathbf{Z}$, $\mathbf{R}$, $\mathbf{P}$ and $\mathbf{Q}$ that will be used later in the paper:

$$\mathbf{Z}^T\mathbf{R}\mathbf{P}^{-1}[\mathbf{I} - \mathbf{R}^T\mathbf{Z}] + \mathbf{Z}^T\mathbf{Q}\mathbf{Z} = \mathbf{Z}^T\mathbf{R}\mathbf{P}^{-1} - \mathbf{Z}^T\mathbf{R}\mathbf{P}^{-1}\mathbf{R}^T\mathbf{Z} + \mathbf{Z}^T\mathbf{Q}\mathbf{Z} = \mathbf{0} \tag{5}$$

It can be derived by multiplying out the block matrix form of $\mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$, solving the diagonal elements for $\mathbf{X}$ and $\mathbf{Y}$, and substituting the result into the off-diagonal elements. The process of removing the first $N_1$ data corresponds to replacing $\mathbf{A}$ with $\mathbf{Y}$, and $\mathbf{A}^{-1}$ with $\mathbf{Y}^{-1}$. An efficient method for computing $\mathbf{Y}^{-1}$ can be designed using the Woodbury identity [12]:

$$(\mathbf{M} + \mathbf{U}\mathbf{W}\mathbf{V})^{-1} = \mathbf{M}^{-1} - \mathbf{M}^{-1}\mathbf{U}(\mathbf{W}^{-1} + \mathbf{V}\mathbf{M}^{-1}\mathbf{U})^{-1}\mathbf{V}\mathbf{M}^{-1} \tag{6}$$

Here, $\mathbf{M}$, $\mathbf{W}$, $\mathbf{U}$ and $\mathbf{V}$ are conformable matrices. The reader may confirm by direct substitution that:

$$\mathbf{W} = \mathbf{I} \quad\text{and}\quad \mathbf{M} = \mathbf{A} \quad\text{and}\quad (\mathbf{M} + \mathbf{U}\mathbf{W}\mathbf{V})^{-1} = \begin{bmatrix} \mathbf{X}^{-1} & \mathbf{0} \\ \mathbf{0} & \mathbf{Y}^{-1} \end{bmatrix}$$
$$\mathbf{U}_{N_A,2N_1} = \begin{bmatrix} \mathbf{I}_{N_1,N_1} & \mathbf{0}_{N_1,N_1} \\ \mathbf{0}_{N_2,N_1} & -\mathbf{Z}_{N_2,N_1} \end{bmatrix} \quad\text{and}\quad \mathbf{V}_{2N_1,N_A} = \begin{bmatrix} \mathbf{0}_{N_1,N_1} & -[\mathbf{Z}^T]_{N_1,N_2} \\ \mathbf{I}_{N_1,N_1} & \mathbf{0}_{N_1,N_2} \end{bmatrix} \tag{7}$$

are consistent choices of the several matrices (with the subscripts indicating the sizes of selected matrices). Thus, $\mathbf{Y}^{-1}$ can be read from the lower-right-hand block of $\mathbf{A}^{-1} - \mathbf{A}^{-1}\mathbf{U}(\mathbf{I} + \mathbf{V}\mathbf{A}^{-1}\mathbf{U})^{-1}\mathbf{V}\mathbf{A}^{-1}$. Note that the matrix, $(\mathbf{I} + \mathbf{V}\mathbf{A}^{-1}\mathbf{U})$, is $(2N_1) \times (2N_1)$, so its inverse requires less computational effort than does that of the $N_2 \times N_2$ matrix, $\mathbf{Y}$, as long as relatively few data are being removed. Below, we develop an analytic formula for $(\mathbf{I} + \mathbf{V}\mathbf{A}^{-1}\mathbf{U})^{-1}$ that further reduces the size of the matrices that must be inverted to $N_1 \times N_1$. An explicit formula for $\mathbf{Y}^{-1}$ is obtained by manipulating the block matrix form of the Woodbury formula, starting with:

$$\mathbf{I} + \mathbf{V}\mathbf{A}^{-1}\mathbf{U} = \mathbf{I} + \begin{bmatrix} \mathbf{0} & -\mathbf{Z}^T \\ \mathbf{I} & \mathbf{0} \end{bmatrix}\begin{bmatrix} \mathbf{P} & \mathbf{R}^T \\ \mathbf{R} & \mathbf{Q} \end{bmatrix}\begin{bmatrix} \mathbf{I} & \mathbf{0} \\ \mathbf{0} & -\mathbf{Z} \end{bmatrix} = \begin{bmatrix} \mathbf{I}_{N_1,N_1} - \mathbf{Z}^T\mathbf{R} & \mathbf{Z}^T\mathbf{Q}\mathbf{Z} \\ \mathbf{P} & \mathbf{I}_{N_1,N_1} - \mathbf{R}^T\mathbf{Z} \end{bmatrix} \tag{8}$$

Note that the off-diagonal blocks of this matrix are symmetric, and the diagonal blocks are transposes of one another. Its inverse has the form:

$$[\mathbf{I} + \mathbf{V}\mathbf{A}^{-1}\mathbf{U}]^{-1} = \begin{bmatrix} \mathbf{I}_{N_1,N_1} & \mathbf{C}_{N_1,N_1} \\ \mathbf{D}_{N_1,N_1} & \mathbf{I}_{N_1,N_1} \end{bmatrix}$$
$$\text{with}\quad \mathbf{C} = \mathbf{Z}^T\mathbf{R}\mathbf{P}^{-1} = \mathbf{Z}^T\mathbf{R}\mathbf{P}^{-1}\mathbf{R}^T\mathbf{Z} - \mathbf{Z}^T\mathbf{Q}\mathbf{Z} \quad\text{and}\quad \mathbf{D} = \mathbf{R}^T\mathbf{Z}[\mathbf{Z}^T\mathbf{Q}\mathbf{Z}]^{-1} = -[\mathbf{P}^{-1} - \mathbf{C}]^{-1} \tag{9}$$

The formulas for $\mathbf{C}$ and $\mathbf{D}$ were derived by multiplying out the block matrix form of $[\mathbf{I} + \mathbf{V}\mathbf{A}^{-1}\mathbf{U}]^{-1}[\mathbf{I} + \mathbf{V}\mathbf{A}^{-1}\mathbf{U}] = \mathbf{I}$, solving the diagonal elements for $\mathbf{C}$ and $\mathbf{D}$, and applying the identity in Equation (5). Note that both $\mathbf{C}$ and $\mathbf{D}$ are symmetric matrices. The products $\mathbf{A}^{-1}\mathbf{U}$ and $\mathbf{V}\mathbf{A}^{-1}$ have the forms:

$$\mathbf{A}^{-1}\mathbf{U} = \begin{bmatrix} \mathbf{P} & \mathbf{R}^T \\ \mathbf{R} & \mathbf{Q} \end{bmatrix}\begin{bmatrix} \mathbf{I} & \mathbf{0} \\ \mathbf{0} & -\mathbf{Z} \end{bmatrix} = \begin{bmatrix} \mathbf{P} & -\mathbf{R}^T\mathbf{Z} \\ \mathbf{R} & -\mathbf{Q}\mathbf{Z} \end{bmatrix} \equiv \begin{bmatrix} \mathbf{P} & -\mathbf{S} \\ \mathbf{R} & -\mathbf{T} \end{bmatrix}$$
$$\mathbf{V}\mathbf{A}^{-1} = \begin{bmatrix} \mathbf{0} & -\mathbf{Z}^T \\ \mathbf{I} & \mathbf{0} \end{bmatrix}\begin{bmatrix} \mathbf{P} & \mathbf{R}^T \\ \mathbf{R} & \mathbf{Q} \end{bmatrix} = \begin{bmatrix} -\mathbf{Z}^T\mathbf{R} & -\mathbf{Z}^T\mathbf{Q} \\ \mathbf{P} & \mathbf{R}^T \end{bmatrix} \equiv \begin{bmatrix} -\mathbf{S}^T & -\mathbf{T}^T \\ \mathbf{P} & \mathbf{R}^T \end{bmatrix}$$
$$\text{with}\quad \mathbf{S} \equiv \mathbf{R}^T\mathbf{Z} \quad\text{and}\quad \mathbf{T} \equiv \mathbf{Q}\mathbf{Z} \tag{10}$$

Thus, the expression, $\mathbf{A}^{-1}\mathbf{U}(\mathbf{I} + \mathbf{V}\mathbf{A}^{-1}\mathbf{U})^{-1}\mathbf{V}\mathbf{A}^{-1}$, can be simplified to:

$$\mathbf{A}^{-1}\mathbf{U}(\mathbf{I} + \mathbf{V}\mathbf{A}^{-1}\mathbf{U})^{-1}\mathbf{V}\mathbf{A}^{-1} = \begin{bmatrix} \mathbf{P} & -\mathbf{S} \\ \mathbf{R} & -\mathbf{T} \end{bmatrix}\begin{bmatrix} \mathbf{I} & \mathbf{C} \\ \mathbf{D} & \mathbf{I} \end{bmatrix}\begin{bmatrix} -\mathbf{S}^T & -\mathbf{T}^T \\ \mathbf{P} & \mathbf{R}^T \end{bmatrix}$$
$$= \begin{bmatrix} \mathbf{P}(\mathbf{C}\mathbf{P} - \mathbf{S}^T) + \mathbf{S}(\mathbf{D}\mathbf{S}^T - \mathbf{P}) & \mathbf{P}(\mathbf{C}\mathbf{R}^T - \mathbf{T}^T) + \mathbf{S}(\mathbf{D}\mathbf{T}^T - \mathbf{R}^T) \\ \mathbf{R}(\mathbf{C}\mathbf{P} - \mathbf{S}^T) + \mathbf{T}(\mathbf{D}\mathbf{S}^T - \mathbf{P}) & \mathbf{R}(\mathbf{C}\mathbf{R}^T - \mathbf{T}^T) + \mathbf{T}(\mathbf{D}\mathbf{T}^T - \mathbf{R}^T) \end{bmatrix} \tag{11}$$

Subtracting the lower right-hand block of Equation (11) from the corresponding block, $\mathbf{Q}$, of $\mathbf{A}^{-1}$ yields the new $\mathbf{A}^{-1}$:

$$\mathbf{A} \;\text{is replaced with}\; \mathbf{Y} \quad\text{and}\quad \mathbf{A}^{-1} \;\text{is replaced with}\; \mathbf{Y}^{-1} = \mathbf{Q} + \mathbf{R}\mathbf{T}^T + \mathbf{T}\mathbf{R}^T - \mathbf{R}\mathbf{C}\mathbf{R}^T - \mathbf{T}\mathbf{D}\mathbf{T}^T \tag{12}$$

This formula has been tested numerically, and was found to be correct to within machine precision. Accuracy and speed are improved through the use of coding techniques that utilize and enforce the symmetry of all the symmetric matrices.
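
As an illustration (a sketch of our own, assuming NumPy, and assuming the first $N_1$ rows and columns of $\mathbf{A}$ hold the oldest data), the discard update of Equations (9)-(12) can be coded as:

```python
import numpy as np

def discard_update(A, A_inv, N1):
    """Drop the first N1 rows/columns of A and update its inverse,
    Equations (9)-(12), inverting only N1 x N1 matrices."""
    P = A_inv[:N1, :N1]                        # blocks of A^{-1}, Equation (4)
    R = A_inv[N1:, :N1]
    Q = A_inv[N1:, N1:]
    Z = A[N1:, :N1]                            # block of A itself
    T = Q @ Z                                  # T = Q Z, Equation (10)
    C = Z.T @ R @ np.linalg.inv(P)             # C = Z^T R P^{-1}, Equation (9)
    D = -np.linalg.inv(np.linalg.inv(P) - C)   # D = -[P^{-1} - C]^{-1}
    # Equation (12): the retained block of A, and its updated inverse
    Y_inv = Q + R @ T.T + T @ R.T - R @ C @ R.T - T @ D @ T.T
    return A[N1:, N1:], Y_inv
```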

Appending New Data. As before, the quantities $\hat{\mathbf{m}}^{(c_2)}$, $\mathbf{x}^{(c_2)}$, $\mathbf{d}^{(2)}$, $\mathbf{C}_m^{(tc_2)}$, $\mathbf{C}_m^{(c_2c_2)}$, $\mathbf{A}$ and $\mathbf{A}^{-1}$ all need to be modified when new data are appended. The vectors $\hat{\mathbf{m}}^{(c)}$ and $\mathbf{d}$ are modified by appending $N_3$ elements:

$$\hat{\mathbf{m}}^{(c_2)} \;\text{is replaced with}\; \hat{\mathbf{m}}^{(c)} \equiv \begin{bmatrix} \hat{\mathbf{m}}^{(c_2)} \\ \hat{\mathbf{m}}^{(c_3)} \end{bmatrix} \quad\text{and}\quad \mathbf{d}^{(2)} \;\text{is replaced with}\; \mathbf{d} \equiv \begin{bmatrix} \mathbf{d}^{(2)} \\ \mathbf{d}^{(3)} \end{bmatrix} \tag{13}$$

The corresponding $\mathbf{x}$s must be modified in an identical manner. Similarly, new blocks are appended to the covariance matrices:

$$\mathbf{C}_m^{(tc_2)} \;\text{is replaced with}\; \mathbf{C}_m^{(tc)} \equiv \begin{bmatrix} \mathbf{C}_m^{(tc_2)} & \mathbf{C}_m^{(tc_3)} \end{bmatrix}$$
$$\mathbf{C}_m^{(c_2c_2)} \;\text{is replaced with}\; \mathbf{C}_m^{(cc)} \equiv \begin{bmatrix} \mathbf{C}_m^{(c_2c_2)} & \mathbf{C}_m^{(c_2c_3)} \\ \mathbf{C}_m^{(c_3c_2)} & \mathbf{C}_m^{(c_3c_3)} \end{bmatrix} \tag{14}$$

As before, the updating of $\mathbf{A}^{-1}$ requires more effort, but uses a well-known technique based on the bordering method [13] [14]. We rename the existing matrix, $\mathbf{A}$, to $\mathbf{X}$, as it will become the upper-left block of the modified $\mathbf{A}$. Both $\mathbf{X}$ and its inverse, $\mathbf{X}^{-1}$, are known $N_2 \times N_2$ matrices. We seek to add the blocks, $\mathbf{Y}$ and $\mathbf{Z}$, associated with the $N_3$ new data, creating an augmented $N_B \times N_B$ matrix, $\mathbf{A}$, with $N_B = N_2 + N_3$. The matrix, $\mathbf{A}$, and its inverse satisfy:

$$\mathbf{A}^{-1}\mathbf{A} \equiv \begin{bmatrix} \mathbf{P} & \mathbf{R}^T \\ \mathbf{R} & \mathbf{Q} \end{bmatrix}\begin{bmatrix} \mathbf{X} & \mathbf{Z}^T \\ \mathbf{Z} & \mathbf{Y} \end{bmatrix} = \begin{bmatrix} \mathbf{I}_{N_2,N_2} & \mathbf{0}_{N_2,N_3} \\ \mathbf{0}_{N_3,N_2} & \mathbf{I}_{N_3,N_3} \end{bmatrix} \tag{15}$$

Multiplying out the equation and solving for $\mathbf{P}$, $\mathbf{Q}$ and $\mathbf{R}$ yields:

$$\mathbf{P}\mathbf{X} + \mathbf{R}^T\mathbf{Z} = \mathbf{I} \quad\text{so}\quad \mathbf{P} = [\mathbf{I} - \mathbf{R}^T\mathbf{Z}]\mathbf{X}^{-1}$$
$$\mathbf{P}\mathbf{Z}^T + \mathbf{R}^T\mathbf{Y} = \mathbf{0} \quad\text{so}\quad \mathbf{R}^T = -\mathbf{P}\mathbf{Z}^T\mathbf{Y}^{-1}$$
$$\mathbf{R}\mathbf{X} + \mathbf{Q}\mathbf{Z} = \mathbf{0} \quad\text{so}\quad \mathbf{R} = -\mathbf{Q}\mathbf{Z}\mathbf{X}^{-1}$$
$$\mathbf{R}\mathbf{Z}^T + \mathbf{Q}\mathbf{Y} = \mathbf{I} \quad\text{so}\quad -\mathbf{Q}\mathbf{Z}\mathbf{X}^{-1}\mathbf{Z}^T + \mathbf{Q}\mathbf{Y} = \mathbf{I} \quad\text{so}\quad \mathbf{Q} = [\mathbf{Y} - \mathbf{Z}\mathbf{X}^{-1}\mathbf{Z}^T]^{-1} \tag{16}$$

The matrices then become:

$$\mathbf{A} \;\text{is replaced with}\; \begin{bmatrix} \mathbf{X} & \mathbf{Z}^T \\ \mathbf{Z} & \mathbf{Y} \end{bmatrix} \quad\text{and}\quad \mathbf{A}^{-1} \;\text{is replaced with}\; \begin{bmatrix} \mathbf{P} & \mathbf{R}^T \\ \mathbf{R} & \mathbf{Q} \end{bmatrix} \tag{17}$$

These formulas have been tested numerically, and are correct to within machine precision. Accuracy and speed are improved through the use of coding techniques that utilize and enforce the symmetry of all the symmetric matrices.
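
A corresponding sketch of the bordering update, Equations (16)-(17) (again our own illustrative implementation, not code from the paper):

```python
import numpy as np

def append_update(X, X_inv, Z, Y):
    """Border X (the old A) with new blocks Z (N3 x N2) and Y (N3 x N3),
    updating the inverse via Equation (16)."""
    N2 = X.shape[0]
    Q = np.linalg.inv(Y - Z @ X_inv @ Z.T)     # Q = [Y - Z X^{-1} Z^T]^{-1}
    R = -Q @ Z @ X_inv                         # R = -Q Z X^{-1}
    P = (np.eye(N2) - R.T @ Z) @ X_inv         # P = [I - R^T Z] X^{-1}
    A_new = np.block([[X, Z.T], [Z, Y]])       # Equation (17)
    A_inv_new = np.block([[P, R.T], [R, Q]])
    return A_new, A_inv_new
```

Note that the only matrix inverted from scratch is the $N_3 \times N_3$ matrix defining $\mathbf{Q}$, which is small when few data are appended.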

Detecting and Correcting Errors in the Solution. Numerical tests (not shown) indicate that the delete/append process can be repeated many (e.g., hundreds of) times without significant loss of precision, at least for the class of matrices encountered in GPR. However, as a precaution, we recommend that the inverse be corrected every few hundred iterations, either through the direct calculation of $\mathbf{A}^{-1}$ from $\mathbf{A}$, or through one application of the Schultz method [15] [16] [17]:

$$\mathbf{A}^{-1} \;\text{is replaced by}\; \mathbf{A}^{-1}(2\mathbf{I} - \mathbf{A}\mathbf{A}^{-1}) \tag{18}$$

One way to monitor accuracy is to compute the absolute value of just one (or a few) of the diagonal elements of the error matrix, $\boldsymbol{\Phi} \equiv \mathbf{A}\mathbf{A}^{-1} - \mathbf{I}$. For fixed $i$, the quantity $|\Phi_{ii}|$ can be computed very efficiently, because only one inner product (between the $i$th row of $\mathbf{A}$ and the $i$th column of $\mathbf{A}^{-1}$) need be performed. The correction process then can be initiated when $|\Phi_{ii}|$ exceeds a threshold, chosen to be a small fraction, say $10^{-8}$.
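
A minimal sketch of the error monitor and of one Schultz refinement step (Equation (18)); the function names are ours:

```python
import numpy as np

def inverse_drift(A, A_inv, i=0):
    """|Phi_ii| for Phi = A A^{-1} - I, computed with a single inner product."""
    return abs(A[i, :] @ A_inv[:, i] - 1.0)

def schultz_refine(A, A_inv):
    """One Schultz iteration, Equation (18): A^{-1} <- A^{-1} (2I - A A^{-1})."""
    return A_inv @ (2.0 * np.eye(A.shape[0]) - A @ A_inv)
```

In the rolling loop, inverse_drift would be checked after each discard/append cycle, and schultz_refine applied only when the drift exceeds the threshold.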

Readjusting Window Length. The posterior data variance,

$$\sigma^2 = N_B^{-1}\,\mathbf{e}^T\mathbf{e} \quad\text{with}\quad \mathbf{e} = \mathbf{d} - \mathbf{d}^{pre} \tag{19}$$

is a measure of how well the reconstruction fits the data. The quantity $\chi^2 \equiv N_B\sigma^2/\sigma_d^2$ is approximately chi-squared distributed with $N_B$ degrees of freedom, and therefore has an expected value of $N_B$ and a variance of $2N_B$. Thus, the expected value of $\sigma^2$ is $\sigma_d^2$, its variance is $2\sigma_d^4/N_B$ and its 95% confidence bound is $\sigma_{95}^2 = \sigma_d^2 + 2\sigma_d^2[2/N_B]^{1/2}$. However, $\sigma^2$ can be expected to rise well above this bound during a regime shift due to model error; that is, two distinct spatial patterns are being comingled and cannot be fit simultaneously. Shortening the window length tends to bring $\sigma^2$ back toward the bound, because obsolete data are discarded more rapidly. This suggests the strategy of defining a threshold, guided by the value of $\sigma_{95}^2$, decreasing the window length when $\sigma^2$ is above it (by setting $N_1 > N_3$), and increasing the window length once $\sigma^2$ has dropped below it (by setting $N_1 < N_3$, until $N_B$ has reached some upper limit).
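
One possible schedule implementing this strategy is sketched below. The specific shrink/grow increments are illustrative assumptions, not prescriptions from the paper; the bounds $N_{min} = 50$ and $N_{max} = 90$ follow the example in Section 4:

```python
import numpy as np

def readjust_counts(sigma2, sigma_d2, N_B, N3, N_min=50, N_max=90):
    """Choose N1 (number discarded) for the next cycle: shrink the window
    (N1 > N3) while sigma2 exceeds the 95% bound, grow it (N1 < N3)
    otherwise, while keeping N_min <= N_B <= N_max."""
    sigma2_95 = sigma_d2 + 2.0 * sigma_d2 * np.sqrt(2.0 / N_B)
    if sigma2 > sigma2_95:
        return min(2 * N3, N3 + (N_B - N_min))   # shrink, but not below N_min
    return max(0, N3 - (N_max - N_B))            # grow, but not above N_max
```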

3. Procedure

Step 1. Decide upon initial values of $N_1$, $N_2$ and $N_3$. The choice $N_1 = N_3$ implies that the same number of data are appended as are discarded, leaving the window length, $N_B = N_2 + N_3$, unchanged. The rolling process is most efficient when $N_2 \gg N_1$ and $N_2 \gg N_3$.

Step 2. Solve the GPR problem for an initial group of $N_A = N_1 + N_2$ data, as in Equation (1).

Step 3. Discard $N_1$ data by modifying $\hat{\mathbf{m}}$ and $\mathbf{d}$, and their corresponding $\mathbf{x}$s, as in Equation (2), $\mathbf{C}_m^{(tc)}$ and $\mathbf{C}_m^{(cc)}$ as in Equation (3), and $\mathbf{A}$ and $\mathbf{A}^{-1}$ as in Equation (12).

Step 4. Append $N_3$ data by modifying $\hat{\mathbf{m}}$ and $\mathbf{d}$, and their corresponding $\mathbf{x}$s, as in Equation (13), $\mathbf{C}_m^{(tc)}$ and $\mathbf{C}_m^{(cc)}$ as in Equation (14), and $\mathbf{A}$ and $\mathbf{A}^{-1}$ as in Equation (17).

Step 5. (Optional) Monitor the error $|\Phi_{ii}|$, where $\boldsymbol{\Phi} \equiv \mathbf{A}\mathbf{A}^{-1} - \mathbf{I}$, for a single index, $i$, and when it exceeds a threshold of, say, $10^{-8}$, refine $\mathbf{A}^{-1}$ as in Equation (18).

Step 6. Solve the GPR problem for $N_B = N_2 + N_3$ data, obtaining $\mathbf{m}^{est}$ and $\mathbf{d}^{pre}$ as in Equation (1).

Step 7. (Optional) Compute the posterior data variance, $\sigma^2$, as in Equation (19) and reassign the values of $N_1$, $N_2$ and $N_3$, as needed (as discussed in the Readjusting Window Length section).

Step 8. Once $N_3$ new data are available, iterate the procedure, starting at Step 3.
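
Steps 3 and 4 can be combined into a single window-rolling helper. The sketch below assumes the discard_update and append_update sketches given earlier are in scope, and that a user-supplied function cov(xa, xb) evaluates the prior covariance between two point sets:

```python
import numpy as np

def roll_window(x_t, x_c, d, Cm_tc, Cm_cc, A, A_inv,
                x_new, d_new, N1, sigma_d2, cov):
    """One discard/append cycle (Steps 3 and 4), a sketch built on the
    discard_update and append_update functions defined earlier."""
    # Step 3: discard the N1 oldest control points (Equations (2)-(3), (12))
    x_c, d = x_c[N1:], d[N1:]
    Cm_tc, Cm_cc = Cm_tc[:, N1:], Cm_cc[N1:, N1:]
    A, A_inv = discard_update(A, A_inv, N1)
    # Step 4: append the new control points (Equations (13)-(14), (17))
    Cm_c2c3 = cov(x_c, x_new)                  # cross-covariance, retained-new
    Cm_c3c3 = cov(x_new, x_new)
    Cm_tc = np.hstack([Cm_tc, cov(x_t, x_new)])
    Cm_cc = np.block([[Cm_cc, Cm_c2c3], [Cm_c2c3.T, Cm_c3c3]])
    # only the diagonal block of A carries the data-noise term sigma_d^2 I
    Z = Cm_c2c3.T
    Y = Cm_c3c3 + sigma_d2 * np.eye(len(d_new))
    A, A_inv = append_update(A, A_inv, Z, Y)
    x_c = np.concatenate([x_c, x_new])
    d = np.concatenate([d, d_new])
    return x_c, d, Cm_tc, Cm_cc, A, A_inv
```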

4. Examples

Test Scenario. In these examples, the two-dimensional field, $m(x,y)$, $0 \le x \le 30$, $0 \le y \le 30$, is reconstructed on a regular $30 \times 30$ grid, using data that are acquired at the steady rate of 10 observations per time interval, $\Delta t$. The observations are made at randomly chosen positions and have uncorrelated and uniform error with prior variance, $\sigma_d^2 = 10^{-4}$. The true field experiences a regime shift at time $25\Delta t$, when it abruptly changes from a four-lobed to a three-lobed pattern:

$$m(t,x,y) = \begin{cases} \sin(2\pi x/L)\sin(2\pi y/L) & (t < 25) \\ \sin(\pi x/L)\sin(3\pi y/L) & (t \ge 25) \end{cases} \quad\text{with}\quad L = 30 \tag{20}$$

The field is assumed to have zero prior value and Gaussian prior covariance:

$$\hat{\mathbf{m}} = \mathbf{0} \quad\text{and}\quad C_{ij} = \gamma^2 \exp\!\left(-\frac{r_{ij}^2}{2s^2}\right) \quad\text{with}\quad r_{ij}^2 = (x_i - x_j)^2 + (y_i - y_j)^2 \quad\text{and}\quad \gamma^2 = \tfrac{1}{3} \quad\text{and}\quad s = 0.22 \tag{21}$$
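
A sketch of this test scenario in NumPy; the random seed and the helper names are ours, and the parameter values are those of Equations (20)-(21) and the prior data variance quoted above:

```python
import numpy as np

rng = np.random.default_rng(0)                     # illustrative seed
L, sigma_d2, gamma2, s = 30.0, 1.0e-4, 1.0 / 3.0, 0.22

def true_field(t, x, y):
    """Equation (20): four lobes before the shift at t = 25, three after."""
    if t < 25:
        return np.sin(2 * np.pi * x / L) * np.sin(2 * np.pi * y / L)
    return np.sin(np.pi * x / L) * np.sin(3 * np.pi * y / L)

def cov(xa, xb):
    """Equation (21): Gaussian prior covariance between point sets (n,2) and (m,2)."""
    r2 = ((xa[:, None, :] - xb[None, :, :]) ** 2).sum(axis=2)
    return gamma2 * np.exp(-r2 / (2.0 * s ** 2))

def observe(t, n=10):
    """Ten noisy observations per time step, at random positions."""
    xy = rng.uniform(0.0, L, size=(n, 2))
    d = true_field(t, xy[:, 0], xy[:, 1]) + np.sqrt(sigma_d2) * rng.standard_normal(n)
    return xy, d
```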

Example without Window Length Readjustment. In this example (Figure 1(A)), the size of the rolling set of observations increases with time up to an upper bound of 90. For times $t < 25\Delta t$, the field is correctly reconstructed, showing the correct four-lobed pattern, and the posterior variance is about equal to the prior variance. The field is incorrectly reconstructed during the time interval $25\Delta t \le t \le 33\Delta t$, when the window comingles the two patterns, and the posterior variance is about one thousand times higher than the prior variance. For times $t \ge 34\Delta t$, the field is correctly reconstructed, showing the correct three-lobed pattern, and the posterior variance returns to its original level. This example confirms the ability of the method to reconstruct the field in the presence of regime shifts.

Figure 1. Example of rolling GPR. (A) Case where the window length is held constant. The window length (top curve) grows to $N_B = 90$ and then is held constant. The two-dimensional field, $m(x,y)$ (colors), abruptly changes from a four-lobed pattern to a three-lobed pattern at time $t = 25\Delta t$. The posterior data variance, $\sigma^2$ (bottom curve), is low, except near the time of the change. (B) Case where the window length is varied. The posterior data variance, $\sigma^2$ (top black curve), is monitored, and the time at which it exceeds a threshold (top red curve) is detected. The window length (bottom black curve) is decreased during the interval of high variance, and then increased afterwards, within bounds (bottom two red curves).

Example with Window Length Readjustment. In this example (Figure 1(B)), the posterior data variance, $\sigma^2$ (a measure of the misfit of the data), is monitored, and the size of the rolling set of observations is reduced when it increases past a threshold (but not below a lower bound of $N_B = 50$). Once the error has declined below the threshold, the set size is allowed to increase, up to an upper bound of $N_B = 90$. The duration of the interval of elevated error is reduced by a factor of about two compared to the first experiment. The presence of the second, three-lobed pattern is evident at an earlier time than in the first example, demonstrating the utility of the window length readjustment.

5. Discussion and Conclusions

The main advantage of the rolling Gaussian Process Regression method that we describe here is its ability to reconstruct a time-varying field without any assumptions about its dynamics. This is in contrast to other data assimilation techniques, such as Kalman filtering [18], in which the differential equation describing the time dependence is assumed to be known.

The $N_B \times N_B$ matrix, $\mathbf{A}$, that arises in Gaussian Process Regression is a non-sparse, symmetric positive-definite matrix that takes $N_B^3/3 + O(N_B^2)$ floating point operations to invert [19]. Careful counting of operations reveals that the discard/append process described above takes about $6N_1N_B^2 + 4N_B^2N_3$ operations. Thus, for $N_1 = N_3$, the update costs about $10N_1N_B^2$ operations, which is less than $N_B^3/3$ when $N_1/N_B \lesssim 1/30$; that is, it is more efficient when just a few percent of the data are being updated.

The procedure for implementing "rolling GPR" (or moving-window GPR) that we present here is more computationally efficient than solving the full GPR problem at each update, at least when the number of data that are deleted/appended is only a few percent of the total number used in the calculation. Regime shifts (sudden large changes in the field) can be detected by monitoring the posterior data variance (a measure of the misfit of the data) during the updates, and their detrimental effect is mitigated by shortening the time window as the variance rises, and then re-lengthening it as it falls (but within prior bounds). The numerical experiments presented here demonstrate the viability and usefulness of the procedure.

Acknowledgements

The author thanks Roger Creel of Columbia University (New York, USA) for the helpful discussion.

Conflicts of Interest

The author declares no conflicts of interest regarding the publication of this paper.

References

[1] Krige, D.G. (1951) A Statistical Approach to Some Basic Mine Valuation Problems on the Witwatersrand. Journal of the Chemical, Metallurgical and Mining Society of South Africa, 52, 119-139.
[2] Rasmussen, C.E. and Williams, C.K.I. (2006) Gaussian Processes for Machine Learning. The MIT Press, Cambridge, 272 p.
https://doi.org/10.7551/mitpress/3206.001.0001
[3] Menke, W. and Creel, R. (2021) Gaussian Process Regression Reviewed in the Context of Inverse Theory. Surveys in Geophysics, 42, 473-503.
https://doi.org/10.1007/s10712-021-09640-w
[4] Menke, W. (2022) Environmental Data Analysis with MATLAB or Python: Principles, Applications and Prospects. 3rd Edition, Elsevier, Amsterdam.
[5] Scheffer, M., Carpenter, S., Foley, J.A., Folke, C. and Walker, B. (2001) Catastrophic Shifts in Ecosystems. Nature, 413, 591-596.
https://doi.org/10.1038/35098000
[6] Beisner, B.E., Haydon, D.T. and Cuddington, K. (2003) Alternative Stable States in Ecology. Frontiers in Ecology and the Environment, 1, 376-382.
[7] Smol, J.P., Wolfe, A.P., Birks, H.J.B, Douglas, M.S.V., Jones, V.J., Korhola, A., Pienitz, R., Rühland, K., Sorvari, S., Antoniades, D., Brooks, S.J., Fallu, M.-A., Hughes, M., Keatley, B.E., Laing, T.E., Michelutti, N., Nazarova, L., Nyman, M., Paterson, A.M., Perren, B., Quinlan, R., Rautio, M., Saulnier-Talbot, É., Siitonen, S., Solovieva, N. and Weckström, J. (2005) Climate-Driven Regime Shifts in the Biological Communities of Arctic Lakes. Proceedings of the National Academy of Sciences of the United States of America, 102, 4397-4402.
https://doi.org/10.1073/pnas.0500245102
[8] Zhang, Q.-B. and Fang, O. (2020) Tree Rings Circle an Abrupt Shift in Climate. Science, 370, 1037-1038.
https://doi.org/10.1126/science.abf1700
[9] Alley, R.B., Marotzke, J., Nordhaus, W.D., Overpeck, J.T., Peteet, D.M., Pielke Jr., R.A., Pierrehumbert, R.T., Rhines, P.B., Stocker, T.F., Talley, L.D. and Wallace, J.M. (2003) Abrupt Climate Change. Science, 299, 2005-2010.
https://doi.org/10.1126/science.1081056
[10] Glatzmaier, G.A. and Roberts, P.H. (1995) A Three-Dimensional Self-Consistent Computer Simulation of a Geomagnetic Field Reversal. Nature, 377, 203-209.
https://doi.org/10.1038/377203a0
[11] Meynadier, L. (1993) Geomagnetic Field Intensity and Reversals during the Past Four Million Years. Nature, 366, 234-238.
https://doi.org/10.1038/366234a0
[12] Woodbury, M. (1950) Inverting Modified Matrices. Memorandum Report 42. Statistical Research Group, Princeton University, Princeton.
[13] Householder, A.S. (1975) The Theory of Matrices in Numerical Analysis. Dover Publications, New York.
[14] Westlake, J.R. (1968) A Handbook of Numerical Matrix Inversion and Solution of Linear Equations. John Wiley & Sons, Hoboken.
[15] Li, R.C. (2014) Matrix Perturbation Theory. In: Hogben, L., Ed., Handbook of Linear Algebra, 2nd Edition, CRC Press, Boca Raton.
[16] Petersen, K.B. and Pedersen, M.S. (2012) The Matrix Cookbook.
https://www.academia.edu/42127358/The_Matrix_Cookbook
[17] Petkovic, M.D. (2014) Generalized Schultz Iterative Methods for the Computation of Outer Inverses. Computers & Mathematics with Applications, 67, 1837-1847.
https://doi.org/10.1016/j.camwa.2014.03.019
[18] Grewal, M.S. and Andrews, A.P. (2014) Kalman Filtering: Theory and Practice using MATLAB. 4th Edition, Wiley-IEEE Press, Hoboken.
[19] Hunger, R. (2007) Floating Point Operations in Matrix-Vector Calculus (Version 1.3). Technical Report, Technische Universität München, Associate Institute for Signal Processing, München.
