
A new dynamic model identification method is developed for continuous-time series analysis and forward prediction applications. The quantum of data is defined over moving time intervals in sliding window coordinates for compressing the size of stored data while retaining the resolution of information. Quantum vectors are introduced as the basis of a linear space for defining a Dynamic Quantum Operator (DQO) model of the system defined by its data stream. The transport of the quantum of compressed data is modeled between the time interval bins during the movement of the sliding time window. The DQO model is identified from the samples of the real-time flow of data over the sliding time window. A least-square-fit identification method is used for evaluating the parameters of the quantum operator model, utilizing the repeated use of the sampled data through a number of time steps. The method is tested to analyze, and forward-predict air temperature variations accessed from weather data as well as methane concentration variations obtained from measurements of an operating mine. The results show efficient forward prediction capabilities, surpassing those using neural networks and other methods for the same task.

Simulation, design, and process control tasks in engineering require the knowledge of the mathematical model of the controlled system. A dynamic model of a system may be created using analytical or numerical, computational simulation tools. Complex problems involving coupled processes pose a challenge to set up an analytical or computational, dynamic model that is fast enough to evaluate, flexible enough to match experimental observations, and adjustable enough for corrective calibration. Analytical models may need precious skills to set up for sufficient details, while computational, system modeling tools are cumbersome to incorporate in real-time process control applications.

Artificial Intelligence (AI) and Machine Learning (ML) methods have arisen as a panacea for overcoming the model-building difficulties when the vast amount of monitored data is already available from the subject system. A systematic review of AI models for natural resource applications is given by Jung and Choi [

The aims of developing a new dynamic system model are: 1) data compression without information loss; 2) increased processing speed in model identification; and 3) improved accuracy for short-term forecasting. Such demands (1)-(3) have arisen, e.g., for forward predicting and controlling atmospheric conditions in hazardous workplace environments for workers’ safety and health. The focus of the study, therefore, is to develop a fast, real-time evaluation method with forecasting capabilities for the large amounts of data commonly monitored as environmental air parameters.

The heat, mass, and momentum transport processes that determine the process parameters of the atmospheric conditions, such as air velocity, temperature, humidity, and contaminant gas species, depend on the past and present input conditions. Similarly, the expected, future process parameters are governed by the past and present conditions and the general, self-similar system behavior, in addition to some recurring disturbances. Dynamic model identification is expected to recognize and account for these system characteristics for forward prediction applications. Once the systematic characteristics are matched, only the stochastic disturbances remain to be suppressed using, for example, least-square fit matching during model training. The distraction caused by the “known unknowns” in the forecast of the process parameters will then be limited to the extent of a random model fitting error.

Functional data analysis is a good starting point for dividing the input data into discrete time intervals within which the data in each time segment is characterized by some statistical parameters such as the median or mean values, e.g., in Horvath and Koloszka [

A similar approach is used in the presented work regarding the autoregressive concept but in a fundamentally new way in which any single time series of N members of data is broken into multivariate components in time compartments assigned to M number of designated time interval bins. A significant element is that the compartmentalized data to be processed are moved from time segment bin to bin step by step, moving with the progression of real-time. The dynamic model will then use the characteristic values of the groups of data as multivariate inputs kept in the time interval bins.

The quantum of data kept in bins serves as the fixed base of the M-dimensional operator (or functional) of the dynamic model. A similar approach is used in a previous work regarding operator representation of a system model, rendering an output function to an input function as a transformation, e.g., by matrix-vector multiplication, used by Danko [

The plan of the study is set up as follows. A data flow of X ( t 1 ) , X ( t 2 ) , ⋯ , X ( t N ) is assumed from a single-channel sensor, acquired from the subject system at t 1 , t 2 , ⋯ , t N time instants, X ( t N ) being the most current. The past data are to be continuously stored in M number of bins, where M ≪ N for data compression. Definitions are given for the time compartmentalization into bins; the data processing and distribution into bins; and transport of the quantum of data between the bins during step-by-step sliding from the most recent to the past time periods. Various data compression methods are shown for comparison of characteristics including the common, sliding time window averaging and a new property, named the “moving window quantum of data”. The moving window quantum value in each bin is defined from the contained X ( t i ) data for constructing a set of base vectors of the dynamic operator of the system. For the model training of the matrix operator, a set of M-length quantum vectors is defined for setting up an over-determined set of equations for M < K . The M × M matrix coefficients of the dynamic operator of the system are obtained by matching the model prediction to the data by the least-square (LSQ) error fit method. Application examples will complete the study to show the operator model’s performance to complement or surpass those of other ML techniques including NN.

D1. Definition of time compartmentalization into bins. Let $t_1, t_2, \cdots, t_N$ be the set of equidistant time divisions ($t_{i+1} - t_i = \Delta t = \text{constant}$, $t_i \in \mathbb{R}^1$, e.g., minute, or day in seconds) for the acquisition of the $X(t_1), X(t_2), \cdots, X(t_N)$ data samples ($X(t_i) \in \mathbb{R}^1$, e.g., temperature, or gas concentration) to be used simultaneously for operator model identification. A set of M time intervals with time divisions $\tau_1, \tau_2, \cdots, \tau_M$, where $M \ll N$, is defined for arranging the time divisions into bins over the same model input interval, that is, $[0, t_N] = [0, \tau_M]$. An arbitrary but strategic selection of the time bin intervals is defined to achieve monotonically and gradually widening division intervals from the most recent ($\tau_M$) to the oldest ($\tau_1$) time instant, that is, $[0, \tau_1] \gg [\tau_{M-1}, \tau_M] = \Delta t_M$, in such a way that the finest bin width equals the equidistant time divisions in t, that is, $\Delta t_M = \Delta t$. Consequently, $X(t_N) = X(\tau_M)$ and $X(t_{N-1}) = X(\tau_{M-1})$. The width of each time-base bin is defined as $\Delta\tau_k = \tau_k - \tau_{k-1}$ for $k = 1, \cdots, M$, with $\tau_0 = 0$ for the starting point of the first moving time window at $k = 1$. Note that $\Delta\tau_M = \Delta t_N = \Delta t$ by design.

E1. Examples of bins selection.

E1a. Given is a time interval of $N = 327$ days with 1-day increments as $t_i = i$, where $i = 1, \cdots, N$. The number of bins is selected to be $M = 50$. The task is to find a smooth and monotonic function for the $\tau_k$, $k = 1, \cdots, M$ division points covering the entire 327-day time period. A power series function is selected for $\tau_k$ as follows:

$$\tau_k = a\left(1 - b^{-k}\right), \quad k = 1, \cdots, M \tag{1}$$

where:

$$a = \frac{t_N - t_{N-1}}{b^{1-M} - b^{-M}}, \tag{2}$$

and:

$$b = \left(\frac{b\,t_N - t_{N-1}}{t_N - t_{N-1}}\right)^{\frac{1}{M}} \tag{3}$$

With $t_i = i$ given, Equation (3) has to be solved first by iteration, which converges in 22 steps to an absolute error of 1e−12, giving $b = 1.0636$. From Equation (2), $a = 342.7324$.
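The iteration for E1a can be sketched as follows. This is a minimal illustration, not the author's implementation: the function names, the starting guess `b0 = 2.0`, and the stopping rule are assumptions. Note that $b = 1$ is a trivial fixed point of Equation (3), so the starting guess must be larger than 1.

```python
def solve_bin_parameters(t_last, dt, n_bins, b0=2.0, tol=1e-12, max_iter=1000):
    """Solve Eq. (3) for b by fixed-point iteration, then get a from Eq. (2).

    t_last is t_N, dt is t_N - t_{N-1}, n_bins is M.
    b0, tol and max_iter are illustrative choices (assumptions).
    """
    t_prev = t_last - dt
    b = b0
    for _ in range(max_iter):
        # Right-hand side of Eq. (3) evaluated at the current estimate of b
        b_new = ((b * t_last - t_prev) / dt) ** (1.0 / n_bins)
        converged = abs(b_new - b) < tol
        b = b_new
        if converged:
            break
    a = dt / (b ** (1 - n_bins) - b ** (-n_bins))  # Eq. (2)
    return a, b


def bin_divisions(a, b, n_bins):
    """Division points tau_k = a (1 - b^-k) of Eq. (1) for k = 1..M."""
    return [a * (1.0 - b ** (-k)) for k in range(1, n_bins + 1)]


# Example E1a: N = 327 days, 1-day increments, M = 50 bins
a, b = solve_bin_parameters(t_last=327.0, dt=1.0, n_bins=50)
tau = bin_divisions(a, b, 50)
print(round(b, 4), round(a, 4), round(tau[-1], 6), round(tau[-1] - tau[-2], 6))
```

With these inputs the sketch should reproduce the values of $a$ and $b$ quoted for E1a, with the last division point at $t_N$ and the finest bin width equal to $\Delta t$.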

The τ k divisions from Equation (1) are plotted in

E1b. Given is a time interval of 327 days, each day to be further discretized to 5-minute intervals. This defines $N = 327 \times 288 = 94176$ time intervals with 5-minute increments as $t_i = i$, where $i = 1, \cdots, N$. The number of bins is selected to be $M = 50$. The task is to find the $\tau_k$, $k = 1, \cdots, M$ division points covering the entire period of 94,176 time intervals. From Equations (1)-(3), $a = 327.0158$, $b = 1.2199$, and the $\tau_k$ time division points are evaluated. The most recent four values for $\tau_k$ are $\tau_{47, 48, 49, 50} = [323.8053, 324.9364, 326, 327]$.

The τ k divisions in day units from Equation (1) are plotted in

The focus is on the real-time evaluation of a continuous, discretized data stream. The time-base bins are designed to hold the newest sample unchanged and the characteristics of past data compressed, representative of the acquisition time of the cluster relative to the last, current time instant. There are several known ways to characterize past data using some method of averaging. For example, the conventional daily average of minute-acquired temperatures uses the integral mean value of the measured data, with the definite integral for each day approximated by its Riemann sum. Following this example, and assuming for simplicity a continuous, piecewise-linear function, $X_p(t)$, for representing the discretized data $X(t_i)$, that is, $X_p(t_i) = X(t_i)$ for $i = 1, \cdots, N$, the average data, $\overline{X_p}_k(t)$, belonging to each time bin may be defined as:

$$\overline{X_p}_k(t) = \frac{1}{\Delta\tau_k} \int_{\tau_{k-1}}^{\tau_k} X_p(t)\,dt, \quad t \in [\tau_{k-1}, \tau_k] \tag{4}$$

There are difficulties in using Equation (4) directly for the discretized data $X(t_i)$. The $\tau_k$ bin division points do not coincide with the $t_i$ time divisions except for bins $k = M - 1$ and $k = M$; therefore, $X_p(t)$ cannot simply be replaced by the $X(t_i)$ values within the time intervals $[\tau_{k-1}, \tau_k]$ without introducing rounding errors. In addition, a linear interpolation function fit for $X_p(t)$ is necessary but impractical, as it requires storing all original data, $X(t_i)$, which alone contradicts data compression. Therefore, Equation (4) is re-written in its moving-boundaries form, which accepts a constant $\Delta t = \Delta\tau_M$ time step change to account for the moving time window. The transition from the $\Delta\tau_k$ bin with average $\overline{X_p}_k(t)$ to the $\Delta\tau_{k+1}$ bin with average $\overline{X_p}_{k+1}(t)$ adds a difference value of $\int_{\tau_{k+1}}^{\tau_{k+1}+\Delta\tau_M} X_p(t)\,dt$ while leaving behind a difference value of $-\int_{\tau_k}^{\tau_k+\Delta\tau_M} X_p(t)\,dt$ in the integral $\overline{X_p}_k(t)\,\Delta\tau_k$. The sliding window expression for $\overline{X_p}_k(t + \Delta t)$ is:

$$\overline{X_p}_k(t + \Delta t) = \overline{X_p}_k(t) + \frac{1}{\Delta\tau_k}\int_{\tau_{k+1}}^{\tau_{k+1}+\Delta\tau_M} X_p(t)\,dt - \frac{1}{\Delta\tau_k}\int_{\tau_k}^{\tau_k+\Delta\tau_M} X_p(t)\,dt, \quad t \in [\tau_{k-1}, \tau_k] \tag{5}$$

For evaluating $\overline{X_p}_k(t + \Delta t)$ at the next time step, the two integrals in Equation (5) still require additional data to be stored at the beginning and the end of time bin k, but the many data already averaged inside bin k need not be stored individually, as their previous average value is reused in $\overline{X_p}_k(t)$. The shortcomings of using $\overline{X_p}_k$ may be alleviated by modifying its content, leading to a different property of the sliding-averaged data. The modifications that make the expression in Equation (5) less cumbersome to use are first introduced as approximations that replace the $X_p(t)$ kernel functions in the integrals with their integral mean values for the respective time bins:

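$$\int_{\tau_k}^{\tau_k+\Delta\tau_M} X_p(t)\,dt = \int_{\tau_k}^{\tau_k+\Delta\tau_M} \overline{X_p}_k(t)\,dt + \varepsilon_1 = \overline{X_p}_k(t)\,\Delta t + \varepsilon_1 \tag{6a}$$

$$\int_{\tau_{k+1}}^{\tau_{k+1}+\Delta\tau_M} X_p(t)\,dt = \int_{\tau_{k+1}}^{\tau_{k+1}+\Delta\tau_M} \overline{X_p}_{k+1}(t)\,dt + \varepsilon_2 = \overline{X_p}_{k+1}(t)\,\Delta t + \varepsilon_2 \tag{6b}$$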

Indeed, substituting Equations (6a) and (6b) into (5) gives an approximate expression for X p ¯ k ( t + Δ t ) that is easy to evaluate and effective in data compression, but includes the sum of two error terms, ε 1 + ε 2 :

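$$\overline{X_p}_k(t + \Delta t) = \overline{X_p}_k(t) - \overline{X_p}_k(t)\,\frac{\Delta t}{\Delta\tau_k} + \overline{X_p}_{k+1}(t)\,\frac{\Delta t}{\Delta\tau_k} + \varepsilon_1 + \varepsilon_2, \quad t \in [\tau_{k-1}, \tau_k] \tag{7}$$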

The need for a new, useful, average-type characteristic of the data stored in bin k is inspired by Equation (7) together with the goal of eliminating the $\varepsilon_1 + \varepsilon_2$ error term. The new data property is called the quantum of data in a time interval bin in sliding window coordinates, leading to the definition of the quantum of data.

D2. Definition of quantum of data in a time-base bin

Definition of the quantum of data, Q k ( t ) , in time-base bin k is given in a finite difference equation form as follows:

$$\frac{\Delta Q_k(t)}{\Delta\tau} = \frac{Q_{k+1}(t) - Q_k(t)}{\Delta\tau_k}, \quad \text{where } \Delta Q_k(t) = Q_k(t + \Delta\tau) - Q_k(t) \tag{8}$$

The quantum definition in Equation (8) expresses that the rate of change in quantum Q k at any time, t, over the finest time step, Δ τ = Δ t , is proportional to the rate of change of quantum differences between the upstream, Q k + 1 ( t ) , and downstream, Q k ( t ) quantum neighbors. It is straightforward to use Equation (8) step-by-step, starting from the Q k + 1 ( t ) = Q M ( t ) quantum that is known as it is always equated with the last, constant, sampled value of the data stream.

Applying the definition in Equation (8) for a discrete data series yields:

$$Q_k(i + 1) = Q_k(i)\left(1 - \frac{\Delta\tau}{\Delta\tau_k}\right) + Q_{k+1}(i)\,\frac{\Delta\tau}{\Delta\tau_k} \quad \text{for } k = 1, \cdots, M - 1 \tag{9}$$

The quantum property in Equation (9) is an improvement over the sliding window property in Equation (7), as the ambiguous error term, $\varepsilon_1 + \varepsilon_2$, is eliminated by the modified definition. The sliding window average is not as convenient a property to use as the sliding window quantum of data. By definition and design, $\overline{X_p}_k(i\Delta t) \neq Q_k(i)$, but their values may be close to each other. The essential difference, however, is that $Q_k(i)$ is efficiently calculated with superior data compression while serving well as a reliable data characteristic for system model applications with large data.

Note that the definition in Equation (9) is recursive: the $Q_k(i + 1)$ quantum value in bin k at time $(i + 1)\Delta t$ is defined by the weighted quantum value $Q_k(i)$ at the previous time $i\Delta t$ and the quantum value of the upstream neighbor bin, $Q_{k+1}(i)$. The upstream neighbor for $k = M - 1$ is $Q_M(i)$, the newest quantum value, which is the single origin for filling all bins downward with their quantum content according to Equation (9). $Q_M(t)$ may be selected as the original data, $X(t)$, taken at $t = i\Delta t$. This way, the quantum of data retains the physical unit of the original data everywhere.

A straightforward way to give closed formulas of Q k ( t ) , k = 1 , ⋯ , M − 1 for evaluating the quantum of data directly in each bin from the original data stream may be obtained by repeatedly applying Equation (9) starting from the known, new value of Q M ( i Δ t ) = X ( i Δ t ) toward Q 1 ( i Δ t + Δ t ) . However, a simple, matrix-vector equation is more convenient for numerical evaluation as shown in the following example.

E2. Example of bin-to-bin quantum of data transformation using matrix-vector calculation

Let the values of quantum $Q_k(i + 1)$ and $Q_k(i)$ for $k = 1, \cdots, M$ be organized into column vectors $Q^{i+1} = [Q_k^{i+1}]$ and $Q^i = [Q_k^i]$, respectively. Using Equation (9), the new vector elements $Q_k^{i+1}$ at time $(i + 1)\Delta t$ for $k = 1, \cdots, M - 1$ can be expressed with the previous vector elements $Q_k^i$ at time $i\Delta t$ for $k = 1, \cdots, M$ in a matrix-vector equation:

$$\begin{bmatrix} Q_1^{i+1} \\ Q_2^{i+1} \\ \vdots \\ Q_{M-1}^{i+1} \end{bmatrix} = A \begin{bmatrix} Q_1^i \\ Q_2^i \\ \vdots \\ Q_M^i \end{bmatrix} \tag{10}$$

where A is a sparse ( M − 1 ) × M matrix with zero elements everywhere except for non-zero elements only in the main diagonal and in the first, upper off-diagonal:

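$$\left.\begin{aligned} A(k, j) &= 0 && \text{for } k = 1, \cdots, M - 1;\ j = 1, \cdots, M,\ \text{but } j \neq k \text{ and } j \neq k + 1 \\ A(k, k) &= 1 - \frac{\Delta\tau}{\Delta\tau_k} && \text{for } k = 1, \cdots, M - 1 \\ A(k, k + 1) &= \frac{\Delta\tau}{\Delta\tau_k} && \text{for } k = 1, \cdots, M - 1 \end{aligned}\right\} \tag{11}$$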

The last element of vector $Q^{i+1}$ for $k = M$, not included in Equation (10), is defined by the new data, that is, $Q_M^{i+1} = X((i + 1)\Delta t)$.
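The bin-to-bin transport of E2 can be sketched with NumPy as follows. The function names and the four bin widths in the usage lines are hypothetical and chosen only for illustration; the matrix is filled with the row index k in both non-zero diagonals, as in the recursion of Equation (9).

```python
import numpy as np


def transport_matrix(d_tau, dt):
    """Build the sparse (M-1) x M matrix A from the bin widths d_tau[0..M-1].

    Row k holds 1 - dt/d_tau_k on the main diagonal and dt/d_tau_k on the
    first upper off-diagonal, i.e. the weights of the recursion in Eq. (9).
    """
    d_tau = np.asarray(d_tau, dtype=float)
    M = d_tau.size
    weights = dt / d_tau[:-1]          # dt / d_tau_k for k = 1..M-1
    A = np.zeros((M - 1, M))
    rows = np.arange(M - 1)
    A[rows, rows] = 1.0 - weights
    A[rows, rows + 1] = weights
    return A


def step_quantum(A, q, x_new):
    """Advance the quantum vector by one time step.

    Bins 1..M-1 follow the matrix-vector product; bin M takes the new sample.
    """
    return np.append(A @ q, x_new)


# Hypothetical example: four bins with widths 8, 4, 2, 1 and dt = 1
A = transport_matrix([8.0, 4.0, 2.0, 1.0], dt=1.0)
q_one_step = step_quantum(A, np.array([1.0, 2.0, 3.0, 4.0]), 5.0)

# Feeding a constant signal fills every bin with that constant over time
q = np.zeros(4)
for _ in range(200):
    q = step_quantum(A, q, 1.0)
```

Each row of `A` sums to one, so a constant data stream is preserved in every bin, which is consistent with the quantum retaining the physical unit and level of the original data.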

E3. Example of quantum of data vectors for a harmonic signal

A continuous, sinusoidal data stream of 327 days sampled at regular 5-minute time intervals is processed into 50-element quantum vectors. A synthetic data stream is selected in the example to model daily and yearly temperature variations superimposed according to

$$X(i) = \frac{1}{2}\left[\sin\left(\frac{327}{365}\,\frac{2\pi i}{N}\right) + \sin\left(\frac{2\pi\,327\,i}{N}\right)\right],$$

where the real-time vector is $i = [1, \cdots, N]$, the series of time divisions. The time compartmentalization in E1b into 50 bins is used for the $X(i) \to Q^i = [Q_k^i]$ transformation according to Equations (10) and (11). The 50 components of the $Q^i$ vectors are shown in Figures 2(a)-(d) with an arbitrary bin and time interval selection for best visualization. The selected time steps and $Q_k^i$ elements are shown in:

E4. Example of quantum of data vectors for measured data

A true, outside temperature data stream of 327 days sampled at regular 5-minute time intervals is accessed from a commercial weather data vendor for Northern Nevada, USA. The data is processed into 50-element quantum vectors using the same process described in E3. The time compartmentalization in E1b into 50 bins is used for the $X(i) \to Q^i = [Q_k^i]$ transformation according to Equations (10) and (11). The 50 components of the $Q^i$ vectors are shown in Figures 3(a)-(d) with an arbitrary bin and time interval selection for best visualization as before. As shown in

It is straightforward to expand the concept of the autoregressive (AR) model into a dynamic operator. The AR model of order p is defined following Shumway and Stoffer [

$$X(i) = c + \sum_{j=1}^{p} \varphi_j\,X(i - j) + \varepsilon(i) \tag{12}$$

where c is a constant, $\varphi_j$ are constant coefficients, and $\varepsilon(i)$ is noise. Applying the AR concept to the quantum of data with low-pass-filtered components instead of the original time series, and absorbing c into the $\varphi_j$ coefficients, leads to the definition of the dynamic operator.

D3. Definition of the z-step dynamic operator

The z-step dynamic operator of the system, $\phi^{i,z}$, is defined by its matrix. The matrix of the DQO is defined by the set of its $\varphi_{k,p}^{i,z}$ coefficients, which satisfies the simultaneous AR model fit of all elements of the modeled quantum vector, $Q_{k,m}^i$, to the measured origin, $Q_k^i$, with a minimized $\varepsilon(i)$ fitting error for a set of quantum vector samples $i \in S$ and for all $k \in M$ elements:

$$Q_k^i = \sum_{p=1}^{M} \varphi_{k,p}^{i,z}\,Q_p^{i-z} + \varepsilon(i) \tag{13}$$

where $\varepsilon(i) = \min\left[\sum_S\left[\sum_k\left(Q^i - Q_m^i\right)^2\right]\right]$, $i \in [1, N]$, $k \in M$, and $S \subset [1, N]$.

The Q k i modeled quantum vector component on the left side and the Q p i − z quantum vector component given at a shifted time instant by z number of time steps on the right side are the inputs of the model fitting procedure, derived from measured data. The [ φ k , p i , z ] coefficients on the right side of Equation (13) are to be evaluated by best fitting the model prediction, Q k , m i to Q k i input with minimum error for all k components.

Time step shift z is a parameter of choice to forward predict future outcome from previous measured values of the time series. Equation (13) must be applied for all k components simultaneously. Using a matrix notation for the dynamic operator of the system at time index i as ϕ i , z = [ φ k , p i , z ] , Equation (13) for all k components reads:

$$Q^i = \phi^{i,z}\,Q^{i-z} + \varepsilon(i), \tag{14}$$

where $\varepsilon(i) = \min\left[\sum_S\left[\sum_k\left(Q^i - Q_m^i\right)^2\right]\right]$, $i \in [1, N]$, $k \in M$, and $S \subset [1, N]$.

The $[\varphi_{k,p}^{i,z}]$ coefficients of the $\phi^{i,z}$ operator on the right side of Equation (14) must be evaluated from the $Q^i$ and $Q^{i-z}$ quantum vectors, processed from the measured data, using an optimization procedure that minimizes the error of fit, $\varepsilon(i)$.

The $\phi^{i,z}$ operator is assigned to time index i, where $t(i)$ is the current (most recent) time step. Each $\phi^{i,z}$ operator is determined over a subset of sampled time steps, S, as well as over M quantum vector components to incorporate past history data. Each $\phi^{i,z}$ operator characterizes the changing system with respect to time variation, focusing on z-step forward prediction. While operator $\phi^{i,z}$ has constant matrix coefficients, it may be considered as the sampled element of a dynamic, time-variable operator, $\phi^z(t)$. Each $\phi^{i,z}$ operator has an inherent matching error originating from two sources: the stochasticity of the data obtained from the unknown system and processed into quantum vectors $Q_m^i$, and the mismatch between the temporal characteristics of the system and the AR operator model, which enforces an autoregressive behavior.

D4. Definition of forward prediction from the dynamic operator of the system.

Equation (14) may be directly used for forward-predicting an expected, modeled quantum vector, Q m i , at time t ( i ) from a previous quantum vector, Q m i − z , processed from measured data at t ( i − z ) . Likewise, assuming the continuity of the ϕ i , z operator, forecast estimate may be written, jumping z steps from the most recent time t ( i ) , as:

$$Q_m^{i+z} = \phi^{i,z}\,Q^i \tag{15}$$

Alternatively, choosing $z_{train} = 1$ in identifying operator $\phi^{i,1}$ during model training, a z-step forward prediction estimate may be written by applying Equation (15) repeatedly z times, each step increasing the power index of $\phi^{i,1}$ by one until the power of the required forward steps, $z_{predict} = z$, is reached:

$$Q_m^{i+z} = \left(\phi^{i,1}\right)^z Q^i \tag{16}$$
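The two forecasting routes of D4 can be illustrated with a short NumPy sketch. The function names and the 2 × 2 operator in the usage lines are hypothetical and serve only to show that raising a one-step operator to the power z is the same as applying it z times.

```python
import numpy as np


def predict_direct(phi_z, q):
    """One jump of z steps with an operator trained for z steps, Eq. (15)."""
    return phi_z @ q


def predict_by_power(phi_1, q, z):
    """z-step forecast from a one-step operator raised to the power z, Eq. (16)."""
    return np.linalg.matrix_power(phi_1, z) @ q


# Hypothetical one-step operator and current quantum vector
phi_1 = np.array([[0.9, 0.1],
                  [0.0, 1.0]])
q_now = np.array([0.0, 1.0])

q_power = predict_by_power(phi_1, q_now, z=3)

# The same forecast obtained by applying the one-step operator three times
q_loop = q_now.copy()
for _ in range(3):
    q_loop = predict_direct(phi_1, q_loop)
```

Both routes give the same vector here because the operator is held constant over the forecast horizon, which is the continuity assumption stated for Equation (15).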

D5. Definition of a training data set for the solution of the dynamic operator of the system.

A training data set i ∈ S must be selected from the set of the Q m i quantum vectors for identifying the unknown ϕ i , z = [ φ k , p i , z ] coefficients in Equation (14). Set S is defined by the requirement for a unique solution for the elements of matrix ϕ i .

From elementary algebra, a minimum of M equations is needed for the solution of M unknown coefficients in an M-variable equation. For example, assuming hypothetically a zero error term, for $M = 50$ and $z = 1$, the quantum data set $S = [1, 51]$ would be sufficient to fill the left and right sides of Equation (14) and set up 50 equations for the evaluation of the coefficients:

$$\left[Q^{51}\ Q^{50}\ \cdots\ Q^{2}\right] = \phi^1 \left[Q^{50}\ Q^{49}\ \cdots\ Q^{1}\right] \tag{17}$$

The solution, provided that the inverse matrix [ Q 50 Q 49 ⋯ Q 1 ] − 1 exists, is:

$$\phi^1 = \left[Q^{51}\ Q^{50}\ \cdots\ Q^{2}\right] \left[Q^{50}\ Q^{49}\ \cdots\ Q^{1}\right]^{-1} \tag{18}$$

In reality, for the effective minimization of the fitting error term, a much larger input quantum set S is required. A least-square fit minimization scheme is devised by selecting a subset of time series input data, j ∈ S , S ⊂ [ 1 , N ] as follows:

$$\left[Q^j\right] = \phi^z \left[Q^{j-z}\right], \quad j \in S \subset [1, N] \tag{19}$$

where [ Q j ] and [ Q j − z ] are M × j matrices, j ≫ M .

Multiplying both sides of Equation (19) from the right by the transpose matrix $[Q^{j-z}]^T$, and then multiplying the result from the right by the inverse of the square matrix, $\{[Q^{j-z}][Q^{j-z}]^T\}^{-1}$, gives the LSQ solution for the over-determined set of equations, provided that the inverse exists:

$$\phi^z = \left[Q^j\right]\left[Q^{j-z}\right]^T \left\{\left[Q^{j-z}\right]\left[Q^{j-z}\right]^T\right\}^{-1}, \quad j \in S,\ S \subset [1, N] \tag{20}$$

The ϕ z is a matrix representation of the linear operator of the system applicable for dynamic, time-series analysis and prediction.

The solvability of Equation (20) defines the necessary training data set for the determination of the ϕ z DQO model. The solvability depends on the existence of the { [ Q j − z ] [ Q j − z ] T } − 1 inverse matrix.
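A minimal NumPy sketch of the LSQ solution in Equation (20) is given below. The function names, the 3 × 3 test operator, and the random sample matrix are hypothetical; the sketch only checks that a known operator is recovered from noise-free sample pairs when the Gram matrix is invertible.

```python
import numpy as np


def shifted_pairs(Q, z):
    """Split an M x n array of consecutive quantum vectors (one per column)
    into the matrices [Q^j] and [Q^(j-z)] of Eq. (19)."""
    return Q[:, z:], Q[:, :-z]


def lsq_operator(Q_now, Q_past):
    """Normal-equation LSQ solution of Eq. (20) for the operator matrix.

    Raises numpy.linalg.LinAlgError if the Gram matrix is singular, which
    is the solvability condition on the training set.
    """
    gram = Q_past @ Q_past.T               # square M x M matrix
    return Q_now @ Q_past.T @ np.linalg.inv(gram)


# Hypothetical check: recover a known operator from noise-free samples
rng = np.random.default_rng(0)
phi_true = np.array([[0.9, 0.2, 0.0],
                     [-0.2, 0.9, 0.1],
                     [0.0, 0.0, 0.95]])
Q_past = rng.normal(size=(3, 40))          # 40 samples, M = 3
Q_now = phi_true @ Q_past
phi_fit = lsq_operator(Q_now, Q_past)

# Column bookkeeping of the time shift for z = 1
Q_demo = np.arange(12.0).reshape(3, 4)
now_demo, past_demo = shifted_pairs(Q_demo, 1)
```

For poorly conditioned training sets, a solver such as `numpy.linalg.lstsq` applied to the transposed system may be numerically safer than forming the explicit inverse; that substitution is a general numerical practice, not a step taken from the text.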

E5. Illustrative example of a DQO model fit and forward prediction for weather

A true, outside temperature data stream of 327 days sampled at regular 5-minute time intervals is used in its quantum-processed form discussed in E4 for a model fitting and prediction exercise. At each of the $i = 1$ to $327 \times 288$ time steps, a separate DQO model is built using eight days as set S, with sliding window width $w = 8 \times 288 = 2304$. The goals of the exercise are to check the quality of: 1) the DQO model fit for each time step, measured by the normalized absolute error between input data and model prediction at each time step; and 2) the DQO forward prediction of $z = 12$ steps ahead at each time step, measured by the normalized absolute error between the known (but yet unused) input data at $i + z$ and the model forward prediction at the $i + z$ time step. The sliding time window moves from $i = 1$, starting from an initial assumption of all-zero history quantum values. The DQO model is trained to match the last 20 quantum components only (for $k \in [31, 50]$), as just a short memory of the system needs to be learned for a $z = 12$-step forward prediction.

After the 400 coefficients of the ϕ i , z matrix of the DQO model of Equation (14) are determined with the LSQ solution of Equation (20) at each i time step over the w = 2304 -step training window, the model prediction, Q m i + z , is calculated for quality check from the quantum-processed input data Q i taken at real-time instants as:

$$Q_m^{i+z} = \phi^{i,z}\,Q^i \tag{21}$$

The variation of the Q m i and Q i quantum vector components for the k ∈ [ 31 , 50 ] components for the last moving window segment for i ∈ S are shown in Figures 4(a)-(h), (i being used instead of j in the notation in the figure). The components of the Q m i and Q i vectors with time are shown in (a)-(g) for k ∈ [ 44 , 50 ] (with each individual pair and k marked); and in (h) for k ∈ [ 31 , 43 ] (with only each k marked as no difference between Q m i and Q i can be seen). Note that

The forward-predicting capability of the DQO model is tested by evaluating forecasted outputs, Q m i + z from previous known values, Q i . Using Equation (21), the model’s output, Q m i + z , is calculated at each future time by z = 12 time steps outside the training time window, while using a 12-step-old DQO. The forecasted results, Q m i + z are compared with the known future values, Q i + z , not used in the DQO model training.

The absolute error of the model fit for each time step, normalized by dividing it with the average of the absolute values over each sliding window of w = 2304 is calculated as E ( i ) for i = 1 to 327 × 288 time step (327 days):

$$E(i) = \frac{w\left(Q_m^i - Q^i\right)}{\sum_{j=1}^{w}\left|Q^{i-j+1}\right|} \times 100\ [\%] \tag{22}$$

The variation of E ( i ) over the 327 × 288 time steps is shown in

The normalized absolute error of the model fit at forward predicted instances by z time steps for each time step over each sliding window w = 2304 is calculated as E z ( i ) :

$$E_z(i) = \frac{w\left(Q_m^{i+z} - Q^{i+z}\right)}{\sum_{j=1}^{w}\left|Q^{i-j+z+1}\right|} \times 100\ [\%] \tag{23}$$
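The normalization in Equations (22) and (23) can be sketched for a single quantum component as follows. The function name and the sample values are hypothetical, and treating the error one component at a time is an assumption made for illustration.

```python
import numpy as np


def normalized_error_percent(q_model, q_meas, w):
    """Error of Eq. (22) for one quantum component, in percent.

    The difference q_model - q_meas at step i is divided by the mean of
    |q_meas| over the trailing window of w steps ending at i. Values are
    returned for the steps where a full window exists (0-based i >= w - 1).
    """
    q_model = np.asarray(q_model, dtype=float)
    q_meas = np.asarray(q_meas, dtype=float)
    # Trailing-window mean of |Q|, i.e. (1/w) * sum_{j=1..w} |Q^(i-j+1)|
    mean_abs = np.convolve(np.abs(q_meas), np.ones(w) / w, mode="valid")
    return 100.0 * (q_model[w - 1:] - q_meas[w - 1:]) / mean_abs


# Hypothetical values with a window of w = 2 steps
E = normalized_error_percent([1.0, 2.0, 3.0, 5.0], [1.0, 2.0, 3.0, 4.0], w=2)
```

Equation (23) has the same form with both series shifted by z steps, so the same function can be called with the forecast $Q_m^{i+z}$ and the later measured $Q^{i+z}$ values aligned on a common index.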

The graph of E z ( i ) and its histogram are shown in

A comparison between

The DQO model is developed for analyzing and controlling atmospheric conditions for safety and health in working and living environments. As demonstrated in E5, a DQO model can be identified and used for forecasting with minimal cost and effort, adding value to the raw data. The hypothesis is that a significant amount of precious time may be gained for preventive interventions to alleviate impending hazard conditions at any monitored living or working place. The hypothesis is tested in a mine safety and health application example.

Atmospheric conditions are obtained from in situ, monitored data from an operating mine for 327 days under normal operating conditions. The monitored parameters are the air flow rate in the face drift (Qa), the incoming Methane (CH_{4}) gas concentration at the main gate (c_{MG}), and the exiting Methane concentration at the tail gate (c_{TG}). A synthetic data modification is introduced on Day 322 by an added Methane source (qms) surge that increases the CH_{4} concentration above the allowable threshold value of 2%. The goal is to forecast the effect of the qms gas inburst as well as the resulting CH_{4} concentration by the DQO model for preventive intervention before the conditions for a fatal explosion may develop.

E6. Illustrative example of a DQO model fit and forecast using large forward steps

The monitored parameters of air flow rate, Qa, incoming Methane gas concentration at the main gate, c_{MG}, and exiting concentration at the tail gate, c_{TG}, are inter-related. A transport model is used first for back calculating the root-cause gas source term, qm, from the observed incoming and exiting gas concentrations. A simplified Methane mass balance transport equation is used for the working drift:

$$c_{TG} = c_{MG} + 100\,q_m / q_a \quad [\%] \tag{24}$$
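The back-calculation of the source term from Equation (24) amounts to a rearrangement, sketched below. The function names and the numerical values are hypothetical and are not taken from the mine data.

```python
def source_term_percent(c_tg, c_mg, q_a):
    """Back-calculate q_mp = 100 q_m from Eq. (24).

    c_tg and c_mg are the tail-gate and main-gate concentrations in percent,
    q_a is the air flow rate; q_mp carries the unit of q_a times percent.
    """
    return (c_tg - c_mg) * q_a


def tailgate_concentration(c_mg, q_mp, q_a):
    """Forward form of Eq. (24): c_TG = c_MG + q_mp / q_a, in percent."""
    return c_mg + q_mp / q_a


# Hypothetical values: 0.2 % in, 0.8 % out, air flow rate of 30 units
q_mp = source_term_percent(c_tg=0.8, c_mg=0.2, q_a=30.0)
c_tg_check = tailgate_concentration(c_mg=0.2, q_mp=q_mp, q_a=30.0)
```

The forward form is what converts a forecast of the source term back into a tail-gate concentration that can be compared against the 2% threshold.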

From the monitored data of the c_{MG}, c_{TG}, and qa variables, the q m p = 100 q m term is processed for DQO model building and forward prediction. The sampled values of qmp are first processed into quantum vectors. The DQO model is built for each 5-minute time step of 288 × 322 time intervals. Note that a 2-day interval is reserved beyond the model-building time period of 360 missing the last two days for model. At each of the i = [ 1 , 327 × 288 ] time steps, a separate DQO model is built using three days for sliding window width, w = 3 × 288 = 864 as set S. The goals of the exercise are to check: 1) the quality of DQO model fit for each time step, measured by the normalized absolute error between input data and model prediction at each time step; 2) the quality of DQO forward prediction at steps of z = 36 ahead, measured by the normalized absolute error between the known (but yet unused) input data at i + z and the model forward prediction at i + z time step; and 3) the hypothesis that CH_{4} threshold crossing can be detected many time steps ahead of the real-time occurrence from previous measured data. The sliding time window moves from i = 1 , starting from an initial assumption of all zero history quantum values. The DQO model is trained to match the last 20 quantum components only (for k ∈ [ 31 , 50 ] ) as just a short memory of the system is needed to learn for a z = 36 -step forward prediction.

The coefficients of the DQO model of Equation (14) are determined with the LSQ solution of Equation (20) at each time step i over the $w = 864$-step training window. The model prediction, $Q_m^i$, is calculated for quality check from the quantum-processed input data $Q^{i-z}$ according to Equation (21), using $z = 36$. The variation of the $Q_m^i$ and $Q^i$ quantum vector components for $k \in [31, 50]$ over the last moving window segment, $i \in S$, is shown in Figures 8(a)-(h) (i being used instead of j in the notation in the figure). The components of the $Q_m^i$ and $Q^i$ vectors with time are shown in (a)-(g) for $k \in [44, 50]$ (with each individual pair and k marked), and in (h) for $k \in [31, 43]$. As shown in Figures 8(a)-(h), the match between the DQO model’s output results, $Q_m^i$, and the input data, $Q^i$, gradually improves toward the lower-frequency components at decreasing k values.

The forward-predicting capability of the DQO model is tested by evaluating forecasted outputs, $Q_m^{i+z}$, from previous known values, $Q^i$. Using Equation (15), the model’s output, $Q_m^{i+z}$, is calculated at each future time, $z = 36$ time steps outside the training time window, while using a 36-step-old DQO. The forecasted results, $Q_m^{i+z}$, are compared with the known future values, $Q^{i+z}$, not used in the DQO model training.

The absolute error of the model fit, $E(i)$, for each of the $327 \times 288$ time steps, normalized according to Equation (22), is shown in

The time gain by using the 36-step forward predicting DQO model against the real-time input data is directly evaluated. The temporal Methane concentration variation, c_{TG}, in quantum vector form is back-calculated from the modeled $Q_m^{i+36}$ prediction for the 100qm Methane source term. The comparison between the $Q_m^{i+36}$ (forward modeled) and $Q^i$ (measured) concentrations is depicted in

E7. Illustrative example of a DQO model fit and forecast using repeated, small forward steps

The same input data and contaminant gas transport system are used in a demonstration example for the same task, but with the forward prediction step refined to z = 1, applying Equation (16). The goal is to forecast the effect of the qms gas inburst with the DQO model for preventive intervention before a fatal explosive condition may develop. The DQO model training step is reduced to z_{train} = 1, while z_{predict} = z = 20, chosen by experimentation, is used for forward prediction to achieve a result similar to that of the example in E6. The shortest forward step in training allows reducing the training window to w = 360 without destabilizing model training. As before, the DQO model is trained to match only the last 20 quantum components, given the short-lived memory of the gas transport system.

The DQO model is determined with the LSQ solution of Equation (20) at each time step i over the w = 360-step training window. The model prediction, Q_{m}^{i}, is calculated for a quality check from the quantum-processed input data, Q^{i−1}, and the power index formula of Equation (16) is applied for the model forecast, using z = 20:

Q_{m}^{i+z} = (ϕ^{i,1})^{z} Q^{i} (25)

The variation of the Q_{m}^{i} and Q^{i} quantum vector components for k ∈ [31, 50] over the last moving window segment, i ∈ S, is shown in Figures 14(a)-(h) (i is used instead of j in the figure notation).

The components of the Q_{m}^{i} and Q^{i} vectors with time are shown in (a)-(g) for k ∈ [44, 50] (with each individual pair and k marked), and in (h) for k ∈ [31, 43]. As shown in Figures 14(a)-(h), the match between the DQO model output, Q_{m}^{i}, and the input data, Q^{i}, is excellent for all frequency components over all k values.

The forward-predicting capability of the DQO model is tested by evaluating the forecasted outputs, Q_{m}^{i+z}, from previously known values, Q^{i}. Using Equation (25), the model output, Q_{m}^{i+z}, is calculated at each future time, z = 20 time steps outside the training time window, while using a 1-step-old DQO matrix, ϕ^{i,1}, raised to the power index z = 20. The forecasted results, Q_{m}^{i+z}, are compared with the known future values, Q^{i+z}, which are not used in the DQO model training.
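The repeated small-step forecast can be sketched as follows. The sketch assumes that the one-step operator is a square matrix fitted by ordinary least squares and that the z-step forecast applies its z-th matrix power to the current vector, as the power index formula suggests. The helper names and the synthetic data are hypothetical and are not the author's implementation.

```python
import numpy as np

def fit_one_step_operator(Q_window):
    """Least-squares estimate of phi1 with Q[i] ≈ phi1 @ Q[i - 1]."""
    X, Y = Q_window[:-1], Q_window[1:]
    phi_T, *_ = np.linalg.lstsq(X, Y, rcond=None)   # solves Y ≈ X @ phi1.T
    return phi_T.T

def forecast_by_power(phi1, q_now, z):
    """Apply the one-step operator z times via its matrix power."""
    return np.linalg.matrix_power(phi1, z) @ q_now

# Synthetic stand-in for quantum vectors: a slowly rotating 2-component state.
theta = 0.05
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Q = np.empty((120, 2))
Q[0] = [1.0, 0.0]
for i in range(1, len(Q)):
    Q[i] = R @ Q[i - 1]

w, z, i_now = 60, 20, 90                            # window width, forecast steps, current step
phi1 = fit_one_step_operator(Q[i_now - w + 1:i_now + 1])
q_pred = forecast_by_power(phi1, Q[i_now], z)       # forecast for step i_now + z
forecast_error = np.abs(q_pred - Q[i_now + z]).max()
print(forecast_error)
```

One design consequence of this form is that any error in the one-step operator is compounded z times by the matrix power, so the forecast horizon that remains stable depends on how well the one-step fit is conditioned.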

The absolute error of the model fit, E(i), for each of the 327 × 288 time steps, normalized according to Equation (22), is shown in

The DQO model performance is analyzed for the critical, disturbed day in forecasting a CH_{4} surge that surpasses the threshold value of 2% due to a synthetically induced bump during day 325. The time gain from using the 20-step forward-predicting DQO model against the real-time input data is directly evaluated. The temporal methane concentration variation, c_{TG}, in quantum vector form is back-calculated from the modeled Q_{m}^{i+20} prediction for the 100qm methane source term. The comparison between the Q_{m}^{i+20} (forward modeled) and Q^{i} (measured) concentrations is depicted in

A dynamic model identification method is described with the definition of quantum vectors representing a time series of data, X(t_{i}). Definitions and examples are given for the time compartmentalization into bins; the data processing and distribution into bins; and the transport of the quantum of data between the bins during step-by-step sliding from the most recent to the past time periods. Various data compression methods are shown for comparison of characteristics, including the common sliding time window averaging and a new property named the “moving window quantum of data”. The moving window quantum value in each bin is defined from the contained X(t_{i}) data for constructing a set of base vectors of the dynamic operator of the system. It is shown that the quantum vector form, which retains past and present data characteristics, is most advantageous for time series analysis of short-time and long-time memory effects of the modeled system, as the data are efficiently compressed from tens of thousands of recorded numbers into only fifty elements without losing pertinent information.

The compressed form of data in quantum vectors is used as the linear space for building a DQO model, ϕ^{i,z}, of the system at every time step of a real-time process. The definition of the ϕ^{i,z} operator and its training data set, as well as the solution for its identification from input data, are given in mathematical form.

Forward prediction is defined using the ϕ^{i,z} operator, as it inherently includes a time step z for model identification. As shown in three illustrative examples, the quality of model identification of ϕ^{i,z} foretells the error in forward prediction, a useful feature in practical applications. A steady DQO model performance within about ±20% normalized error for up to a 1-hour forward-step forecast is shown in the E5 outside weather temperature example, making the method appealing, especially in comparison to published results for LSTM NN models with poorer forward prediction performance. In addition to excellent stability, the computational time for DQO model identification and forward prediction at each time step is 18 milliseconds on a laptop computer.

Two additional examples are shown for analyzing and controlling atmospheric conditions for safety and health in working and living spaces. DQO models are identified and used for forecasting methane concentration variations from monitored data. A hypothesis is tested regarding the time advantage that may be gained by DQO model prediction and used for preventive interventions to alleviate impending hazard conditions at any monitored living or working place. The hypothesis is tested quantitatively, using two forward-prediction algorithms considered for mine safety and health applications.
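The time advantage can be quantified by comparing when the forecast and the measurement first exceed the threshold. The sketch below uses made-up concentration values and an idealized forecast that reproduces the future exactly, so it shows only how the lead is counted and gives no indication of real model performance. The function names, the data, and the horizon z = 3 are hypothetical.

```python
def first_crossing(series, threshold):
    """Index of the first sample above the threshold, or None if there is none."""
    for i, value in enumerate(series):
        if value > threshold:
            return i
    return None

def lead_steps(measured, forecast, threshold):
    """Steps by which a forecast-based alarm precedes the measurement-based alarm.

    forecast[i] is the value predicted at step i for a later step.
    """
    i_measured = first_crossing(measured, threshold)
    i_forecast = first_crossing(forecast, threshold)
    if i_measured is None or i_forecast is None:
        return None
    return i_measured - i_forecast

# Made-up CH4 concentrations in percent, sampled every 5 minutes.
measured = [1.0, 1.0, 1.0, 1.0, 1.2, 1.5, 1.9, 2.3, 2.6, 2.4]
z = 3                                    # forecast horizon in steps (hypothetical)
forecast = measured[z:]                  # idealized: forecast[i] equals measured[i + z]

lead = lead_steps(measured, forecast, threshold=2.0)
lead_minutes = lead * 5
print(lead, lead_minutes)
```

With an exact forecast the lead equals the horizon z. With a real model the lead may be shorter or longer, depending on the forecast error near the threshold.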

• A new method is presented for AR time series analysis of a real-time, continuous data stream.

• A new type of data compression, using data quantum vectors, is developed and implemented for practical applications.

• A new type of DQO model-building and identification method is described.

• Three numerical application examples are shown using real-world input data for DQO model identification. Performance metrics of the DQO model are demonstrated in forward prediction applications.

• The hypothesis of a significant time gain is affirmed by forward prediction using the DQO model in the race for preventive interventions to counter impending hazard events in atmospheric conditions.

A research grant from the Alpha Foundation for Mine Safety and Health is gratefully acknowledged. The research was supported by the GINOP-2.3.2-15-2016-00010 “Development of Enhanced Engineering Methods with the Aim at Utilization of Subterranean Energy Resources” project of the Research Institute of Applied Earth Sciences of the University of Miskolc in the framework of the Széchenyi 2020 Plan, funded by the European Union, co-financed by the European Structural and Investment Funds.

The author declares no conflicts of interest regarding the publication of this paper.

Danko, G. (2021) Quantum Operator Model for Data Analysis and Forecast. Applied Mathematics, 12, 963-992. https://doi.org/10.4236/am.2021.1211064