Feasibility Study of Parameter Identification Method Based on Symbolic Time Series Analysis and Adaptive Immune Clonal Selection Algorithm

The feasibility of a parameter identification method based on symbolic time series analysis (STSA) and the adaptive immune clonal selection algorithm (AICSA) is studied. Symbolizing the data with STSA alleviates the effects of harmful noise in raw acceleration data. The effect of the STSA parameters is evaluated theoretically and verified numerically. AICSA is employed to minimize the error between the state sequence histograms (SSHs) that STSA derives from the measured and the model-predicted raw acceleration data. The proposed methodology is evaluated by comparing it with AICSA using raw acceleration data. AICSA combined with STSA is shown to be a powerful tool for identifying unknown parameters of structural systems, even when the data are contaminated with relatively large amounts of noise.


Introduction
Structural health monitoring (SHM), which aims to predict the onset of damage and deterioration in building structures, is receiving increasing attention because of the growing number of aging structures and the high costs caused by unpredictable hazards.
Some success has been achieved with various heuristic optimization algorithms, such as genetic algorithms (GAs), evolution strategies (ES), simulated annealing (SA), particle swarm optimization (PSO), the clonal selection algorithm (CSA), and differential evolution (DE). These heuristic stochastic search techniques appear to be a promising alternative to traditional approaches. SA and GA methods have been used to accurately describe the dynamic behavior of structures [1]. Cunha and Smith used GAs to identify the elastic constants of composite materials [2]. PSO has been used to estimate damage severity and to identify the parameters of shear frame building structures [3]. An improved CSA, called the adaptive immune CSA (AICSA), has been used for structural damage localization and quantification [4,5]. Moreover, DE has been applied to induction motor identification problems [6] and to structural system identification [7]. Although these heuristic approaches are powerful in many applications, they are often sensitive to noise. Symbolic time series analysis (STSA), developed for anomaly detection in complex systems [8], has the potential to deal with noise. Several case studies [9][10][11] have shown that STSA is more effective at anomaly detection than pattern recognition techniques such as principal component analysis and neural networks. STSA has also been used for fault detection in electromechanical systems, such as three-phase induction motors [12] and helical gearboxes in rotorcraft [13].
We studied the feasibility of using the Euclidean distance between state sequence histograms (statistical features of the symbol series obtained from time series data) as the objective function of AICSA for identifying structural parameters. We theoretically investigated the effects of the STSA parameters and conducted various numerical tests to show how combining AICSA and STSA improves the performance of structural parameter identification. The results show that, with properly chosen parameters, our methodology is a reliable and effective way of identifying structural parameters.

Symbolic Time Series Analysis
It may be appropriate to say that, while classical data analysis focuses on individuals, symbolic data analysis deals with concepts, a less specific type of information. Through symbolic conversion, the original time series signals are converted into sequences of discrete symbols, and the statistical features of these symbols can be used to describe the dynamic states of a system. Consider a structural system whose raw acceleration response is recorded by sensors. A section of this data, $\{x_0, x_1, \ldots, x_{T-1}\}$, is obtained by sliding a rectangular window of length $T$ along the raw acceleration time series. The first step is to transform the raw acceleration data into a binary symbol series $\{S_0, S_1, \ldots, S_{T-1}\}$, where each $S_i$ equals "0" or "1" depending on which side of a partition line the sample falls. Next, we select an integer $r$ (the word length) and define the symbolic state at time $t$ as the vector $\mathbf{s}_t = (S_t, S_{t+1}, \ldots, S_{t+r-1})$ of $r$ consecutive symbols; these vectors form a state series $\{\mathbf{s}_0, \mathbf{s}_1, \ldots, \mathbf{s}_{T-r}\}$. Each binary-coded $\mathbf{s}_t$ is transformed into the decimal domain, so $\mathbf{s}_t$ can take $2^r$ possible values (called "states"), which form the finite set $\Omega = \{0, 1, \ldots, 2^r - 1\}$. We can then derive the statistics of the symbolic states, i.e., compute the vector of observed state frequencies $D = (n_0, n_1, \ldots, n_{2^r-1})$, where the integer $n_i$ is the number of occurrences of state $i$. Since the state series contains $T - r + 1$ states in total, $D$ can be normalized as $\bar{D} = D/(T - r + 1)$, so that its entries sum to 1.

In the example shown in Figure 1, the window and the sampling points of a raw acceleration data series are shown, the samples appearing as small circles with different values; the x-axis is time and the y-axis is acceleration. The partition line is drawn at the mean value of the raw acceleration data series, which separates the whole space into two regions. Acceleration data that falls in the upper region is symbolized by "1"; otherwise, it is symbolized by "0". The result of the symbolization is a binary-coded symbol series that contains only "0" and "1". In this example, a word length of 3 was used, which means that the first three symbols, "1 0 0", form the first word, and the second to fourth symbols, "0 0 0", form the second word. Repeating this procedure creates 24 words from the symbol series. Every binary-coded word is then transformed into the decimal domain. Taking the first word as an example, "1 0 0" becomes $1 \cdot 2^2 + 0 \cdot 2^1 + 0 \cdot 2^0 = 4$, which is called a "state". Once all the words have been transformed from the binary to the decimal domain, we obtain a state series whose values range from 0 to 7.
As Figure 1 shows, different states occur in the state series different numbers of times. A bar graph of the number of occurrences of every state in a state series is called a "state sequence histogram" (SSH). The corresponding SSH for this example is plotted in Figure 2(a). Taking state "5" as an example, the corresponding count is 3, meaning that state "5" occurs three times in the state series (as marked in the state series of Figure 1). The SSH can also be normalized by dividing the number of occurrences of each state by the total number of states in the state series (Figure 2(b)).
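To make the symbolization steps concrete, the following Python sketch builds an SSH from one window of acceleration samples. It assumes, as in the Figure 1 example, that the partition line is the window mean and that words slide forward one symbol at a time. The function names (`symbolize`, `state_series`, `ssh`) are illustrative and not taken from the original implementation.

```python
import numpy as np


def symbolize(window: np.ndarray) -> np.ndarray:
    """Map each sample to 1 if it lies above the partition line (the window mean), else 0."""
    return (window > window.mean()).astype(int)


def state_series(symbols: np.ndarray, r: int) -> np.ndarray:
    """Slide a word of length r one symbol at a time and read each word as a binary number."""
    weights = 2 ** np.arange(r - 1, -1, -1)  # e.g. r = 3 -> [4, 2, 1]
    n_words = len(symbols) - r + 1            # T - r + 1 words in a window of T symbols
    return np.array([int(symbols[t:t + r] @ weights) for t in range(n_words)])


def ssh(window: np.ndarray, r: int, normalize: bool = True) -> np.ndarray:
    """State sequence histogram: how often each of the 2**r states occurs."""
    states = state_series(symbolize(window), r)
    counts = np.bincount(states, minlength=2 ** r)
    return counts / counts.sum() if normalize else counts


# Usage on a synthetic window of T = 3000 samples with word length r = 9.
rng = np.random.default_rng(0)
window = rng.standard_normal(3000)  # stand-in for raw acceleration data
hist = ssh(window, r=9)
print(hist.shape, hist.sum())       # 512 states; normalized frequencies sum to 1
```

For instance, the symbol series 1 0 0 0 1 0 1 with r = 3 yields the states 4, 0, 1, 2, 5, following the binary-to-decimal rule described above.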

Procedure
In the research field of structural parameter identification, the time response of the system is usually compared with that of a parameterized model using a norm or some performance criterion to give us a measure of how well the model explains the system.
We explain our methodology (Figure 3) using a physical system with an input and an output $y$. Let $y(t_i)$ denote the output of the actual system at the $i$th discrete time step $t_i$. Suppose that a parameterized model able to capture the behavior of the physical system has been developed and that this model depends on a set of $n$ parameters, i.e., $\mathbf{x} = (x_1, x_2, \ldots, x_n)^{\mathsf{T}}$. Given a candidate parameter vector $\mathbf{x}$ and an initial guess $\mathbf{x}_0$, the output of the parameterized model, i.e., of the identified system, at the $i$th discrete time step, $\hat{y}(\mathbf{x}, t_i)$, can be obtained. Hence, the problem of system identification boils down to finding a set of parameters that minimizes the prediction error between the system output $y(t_i)$, which is the measured data, and the model output $\hat{y}(\mathbf{x}, t_i)$, which is calculated at each time instant $t_i$.

Usually, our interest lies in minimizing a predefined error norm of the time series outputs, e.g., the following mean square error (MSE) function:

$$f(\mathbf{x}) = \frac{1}{N_s} \sum_{i=1}^{N_s} \left\| y(t_i) - \hat{y}(\mathbf{x}, t_i) \right\|^2 \qquad (1)$$

where $N_s$ is the number of time steps and $\|\cdot\|$ denotes the Euclidean norm of vectors. Formally, the optimization problem requires one to find a set of $n$ parameters $\mathbf{x} \in \mathbb{R}^n$ such that a certain quality criterion is satisfied, namely, that the error norm $f(\mathbf{x})$ is minimized:

$$\mathbf{x}^{*} = \arg\min_{\mathbf{x} \in \mathbb{R}^n} f(\mathbf{x}) \qquad (2)$$

The function $f(\mathbf{x})$ is called a fitness function or objective function. Typically, an objective function that reflects the goodness of the solution is chosen.
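A direct transcription of the MSE objective described above might look as follows. The array layout (time steps along the first axis, one column per output) is an assumption. In a full identification loop, `y_model` would come from simulating the model with a candidate parameter vector.

```python
import numpy as np


def mse(y_measured, y_model) -> float:
    """Mean over time steps of the squared Euclidean norm of the output error.

    Rows are time steps and columns are output channels; a 1-D input is
    treated as a single output channel.
    """
    e = np.asarray(y_measured, dtype=float) - np.asarray(y_model, dtype=float)
    if e.ndim == 1:
        e = e[:, None]  # one output channel -> column vector per time step
    return float(np.mean(np.sum(e ** 2, axis=1)))


print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # one error of 2.0 over 3 steps
```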
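In our methodology, we introduce an index, the relative state sequence histogram error (RSSHe), to measure the distance between SSH$_a$ and SSH$_b$, the SSHs of the system output and the model output, respectively. It is defined as

$$\mathrm{RSSHe} = \frac{\sqrt{\sum_{i=0}^{2^r-1} \left(p_i^{a} - p_i^{b}\right)^2}}{\sqrt{\sum_{i=0}^{2^r-1} \left(p_i^{a}\right)^2}} \qquad (3)$$

where $p_i^{a/b}$ is the frequency of state $i$ in SSH$_a$ or SSH$_b$, respectively.
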
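Because RSSHe is a relative Euclidean distance between two normalized histograms, it can be sketched in a few lines of Python. The sketch assumes normalization by the norm of SSH$_a$ (the measured side). It also evaluates the two limiting cases used in the parameter-selection guideline, for which this form gives $\sqrt{2^r-1}$.

```python
import numpy as np


def rsshe(p_a, p_b) -> float:
    """Relative SSH error ||p_a - p_b||_2 / ||p_a||_2 (assumed form).

    p_a: normalized SSH of the measured (system) output.
    p_b: normalized SSH of the model output.
    """
    p_a = np.asarray(p_a, dtype=float)
    p_b = np.asarray(p_b, dtype=float)
    return float(np.linalg.norm(p_a - p_b) / np.linalg.norm(p_a))


r = 3
uniform = np.full(2 ** r, 1.0 / 2 ** r)  # limiting case 1: all states equally frequent
single = np.zeros(2 ** r)
single[0] = 1.0                          # limiting case 2: one state only
print(rsshe(uniform, single))            # sqrt(2**r - 1), i.e. sqrt(7) for r = 3
print(rsshe(uniform, uniform))           # identical histograms give 0
```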
Inspired by the clonal selection principle (CSP), the clonal selection algorithm (CSA) has been used to deal with optimization problems because its search capability is superior to that of classical optimization techniques [14].
Although CSA has great advantages over the GA, it is still difficult to apply to complex problems. To address this, AICSA uses three strategies, namely secondary response, adaptive mutation regulation, and vaccination, to improve the convergence speed and global optimum search of CSA. For detailed information about AICSA, please refer to [4,15].
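The clonal selection idea can be illustrated with a deliberately simplified minimizer in Python, loosely modeled on CLONALG. This is not the AICSA of [4,15]: the secondary response, adaptive mutation regulation, and vaccination strategies are omitted, and all parameter names and default values below are assumptions chosen for illustration.

```python
import numpy as np


def clonal_selection(f, lower, upper, pop_size=20, n_best=5, n_clones=10,
                     n_random=4, generations=100, seed=0):
    """Minimize f over the box [lower, upper] with a basic clonal selection loop."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim, span = lower.size, upper - lower
    pop = lower + rng.random((pop_size, dim)) * span        # initial antibodies
    for _ in range(generations):
        fit = np.array([f(x) for x in pop])
        pop = pop[np.argsort(fit)]                            # best (lowest f) first
        clones = []
        for rank in range(n_best):
            # Hypermutation: better-ranked antibodies get smaller Gaussian steps.
            step = 0.1 * (rank + 1) / n_best * span
            c = pop[rank] + rng.normal(size=(n_clones, dim)) * step
            clones.append(np.clip(c, lower, upper))
        pool = np.vstack([pop] + clones)                      # parents + clones (elitist)
        pool_fit = np.array([f(x) for x in pool])
        survivors = pool[np.argsort(pool_fit)[: pop_size - n_random]]
        # Random replacement: a few newcomers keep the search diverse.
        newcomers = lower + rng.random((n_random, dim)) * span
        pop = np.vstack([survivors, newcomers])
    fit = np.array([f(x) for x in pop])
    best = int(np.argmin(fit))
    return pop[best], float(fit[best])


def shifted_sphere(x):
    """Toy objective with its minimum at (0.3, 0.3)."""
    return float(np.sum((np.asarray(x) - 0.3) ** 2))


x_best, f_best = clonal_selection(shifted_sphere, [0.0, 0.0], [1.0, 1.0])
print(x_best, f_best)
```

In the identification problem, `f` would be the RSSHe between the measured SSH and the SSH of the model simulated with candidate stiffnesses, and `lower`/`upper` would bound the stiffness search range.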

Guideline for Parameter Selection
In STSA, the main parameters are the word length and the window length, which control the resolution of the whole representation space. For a window length $T$ and word length $r$, two limiting cases of the SSH are defined as follows. Case 1: all states in the SSH are distributed uniformly, and the frequency of each state is $1/2^r$. Case 2: only one state in the SSH has a frequency of 1, and the frequencies of all other states are 0.
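Suppose there are two different SSHs, SSH$_a$ and SSH$_b$. From Equation (3), when SSH$_a$ corresponds to limiting case 1 and SSH$_b$ to limiting case 2, RSSHe reaches its maximum value:

$$\mathrm{RSSHe}_{\max} = \sqrt{2^r - 1} \qquad (4)$$

When SSH$_a$ and SSH$_b$ are identical, RSSHe reaches its minimum value of 0. Then,

$$0 \le \mathrm{RSSHe} \le \sqrt{2^r - 1} \qquad (5)$$

The minimum changeable unit in an SSH is $1/(T-r+1)$, i.e., one occurrence of one state. Because the frequencies sum to 1, a change in the frequency of one state is necessarily accompanied by a change in the frequencies of other states. Supposing that SSH$_a$ and SSH$_b$ differ by only two minimum units (one state gains an occurrence and another loses one), the minimum distinguishable RSSHe is:

$$\Delta\mathrm{RSSHe} = \frac{\sqrt{2}}{(T-r+1)\sqrt{\sum_{i}\left(p_i^{a}\right)^2}} \qquad (6)$$

When SSH$_a$ is limiting case 1, $\sum_i (p_i^a)^2 = 1/2^r$, and the minimum distinguishable RSSHe takes its largest value:

$$\Delta\mathrm{RSSHe}_{\max} = \frac{\sqrt{2^{r+1}}}{T-r+1} \qquad (7)$$

When SSH$_a$ is limiting case 2, $\sum_i (p_i^a)^2 = 1$, and it takes its smallest value:

$$\Delta\mathrm{RSSHe}_{\min} = \frac{\sqrt{2}}{T-r+1} \qquad (8)$$

The resolution therefore satisfies:

$$\frac{\sqrt{2}}{T-r+1} \le \Delta\mathrm{RSSHe} \le \frac{\sqrt{2^{r+1}}}{T-r+1} \qquad (9)$$

Note that we also need to consider the number of possible distributions of states in one SSH. If the number of states in the SSH is $2^r$ and the minimum changeable unit is $1/(T-r+1)$, finding the total number of possible distributions $N_{\mathrm{SSH}}$ of the SSH boils down to the classic combinatorial problem of putting $T-r+1$ identical balls into $2^r$ distinct boxes. The combinatorial number is:

$$N_{\mathrm{SSH}} = \binom{T-r+2^r}{2^r-1} \qquad (10)$$

As Equations (5) to (10) show, a longer window reduces the minimum distinguishable RSSHe, while a longer word widens the range of RSSHe and greatly increases the number of possible SSH distributions. Longer window and word lengths therefore lead to higher resolution, which means that the self and non-self spaces can be separated much more accurately. This is the key to obtaining accurate structural parameter identification.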
So far, our discussion of the effect of the window length and word length has been based on a case in which only one story's output (raw acceleration data) is used, but structures with multiple degrees of freedom (MDOF) may have more outputs. Supposing that the outputs from $N$ stories can be obtained, the boundary of the solution space is given by Equation (11), the resolution falls as given by Equation (12), and the total number of possible distributions increases to:

$$N_{\mathrm{SSH}}^{(N)} = \binom{T-r+2^r}{2^r-1}^{N} \qquad (13)$$

From Equations (11) to (13), it is evident that the more story outputs are obtained, the more accurate the identification results will be.
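The closed-form quantities in the parameter-selection guideline can be tabulated for any window and word length. The sketch below assumes the relative Euclidean form of RSSHe (normalized by the norm of SSH$_a$) and that the SSHs of $N$ outputs vary independently; the helper names are hypothetical.

```python
from math import comb, log10, sqrt


def rsshe_max(r):
    """Largest RSSHe: uniform SSH_a versus a single-state SSH_b."""
    return sqrt(2 ** r - 1)


def resolution_bounds(T, r):
    """Bounds on the smallest distinguishable RSSHe.

    One occurrence moves between two states, so the SSHs differ by
    1/(T-r+1) in two entries. The lower bound holds when SSH_a is a
    single-state histogram, the upper bound when SSH_a is uniform.
    """
    n_words = T - r + 1
    return sqrt(2) / n_words, sqrt(2 ** (r + 1)) / n_words


def n_distributions(T, r, n_outputs=1):
    """Number of possible SSHs: T-r+1 identical balls in 2**r boxes,
    raised to the power n_outputs for independently varying outputs."""
    balls, boxes = T - r + 1, 2 ** r
    return comb(balls + boxes - 1, boxes - 1) ** n_outputs


for T, r in [(3000, 3), (3000, 9)]:
    lo, hi = resolution_bounds(T, r)
    print(f"T={T}, r={r}: RSSHe_max={rsshe_max(r):.2f}, "
          f"resolution in [{lo:.2e}, {hi:.2e}], "
          f"log10(#SSH)={log10(n_distributions(T, r)):.1f}")
```

For the Figure 1 example (24 words, r = 3), the single-output count is $\binom{31}{7} = 2{,}629{,}575$.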

Description of SDOF Model
For simplicity and generality, we used a single-story shear frame structure as a representative case to verify the effect of the STSA parameters, and we modeled it as a single-degree-of-freedom (SDOF) lumped mass system (Figure 4). The structure had a mass of 1000 kg, a stiffness of 1.000 MN/m, and a natural frequency of 5.032 Hz. The dynamic equation is [16]:

$$\mathbf{M}\ddot{\mathbf{X}} + \mathbf{C}\dot{\mathbf{X}} + \mathbf{K}\mathbf{X} = \mathbf{f}(t) \qquad (14)$$

where $\mathbf{M}$, $\mathbf{C}$, and $\mathbf{K}$ are the mass, damping, and stiffness matrices, respectively; $\mathbf{f}(t)$ is the force vector associated with the ground acceleration; and $\ddot{\mathbf{X}}$, $\dot{\mathbf{X}}$, and $\mathbf{X}$ are the relative acceleration, velocity, and displacement responses, respectively. The sampling frequency was 100 Hz. In the simulation, the input signal was Gaussian white noise. The root-mean-square error (RMSe) was used to verify the feasibility and performance of the identification results. RMSe is defined as

$$\mathrm{RMSe} = \sqrt{\frac{\sum_{i=1}^{n} \left(k_{c,i} - k_{\mathrm{real},i}\right)^2}{\sum_{i=1}^{n} k_{\mathrm{real},i}^2}} \times 100\% \qquad (15)$$

where $k_{c,i}$ and $k_{\mathrm{real},i}$ are the candidate stiffness and the real stiffness of the $i$th story, respectively, and $n$ is the number of stories. To test the noise immunity of our method, noise at levels of 5%, 10%, or 20% was added to the raw acceleration data.
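The SDOF simulation setup can be reproduced approximately with the sketch below. Several details are assumptions, not taken from the text: the SDOF damping ratio (0.03 here), the Newmark average-acceleration integrator, and the definition of an x% noise level as Gaussian noise whose RMS is x% of the signal RMS. The `rmse_percent` helper computes the relative stiffness error $\sqrt{\sum (k_c - k_{\mathrm{real}})^2 / \sum k_{\mathrm{real}}^2} \times 100$.

```python
import numpy as np


def newmark_sdof(m, c, k, ag, dt, gamma=0.5, beta=0.25):
    """Relative response of m*x'' + c*x' + k*x = -m*ag(t).

    Newmark average-acceleration scheme (gamma = 1/2, beta = 1/4).
    Returns relative displacement, velocity and acceleration arrays.
    """
    n = len(ag)
    x, v, a = np.zeros(n), np.zeros(n), np.zeros(n)
    p = -m * np.asarray(ag, dtype=float)           # effective ground-motion force
    a[0] = (p[0] - c * v[0] - k * x[0]) / m
    k_hat = k + gamma / (beta * dt) * c + m / (beta * dt ** 2)
    for i in range(n - 1):
        p_hat = (p[i + 1]
                 + (m / (beta * dt ** 2) + gamma * c / (beta * dt)) * x[i]
                 + (m / (beta * dt) + (gamma / beta - 1) * c) * v[i]
                 + ((1 / (2 * beta) - 1) * m + dt * (gamma / (2 * beta) - 1) * c) * a[i])
        x[i + 1] = p_hat / k_hat
        v[i + 1] = (gamma / (beta * dt) * (x[i + 1] - x[i])
                    + (1 - gamma / beta) * v[i] + dt * (1 - gamma / (2 * beta)) * a[i])
        a[i + 1] = ((x[i + 1] - x[i]) / (beta * dt ** 2) - v[i] / (beta * dt)
                    - (1 / (2 * beta) - 1) * a[i])
    return x, v, a


def add_noise(signal, level, rng):
    """Add Gaussian noise whose RMS is `level` times the signal RMS (assumed definition)."""
    rms = np.sqrt(np.mean(signal ** 2))
    return signal + level * rms * rng.standard_normal(signal.shape)


def rmse_percent(k_candidate, k_real):
    """Relative stiffness error in percent."""
    kc = np.asarray(k_candidate, dtype=float)
    kr = np.asarray(k_real, dtype=float)
    return 100.0 * np.sqrt(np.sum((kc - kr) ** 2) / np.sum(kr ** 2))


m, k = 1000.0, 1.0e6            # mass [kg] and stiffness [N/m] from the text
zeta = 0.03                     # hypothetical: the SDOF damping ratio is not stated
c = 2.0 * zeta * np.sqrt(k * m)
print(np.sqrt(k / m) / (2 * np.pi))  # natural frequency, about 5.03 Hz

dt = 0.01                       # 100 Hz sampling
rng = np.random.default_rng(1)
ag = rng.standard_normal(3000)  # Gaussian white-noise ground acceleration
_, _, acc = newmark_sdof(m, c, k, ag, dt)
acc_noisy = add_noise(acc, 0.10, rng)  # 10% noise level
```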

Effect of Varying the Window Length and Word Length
As stated before, the window length and word length are the control parameters in STSA. In the simulation, the mass distribution and damping parameters were assumed to be known, and the stiffness of each story was the objective parameter to be identified. In the first verification, the word length was varied from 1 to 12 with a window length of 3000. In the second verification, the word length was 9 and the window length was varied from 500 to 6000 in steps of 500. Each test was run 10 times independently, with a randomly chosen AICSA initial value each time. The parameters of AICSA were the same as in [4,5]. Figures 5 and 6 indicate that the word length and window length greatly affect the performance of the methodology: larger word and window lengths yielded better performance. The reason is that, as shown theoretically in Section 3.2 (Equations (4)-(10)), a longer word or window can symbolize the raw acceleration data much more accurately than a shorter one. As the word length and window length increase, much more dynamic information about the system is captured, the identified results become more accurate, and the maximum and mean RMSe decrease. In the simulation, a word length of 9 or more and a window length of 3000 or more gave acceptable results.
Table 1 (under the label "STSA") lists the identification results for the SDOF structure using a word length of 9 and a window length of 3000 at different noise levels. For comparison, we estimated the parameters and RMSe of the SDOF model using raw acceleration data as input, with a data length of 3000, i.e., the same as the window length in STSA; these results are listed under the label "RAW". Although AICSA using raw acceleration data gives good results in the noise-free case, its RMSe increases greatly as the noise level grows. In contrast, our method has good noise immunity: the identification results stay accurate as the noise level increases. Even at the high noise level of 20%, the RMSe of the results is only 0.04%.

Description of MDOF Models
Next, we examined whether our methodology can reliably identify the parameters of an MDOF system. We chose three cases: a 3-DOF, a 5-DOF (shown in Figure 7 as an example), and a 10-DOF structure. These structures were modeled as multiple-degree-of-freedom lumped mass systems, and Table 2 summarizes their structural parameters. In these structures, the mass of each story was 1000 kg and the stiffness of each story was 2.000 MN/m. The damping ratios of all MDOF structures were the same, i.e., 0.03 and 0.05 for the first and second modes, respectively.
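The MDOF models can be assembled as lumped-mass shear buildings. The sketch below builds the mass and stiffness matrices and a Rayleigh damping matrix $\mathbf{C} = a_0\mathbf{M} + a_1\mathbf{K}$ that reproduces damping ratios of 0.03 and 0.05 in the first two modes. Rayleigh damping is an assumption: the text gives the modal damping ratios but does not say how the damping matrix was formed.

```python
import numpy as np


def shear_building(masses, stiffnesses):
    """Lumped-mass M and K of a shear frame; story 1 is at the base.

    stiffnesses[i] is the spring between floor i and the floor below it
    (the ground for i = 0).
    """
    m = np.asarray(masses, dtype=float)
    ks = np.asarray(stiffnesses, dtype=float)
    n = m.size
    K = np.zeros((n, n))
    for i in range(n):
        K[i, i] += ks[i]
        if i > 0:                      # story spring couples floors i-1 and i
            K[i - 1, i - 1] += ks[i]
            K[i - 1, i] -= ks[i]
            K[i, i - 1] -= ks[i]
    return np.diag(m), K


def rayleigh_damping(M, K, zeta1=0.03, zeta2=0.05):
    """C = a0*M + a1*K matched to damping ratios zeta1, zeta2 in modes 1 and 2."""
    d = 1.0 / np.sqrt(np.diag(M))      # assumes a diagonal (lumped) mass matrix
    omega = np.sqrt(np.linalg.eigvalsh(d[:, None] * K * d[None, :]))  # ascending
    w1, w2 = omega[0], omega[1]
    A = np.array([[1 / (2 * w1), w1 / 2],
                  [1 / (2 * w2), w2 / 2]])
    a0, a1 = np.linalg.solve(A, [zeta1, zeta2])
    return a0 * M + a1 * K, omega


# 5-DOF model: 1000 kg and 2.000 MN/m per story.
M, K = shear_building([1000.0] * 5, [2.0e6] * 5)
C, omega = rayleigh_damping(M, K)
print(omega[:2] / (2 * np.pi))         # first two natural frequencies [Hz]
```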

RMSe for MDOF Models
As in the SDOF simulation, the input signal of the MDOF simulation was Gaussian white noise. The stiffness of each story was unknown and had to be identified. The methodology combining AICSA with STSA was compared with AICSA using raw acceleration data. The window and word lengths were the same as before, and the full output of the structure was used.
Table 3 lists the identified stiffness of each story of the 3-, 5-, and 10-DOF models. The parameters estimated for these models by AICSA using raw acceleration data are not listed for lack of space, but Table 4 summarizes the RMSe of the identification results for the 3-, 5-, and 10-DOF models obtained both by AICSA combined with STSA and by AICSA using raw acceleration data. Figure 8 compares the RMSe of the identification results for the different structures when our methodology is used (the "STSA" columns in Tables 1 and 4). As we can see, AICSA combined with STSA identifies the parameters of a structure accurately regardless of whether the structure is SDOF or MDOF. RMSe does increase slightly as the number of DOFs increases, because the solution space of the identification problem becomes much more complex.
Moreover, AICSA combined with STSA outperformed AICSA using raw acceleration data (results under the "RAW" label in Tables 1 and 4) on the MDOF models, and it provided much better estimates when the output data were contaminated with noise. These results clearly show that our methodology has excellent noise immunity.

Estimation Using Partial Outputs
The simulation results for the MDOF structures above are based on the full output of the structural acceleration data. An SDOF structure has only one output, but for MDOF structures, not all outputs may be available. Therefore, to verify the methodology with partial output, a 5-DOF structure was simulated and data from some of its stories (randomly chosen) were used. The simulated output data were either noise-free or contaminated with 5%, 10%, or 20% noise. The window length was 3000, and the word length was 9.
The results in Figure 9 show that the proposed method gives acceptable results even with partial data. Note that, for a given noise level, the RMSe of the identification results increases as the number of outputs decreases. These results numerically confirm the trend predicted by Equations (11)-(13): the more outputs are obtained, the more accurate the identification results.

Conclusion
We conducted a feasibility study of a parameter identification method based on symbolic time series analysis (STSA) and the adaptive immune clonal selection algorithm (AICSA). Harmful noise in the raw acceleration data was alleviated by employing STSA. The effect of varying the word length and window length in STSA was evaluated theoretically and verified numerically. A comparison with AICSA using raw acceleration data revealed that our methodology provides better estimates of structural parameters when the data are contaminated by noise. The results show that, with properly chosen parameters, our methodology is a reliable and effective method for structural parameter identification.

Figure 1. Process of symbolizing a time series of raw acceleration data.

Figure 3. Procedure of AICSA combining STSA for identification of structural parameters.

Figure 5. Effect of varying word length for a window length of 3000.

Figure 9. Comparison of RMSe due to partial output.