Method of Multi-Mode Sensor Data Fusion with an Adaptive Deep Coupling Convolutional Auto-Encoder
1. Introduction
As the working scenarios of industrial machines become more and more complex and diverse, their health condition monitoring is of increasing importance [1] . Single sensor signals cannot fulfill this task in diversified scenarios. In many circumstances, the health states of machines can be reflected by sensor signals such as vibration, electrical current, and sound. Therefore, it is essential to fuse these multi-mode sensor signals to obtain overall equipment health status for monitoring.
For status monitoring based on multi-sensor data fusion, the three classical methods—data fusion, feature fusion, and decision fusion—find a broad range of applications [2] [3] [4] [5] . Fatehi et al. proposed an iterative algorithm to obtain the covariance matrix of the estimation errors of two Kalman filters and applied it to data fusion [6] . Xiong et al. proposed a data fusion method based on mutual dimensionless [7] . Sun et al. proposed a multi-rule fusion feature selection algorithm that identified the optimal subset of features from the high-dimensional original feature space for status monitoring [8] . Wang et al. proposed a multi-dimensional feature fusion method in which features of ongoing degradation such as peak-to-peak value and mean squared value of the bearing vibration signal were extracted for bearing lifetime prediction [9] . Liu et al. applied a new sparse classification fusion method to the health status monitoring of locomotive bearings and proved its validity [10] . Haghighat et al. proposed discriminant correlation analysis (DCA), which incorporates class associations into correlation analysis of the feature sets for feature-level fusion [11] . It is worth mentioning that all these methods require the specific features to be extracted before status monitoring. However, some research has indicated that many features are valid only at certain stages under certain conditions, making it difficult to extract features that are representative of equipment conditions [12] . Therefore, classical status monitoring based on multi-sensor fusion is hindered by difficulties in feature extraction.
Deep neural network (DNN) and deep learning (DL) techniques have become powerful tools in industrial applications due to their strong capabilities in feature learning and their high accuracy in inference [13] [14] [15] [16]. These methods can automatically mine complex structures and learn, layer by layer, the useful features from the original data. Many studies have illustrated that a DNN can fuse the original input data, extract the fundamental information in the lower layers, fuse the results into high-level representations in the middle layers, and further fuse them in the upper layers to form the final decision [17] [18] [19] [20]. Note that a DNN is a fusion structure that integrates feature extraction, feature selection, and three-level fusion into a single learning body [21]. Therefore, it is very suitable for multi-sensor fusion.
In recent years, health status monitoring algorithms with multi-sensor fusion based on deep learning have been extensively investigated. Li et al. proposed a convolutional neural network with atrous convolution for the adaptive fusion of multiple source data [22]. Wang et al. proposed a multi-sensor data fusion method and a bottleneck layer-optimized convolutional neural network for fault recognition [23]. Chen et al. fused horizontal and vertical vibration signals through a deep convolutional neural network and obtained improved bearing health condition detection [24]. Hao et al. used a one-dimensional convolutional neural network with long short-term memory to extract temporal and spatial features of vibration signals; the features were then stacked for better bearing fault diagnosis [25]. Chen et al. used an SAE-DBN strategy to extract the time and frequency domain features of various sensor signals, which were then fused and classified [26]. Moslem et al. proposed a novel 2-dimensional CNN for electrical motor current fusion analysis [27]. Li et al. proposed a multi-scale sensor feature fusion CNN that fused vibration signals in the data and feature layers for bearing status monitoring [28]. Although these methods have obtained good monitoring results, they fused single-mode sensor data only, without using complementary information from other modes for equipment status monitoring.
To address this problem, Zhou et al. proposed a deep learning method based on multi-mode feature fusion for on-line machinery bearing diagnosis [29] . Wang et al. used a deep learning-based multi-resolution multi-sensor fusion network that fuses current and vibration signals for asynchronous motor status monitoring [30] . Fu et al. investigated a dynamic-routing based multi-mode network that could adaptively assign weights to different modes to monitor induction motor status after fusion [31] . One problem with these methods is that they all extract the features of different modes separately. Once the features have been extracted, any shared information among the modes is lost. In addition, all these methods use the stacked extracted features directly for status monitoring and do not consider the situation when feature qualities from different data sources differ greatly, resulting in feature collisions.
Presently, vibration signals are widely used for machinery health condition monitoring. However, the literature on current signal-based methods is sparse [32] . Although the damage features of the current signal are markedly weaker than those of the vibration signal, it is not only easier to acquire than other signals, but also contains rich information about machinery health condition. Therefore, to obtain more comprehensive information, vibration and current signals were selected for fusion analysis.
For this purpose, a new deep structure has been proposed—a fusion model based on an adaptive deep coupling convolutional auto-encoder (ADCCAE). This approach not only ensures the effective utilization of the original multi-mode data to extract common feature information, but also solves the problem that arises when the features of different data sources vary widely.
The main contributions of this paper can be summarized as follows:
1) A coupling convolutional auto-encoder has been designed to synchronously extract individual and compound features from multi-mode data;
2) Vibration and motor current signals were used synchronously in the health status monitoring system;
3) A gray wolf optimizer (GWO) algorithm was used to optimize the coupling coefficients and network parameters in a closed loop to obtain an optimal model, which to some extent solves the problem that the features of current and vibration signals differ greatly.
The rest of the paper is organized as follows: Section 2 introduces network basics. Section 3 describes the details of the ADCCAE fusion model. Section 4 demonstrates the application of the proposed deep fusion network in classifying the health status of motor bearings using vibration and current data, and compares the results with other methods. Finally, in Section 5, conclusions are drawn, and future research prospects are discussed.
2. Convolution Auto-Encoder
A convolutional neural network (CNN) is a typical feedforward neural network. It contains multiple filters to extract features from input data [33] [34] [35] [36]. Through local receptive fields, shared weights, and spatial sub-sampling, a CNN can preserve the features of the original data; in other words, it is translation invariant [24]. The convolutional auto-encoder (CAE) possesses the characteristics of both a CNN and an auto-encoder (AE). It is a neural network model with hidden layers in which the output layer recovers the original input data. The central hidden layer performs dimension reduction and feature extraction on the input data. Therefore, it usually has fewer neurons than the input layer. For a mono-channel input x, the latent representation of the kth feature map can be written as [37]:
$$h^{k} = f\left(x * W^{k} + b^{k}\right) \tag{1}$$
where $b^{k}$ is the encoding layer bias, $W^{k}$ is the weight matrix, f is the activation function (the LeakyReLU function is used in this paper), and '*' denotes the 2D convolution. The output of the hidden layer can be reconstructed and recovered in the form:
$$y = f\left(\sum_{k \in H} h^{k} * \tilde{W}^{k} + c\right) \tag{2}$$
where again there is one bias c per input channel. H identifies the group of latent feature maps; $\tilde{W}$ identifies the flip operation over both dimensions of the weights. When the CAE is trained, the parameters are obtained through the minimized reconstruction error, which can be written as:
$$\theta^{*} = \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} L\left(x_{i}, y_{i}\right) \tag{3}$$
where $\theta$ represents the model parameters, $x_{i}$ is the ith training sample, n is the number of training samples, and $L$ is the reconstruction error function. In this study, the mean squared error was used to calculate the reconstruction error:
$$L\left(x_{i}, y_{i}\right) = \left\lVert x_{i} - y_{i} \right\rVert_{2}^{2} \tag{4}$$
As in standard networks, the back-propagation algorithm and the Adam optimizing algorithm were used to update the parameters to minimize L [38] .
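The encoding step and the mean squared reconstruction error described above can be illustrated with a small standard-library sketch. It is only an illustration under stated assumptions, not the network used in this paper: a single feature map, a "valid" convolution without padding, a LeakyReLU slope of 0.01, and an error averaged over all elements. The function names are hypothetical.

```python
def leaky_relu(z, slope=0.01):
    # LeakyReLU activation; the slope value is an assumption
    return z if z > 0 else slope * z


def conv2d_valid(x, w):
    """'Valid' 2-D convolution of matrix x with kernel w (kernel flipped in both dimensions)."""
    kh, kw = len(w), len(w[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    flipped = [row[::-1] for row in w[::-1]]
    return [
        [
            sum(x[i + a][j + b] * flipped[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def encode(x, w, b):
    # One latent feature map: activation(convolution + bias)
    return [[leaky_relu(v + b) for v in row] for row in conv2d_valid(x, w)]


def mse(x, y):
    # Mean squared error between an input matrix and its reconstruction
    n = len(x) * len(x[0])
    return sum((p - q) ** 2 for rx, ry in zip(x, y) for p, q in zip(rx, ry)) / n
```

In a full CAE the decoder would map the feature maps back to the input size, and `mse` would be evaluated between the input and that reconstruction.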
3. ADCCAE Fusion Strategy
3.1. DCCAE Framework
Because intrinsic correlation exists among the multi-mode data acquired from various types of sensors on the subject under monitoring, it is necessary to extract compound features during learning. Therefore, the Deep Coupling Convolutional Auto-Encoder (DCCAE) fusion framework was constructed in this study to process multi-mode data, as illustrated in Figure 1. The DCCAE fusion framework comprises a coupling convolutional auto-encoder (CCAE) module constructed from two CAEs, and a feature fusion and compression module containing the encoding part of CCAE, consisting of two full convolution layers and two full connection layers.
In the CCAE structure, each CAE extracts the features from individual single-mode data. In addition, to obtain the compound information, the two CAEs couple the pre-defined similarity measures of the reconstructed output to capture the correlation between different types of data. Although these two types of CAE share the same architecture, their internal parameters for learning are different. The compound information obtained from the trained model is fed to the second part (the fusion and compression module) for fusion, resulting in fused data that contain more comprehensive information.
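Let the reconstructed data obtained from two different sensor data inputs be defined as $\hat{x}_{v} = g_{v}(x_{v}; \theta_{v})$ and $\hat{x}_{c} = g_{c}(x_{c}; \theta_{c})$, where $\hat{x}_{v}$ and $\hat{x}_{c}$ are the reconstructed outputs of the vibration and current signals and $\theta_{v}$ and $\theta_{c}$ are the parameters of each model. The similarity between the reconstructed outputs of the two CAEs can be formulated as:
$$S\left(x_{v}, x_{c}; \theta_{v}, \theta_{c}\right) = \left\lVert g_{v}(x_{v}; \theta_{v}) - g_{c}(x_{c}; \theta_{c}) \right\rVert_{2}^{2} \tag{5}$$
where $x_{v}$ and $x_{c}$ are the vibration and current signals, respectively. To understand the correlation between two types of data from the same subject, a coupling loss function was constructed to simultaneously train the two CAE models. This coupling loss function can be written as [39]:
$$L\left(x_{v}, x_{c}; \theta_{v}, \theta_{c}\right) = \alpha L_{v}(x_{v}; \theta_{v}) + \beta L_{c}(x_{c}; \theta_{c}) + \gamma S\left(x_{v}, x_{c}; \theta_{v}, \theta_{c}\right) \tag{6}$$
where $L_{v}$ and $L_{c}$ are the reconstruction losses for the vibration CAE and the current CAE, respectively; $\alpha$, $\beta$, and $\gamma$ are the parameters that control the coupling between the reconstruction loss function and the similarity loss function. $\gamma$ is the control parameter that measures the similarity between the two types of data.
Training can be carried out through back-propagation. The gradient of the model loss function can be computed as:
$$\frac{\partial L}{\partial \theta_{v}} = \alpha \frac{\partial L_{v}}{\partial \theta_{v}} + \gamma \frac{\partial S}{\partial \theta_{v}}, \qquad \frac{\partial L}{\partial \theta_{c}} = \beta \frac{\partial L_{c}}{\partial \theta_{c}} + \gamma \frac{\partial S}{\partial \theta_{c}} \tag{7}$$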
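The coupling loss combines the two reconstruction losses with a similarity term between the two reconstructed outputs, each scaled by its own weight. The following is a minimal sketch under assumptions: the loss is taken to be a weighted sum, both the reconstruction losses and the similarity are taken to be mean squared differences, the signals are flat lists of equal length, and the names `alpha`, `beta`, `gamma`, and `coupling_loss` are hypothetical.

```python
def mean_sq_diff(a, b):
    # Mean squared difference between two equally long sequences
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def coupling_loss(x_v, x_c, rec_v, rec_c, alpha=1.0, beta=1.0, gamma=0.5):
    """Weighted sum of both reconstruction losses and a similarity term.

    x_v, x_c     : vibration and current inputs
    rec_v, rec_c : their reconstructions from the two auto-encoders
    alpha, beta  : weights of the two reconstruction losses (assumed names)
    gamma        : weight of the similarity term (assumed name)
    """
    loss_v = mean_sq_diff(x_v, rec_v)
    loss_c = mean_sq_diff(x_c, rec_c)
    similarity = mean_sq_diff(rec_v, rec_c)
    return alpha * loss_v + beta * loss_c + gamma * similarity
```

Setting `gamma` to zero decouples the two auto-encoders, so each is trained only on its own reconstruction error; larger values pull the two reconstructions toward a shared representation.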
3.2. ADCCAE Fusion Model
Because the quality of the features of different data types (vibration and current) can vary greatly, it is difficult to determine the parameters of the coupling loss function used by the DCCAE fusion framework. As a result, a high-efficiency fusion model is hard to obtain.
The GWO algorithm is a popular optimizing method that imitates the predatory behavior of gray wolves. Compared to other optimization methods, it is more robust, stronger in generalization, and capable of finding the global optimum [40] [41] . An ADCCAE fusion model was therefore proposed. The model selects weight coefficients for the coupling loss function as the objective and seeks the optimum through automatic iterations with GWO.
The fusion model with optimal parameters is obtained as a result. As illustrated in Figure 2, the model contains four modules. Module A is the section that collects and pre-processes the data. It standardizes the input and sequentially converts the 1-D synchronous vibration and current data into a 2-D matrix. Module B is the fusion network of ADCCAE. It consists of the coupling convolutional auto-encoder network, the feature fusion and compression network module, the fusion feature visualization network module, and the performance validation module. Its main function is to fuse the pre-processed multi-mode data and evaluate the fusion. The fusion performance validation module contains a Softmax network. Module C is the GWO algorithm module that seeks to optimize the unknown parameters; module D displays the results of model optimization for overall model performance evaluation. Suppose that there are K labels; the Softmax module can then be defined as:
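$$O = \frac{1}{\sum_{k=1}^{K} e^{\theta_{k}^{T} h}} \left[ e^{\theta_{1}^{T} h}, e^{\theta_{2}^{T} h}, \ldots, e^{\theta_{K}^{T} h} \right]^{T} \tag{8}$$
where h is the fused feature vector fed to the Softmax network, $\theta_{1}$, $\theta_{2}$, …, $\theta_{K}$ are model parameters, and $O$ is the evaluation result of the DCCAE fusion model.
Figure 2. The framework of the proposed method.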
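As an illustration of the Softmax validation step, the sketch below turns K linear scores into class probabilities and picks the most likely label. It assumes one parameter vector per label and no bias term; the function names and the toy parameters are hypothetical, not taken from the trained model.

```python
import math


def softmax(scores):
    # Subtract the maximum score before exponentiating for numerical stability
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(features, thetas):
    """Return (label, probabilities) for one fused feature vector.

    thetas holds one parameter vector per label (K vectors in total).
    """
    scores = [sum(t * f for t, f in zip(theta, features)) for theta in thetas]
    probs = softmax(scores)
    return probs.index(max(probs)), probs
```

For example, `predict([1.0, 2.0], [[1, 0], [0, 1], [-1, -1]])` scores the three labels as 1, 2, and -3 and therefore selects label 1.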
The operation steps are as follows: (1) Input the data into Module A for preprocessing and initialize the coupling weight parameters; (2) Input the preprocessed data into Module B for fusion and compression, and evaluate the fusion effect; (3) With the coupling weight coefficients as the position parameters of GWO and the maximum fusion verification accuracy as the optimization objective of GWO, train the fusion network to optimize the coupling weight parameters. Step (3) is repeated until the model converges, and the model optimization results are then displayed.
Figure 3 illustrates the flowchart.
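The closed loop between GWO and the fusion network can be sketched as a small gray wolf optimizer that searches over a vector of coupling weights. This is a simplified, standard-library illustration rather than the implementation used in this study: the objective passed in is a placeholder for "train the fusion network with these weights and return one minus the validation accuracy", at least three wolves are assumed, and the function name, population size, and iteration count are hypothetical.

```python
import random


def gwo_minimize(objective, bounds, n_wolves=8, n_iter=30, seed=0):
    """Minimal gray wolf optimizer; returns (best_score, best_position)."""
    rng = random.Random(seed)
    dim = len(bounds)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    # Leaders (alpha, beta, delta): the three best positions found so far
    leaders = sorted((objective(w), list(w)) for w in wolves)[:3]
    for t in range(n_iter):
        a = 2.0 * (1 - t / n_iter)  # decreases linearly from 2 toward 0
        for w in wolves:
            for d in range(dim):
                pulled = 0.0
                for _, leader in leaders:
                    A = a * (2 * rng.random() - 1)
                    C = 2 * rng.random()
                    pulled += leader[d] - A * abs(C * leader[d] - w[d])
                lo, hi = bounds[d]
                # Average the three pulls and clip to the search bounds
                w[d] = min(max(pulled / len(leaders), lo), hi)
        candidates = leaders + [(objective(w), list(w)) for w in wolves]
        leaders = sorted(candidates)[:3]
    return leaders[0]
```

Because the leaders are kept as the best positions seen so far, the returned score never gets worse from one iteration to the next. In the setting described above, each call to `objective` would be expensive, since it involves training and validating the fusion network.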
To summarize, the ADCCAE fusion model has two important characteristics: (1) it differs from other methods in which representation and fusion are based on a single feature and can learn similar features from different sources through mapping multi-mode data into the same representational space, resulting in fused data that contain more comprehensive information about the equipment; and (2) it optimizes key model parameters through GWO to achieve the best fusion performance.
4. Experimental Results
To validate the effectiveness of the ADCCAE multi-mode data fusion strategy, experiments were conducted with public bearing damage data sets from the University of Paderborn, Germany [42] .
Figure 3. Flowchart of the proposed method.
4.1. Data Collection
The datasets employed in this experiment consist of synchronous vibration and current data obtained from bearing accelerated lifetime tests conducted by the University of Paderborn on a practical scientific experimental platform.
The platform analyzes the impacts of operating parameters under various working conditions, including the rotational speed, the axial force, and the system load torque. This study used the data with rotational speed N = 900 RPM, load torque T = 0.7 Nm, and axial force F = 1000 N as the working conditions. To validate fusion effectiveness, first-class single-point fatigue pitting damage to the outer ring (KA04), first-class plastic deformation indentation damage (KA15), first-class single-point fatigue pitting damage to the inner ring (KI21), and normal bearing data (K001) were used for model validation. KA04, KA15, and K001 were the data sets for Experiment I, and KA04, KI21, and K001 were the data sets for Experiment II. Figure 4 illustrates the vibration and current data for bearings under the four health conditions. The damage features of the two data types clearly differ: they are obvious in the vibration data and weak in the current data.
4.2. Fusion Performance Validation Experiment
Two groups of experiments were conducted: Experiment I was designed to test the model's classification capability for various types of damage that occur on the outer ring of the bearing; Experiment II was intended to test the classification capability of the model for pitting damage occurring on the inner or outer rings. The operating environment was configured as follows: (1) a processor (AMD Ryzen 5 2600X six-core processor, 3.60 GHz); (2) memory (16 GB); (3) a graphics card (NVIDIA GeForce GTX 1660, 6 GB); and (4) a code execution environment (PyTorch = 1.2.0, Python = 3.7.9).
Figure 4. Vibration and current signals of 4 health conditions: (a) normal, (b) OR indentations, (c) OR pitting, (d) IR pitting.
When using the ADCCAE fusion model, the partitioning of the data sets, the number of neurons, and the selection of hyperparameters are important factors that affect the quality of feature extraction and fusion. For fair and effective comparisons, the first 160,000 samples of the data set were partitioned into a training set and a test set with a ratio of 3:1. By fixing the random seed, the sequence of each data draw was kept the same. In addition, to simplify parameter selection, the network parameters of the two CAEs were configured identically, as listed in Table 1, where Conv_1, Conv_2, pool_1, and pool_2 form the encoding section of the CAE and Conv_3, Conv_4, and F.interpolate form the decoding section. F.interpolate implements up-sampling for the decoding operation.
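The preprocessing and partitioning described above (standardization, conversion of the 1-D signals into 2-D matrices, and a seeded 3:1 split) can be sketched as follows. This is a hedged illustration: the matrix side length, the use of shuffling, and all function names are assumptions, since the text specifies only the 3:1 ratio and the fixed seed.

```python
import random
import statistics


def standardize(signal):
    # Zero mean and unit variance; a constant signal is left unscaled
    mean = statistics.mean(signal)
    std = statistics.pstdev(signal) or 1.0
    return [(v - mean) / std for v in signal]


def to_matrices(signal, side):
    """Cut a 1-D signal into consecutive side x side matrices, filled row by row."""
    size = side * side
    return [
        [signal[s + r * side: s + (r + 1) * side] for r in range(side)]
        for s in range(0, len(signal) - size + 1, size)
    ]


def split_3_to_1(samples, seed=0):
    # The same seed always yields the same train/test partition
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    cut = len(order) * 3 // 4
    train = [samples[i] for i in order[:cut]]
    test = [samples[i] for i in order[cut:]]
    return train, test
```

To keep the vibration and current samples synchronized, the matrices of both modes can be zipped into pairs before calling `split_3_to_1`, so that each pair lands in the same partition.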
In the compression module of feature fusion, the network parameters were configured according to Table 2. The input of this module is the stacked output of the two CAE encoding layers. Because the information within each data type is highly concentrated and the distance between the two data types is large, the fusion section was designed as two multi-channel fully convolutional layers that perform semantic extraction and fusion simultaneously, thus achieving superior fusion performance. Finally, two fully connected layers were designed as a compression section for easier validation and visualization.
Table 1. Network structure parameters (CAE).
Table 2. Feature-fusion network structure parameters.
It is worth mentioning that when the ADCCAE fusion network was trained, the three coupling loss weights were first set as the position parameters of GWO. The values returned by GWO were used as initial values for the CCAE, which was then trained with a learning rate of 0.001. The outputs from the trained CCAE were then stacked and fed to the back-end network, and the fusion compression and fusion validation networks were trained with a learning rate of 0.0001. Finally, with the best classification performance as the objective, GWO was used iteratively to optimize the network parameters and coupling weights.
The training data for Experiments I and II were then fed into the network. Figure 5 illustrates coupling loss, training loss, and training accuracy. Evidently, the network can accurately perform inferencing on the training data.
The returned coupling weights were then applied to DCCAE, and the fused data were plotted with Matplotlib's Axes3D for fusion visualization, as illustrated in Figure 6. It is apparent that the fusion network can effectively fuse the features of the three classes of samples in Experiments I and II.
Finally, the test data of Experiments I and II were applied to the network. Figure 7 presents the resulting confusion matrix, and Figure 8 presents the resulting ROC curve. Clearly, the ADCCAE fusion model not only can discriminate outer-ring pitting and indentation damage, but also can identify the same damage type in the inner or outer ring. In addition, it can be seen from the ROC curve and AUC that the model has better precision and recall than previous models.
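For readers who want to reproduce this kind of evaluation, a confusion matrix and the per-class recall can be computed from the true and predicted labels as sketched below. The labels in the usage note are made-up toy values, not results from this study, and the function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true labels, columns are predicted labels."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_recall(cm):
    # Fraction of each true class that was predicted correctly
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]


def accuracy(cm):
    # Correct predictions lie on the diagonal
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total if total else 0.0
```

With toy labels `y_true = [0, 0, 1, 1, 2, 2]` and `y_pred = [0, 1, 1, 1, 2, 0]`, the matrix is `[[1, 1, 0], [0, 2, 0], [1, 0, 1]]` and the per-class recall is `[0.5, 1.0, 0.5]`.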
4.3. Comparison and Discussion
Figure 5. Visualization of the training process: (I) pitting and indentation of the bearing outer ring; (II) pitting damage of the inner or outer rings.
Figure 6. Visualization of the fused data of the test samples.
Figure 7. Confusion matrix of 3 health conditions.
Figure 8. ROC curve of 3 health conditions.
The test data sets from Experiments I and II were fed into the CAE classifier and the ADCCAE model. Table 3 and Table 4 present the test results. It can be seen that the feature quality of the vibration data is clearly better than that of the current data, and that the fusion model outperforms the single-mode models.
To validate the GWO fusion model, a PSO-ADCCAE optimization model was designed and compared to GWO-ADCCAE, as shown in Figure 9. It is evident that GWO-ADCCAE converges more efficiently to the global optimum.
Figure 10 presents a comparison of all methods investigated in this study. It is clear that the ADCCAE fusion model outperformed traditional multi-mode fusion models on data from Experiments I and II because [43] [44] :
(1) traditional methods did not consider correlated information among the data types;
(2) when an independently extracted feature did not fall into the corresponding feature space, the classifier found it difficult to draw a reliable decision boundary.
Therefore, the data fusion strategy proposed in this study is of great importance for analyzing multi-mode data, and the ADCCAE fusion model is a good choice for multi-mode data fusion.
5. Conclusion
Table 3. Pitting and indentation of the bearing outer ring (Experiment I).
Table 4. Pitting damage of the inner or outer rings (Experiment II).
Figure 9. A comparison of two optimization algorithms.
Figure 10. Comparison of accuracy of different methods.
This study highlights the challenges in multi-modal data fusion and the significant variations in feature quality, and introduces a novel ADCCAE fusion model to address these issues. The model employs a deep learning strategy that involves three steps: first, it uses a coupling convolutional auto-encoder to extract individual and compound features from various sensor data; second, the extracted features are fused and compressed for classification evaluation; finally, the coupling weights and network parameters are adaptively optimized with the GWO algorithm. The experimental results demonstrate that this method determines the health status of motor bearings more accurately than earlier approaches, indicating its strong performance in multi-modal data fusion. Future research will focus on exploring data fusion approaches for multi-source heterogeneous data and investigating fusion strategies with broader applicability.
Funding Statement
This work was supported by the Research Foundation of Education Bureau of Hunan Province, China (20A162).