Bearing Defect Detection Based on Acoustic Feature Extraction and Statistical Learning

Bearings are key components in many industrial applications, so automatic and precise inspection of bearing defects is imperative in bearing manufacturing. In this paper, a novel acoustics-based defect detection method is proposed to improve both the accuracy and the efficiency of the detection process. We first constructed a labeled dataset composed of acoustic signals sampled from different bearings at a fixed rotational speed. openSMILE is adopted to extract the acoustic features, forming an acoustic feature dataset with 6373 features. To further improve the efficiency of the proposed method, a feature selection strategy based on the chi-square test is adopted to eliminate the least informative features. Several statistical learning models are constructed and trained as classifiers. Finally, the classifiers are evaluated and achieve relatively high accuracy and efficiency on an extremely imbalanced dataset.


Qualified, Inner Ring Defect (IRD) and Ultra Precision Defect (UPD). Samples with Inner Ring Defect and Ultra Precision Defect should be recalled and destroyed. The purpose of our work is to propose an algorithm that can precisely classify the samples and reduce the defective rate. In this scenario, the ability to recall defective products is the main measure of model performance.
In previous research, bearing defect detection methods were usually developed from the perspective of time-domain signals, deep learning [1], and computer vision [2]. A highly complex deep learning model may have a massive number of parameters, which reduces the efficiency of training and classification. In our model, instead of using a complicated convolutional or recurrent deep neural network, we constructed classifiers based on traditional machine learning methods. The model was trained with a dataset composed of selected acoustic features. The acoustic feature dataset and the statistical learning models give our approach several distinctive characteristics.
We used openSMILE to extract acoustic features from the original time-domain signals. Given a vibration sequence, openSMILE outputs the corresponding feature vector of the sample. Based on the extracted feature vectors, we constructed a new dataset. The acoustic feature dataset allowed us to avoid deep learning methods, such as one-dimensional Convolutional Neural Networks, for classification.
Then, we performed feature selection on the acoustic feature dataset to further improve the performance of our model. Given the limited size of our dataset and the mathematical characteristics of most machine learning algorithms, an extremely high-dimensional dataset would result in overfitting and unacceptable computational cost. To improve the accuracy and efficiency of the classifier, we used MATLAB and its built-in statistical algorithms to select features.
Finally, we applied statistical learning methods as the classifiers. In industrial settings, the prevalent deep learning models are unacceptably time-consuming in both the training and prediction stages. To construct a model with balanced performance, we chose several statistical learning methods to improve the overall efficiency of the model. We also evaluated and compared the performance of the different classifiers.
The article is organized as follows. Section 2 introduces the basic structure and distribution of the original dataset. Section 3 presents the construction of the ComParE 2016 Acoustic Feature Set and the procedure of feature extraction with the openSMILE toolkit. Section 4 describes feature selection with the chi-square test. Section 5 trains the statistical learning models, lists the related statistics, and compares and analyzes their performance. Section 6 concludes the paper.

Dataset Construction
According to industrial production experience, defective bearings sound different from normal bearings when rotating. In our work, we built the bearing defect detection model based on sound signals.

D. Wu et al. Journal of Applied Mathematics and Physics
The original dataset is composed of sound signals of different bearings rotating at a fixed speed of 1800 rpm. The sound signals are represented as numerical sequences; the value of each element reflects the amplitude of the sound signal during rotation.
The sound signal samples were manually labelled and categorized into 3 groups: Qualified, Inner Ring Defect (IRD), and Ultra Precision Defect (UPD). The dataset contains 708 samples in total, so it is relatively small. The distribution of the samples in the original labeled dataset is shown in Table 1. According to Table 1, the dataset is relatively imbalanced, which may reduce the models' capacity for generalization; hence accuracy will not be the only criterion for evaluating overall performance.

Feature Extraction
Traditionally, classification problems on time-domain signals are solved by deep learning methods. With one-dimensional Convolutional Neural Networks (CNN), the model can extract the features automatically [3]. Recurrent Neural Networks (RNN) and their derivatives are another widely used category of Artificial Neural Networks for time-related problems. An RNN relates the current output to the previous hidden state, and one of its derivatives, the Long Short-Term Memory model, introduced a gating mechanism to address the problem of exploding and vanishing gradients [4]. However, despite the superior accuracy of deep learning models, their training and prediction time is unacceptable in industrial bearing production. Applying our model in industrial production requires a balance between accuracy and efficiency, so we considered statistical learning methods. Because traditional statistical learning methods usually perform poorly on sequential problems, we used openSMILE to extract acoustic features from the time-domain signals.

Feature Set
Effective feature extraction yields acoustic features that are crucial for bearing defect detection and directly improves the performance of the prediction system. However, there are few studies on feature extraction for bearing defect detection. In our work, we carried over experience from emotion recognition to our model.
In emotion recognition and natural language processing tasks, a more detailed feature set is needed to process more complex sound signals. We utilized the ComParE 2016 Acoustic Feature Set for feature extraction. The ComParE feature set contains 6373 features resulting from the computation of various functionals over low-level descriptor (LLD) contours [5]. The feature set has proved effective in emotion recognition [6], so it is likely to be suitable for defect detection as well.
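To make the phrase "functionals over LLD contours" concrete, the following is a minimal toy sketch, not the openSMILE implementation. It assumes a single LLD (frame-wise RMS energy) and a handful of simple functionals; the function names, frame length, and hop size are illustrative choices and do not come from the ComParE configuration.

```python
import math
import statistics

def frame_rms(signal, frame_len, hop):
    """Toy low-level descriptor: RMS energy of each frame of the signal."""
    return [
        math.sqrt(sum(s * s for s in signal[i:i + frame_len]) / frame_len)
        for i in range(0, len(signal) - frame_len + 1, hop)
    ]

def functionals(contour):
    """Summarize a variable-length LLD contour as a fixed-length set of statistics."""
    return {
        "mean": statistics.fmean(contour),
        "stddev": statistics.pstdev(contour),
        "min": min(contour),
        "max": max(contour),
        "range": max(contour) - min(contour),
    }

# Hypothetical signal: a quiet segment followed by a louder one.
signal = [1, -1] * 4 + [2, -2] * 4
contour = frame_rms(signal, frame_len=4, hop=4)   # one LLD value per frame
features = functionals(contour)                   # one feature per functional
```

Each (LLD, functional) pair contributes one entry to the feature vector, which is how a sequence of arbitrary length becomes a fixed-length sample for a statistical learner.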

Extraction with openSMILE
openSMILE is an open-source audio feature extractor. It can process the original vibration sequences and directly output the names and numerical values of the acoustic features [7]. With different configuration files, openSMILE can extract different features. The overall steps of feature extraction are described in Figure 1.
In our architecture, we used one of the default configuration files provided by the developers of openSMILE. We extracted 6373 acoustic features in total. The features can be divided into 65 categories by their corresponding LLDs [8]. The LLDs are shown in Table 2.
We performed feature selection with MATLAB and its built-in function fscchi2. The fscchi2 function automatically computes the p-values of the chi-square tests and outputs a vector of predictor scores. The predictor score is −log(p); therefore, a larger predictor score indicates that the corresponding predictor is more label-dependent and contains more information. We selected the 600 features with the largest predictor scores and built the final dataset. The features with the largest predictor scores are shown in Table 3.
The most label-dependent attributes are mainly related to spectral LLDs, such as spectral variance and spectral slope. This is reasonable because the sound signals of bearing rotation are generally periodic and simple compared to human emotional signals.
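The scoring step can be sketched in plain Python as follows. This is an illustration under stated assumptions, not MATLAB's fscchi2 implementation: continuous features are discretized into equal-width bins (the bin count is an assumed parameter), and the p-value uses a closed form that is valid only for an even number of degrees of freedom. With three classes the degrees of freedom are 2 × (bins − 1), so that condition holds here. The names chi2_score and select_top_k are hypothetical.

```python
import math
from collections import Counter

def chi2_score(values, labels, n_bins=10):
    """Return -log(p) of a chi-square independence test: binned feature vs. label."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0            # guard against a constant feature
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    n = len(values)
    observed = Counter(zip(bins, labels))
    row_tot, col_tot = Counter(bins), Counter(labels)
    stat = 0.0
    for b, r in row_tot.items():
        for c, k in col_tot.items():
            expected = r * k / n
            stat += (observed[(b, c)] - expected) ** 2 / expected
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    if df == 0:
        return 0.0                               # p = 1: no evidence of dependence
    if df % 2:
        raise ValueError("this sketch only handles an even number of degrees of freedom")
    # For df = 2m: p = exp(-x/2) * sum_{i<m} (x/2)^i / i!; stay in log space
    # so that very small p-values do not underflow to zero.
    half = stat / 2.0
    tail = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return half - math.log(tail)

def select_top_k(columns, labels, k, n_bins=10):
    """Rank feature columns by score and return the indices of the k best."""
    scores = [chi2_score(col, labels, n_bins) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return order[:k], scores

# Hypothetical data: one feature that tracks the label and one that does not.
labels = ["Qualified"] * 6 + ["IRD"] * 3 + ["UPD"] * 3
informative = [0.0, 0.1, 0.0, 0.1, 0.0, 0.1, 0.5, 0.5, 0.5, 1.0, 0.9, 1.0]
noise = [0.0, 0.5, 1.0] * 4
top, scores = select_top_k([noise, informative], labels, k=1, n_bins=3)
```

In this toy case the label-independent column scores 0 and the label-tracking column is ranked first, which mirrors how the 600 highest-scoring features are retained.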

Classification Algorithms
The machine learning models were trained on the WEKA platform, which integrates various statistical learning methods [9]. We selected several classifiers. Discrete AdaBoost, also known as AdaBoost.M1, is another widely used boosting algorithm [11]. To investigate the effect of decision trees on bearing defect detection, the C4.5 Decision Tree algorithm is also included in the comparison [12].
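For readers unfamiliar with boosting, a compact sketch of AdaBoost.M1 is given below. It is illustrative only and is not the WEKA implementation used in the experiments: the base learner (a threshold stump on a single feature) is an assumption, since the text does not specify one, and all function names are hypothetical.

```python
import math

def weighted_majority(indices, ys, weights, classes):
    """Class with the largest total weight among the given sample indices."""
    if not indices:
        return classes[0]
    return max(classes, key=lambda c: sum(weights[i] for i in indices if ys[i] == c))

def fit_stump(xs, ys, weights):
    """Threshold split on a 1-D feature with the lowest weighted error."""
    classes = sorted(set(ys))
    best = None
    for thr in sorted(set(xs)):
        left = [i for i, x in enumerate(xs) if x <= thr]
        right = [i for i, x in enumerate(xs) if x > thr]
        l_cls = weighted_majority(left, ys, weights, classes)
        r_cls = weighted_majority(right, ys, weights, classes)
        err = (sum(weights[i] for i in left if ys[i] != l_cls)
               + sum(weights[i] for i in right if ys[i] != r_cls))
        if best is None or err < best[0]:
            best = (err, thr, l_cls, r_cls)
    return best

def stump_predict(stump, x):
    _, thr, l_cls, r_cls = stump
    return l_cls if x <= thr else r_cls

def adaboost_m1(xs, ys, rounds):
    """Return a list of (vote weight, stump) pairs."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(xs, ys, weights)
        err = stump[0]
        if err >= 0.5:                 # weak learner no better than chance: stop
            break
        if err == 0.0:                 # a perfect stump needs no further boosting
            ensemble.append((1.0, stump))
            break
        beta = err / (1.0 - err)
        ensemble.append((math.log(1.0 / beta), stump))
        # Shrink the weights of correctly classified samples, then renormalize,
        # so the next stump concentrates on the samples that were missed.
        weights = [w * beta if stump_predict(stump, x) == y else w
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def ensemble_predict(ensemble, x):
    """Weighted vote of all stumps (assumes a non-empty ensemble)."""
    votes = {}
    for alpha, stump in ensemble:
        label = stump_predict(stump, x)
        votes[label] = votes.get(label, 0.0) + alpha
    return max(votes, key=votes.get)

# Hypothetical 1-D problem that no single stump can solve.
xs = [1, 2, 3, 4, 5, 6]
ys = ["a", "a", "b", "b", "a", "a"]
ensemble = adaboost_m1(xs, ys, rounds=3)
predictions = [ensemble_predict(ensemble, x) for x in xs]
```

On this toy problem the first stump predicts the majority class everywhere, while the weighted vote of three stumps recovers the interval structure, which is the basic motivation for boosting weak classifiers.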

Performance Analysis
Due to the limited scale of the dataset, we evaluated the models with 10-fold cross-validation. Since the dataset is extremely imbalanced and the final purpose is to detect the defective samples, we evaluated overall performance using not only accuracy but also AUC (area under the ROC curve), recall, and confusion matrices.
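The evaluation protocol can be illustrated with a short sketch. It assumes stratified fold assignment (so that every fold contains the rare classes) and uses hypothetical class counts rather than the counts in Table 1; the function names are illustrative and do not correspond to WEKA's API.

```python
from collections import Counter, defaultdict

def stratified_folds(labels, k):
    """Assign sample indices to k folds while keeping class proportions similar."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for indices in by_class.values():
        # Deal the samples of each class round-robin across the folds.
        for j, i in enumerate(indices):
            folds[(j + offset) % k].append(i)
        offset += len(indices)
    return folds

def per_class_recall(y_true, y_pred):
    """Fraction of the samples of each true class that were predicted correctly."""
    hits, totals = Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return {c: hits[c] / totals[c] for c in totals}

# Hypothetical imbalanced label set (not the distribution in Table 1).
labels = ["Qualified"] * 20 + ["IRD"] * 5 + ["UPD"] * 5
folds = stratified_folds(labels, k=5)

# A classifier that always answers "Qualified" looks acceptable on accuracy alone.
always_qualified = ["Qualified"] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, always_qualified)) / len(labels)
recall = per_class_recall(labels, always_qualified)
```

Here the trivial classifier reaches an accuracy of about 0.67 while its recall on both defect classes is 0, which is why recall on the defective categories is reported alongside accuracy.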
The accuracies, AUCs, and recalls are presented in Table 4. The simplest classifier, logistic regression, performs well in both the Qualified and IRD categories, with an overall accuracy of 89.22%. However, even though its recall on the UPD category is 0.364, which is relatively good, the corresponding AUC is as low as 0.545. An AUC close to 0.5 indicates that the classifier tends to classify the samples randomly. This may explain their good performances on rare categories in an extremely imbalanced scenario.
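The interpretation of AUC can be checked with a small sketch. It assumes the rank-based (Mann-Whitney) formulation, in which the AUC of one class against the rest is the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name and the scores are hypothetical.

```python
def one_vs_rest_auc(scores, is_positive):
    """Rank-based AUC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

is_upd = [True, False, True, False]

perfect = one_vs_rest_auc([0.9, 0.1, 0.8, 0.2], is_upd)        # positives always ranked first
partial = one_vs_rest_auc([0.9, 0.8, 0.3, 0.2], is_upd)        # one positive ranked too low
uninformative = one_vs_rest_auc([0.5, 0.5, 0.5, 0.5], is_upd)  # scores carry no information
```

The three calls give 1.0, 0.75, and 0.5 respectively, so a value such as 0.545 sits very close to the uninformative case even when the recall on that class appears acceptable.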
To compare the efficiency of the machine learning models and evaluate the improvement brought by feature selection, we also trained the models with the original acoustic feature dataset and recorded the relevant statistics. The training time with the dataset before and after feature selection is shown in Table 5. High training time makes the model difficult to update and maintain, which greatly reduces efficiency. After feature selection, the time costs were greatly lowered, making the models more practical in production.

Conclusions
The classification of three categories of bearing conditions (Qualified, Inner Ring Defect, and Ultra Precision Defect) was investigated with integrated statistical learning methods. We adopted the ComParE 2016 Acoustic Feature Set in our model, which has proved effective in various fields including emotion and language recognition. Using the ComParE 2016 configuration file integrated into the openSMILE toolkit, we extracted 6373 acoustic features related to 65 LLDs. We then performed feature selection with a chi-square test and selected the 600 most label-dependent attributes.
The important attributes are mainly related to spectral LLDs. On the WEKA platform, we trained several statistical learning classifiers. The best accuracy, 91.46%, was obtained by the LogitBoost model. The model also performed relatively well in the minority categories: in the IRD category, it achieved a recall of 0.592.
The AUC of the model is also relatively high, with a weighted average of 0.930. We also compared the time cost of models trained with 6373 and 600 features and concluded that feature selection greatly improved efficiency.
Since the dataset is small-scale and unevenly distributed, the overall perfor-