A Noise Suppression Method for Speech Signal by Jointly Using Bayesian Estimation and Fuzzy Theory

Speech recognition systems have been applied to inspection and maintenance operations in industrial factories, to recording and reporting routines at construction sites, and to other settings where handwriting is difficult. In such real-world circumstances, countermeasures against surrounding noise are indispensable. In this study, a new method for removing noise from actual speech signals is proposed, using Bayesian estimation with the aid of bone-conducted speech and fuzzy theory. More specifically, by introducing Bayes' theorem based on the observation of air-conducted speech contaminated by surrounding background noise, a new type of noise removal algorithm is theoretically derived. In the proposed noise suppression method, the bone-conducted speech signal, whose high-frequency components are reduced, is regarded as fuzzy observation data, and a stochastic model for the bone-conducted speech is derived by applying the probability measure of fuzzy events. The proposed method was applied to speech signals measured in a real environment with low SNR, and better results were obtained than with an algorithm based on the observation of air-conducted speech only.


Introduction
Speech recognition systems have been applied in various fields, for example, to inspection and maintenance operations in industrial factories and at construction sites, where handwriting is difficult. For speech recognition in such real-world circumstances, methods for suppressing surrounding noise are indispensable.
Previously reported methods for noise reduction in speech recognition can be classified into two categories. One is based on a single microphone [1] [2], and the other uses a microphone array [3]. The latter requires a priori information on the number of noise sources and, in the case of multiple noise sources, more microphones than noise sources, so this category demands large-scale systems. Therefore, the former, based on a single microphone, is more advantageous than the latter [4] [5]. For noise suppression of speech signals with a single microphone, many algorithms applying the Kalman filter have been proposed [6] [7] [8] [9]. However, the Kalman filter is originally based on the assumption of Gaussian white noise [10], whereas actual noises show complex fluctuation forms with non-Gaussian and non-white properties.
From the above viewpoint, in our previously reported study, a noise suppression algorithm for actual speech signals that does not require the assumption of Gaussian white noise was proposed [11]. The method can be applied to actual complex situations where both the noise statistics and the fluctuation forms of the speech signal are unknown. By applying the algorithm to real speech signals with several kinds of noise, its effectiveness was experimentally confirmed in comparison with the Kalman filter.
Furthermore, signal processing methods for removing noise from actual speech signals have been proposed by jointly using the measured data of bone- and air-conducted speech signals [12] [13]. However, the algorithms of these previous methods were based on a simple additive model of the original speech signal and surrounding noise for the air-conducted speech observation. Moreover, the derived algorithms have been applied only to signals mixed with noise on a computer, and not to signals measured in a real environment in the presence of noise.
In this study, a new noise suppression method for speech signals is proposed by using Bayes' theorem, employing a posterior distribution based on the air-conducted speech observation contaminated by surrounding noise. In the proposed algorithm, in order to improve the accuracy of estimation of the speech signal, an expansion expression of the conditional probability density function reflecting all linear and non-linear correlation information between the original speech signal and the air-conducted speech observation is adopted as the model of the speech observation. Then, a probability distribution with parameters estimated from the bone-conducted speech is adopted as the prior distribution. Furthermore, the algorithm proposed in this study is applied to signals measured in a real environment in the presence of noise.
Although the bone-conducted speech signal is a kind of solid-propagated sound that is less affected by the surrounding noise, its high-frequency components are reduced through the propagation process [14]. By regarding the bone-conducted speech signal with reduced high-frequency components as fuzzy data and applying the probability measure of fuzzy events [15], a new simplified noise suppression method is derived that reflects both the air- and bone-conducted speech signals.
The effectiveness of the proposed method is confirmed by applying it to bone- and air-conducted speech measured in a real environment in the presence of surrounding noise.

Stochastic Model for Air-and Bone-Conducted Speech Signals by Introducing Fuzzy Theory
In an actual environment with surrounding noise, let $x_k$, $y_k$ and $z_k$ be the original speech signal and the observations of the air- and bone-conducted speech signals, respectively, at a discrete time $k$. The observation $y_k$ is contaminated by a surrounding noise $v_k$. In our previous studies, a simple additive model was considered for the air-conducted speech observation $y_k$ [12] [13]. In this study, in order to improve the accuracy of estimation of the speech signal $x_k$, an expansion expression of the conditional probability density function $P(y_k \mid x_k)$ [11] reflecting all linear and non-linear correlation information between $x_k$ and $y_k$ is adopted as the model of the air-conducted speech observation.
where $\langle \cdot \rangle$ denotes the averaging operation on the variables. Since the probability density functions of $x_k$ and $y_k$ are non-Gaussian, the following statistical orthonormal expansion series expressions are adopted.
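As a rough illustration of how expansion coefficients defined by an averaging operation can be estimated from data, the following sketch computes sample averages of products of orthonormal polynomials. It assumes a standard Gaussian reference distribution with normalized Hermite polynomials as the orthonormal functions; the names `hermite_orthonormal` and `expansion_coefficients` and the synthetic data are hypothetical and are not taken from the paper.

```python
import math
import random


def hermite_orthonormal(n, u):
    """Probabilists' Hermite polynomial He_n(u) divided by sqrt(n!)."""
    if n == 0:
        return 1.0
    h_prev, h_curr = 1.0, u  # He_0 and He_1
    for m in range(1, n):
        # Recurrence: He_{m+1}(u) = u * He_m(u) - m * He_{m-1}(u)
        h_prev, h_curr = h_curr, u * h_curr - m * h_prev
    return h_curr / math.sqrt(math.factorial(n))


def standardize(values):
    """Shift and scale samples to zero mean and unit variance."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


def expansion_coefficients(xs, ys, order):
    """Estimate A[m][n] as the sample average of theta_m(x) * theta_n(y)."""
    xs_n, ys_n = standardize(xs), standardize(ys)
    count = len(xs_n)
    return [[sum(hermite_orthonormal(m, a) * hermite_orthonormal(n, b)
                 for a, b in zip(xs_n, ys_n)) / count
             for n in range(order + 1)]
            for m in range(order + 1)]


# Hypothetical data: a signal x observed as y through additive noise.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
y = [xi + rng.gauss(0.0, 0.5) for xi in x]
A = expansion_coefficients(x, y, order=2)
```

Under these assumptions, `A[1][1]` is the ordinary correlation coefficient between the two variables, and the higher-order entries carry the non-linear correlation information.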
For the unknown expansion coefficients, the following simple dynamical model is introduced for their simultaneous estimation with the specific signal $x_k$:
Next, in order to express the relationship between the original speech signal and the bone-conducted speech, the bone-conducted speech is regarded as fuzzy data, and the conditional probability distribution function $P(x_k \mid z_k)$ is obtained by applying the probability measure of fuzzy events [15] to Equation (1), as follows.
Here, a Gaussian-type function is adopted as the membership function of the bone-conducted speech $z_k$: where $a$ and $b$ are constants and $\alpha$ $(> 0)$ is a parameter.
Accordingly, by considering $P(x_k, y_k)$ in Equation (1), $P(y_k)$ in Equation (4), and the membership function in Equation (11), the numerator of Equation (10) can be expressed as follows:
After considering the equality on Hermite polynomials: where $d_{ij}$ are expansion coefficients reflecting the bone-conducted speech signal, and using the orthonormal condition: the integral in Equation (12) can be calculated. Thus, the following expression is obtained:
Furthermore, through a similar calculation process, the denominator of Equation (10) can be derived as follows:
Therefore, by substituting Equations (16) and (18) into Equation (10), the conditional probability distribution function $P(x_k \mid z_k)$ can be expressed explicitly.
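The probability measure of a fuzzy event is the expectation of its membership function under the probability density of the underlying variable. The sketch below evaluates that integral numerically for a Gaussian-type membership function and checks it against the closed form available when the density is also Gaussian. The membership form $\exp\{-\alpha (y - z)^2\}$ used here is a simplifying assumption: it omits the constants $a$ and $b$ of the paper's function, and the Gaussian density stands in for the non-Gaussian expansion model.

```python
import math


def gaussian_pdf(y, mean, std):
    """Density of a normal distribution with the given mean and std."""
    return math.exp(-0.5 * ((y - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def membership(y, z, alpha):
    """Assumed Gaussian-type membership centred on the fuzzy observation z."""
    return math.exp(-alpha * (y - z) ** 2)


def fuzzy_event_probability(z, alpha, mean, std, n_steps=20000):
    """Integrate membership(y) * pdf(y) over y with the trapezoidal rule."""
    lo, hi = mean - 10.0 * std, mean + 10.0 * std
    h = (hi - lo) / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        y = lo + i * h
        weight = 0.5 if i in (0, n_steps) else 1.0
        total += weight * membership(y, z, alpha) * gaussian_pdf(y, mean, std)
    return total * h


def closed_form(z, alpha, mean, std):
    """Analytic value of the same integral for a Gaussian density."""
    c = 1.0 + 2.0 * alpha * std ** 2
    return math.exp(-alpha * (z - mean) ** 2 / c) / math.sqrt(c)


p_numeric = fuzzy_event_probability(z=0.3, alpha=2.0, mean=0.0, std=1.0)
p_exact = closed_form(z=0.3, alpha=2.0, mean=0.0, std=1.0)
```

A larger $\alpha$ makes the membership function narrower, so the fuzzy observation is treated as more precise and the resulting probability measure becomes smaller.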

Derivation of Noise Suppression Algorithm Based on Bayesian Estimation
To derive an estimation algorithm for the speech signal $x_k$, Bayes' theorem for the conditional probability distribution [17] is first considered. Since the parameter $a$ is also unknown, the conditional joint probability distribution of $x_k$ and $a_k$ is expressed as
Therefore, the computation time of the proposed algorithm can be reduced compared with the previous one [12]. Furthermore, by considering Equation (9)
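The role of Bayes' theorem in this kind of estimation can be sketched with a scalar example: a prior on the speech sample (here imagined as coming from the bone-conducted speech) is multiplied by the likelihood of the noisy air-conducted observation and normalized. This is only a minimal sketch under strong assumptions. It uses Gaussian prior and likelihood on a grid instead of the paper's expansion-based model, it does not estimate the unknown parameters jointly, and all names and numerical values are hypothetical.

```python
import math


def gaussian_pdf(v, mean, std):
    """Density of a normal distribution with the given mean and std."""
    return math.exp(-0.5 * ((v - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def posterior_mean_on_grid(y_obs, prior_mean, prior_std, noise_std, n_grid=4001):
    """Posterior mean of x given y_obs via P(x | y) proportional to P(x) P(y | x)."""
    lo = prior_mean - 8.0 * prior_std
    hi = prior_mean + 8.0 * prior_std
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    # Unnormalized posterior: prior times likelihood of the observation.
    unnorm = [gaussian_pdf(x, prior_mean, prior_std) * gaussian_pdf(y_obs, x, noise_std)
              for x in grid]
    evidence = sum(unnorm) * step  # normalizing constant P(y)
    return sum(x * u for x, u in zip(grid, unnorm)) * step / evidence


def posterior_mean_exact(y_obs, prior_mean, prior_std, noise_std):
    """Closed-form posterior mean for the Gaussian-Gaussian case."""
    w_prior = 1.0 / prior_std ** 2
    w_obs = 1.0 / noise_std ** 2
    return (w_prior * prior_mean + w_obs * y_obs) / (w_prior + w_obs)


# Hypothetical values: prior centred at 0.2, noisy observation y = 1.0.
estimate = posterior_mean_on_grid(1.0, prior_mean=0.2, prior_std=0.5, noise_std=1.0)
reference = posterior_mean_exact(1.0, prior_mean=0.2, prior_std=0.5, noise_std=1.0)
```

In this Gaussian special case the estimate is a precision-weighted average of the prior mean and the observation, so a noisier observation pulls the estimate toward the prior.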

Application to Speech Signal in Real Environment
In order to confirm the practical usefulness of the proposed noise suppression algorithm, it was applied to speech signals in a real noise environment. In the previous studies [12] [13], the noisy air-conducted speech was created on a computer by mixing noise with the original air-conducted speech signal measured in a noise-free environment; in contrast, the algorithm proposed in this study was applied to signals measured in a real environment in the presence of actual noise. For female and male speech signals digitized with a sampling frequency of 10 kHz and 16-bit quantization, we estimated the speech signal based on the observation corrupted by additive noise.
More specifically, air-conducted speech was measured in a real environment in the presence of white noise generated from a noise generator and of actual machine noise. The bone-conducted speech was measured simultaneously with the air-conducted speech using an acceleration sensor. By roughly setting the amplitude of the noise at two levels, the proposed algorithm was applied to extremely difficult situations with low SNR (noise-free air-conducted speech signal to noise ratio defined by ( ) shown in Figure 5 and Figure 6 for the female speech signal and in Figure 7 and Figure The air-conducted female and male speech signals spoken by the same speakers in the different situation without any noises are shown in Figure 13 and
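For reference, a signal-to-noise ratio of this kind is conventionally computed as the ratio of signal power to noise power in decibels. The sketch below uses that conventional definition, which is assumed here and may differ in detail from the formula used in the paper. The test tone, the noise amplitudes, and the function name `snr_db` are hypothetical; only the 10 kHz sampling frequency comes from the text.

```python
import math
import random


def snr_db(clean, noise):
    """Conventional SNR in dB: 10 * log10(signal power / noise power)."""
    signal_power = sum(s * s for s in clean)
    noise_power = sum(v * v for v in noise)
    return 10.0 * math.log10(signal_power / noise_power)


fs = 10_000  # sampling frequency in Hz, as stated for the measured signals
rng = random.Random(1)

# Hypothetical stand-in for one second of noise-free speech: a 200 Hz tone.
clean = [math.sin(2.0 * math.pi * 200.0 * k / fs) for k in range(fs)]

# Two hypothetical noise amplitude levels, both giving a negative SNR.
noise_low = [rng.gauss(0.0, 1.0) for _ in range(fs)]
noise_high = [rng.gauss(0.0, 2.0) for _ in range(fs)]

snr_low_noise = snr_db(clean, noise_low)
snr_high_noise = snr_db(clean, noise_high)
```

Doubling the noise amplitude lowers the SNR by about 6 dB, which is why even two coarse amplitude settings span noticeably different levels of difficulty.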

Conclusions
In this paper, by regarding the bone-conducted speech signal with reduced high-frequency components as fuzzy data and applying the probability measure of fuzzy events, a new noise suppression method has been derived on the basis of Bayes' theorem as the fundamental principle of estimation. Furthermore, the proposed algorithm has been applied to real speech signals contaminated by noise and measured in an actual environment with low SNR. The experiments indicate that better estimation results can be obtained by the proposed algorithm than by the method based only on air-conducted observations.
The proposed approach is quite different from the traditional standard techniques. However, it is still at an early stage of development, and a number of practical problems remain to be investigated. These include: 1) application to a diverse range of speech signals in actual noise environments, 2) extension to cases with multiple noise sources, and 3) finding an optimal number of expansion terms for the expansion-based probability expressions adopted.