Speech Enhancement Using Cross-Correlation Compensated Multi-Band Wiener Filter Combined with Harmonic Regeneration

Speech signals are generally corrupted by noise, and the noise does not affect the speech signal uniformly over the entire spectrum. This paper proposes an improved Wiener filtering method for reducing background noise in speech in colored noise environments. Because the sensitivity of the human ear varies nonlinearly across the frequency spectrum, a nonlinear multi-band approach with Bark scale frequency spacing is used. The cross-correlation between the speech and noise signals is taken into account in the proposed method to reduce colored noise. To overcome the harmonic distortion introduced in the enhanced speech, the proposed method regenerates the suppressed harmonics. Objective and subjective tests were carried out to demonstrate the improvement in the perceptual quality of speech obtained with the proposed technique.


Introduction
In many speech communication systems, recovering speech from a signal corrupted by background noise is a challenging task, especially at low signal-to-noise ratio (SNR) values. Speech quality and intelligibility might significantly deteriorate in the presence of background noise, especially when the speech signal is subject to subsequent processing, such as automatic speech recognition and speech coding. Due to the use of automatic speech processing systems in a variety of real-world applications, speech enhancement has become an important topic of research. Several speech enhancement systems are available in the literature [1][2][3][4]. Noise-corrupted speech can be enhanced using the Wiener filtering technique [5,6], the spectral subtraction method [7] or the Kalman filtering technique. The power spectral subtraction and Wiener filtering algorithms are widely used because of their low computational complexity and impressive performance.
In general, in these algorithms the enhanced speech spectrum is obtained by subtracting an estimated noise spectrum from the noisy speech spectrum or by multiplying the noisy spectrum by a gain function. Let the noisy speech, clean speech and noise signals be denoted by $y(n)$, $x(n)$ and $d(n)$, respectively, in the time domain.
If the noise is assumed to be additive, then $y(n)$ can be expressed as $y(n) = x(n) + d(n)$ (1). Applying the fast Fourier transform (FFT) to (1) gives, at the $m$-th frame and $k$-th frequency bin, $Y(m,k) = X(m,k) + D(m,k)$ (2), where $Y(m,k)$, $X(m,k)$ and $D(m,k)$ are the FFT coefficients of the noisy speech, clean speech and noise signals. An estimate of the clean speech component, denoted $\hat{X}(m,k)$, can be obtained by multiplying $Y(m,k)$ by the filter gain function $W(m,k)$: $\hat{X}(m,k) = W(m,k)\,Y(m,k)$ (3). The phase of the noisy speech is kept unchanged since it is assumed that phase distortion is not perceived by the human ear. It is well known that the frequency resolution of human hearing is non-uniform and is usually described by critical bands or the Bark scale. Real-world noise does not affect the speech signal uniformly over the whole spectrum; therefore, scaling the noise spectrum by a constant factor over the whole frequency range may also remove speech.
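As a minimal sketch of the gain-based enhancement described here, the following Python fragment scales each frequency bin of one noisy frame by a real, non-negative gain and transforms back, so that the noisy phase is left untouched. The helper names (`dft`, `idft`, `apply_gain`) and the naive O(N^2) transform are illustrative assumptions and not part of the proposed method; a practical system would use an FFT with overlapping windowed frames.

```python
import cmath
import math


def dft(frame):
    """Naive DFT of one real frame (O(N^2); stands in for an FFT)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT; returns the real part of the time-domain frame."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def apply_gain(noisy_frame, gain):
    """Multiply each bin Y(k) by a real, non-negative gain W(k).

    Scaling a complex number by a non-negative real changes its magnitude
    only, so the phase of the noisy spectrum is preserved.
    """
    spectrum = dft(noisy_frame)
    enhanced = [w * y for w, y in zip(gain, spectrum)]
    return idft(enhanced)
```

A gain of 1 in every bin returns the noisy frame unchanged, and a constant gain of 0.5 simply halves it; a frequency-dependent gain attenuates the bins where noise dominates.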
A new multi-band approach to the Wiener filter method that reduces colored noise is developed. The method uses a different weighting factor for each frequency sub-band. The factor also includes the cross-correlation components between the clean speech and noise signals. The perceptual quality of the enhanced speech is improved by using nonlinear Bark-scaled frequency spacing, based on the fact that the sensitivity of the human ear varies nonlinearly across the frequency spectrum.
In most spoken languages, voiced sounds represent a large proportion (around 80%) of the pronounced sounds. In classic short-time suppression techniques, some harmonics are considered to be noise-only components and are consequently suppressed by the noise reduction process. This is one major limitation of those methods. To overcome this limitation, a method called regeneration of suppressed harmonics, which takes into account the harmonic characteristic of speech, is proposed. In this approach, the output signal of a classic noise reduction technique is further processed to create an artificial signal in which the missing harmonics are automatically regenerated. This artificial signal is used to refine the apriori SNR used to compute a spectral gain.

Multi-Band Wiener Filter
In real environments, the noise spectrum is not uniform over all frequencies. For example, in the case of engine noise, most of the noise energy is concentrated at low frequencies. The sensitivity of the human ear varies nonlinearly across the frequency spectrum. The principle of psychoacoustics [8,9] suggests that a spectral gain may be shared among adjacent high-frequency components. A commonly used scale for signifying the critical bands is the Bark scale, which divides the audible frequency range of 16 kHz into 24 abutting bands. Figure 1 illustrates the relationship between the frequency in hertz and the critical band rate in Bark. An approximate analytical expression for the conversion from linear frequency $f$ (in kHz) into the critical band number $b$ (in Bark) is $b = 13\arctan(0.76 f) + 3.5\arctan\left((f/7.5)^2\right)$. In the frequency range from 0-8 kHz, there are 18 critical bands. Therefore, the spectral Wiener filter was modified for a critical band analysis to obtain the power spectral density on a Bark scale, where $i$ is the critical band number, $K = 18$ is the total number of critical bands and $k$ is the frequency index, which depends on the lower and upper frequency boundaries of critical band $i$.
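The Bark conversion and the grouping of frequency bins into critical bands can be sketched as follows. This is an illustration under stated assumptions: bands are taken to be one Bark wide and are found by flooring the Bark value, and the function names and the 8 kHz sampling rate used in the note below are hypothetical choices, not the paper's exact band boundaries.

```python
import math


def hz_to_bark(f_hz):
    """Critical band rate in Bark for a frequency in Hz (the formula uses kHz)."""
    f_khz = f_hz / 1000.0
    return 13.0 * math.atan(0.76 * f_khz) + 3.5 * math.atan((f_khz / 7.5) ** 2)


def bark_band_powers(power_spectrum, sample_rate):
    """Sum one-sided power-spectrum bins into bands one Bark wide.

    power_spectrum[k] holds |Y(k)|^2 for k = 0 .. N/2, spanning
    0 .. sample_rate / 2. At least two bins are assumed.
    """
    nyquist = sample_rate / 2.0
    last_bin = len(power_spectrum) - 1
    num_bands = int(hz_to_bark(nyquist)) + 1
    bands = [0.0] * num_bands
    for k, power in enumerate(power_spectrum):
        f_hz = nyquist * k / last_bin
        band = min(int(hz_to_bark(f_hz)), num_bands - 1)
        bands[band] += power
    return bands
```

For example, `hz_to_bark(1000.0)` is about 8.5 Bark. With speech sampled at 8 kHz, an assumption made here for illustration, `bark_band_powers` returns 18 band powers whose sum equals the total power of the input spectrum.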
The sub-band Wiener filter is derived according to the minimum mean square error (MMSE) criterion between the ideal and estimated sub-band speech signals in each sub-band. Its cost function in one sub-band is defined as $J_i = E\left[\,|S_i(k) - \hat{S}_i(k)|^2\,\right]$ (5), where $E[\cdot]$ represents the expectation operator, and $\hat{S}_i(k)$ and $S_i(k)$ denote the estimated and ideal sub-band speech signals in the $i$-th sub-band, respectively.
In each sub-band, noise suppression is performed by multiplying the sub-band noisy speech $Y_i(k)$ by the Wiener filter gain $W_i$: $\hat{S}_i(k) = W_i\,Y_i(k)$ (6). Substituting (6) in (5) and simplifying gives the cost function $J_i$ in the form (7).

Conventional Wiener Filter
In the conventional Wiener filter, the sub-band clean speech $S_i(k)$ and noise $D_i(k)$ are assumed to be zero mean and uncorrelated in each sub-band, so (7) can be simplified to (8). Setting the derivative of (8) with respect to the weighting factor to zero, the weighting factor can be derived as (9). Under this assumption the cross term between $S_i(k)$ and $D_i(k)$ vanishes and need not be calculated.
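The steps from the cost function to the conventional gain can be written out as a short derivation. This is a sketch under assumed notation: $J_i$ for the cost, $W_i$ for the weighting factor, $Y_i(k) = S_i(k) + D_i(k)$ for the sub-band noisy speech, $\sigma_{s_i}^2$ and $\sigma_{d_i}^2$ for the sub-band clean speech and noise variances, $r_{sd,i}$ for their crosscorrelation and $\xi_i$ for the apriori SNR. Real-valued, zero-mean signals are assumed, and the intermediate forms may differ from those of (7)-(9).

```latex
\begin{align*}
J_i &= E\big[(S_i(k) - W_i Y_i(k))^2\big], \qquad Y_i(k) = S_i(k) + D_i(k) \\
    &= (1 - W_i)^2 \sigma_{s_i}^2 - 2 W_i (1 - W_i)\, r_{sd,i} + W_i^2 \sigma_{d_i}^2,
       \qquad r_{sd,i} = E[S_i(k) D_i(k)] \\
\intertext{With uncorrelated speech and noise, $r_{sd,i} = 0$, so}
J_i &= (1 - W_i)^2 \sigma_{s_i}^2 + W_i^2 \sigma_{d_i}^2 \\
\frac{\partial J_i}{\partial W_i} &= -2 (1 - W_i) \sigma_{s_i}^2 + 2 W_i \sigma_{d_i}^2 = 0 \\
W_i &= \frac{\sigma_{s_i}^2}{\sigma_{s_i}^2 + \sigma_{d_i}^2} = \frac{\xi_i}{1 + \xi_i},
       \qquad \xi_i = \frac{\sigma_{s_i}^2}{\sigma_{d_i}^2}
\end{align*}
```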

Crosscorrelation Compensated Wiener Filter
The autocorrelation sequences of one frame of clean speech, together with those of the background noise and the noisy version of the same speech signal, are shown in Figure 2. The autocorrelation sequence of the noisy speech signal is not exactly equal to the sum of the autocorrelations of the noise and clean speech signals. This indicates the existence of crosscorrelation between the clean speech signal and the noise signal [10].
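A small numerical check illustrates why the autocorrelation of the noisy signal differs from the sum of the clean speech and noise autocorrelations: the difference is exactly the pair of crosscorrelation terms. The sample values and the `correlation` helper below are made up for illustration and are not data from the paper.

```python
def correlation(a, b, lag):
    """Biased estimate of r_ab(lag) = (1/N) * sum_n a[n] * b[n + lag], lag >= 0."""
    n = len(a)
    return sum(a[t] * b[t + lag] for t in range(n - lag)) / n


# Hypothetical short frames of clean speech and noise.
clean = [0.9, 0.4, -0.3, -0.8, -0.5, 0.2, 0.7, 0.6]
noise = [0.3, 0.1, -0.2, -0.1, -0.3, 0.2, 0.1, 0.2]
noisy = [x + d for x, d in zip(clean, noise)]

lag = 1
r_yy = correlation(noisy, noisy, lag)
r_sum = correlation(clean, clean, lag) + correlation(noise, noise, lag)
cross = correlation(clean, noise, lag) + correlation(noise, clean, lag)

# r_yy equals r_xx + r_dd only when the crosscorrelation terms vanish.
print(r_yy, r_sum, cross)
```

For these frames the cross terms are clearly non-zero, so `r_yy` and `r_sum` differ by `cross`.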
Therefore, the crosscorrelation between $S_i(k)$ and $D_i(k)$ cannot be neglected. Differentiating (7) with respect to the weighting factor, equating to zero and simplifying gives (10). The crosscorrelation coefficient [9] used in (11) and (12) estimates the correlation between the noisy speech signal and the noise in a sub-band. By substituting (11) and (12) in (10), the filter gain can be obtained.
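To illustrate the effect of keeping the cross term, the sketch below compares the conventional gain with a gain obtained by minimising the same squared error without dropping the crosscorrelation. The closed form used here, (var_s + r_sd) / (var_s + 2 r_sd + var_d), follows from setting the derivative of the sub-band cost to zero for real-valued signals. It is an assumed stand-in and not necessarily identical to the paper's final gain, which relies on a crosscorrelation coefficient estimated from the noisy signal. The function names and the oracle setting, in which clean speech and noise samples are known, are for illustration only.

```python
def wiener_gains(speech, noise):
    """Return (conventional, compensated) gains for one sub-band."""
    n = len(speech)
    var_s = sum(s * s for s in speech) / n
    var_d = sum(d * d for d in noise) / n
    r_sd = sum(s * d for s, d in zip(speech, noise)) / n  # cross term
    conventional = var_s / (var_s + var_d)
    compensated = (var_s + r_sd) / (var_s + 2.0 * r_sd + var_d)
    return conventional, compensated


def mse(speech, noise, gain):
    """Mean squared error between the speech and gain * (speech + noise)."""
    return sum((s - gain * (s + d)) ** 2 for s, d in zip(speech, noise)) / len(speech)
```

When the two signals are uncorrelated the gains coincide; when they are correlated, the compensated gain gives an error no larger than the conventional one, since it minimises the same cost without the simplifying assumption.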

Regeneration of Suppressed Harmonics
The output signal, $\hat{s}(t)$ in the time domain, obtained by the multi-band Wiener filter presented in the previous section still suffers from distortions. This is inherent to the estimation errors introduced by the noise spectrum estimation, since it is very difficult to get reliable instantaneous estimates in single-channel noise reduction techniques. Since 80% of the pronounced sounds are voiced on average, the distortions generally turn out to be harmonic distortion. Indeed, some harmonics are considered to be noise-only components and are suppressed. For that reason, we propose to process the distorted signal to create a fully harmonic signal in which all the missing harmonics are regenerated. This signal is then used to compute a spectral gain able to preserve the speech harmonics. This is called the speech harmonic regeneration step and can be used to improve the results of any noise reduction technique, not only the multi-band Wiener filter.
A simple and efficient way to restore speech harmonics consists of applying a nonlinear function NL (e.g., absolute value, minimum or maximum relative to a threshold, etc.) to the time signal enhanced in a first step with a classic noise reduction technique. The artificially restored signal is then obtained by $s_{\mathrm{harmo}}(t) = \mathrm{NL}(\hat{s}(t))$. In this work, half-wave rectification is used as the nonlinear function. Because the nonlinearity alters the amplitudes of the harmonics, this signal cannot be used directly as a clean speech estimate. Nevertheless, it contains very useful information that can be exploited to refine the apriori SNR.
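The regenerating effect of half-wave rectification can be seen on a toy frame. In the sketch below, a sinusoid stands in for an over-suppressed voiced frame in which only the fundamental survives; the frame length, the bin index of the fundamental and the helper names are assumptions chosen for illustration.

```python
import cmath
import math


def bin_magnitude(signal, k):
    """Magnitude of DFT bin k of a real signal."""
    n = len(signal)
    return abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))


def half_wave_rectify(signal):
    """Nonlinear function NL: keep positive samples, zero the rest."""
    return [max(x, 0.0) for x in signal]


n = 64
f0_bin = 4  # the fundamental completes 4 cycles in the frame
# Stand-in for an over-suppressed voiced frame: only the fundamental survives.
enhanced = [math.sin(2 * math.pi * f0_bin * t / n) for t in range(n)]
restored = half_wave_rectify(enhanced)

before = bin_magnitude(enhanced, 2 * f0_bin)  # second harmonic, essentially zero
after = bin_magnitude(restored, 2 * f0_bin)   # energy reappears at 2 * f0
```

The rectified frame has energy at the second harmonic that the input lacked. Its amplitude is set by the nonlinearity rather than by the clean speech, which is why the restored signal serves only to refine the SNR estimate.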
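The parameter $\rho(k)$ is used to control the mixing of the first-stage estimate and the artificially restored signal in the refined apriori SNR.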
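One way to picture the role of the mixing parameter is the sketch below. It assumes the form commonly used in the harmonic regeneration literature, in which the refined apriori SNR is a weighted sum of the first-stage power and the restored-signal power divided by the noise power, followed by a Wiener-type gain. The function and argument names are hypothetical, and the exact expressions used in this paper may differ.

```python
def refined_gain(enhanced_power, harmonic_power, noise_power, rho):
    """Wiener-type gain from an apriori SNR refined with the restored signal.

    enhanced_power: power of the first-stage estimate in one bin
    harmonic_power: power of the nonlinearly restored signal in the same bin
    noise_power:    estimated noise power in the same bin (must be > 0)
    rho:            mixing parameter in [0, 1]
    """
    xi = (rho * enhanced_power + (1.0 - rho) * harmonic_power) / noise_power
    return xi / (1.0 + xi)
```

With `rho = 1` the gain depends only on the first-stage estimate, with `rho = 0` only on the restored signal, and intermediate values trade one off against the other.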

Results and Discussion
To evaluate and compare the performance of the proposed method, simulations are carried out with NOIZEUS [14], a database widely used in testing speech enhancement algorithms. The noisy database contains 30 IEEE sentences (produced by three male and three female speakers) corrupted by eight different real-world noises at different SNRs. Speech signals were degraded with seven types of noise at global SNR levels of 0 dB, 5 dB, 10 dB and 15 dB. The noises included airport, car, babble, train and street noises. The objective quality measures used for the evaluation of the proposed method are the segmental SNR and noise reduction (NR) values.
It is well known that the segmental SNR is more accurate in indicating speech distortion than the overall SNR. Higher segmental SNR and NR values indicate weaker speech distortion and better perceived quality of the processed speech signal [15]. The performance of the proposed method is compared with that of the Wiener filter and the multi-band Wiener filter.
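For reference, a segmental SNR computation can be sketched as follows. The non-overlapping segmentation and the clamping of each segment to the range -10 dB to 35 dB are common conventions assumed here; the paper states only the segment size of 256, and the function name and defaults are illustrative.

```python
import math


def segmental_snr(clean, enhanced, segment_size=256, floor_db=-10.0, ceil_db=35.0):
    """Average per-segment SNR in dB over non-overlapping segments."""
    values = []
    for start in range(0, len(clean) - segment_size + 1, segment_size):
        ref = clean[start:start + segment_size]
        est = enhanced[start:start + segment_size]
        signal_energy = sum(c * c for c in ref)
        error_energy = sum((c - e) ** 2 for c, e in zip(ref, est))
        if signal_energy == 0.0:
            continue  # silent segment: SNR undefined, skip it
        if error_energy == 0.0:
            snr_db = ceil_db  # perfect reconstruction: use the upper limit
        else:
            snr_db = 10.0 * math.log10(signal_energy / error_energy)
        values.append(min(max(snr_db, floor_db), ceil_db))
    if not values:
        raise ValueError("no usable segments")
    return sum(values) / len(values)
```

Averaging the per-segment values in dB, rather than taking one ratio over the whole utterance, keeps loud segments from dominating the measure.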
Table 1 shows the segmental SNR improvement, with a segment size of 256, for various noise levels. The proposed method outperforms the Wiener filter and the multi-band Wiener filter in almost all cases.
Table 2 compares the NR values. It shows that the proposed method achieves low speech distortion while keeping the residual noise at an acceptable level. The time-domain waveforms of the enhanced speech are shown in Figure 3, where the clean speech signal is corrupted by airport noise at 0 dB SNR. The figure shows that the proposed method can efficiently remove the background noise.
Figure 4 compares the spectrograms. The background noise is efficiently removed by the proposed method. Listening tests also indicate that the proposed method efficiently reduces the background noise with less speech distortion.

Conclusions
This paper presents an improved Wiener filtering method that takes into account the non-uniform effect of colored noise on the spectrum of speech. The proposed method includes the cross-correlation terms between the clean speech and the noise. The multi-band Wiener filtering method reduces the residual musical tones that appear in speech enhanced by Wiener filtering. A noise reduction technique based on the principle of harmonic regeneration is also proposed. Classic techniques, including the multi-band Wiener filter, suffer from harmonic distortion when the SNR is low. This is mainly due to estimation errors introduced by the noise PSD estimator. To solve this problem, a nonlinearity is used to regenerate the degraded harmonics of the distorted signal in an efficient way.
The resulting artificial signal is used to refine the apriori SNR, which is then used to compute a spectral gain that preserves speech harmonics and hence avoids distortions. Results are given in terms of segmental SNR.

In the sub-band Wiener filter expressions, the variance terms represent the variances of the sub-band clean speech, noise and noisy speech in the $i$-th sub-band, respectively, and the SNR terms are the apriori SNR and the aposteriori SNR in the $i$-th sub-band, respectively. Only the corrupted signal is accessible.

Figure 2. Autocorrelation sequences of clean speech, noisy speech, noise and sum of the clean speech and noise signals.
The properties of the mixing parameter $\rho(k)$ are:
- when the estimate $\hat{S}_i(k)$ provided by the multi-band Wiener filter algorithm is reliable, the harmonic regeneration process is not needed and $\rho(k)$ should be equal to 1;
- when the estimate $\hat{S}_i(k)$ provided by the multi-band Wiener filter algorithm is unreliable, the harmonic regeneration process is required to correct the estimate and $\rho(k)$ should be equal to 0 (or any other constant value depending on the chosen nonlinear function).
The parameter $\rho(k)$ can be chosen constant to realize a compromise between the properties of the two estimators. The refined apriori SNR is then used to compute a new spectral gain [11][12][13].

Figure 3. Time-domain waveforms of (a) the clean speech, (b) the noisy speech, and the enhanced speech using (c) the Wiener filter, (d) the multi-band Wiener filter and (e) the proposed method.

Figure 4. Spectrograms of (a) the clean speech, (b) the noisy speech, and the enhanced speech using (c) the Wiener filter, (d) the multi-band Wiener filter and (e) the proposed method.

Table 1. Segmental SNR of the enhanced speech in various noise environments.