Int. J. Communications, Network and System Sciences, 2009, 2, 742-745
doi:10.4236/ijcns.2009.28085 Published Online November 2009 (http://www.SciRP.org/journal/ijcns/).
Copyright © 2009 SciRes. IJCNS
A Perceptual Approach to Reduce Musical Noise
Using Critical Bands Tonality Coefficients and
Ch. V. Rama Rao1, M. B. Rama Murthy2, K. Srinivasa Rao3
1Department of ECE, Gudlavalleru Engineering College, Gudlavalleru, India
2Jayaprakash Narayan College of Engineering, Dharmapur, Mahabubnagar, India
3TRR College of Engineering, Pathancheru, India
Received August 18, 2009; revised September 27, 2009; accepted October 19, 2009
Traditional noise reduction techniques have the drawback of generating an annoying musical noise. A new scheme for speech enhancement in high-noise environments is developed by considering the masking characteristics of the human auditory system. The new scheme considers the masking threshold of both the noisy speech and the denoised speech to detect musical noise components. To make them inaudible, they are set under the noise masking threshold. The improved signal is subjected to extensive subjective and objective tests. It is observed that the musical noise is appreciably reduced even at very low signal-to-noise ratios.
Keywords: Noise Reduction, Musical Noise, Masking Threshold
1. Introduction
In many speech communication systems, enhancing the corrupted speech is a challenging task, especially at high noise levels. A large number of noise reduction techniques have been proposed in the past. They are based on spectral subtraction  and Wiener filtering  techniques. The main drawback of these methods is the appearance of an annoying residual noise, often referred to as musical noise. Later techniques rely on psychoacoustic considerations and mainly exploit the masking properties of the human auditory system. For example, in the enhancement scheme proposed in , only audible noise components are estimated and suppressed. Other approaches introduce a perceptual modification into traditional denoising systems [4,5].
In the present paper, a new speech enhancement technique is developed for reducing the musical noise. In this work, the auditory masking threshold is estimated for musical noise detection and reduction. Musical noise is detected based on the fact that the musical noise components present in the enhanced signal lie above the noise masking threshold. On the other hand, the frequency components of the noisy speech lie below the noise masking threshold. Hence, musical noise is detected by using some comparison rules. The detected musical noise components are set under the noise masking threshold and their closest neighbours are smoothed, resulting in musical noise reduction.
2. Basic Speech Enhancement System
Let the corrupted speech signal be represented as
$y(n) = s(n) + d(n)$ (1)
where $s(n)$ is the clean speech signal and $d(n)$ is the noise signal. The processing is done on a frame-by-frame basis. The short-time Fourier transform (STFT) is used and the previous model is rewritten as
$Y(m,f) = S(m,f) + D(m,f)$ (2)
where $m$ indicates the frame index and $f$ is the frequency index. The denoised speech short-time magnitude $\hat{S}(m,f)$ is obtained using a spectral denoising approach. In this paper, the modified Wiener filter  is used to denoise the speech signal. The denoised speech is obtained as follows
$\hat{S}(m,f) = W(m,f)\,Y(m,f)$ (3)
where $W(m,f)$ is the modified Wiener filter gain , obtained by including the cross correlation between the clean speech signal and the noise signal. $W(m,f)$ is given by
[Equation (4), the expression for $W(m,f)$, is garbled in the source; the surviving fragments involve $Y(m,f)$, $D(m,f)$ and an expectation over $D(m,f)$.]
where the a priori signal-to-noise ratio (SNR) is calculated according to the decision-directed approach reported in , and a cross-correlation coefficient estimates the correlation between the noisy speech and the noise signal in a frame. The modified Wiener filter gain function is controlled not only by the a priori SNR, as for the conventional Wiener filter, but also by this cross-correlation coefficient. When the coefficient is zero, the noise and clean speech signals are uncorrelated and $W(m,f)$ reduces to the conventional Wiener filter gain function. The proposed approach consists of reducing the musical noise existing in the denoised speech spectrum, denoted by $\hat{S}(m,f)$. The time-domain enhanced speech is obtained with the following relationship
[Equation (5) is missing from the source.]
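The decision-directed estimate and the conventional Wiener gain, to which the modified gain reduces when the cross-correlation coefficient is zero, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it handles a single frequency bin, assumes the noise power is known and constant, and the names `wiener_gains`, `beta` and `xi_min` (and their default values) are assumptions introduced here.

```python
def wiener_gains(noisy_power, noise_power, beta=0.98, xi_min=1e-3):
    """Conventional Wiener gain per frame for ONE frequency bin.

    noisy_power: list of |Y(m,f)|^2 over frames m (fixed f)
    noise_power: noise power estimate for this bin (assumed known, constant)
    beta, xi_min: smoothing factor and SNR floor (assumed values)
    """
    gains = []
    prev_clean_power = 0.0  # |S_hat(m-1,f)|^2, zero before the first frame
    for y2 in noisy_power:
        post_snr = y2 / noise_power  # a posteriori SNR
        # Decision-directed a priori SNR: mix of the previous clean-speech
        # estimate and the current (rectified) instantaneous SNR.
        xi = (beta * prev_clean_power / noise_power
              + (1.0 - beta) * max(post_snr - 1.0, 0.0))
        xi = max(xi, xi_min)
        w = xi / (1.0 + xi)  # conventional Wiener gain
        gains.append(w)
        prev_clean_power = (w ** 2) * y2  # power of the denoised bin
    return gains
```

For a bin whose power jumps from the noise level to a much higher level, the gain rises over successive frames, which is the smoothing behaviour the decision-directed approach is meant to provide.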
3. Proposed Enhancement Technique
The proposed enhancement technique consists of the following steps.
- The modified Wiener filter gain function is applied to get the denoised speech.
- The noise masking threshold NMT is calculated for both the noisy speech and the denoised speech.
- A musical noise detector is used. For each frequency, it gives a Boolean flag M which indicates the presence or absence of musical noise.
- The musical noise is reduced when present.
3.1. Musical Noise Detection
In order to detect musical noise in the denoised speech, perceptual properties of the human auditory system are used. There are two steps in detecting musical noise: calculation of the noise masking threshold, and detection of tonal components in both the noisy speech and the denoised speech.
3.1.1. Noise Masking Threshold Calculation
The NMT is obtained by modelling the frequency selectivity of the human ear and its masking property. Masking thresholds distinguish two situations: “tone masking noise” and “noise masking tone”. In our context of musical noise detection, we consider only the situation of “noise masking tone”, since musical noise is a tonal signal that is audible among the noise components. The NMT is calculated according to the principle explained in .
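The first stage of a masking-threshold computation, grouping the power spectrum into critical bands and summing the energy in each, can be sketched as below. This is only an illustration under stated assumptions: the Hz-to-Bark mapping is Zwicker's common approximation rather than a formula given in this paper, bands are taken as integer Bark intervals, and the function names are hypothetical. The later stages of the threshold calculation are not shown.

```python
import math


def hz_to_bark(freq_hz):
    """Zwicker's approximation of the Bark scale (an assumed mapping)."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


def critical_band_energies(power_spectrum, sample_rate):
    """Sum a one-sided power spectrum into integer-Bark critical bands.

    power_spectrum: powers for bins 0..N/2 (at least two bins)
    sample_rate: sampling frequency in Hz
    Returns a dict mapping band index -> summed energy.
    """
    n_bins = len(power_spectrum)
    nyquist = sample_rate / 2.0
    energies = {}
    for k, power in enumerate(power_spectrum):
        freq = k * nyquist / (n_bins - 1)  # centre frequency of bin k
        band = int(hz_to_bark(freq))       # band i covers [i, i+1) Bark
        energies[band] = energies.get(band, 0.0) + power
    return energies
```

With an 8 kHz sampling rate, this partition yields bands 0 to 17, and the band energies sum to the total spectral energy.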
3.1.2. Tonal Components Detection
Tonal and non-tonal components are identified because their masking models are different. The power spectrum and noise masking threshold of both the noisy speech and the denoised speech are calculated. Components above the noise masking threshold in the noisy speech are treated as tonal and belong to speech components. Components above the noise masking threshold in the denoised speech are marked as tonal and belong to either speech components or musical noise components. Hence, musical noise components can be detected: they are the marked tonal components that appear in the denoised speech but not in the noisy speech. Figure 1 shows the locations of tones in the noisy speech and the denoised speech.
[Figure 1. Location of tones in (a) noisy speech (b) denoised speech. Vertical axis of both panels: sound pressure level (dB).]
In this work, the tonality coefficient is used to identify the musical noise. The tonality coefficient is computed for each critical band of the denoised speech and of the noisy speech. Musical noise appears in a critical band if the tonality coefficient of the denoised speech is greater than that of the noisy speech. It becomes audible if the difference between the two coefficients is greater than a certain predetermined threshold. The threshold of a band depends on the critical band order and the masking properties of the human ear. We are interested in the audibility of tones in the presence of narrow-band noise. A narrow-band noise having a 1 Bark bandwidth can mask a tone within the same critical band if the tone intensity is below the noise masking threshold NMT, which is calculated from the critical band energy $E$ as follows [5,8]
[Equation (6), which expresses NMT in terms of $E$, is garbled in the source.]
3.1.3. The Experimental Determination of the Threshold
The experimental procedure to determine the threshold is as follows. White Gaussian noise is considered and the power spectrum of each frame is subdivided into critical bands. For each critical band, its energy and its tonality coefficient are computed. For the critical band, the power of an additive audible tone, which is equal to the noise masking threshold NMT, is computed. A sinusoid of this power is injected in the center of the critical band and the tonality coefficient is computed again. The difference between the two tonality coefficients represents the threshold over which an additive tone becomes audible in the presence of narrow-band noise. Experimentally, it is observed that the threshold is nearly constant for all critical bands and is about 0.06. Hence, a threshold equal to 0.06 is used in the present work. Finally, a Boolean flag M, indicating musical noise presence in a critical band, is computed: M is set when the tonality coefficient of the denoised speech exceeds that of the noisy speech by more than the threshold.
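The detection rule can be sketched as below. The paper does not spell out how the tonality coefficient is computed, so this sketch assumes a spectral-flatness-based definition in the style of Johnston's perceptual coder, applied per critical band, with -60 dB taken as the flatness of a pure tone. The function names and the `sfm_db_max` parameter are assumptions; only the 0.06 threshold comes from the text.

```python
import math


def tonality_coefficient(band_powers, sfm_db_max=-60.0):
    """Tonality of one critical band from its spectral flatness.

    Assumed definition: 0 for a flat (noise-like) band, approaching 1
    for a strongly peaked (tone-like) band.
    """
    eps = 1e-12  # avoids log(0) for empty bins
    n = len(band_powers)
    log_mean = sum(math.log(p + eps) for p in band_powers) / n
    geometric_mean = math.exp(log_mean)
    arithmetic_mean = sum(band_powers) / n + eps
    sfm_db = 10.0 * math.log10(geometric_mean / arithmetic_mean)
    return min(sfm_db / sfm_db_max, 1.0)


def musical_noise_flag(noisy_band, denoised_band, threshold=0.06):
    """Boolean flag M for one critical band.

    True when the denoised band is more tonal than the noisy band
    by more than the threshold (0.06 in the text).
    """
    diff = (tonality_coefficient(denoised_band)
            - tonality_coefficient(noisy_band))
    return diff > threshold
```

A band that was flat in the noisy speech but shows an isolated peak after denoising is flagged, while a band whose shape is unchanged is not.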
3.2. Musical Noise Reduction
Musical noise reduction removes only the parts responsible for the musical noise character by shifting the power spectrum of the detected musical components down below the noise masking threshold of the denoised speech. In this work, a correction term $C(f)$ is used to shift the power spectrum down sufficiently. The estimated power spectrum of the corrected speech is written as
[Equation garbled in the source; only the branch "$\hat{S}(m,f)$ otherwise" of a piecewise definition survives.]
where the correction term $C(f)$ is chosen according to subjective listening tests. The values of $C(f)$ for speech and pause frames given by Sofia Ben Jebara , indicated in Table 1, are used in the present work. It is observed that the attenuation is small for low-frequency components and considerable for high-frequency components. During pauses, it is constant since distortion and musical components appear in the same way in all frequency bands.
Table 1. Correction constants for musical noise reduction.
Frequency range   [0,1]   [1,2]   [2,3]   [3,4]
Speech C(f)       0.5     2       5       10
Pause C(f)        10      10      10      10
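One way the correction constants of Table 1 might be applied is sketched below. Several points are assumptions rather than statements from the paper: the frequency ranges are read as kHz, the constants and spectra are treated as dB values, and a flagged component is replaced by the denoised-speech masking threshold minus C(f). The function names are hypothetical.

```python
# Table 1 values; ranges assumed to be in kHz, corrections assumed in dB.
SPEECH_CORRECTION = [0.5, 2.0, 5.0, 10.0]
PAUSE_CORRECTION = [10.0, 10.0, 10.0, 10.0]


def correction_term(freq_hz, is_speech_frame):
    """Look up C(f) for a frequency, clamping to the last range."""
    table = SPEECH_CORRECTION if is_speech_frame else PAUSE_CORRECTION
    index = min(int(freq_hz // 1000), len(table) - 1)
    return table[index]


def reduce_musical_noise(power_db, nmt_db, flags, freqs_hz, is_speech_frame):
    """Shift flagged components below the masking threshold.

    power_db: denoised power spectrum of one frame (dB)
    nmt_db:   noise masking threshold of the denoised speech (dB)
    flags:    Boolean musical-noise flag for each component
    freqs_hz: frequency of each component in Hz
    Unflagged components are returned unchanged.
    """
    corrected = []
    for power, nmt, flagged, freq in zip(power_db, nmt_db, flags, freqs_hz):
        if flagged:
            # Assumed rule: place the component C(f) below the threshold.
            corrected.append(nmt - correction_term(freq, is_speech_frame))
        else:
            corrected.append(power)
    return corrected
```

For example, a flagged 500 Hz component in a speech frame with a 20 dB threshold would be set to 19.5 dB under these assumptions, while the same component in a pause frame would be set to 10 dB.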
[Figure 2. Output segmental SNR values. Vertical axis: output SNR (dB); legend includes modified Wiener filtering.]
[Figure 3. Output MBSD values.]
4. Results and Discussions
The proposed technique is evaluated using temporal, spectral and perceptual criteria. The segmental signal-to-noise ratio (SNRSEG) is used as the quantitative temporal criterion. For the spectral criterion, spectrograms are used, and the Modified Bark Spectral Distortion (MBSD) is used as the perceptual criterion .
In our simulations, recorded speech samples are corrupted with white Gaussian noise, and the simulations are performed on the MATLAB platform. Figures 2 and 3 compare the performance of classical Wiener filtering, modified Wiener filtering and the proposed scheme for different values of signal-to-noise ratio in terms of SNRSEG and MBSD values, respectively. Figure 4 shows spectrogram plots.
Figure 4. Spectrograms of (a) noisy speech, (b) denoised speech, (c) speech enhanced by the proposed scheme.
Interpretations from Figures 2, 3 and 4 are as follows.
The proposed scheme leads to better performance in terms of speech quality and intelligibility for all criteria. The improvement is most noticeable for the spectral and perceptual criteria, which correlate well with listening tests.
Spectrograms are considered in Figure 4. The noisy speech signal is speech corrupted by white Gaussian noise at SNR = 10 dB. The speech signal denoised by modified Wiener filtering is affected by musical noise (isolated points randomly distributed in time and frequency). The amount of such noise is reduced by the proposed scheme.
5. Conclusions
In this work, a new enhancement scheme for reducing the musical noise imposed by modified Wiener filtering is proposed. The masking characteristics of the human ear are used to detect and to reduce musical noise. Simulation results show that this scheme provides better results in terms of temporal, spectral and perceptual criteria.
References
[1] S. F. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP-27, No. 2, pp. 113–120, April 1979.
[2] Y. Ephraim and D. Malah, “Spectral enhancement using optimal non-linear spectral amplitude estimation,” in Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, pp.
[3] A. Akbari-Azrani, R. Le Bouquin Jannes, and G. Faucon, “Optimizing speech enhancement by exploiting masking properties of the human ear,” in Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, pp. 800–803, 1995.
[4] N. Virag, “Single channel speech enhancement based on masking properties of human auditory system,” IEEE Transactions on Speech and Audio Processing, Vol. 7, No. 2, pp. 126–137, February 1999.
[5] K. A. Sheela, Ch. V. R. Rao, K. S. Prasad, and A. V. N. Tilak, “A new noise reduction pre-processor for mobile voice communication using perceptually weighted spectral subtraction method,” 3rd International Conference on Mobile Ubiquitous and Pervasive Computing, VIT University, 16–19 December 2006.
[6] Ch. V. R. Rao, M. B. R. Murthy, and K. S. Rao, “Speech enhancement using modified Wiener filter,” National Conference on Futuristic Advancements in Computing & Electronics, Deccan College of Engineering & Technology, 19–21 March 2009.
[7] Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean square error short-time spectral amplitude estimator,” IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP-32, pp. 1109–1121, 1984.
[8] J. D. Johnston, “Transform coding of audio signal using perceptual noise criteria,” IEEE Journal on Selected Areas in Communications, Vol. 6, pp. 314–323, 1988.
[9] S. B. Jebara, “A perceptual approach to reduce musical noise phenomenon with Wiener denoising technique,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2006.
[10] W. Yan, M. Dixon, and R. Yantorno, “A modified Bark spectral distortion measure which uses noise masking threshold,” in Proceedings of the IEEE Speech Coding Workshop, pp. 55–56, 1997.