
REAL-TIME BROADBAND NOISE REDUCTION

Robert Hoeldrich and Markus Lorber


Institute of Electronic Music Graz
Jakoministrasse 3-5, A-8010 Graz, Austria
email: robert.hoeldrich@mhsg.ac.at

Abstract
A real-time implementation of a broadband noise reduction method based on non-linear spectral subtraction is
proposed. To prevent processing distortions in the form of musical noise, over-subtraction is applied to the degraded signal
spectrum. Furthermore, time-averaging is used to reduce the variance of the estimated signal-to-noise ratio (SNR). A
masking threshold obtained by spectral smoothing leads to further reduction of audible processing distortions.

1 Introduction
Old audio recordings (e.g. on tape, record or wax cylinder) are corrupted by different signal degradations. Digital
signal processing (DSP) is used for restoration, where each kind of degradation is processed independently. In this
paper the emphasis will be on broadband noise reduction.
Broadband noise is common to all forms of recording. Usually the noisy signals are processed in the frequency domain
using short-time spectral attenuation techniques, where components are attenuated according to their signal-to-noise
ratio (SNR). These techniques were first applied to speech signals within the area of speech enhancement [Lim,83],
before they were adopted for noise reduction in musical recordings [Vaseghi,92]. Although the restoration of speech
signals is closely related to the restoration of musical signals, the emphasis is on different criteria. While in speech
enhancement intelligibility is the main criterion, sound quality is the dominant parameter in restoration of musical
recordings [Cappe,95].
One well-known method for achieving the broadband noise reduction is based on spectral subtraction
([Boll,79],[Vaseghi,96]). The degraded signal x(n) is modeled as a pure audio signal s(n) and a superimposed
broadband noise d(n). The signal degradation is processed in the frequency domain using the short-time Fourier
transform. For each frequency bin k and each frame m, a specific amount of the noise spectrum $|D(k)|^b$ is subtracted
from the short-time spectrum $|X(m,k)|^b$ (b=1 for magnitude subtraction, and b=2 for power subtraction). The noise
spectrum $|D(k)|$ has to be estimated from a noise-only signal segment. For re-synthesis the denoised signal spectrum
$|S(m,k)|$ is combined with the original phase spectrum arg[X(m,k)]. The spectral subtraction method can be
modeled as a time-variant non-linear filter. Its transfer function depends on the signal-to-noise ratio SNR(m,k), which is
estimated from the short-time spectrum $|X(m,k)|$ and the noise spectrum $|D(k)|$.
The noise variance within a single frame causes the SNR to be overestimated for some frequency bins. This results in
a residual noise consisting of short sinusoidal impulses whose frequencies vary from frame to frame. This phenomenon
is known as musical noise. The noise variance can be reduced by using a time-averaged signal spectrum instead of
$|X(m,k)|$ ([Boll,79],[Vaseghi,96]). This reduces the musical noise without completely eliminating it. Vaseghi
et al. [Vaseghi,92] exploit the musical noise's behavior for its detection and partial elimination.

An alternative method to prevent this processing distortion in spectral subtraction is to apply over-subtraction
[Berouti,83] to the degraded signal spectrum. According to the estimated SNR, more than the average noise spectrum
has to be subtracted. This leads to a strong attenuation of small signal components. If the factor of over-subtraction is
high enough, the musical noise will be completely eliminated, but audible distortions in the audio signal can be
generated.
The noise suppression rule proposed by Ephraim and Malah ([Ephraim,84],[Ephraim,85]) allows significant noise
reduction without causing musical noise. This is mainly due to using a non-linear time-averaged SNR [Cappe,94],
which exhibits a lower variance than an SNR estimated without averaging. Furthermore, the masking properties of the
human auditory system can be used to reduce processing distortions (e.g. [Tsoukalas,93]).
The proposed method combines spectral over-subtraction with several smoothing strategies in both the time and the
frequency domain to reduce the SNR variance. This leads to few audible processing distortions. Section 2 discusses
the methods used within the denoising scheme. In section 3, the new noise reduction filter is described.

2 Review of Denoising Filter Methods


2.1 Spectral Subtraction
Berouti et al. [Berouti,83] present a variant of the spectral subtraction method for speech enhancement. An
overestimate of the noise magnitude or power spectrum is subtracted from the degraded signal spectrum. In each
short-time frame m, the amount of over-subtraction depends on the SNR of the incoming degraded signal. A
subtraction factor $\alpha(m) \ge 1$ is determined in order to reach good noise reduction and little processing distortion. If
the resulting denoised signal spectrum $|\hat{S}(m,k)|$ falls below a minimum level, it is replaced by a scaled version of the
input spectrum $|X(m,k)|$. The scaling parameter $\beta$ determines the residual noise floor after re-synthesis. The
denoising algorithm is expressed through the following relationship:

$$|\hat{S}(m,k)| = \left( |X(m,k)|^b - \alpha(m) \cdot |D(k)|^b \right)^{1/b} \tag{1}$$

and

$$|S(m,k)| = \begin{cases} |\hat{S}(m,k)| & \text{for } |\hat{S}(m,k)| > \beta\,|X(m,k)|, \\ \beta\,|X(m,k)| & \text{else.} \end{cases} \tag{2}$$

$\beta$ is set to $0 < \beta \ll 1$. The remaining noise floor can be used to mask the generated musical noise. Thus the
setting of the two parameters $\alpha$ and $\beta$ determines the tradeoff between the amount of residual broadband noise
and the level of perceived musical noise. For a fixed value of $\beta$, increasing the value of $\alpha$ reduces both the
broadband noise and the musical noise. Increasing $\alpha$ above a certain limit leads to audible distortions because of the
strong attenuation of small signal components.
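The two relationships above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `spectral_subtract`, the argument names `alpha` (subtraction factor) and `beta` (noise floor parameter) and their default values are chosen for this sketch, and negative differences are clipped to zero before the root is taken, a detail the text does not spell out.

```python
import numpy as np

def spectral_subtract(X_mag, D_mag, alpha=2.0, beta=0.05, b=1):
    """Over-subtraction followed by the spectral floor, for one frame.

    X_mag, D_mag: magnitude spectra |X(m,k)| and |D(k)|.
    alpha, beta and their defaults are illustrative assumptions.
    """
    X_mag = np.asarray(X_mag, dtype=float)
    D_mag = np.asarray(D_mag, dtype=float)
    # Subtract the overestimated noise; clip negatives (assumption) so the
    # b-th root stays real.
    diff = np.maximum(X_mag**b - alpha * D_mag**b, 0.0)
    S_hat = diff ** (1.0 / b)
    # Spectral floor: never fall below the scaled input spectrum.
    return np.where(S_hat > beta * X_mag, S_hat, beta * X_mag)
```

A bin well above the noise level keeps most of its magnitude, while a bin at the noise level is replaced by the floor `beta * X_mag`.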
We adopt this denoising algorithm but use a subtraction factor $\alpha(m,k)$, which is calculated for each frequency bin
k in each short-time frame m. The spectral subtraction equation 1 changes to

$$|\hat{S}(m,k)| = \left( |X(m,k)|^b - \alpha(m,k) \cdot |D(k)|^b \right)^{1/b} \tag{3}$$

The subtraction factor $\alpha(m,k)$ is a function of an estimated signal-to-noise ratio $SNR_{prio}(m,k)$ (see figure 1), which is
controlled by the parameter $\alpha_0$, the over-subtraction factor for $SNR_{prio}(m,k) = 0$ dB. For magnitude subtraction,
$\alpha_0$ should be within the range of 1.8 to 2.5.

[Figure 1: Subtraction factor $\alpha(m,k)$ as a function of $SNR_{prio}(m,k)$ (dB)]
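The mapping from the estimated SNR to the subtraction factor could look like the sketch below. The exact curve of figure 1 is not reproduced here: the piecewise-linear shape, the 20 dB point at which the factor reaches 1, the choice to hold the factor at `alpha0` below 0 dB, and the names `subtraction_factor` and `to_db` are all assumptions made for illustration.

```python
import numpy as np

def to_db(snr_linear):
    """Convert a linear power SNR to dB, guarding against log(0)."""
    return 10.0 * np.log10(np.maximum(snr_linear, 1e-12))

def subtraction_factor(snr_prio_db, alpha0=2.0, snr_max_db=20.0):
    """Hypothetical alpha(m,k): alpha0 at 0 dB, 1 at snr_max_db and above.

    The linear-in-dB descent and the clamping are assumptions; only the
    value alpha0 at 0 dB is taken from the text.
    """
    snr_db = np.asarray(snr_prio_db, dtype=float)
    alpha = alpha0 - (alpha0 - 1.0) * snr_db / snr_max_db
    return np.clip(alpha, 1.0, alpha0)
```

With this shape, bins with a high estimated SNR are subtracted with a factor of 1 (plain spectral subtraction), while bins near or below 0 dB receive the full over-subtraction.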

2.2 Reducing the Noise Variance


Due to the noise variance, the power of the noise components within a single frame can deviate from the estimated
mean value. This leads to the musical noise phenomenon in spectral subtraction. A number of methods to reduce the
noise variance are suggested in the literature. Boll [Boll,79] uses a time-averaged signal spectrum

$$\overline{|X(m,k)|} = \frac{1}{M} \sum_{i=0}^{M-1} |X(m-i,k)|$$

instead of the current signal frame $|X(m,k)|$, from which the average noise spectrum
is subtracted. This averaging procedure not only reduces the variance of the superimposed noise
spectrum $|D(m,k)|$, but also introduces temporal smearing of short transients in the signal spectrum $|S(m,k)|$.
Therefore, the averaging is limited to a small number of adjacent frames.
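As a minimal sketch of this time averaging, the function below averages the magnitude spectrum over the current frame and the preceding frames. The name `time_averaged_magnitude`, the default `M=3` and the handling of the first frames (averaging only what is available) are assumptions for illustration.

```python
import numpy as np

def time_averaged_magnitude(X_mag_frames, M=3):
    """Average |X(m-i,k)| over the current and the M-1 previous frames.

    X_mag_frames: array of shape (num_frames, num_bins).
    For the first M-1 frames only the available frames are averaged
    (an assumption; the text does not discuss start-up behavior).
    """
    X = np.asarray(X_mag_frames, dtype=float)
    out = np.empty_like(X)
    for m in range(X.shape[0]):
        start = max(0, m - M + 1)
        out[m] = X[start:m + 1].mean(axis=0)
    return out
```

Feeding a single-frame transient through this function spreads its energy over M output frames, which is the temporal smearing mentioned above.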


Ephraim and Malah ([Ephraim,84],[Ephraim,85]) use a time-averaged SNR to reduce the influence of the
noise variance. Because of a non-linear averaging procedure, this method performs better than the averaging
method described previously. To reduce the variance of the SNR, Ephraim and Malah introduce a recursive
evaluation scheme for the SNR in the current frame which takes into account information from previous frames:

$$SNR_{prio}(m,k) = (1-\gamma) \cdot P[SNR_{local}(m,k)] + \gamma \cdot \frac{|S(m-1,k)|^2}{|D(k)|^2} \tag{4}$$

with $P[x]=x$ for $x>0$ and $P[x]=0$ else. $SNR_{local}(m,k) = |X(m,k)|^2 / |D(k)|^2 - 1$ is a signal-to-noise ratio estimated
from the data in the current frame m. $|S(m-1,k)|$ is the denoising result of the previous frame.
Therefore, the term $|S(m-1,k)|^2 / |D(k)|^2$ is an estimate of the SNR in frame m-1. Due to the noise variance,
$SNR_{local}(m,k)$ is likely to be overestimated for some bins k, but its influence on $SNR_{prio}(m,k)$ is reduced
because of the weighting factor $(1-\gamma)$. For small signal components, the variance of $SNR_{prio}(m,k)$ is much
smaller than the variance of $SNR_{local}(m,k)$, as long as the parameter $\gamma$ is close enough to 1. For signal
component levels well above the noise level, $SNR_{prio}(m,k)$ is no longer a smoothed estimate of the local SNR.
It just follows $SNR_{local}(m,k)$ with a delay of one frame. Using such a time-averaged SNR helps to reduce the
musical noise phenomenon even for recordings with non-stationary background noise. For details see [Cappe,94].
The behavior of the smoothed SNR is similar if the Ephraim-Malah suppression rule is replaced by the Wiener
suppression rule [Cappe,94]. If the power subtraction rule is used, the attenuation for SNR values around 0 dB is too
small. Therefore, $SNR_{prio}$ undergoes less smoothing. In our implementation we use over-subtraction. Thus, the
attenuation around SNR = 0 dB is high enough and leads to a temporally smoothed signal-to-noise ratio
$SNR_{prio}(m,k)$. As shown in figure 1, $SNR_{prio}(m,k)$ is used to determine the subtraction factor $\alpha(m,k)$.
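The recursion for the time-averaged SNR can be sketched as follows. The function and argument names (`snr_local`, `snr_prio`, and `gamma` for the averaging parameter) and the default `gamma=0.95` are assumptions for this illustration; all inputs are power spectra of a single frame.

```python
import numpy as np

def snr_local(X_pow, D_pow):
    """Local SNR from the current frame: |X|^2 / |D|^2 - 1."""
    return np.asarray(X_pow, dtype=float) / np.asarray(D_pow, dtype=float) - 1.0

def snr_prio(X_pow, D_pow, S_prev_pow, gamma=0.95):
    """Recursive SNR estimate mixing the rectified local SNR with the
    SNR of the previous denoised frame |S(m-1,k)|^2 / |D(k)|^2.

    gamma is the averaging parameter; gamma = 0 disables averaging.
    """
    rectified = np.maximum(snr_local(X_pow, D_pow), 0.0)  # P[x]
    previous = np.asarray(S_prev_pow, dtype=float) / np.asarray(D_pow, dtype=float)
    return (1.0 - gamma) * rectified + gamma * previous
```

In a noise-only bin the previous denoised frame is close to zero, so a single overestimated local SNR is scaled down by `1 - gamma` before it can influence the filter.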

2.3 Reduction of Processing Distortions using Psychoacoustic Criteria


Tsoukalas et al. [Tsoukalas,93] propose a denoising scheme that takes into account psychoacoustic criteria
[Zwicker,90]. Only noise components above an estimated masking threshold are removed from the noisy signal
spectrum. As a result, the audio signal is only slightly affected by the denoising process and distortions are reduced.
The simultaneous masking property of the human auditory system can be modeled as a spectral smoothing procedure
along a non-linear frequency axis. The masking bandwidth increases with increasing center frequency and depends on
the absolute sound pressure level. The shape of the smoothing filter is asymmetric with a steeper descent to lower than
to higher frequencies. To determine the masking threshold, the signals are usually transformed from the linear Hz scale
to the critical-band-rate scale [Zwicker,90] or to the equivalent rectangular band rate scale [Moore,83].
The authors use a filter $H_{mask}$ with a "frequency impulse response" that takes into account some of the auditory
masking properties. The degraded signal spectrum $|X(m,k)|^2$ is filtered in the linear frequency domain (along
index k). The filter output is

$$|X_{mask}(m,k)|^2 = H_{mask}(k) * |X(m,k)|^2 \tag{5}$$

where $*$ stands for convolution along k. The duration of the masking filter's "frequency impulse response"
increases for high frequencies, which simulates the increasing masking bandwidth. The non-linear level dependence
and the absolute threshold contour are not taken into account.
The smoothed spectrum $|X_{mask}(m,k)|^2$ is used instead of $|X(m,k)|^2$ to calculate the local SNR. This results in
a reduced variance of $SNR_{local}(m,k)$ and subsequently of $SNR_{prio}(m,k)$ (see equation 4).
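The idea of smoothing along k with a width that grows with frequency can be illustrated as below. This is not the masking filter of the paper, whose shape is not given in the text: the rectangular averaging kernel, the linear growth rule and the names `mask_smooth`, `min_half_width` and `growth` are hypothetical stand-ins.

```python
import numpy as np

def mask_smooth(X_pow, min_half_width=1, growth=0.05):
    """Smooth a power spectrum along k; the window widens with frequency.

    Hypothetical kernel: a plain average over bins k-w(k) .. k+w(k) with
    w(k) = min_half_width + floor(growth * k), truncated at the edges.
    """
    X_pow = np.asarray(X_pow, dtype=float)
    n = X_pow.size
    # Prefix sums turn every variable-width average into one subtraction.
    csum = np.concatenate(([0.0], np.cumsum(X_pow)))
    k = np.arange(n)
    w = min_half_width + np.floor(growth * k).astype(int)
    lo = np.maximum(k - w, 0)
    hi = np.minimum(k + w + 1, n)
    return (csum[hi] - csum[lo]) / (hi - lo)
```

A flat spectrum passes through unchanged, while the bin-to-bin fluctuations of a noise spectrum are reduced, which is what lowers the variance of the local SNR.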

3 Proposed Noise Reduction Method


3.1 Denoising Filter
The proposed denoising scheme uses the methods discussed in section 2. Non-linear spectral subtraction is combined
with non-linear smoothing strategies in both the time and the frequency domain. The smoothing operations yield
SNR estimates with a small variance, which are used to determine the transfer function $H_{Filt}(m,k)$
of the spectral subtraction filter. The resulting noise reduction method generates only small audible processing distortions.
The spectral subtraction equation 3 can be expressed as a non-linear time-variant filter with a zero-phase frequency
response $H_{Filt}(m,k)$. The denoised signal frame is obtained with

$$S(m,k) = X(m,k) \cdot H_{Filt}(m,k)$$

and

$$H_{Filt}(m,k) = \left( 1 - \frac{\alpha(m,k)}{SNR_{Filt}(m,k) + 1} \right)^{1/b} \tag{6}$$


The exponent b equals 1 for magnitude subtraction and 2 for power subtraction. The transfer function $H_{Filt}(m,k)$
depends on the subtraction factor $\alpha(m,k)$ and the signal-to-noise ratio $SNR_{Filt}(m,k)$. The local SNR is
calculated from the data in the current frame m:

$$SNR_{local}(m,k) = \frac{|X_{mask}(m,k)|^2}{|D(k)|^2} - 1 \tag{7}$$

$|X_{mask}(m,k)|^2$ is a smoothed version of the degraded signal spectrum $|X(m,k)|^2$, which is passed through the
masking filter described in section 2.3. Referring to equation 4, the local SNR is used to calculate the two
time-averaged signal-to-noise ratios $SNR_{prio}(m,k)$ (eq. 4) and $SNR_{Filt}(m,k)$:

$$SNR_{Filt}(m,k) = (1-\delta) \cdot P[SNR_{local}(m,k)] + \delta \cdot \frac{|S(m-1,k)|^2}{|D(k)|^2} \tag{8}$$

The amount of time-averaging for each of the two SNRs can be controlled independently by the parameters
$\gamma$ and $\delta$ ($\gamma, \delta = 0$ means no averaging). The subtraction factor $\alpha(m,k)$ in equation 6 is a function of
$SNR_{prio}(m,k)$. It is obtained via the non-linear function in figure 1.
The denoising filter of equation 6 and 7 is controlled by four parameters:

- The noise floor parameter $\beta$, which determines the remaining broadband noise floor.
- The over-subtraction factor $\alpha_0$ for $SNR_{prio}(m,k) = 0$ dB.
- The averaging parameters $\gamma$ and $\delta$.

The block diagram of the proposed noise reduction scheme is shown in figure 2.
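Putting the pieces of this section together, one frame of the denoising filter might be computed as in the following sketch. Several details are assumptions rather than the authors' implementation: the names `denoise_frame`, `gamma`, `delta` and `smooth`, the default parameter values, the piecewise-linear stand-in for the curve of figure 1 (`alpha0` at or below 0 dB, 1 at 20 dB and above), the clipping of the bracket at zero, and the use of the noise floor as a lower bound on the filter gain.

```python
import numpy as np

def denoise_frame(X, D_pow, S_prev_pow, alpha0=2.0, beta=0.05,
                  gamma=0.95, delta=0.95, b=1, smooth=None):
    """Apply the zero-phase spectral subtraction filter to one frame.

    X: complex short-time spectrum X(m,k); D_pow: noise power |D(k)|^2;
    S_prev_pow: |S(m-1,k)|^2 of the previous denoised frame.
    smooth: optional callable standing in for the masking filter.
    Returns the denoised spectrum S(m,k) and its power |S(m,k)|^2.
    """
    X = np.asarray(X, dtype=complex)
    D_pow = np.asarray(D_pow, dtype=float)
    X_pow = np.abs(X) ** 2
    X_mask = smooth(X_pow) if smooth is not None else X_pow
    # Rectified local SNR, P[SNR_local], from the (smoothed) power spectrum.
    local = np.maximum(X_mask / D_pow - 1.0, 0.0)
    prev = np.asarray(S_prev_pow, dtype=float) / D_pow
    # Two independently averaged SNR estimates.
    snr_prio = (1.0 - gamma) * local + gamma * prev
    snr_filt = (1.0 - delta) * local + delta * prev
    # Hypothetical stand-in for figure 1, linear in dB and clamped.
    snr_db = 10.0 * np.log10(np.maximum(snr_prio, 1e-12))
    alpha = np.clip(alpha0 - (alpha0 - 1.0) * snr_db / 20.0, 1.0, alpha0)
    # Filter gain; clip at zero (assumption) so the root stays real.
    H = np.maximum(1.0 - alpha / (snr_filt + 1.0), 0.0) ** (1.0 / b)
    # Noise floor: the gain never drops below beta.
    H = np.maximum(H, beta)
    S = X * H  # real gain, so the phase of X is preserved
    return S, np.abs(S) ** 2
```

The returned power spectrum is meant to be passed back as `S_prev_pow` for the next frame, which closes the recursion. Because the gain is real and non-negative, the output keeps the original phase spectrum.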

3.2 Implementation
The algorithm was tested with a real-time implementation in MAX (signal processing software developed by Miller
Puckette at IRCAM-Paris) on SGI R 5000 and Next-ISPW. In each implementation, the following parameters were fixed:
the window length, the window type (Hanning or Tukey), the FFT length, and the window overlap. The sampling
frequency was set to 16 kHz, 22.05 kHz, 32 kHz or 44.1 kHz.
Each implementation works either with magnitude spectrum subtraction (b=1 in equation 6) or with power spectrum
subtraction (b=2).

3.3 Influence of Parameter Settings


Besides the main filter parameters $\beta$, $\alpha_0$, $\gamma$ and $\delta$, the parameters of the short-time transform (window
length, window type, FFT length, and overlap) affect the noise reduction results. The influence of each parameter is
briefly discussed in the following.
Window Length: The window length must be chosen in order to ensure a good frequency resolution and to prevent
smearing of signal transients. Increasing the window length decreases processing distortions but also introduces
smearing. A length of 30 to 40ms (e.g. 1024 points at 32kHz sampling frequency) was found to be optimal for many
signals.
FFT Length: The proposed filtering operation in the frequency domain can cause long impulse responses. Therefore,
the length of the FFT should be longer than the analysis window. If the zero-padding is too short, time domain aliasing
may occur. The zero-padding factor should be set to 2.
Window Overlap: In implementations with a Hanning window an overlap of four windows (75% overlap) is used.
This helps to reduce the spreading of signal transients due to the time averaged SNR (reported in [Cappe,94]). In
implementations with a Tukey window (rectangular window with cosine fade in and fade out) two windows are
overlapped. The amount of overlap depends on the length of the fading portions.
Noise Floor Parameter $\beta$: The parameter $\beta$ determines the remaining noise floor. Its value depends on
the noise level of the input signal. For high noise levels, $\beta$ should be in the range of 0.1 to 0.01. Especially for
old recordings, it was found that the remaining noise floor must not be too small for a natural sound quality.
Over-Subtraction Factor $\alpha_0$: A higher amount of over-subtraction leads to a stronger attenuation of
components with a low SNR. This prevents musical noise, but suppresses too many small signal components. A range
between 1.8 and 2.3 was found suitable for magnitude subtraction. For power subtraction, the optimal range is between
3 and 6 (see also [Berouti,83]).
Averaging Parameters $\gamma$ and $\delta$: These parameters control the amount of time averaging for the SNR
estimation. The averaging reduces the SNR variance, but also introduces some smearing of signal transients (see
[Cappe,94]). Both parameters should be within the range of 0.9 to 0.98.
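The short-time transform settings discussed in this section (a 1024-point window, which is 32 ms at 32 kHz, four overlapping Hanning windows and a zero-padding factor of 2) can be tried out with the sketch below. It is an illustration only: the function names are made up for this example, a periodic Hann window is assumed so that the overlapped windows sum to a constant, and no spectral modification is applied between analysis and re-synthesis.

```python
import numpy as np

def periodic_hann(n):
    """Periodic Hann window; shifted copies at hop n/4 sum to exactly 2."""
    return 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n) / n)

def stft_frames(x, win_len=1024, overlap=4, zero_pad=2):
    """Windowed, zero-padded FFT frames with hop = win_len / overlap."""
    hop = win_len // overlap
    w = periodic_hann(win_len)
    n_fft = zero_pad * win_len
    starts = range(0, len(x) - win_len + 1, hop)
    return np.array([np.fft.rfft(x[s:s + win_len] * w, n=n_fft) for s in starts])

def overlap_add(frames, win_len=1024, overlap=4):
    """Inverse FFT of each frame and overlap-add at the analysis hop."""
    hop = win_len // overlap
    n_fft = 2 * (frames.shape[1] - 1)
    out = np.zeros(hop * (len(frames) - 1) + n_fft)
    for i, F in enumerate(frames):
        # The zero-padded tail leaves room for a filter's impulse response.
        out[i * hop:i * hop + n_fft] += np.fft.irfft(F, n=n_fft)
    # `overlap` shifted Hann windows sum to overlap / 2; undo that gain.
    return out / (overlap / 2.0)
```

With unmodified frames the interior of the signal is reconstructed exactly. Once a filter gain is applied to each frame, its impulse response spills into the zero-padded half of the buffer instead of wrapping around, which is the reason for the recommended zero-padding factor.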

4 Conclusion
A broadband noise reduction method based on non-linear spectral subtraction was presented. Non-linear smoothing
operations in both the time and the frequency domain are used to determine low variance estimates of the SNR. From
the estimated SNR, the transfer function of the spectral subtraction filter is calculated. Over-subtraction and the
averaging procedures prevent musical noise. Listening tests show that proper adjustment of the control parameters
leads to excellent restoration results.
Using a short-time transform with nonuniform frequency resolution, and taking into account information from the
short-time phase spectrum might lead to further improvement of the proposed method.

References
[Berouti,83] M. Berouti, R. Schwartz, and J. Makhoul: Enhancement of speech corrupted by acoustic noise. In Speech
Enhancement, edited by J.S. Lim, Prentice-Hall, Inc., Englewood Cliffs, New Jersey 07632, 1983, pp. 69-73.
[Boll,79] S.F. Boll: Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. on Acoustics,
Speech, and Signal Processing, vol. ASSP-27, no. 2, pp. 113-120, April 1979.
[Cappe,94] O. Cappé: Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor.
IEEE Trans. on Speech and Audio Processing, vol. 2, no. 2, pp. 345-349, April 1994.
[Cappe,95] O. Cappé, J. Laroche: Evaluation of short-time spectral attenuation techniques for the restoration of musical
recordings. IEEE Trans. on Speech and Audio Processing, vol. 3, no. 1, pp. 84-93, Jan. 1995.
[Ephraim,84] Y. Ephraim and D. Malah: Speech enhancement using a minimum mean-square error short-time spectral
amplitude estimator. IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. 32, no. 6, pp. 1109-1121, December
1984.
[Ephraim,85] Y. Ephraim and D. Malah: Speech enhancement using a minimum mean-square error log-spectral
amplitude estimator. IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. 33, no. 2, pp. 443-445, April 1985.
[Lim,83] J.S. Lim (editor): Speech Enhancement. Prentice-Hall, Inc., Englewood Cliffs, New Jersey 07632, 1983.
[Moore,83] B.C.J. Moore and B.R. Glasberg: Suggested formulae for calculating auditory-filter bandwidths and
excitation patterns. J. Acoust. Soc. Am., vol. 74, no. 3, pp. 750-753, September 1983.
[Tsoukalas,93] D. Tsoukalas, M. Paraskevas, J. Mourjopoulos: Speech enhancement using psychoacoustic criteria. Proc.
IEEE ICASSP-1993, vol. II, pp. II-359 to II-362.
[Vaseghi,92] S.V. Vaseghi and R. Frayling-Cork: Restoration of old gramophone recordings. Journal AES, vol. 40,
no. 10, pp. 791-801, October 1992.
[Vaseghi,96] S.V. Vaseghi: Advanced Signal Processing and Digital Noise Reduction. Wiley & Sons Ltd. and B.G.
Teubner, 1996.
[Zwicker,90] E. Zwicker, H. Fastl: Psychoacoustics: Facts and Models. Springer-Verlag, 1990.
