Abstract
A real-time implementation of a broadband noise reduction method based on non-linear spectral subtraction is
proposed. To prevent processing distortions in the form of musical noise, over-subtraction is applied to the degraded signal
spectrum. Furthermore, time-averaging is used to reduce the variance of the estimated signal-to-noise ratio (SNR). A
masking threshold obtained by spectral smoothing leads to further reduction of audible processing distortions.
1 Introduction
Old audio recordings (e.g. on tape, record or wax cylinder) are corrupted by different signal degradations. Digital
signal processing (DSP) is used for restoration, where each kind of degradation is processed independently. In this
paper the emphasis will be on broadband noise reduction.
Broadband noise is common to all forms of recording. The noisy signals are usually processed in the frequency domain using short-time spectral attenuation techniques, in which components are attenuated according to their signal-to-noise ratio (SNR). These techniques were first applied to speech signals in the area of speech enhancement [Lim,83], before they were adopted for noise reduction in musical recordings [Vaseghi,92]. Although the restoration of speech signals is closely related to the restoration of musical signals, the emphasis is on different criteria: in speech enhancement, intelligibility is the main criterion, whereas sound quality is the dominant criterion in the restoration of musical recordings [Cappe,95].
One well-known method for broadband noise reduction is based on spectral subtraction ([Boll,79],[Vaseghi,96]). The degraded signal x(n) is modeled as the sum of a pure audio signal s(n) and a superimposed broadband noise d(n). The degraded signal is processed in the frequency domain using the short-time Fourier transform. For each frequency bin k and each frame m, a specific amount of the noise spectrum |D(k)|^b is subtracted from the short-time spectrum |X(m,k)|^b (b=1 for magnitude subtraction, b=2 for power subtraction). The noise spectrum |D(k)| has to be estimated from a noise-only signal segment. For re-synthesis, the denoised magnitude spectrum |S(m,k)| is combined with the original phase spectrum arg[X(m,k)]. The spectral subtraction method can be modeled as a time-variant non-linear filter whose transfer function depends on the signal-to-noise ratio SNR(m,k), which is estimated from the short-time spectrum |X(m,k)| and the noise spectrum |D(k)|.
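As a rough illustration of the subtract-and-resynthesize step just described, the following Python sketch processes one STFT frame. It is a sketch under stated assumptions, not the paper's implementation: the function name `spectral_subtraction_frame` is hypothetical, |D(k)| is assumed to be already estimated from a noise-only segment, and negative differences are simply clipped to zero.

```python
import cmath


def spectral_subtraction_frame(X, noise_mag, b=2):
    """Denoise one STFT frame by plain spectral subtraction (hypothetical helper).

    X         -- complex short-time spectrum X(m, k) of frame m (one value per bin k)
    noise_mag -- estimated noise magnitude |D(k)| per bin
    b         -- 1 for magnitude subtraction, 2 for power subtraction
    """
    S = []
    for Xk, Dk in zip(X, noise_mag):
        # Subtract the noise spectrum in the |.|^b domain.
        diff = abs(Xk) ** b - Dk ** b
        # Assumption: negative results are clipped to zero before taking the 1/b root.
        mag = max(diff, 0.0) ** (1.0 / b)
        # Re-synthesis reuses the noisy phase arg[X(m, k)].
        S.append(cmath.rect(mag, cmath.phase(Xk)))
    return S


# One bin well above the noise level, one bin below it.
print(spectral_subtraction_frame([3 + 4j, 1 + 0j], [3.0, 2.0], b=2))
```

The second bin is driven to zero, which is exactly the kind of frame-to-frame on/off behavior that produces musical noise when the noise estimate fluctuates.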
The noise variance within a single frame causes the SNR to be overestimated for some frequency bins. This results in a residual noise consisting of short sinusoidal impulses whose frequencies vary from frame to frame, a phenomenon known as musical noise. The noise variance can be reduced by using a time-averaged signal spectrum instead of |X(m,k)| ([Boll,79],[Vaseghi,96]). This reduces the musical noise without completely eliminating it. Vaseghi et al. [Vaseghi,92] exploit the behavior of musical noise to detect and partially eliminate it.
An alternative method to prevent this processing distortion in spectral subtraction is to apply over-subtraction [Berouti,83] to the degraded signal spectrum. Depending on the estimated SNR, more than the average noise spectrum is subtracted. This leads to a strong attenuation of small signal components. If the over-subtraction factor is high enough, the musical noise is completely eliminated, but audible distortions in the audio signal can be generated.
The noise suppression rule proposed by Ephraim and Malah ([Ephraim,84],[Ephraim,85]) allows significant noise reduction without causing musical noise. This is mainly due to the use of a non-linear time-averaged SNR [Cappe,94], which exhibits a lower variance than an SNR estimated without averaging. Furthermore, the masking properties of the human auditory system can be used to reduce processing distortions (e.g. [Tsoukalas,93]).
The proposed method combines spectral over-subtraction with several smoothing strategies in both the time and the frequency domain to reduce the SNR variance. This results in few audible processing distortions. Section 2 discusses the methods used within the denoising scheme. In section 3, the new noise reduction filter is described.
$$|\tilde{S}(m,k)| = \left(|X(m,k)|^b - \alpha(m)\,|D(k)|^b\right)^{1/b}$$

and

$$|S(m,k)| = \begin{cases} |\tilde{S}(m,k)| & \text{for } |\tilde{S}(m,k)| > \beta\,|X(m,k)|,\\ \beta\,|X(m,k)| & \text{else.} \end{cases}$$

The scaling parameter β determines the residual noise floor after re-synthesis and is set to 0 < β ≪ 1. The remaining noise floor can be used to mask the generated musical noise. Thus the setting of the two parameters α and β determines the tradeoff between the amount of residual broadband noise and the level of perceived musical noise. For a fixed value of β, increasing the value of α reduces both the broadband noise and the musical noise. Increasing α above a certain limit leads to audible distortions because of the strong attenuation of small signal components.
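The over-subtraction rule with a spectral floor can be sketched per bin as follows. This is a minimal Python sketch under stated assumptions: the function name is hypothetical, `alpha` and `beta` stand for the over-subtraction factor and the noise-floor parameter, and a negative value under the 1/b root is treated as zero, so the floor takes over.

```python
def over_subtraction(X_mag, D_mag, alpha, beta, b=2):
    """Return |S(m,k)| for one frame using over-subtraction with a spectral floor.

    X_mag -- noisy magnitudes |X(m,k)|;  D_mag -- noise magnitudes |D(k)|
    alpha -- over-subtraction factor for this frame;  beta -- floor, 0 < beta << 1
    """
    S_mag = []
    for x, d in zip(X_mag, D_mag):
        diff = x ** b - alpha * d ** b
        # Assumption: a negative difference has no real 1/b root, so use 0 instead.
        s_tilde = diff ** (1.0 / b) if diff > 0.0 else 0.0
        floor = beta * x
        # Keep the subtracted value only if it lies above the floor beta*|X(m,k)|.
        S_mag.append(s_tilde if s_tilde > floor else floor)
    return S_mag
```

With alpha = 4 and beta = 0.1, a bin with |X| = 5 and |D| = 1 keeps most of its magnitude (sqrt(21), about 4.58), while a bin with |X| = |D| = 1 falls to the floor value 0.1 instead of to zero; this residual floor is what masks the remaining musical noise.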
We adopt this denoising algorithm but use a subtraction factor α(m,k) that depends on both the frame m and the frequency bin k:

$$|S(m,k)| = \left(|X(m,k)|^b - \alpha(m,k)\,|D(k)|^b\right)^{1/b}$$
The subtraction factor α(m,k) is a function of an estimated signal-to-noise ratio, SNR_prio(m,k).

[Figure 1: subtraction factor α(m,k) as a function of SNR_prio(m,k) (dB)]
$$\frac{1}{M}\sum_{i=0}^{M-1} |X(m-i,k)|$$

instead of the current signal frame |X(m,k)|, from which the average noise spectrum is subtracted. This averaging not only reduces the variance of the superimposed noise spectrum |D(m,k)| but also introduces temporal smearing of short transients in the signal spectrum |S(m,k)|.
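The M-frame averaging above can be sketched with a bounded buffer. This is an illustrative Python sketch; the class name `FrameAverager` is hypothetical, and during start-up (fewer than M frames seen) it averages over the frames available, which is an assumption.

```python
from collections import deque


class FrameAverager:
    """Running mean (1/M) * sum_{i=0}^{M-1} |X(m-i,k)| over the last M frames."""

    def __init__(self, M):
        # deque(maxlen=M) silently drops the oldest frame once M frames are stored.
        self.frames = deque(maxlen=M)

    def push(self, X_mag):
        """Add frame m (a list of |X(m,k)|) and return the averaged spectrum."""
        self.frames.append(list(X_mag))
        n = len(self.frames)  # equals M once the buffer is full
        # zip(*frames) groups the values of each bin k across the stored frames.
        return [sum(bin_values) / n for bin_values in zip(*self.frames)]


avg = FrameAverager(M=2)
avg.push([1.0, 2.0])
print(avg.push([3.0, 4.0]))
```

A larger M lowers the variance of the averaged spectrum further but spreads a transient over M frames, which is the smearing trade-off noted above.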
$$\mathrm{SNR}_{\mathrm{prio}}(m,k) = (1-\gamma)\,P\left[\mathrm{SNR}_{\mathrm{local}}(m,k)\right] + \gamma\,\frac{|S(m-1,k)|^2}{|D(k)|^2}$$

with P[x]=x for x>0 and P[x]=0 else. SNR_local(m,k) = |X(m,k)|^2 / |D(k)|^2 - 1 is a signal-to-noise ratio estimated from the current frame only. SNR_local(m,k) is likely to be overestimated for some bins k, but its influence on SNR_prio(m,k) is much reduced by the weighting factor (1-γ). For small signal components, the variance of SNR_prio(m,k) is much smaller than the variance of SNR_local(m,k). For large signal components, SNR_prio(m,k) follows SNR_local(m,k) with a delay of one frame. Using such a time-averaged SNR helps reduce the musical noise phenomenon even for recordings with non-stationary background noise. For details see [Cappe,94].
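The time-averaged estimate SNR_prio(m,k) is a one-line recursion per bin. The Python sketch below follows that recursion on a linear (not dB) scale; the function name and the parameter name `gamma` for the weighting factor are illustrative choices, and the caller is assumed to keep |S(m-1,k)|^2 from the previous frame.

```python
def snr_prio_frame(X_pow, D_pow, S_prev_pow, gamma):
    """Time-averaged SNR estimate for one frame (linear scale, hypothetical helper).

    X_pow      -- |X(m,k)|^2 of the current frame
    D_pow      -- noise power |D(k)|^2
    S_prev_pow -- |S(m-1,k)|^2, the denoised power of the previous frame
    gamma      -- weighting factor in [0, 1); larger values mean more averaging
    """
    out = []
    for x, d, s_prev in zip(X_pow, D_pow, S_prev_pow):
        snr_local = x / d - 1.0                       # SNR_local(m,k)
        p = snr_local if snr_local > 0.0 else 0.0     # P[x]: keep only positive values
        out.append((1.0 - gamma) * p + gamma * s_prev / d)
    return out
```

With gamma = 0.9, a spurious SNR_local of 10 in a bin whose previous output was zero contributes only 0.1 × 10 = 1 to SNR_prio, which is how isolated overestimates are damped.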
The behavior of the smoothed SNR is similar if the Ephraim-Malah suppression rule is replaced by the Wiener suppression rule [Cappe,94]. If the power subtraction rule is used, however, the attenuation for SNR values around 0 dB is too small. Therefore, the subtraction factor α(m,k) is controlled by the temporally smoothed signal-to-noise ratio SNR_prio(m,k), so that the attenuation around SNR = 0 dB is high enough. Figure 1 shows α(m,k) as a function of SNR_prio(m,k).
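To make the dependence of α(m,k) on SNR_prio(m,k) concrete, here is a hypothetical piecewise-linear mapping in Python. The breakpoints (maximum factor `alpha0` at 0 dB and below, falling linearly to 1 at 20 dB and above) are assumptions chosen for illustration, in the spirit of the SNR-dependent factor of [Berouti,83]; they are not read off figure 1.

```python
import math


def subtraction_factor(snr_prio, alpha0=4.0, alpha_min=1.0, lo_db=0.0, hi_db=20.0):
    """Hypothetical alpha(m,k) for an SNR_prio(m,k) given on a linear scale."""
    # Convert to dB; the small constant avoids log10(0) for noise-only bins.
    snr_db = 10.0 * math.log10(max(snr_prio, 1e-10))
    # Position between the two breakpoints, clamped to [0, 1].
    t = min(max((snr_db - lo_db) / (hi_db - lo_db), 0.0), 1.0)
    # Strong over-subtraction at low SNR, plain subtraction at high SNR.
    return alpha0 + t * (alpha_min - alpha0)


for snr in (0.0, 1.0, 10.0, 100.0):
    print(snr, subtraction_factor(snr))
```

Because the input is the smoothed SNR_prio rather than the raw local SNR, the factor itself changes slowly from frame to frame.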
To take masking into account, the power spectrum |X(m,k)|^2 is smoothed by convolving it along k with a masking filter whose "frequency impulse response" takes into account some of the auditory masking properties; the result is the smoothed spectrum |X_mask(m,k)|^2. The length of the masking filter's "frequency impulse response" increases for high frequencies, which simulates the increasing masking bandwidth. The non-linear level dependence and the absolute threshold contour are not taken into account.
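The frequency-dependent smoothing can be sketched as a moving average along k whose width grows with frequency. This Python sketch is only an assumption about the kernel shape: the exact "frequency impulse response" of the masking filter is not given here, so a rectangular kernel with a half-width proportional to the bin index (`rel_width` and `min_half_width` are hypothetical parameters) stands in for it. Like the described filter, it ignores level dependence and the absolute threshold.

```python
def masking_smooth(X_pow, rel_width=0.1, min_half_width=1):
    """Smooth |X(m,k)|^2 along k; the kernel widens toward high frequencies."""
    K = len(X_pow)
    X_mask = []
    for k in range(K):
        # Assumed law: the half-width grows linearly with the bin index k.
        half = max(min_half_width, int(rel_width * k))
        lo, hi = max(0, k - half), min(K, k + half + 1)
        window = X_pow[lo:hi]
        # Normalized rectangular kernel, truncated at the spectrum edges.
        X_mask.append(sum(window) / len(window))
    return X_mask


print(masking_smooth([0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0], rel_width=0.0))
```

An isolated spectral peak is spread over its neighbors, so a local SNR computed from the smoothed spectrum fluctuates less from bin to bin.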
Using the smoothed spectrum |X_mask(m,k)|^2 leads to a reduced variance of SNR_local(m,k) and thus of the transfer function H_Filt(m,k) of the spectral subtraction filter. The resulting noise reduction method generates only small audible processing distortions.
The spectral subtraction of equation 3 can be expressed as a non-linear time-variant filter with the zero-phase frequency response H_Filt(m,k):

$$S(m,k) = X(m,k)\cdot H_{\mathrm{Filt}}(m,k)$$

and

$$H_{\mathrm{Filt}}(m,k) = \left(1 - \frac{\alpha(m,k)}{\mathrm{SNR}_{\mathrm{Filt}}(m,k) + 1}\right)^{1/b}$$
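The exponent b equals 1 for magnitude subtraction and 2 for power subtraction. The transfer function H_Filt(m,k) depends on the subtraction factor α(m,k) and on the signal-to-noise ratio SNR_Filt(m,k). The local SNR is now calculated from the smoothed spectrum,

$$\mathrm{SNR}_{\mathrm{local}}(m,k) = \frac{|X_{\mathrm{mask}}(m,k)|^2}{|D(k)|^2} - 1,$$

where |X_mask(m,k)|^2 is the output of the masking filter described in section 2.3. Referring to equation 4, the local SNR is used to calculate the two time-averaged signal-to-noise ratios SNR_prio(m,k) and SNR_Filt(m,k):

$$\mathrm{SNR}_{\mathrm{prio}}(m,k) = (1-\gamma)\,P\left[\mathrm{SNR}_{\mathrm{local}}(m,k)\right] + \gamma\,\frac{|S(m-1,k)|^2}{|D(k)|^2}$$

$$\mathrm{SNR}_{\mathrm{Filt}}(m,k) = (1-\delta)\,P\left[\mathrm{SNR}_{\mathrm{local}}(m,k)\right] + \delta\,\frac{|S(m-1,k)|^2}{|D(k)|^2}$$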
The amount of time-averaging for each of the two SNRs can be controlled independently by the parameters γ and δ. The subtraction factor α(m,k) in equation 6 is a function of SNR_prio(m,k) (figure 1); the over-subtraction factor α_0 is its value for SNR_prio(m,k) = 0 dB. The block diagram of the proposed noise reduction scheme is shown in figure 2.
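Putting the pieces of this section together, one frame of the filter can be sketched in Python as below. This is a hypothetical reading of the scheme, not the authors' implementation: `process_frame`, `alpha_of_snr`, `gamma`, `delta`, and `alpha0` are illustrative names, the α(m,k) curve is an assumed piecewise-linear stand-in for figure 1 (from `alpha0` at 0 dB down to 1 at 20 dB), H_Filt is clipped at zero, and |X_mask(m,k)|^2 and |D(k)|^2 are assumed to be computed elsewhere.

```python
import math


def alpha_of_snr(snr_prio, alpha0, alpha_min=1.0, lo_db=0.0, hi_db=20.0):
    # Assumed piecewise-linear stand-in for the alpha(m,k) curve.
    snr_db = 10.0 * math.log10(max(snr_prio, 1e-10))
    t = min(max((snr_db - lo_db) / (hi_db - lo_db), 0.0), 1.0)
    return alpha0 + t * (alpha_min - alpha0)


def process_frame(X, X_mask_pow, D_pow, S_prev_pow,
                  gamma=0.95, delta=0.95, alpha0=4.0, b=2):
    """Filter one STFT frame (hypothetical sketch).

    X          -- complex spectrum X(m,k)
    X_mask_pow -- masked power spectrum |X_mask(m,k)|^2
    D_pow      -- noise power |D(k)|^2
    S_prev_pow -- |S(m-1,k)|^2 from the previous call (zeros for the first frame)
    Returns (S, S_pow); S_pow is the state to pass in for the next frame.
    """
    S, S_pow = [], []
    for Xk, xm, d, sp in zip(X, X_mask_pow, D_pow, S_prev_pow):
        p_local = max(xm / d - 1.0, 0.0)   # P[SNR_local(m,k)] from the masked spectrum
        prev = sp / d                      # |S(m-1,k)|^2 / |D(k)|^2
        # Two recursions with independent averaging weights.
        snr_prio = (1.0 - gamma) * p_local + gamma * prev
        snr_filt = (1.0 - delta) * p_local + delta * prev
        alpha = alpha_of_snr(snr_prio, alpha0)
        # Zero-phase gain H_Filt(m,k), clipped at 0 (assumption) before the 1/b root.
        h = max(1.0 - alpha / (snr_filt + 1.0), 0.0) ** (1.0 / b)
        Sk = Xk * h                        # a real gain keeps the phase arg[X(m,k)]
        S.append(Sk)
        S_pow.append(abs(Sk) ** 2)
    return S, S_pow
```

In a streaming loop, `S_pow` is fed back as `S_prev_pow` on the next frame, and the filtered frames are overlap-added as in any STFT processor.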
3.2 Implementation
The algorithm was tested with a real-time implementation in MAX on SGI R5000 and NeXT-ISPW. In each implementation, the following parameters were fixed: the window length, the window type (Hanning or Tukey), the FFT length, and the window overlap. The sampling frequency was set to 16 kHz, 22.05 kHz, 32 kHz, or 44.1 kHz. Each implementation works either with magnitude spectrum subtraction (b=1 in equation 6) or with power spectrum subtraction (b=2).
The control parameters and the implementation parameters (window length, window type, FFT length, and overlap) affect the noise reduction results. The influence of each parameter is briefly discussed in the following.
Window Length: The window length must be chosen to ensure good frequency resolution while preventing smearing of signal transients. Increasing the window length decreases processing distortions but also increases smearing. A length of 30 to 40 ms (e.g. 1024 points at a 32 kHz sampling frequency) was found to be optimal for many signals.
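The conversion between window duration and window size in samples is simple arithmetic; the two hypothetical helpers below make it explicit for the sampling frequencies listed above.

```python
def window_samples(duration_ms, fs_hz):
    """Window length in samples for a duration given in milliseconds."""
    return round(duration_ms * fs_hz / 1000.0)


def window_duration_ms(n_samples, fs_hz):
    """Duration in milliseconds of an n-sample window."""
    return 1000.0 * n_samples / fs_hz


for fs in (16000, 22050, 32000, 44100):
    print(fs, window_samples(30, fs), window_samples(40, fs))
```

For example, the 30 to 40 ms range corresponds to 480 to 640 samples at 16 kHz and 1323 to 1764 samples at 44.1 kHz, while 1024 samples at 32 kHz last exactly 32 ms.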
FFT Length: The proposed filtering operation in the frequency domain can result in long impulse responses. Therefore, the FFT should be longer than the analysis window; if the zero-padding is too short, time-domain aliasing may occur. The zero-padding factor should be set to 2.
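Why the FFT must be longer than the analysis window can be checked with a toy example: multiplying two DFTs performs a circular convolution, which equals the linear convolution only if the transform length is at least len(x) + len(h) - 1. The pure-Python DFT below is deliberately naive (O(n^2)) and only meant as an illustration; the function names are hypothetical.

```python
import cmath


def dft(x, n):
    """Naive n-point DFT of x, zero-padded to length n."""
    x = list(x) + [0.0] * (n - len(x))
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]


def idft_real(X):
    """Naive inverse DFT, returning only the real part."""
    n = len(X)
    return [(sum(X[f] * cmath.exp(2j * cmath.pi * f * t / n) for f in range(n)) / n).real
            for t in range(n)]


def filter_in_frequency_domain(x, h, n):
    """Multiply the n-point spectra of x and h, then transform back."""
    return idft_real([a * b for a, b in zip(dft(x, n), dft(h, n))])


def linear_convolution(x, h):
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


x, h = [1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]
print(linear_convolution(x, h))              # reference result, length 7
print(filter_in_frequency_domain(x, h, 4))   # no zero-padding: the tail wraps around
print(filter_in_frequency_domain(x, h, 8))   # factor-2 zero-padding: matches the reference
```

Without zero-padding every output sample collapses to the same value because the tail of the convolution wraps around; with a factor of 2 the result matches the linear convolution followed by a zero.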
Window Overlap: In implementations with a Hanning window, an overlap of four windows (75% overlap) is used. This helps to reduce the spreading of signal transients caused by the time-averaged SNR (reported in [Cappe,94]). In implementations with a Tukey window (a rectangular window with cosine fade-in and fade-out), two windows are overlapped; the amount of overlap depends on the length of the fading portions.
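The 75% overlap with a Hanning window can be checked numerically: with a periodic Hann window and a hop of one quarter of the window length, the overlapped windows sum to a constant (2.0), so overlap-add re-synthesis introduces no amplitude modulation. The Python sketch below verifies this; the periodic (rather than symmetric) Hann definition is an assumption, since the text does not say which variant was used.

```python
import math


def periodic_hann(N):
    """Periodic Hann window of length N."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / N) for n in range(N)]


def overlap_add_sum(window, hop, n_frames):
    """Sum of n_frames copies of the window, each shifted by hop samples."""
    total = [0.0] * (hop * (n_frames - 1) + len(window))
    for m in range(n_frames):
        for n, w in enumerate(window):
            total[m * hop + n] += w
    return total


N = 1024
s = overlap_add_sum(periodic_hann(N), N // 4, 12)
steady = s[N - N // 4 : len(s) - (N - N // 4)]  # region covered by four windows
print(min(steady), max(steady))
```

Only the start and end regions, where fewer than four windows overlap, deviate from the constant sum.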
Noise Floor Parameter: The noise floor parameter should be chosen according to the noise level of the input signal. For old recordings with high noise levels, it was found that the remaining noise floor must not be too small to retain a natural sound quality.
Over-Subtraction Factor: A high over-subtraction factor leads to a strong attenuation of components with a low SNR. This prevents musical noise but also suppresses too many small signal components. A range between 1.8 and 2.3 was found suitable for magnitude subtraction; for power subtraction, the optimal range is between 3 and 6 (see also [Berouti,83]).
Averaging Parameters: These two parameters control the amount of time averaging for the SNR estimation. The averaging reduces the SNR variance but also introduces some smearing of signal transients (see [Cappe,94]). Both parameters should be within the range of 0.9 to 0.98.
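To relate the averaging parameters to time, one can model each recursion roughly as a first-order smoother y(m) = w·y(m-1) + (1-w)·x(m) with weight w (the actual recursions feed back |S(m-1,k)|^2, so this is only an approximation). The Python sketch below, with hypothetical names and an assumed hop size of 256 samples (a 1024-point window with 75% overlap), computes the approximate memory of about 1/(1-w) frames and, for a white stationary input, the variance reduction factor (1-w)/(1+w).

```python
def averaging_stats(w, hop_samples, fs_hz):
    """Rough figures for a first-order smoother y(m) = w*y(m-1) + (1-w)*x(m).

    Returns (memory in frames, memory in ms, output/input variance ratio).
    """
    frames = 1.0 / (1.0 - w)               # approximate time constant, valid for w near 1
    ms = frames * hop_samples / fs_hz * 1000.0
    var_factor = (1.0 - w) / (1.0 + w)     # variance ratio for a white input
    return frames, ms, var_factor


for w in (0.9, 0.98):
    print(w, averaging_stats(w, hop_samples=256, fs_hz=32000))
```

Under these assumptions, the range 0.9 to 0.98 corresponds to roughly 80 to 400 ms of memory and a variance reduction by a factor of about 19 to 99.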
4 Conclusion
A broadband noise reduction method based on non-linear spectral subtraction was presented. Non-linear smoothing
operations in both the time and the frequency domain are used to determine low-variance estimates of the SNR. From
the estimated SNR, the transfer function of the spectral subtraction filter is calculated. Over-subtraction and the
averaging procedures prevent musical noise. Listening tests show that proper adjustment of the control parameters
leads to excellent restoration results.
Using a short-time transform with nonuniform frequency resolution, and taking into account information from the
short-time phase spectrum might lead to further improvement of the proposed method.
References
[Berouti,83] M.Berouti, R.Schwartz, and J.Makhoul: Enhancement of speech corrupted by acoustic noise. In Speech
Enhancement. Edited by J.S.Lim, Prentice-Hall, Inc. Englewood Cliffs, New Jersey 07632, 1983, pp.69-73.
[Boll,79] S.F.Boll: Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. on Acoustics,
Speech, and Signal Processing, Vol.ASSP-27, No.2, pp. 113-120, April 1979.
[Cappe,94] O.Cappé: Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor.
IEEE Trans. on Speech and Audio Processing, vol.2, no.2, pp.345-349, April 1994.
[Cappe,95] O.Cappé, J.Laroche: Evaluation of short-time spectral attenuation techniques for the restoration of musical
recordings. IEEE Trans. on Speech and Audio Processing, vol.3, no.1, pp.84-93, Jan. 1995.
[Ephraim,84] Y.Ephraim and D.Malah: Speech enhancement using a minimum mean-square error short-time spectral
amplitude estimator. IEEE Trans. on Acoustics, Speech, and Signal Processing, vol.32, no.6, pp.1109-1121, December
1984.
[Ephraim,85] Y.Ephraim and D.Malah: Speech enhancement using a minimum mean-square error log-spectral
amplitude estimator. IEEE Trans. on Acoustics, Speech, and Signal Processing, vol.33, no.2, pp.443-445, April 1985.
[Lim,83] J.S.Lim (editor): Speech Enhancement. Prentice-Hall, Inc. Englewood Cliffs, New Jersey 07632, 1983.
[Moore,83] B.C.J.Moore and B.R.Glasberg: Suggested formulae for calculating auditory-filter bandwidths and
excitation patterns. J.Acoust.Soc.Am., vol.74, no.3, pp.750-753, September 1983.
[Tsoukalas,93] D.Tsoukalas, M.Paraskevas, J.Mourjopoulos: Speech enhancement using psychoacoustic criteria. Proc.
IEEE, ICASSP-1993, vol. II, pp. II-359--II-362.
[Vaseghi,92] S.V.Vaseghi and R.Frayling-Cork: Restoration of old gramophone recordings. Journal AES, vol.40,
no.10, pp.791-801, October 1992.
[Vaseghi,96] S.V.Vaseghi: Advanced Signal Processing and Digital Noise Reduction. Wiley & Sons Ltd. and B.G.
Teubner, 1996.
[Zwicker,90] E.Zwicker, H.Fastl: Psychoacoustics: Facts and Models. Springer-Verlag, 1990.