
Perceptual WPT and time-adaptive level thresholding based enhancement of degraded speech

Presented by
Nitesh Kumar Chaudhary
Department of Electronics & Communication Engineering
The LNM Institute of Information Technology, Jaipur
Under the supervision of
Dr. Navneet Upadhyay

Why speech enhancement?

The presence of noise in speech can significantly reduce its intelligibility and degrade automatic speech recognition performance. Noise reduction has therefore become an important issue in speech signal processing systems such as speech coding and speech recognition. Speech can be degraded by:
(a) Additive acoustic noise - for example, noise added to the speech signal when it is recorded in an environment with noticeable background noise, such as an aircraft cockpit.
(b) Acoustic reverberation - caused by the additive effect of multiple reflections of an acoustic signal.
(c) Convolutive channel effects - an uneven or band-limited response, which can arise when the communication channel is not modeled accurately enough for the channel equalizer to remove the channel impulse response.
(d) Electrical interference.
(e) Codec distortion - distortion introduced by the compression in the coding algorithm.
(f) Distortion introduced by the recording apparatus - for example, a poor microphone response.

Keywords: Perceptual wavelet packet transform (PWPT), time-adaptive thresholding, Teager energy operator (TEO), probability of detection (Pd) and probability of false alarm (Pf), masking.

Block Diagram

[Figure: block diagram. Input: noisy signal X(n). Blocks: Perceptual WPT, giving W_j,m(k), m = 1...17; Teager Energy Operator, giving t_j,m(k); Critical band selection, giving M_j,m(k); VAS & time-adaptive thresholding, using W_m(n); level-dependent thresholding, giving L_j,m(k); Inverse PWPT. Output: recovered clean signal Y(n).]

Perceptual Wavelet Packet Transform:

The wavelet packet transform (WPT) is a time-frequency analysis tool: it maps the signal into a domain that carries both time and frequency information.

In wavelet analysis, a signal is split into an approximation and a detail. The approximation is then itself split into a second-level approximation and detail, and the process is repeated.

In the wavelet packet case, each detail coefficient vector is also split into two parts, in the same way as the approximation vector. For the perceptual WPT, 17 critical bands are then selected from this tree, because 17 critical bands are required to cover the entire frequency range (0-4 kHz) of speech sampled at 8 kHz.
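The splitting scheme described above can be sketched in Python with NumPy. This is a minimal illustration, assuming the length-2 Daubechies (Haar) filter pair and zero-padding of odd-length vectors; the function names `haar_split` and `wavelet_packet` are hypothetical, and this is not the authors' PWPT implementation, which additionally keeps only the 17 critical-band nodes.

```python
import numpy as np

def haar_split(x):
    """One analysis step with the orthonormal Haar (length-2 Daubechies) filters.

    Returns (approximation, detail), each half the length of x."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:                      # assumption: zero-pad odd-length vectors
        x = np.append(x, 0.0)
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)  # low-pass filter + downsample by 2
    detail = (even - odd) / np.sqrt(2)  # high-pass filter + downsample by 2
    return approx, detail

def wavelet_packet(x, max_level):
    """Full wavelet packet tree: approximations AND details are split at every level.

    Returns a dict {(j, m): coefficients}; node (j, m) has children
    (j+1, 2m) and (j+1, 2m+1)."""
    tree = {(0, 0): np.asarray(x, dtype=float)}
    for j in range(max_level):
        for m in range(2 ** j):
            a, d = haar_split(tree[(j, m)])
            tree[(j + 1, 2 * m)] = a
            tree[(j + 1, 2 * m + 1)] = d
    return tree

# Example: a 5-level tree for a 1 s test tone sampled at 8 kHz
fs = 8000
n = np.arange(fs)
x = np.sin(2 * np.pi * 440 * n / fs)
tree = wavelet_packet(x, 5)
```

Because the Haar pair is orthonormal, the coefficients of every level together carry exactly the energy of the input signal; longer Daubechies filters keep this property but need boundary handling that this sketch avoids.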

[Figure: Noisy signal wavelet packet decomposition. Decomposition tree from node (0,0) down to level 5; the terminal nodes are (3,5)-(3,7), (4,4)-(4,9) and (5,0)-(5,7), 17 in total. Axes: sample point (x 10^4) vs. signal magnitude.]
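The 17 terminal nodes of the decomposition tree (levels 3, 4 and 5) can be mapped to nominal frequency bands. This sketch assumes the node index m is ordered by frequency; in a standard wavelet packet tree the natural index order and the frequency order differ by a Gray-code permutation, so a real implementation would have to reorder the nodes. `TERMINAL_NODES` and `nominal_band` are hypothetical names.

```python
FS = 8000  # sampling rate in Hz, as in the text

# Terminal nodes (level j, index m) read off the decomposition tree.
TERMINAL_NODES = (
    [(5, m) for m in range(0, 8)]     # 8 narrow bands at level 5
    + [(4, m) for m in range(4, 10)]  # 6 bands at level 4
    + [(3, m) for m in range(5, 8)]   # 3 wide bands at level 3
)

def nominal_band(j, m, fs=FS):
    """Nominal pass band (Hz) of node (j, m), ASSUMING m is the frequency-ordered index."""
    width = (fs / 2) / 2 ** j  # each level halves the bandwidth of its parent
    return (m * width, (m + 1) * width)

bands = [nominal_band(j, m) for j, m in TERMINAL_NODES]
for (j, m), (lo, hi) in zip(TERMINAL_NODES, bands):
    print(f"node ({j},{m}): {lo:6.0f} - {hi:6.0f} Hz")
```

Under this assumption the bands are 125 Hz wide below 1 kHz, 250 Hz wide from 1 to 2.5 kHz and 500 Hz wide from 2.5 to 4 kHz, tiling 0-4 kHz without gaps, which mirrors the narrower critical bands at low frequencies.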

TEO & level-dependent thresholding

The Teager energy operator (TEO) is a powerful nonlinear operator that has been used successfully in various speech applications. It can also be used to estimate the second-moment angular bandwidth of a signal, as well as the moments of the signal's duration and of its spectrum.
The TEO can estimate the energy of quite complicated signals. For a given band-limited discrete signal x(n), the TEO introduced by Kaiser is

$$\Psi[x(n)] = x^2(n) - x(n+1)\,x(n-1)$$
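Kaiser's discrete operator can be sketched in a few lines of NumPy. The treatment of the two endpoints, which lack a neighbour, is an assumption here (they are copied from the adjacent interior values). In the block diagram the operator is applied to the PWPT coefficients of each band, giving t_j,m(k); `teager_energy` is a hypothetical name.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy: psi[x(n)] = x(n)^2 - x(n+1) * x(n-1).

    Requires at least 3 samples. The endpoints have no left/right neighbour,
    so they are copied from the adjacent interior values (an assumption)."""
    x = np.asarray(x, dtype=float)
    if x.size < 3:
        raise ValueError("need at least 3 samples")
    psi = np.empty_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[2:] * x[:-2]
    psi[0], psi[-1] = psi[1], psi[-2]
    return psi

# For x(n) = A cos(Omega n + phi) the operator equals A^2 sin^2(Omega) at every n,
# so it tracks both the amplitude and the frequency of the component.
A, omega = 0.5, 0.3
n = np.arange(100)
psi = teager_energy(A * np.cos(omega * n + 0.7))
```

The constant output for a pure tone follows from the identity cos(a + b) cos(a - b) = cos^2(a) - sin^2(b).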

A time-adaptive threshold is computed for the wavelet coefficients so that time variations of the noise are taken into account.

[Thresholding equation not recoverable from the source.]
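A minimal sketch of level-dependent thresholding, assuming a standard rule from the cited literature rather than the exact equation of this work: soft thresholding with a per-level universal threshold sigma_j * sqrt(2 ln N_j), where the noise level sigma_j of each level j is estimated from the median absolute coefficient (MAD / 0.6745), in the spirit of Donoho and Johnstone (1994) and Johnstone and Silverman (1997). It omits the time-adaptive (VAS-driven) part of the proposed method; all function names are hypothetical, and the nodes are held in a dict {(j, m): coefficients}.

```python
import numpy as np

def soft_threshold(w, thr):
    """Soft thresholding: sign(w) * max(|w| - thr, 0)."""
    w = np.asarray(w, dtype=float)
    return np.sign(w) * np.maximum(np.abs(w) - thr, 0.0)

def level_threshold(coeff_list):
    """Universal threshold for one level: sigma_j * sqrt(2 ln N_j),
    with sigma_j = median(|w|) / 0.6745 over all coefficients of that level."""
    c = np.concatenate([np.ravel(v) for v in coeff_list])
    sigma = np.median(np.abs(c)) / 0.6745
    return sigma * np.sqrt(2.0 * np.log(c.size))

def threshold_by_level(nodes):
    """Apply a separate threshold to the nodes of each level j.

    'nodes' maps (j, m) -> coefficient array; returns a new dict with the same keys."""
    out = {}
    for j in sorted({key[0] for key in nodes}):
        keys = [key for key in nodes if key[0] == j]
        thr = level_threshold([nodes[key] for key in keys])
        for key in keys:
            out[key] = soft_threshold(nodes[key], thr)
    return out
```

Estimating sigma_j separately for each level is what makes the threshold level-dependent; this matters when the noise is coloured, so that its level differs from one decomposition level to the next.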


Masking Construction:
For a selected band, the mask is obtained by

$$M_{j,m}(k) = t_{j,m}(k) * H_j(k)$$

where * denotes convolution and H_j(k) is a 256-point level-dependent Hamming window.

The voice activity shape V(n) is calculated by

$$V(n) = \sum_{m=1}^{17} W_m(n)$$

where W_m(n) is the inverse perceptual wavelet packet transform of M_{j,m}(k) from the masking equation.
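The masking step and the voice activity shape can be sketched with NumPy. Assumptions: the convolution output is cropped around its centre to the input length, the Hamming window is not normalized, and each band signal W_m(n) has already been reconstructed by an inverse PWPT (not shown). `mask_band` and `voice_activity_shape` are hypothetical names.

```python
import numpy as np

def mask_band(teo_coeffs, win_len=256):
    """M_{j,m}(k) = t_{j,m}(k) * H_j(k): smooth TEO coefficients with a Hamming window.

    The full convolution is cropped around its centre so the output has the same
    length as the input, even when the band has fewer than win_len coefficients."""
    t = np.asarray(teo_coeffs, dtype=float)
    h = np.hamming(win_len)
    full = np.convolve(t, h)       # length len(t) + win_len - 1
    start = (win_len - 1) // 2     # offset that centres the window
    return full[start:start + len(t)]

def voice_activity_shape(band_signals):
    """V(n) = sum over m of W_m(n); all band signals must have the same length."""
    return np.sum(np.vstack(band_signals), axis=0)
```

The text calls H_j(k) level dependent, but how its length varies with the level j is not stated, so `win_len` is left as a parameter with the 256-point default.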

Time-adaptive threshold calculation:

To determine the time-adaptive threshold value AWT, an iterative algorithm has been proposed:

[Iterative update equation for AWT(i) not recoverable from the source.]

where AWT(i) is the time-adaptive threshold value of frame i, and Frame(i) is defined as

$$\mathrm{Frame}(i) = [V((i-1)\cdot 160 + 1), \ldots, V(i \cdot 160)]$$

that is, a frame of 160 samples (20 ms at 8 kHz). Noise is defined as

$$\mathrm{Noise}(n) = p \cdot \frac{E[V^{(2)}(n)] + \mathrm{Mean}(\mathrm{Frame}(i))}{2}$$

where E[V^{(k)}(n)] is the mean of V^{(k)}(n).
Voice-active regions are characterized by V(n) > AWT.
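The frame definition and the decision rule V(n) > AWT can be illustrated with NumPy. This sketch assumes that the 1-based Frame(i) maps to row i-1 of a (frames x 160) array, drops any trailing partial frame, and takes the per-frame thresholds AWT(i) as an input, because the iterative update itself is not reproduced here. Function names are hypothetical.

```python
import numpy as np

FRAME_LEN = 160  # samples per frame: 20 ms at 8 kHz

def split_frames(v, frame_len=FRAME_LEN):
    """Row i-1 holds Frame(i) = [V((i-1)*160 + 1), ..., V(i*160)] (1-based in the text).

    A trailing partial frame is dropped (assumption)."""
    v = np.asarray(v, dtype=float)
    n_frames = len(v) // frame_len
    return v[: n_frames * frame_len].reshape(n_frames, frame_len)

def voice_active(v, awt, frame_len=FRAME_LEN):
    """Per-sample decision V(n) > AWT(i), where frame i contains sample n.

    'awt' holds one threshold per full frame."""
    frames = split_frames(v, frame_len)
    awt = np.asarray(awt, dtype=float).reshape(-1, 1)  # broadcast over samples
    return (frames > awt).ravel()

# Mean(Frame(i)), used in the noise estimate, is simply a row mean:
v = np.concatenate([np.zeros(160), np.ones(160)])
frame_means = split_frames(v).mean(axis=1)
decision = voice_active(v, [0.5, 0.5])
```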

Level 3
[Figure: Level 3, node-by-node denoising. Noisy signal and denoised signal at level 3 of the wavelet tree for nodes (3,5), (3,6) and (3,7); axes: frequency in Hz vs. signal amplitude.]

Level 4
[Figure: Level 4, node-by-node denoising. Noisy signal and denoised signal at level 4 of the wavelet tree for nodes (4,4) to (4,9); axes: frequency in Hz vs. amplitude.]

Level 5
[Figure: Level 5, node-by-node denoising. Noisy signal and denoised signal at level 5 of the wavelet tree for nodes (5,0) to (5,7); axes: frequency in Hz vs. amplitude.]

Evaluation

To verify the effectiveness of the proposed algorithms, we compared their speech detection and false-alarm probabilities.
The proposed methods are evaluated with receiver operating characteristic (ROC) curves, which show how well the VAD discriminates between noise-only and noisy speech frames in terms of the probability of correct detection (Pd) and the probability of false alarm (Pf). Pd is the fraction of speech frames correctly detected as speech, and Pf is the fraction of noise-only frames wrongly detected as speech.
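Given per-frame reference labels (speech or noise-only) and VAD decisions, Pd, Pf and one ROC point per threshold could be computed as in this NumPy sketch. The scoring convention (a frame is declared speech when its score exceeds the threshold) and the function names are assumptions.

```python
import numpy as np

def detection_rates(reference, decision):
    """Return (Pd, Pf) from boolean per-frame arrays (True = speech).

    Pd = speech frames detected as speech / all speech frames
    Pf = noise-only frames detected as speech / all noise-only frames"""
    ref = np.asarray(reference, dtype=bool)
    dec = np.asarray(decision, dtype=bool)
    pd = np.sum(ref & dec) / np.sum(ref)
    pf = np.sum(~ref & dec) / np.sum(~ref)
    return pd, pf

def roc_points(scores, reference, thresholds):
    """One (Pf, Pd) point per threshold; a frame is 'speech' if its score > threshold."""
    scores = np.asarray(scores, dtype=float)
    points = []
    for thr in thresholds:
        pd, pf = detection_rates(reference, scores > thr)
        points.append((pf, pd))
    return points
```

Sweeping the threshold from high to low traces the ROC curve from (0, 0) towards (1, 1); a better VAD keeps Pd high while Pf is still small. Multiplying by 100 gives the percentages used in the table.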

Performance Evaluation
[Figure: ROC plot of Pd (probability of detection) vs. Pf (probability of false alarm), with "shape-preserving" and "linear" fitted curves; annotated 20.6710 dB.]

Table 1. Detection performance and computation time for different wavelet filters

Wavelet filter type (filter length) | Probability of correct detection, Pd (%) | Probability of false alarm, Pf (%) | Computation (CP) time
Daubechies 2  | 86.4 | 15.6 | 2.872 s
Daubechies 4  | 89.3 | 11.7 | 2.884 s
Daubechies 8  | 91.8 | 9.2  | 3.023 s
Daubechies 10 | 94.3 | 5.7  | 3.074 s
Daubechies 12 | 94.5 | 5.5  | 3.898 s
Daubechies 14 | 94.8 | 5.2  | 3.899 s

The cost-performance (CP) ratio is defined as

CP = [equation not recoverable from the source]

where the CP time is the average PWPT processing time of a specific wavelet. Considering the cost-performance ratios of the filters in Table 1, the Daubechies wavelet filter of length 12, which has the best CP ratio, is recommended for the proposed algorithm.

References:

Shi-Huang Chen, HsinTe Wu, Yukon Chang, and T. K. Truong, "Robust voice activity detection using perceptual wavelet-packet transform and Teager energy operator," Pattern Recognition Letters, vol. 28, pp. 1327-1332, 2007.
I. Daubechies, Ten Lectures on Wavelets, CBMS-NSF Conference Series in Applied Mathematics, SIAM, 1992.
D. L. Donoho and I. M. Johnstone, "Ideal spatial adaptation by wavelet shrinkage," Biometrika, vol. 81, pp. 425-455, 1994.
S. Mallat, "A theory for multiresolution signal decomposition: The wavelet representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 674-693, July 1989.
M. Berouti, R. Schwartz, and J. Makhoul, "Enhancement of speech corrupted by acoustic noise," in Proc. IEEE ICASSP, Apr. 1979, pp. 208-211.
I. M. Johnstone and B. W. Silverman, "Wavelet threshold estimators for data with correlated noise," J. Roy. Stat. Soc. B, vol. 59, pp. 319-351, 1997.
G. David Forney, Jr., "Exponential error bounds for erasure, list, and decision feedback schemes," IEEE Transactions on Information Theory, vol. 14, no. 2, pp. 206-220, Mar. 1968.
