Time-domain signals
The Independent Variable is Time
The Dependent Variable is the Amplitude
Most of the Information is Hidden in the
Frequency Content
[Figure: four panels of magnitude versus time (0 to 1): sinusoids at 2 Hz, 10 Hz, and 20 Hz, and their sum 2 Hz + 10 Hz + 20 Hz]
Signal Transformation
Why
To obtain further information from the signal that
is not readily available in the raw signal.
Raw Signal
Normally the time-domain signal
Processed Signal
A signal that has been "transformed" by any of the
available mathematical transformations
Fourier Transformation
The most popular transformation
between time and frequency domains
Frequency domain analysis
Phase
$\phi = \tan^{-1}(b/a)$
Frequency domain analysis
Frequency Spectrum
Basically the frequency components (spectral
components) of the signal
Shows which frequencies exist in the signal
Fourier Transform (FT)
One way to find the frequency content
Tells how much of each frequency exists in a signal
Spectrum of
speech signal
Fourier Transform
• Fourier transform decomposes a function into a spectrum
of its frequency components.
• The inverse transform synthesizes a function from its
spectrum of frequency components.
• Discrete Fourier transform pair is defined as:
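One common convention for the discrete Fourier transform pair, assumed here because it matches NumPy (the normalization used on the slide may differ), is $X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}$ and $x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, e^{j 2\pi k n / N}$. A minimal sketch under that assumption; the function names `dft` and `idft` and the 100 Hz sampling rate are illustrative choices:

```python
import numpy as np

def dft(x):
    """Direct O(N^2) evaluation of X[k] = sum_n x[n] exp(-j 2 pi k n / N)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    return np.exp(-2j * np.pi * k * n / N) @ x

def idft(X):
    """Inverse transform, x[n] = (1/N) sum_k X[k] exp(+j 2 pi k n / N)."""
    X = np.asarray(X, dtype=complex)
    N = len(X)
    k = np.arange(N)
    n = k.reshape(-1, 1)
    return (np.exp(2j * np.pi * n * k / N) @ X) / N

# Sum of 2 Hz, 10 Hz and 20 Hz sinusoids, sampled at an assumed 100 Hz for 1 s.
fs = 100.0
t = np.arange(100) / fs
x = np.sin(2 * np.pi * 2 * t) + np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 20 * t)

X = dft(x)
freqs = np.arange(len(x)) * fs / len(x)
# The three largest magnitudes below the Nyquist frequency mark the components.
peaks = sorted(freqs[np.argsort(np.abs(X[:50]))[-3:]].tolist())
print(peaks)
```

The direct matrix form is only meant to mirror the definition; `np.fft.fft` computes the same quantity much faster.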
[Figure: Fourier transform of a 1D signal, and the Fourier spectrum of the 1D signal (frequency axis in Hz)]
Fourier Transform
[Figure: curves for k = 1 plotted against t from 0 to 10]
Stationary Signal
Signals with frequency content
unchanged over the entire time
All frequency components exist at all
times
Non-stationary Signal
Frequency changes in time
One example: the “Chirp Signal”
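The difference can be checked numerically. The sketch below assumes a 1 kHz sampling rate and borrows the segment layout used later in the deck (0.0-0.4: 2 Hz, 0.4-0.7: 10 Hz, 0.7-1.0: 20 Hz); the helper `band_fraction` is a hypothetical name. It measures how much of the energy in the first 0.4 s lies near 20 Hz:

```python
import numpy as np

fs = 1000                      # assumed sampling rate (Hz)
t = np.arange(fs) / fs         # 1 second of samples

# Stationary: all three components are present at all times.
stationary = (np.sin(2 * np.pi * 2 * t)
              + np.sin(2 * np.pi * 10 * t)
              + np.sin(2 * np.pi * 20 * t))

# Non-stationary: one component per time interval.
f_inst = np.where(t < 0.4, 2.0, np.where(t < 0.7, 10.0, 20.0))
non_stationary = np.sin(2 * np.pi * f_inst * t)

def band_fraction(segment, fs, f_lo, f_hi):
    """Fraction of the (Hann-windowed) spectral energy inside [f_lo, f_hi]."""
    spec = np.abs(np.fft.rfft(segment * np.hanning(len(segment)))) ** 2
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return spec[band].sum() / spec.sum()

first = t < 0.4
frac_stationary = band_fraction(stationary[first], fs, 17, 23)
frac_non_stationary = band_fraction(non_stationary[first], fs, 17, 23)
print(frac_stationary, frac_non_stationary)
```

In the stationary case a sizeable share of the early energy sits near 20 Hz; in the non-stationary case almost none does, because the 20 Hz component only appears after 0.7.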
Stationarity of the signal
[Figure: stationary signal, 2 Hz + 10 Hz + 20 Hz, all components occur at all times. Left: magnitude versus time. Right: magnitude versus frequency (Hz).]
[Figure: non-stationary signal whose frequency changes by time interval (0.4-0.7: 10 Hz, 0.7-1.0: 20 Hz). Left: magnitude versus time. Right: magnitude versus frequency (Hz).]
Chirp signal
Frequency: 2 Hz to 20 Hz
Frequency: 20 Hz to 2 Hz
Different in Time Domain
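A small numerical check of why the Fourier transform alone cannot tell the two chirps apart. This is a sketch under assumptions: a linear chirp, a 1 kHz sampling rate, and the 20 Hz to 2 Hz chirp taken as the exact time reversal of the 2 Hz to 20 Hz one:

```python
import numpy as np

fs = 1000                        # assumed sampling rate (Hz)
t = np.arange(fs) / fs           # 1 second
f0, f1, T = 2.0, 20.0, 1.0
k = (f1 - f0) / T                # sweep rate of the assumed linear chirp (Hz/s)

up = np.cos(2 * np.pi * (f0 * t + 0.5 * k * t ** 2))   # 2 Hz -> 20 Hz
down = up[::-1]                                        # 20 Hz -> 2 Hz (time reversal)

mag_up = np.abs(np.fft.rfft(up))
mag_down = np.abs(np.fft.rfft(down))

same_in_time = np.allclose(up, down)
same_magnitude_spectrum = np.allclose(mag_up, mag_down)
print(same_in_time, same_magnitude_spectrum)
```

The waveforms differ sample by sample, yet their magnitude spectra coincide: the spectrum says which frequencies are present, not when they occur.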
[Figure: the two chirp signals, each shown as magnitude versus time (0 to 1) and magnitude versus frequency (0 to 25 Hz)]
$$\mathrm{STFT}_x^{(\omega)}(t', f) = \int_t \left[ x(t)\, \omega^*(t - t') \right] e^{-j 2\pi f t}\, dt$$
ω(t): the window function
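A discrete sketch of the definition above: slide a window along the signal and take the FFT of each windowed frame. The Hann window, the frame length, the hop size, and the function name `stft` are assumptions made for illustration. Each frame's FFT uses time measured from the start of the frame, so it differs from the formula by a phase factor only; the magnitudes are unaffected.

```python
import numpy as np

def stft(x, fs, win_len, hop):
    """Return frame centre times, bin frequencies, and the complex STFT matrix."""
    window = np.hanning(win_len)                       # real window, so w* = w
    starts = np.arange(0, len(x) - win_len + 1, hop)   # frame positions t'
    frames = np.stack([x[s:s + win_len] * window for s in starts])
    spec = np.fft.rfft(frames, axis=1)                 # one spectrum per frame
    times = (starts + win_len / 2) / fs
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    return times, freqs, spec

# Non-stationary test signal: 2 Hz, then 10 Hz, then 20 Hz (assumed fs = 1 kHz).
fs = 1000
t = np.arange(fs) / fs
f_inst = np.where(t < 0.4, 2.0, np.where(t < 0.7, 10.0, 20.0))
x = np.sin(2 * np.pi * f_inst * t)

times, freqs, spec = stft(x, fs, win_len=200, hop=50)
dominant = freqs[np.argmax(np.abs(spec), axis=1)]      # strongest bin per frame
print(np.column_stack([times, dominant]))
```

Unlike a single FT of the whole signal, the dominant frequency now moves from the lowest bins in the early frames to 20 Hz in the last ones.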
[Figure: speech signal and its STFT]
Drawbacks of STFT
Unchanged Window
Dilemma of Resolution
Narrow window -> poor frequency resolution
Wide window -> poor time resolution
Heisenberg Uncertainty Principle
Cannot know exactly which frequency exists at which time instant;
only which frequency bands exist in which time intervals
[Figure: STFT via narrow window and via wide window]
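The dilemma can be put in numbers with two rough measures: the window duration as the time resolution and the DFT bin spacing as the frequency resolution. Both measures, the 1 kHz sampling rate, and the name `window_resolution` are assumptions of this sketch rather than definitions from the slides:

```python
def window_resolution(win_len, fs):
    """Rough resolution of an STFT window of win_len samples at sampling rate fs.

    Returns (delta_t, delta_f): window duration in seconds and
    DFT bin spacing in Hz. Their product is always 1.
    """
    delta_t = win_len / fs
    delta_f = fs / win_len
    return delta_t, delta_f

fs = 1000.0                                # assumed sampling rate (Hz)
narrow = window_resolution(50, fs)         # 0.05 s window, 20 Hz bins
wide = window_resolution(500, fs)          # 0.5 s window, 2 Hz bins
print(narrow, wide)
```

With the narrow window, components at 2 Hz and 10 Hz fall into the same 20 Hz bin; with the wide window they separate, but any event is only located to within half a second.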
Wavelet Transform
To overcome some limitations of
Fourier transform
[Figure: discrete wavelet decomposition tree: S splits into A1 and D1, A1 into A2 and D2, A2 into A3 and D3]
Wavelet Overview
Wavelet
A small wave
Wavelet Transforms
Provide a way for analyzing waveforms, bounded in both
frequency and duration
Allow signals to be stored more efficiently than by Fourier
transform
Better approximate real-world signals
Well-suited for approximating data with sharp discontinuities
"The Forest & the Trees"
Notice gross features with a large "window"
Notice small features with a small "window"
Multi-resolution analysis
Wavelet Transform
An alternative approach to the short time Fourier
transform to overcome the resolution problem
Similar to STFT: signal is multiplied with a function
Multi-resolution Analysis
Analyze the signal at different frequencies with
different resolutions
Good time resolution and poor frequency resolution at
high frequencies
Good frequency resolution and poor time resolution at
low frequencies
Well suited to signals with short-duration high-frequency
components and long-duration low-frequency components
Advantages of WT over STFT
$$\mathrm{CWT}_x^{\psi}(\tau, s) = \Psi_x^{\psi}(\tau, s) = \frac{1}{\sqrt{|s|}} \int x(t)\, \psi^*\!\left( \frac{t - \tau}{s} \right) dt$$
τ: translation (the location of the window)
s: scale
ψ: mother wavelet
Wavelet
Small wave
Means the window function is of finite length
Mother Wavelet
A prototype for generating the other window functions
All the used windows are its dilated or compressed and
shifted versions
Principles of WT
Wavelet bases
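[Figure: each wavelet basis shown in the time domain and in the frequency domain]
Wavelet Basis Functions (i and j both denote the imaginary unit):
Morlet (ω0 = frequency): $\pi^{-1/4}\, e^{j \omega_0 \eta}\, e^{-\eta^2 / 2}$
Paul (m = order): $\frac{2^m i^m m!}{\sqrt{\pi (2m)!}}\, (1 - i\eta)^{-(m+1)}$
DOG (m = derivative): $\frac{(-1)^{m+1}}{\sqrt{\Gamma\left( m + \frac{1}{2} \right)}}\, \frac{d^m}{d\eta^m} \left( e^{-\eta^2 / 2} \right)$
DOG: Derivative Of a Gaussian
m = 2 is the Marr or Mexican hat wavelet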
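Two of these bases are easy to evaluate directly. The sketch below assumes ω0 = 6 for the Morlet wavelet (a common choice, not stated on the slide) and writes out the m = 2 DOG case by hand, where the second derivative of the Gaussian gives (1 − η²)·exp(−η²/2) up to the normalizing constant:

```python
import math
import numpy as np

def morlet(eta, omega0=6.0):
    """Morlet wavelet: pi^(-1/4) * exp(j*omega0*eta) * exp(-eta^2 / 2)."""
    return np.pi ** -0.25 * np.exp(1j * omega0 * eta) * np.exp(-eta ** 2 / 2.0)

def mexican_hat(eta):
    """DOG wavelet with m = 2 (Marr / Mexican hat).

    (-1)^(m+1) / sqrt(Gamma(m + 1/2)) * d^2/d eta^2 exp(-eta^2 / 2)
    = (1 - eta^2) * exp(-eta^2 / 2) / sqrt(Gamma(2.5))
    """
    norm = 1.0 / math.sqrt(math.gamma(2.5))
    return norm * (1.0 - eta ** 2) * np.exp(-eta ** 2 / 2.0)

# Numerical check of the normalization: both wavelets should have unit energy.
eta = np.linspace(-8.0, 8.0, 4001)
d_eta = eta[1] - eta[0]
energy_morlet = np.sum(np.abs(morlet(eta)) ** 2) * d_eta
energy_mexhat = np.sum(mexican_hat(eta) ** 2) * d_eta
print(energy_morlet, energy_mexhat)
```

Both energies come out at 1 to within the accuracy of the Riemann sum, which is what the leading constants in the formulas are for.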
Scale of wavelet
Scale
S > 1: dilates the signal
S < 1: compresses the signal
Low Frequency -> High Scale -> Non-detailed
Global View of Signal -> Spans the
Entire Signal
High Frequency -> Low Scale -> Detailed
View That Lasts a Short Time
Only a Limited Interval of Scales is Necessary
Computation of WT
$$\mathrm{CWT}_x^{\psi}(\tau, s) = \Psi_x^{\psi}(\tau, s) = \frac{1}{\sqrt{|s|}} \int x(t)\, \psi^*\!\left( \frac{t - \tau}{s} \right) dt$$
Step 1: The wavelet is placed at the beginning of the
signal, and set s=1 (the most compressed wavelet);
Step 2: The wavelet function at scale “1” is multiplied
by the signal, and integrated over all times; then
multiplied by $1/\sqrt{|s|}$;
Step 3: Shift the wavelet to t = τ, and get the
transform value at t = τ and s = 1;
Step 4: Repeat the procedure until the wavelet
reaches the end of the signal;
Step 5: Scale s is increased by a sufficiently small
value, the above procedure is repeated for all s;
Step 6: Each computation for a given s fills the single
row of the time-scale plane;
Step 7: CWT is obtained if all s are calculated.
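The steps above can be sketched directly as a brute-force computation: for each scale, evaluate the wavelet at every shift τ, multiply by the signal, integrate, and scale by 1/√s, filling one row of the time-scale plane per scale. The Mexican hat wavelet, the 200 Hz sampling rate, and the scale grid in seconds are assumptions of this sketch (the slides start from s = 1 in unspecified units):

```python
import numpy as np

def mexican_hat(eta):
    """Real-valued Mexican hat wavelet with unit energy."""
    return (2.0 / (np.sqrt(3.0) * np.pi ** 0.25)) * (1.0 - eta ** 2) * np.exp(-eta ** 2 / 2.0)

def cwt_naive(x, fs, scales):
    """Brute-force CWT: rows are scales, columns are shifts tau (sample times)."""
    n = len(x)
    t = np.arange(n) / fs
    dt = 1.0 / fs
    out = np.empty((len(scales), n))
    for row, s in enumerate(scales):
        # eta[i, j] = (t_j - tau_i) / s: the wavelet shifted to tau_i (Steps 3-4)
        eta = (t[None, :] - t[:, None]) / s
        # Multiply by the signal, integrate over t, normalise (Step 2).
        # The wavelet is real, so the complex conjugate is a no-op.
        out[row] = (mexican_hat(eta) @ x) * dt / np.sqrt(s)
    return out

def best_scale(freq_hz, fs=200, duration=4.0):
    """Scale with the largest mean squared coefficient for a pure sine."""
    t = np.arange(int(fs * duration)) / fs
    x = np.sin(2 * np.pi * freq_hz * t)
    scales = np.geomspace(0.005, 0.5, 40)      # Step 5: sweep over scales
    coeffs = cwt_naive(x, fs, scales)          # Steps 6-7: one row per scale
    mid = slice(len(t) // 4, 3 * len(t) // 4)  # ignore the signal edges
    energy = np.mean(coeffs[:, mid] ** 2, axis=1)
    return scales[np.argmax(energy)]

scale_2hz = best_scale(2.0)
scale_20hz = best_scale(20.0)
print(scale_2hz, scale_20hz)
```

The low-frequency sine responds most strongly at a scale roughly ten times larger than the high-frequency one, matching the "low frequency, high scale" rule.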
Time & Frequency Resolution
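[Figure: time-frequency plane, time on the horizontal axis and frequency on the vertical axis. One region is labeled "Better time resolution; poor frequency resolution", the other "Better frequency resolution; poor time resolution".]
• Each box represents an equal portion
• Resolution in STFT is selected once for entire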
Comparison of transformations
Discretization of WT
Scale S:     2    4    8   …
Samples N:  32   16    8   …
$N_2 = \frac{s_1}{s_2} \cdot N_1 = \frac{f_2}{f_1} \cdot N_1$
Effective and Fast DWT
Decomposition with DWT
Halves the Time Resolution
Only half the number of samples results
Doubles the Frequency Resolution
The spanned frequency band is halved
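[Figure: three-level DWT filter bank]
X[n]: 512 samples, 0-1000 Hz
Filter 1: D1: 500-1000 Hz (256 samples); A1 (256 samples)
Filter 2: D2: 250-500 Hz (128 samples); A2 (128 samples)
Filter 3: D3: 125-250 Hz (64 samples); A3: 0-125 Hz (64 samples)
Decomposition tree: S splits into A1 and D1, A1 into A2 and D2, A2 into A3 and D3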
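The sample counts in this filter bank can be reproduced with a few lines of code. The sketch below swaps in the Haar wavelet, the simplest orthogonal wavelet, purely to keep the example short; it is not the db4 wavelet used elsewhere in the deck, and the function names are illustrative:

```python
import numpy as np

def haar_step(x):
    """One DWT level: low-pass/high-pass filtering followed by downsampling by 2."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # low-pass half-band (A)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # high-pass half-band (D)
    return approx, detail

def haar_dwt(x, levels):
    """Repeatedly split the approximation; returns (A_levels, [D1, D2, ...])."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details

rng = np.random.default_rng(0)
x = rng.standard_normal(512)                      # 512-sample input, as in the figure

a3, (d1, d2, d3) = haar_dwt(x, levels=3)
sizes = [len(d1), len(d2), len(d3), len(a3)]
print(sizes)

# The transform is orthonormal, so the total energy is preserved.
energy_in = np.sum(x ** 2)
energy_out = np.sum(a3 ** 2) + sum(np.sum(d ** 2) for d in (d1, d2, d3))
```

Each level halves the number of samples and the frequency band, which gives the 256, 128, 64, 64 split shown for D1, D2, D3, and A3.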
Decomposition of non-stationary signal
Signal:
0.0-0.4: 20 Hz
0.4-0.7: 10 Hz
0.7-1.0: 2 Hz
Wavelet: db4
Level: 6
[Figure: decomposition levels ordered between fL (low frequency) and fH (high frequency)]
Decomposition of non-stationary signal
Signal:
0.0-0.4: 2 Hz
0.4-0.7: 10 Hz
0.7-1.0: 20 Hz
Wavelet: db4
Level: 6
[Figure: decomposition levels ordered between fL (low frequency) and fH (high frequency)]
Reconstruction from WT
What
How can those components be assembled
back into the original signal without loss of
information?
A process that follows decomposition (analysis).
Also called synthesis
How
Reconstruct the signal from the wavelet
coefficients
Where wavelet analysis involves filtering and
downsampling, the wavelet reconstruction
process consists of upsampling and filtering
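A minimal round trip illustrates the point. As an assumption for brevity, the sketch uses the Haar wavelet, for which "upsampling and filtering" reduces to interleaving sums and differences; other wavelets such as db4 need longer synthesis filters. The function names are illustrative:

```python
import numpy as np

def haar_analysis(x):
    """Analysis: filter and downsample by 2."""
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def haar_synthesis(approx, detail):
    """Synthesis: upsample by 2 and filter (for Haar, an interleave)."""
    out = np.empty(2 * len(approx))
    out[0::2] = (approx + detail) / np.sqrt(2.0)
    out[1::2] = (approx - detail) / np.sqrt(2.0)
    return out

rng = np.random.default_rng(0)
signal = rng.standard_normal(512)

# Three-level decomposition: keep every detail band and the last approximation.
approx, details = signal, []
for _ in range(3):
    approx, detail = haar_analysis(approx)
    details.append(detail)

# Reconstruction walks the tree back up, coarsest level first.
rebuilt = approx
for detail in reversed(details):
    rebuilt = haar_synthesis(rebuilt, detail)

max_error = np.max(np.abs(rebuilt - signal))
print(max_error)
```

The reconstruction error is at the level of floating-point rounding, so no information is lost as long as all coefficients are kept.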
Reconstruction from WT
Highest Frequencies Appear at the Start of The Original Signal
Approximations Appear Less and Less Noisy
Also Lose Progressively More High-frequency Information
In A5, About the First 20% of the Signal is Truncated
Breakdown Detection
Purpose
Resolving a signal into constituent
sinusoids of different frequencies
The signal is a sum of three pure
sine waves
Analysis
D1 contains signal components whose
period is between 1 and 2.
Zooming in on detail D1 reveals that
each "belly" is composed of 10
oscillations.
D3 and D4 contain the medium sine
frequencies.
There is a breakdown between
approximations A3 and A4 -> The
medium frequency has been subtracted.
Approximations A1 to A3 can be used to
estimate the medium sine.
Zooming in on A1 reveals a period
of around 20.
Empirical Mode Decomposition
Principle
Objective: from one observation of x(t), get an AM-FM type
representation as a sum of K components
[Figure: a LF (low-frequency) sawtooth plus a linear FM signal gives the observed signal; each panel is plotted over 0 to 1]
Empirical Mode Decomposition
Algorithmic definition
SIFTING PROCESS
[Figure sequence: successive sifting iterations on the signal yield the first intrinsic mode function, then the second intrinsic mode function, then the third intrinsic mode function, and finally the residue]
Empirical Mode Decomposition
Algorithmic definition
Signal
1st Intrinsic Mode Function
2nd Intrinsic Mode Function
3rd Intrinsic Mode Function
Residue
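The slides show sifting only as a picture sequence, so the following is a heavily simplified sketch of the usual idea rather than the algorithm as presented: subtract the mean of the upper and lower envelopes until an oscillating mode is left, remove it from the signal, and repeat on what remains. Assumptions made here: linear interpolation of the envelopes (cubic splines are the common choice), a fixed number of sifting passes instead of a stopping criterion, and illustrative function names.

```python
import numpy as np

def local_extrema(x):
    """Indices of interior local maxima and minima."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima

def sift_once(x):
    """Subtract the mean envelope once; None if there are too few extrema."""
    maxima, minima = local_extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    n = np.arange(len(x))
    last = len(x) - 1
    # Envelopes through the extrema, pinned to the end samples (simplification).
    upper = np.interp(n, np.r_[0, maxima, last], np.r_[x[0], x[maxima], x[-1]])
    lower = np.interp(n, np.r_[0, minima, last], np.r_[x[0], x[minima], x[-1]])
    return x - 0.5 * (upper + lower)

def emd(x, max_imfs=3, n_sift=10):
    """Return (list of intrinsic mode functions, residue)."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        h = residue
        for _ in range(n_sift):
            h_next = sift_once(h)
            if h_next is None:
                break
            h = h_next
        if h is residue:            # nothing left to sift
            break
        imfs.append(h)
        residue = residue - h       # continue with what is left
    return imfs, residue

t = np.arange(1000) / 1000.0
x = np.sin(2 * np.pi * 20 * t) + np.sin(2 * np.pi * 2 * t) + 0.5 * t
imfs, residue = emd(x)
rebuilt = sum(imfs) + residue
```

By construction the signal equals the sum of the extracted modes plus the residue, which is the decomposition summarized in the list above.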
Empirical Mode Decomposition
Intrinsic Mode Functions
[Figure: the signal and its intrinsic mode functions, shown in time, as a spectrum in frequency, and as a time-frequency representation (frequency versus time)]
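[Figure: block diagram of the proposed separation model, with blocks: mixture audio s(t); STFT (window 30 ms, overlap 20 ms); T-F representation of mixture, split into STFTM and phase φ; PCA; ICA; basis vectors; basis vector selection and clustering; source spectrograms; ISTFT; individual sources]
Proposed separation model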
[Figure: the mixture spectrogram X = ∑ x_i is decomposed through basis vectors (B, A) into source spectrograms]
Source re-synthesis
[Figure: separated subspaces (spectrograms) of a mixture of speech and bip-bip sound; phase information is appended and the inverse STFT gives the separated speech and the separated bip-bip sound]
Append phase information:
$S_i(n, k) = x_i \cdot e^{j \phi(n, k)}$
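The phase-appending step can be tried on a single frame. This sketch is a simplification: it works on one FFT frame rather than a full STFT, and the "separated" magnitude is produced by a hypothetical mask that keeps only the low bins, standing in for whatever the separation stage outputs:

```python
import numpy as np

rng = np.random.default_rng(1)
frame = rng.standard_normal(256)           # stand-in for one mixture frame

X = np.fft.rfft(frame)
magnitude = np.abs(X)                      # what the separation stage works on
phase = np.angle(X)                        # phi(n, k), kept aside

# Hypothetical separated magnitude x_i: the source occupies the low bins only.
mask = np.zeros_like(magnitude)
mask[:64] = 1.0
x_i = mask * magnitude

# Append the mixture phase and invert: S_i = x_i * exp(j * phi)
S_i = x_i * np.exp(1j * phase)
separated = np.fft.irfft(S_i, n=len(frame))

# Sanity check: with the full magnitude the frame is recovered exactly.
full = np.fft.irfft(magnitude * np.exp(1j * phase), n=len(frame))
```

Reusing the mixture phase is what makes the inverse transform possible, since the magnitude spectrogram alone does not determine a waveform.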
Experimental results
Separated signals with proposed algorithm
mixtures separated
Speech+bip-bip
Male+female speech
Application of DSP in audio analysis
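Source localization
ITD: interaural time difference; ILD: interaural level difference
$$\tau_{\mathrm{ITD}}(\omega) = \frac{1}{\omega} \left\{ \phi_l(\omega) - \phi_r(\omega) \right\}, \qquad \kappa_{\mathrm{ILD}}(\omega) = 20 \log_{10} \frac{|X_l(\omega)|}{|X_r(\omega)|}$$
At low frequency, ITD dominates
At high frequency, ILD is used to resolve the problem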
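A small worked example of the two cues, under assumptions chosen so the answer is known in advance: a 200 Hz tone at an 8 kHz sampling rate, with the right channel delayed by 4 samples (0.5 ms) and attenuated by a factor of 2 relative to the left:

```python
import numpy as np

fs = 8000                       # assumed sampling rate (Hz)
n = 800
t = np.arange(n) / fs
tone_hz = 200.0
delay_samples = 4               # right channel lags by 4 samples = 0.5 ms

left = np.sin(2 * np.pi * tone_hz * t)
right = 0.5 * np.roll(left, delay_samples)   # delayed and attenuated copy

X_l = np.fft.rfft(left)
X_r = np.fft.rfft(right)
freqs = np.fft.rfftfreq(n, d=1.0 / fs)

k = int(np.argmax(np.abs(X_l)))              # bin of the tone
omega = 2 * np.pi * freqs[k]

# ITD = (phi_l - phi_r) / omega; the product form avoids explicit unwrapping.
phase_diff = np.angle(X_l[k] * np.conj(X_r[k]))
itd = phase_diff / omega

# ILD = 20 log10(|X_l| / |X_r|)
ild_db = 20 * np.log10(np.abs(X_l[k]) / np.abs(X_r[k]))
print(itd, ild_db)
```

The estimate recovers the 0.5 ms delay and a level difference of about 6 dB. The phase difference is only unambiguous while ω·ITD stays below π, which is one reason the level cue matters at high frequencies.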
Source localization (cont.)
ITD and ILD are quantized into 50 levels
Collection of T-F points corresponding to each
ITD/ILD quantized pair produces peaks
Separation by beamforming
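$$\left\{ \mathrm{ITD}(\omega, t_f),\ \mathrm{ILD}(\omega, t_f) \right\} = \left\{ \frac{1}{\omega} \angle \frac{H_L(\omega, t_f)}{H_R(\omega, t_f)},\ \frac{H_R(\omega, t_f)}{H_L(\omega, t_f)} \right\}$$
$$\mathrm{ILD}_{dB}(\omega, t_f) = 20 \log_{10} \frac{H_R(\omega, t_f)^2}{H_L(\omega, t_f)^2}$$
where $t_f$ is the time frame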
ITD-ILD Space Localization
ITD and ILD are quantized into 50 levels
Collection of T-F points from HS
corresponding to each ITD/ILD quantized
pair produces peaks
Source Separation
Each peak region in the histogram
refers to a source of the binaural
mixtures
Construct a binary mask (nullifying T-F
points of interfering sources) $M_i(\omega, t)$
The HS of ith source is separated as
$H_i(\omega, t) = M_i(\omega, t)\, H_L(\omega, t)$
$F_1(\omega, t)\, F_2(\omega, t) = 0; \quad \forall\, \omega, t$
where F1 and F2 are TFR of two signals
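A toy example of masking under the disjointness condition. The two time-frequency representations are made up for illustration, and the masks are built from the known sources (an oracle), whereas the slides derive them from peaks in the ITD/ILD histogram:

```python
import numpy as np

# Two toy TFRs (rows: frequency bins, columns: time frames) with disjoint support.
F1 = np.zeros((4, 6))
F1[:2, :] = 1.5                 # source 1 occupies the two lowest bins
F2 = np.zeros((4, 6))
F2[2:, :] = 0.7                 # source 2 occupies the two highest bins

disjoint = bool(np.all(F1 * F2 == 0))   # F1(w, t) * F2(w, t) = 0 for all w, t

H_L = F1 + F2                   # TFR of the mixture

# Binary masks: 1 where the wanted source is present, 0 on interfering points.
M1 = (F1 > F2).astype(float)
M2 = 1.0 - M1

H1 = M1 * H_L                   # H_i = M_i * H_L
H2 = M2 * H_L
print(disjoint, np.array_equal(H1, F1), np.array_equal(H2, F2))
```

When the sources do not overlap in the time-frequency plane, the masked mixture returns each source exactly; any overlap leaks into the separated outputs.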
SIR (signal to interference ratio) is used as
the basis to measure DO
Source disjoint orthogonality
(cont.)
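[Figure: three audio sources s1, s2, s3 picked up by a microphone as the mixture s; in the TFR of the mixture (frequency versus time) the regions of s1, s2, and s3 are marked]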
Source disjoint orthogonality
(cont.)
The SIR of the jth source is defined as:
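$$\mathrm{SIR}_j = \sum_{\omega} \sum_{t} \frac{X_j(\omega, t)}{Y_j(\omega, t)}; \quad Y_j(\omega, t) \neq 0$$
$$Y_j(\omega, t) = \sum_{\substack{i = 1 \\ i \neq j}}^{N} X_i(\omega, t)$$
$$\mathrm{ADO} = \frac{1}{N} \sum_{j=1}^{N} \mathrm{SIR}_j$$
N: number of sources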
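A literal transcription of these formulas into code, on a tiny made-up example where the sums can be checked by hand. The function names and the toy arrays are illustrative, and no magnitude or normalization beyond what the formulas show is assumed:

```python
import numpy as np

def sir(X, j):
    """SIR_j: sum over all T-F points with Y_j != 0 of X_j / Y_j."""
    Y = sum(X[i] for i in range(len(X)) if i != j)   # interference for source j
    valid = Y != 0
    return float(np.sum(X[j][valid] / Y[valid]))

def ado(X):
    """Average of SIR_j over the N sources."""
    return sum(sir(X, j) for j in range(len(X))) / len(X)

# Two toy TFRs with one frequency bin and two time frames each.
X = [np.array([[2.0, 4.0]]), np.array([[1.0, 2.0]])]

sir_0 = sir(X, 0)       # 2/1 + 4/2
sir_1 = sir(X, 1)       # 1/2 + 2/4
ado_value = ado(X)
print(sir_0, sir_1, ado_value)
```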
Experimental results
The three mixtures are defined as
m1{sp1(-40°, 0°), sp2(30°, 0°), ft(0°, 0°)},
m2{sp1(20°, 10°), sp2(0°, 10°), ft(-10°,10°)},
m3{sp1(40°, 20°), sp2(30°, 20°), ft(-20°, 20°)}
The separation efficiency is measured
as OSSR (original to separated signal
ratio) defined as:
$$\mathrm{OSSR} = \frac{1}{T} \sum_{t=1}^{T} \log_{10} \left( \frac{\sum_{i=1}^{w} s_{\mathrm{original}}^{2}(t + i)}{\sum_{i=1}^{w} s_{\mathrm{separated}}^{2}(t + i)} \right)$$
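A sketch of the measure as written: frame energies over windows of w samples, their log-ratio, and the average over frames. Two assumptions are made for simplicity: every window that fits in the signal is used (the exact index range t = 1..T is not reproduced), and windows with zero energy are skipped to avoid dividing by zero. The window length of 32 samples is arbitrary.

```python
import numpy as np

def ossr(original, separated, w):
    """Mean over frames of log10(window energy of original / separated)."""
    ones = np.ones(w)
    e_orig = np.convolve(np.square(original), ones, mode="valid")
    e_sep = np.convolve(np.square(separated), ones, mode="valid")
    valid = (e_orig > 0) & (e_sep > 0)      # skip silent windows
    return float(np.mean(np.log10(e_orig[valid] / e_sep[valid])))

rng = np.random.default_rng(2)
x = rng.standard_normal(1000)

perfect = ossr(x, x, w=32)                  # identical signals
attenuated = ossr(x, 0.1 * x, w=32)         # separated signal 10x too quiet
print(perfect, attenuated)
```

A perfectly separated signal gives 0, and a separated signal at one tenth of the amplitude gives 2, so values close to zero indicate that the separated signal has the same frame energies as the original.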
Experimental results
(cont.)
The comparative separation efficiency
(OSSR) using HS and STFT :
Mixtures  TFR   OSSR of sp1  OSSR of sp2  OSSR of ft
m1        HS    -0.0271       0.0213       0.0264
          STFT   0.0621      -0.0721      -0.0531
m2        HS     0.0211      -0.0851      -0.0872
          STFT   0.0824       0.1202       0.1182
m3        HS     0.0941      -0.0832       0.0225
          STFT  -0.1261       0.1092      -0.0821
Experimental results
This experiment also compares the
DO using HS and STFT as TFR
STFT is affected by many factors
window function and its length,
overlapping, FFT points
HS is independent of such factors
It is slightly affected by the number of
frequency bins used in TFR
Experimental results (cont.)
The ADO of HS and STFT as a function of
number of frequency bins (N=3):
Experimental results (cont.)
The ADO of only STFT is affected by the
factor of window overlapping (%)
Experimental results (cont.)
STFT includes more cross-spectral energy
terms
The TFR of two pure tones using HS and STFT
Experimental results (cont.)
HS always has better DO for audio signals
DO depends on the resolution of TFR
STFT has to satisfy the inequality $\Delta t\, \Delta\omega \geq \frac{1}{2}$
The frequency resolution of HS is up to
Nyquist frequency
Its time resolution is up to sampling rate and
hence offers better resolution
Remarks
Three methods of pitch estimation
V/UV differentiation
Localization of three sources using TD-LD computed in T-F space
Speech with white noise
Source separation by ASA
1.5 m
Non-stationary time-series
Different parts of ECG signal
Questions/Suggestion
Please
Source Separation
Each peak region in the histogram refers to
a source of the stereo mixtures
Construct a binary mask (nullifying TF
points of interfering sources) Mi(n,t)
The HS of ith source is separated as