Time Modification of Speech Signals

SOLAFS (Synchronized Overlap-Add, Fixed Synthesis)

Introduction
Overview of the methods
Basic Idea
SOLAFS Method
Matlab Code
The results
Conclusion

There are many applications that modify the time scale
of speech, music, or other acoustic material:
- without modifying the pitch
- to speed up or slow down speech
- with no "Donald Duck" or "Minnie Mouse" effects

TSM (time-scale modification) refers to
changing the playback rate of a signal.
Two primary operations are involved:
- time-scale expansion (slow down)
- time-scale compression (speed up)

[Figure: original, expanded, and compressed versions of a signal]
Time-scale modification uses three
basic classes of methods:
- frequency domain processing methods
- analysis/synthesis methods
- time-domain processing methods
SOLAFS is a time-domain processing
method.
SOLAFS is an improvement on the earlier SOLA
(synchronized overlap-add) method.
SOLA consists of:
- shifting the beginning of a new speech segment over the end of the preceding segment to find the point of highest cross-correlation;
- once that point is found, overlapping the frames and averaging them together.
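A minimal SOLA-style join can make these two steps concrete. This is a sketch under stated assumptions, not code from the original slides: the shift search is expressed as trying every overlap length between hypothetical bounds `min_ov` and `max_ov`, the similarity is a normalized cross-correlation (so longer overlaps are not favored merely because they sum more products), and "averaging" is done with a linear cross-fade. The function names are illustrative.

```python
import math


def normalized_xcorr(a, b):
    """Normalized cross-correlation of two equal-length sequences (0.0 if either is all zeros)."""
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0 else 0.0


def sola_join(prev, seg, min_ov, max_ov):
    """Slide the start of `seg` over the end of `prev` and overlap-add at the best match.

    Every overlap length from min_ov to max_ov is tried
    (1 <= min_ov <= max_ov <= min(len(prev), len(seg))); the one with the highest
    normalized cross-correlation wins. The overlapped samples are then averaged
    with a linear cross-fade from `prev` into `seg`.
    Returns the joined signal and the chosen overlap length.
    """
    best_ov = max(range(min_ov, max_ov + 1),
                  key=lambda ov: normalized_xcorr(prev[-ov:], seg[:ov]))
    tail = prev[-best_ov:]
    faded = [(1 - n / best_ov) * tail[n] + (n / best_ov) * seg[n] for n in range(best_ov)]
    return prev[:-best_ov] + faded + seg[best_ov:], best_ov


# A sine with period 8: the segment starting at sample 43 lines up with the
# previous segment when 5 samples overlap (40 - 5 = 35 = 43 - 8).
s = [math.sin(2 * math.pi * i / 8) for i in range(83)]
joined, ov = sola_join(s[:40], s[43:], min_ov=4, max_ov=12)
print(ov, len(joined))  # 5 75
```

Because the matched overlap contains identical samples, the cross-fade leaves the waveform continuous.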

There are four parameters:
- Window length (W): the smallest unit of the input signal that is manipulated by the method.
- Analysis shift (S_a): the inter-frame interval between successive search ranges for analysis windows along the input signal.
- Synthesis shift (S_s): the inter-frame interval between successive analysis windows along the output signal.
- Shift search interval (k_max): the duration of the interval over which an analysis window may be shifted in order to align it with the region of the output signal it will overlap.

[Figure: the four parameters used in SOLAFS]
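To make the roles of the four parameters concrete, here is a small sketch (the class and method names are hypothetical). For window m it reports the range of input start positions the window may be taken from (m·S_a up to m·S_a + k_max) and the output samples it covers (starting at m·S_s). Since windows advance S_a samples along the input and S_s samples along the output, S_s/S_a is approximately the output/input duration ratio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SolafsParams:
    """Hypothetical container for the four SOLAFS parameters, all in samples."""
    W: int     # window length
    Sa: int    # analysis shift along the input
    Ss: int    # synthesis shift along the output (fixed)
    kmax: int  # shift search interval

    def input_search_range(self, m: int) -> tuple[int, int]:
        """First and last candidate start samples for analysis window m in the input."""
        return m * self.Sa, m * self.Sa + self.kmax

    def output_span(self, m: int) -> tuple[int, int]:
        """Half-open range [start, end) of output samples covered by window m."""
        return m * self.Ss, m * self.Ss + self.W

    @property
    def scale(self) -> float:
        """Approximate output/input duration ratio: > 1 expands (slower), < 1 compresses (faster)."""
        return self.Ss / self.Sa


p = SolafsParams(W=200, Sa=50, Ss=100, kmax=400)
print(p.input_search_range(3), p.output_span(3), p.scale)  # (150, 550) (300, 500) 2.0
```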

The analysis windows are chosen as follows:

$$
x_m[n] = \begin{cases} x[mS_a + k_m + n], & n = 0, 1, \dots, W-1 \\ 0, & \text{otherwise} \end{cases} \qquad (1)
$$

where
m = the window index, i.e. it refers to the mth window
n = a sample index in the input buffer for the input signal, which is W samples long
k_m = the number of samples by which the mth window is shifted
x_m[n] = the nth sample of the mth analysis window
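Eq. (1) amounts to slicing W samples out of the input, starting at m·S_a + k_m. A minimal sketch with a hypothetical function name; it additionally assumes that samples past the end of the input are treated as zero, so the window always has exactly W samples.

```python
def analysis_window(x, m, Sa, k_m, W):
    """Eq. (1): x_m[n] = x[m*Sa + k_m + n] for n = 0..W-1.

    Assumption: samples past the end of x are treated as zero, so the result
    always has exactly W samples.
    """
    start = m * Sa + k_m
    return [x[start + n] if start + n < len(x) else 0.0 for n in range(W)]


x = list(range(20))
print(analysis_window(x, 2, 5, 3, 4))  # [13, 14, 15, 16]
```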

The analysis windows are then used to form the output signal y[i] recursively, in accordance with the following:

$$
y[mS_s + n] = \begin{cases} (1 - b[n])\, y[mS_s + n] + b[n]\, x_m[n], & n = 0, 1, \dots, W_{ov}-1 \\ x_m[n], & n = W_{ov}, \dots, W-1 \end{cases} \qquad (2)
$$

where:

W_ov = W - S_s is the number of points in the overlap region

b[n] = an overlap-add weighting function, variously referred to as a fading factor, an averaging function, a linear fade function, and so forth.
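Eq. (2) translates into a short in-place update of the output buffer. This sketch assumes the linear fade b[n] = n/W_ov (one of the weighting functions the text names) and plain Python lists; the function name is hypothetical. It also checks that the output ends exactly W_ov samples after m·S_s, which holds because window m-1 ended at (m-1)·S_s + W = m·S_s + W_ov.

```python
def overlap_add(y, x_m, m, Ss, Wov):
    """Eq. (2) applied in place to the output list y.

    Expects y to end exactly Wov samples after m*Ss (the overlap region). The first
    Wov samples of x_m are cross-faded in with the linear weight b[n] = n / Wov
    (an assumed choice of fade), and x_m[Wov:] is appended unchanged.
    """
    pos = m * Ss
    if len(y) != pos + Wov:
        raise ValueError("y must end exactly Wov samples after m * Ss")
    for n in range(Wov):
        b = n / Wov                                      # fades the new window in
        y[pos + n] = (1 - b) * y[pos + n] + b * x_m[n]
    y.extend(x_m[Wov:])                                  # n = Wov..W-1 copied directly
    return y


# W = 6, Ss = 4, so Wov = W - Ss = 2.
print(overlap_add([1.0] * 6, [0.0] * 6, m=1, Ss=4, Wov=2))
# [1.0, 1.0, 1.0, 1.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
```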
Calculation for k_m

k_m is the optimal shift, determined by the normalized cross-correlation R_xy between x and y in the overlap region:

$$
k_m = \arg\max_{0 \le k \le k_{max}} R_{xy}[k] \qquad (3)
$$

where k_max is the maximum allowable shift from the initial starting position of the analysis window.
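A sketch of the search in eq. (3). The slides do not spell out R_xy, so this assumes the usual normalized form: the sum of products between the output overlap region y[m·S_s .. m·S_s + W_ov - 1] and the input samples x[m·S_a + k .. m·S_a + k + W_ov - 1], divided by the square root of the product of their energies. The function name is hypothetical.

```python
import math


def best_shift(x, y, m, Sa, Ss, Wov, kmax):
    """Eq. (3) sketch: the k in 0..kmax that maximizes R_xy[k].

    R_xy[k] is taken here (an assumption) to be the normalized cross-correlation
    between the output overlap region y[m*Ss : m*Ss + Wov] and the input samples
    x[m*Sa + k : m*Sa + k + Wov].
    """
    tail = y[m * Ss:m * Ss + Wov]
    e_tail = sum(v * v for v in tail)
    best_k, best_r = 0, -math.inf
    for k in range(kmax + 1):
        seg = x[m * Sa + k:m * Sa + k + Wov]
        if len(seg) < Wov:                      # the input ran out before k reached kmax
            break
        den = math.sqrt(e_tail * sum(v * v for v in seg))
        r = sum(a * b for a, b in zip(tail, seg)) / den if den > 0 else 0.0
        if r > best_r:                          # keep the first (smallest) k on ties
            best_k, best_r = k, r
    return best_k


x = [math.sin(2 * math.pi * i / 10) for i in range(200)]
y = x[:30]                                      # output so far ends at m*Ss + Wov = 30
print(best_shift(x, y, m=1, Sa=33, Ss=20, Wov=10, kmax=9))  # 7
```

Here the overlap region holds x[20:30]; one period later, x[40:50] repeats it exactly, and 40 = 33 + 7.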

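k_m can often be predicted without computing the similarity. The mth shift, k_m, should be determined by:

$$
k_m = \begin{cases} t_m, & \text{if } 0 \le t_m \le k_{max} \\ \arg\max_{0 \le k \le k_{max}} R_{xy}[k], & \text{otherwise} \end{cases} \qquad t_m = k_{m-1} + S_s - S_a
$$

With shift t_m, window m starts at input sample mS_a + t_m = (m-1)S_a + k_{m-1} + S_s, exactly where the previous window's samples in the overlap region came from, so the overlapping samples coincide and no search is needed.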
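The prediction rule is cheap to apply: the similarity search only runs when the predicted shift t_m = k_{m-1} + S_s - S_a falls outside 0..k_max. A minimal sketch, assuming the previous shift is known and the eq. (3) search is passed in as a zero-argument callable; the name `choose_shift` and the `search` argument are hypothetical.

```python
def choose_shift(k_prev, Sa, Ss, kmax, search):
    """Return k_m: the predicted shift t_m = k_prev + Ss - Sa when 0 <= t_m <= kmax,
    otherwise the result of `search()`, a caller-supplied zero-argument function
    (hypothetical) that performs the eq. (3) similarity search."""
    t_m = k_prev + Ss - Sa
    if 0 <= t_m <= kmax:
        return t_m          # prediction in range: no cross-correlation needed
    return search()         # prediction out of range: fall back to the search


print(choose_shift(5, Sa=10, Ss=12, kmax=20, search=lambda: 0))  # 7
print(choose_shift(5, Sa=30, Ss=12, kmax=20, search=lambda: 3))  # 3
```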
There are seven steps, as follows:

1. As an initialization step, take W samples
from the input signal (held in an input
signal buffer) and place them in the output
sample buffer for the output signal.

2. Find the start of the first analysis
window, mS_a.

3. Next, find the maximum similarity
between the first W_ov samples at the start
of the analysis window and the last W_ov
samples at the end of the output signal, by
computing the cross-correlation between the
samples from the start of the analysis window
and the samples from the end of the output
signal.

4. Shift the start of the analysis window
by one or two samples and repeat step 3.

5. Repeat steps 3 and 4 until the analysis
window has been shifted by the maximum
allowed amount, k_max.
6. At the shift of the analysis window that
gives the maximum cross-correlation,
overlap-add the last W_ov samples of the
output signal and the first W_ov samples of
the shifted analysis window, and transfer
the remaining W - W_ov samples into the
output buffer.
7. Repeat steps 2 to 6 with the next
analysis window until the end of the input
signal is reached.
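The seven steps can be combined into one loop. The sketch below is a pure-Python illustration, not the MATLAB code mentioned in the outline. It assumes a linear fade b[n] = n/W_ov, a normalized cross-correlation for R_xy, a search step of one sample, and that processing stops once a full search range of k_max + W input samples no longer fits. Before searching, it tries the predicted shift t_m = k_{m-1} + S_s - S_a. It also requires 0 < W_ov ≤ S_s, i.e. S_s < W ≤ 2S_s.

```python
import math


def solafs(x, W, Sa, Ss, kmax):
    """Minimal SOLAFS sketch: x is a list of samples, all parameters are in samples.

    Assumptions (not from the original code): linear fade b[n] = n / Wov,
    normalized cross-correlation as the similarity, a search step of one
    sample, and processing stops once a full search range no longer fits.
    """
    Wov = W - Ss
    if not 0 < Wov <= Ss:
        raise ValueError("need 0 < W - Ss <= Ss, i.e. Ss < W <= 2 * Ss")
    y = list(x[:W])                          # step 1: seed the output with the first W samples
    k_prev = 0                               # window 0 is taken unshifted
    m = 1
    while m * Sa + kmax + W <= len(x):       # step 7: loop until the input runs out
        pos = m * Ss                         # y currently ends at pos + Wov
        tail = y[pos:pos + Wov]              # overlap region at the end of the output
        t_m = k_prev + Ss - Sa               # predicted shift
        if 0 <= t_m <= kmax:
            k_m = t_m
        else:                                # steps 2-5: search 0..kmax, eq. (3)
            e_tail = sum(v * v for v in tail)
            k_m, best_r = 0, -math.inf
            for k in range(kmax + 1):
                seg = x[m * Sa + k:m * Sa + k + Wov]
                den = math.sqrt(e_tail * sum(v * v for v in seg))
                r = sum(a * b for a, b in zip(tail, seg)) / den if den > 0 else 0.0
                if r > best_r:
                    k_m, best_r = k, r
        xm = x[m * Sa + k_m:m * Sa + k_m + W]    # shifted analysis window, eq. (1)
        for n in range(Wov):                      # step 6: cross-fade the overlap, eq. (2)
            b = n / Wov
            y[pos + n] = (1 - b) * y[pos + n] + b * xm[n]
        y.extend(xm[Wov:])                        # transfer the remaining W - Wov samples
        k_prev = k_m
        m += 1
    return y


print(len(solafs([1.0] * 100, W=8, Sa=2, Ss=4, kmax=8)))  # 176
```

With S_s = 2S_a the output is roughly twice as long as the input; the exact length here is 42·S_s + W = 176 because windows m = 0..42 fit into the 100 input samples.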
The smallest useful synthesis shift is S_s = W_ov.
The smallest useful window length is W = 2W_ov.
The maximum shift is set to k_max = 2W.
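Because W_ov = W - S_s, the two limits S_s = W_ov and W = 2W_ov express the same condition. A small helper can encode these limits. The function names are hypothetical: the first returns the parameter values at the limits for a chosen W_ov, and the second rejects combinations below them.

```python
def smallest_useful_params(Wov):
    """Parameter values at the limits quoted above, for a chosen overlap length Wov (samples)."""
    W = 2 * Wov        # smallest useful window length
    Ss = W - Wov       # from Wov = W - Ss; equals Wov, the smallest useful synthesis shift
    kmax = 2 * W       # maximum shift as quoted
    return W, Ss, kmax


def check_params(W, Ss, kmax):
    """Hypothetical sanity check: return Wov = W - Ss, rejecting values below the quoted limits."""
    Wov = W - Ss
    if Wov <= 0:
        raise ValueError("W must exceed Ss so that successive windows overlap")
    if Ss < Wov:
        raise ValueError("Ss is below its smallest useful value Wov (equivalently W < 2 * Wov)")
    if kmax < 0:
        raise ValueError("kmax must be non-negative")
    return Wov


print(smallest_useful_params(100))  # (200, 100, 400)
print(check_params(200, 100, 400))  # 100
```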

The results are acceptable with a proper
choice of parameters.
The SOLAFS algorithm provides time-scale
modified speech over a wide range of
compression and expansion.
It requires significantly less computation
than many other methods.

From the MATLAB code, the method requires
large buffers to hold the samples, which
causes difficulties in real-time applications.
Real-time applications have to process
everything as fast as possible; if the data
is stored in compressed form or the storage
units are slow, the samples become difficult
to process in time.
