
Speech Recognition

Speech Recognition

Simplest method -- directly compare the input signal with the reference signals:
E = (Input speech) - (Reference speech)
and pick the reference speech that minimizes E.

Difficulties:
Speech differs from one utterance to the next.
Direct comparison is impractical for a large database.
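The direct-comparison idea can be sketched in a few lines. This is only an illustration under assumptions not in the slides: the error E is taken to be a sum of squared sample differences, signals are compared over their overlapping length only, and the function name `closest_reference` and the toy sine "references" are hypothetical.

```python
import numpy as np

def closest_reference(signal, references):
    """Return (label, error) of the reference waveform with the smallest squared error."""
    best_label, best_err = None, float("inf")
    for label, ref in references.items():
        n = min(len(signal), len(ref))  # crude: compare only the overlapping samples
        err = float(np.sum((signal[:n] - ref[:n]) ** 2))
        if err < best_err:
            best_label, best_err = label, err
    return best_label, best_err

# Toy data: two synthetic "reference speeches" and a noisy copy of one of them.
t = np.linspace(0.0, 1.0, 800, endpoint=False)
references = {
    "low": np.sin(2 * np.pi * 5 * t),
    "high": np.sin(2 * np.pi * 40 * t),
}
rng = np.random.default_rng(0)
noisy = references["high"] + 0.1 * rng.standard_normal(t.size)
label, err = closest_reference(noisy, references)
```

A scheme like this breaks down as soon as the same word is spoken faster, slower, or by another speaker, which is the difficulty the slide points out.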

Speech Recognition

Solve as pattern recognition problem:


Input signal -> Feature Extraction -> Feature vector sequence -> Pattern Matching -> Decision Making -> Output

Reference signal -> Feature Extraction -> Reference Patterns -> Pattern Matching

Basic Approach

A simplified block diagram:

Input Speech -> Front-end Signal Processing -> Feature Vectors -> Linguistic Decoding and Search Algorithm -> Output Sentence

Knowledge sources used by Linguistic Decoding and Search Algorithm:
Speech Corpora -> Acoustic Model Training -> Acoustic Models
Lexical Knowledge-base -> Lexicon
Text Corpora -> Language Model Construction -> Language Model
Grammar -> Language Model

Front-end Signal Processing

End point detection (speech/silence discrimination)

Noise reduction: clean signal = (input) - (environmental noise)

How to get environmental noise?
Simplest way: let the noise be the first 10 frames
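A minimal sketch of this noise-reduction step, assuming the subtraction is done on per-frame magnitude spectra (spectral subtraction) and that negative results are floored at zero. Both choices, and the name `spectral_subtract`, are assumptions for illustration; the slides only state the subtraction and the "first 10 frames" rule.

```python
import numpy as np

def spectral_subtract(frames_mag, noise_frames=10):
    """Subtract an average noise spectrum from every frame.

    frames_mag: array of shape (num_frames, num_bins) holding magnitude spectra.
    The noise estimate is the mean of the first `noise_frames` frames,
    which are assumed to contain no speech.
    """
    noise = frames_mag[:noise_frames].mean(axis=0)
    clean = frames_mag - noise
    return np.maximum(clean, 0.0)  # floor at zero: a magnitude cannot be negative

# Toy data: 10 noise-only frames followed by 10 frames of "speech plus noise".
frames_mag = np.ones((20, 4))
frames_mag[10:] = 3.0
cleaned = spectral_subtract(frames_mag)
```

The approach only works if the recording really starts with silence and the noise stays roughly stationary afterwards.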

Windowing

The signal is cut into frames, each frame is multiplied by a window, and the DFT of each windowed frame gives its spectrum.
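The framing, windowing and DFT steps might look like the sketch below. The frame length (400 samples), hop (160 samples) and the Hamming window are common choices assumed here, not values given in the slides; `frame_spectra` is a hypothetical name.

```python
import numpy as np

def frame_spectra(signal, frame_len=400, hop=160):
    """Split a signal into overlapping frames, window each one, and take its DFT.

    Returns magnitude spectra of shape (num_frames, frame_len // 2 + 1).
    Assumes len(signal) >= frame_len.
    """
    num_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([signal[i * hop : i * hop + frame_len] for i in range(num_frames)])
    return np.abs(np.fft.rfft(frames * window, axis=1))

# Toy input: one second of a 1000 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
spectra = frame_spectra(np.sin(2 * np.pi * 1000 * t))
peak_bins = spectra.argmax(axis=1)  # bin spacing is 16000 / 400 = 40 Hz
```

With these settings each frame covers 25 ms and consecutive frames start 10 ms apart.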

Pre-emphasis

The speech spectrum decays at higher frequencies.
Pre-emphasis -- boost the spectrum at higher frequencies to compensate.
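The slide's own pre-emphasis formula did not survive extraction. A commonly used form, assumed here, is the first-order filter y[n] = x[n] - alpha * x[n-1]; the coefficient 0.97 and the name `pre_emphasis` are likewise assumptions.

```python
import numpy as np

def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1], with y[0] = x[0]."""
    signal = np.asarray(signal, dtype=float)
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

# A constant (zero-frequency) input is almost cancelled,
# while a rapidly alternating input is almost doubled.
flat = pre_emphasis(np.ones(5))
fast = pre_emphasis(np.array([1.0, -1.0, 1.0, -1.0]))
```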

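Mel-Filter Bank

Spectrum -> Mel-Filter Bank -> SUM over each filter -> Feature vector

Each of the M filters weights the spectrum and the weighted values are summed, giving the feature vector components Yt(1), Yt(2), ..., Yt(M) for frame t.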
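The filter-bank stage can be sketched as a matrix of triangular filters applied to one frame's spectrum. The triangular shape, the mel formula 2595 * log10(1 + f / 700), and all function names are assumptions for illustration; the slides only show a bank of M filters, each followed by a sum.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(num_filters, n_fft, sample_rate):
    """Return a (num_filters, n_fft // 2 + 1) matrix of triangular filters
    whose centers are equally spaced on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), num_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((num_filters, n_fft // 2 + 1))
    for m in range(1, num_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):       # rising edge
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge, equal to 1 at the center
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank

def filterbank_features(spectrum, fbank):
    """Y_t(m) = sum over frequency bins of filter m's weight times the spectrum."""
    return fbank @ spectrum

fbank = mel_filterbank(num_filters=10, n_fft=512, sample_rate=16000)
features = filterbank_features(np.ones(257), fbank)  # flat toy spectrum
```

In practice a logarithm and a further transform are often applied to these sums, but that goes beyond what the slide shows.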

Acoustic Modeling -- HMMs

Hidden Markov Models (HMMs)

x -- hidden states
y -- observable outputs
a -- transition probabilities
b -- output probabilities

Acoustic Modeling -- Unit Selection

Unit selection for HMMs -- phrases, words, syllables, phonemes

Phoneme -- the minimum unit of speech sound in a language which can serve to distinguish one word from another

Acoustic Modeling

Observation Sequence (Feature vectors)

Yt(1)  Yt+1(1)  Yt+2(1)  Yt+3(1)  Yt+4(1)
Yt(2)  Yt+1(2)  Yt+2(2)  Yt+3(2)  Yt+4(2)
...
Yt(M)  Yt+1(M)  Yt+2(M)  Yt+3(M)  Yt+4(M)

Figure: an HMM aligned with an observation sequence.
States, with transition probabilities a11, a22 (self-loops), a12, a13, a23, a24
Observation sequence: o1 o2 o3 o4 o5 o6 o7 o8
State sequence: q1 q2 q3 q4 q5 q6 q7 q8
Observation probabilities: b1(o), b2(o), b3(o)

Find the probability of observing a sequence
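One standard way to find the probability of an observation sequence under an HMM is the forward algorithm. The sketch below assumes discrete output symbols and a made-up 3-state left-to-right model; the slides do not say which method or which numbers are used.

```python
import itertools
import numpy as np

def forward_probability(pi, A, B, observations):
    """P(o_1 .. o_T | model) computed with the forward algorithm.

    pi: (N,) initial state probabilities
    A:  (N, N) transitions, A[i, j] = a_ij
    B:  (N, K) output probabilities, B[j, k] = b_j(symbol k)
    observations: list of symbol indices
    """
    alpha = pi * B[:, observations[0]]
    for o in observations[1:]:
        alpha = (alpha @ A) * B[:, o]  # sum over previous states, then emit o
    return float(alpha.sum())

# Hypothetical model: self-loops, next-state and skip transitions.
pi = np.array([1.0, 0.0, 0.0])
A = np.array([[0.6, 0.3, 0.1],
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])
B = np.array([[0.9, 0.1],
              [0.5, 0.5],
              [0.2, 0.8]])

p = forward_probability(pi, A, B, [0, 0, 1])

# Sanity check: probabilities of all length-3 sequences should add up to 1.
total = sum(forward_probability(pi, A, B, list(seq))
            for seq in itertools.product(range(2), repeat=3))
```

Real recognizers use continuous output densities over feature vectors and work in the log domain to avoid underflow on long sequences.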

Language Modeling

N-gram -- find the probability of a word sequence W = (w1, w2, w3, ..., wi, ..., wR):

$$P(W) = P(w_1) \prod_{i=2}^{R} P(w_i \mid w_1, w_2, \ldots, w_{i-1})$$

An N-gram model approximates each factor by conditioning only on the previous N-1 words:

N=1 : unigram    P(wi)
N=2 : bigram     P(wi | wi-1)
N=3 : tri-gram   P(wi | wi-2, wi-1)
N=4 : four-gram  P(wi | wi-3, wi-2, wi-1)
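As a small illustration of the bigram case, the sketch below estimates probabilities by counting (maximum likelihood) on a made-up three-sentence corpus. The corpus, the function names, and the choice to return zero for unseen word pairs are assumptions for illustration.

```python
from collections import Counter

def train_bigram(sentences):
    """Count single words and adjacent word pairs in a list of tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def bigram_sentence_prob(words, unigrams, bigrams):
    """P(W) ~= P(w1) * product of P(wi | wi-1), estimated from raw counts."""
    total = sum(unigrams.values())
    prob = unigrams[words[0]] / total
    for prev, cur in zip(words, words[1:]):
        if unigrams[prev] == 0:
            return 0.0  # unseen history word
        prob *= bigrams[(prev, cur)] / unigrams[prev]
    return prob

corpus = [
    ["this", "is", "speech"],
    ["this", "is", "text"],
    ["speech", "is", "sound"],
]
unigrams, bigrams = train_bigram(corpus)
p_seen = bigram_sentence_prob(["this", "is", "speech"], unigrams, bigrams)  # (2/9) * 1 * (1/3)
p_unseen = bigram_sentence_prob(["speech", "this"], unigrams, bigrams)       # pair never observed
```

Any word pair missing from the training text gets probability zero with raw counts, however plausible the pair is.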

Language Modeling

Create a phrase lexicon
Make a word graph
Class-based language modeling
Keyword spotting
Smoothing

Summary

Figure: decoding over time t.
Acoustic Models -> Syllable Lattice
Lexicon -> Word Graph (candidate words w1, w2, ...)
Language Models -> score each word path with P(w1) P(w2 | w1) ...

Small Example

Example input sequence: this is speech

Acoustic models: (th-ih-s-ih-z-s-p-iy-ch)
Lexicon: (th-ih-s) -> this, (ih-z) -> is, (s-p-iy-ch) -> speech
Language models: (this) (is) (speech), scored as P(this) P(is | this) P(speech | this is)
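The lexicon step of this example can be sketched as a search for every way to split the phoneme string into lexicon entries. The recursive function `segment` is a hypothetical helper, and the phoneme string uses the lexicon's pronunciation s-p-iy-ch for "speech"; a real decoder would search a lattice of uncertain phonemes and not a single clean string.

```python
def segment(phonemes, lexicon):
    """Return every way to split a phoneme list into words from the lexicon."""
    if not phonemes:
        return [[]]  # one way to segment nothing: the empty word list
    results = []
    for pron, word in lexicon.items():
        n = len(pron)
        if tuple(phonemes[:n]) == pron:
            for rest in segment(phonemes[n:], lexicon):
                results.append([word] + rest)
    return results

lexicon = {
    ("th", "ih", "s"): "this",
    ("ih", "z"): "is",
    ("s", "p", "iy", "ch"): "speech",
}
phonemes = "th ih s ih z s p iy ch".split()
candidates = segment(phonemes, lexicon)
```

When several segmentations are possible, the language model probability of each word sequence is what decides between them.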
