Speech Recognition
Simplest method -- Directly compare the input signal with the reference signals:
E = (Input speech) - (Reference speeches)
and find the optimum reference speech, i.e. the one that gives the smallest E.
Difficulties: Speech differs from one utterance to the next. Direct comparison is impossible for a large database.
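The direct-comparison idea can be sketched in a few lines. This is a minimal illustration, not a practical recognizer: it assumes the signals are already the same length and time-aligned, and it measures E as the sum of squared sample differences (the slide only states a difference, so the squared form is an assumption). The names `squared_error`, `best_reference` and the toy data are hypothetical.

```python
def squared_error(signal, reference):
    """Sum of squared sample differences between two equal-length signals."""
    if len(signal) != len(reference):
        raise ValueError("signals must have the same length")
    return sum((s - r) ** 2 for s, r in zip(signal, reference))


def best_reference(signal, references):
    """Return the label of the reference with the smallest error E."""
    return min(references, key=lambda label: squared_error(signal, references[label]))


# Hypothetical toy data: two stored references and one slightly noisy input.
references = {
    "yes": [0.0, 0.5, 1.0, 0.5, 0.0],
    "no": [1.0, 0.5, 0.0, 0.5, 1.0],
}
input_signal = [0.1, 0.4, 0.9, 0.6, 0.1]
print(best_reference(input_signal, references))
```

The length check makes the first difficulty visible: two utterances of the same word rarely have the same duration, so a sample-by-sample difference is not even defined without some form of alignment.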
[Figure: Input signal -> Pattern Matching -> Decision Making -> Output; Reference signal -> Reference Patterns -> Pattern Matching]
Basic Approach
[Figure: block diagram. Input Speech -> Feature Vectors -> Output Sentence. Knowledge sources: Speech Corpora -> Acoustic Models; Lexical Knowledge-base -> Lexicon; Text Corpora and Grammar -> Language Model.]
End point detection (speech/silence discrimination)
Noise reduction: clean signal = (input) - (environmental noise)
How to get the environmental noise? Simplest way: take the first 10 frames as the noise.
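A minimal sketch of the "first 10 frames" idea follows. It assumes each frame is already represented as a list of magnitude values (one per frequency bin), averages the first 10 frames as the noise estimate, subtracts it from every frame, and floors the result at zero. The energy-threshold speech/silence test and its `factor` parameter are illustrative assumptions, as are all function names.

```python
def estimate_noise(frames, n_noise_frames=10):
    """Average the first n_noise_frames frames, bin by bin."""
    head = frames[:n_noise_frames]
    n_bins = len(head[0])
    return [sum(f[k] for f in head) / len(head) for k in range(n_bins)]


def subtract_noise(frames, noise):
    """clean = input - noise for each bin, floored at zero."""
    return [[max(x - n, 0.0) for x, n in zip(f, noise)] for f in frames]


def detect_speech(frames, noise, factor=2.0):
    """Mark a frame as speech when its energy exceeds factor * noise energy."""
    noise_energy = sum(n * n for n in noise)
    return [sum(x * x for x in f) > factor * noise_energy for f in frames]


# Hypothetical magnitude spectra: 10 noise-only frames, then 3 speech frames.
frames = [[1.0, 1.0, 1.0]] * 10 + [[4.0, 3.0, 1.0]] * 3
noise = estimate_noise(frames)
clean = subtract_noise(frames, noise)
is_speech = detect_speech(frames, noise)
```

The approach relies on the recording starting with silence; if the speaker begins immediately, the first 10 frames contain speech and the noise estimate is wrong.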
[Figure: front-end processing. Pre-emphasis -> Windowing into frames -> DFT -> Mel-Filter Bank; the output of each of the M filters is summed (SUM) to give Yt(1), Yt(2), ..., Yt(M).]
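The front-end stages named in the figure (pre-emphasis, windowing into frames, DFT, mel filter bank, sum per filter) can be sketched with the standard library alone. Several details here are assumptions, since the slides do not give them: the pre-emphasis coefficient 0.97, the Hamming window, the frame size and step, the use of the power spectrum, the triangular filter shape, and the common mel formula 2595 log10(1 + f/700). The direct O(N^2) DFT is chosen for readability, not speed.

```python
import cmath
import math


def pre_emphasize(x, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]; alpha is an assumed value."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def frames_of(x, size, step):
    """Split x into overlapping frames and apply a Hamming window."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (size - 1)) for n in range(size)]
    return [
        [x[start + n] * window[n] for n in range(size)]
        for start in range(0, len(x) - size + 1, step)
    ]


def power_spectrum(frame):
    """Squared DFT magnitude for bins 0 .. N/2 (direct DFT)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_bank(n_filters, n_bins, sample_rate):
    """Triangular filters with centres equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    # Map edge frequencies to (fractional) DFT bin positions.
    edges = [f / (sample_rate / 2.0) * (n_bins - 1) for f in edges_hz]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        weights = []
        for k in range(n_bins):
            if lo < k <= mid:
                weights.append((k - lo) / (mid - lo))
            elif mid < k < hi:
                weights.append((hi - k) / (hi - mid))
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def filter_bank_outputs(signal, sample_rate, size=64, step=32, n_filters=8):
    """Return one list [Y_t(1), ..., Y_t(M)] per frame t."""
    emphasized = pre_emphasize(signal)
    bank = mel_filter_bank(n_filters, size // 2 + 1, sample_rate)
    outputs = []
    for frame in frames_of(emphasized, size, step):
        spectrum = power_spectrum(frame)
        # SUM: weighted sum of the spectrum under each filter.
        outputs.append([sum(w * p for w, p in zip(weights, spectrum)) for weights in bank])
    return outputs


# Hypothetical test signal: a 1 kHz sine sampled at 8 kHz.
rate = 8000
signal = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(256)]
Y = filter_bank_outputs(signal, rate)
```

Each entry of `Y` is one feature vector; stacking them over time gives the frame-by-frame sequence that the acoustic models consume.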
Unit selection for HMMs -- phrases, words, syllables, phonemes
Phoneme -- the minimum unit of speech sound in a language that can serve to distinguish one word from another
Acoustic Modeling
Feature vector sequence (one column per frame, one row per filter output):
Yt(1)  Yt+1(1)  Yt+2(1)  Yt+3(1)  Yt+4(1)
Yt(2)  Yt+1(2)  Yt+2(2)  Yt+3(2)  Yt+4(2)
...
Yt(M)  Yt+1(M)  Yt+2(M)  Yt+3(M)  Yt+4(M)
[Figure: hidden Markov model. States with transition probabilities a11, a22, a12, a13, a23, a24; observation sequence o1 ... o8 aligned with state sequence q1 ... q8; observation probabilities b1(o), b2(o), b3(o).]
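One way to relate the observations o1 ... o8 to a state sequence q1 ... q8 is the Viterbi algorithm, which finds the most probable state path under the transition probabilities a_ij and observation probabilities b_i(o). The sketch below is a generic textbook version for a discrete-observation HMM, not necessarily the procedure used in the slides; states are numbered from 0 in the code, and all parameter values are hypothetical.

```python
def viterbi(observations, initial, transition, emission):
    """Return (best_probability, best_state_sequence) for a discrete HMM.

    initial[i]       = P(q1 = i)
    transition[i][j] = a_ij = P(q_{t+1} = j | q_t = i)
    emission[i][o]   = b_i(o) = P(o_t = o | q_t = i)
    """
    n_states = len(initial)
    # delta[i]: probability of the best path that ends in state i at the current time.
    delta = [initial[i] * emission[i][observations[0]] for i in range(n_states)]
    backpointers = []
    for obs in observations[1:]:
        new_delta, pointers = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: delta[i] * transition[i][j])
            pointers.append(best_i)
            new_delta.append(delta[best_i] * transition[best_i][j] * emission[j][obs])
        delta = new_delta
        backpointers.append(pointers)
    # Trace back from the most probable final state.
    state = max(range(n_states), key=lambda i: delta[i])
    best_prob = delta[state]
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return best_prob, path


# Hypothetical 3-state left-to-right model with a skip transition (a13 in 1-based terms).
initial = [1.0, 0.0, 0.0]
transition = [
    [0.5, 0.4, 0.1],
    [0.0, 0.6, 0.4],
    [0.0, 0.0, 1.0],
]
emission = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
observations = [0, 0, 1, 1, 1, 2, 2, 2]  # eight observations, like o1 ... o8
best_prob, best_path = viterbi(observations, initial, transition, emission)
```

In a real acoustic model the observations are continuous feature vectors, so b_i(o) would be a probability density (for example a Gaussian mixture) rather than a lookup table.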
Language Modeling
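N-gram -- Find the probability of a word sequence W = (w1, w2, w3, ..., wi, ..., wR):

$$P(W) = P(w_1) \prod_{i=2}^{R} P(w_i \mid w_1, w_2, \ldots, w_{i-1})$$

An N-gram model approximates each factor by conditioning on only the previous N-1 words:
unigram: P(wi)
bigram: P(wi | wi-1)
trigram: P(wi | wi-2, wi-1)
4-gram: P(wi | wi-3, wi-2, wi-1)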
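As an illustration of the bigram case, the sketch below estimates P(wi | wi-1) by relative frequency from a tiny corpus and multiplies the factors to score a sentence. The corpus is hypothetical, and the sentence-boundary markers `<s>` and `</s>` are an added convention (P(w1) is treated as P(w1 | `<s>`)); no smoothing is applied, so any unseen bigram gives probability zero.

```python
from collections import Counter


def train_bigram(sentences):
    """Count histories and bigrams, with <s> and </s> marking sentence boundaries."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        histories.update(words[:-1])  # every word that has a successor
        bigrams.update(zip(words, words[1:]))
    return histories, bigrams


def bigram_probability(sentence, histories, bigrams):
    """P(W) approximated as the product of P(w_i | w_{i-1})."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        if bigrams[(prev, word)] == 0:
            return 0.0  # unseen bigram; smoothing would avoid this zero
        prob *= bigrams[(prev, word)] / histories[prev]
    return prob


# Hypothetical training corpus.
corpus = ["this is speech", "this is text", "that is speech"]
histories, bigrams = train_bigram(corpus)
p_seen = bigram_probability("this is speech", histories, bigrams)
p_unseen = bigram_probability("speech is this", histories, bigrams)
```

The zero returned for the unseen word order shows why smoothing matters: with limited text corpora, many valid word pairs never occur in training.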
Related techniques:
- Create a phrase lexicon
- Make a word graph
- Class-based language modeling
- Keyword spotting
- Smoothing
Summary
[Figure: decoding summary. Over time t, the Acoustic Models produce a Syllable Lattice; the Lexicon turns it into a Word Graph (w1, w2, ...); the Language Models score word sequences as P(w1) P(w2 | w1) ...]
Small Example
Example input sequence: this is speech
Acoustic models: (th-ih-s-ih-z-s-p-ih-ch)
Lexicon: (th-ih-s) -> this, (ih-z) -> is, (s-p-iy-ch) -> speech
Language models: (this) (is) (speech), scored as P(this) P(is | this) P(speech | this is)
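The three stages of this example can be strung together in a short sketch. It assumes the phoneme string coming out of the acoustic models matches the lexicon pronunciations exactly, so the input below uses the lexicon form s-p-iy-ch; a real decoder works with scored lattices and tolerates mismatches. The probability values in `language_model` are hypothetical, chosen only to make the arithmetic visible.

```python
def segment(phonemes, lexicon):
    """Return every way to split the phoneme list into lexicon words."""
    if not phonemes:
        return [[]]
    results = []
    for pronunciation, word in lexicon.items():
        n = len(pronunciation)
        if tuple(phonemes[:n]) == pronunciation:
            for rest in segment(phonemes[n:], lexicon):
                results.append([word] + rest)
    return results


def sentence_probability(words, model):
    """Multiply P(w_i | w_1 ... w_{i-1}); each key is the history plus the word."""
    prob = 1.0
    for i in range(len(words)):
        prob *= model.get(tuple(words[: i + 1]), 0.0)
    return prob


# Lexicon from the example: pronunciation -> word.
lexicon = {
    ("th", "ih", "s"): "this",
    ("ih", "z"): "is",
    ("s", "p", "iy", "ch"): "speech",
}

# Hypothetical probabilities for P(this), P(is | this), P(speech | this is).
language_model = {
    ("this",): 0.1,
    ("this", "is"): 0.5,
    ("this", "is", "speech"): 0.2,
}

phonemes = "th ih s ih z s p iy ch".split()
candidates = segment(phonemes, lexicon)
best = max(candidates, key=lambda words: sentence_probability(words, language_model))
print(" ".join(best))
```

With a larger lexicon `segment` can return several candidate word sequences, and the language model score is what selects among them.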