Benefits of HTK
Supports a variety of input formats
Supports different features
Supports almost all common speech recognition technologies
HTK supports a variety of input formats, e.g. PCM, WAV, ALIEN (unknown format), etc.
Feature extraction:
Very flexible HMM definition
Training
  Viterbi (segmentation)
  Forward/Backward (Baum-Welch)
  Single-model re-estimation (change feature)
HMM system refinement
  Context-dependent model
  Parameter tying/clustering
  Regression class tree (MLLR)
Language
  Word grammar and network
  Bigram language model
Decoding
HMM adaptation
Mean/variance
HTK procedures
Data/Setting preparation
  Define acoustic units (phone table)
  Define dictionary (word)
  Define grammar/network
  Collect speech database
  Generate transcription
Feature extraction
  Set configuration file for MFCC feature extraction
  Prepare script files (corpus file)
Define HMM structure (prototype)
Training HMM models
  Prepare script files (corpus file)
  Set configuration file for training, recognition, etc.
  Flat start (uniform segmentation)
  Viterbi search (forced alignment: segmentation)
Recognition/Performance evaluation
  Viterbi search
HMM/Data Setting
...
Database Preparation
Transcribe the collected speech database
Corpus files (training/test set)
Script files
Word/Phone-Level Transcriptions
Example:
IS sil sil DE sp
Feature Extraction
HCopy: data copy (with format conversion)
Script files
codetr.scp
source destination
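The "source destination" layout of codetr.scp can be generated rather than typed by hand. The following is a minimal sketch, assuming one waveform file per utterance; the directory names, the file names and the .mfc extension are hypothetical and not taken from the slides.

```python
from pathlib import PurePosixPath


def make_hcopy_script(wav_paths, feature_dir, feature_ext=".mfc"):
    """Return script-file text with one 'source destination' pair per line."""
    lines = []
    for wav in wav_paths:
        src = PurePosixPath(wav)
        # Destination keeps the utterance name and swaps directory and extension.
        dst = PurePosixPath(feature_dir) / (src.stem + feature_ext)
        lines.append(f"{src} {dst}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical utterance names, for illustration only.
    print(make_hcopy_script(["wav/utt001.wav", "wav/utt002.wav"], "mfc"), end="")
```

The resulting text would be saved as codetr.scp and passed to HCopy with -S.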
HMM Configuration
Config file (command level)
  command -C config_file
User defaults
  > export HCONFIG=my_HTK_config
Built-in defaults
  See Chapter 18 of the HTK manual
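The three configuration levels can be pictured as layered dictionaries. This is only a sketch of the precedence, assuming that a command-level config file overrides the user defaults named by HCONFIG, which in turn override the built-in defaults; the function name and the example values are hypothetical.

```python
def resolve_config(builtin, user_defaults=None, command_level=None):
    """Merge the three levels; later levels override earlier ones."""
    merged = dict(builtin)                # lowest priority: built-in defaults
    merged.update(user_defaults or {})    # file named by HCONFIG
    merged.update(command_level or {})    # file given with -C (highest priority)
    return merged


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    builtin = {"TARGETKIND": "ANON", "NUMCEPS": 12}
    user = {"TARGETKIND": "MFCC"}
    command = {"TARGETKIND": "MFCC_0"}
    print(resolve_config(builtin, user, command))
```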
Training Procedure
Model Initialization
Flat start (unknown segmentation -> uniform segmentation)
Viterbi search (given segmentation)
Forward/Backward (only at word level)
Mixture splitting
Model Refinement
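The uniform segmentation used when no segmentation is known can be sketched as follows. This is an illustrative sketch, not HTK's own code: it simply divides the frames of one utterance into near-equal contiguous spans, one per emitting state, and the function name is hypothetical.

```python
def uniform_segmentation(num_frames, num_states):
    """Return (start, end) frame ranges, end exclusive, one per state."""
    if num_states <= 0:
        raise ValueError("num_states must be positive")
    if num_frames < num_states:
        raise ValueError("need at least one frame per state")
    # Integer arithmetic keeps the boundaries monotonic and covers every frame once.
    bounds = [k * num_frames // num_states for k in range(num_states + 1)]
    return [(bounds[k], bounds[k + 1]) for k in range(num_states)]


if __name__ == "__main__":
    print(uniform_segmentation(10, 3))
```

Each state's initial mean and variance would then be estimated from the frames in its span, after which Viterbi search can refine the boundaries.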
Training Corpus
Mat4500_train.scp
Mat4500_train_phones.mlf
Flat start
Viterbi search
Forward/Backward
Mixture Splitting
MU2.hed
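The contents of MU2.hed are not shown here. As an illustration of what a mixture split does, the sketch below assumes the rule described in the HTK Book for the HHEd MU command: the component with the largest weight is copied, its weight is halved, and the two means are moved by plus and minus 0.2 standard deviations. It works on 1-D Gaussians for brevity, and the function name is hypothetical.

```python
import math


def split_heaviest(mixture, perturb=0.2):
    """Split the heaviest component of a list of (weight, mean, variance) tuples."""
    idx = max(range(len(mixture)), key=lambda i: mixture[i][0])
    weight, mean, var = mixture[idx]
    shift = perturb * math.sqrt(var)  # assumed perturbation: 0.2 standard deviations
    rest = mixture[:idx] + mixture[idx + 1:]
    return rest + [(weight / 2, mean - shift, var), (weight / 2, mean + shift, var)]


if __name__ == "__main__":
    # One Gaussian becomes two; the weights still sum to 1.
    print(split_heaviest([(1.0, 0.0, 4.0)]))
```

After each split the models are normally re-estimated before the number of mixtures is increased again.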
Recognition/Evaluation Procedure
Recognition
Evaluation
Test Corpus
Mat4500_test.scp
Mat4500_test.mlf
Forced Alignment
Viterbi decoding
Run HVite with option -a
This yields statistics on the HMM segmentation
Useful for determining the number of mixtures
MLLR
In the training phase, generate the state occupation statistics
  % HERest -s stats
HHEd edit script
  RN models   // rename hmmid
  LS stats   // load state occupation statistics
  RC 32 rtree   // regression classes = 32
  or
  RC 32 rtree {sil.state[2-4].mix}
Forced alignment of adaptation data
  % HVite -a -I adapWords.mlf -m ...
Find global MLLR transform
  % HEAdapt -C ... -g -K global.tmf -I adapPhone.mlf ...
  *.tmf : transform model file
Find MLLR regression tree transforms
  % HEAdapt -C ... -J global.tmf -K rc.tmf -I adapPhone.mlf ...
Recognition
  % HVite -J rc.tmf ...
MAP adaptation
Further topics
Model/state tying (HMM definition)
Context-dependent model
Fast training/search (beam search)
Insertion/deletion problem
  Duration constraint
  Word transition penalty
Word lattice output
HCompV
Typical arguments
  HCompV -C xxx -f 0.01 -m -S *.scp -M output_dir hmm
  -m : update means
  -f f : set varFloor to f * global variance
In the hmm macro file
  ~o
  ~v varFloor1
  <Variance> 38
  ..
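The variance floor set by -f is a fixed fraction of the global variance. A minimal sketch of that arithmetic, assuming the features are given as a list of equal-length vectors (the function name and the toy data are hypothetical):

```python
def variance_floor(frames, f=0.01):
    """Return f times the global per-dimension variance of the frames."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(fr[d] for fr in frames) / n for d in range(dim)]
    variances = [sum((fr[d] - means[d]) ** 2 for fr in frames) / n for d in range(dim)]
    return [f * v for v in variances]


if __name__ == "__main__":
    # Two toy 2-dimensional frames; real MFCC vectors would be much longer.
    print(variance_floor([[0.0, 1.0], [2.0, 1.0]], f=0.01))
```

During re-estimation, any variance that falls below the floor for its dimension is reset to the floor value.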
HERest
Typical arguments
  HERest -C xxx -I *.mlf -t 250.0 150.0 1000.0 -S *.scp -H hmm_macros -H hmm_defs -M output_dir hmmlist
  -t f [i l] : set the pruning threshold to f; if pruning fails, f = f+i, repeated until f = l
  -T : tracing option (octal number, command dependent)
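The three numbers given to -t define a retry schedule for the pruning threshold. The sketch below only enumerates that schedule; it is not part of HTK, and the function name is hypothetical.

```python
def pruning_thresholds(f, i=None, l=None):
    """List the thresholds tried: f, f+i, f+2i, ... not exceeding l."""
    if i is None or l is None:
        return [f]  # a single fixed threshold, no retries
    if i <= 0:
        raise ValueError("increment must be positive")
    thresholds = []
    t = f
    while t <= l:
        thresholds.append(t)
        t += i
    return thresholds


if __name__ == "__main__":
    # Values from the typical HERest arguments: -t 250.0 150.0 1000.0
    print(pruning_thresholds(250.0, 150.0, 1000.0))
```

A tight initial threshold keeps training fast, while the increments give difficult utterances a second chance before they are rejected.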
HVite
Typical arguments
  HVite -H hmm_macros -H hmm_defs -S *.scp -i output_mlf -w wdnet -p 0.0 -s 5.0 -t 250 dict tiedlist
  -t f [i l] : set the pruning threshold to f; if pruning fails, f = f+i, repeated until f = l
  -m : show model boundaries
  -a : forced alignment, used with -I input.mlf
  -p, -s : word insertion penalty, grammar scale factor
HResults
Typical arguments
  HResults -I *.mlf hmmlist answer.mlf
  -n : use NIST-style alignment and scoring
  -e s t : label t is made equivalent to s
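HResults summarizes recognition performance as percent correct and accuracy, computed from the number of reference labels N and the counts of deletions D, substitutions S and insertions I. A small sketch of that arithmetic (the function name and the counts in the example are hypothetical):

```python
def recognition_scores(n, deletions, substitutions, insertions):
    """Return (percent_correct, percent_accuracy) from alignment counts."""
    if n <= 0:
        raise ValueError("n must be positive")
    hits = n - deletions - substitutions
    percent_correct = 100.0 * hits / n
    # Accuracy also penalizes insertions, so it can be negative.
    percent_accuracy = 100.0 * (hits - insertions) / n
    return percent_correct, percent_accuracy


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(recognition_scores(n=100, deletions=5, substitutions=10, insertions=3))
```

Accuracy is the more informative figure when tuning the word insertion penalty, because percent correct ignores insertions.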
HInit
Typical arguments
  HInit -S *.scp -M hmm_macro -H hmm_defs model
HRest
Typical arguments
  HRest -S *.scp -M hmm_macro -H hmm_defs model
HSLab
Use WaveSurfer.