Journal of Scientific & Industrial Research J SCI IND RES VOL 70 APRIL 2011
Vol. 70, April 2011, pp. 270-272
This study presents the design and development of an automatic isolated digit recognition system (AIDRS) using Hidden
Markov Models (HMM). The basic model of the system is able to recognize spoken digit utterances in English. Implementation
was done using the Hidden Markov Model Toolkit (HTK). The system is found successful and can identify spoken digits at an 89.2%
recognition rate, which is an acceptable level of accuracy for speech recognition.
Keywords: Automatic isolated digit recognition system (AIDRS), Hidden Markov Model (HMM), Speech recognition
[Figure: block diagram; recoverable labels: Speech Data Preparation, Building Word Network, Creating Dictionary, Transcription, Constructing Monophone HMM, HMM Models, Recognizing Test Data, Recognition Test Module]
Fig. 3: HTK tools used for isolated digit recognition at different phases
[Diagram; recoverable labels: Training Phase (training wave data; initialize, re-estimate, modify; HCopy, HERest, HHEd), Testing (testing wave data; feature extraction, configuration, acoustic model, search, compare with answers, accuracy; HCopy, HVite, HResult), Verification Phase (construct rules and wordnet, dictionary; HParse)]

steps to pass through in order: recording data, building a word network, constructing a dictionary and recognizing test data.

For implementing HMM for the present isolated digit recognition system, the Hidden Markov Model Toolkit (HTK)10 is used. HTK is a toolkit (Fig. 3) for building HMMs. All functionality of HTK is built into library modules and tools.

Results and Discussion
In the experiment, the database consists of 10 digits. System parameters were: sampling rate, 16 kHz with a 16-bit sample resolution; Hamming window duration, 32 ms (step size, 10 ms); and MFCC features with a cepstral liftering length of 22, 26 filter bank channels, 12 MFCC coefficients and a pre-emphasis coefficient10 of 0.97.

HMMs were trained on 1400 speech samples and tested on 500 samples from 20 speakers. In HMM training, the Baum-Welch and Viterbi algorithms were used. Each word was modeled with 7 states using a left-right (LR) HMM topology.

HTK was used for designing and testing SR systems throughout all experiments. The baseline system was initially designed as a phoneme-level recognizer with three active states and continuous, left-to-right, no-skip HMM models with one Gaussian mixture per state. Training was done for a fixed number of iterations (up to 3 iterations).

Recognition rate of the trained HMM is defined as11

\[
\text{Recognition rate} = \frac{\text{Correct recognition}}{\text{Total number of testing samples}} \times 100\%
\]

The numbers of correct and incorrect recognitions were determined (Table 1). Accuracy for each digit was found as follows: 0, 89.36%; 1, 80%; 2, 90.91%; 3, 89.58%; 4, 89.28%; 5, 94.64%; 6, 91.67%; 7, 87.23%; 8, 89.65%; and 9, 87.5%. Table 2 shows, for all digits, the digits that were picked in cases of misrecognition. Among all digits, digit 1 produced the lowest recognition rate, because it is most often confused with digit 7 and digit 5: digit 1 was wrongly recognized as 7 in 15.56% of cases and as 5 in 4.44% of cases. This probably occurred because the pronunciation of digit 1 is similar to that of digit 7. The sound of “one” in digit 1 and “ven” in digit 7 leads to similar feature vector values for the two digits, so the system tends to recognize digit 1 as 7.
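The front-end settings listed at the start of this section can be written out as HTK configuration variables. The following is a minimal sketch in Python, not the configuration actually used in the study. The variable names (TARGETRATE, WINDOWSIZE, USEHAMMING, PREEMCOEF, NUMCHANS, CEPLIFTER, NUMCEPS) are those documented in the HTK Book10; the helper names are hypothetical, and TARGETKIND = MFCC is an assumption, because the text does not say whether energy or delta qualifiers were appended. HTK expresses durations in units of 100 ns, so 10 ms becomes 100000 and 32 ms becomes 320000.

```python
SAMPLE_RATE_HZ = 16_000   # sampling rate stated in the text
WINDOW_MS = 32            # Hamming window duration
STEP_MS = 10              # step size between successive frames


def ms_to_htk_units(ms):
    """Convert milliseconds to HTK time units (1 unit = 100 ns)."""
    return ms * 10_000.0


def htk_front_end_config():
    """Return the stated front-end settings as HTK configuration variables."""
    return {
        "TARGETKIND": "MFCC",  # assumption: qualifiers are not given in the text
        "TARGETRATE": ms_to_htk_units(STEP_MS),
        "WINDOWSIZE": ms_to_htk_units(WINDOW_MS),
        "USEHAMMING": "T",
        "PREEMCOEF": 0.97,
        "NUMCHANS": 26,
        "CEPLIFTER": 22,
        "NUMCEPS": 12,
    }


def render_config(config):
    """Format the settings as 'NAME = value' lines, as in an HTK config file."""
    return "\n".join(f"{name} = {value}" for name, value in config.items())


def samples_per_window():
    """Number of samples covered by one analysis window."""
    return SAMPLE_RATE_HZ * WINDOW_MS // 1000


def samples_per_step():
    """Number of samples between the starts of successive windows."""
    return SAMPLE_RATE_HZ * STEP_MS // 1000


def full_frames(num_samples):
    """Count the analysis windows that fit entirely inside a recording."""
    window, step = samples_per_window(), samples_per_step()
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // step


if __name__ == "__main__":
    print(render_config(htk_front_end_config()))
    print(samples_per_window(), samples_per_step())  # 512 160
```

At 16 kHz a 32 ms window spans 512 samples and a 10 ms step spans 160 samples, so consecutive windows overlap by 352 samples.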
Table 1 (%): uttered digit (rows) versus recognized digit (columns)

Uttered    Recognized digit
digit      1      2      3      4      5      6      7      8      9      0
1          80     0      0      0      4.44   0      15.56  0      0      0
2          0      90.91  0      5.45   0      0      0      0      3.64   0
3          0      4.16   89.58  0      0      4.16   0      0      2.1    0
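The confusion rows shown above can be read programmatically to recover both the per-digit accuracy (the diagonal entry) and the digits each utterance is confused with (the non-zero off-diagonal entries). The following Python sketch is an illustration only: the function names are hypothetical, only the three rows available here are included, and the counts passed to `recognition_rate` (36 correct out of 45) are invented to show the arithmetic of the recognition-rate definition; the study does not report per-digit sample counts.

```python
# Column order follows the table header: recognized digit 1..9, then 0.
COLUMNS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]

# Percentages per uttered digit, copied from the rows shown above.
CONFUSION_PERCENT = {
    1: [80, 0, 0, 0, 4.44, 0, 15.56, 0, 0, 0],
    2: [0, 90.91, 0, 5.45, 0, 0, 0, 0, 3.64, 0],
    3: [0, 4.16, 89.58, 0, 0, 4.16, 0, 0, 2.1, 0],
}


def recognition_rate(correct, total):
    """Recognition rate in percent: correct recognitions over total test samples."""
    return 100.0 * correct / total


def accuracy(uttered):
    """Percentage of utterances of a digit that were recognized as that digit."""
    return CONFUSION_PERCENT[uttered][COLUMNS.index(uttered)]


def confused_with(uttered):
    """Digits other than the uttered one that received a non-zero share."""
    row = CONFUSION_PERCENT[uttered]
    return sorted(d for d, pct in zip(COLUMNS, row) if d != uttered and pct > 0)


if __name__ == "__main__":
    for digit in CONFUSION_PERCENT:
        print(digit, accuracy(digit), confused_with(digit))
    # Hypothetical counts, chosen only to illustrate the formula.
    print(recognition_rate(36, 45))  # 80.0
```

For digits 1, 2 and 3 the off-diagonal entries are (5, 7), (4, 9) and (2, 6, 9), which agrees with the entries listed for these digits in Table 2.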
Table 2: Digits that were picked in case of misrecognition, for all digits

Digit    Mostly confused with
0        3, 6, 4
1        5, 7
2        4, 9
3        2, 6, 9
4        0, 5
5        6
6        3, 7
7        0, 4
8        4, 6
9        3, 5, 6

The system is relatively successful, as it can identify spoken digits at an average rate of 89.2%, which is relatively high given that the present training corpus is rather small. Presently, the system can be used only for isolated-word, single-digit recognition. It can be enhanced to perform user authentication based on speech and to accommodate connected digits. The system can also be extended to a larger vocabulary including alphabets and commonly used words, and can be made more robust by using a larger database for training.

Conclusions
A newly designed and developed SR system, AIDRS, is able to recognize spoken digit utterances in English. The system is found successful and can identify spoken digits at an 89.2% recognition rate, which is an acceptable level of accuracy for speech recognition.

References
1 Sproat R, Multilingual Text-to-Speech Synthesis: The Bell Labs Approach, Chap 5 (Kluwer Academic Publishers, Massachusetts) 1998.
2 Markus F, Why is Speech Recognition Difficult? (Department of Computing Science, Chalmers University of Technology, Sweden) 24 Feb 2003.
3 Jurafsky D & Martin J H, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edn, 2007: http://www.cs.colorado.edu/~martin/slp2.html.
4 Douglas O’S, Interacting with computers by voice: Automatic speech recognition and synthesis, Proc IEEE, 91 (2003) 1272-1305.
5 Rabiner L R, A tutorial on Hidden Markov Models and selected applications in speech recognition, Proc IEEE, 77 (1989) 257-286.
6 Dimov D & Azamanov I, Experimental specifics of using HMM in isolated word speech recognition, in Int Conf on Computer Syst & Technol, CompSysTech (Varna, Bulgaria) 2005, 3A.17.1-17.9.
7 Jelinek F, Statistical Methods for Speech Recognition (MIT Press, Cambridge, Massachusetts, USA) 1997.
8 Juang B & Rabiner L, Hidden Markov Models for speech recognition, Technometrics, 33 (1991) 251-272.
9 Young S, A review of large-vocabulary continuous-speech recognition, IEEE Signal Process Mag, 13 (1996) 45-57.
10 Young S, Evermann G, Gales M, Hain T, Kershaw D et al, The HTK Book (for HTK Version 3.4); Cambridge Univ Engg Dept: http://htk.eng.cam.ac.uk/protdoc/ktkbook.pdf, 2006.
11 Rosdi F & Ainon N, Isolated Malay speech recognition using Hidden Markov Models, in Proc for Int Conf on Comput & Commun Engg (Kuala Lumpur) 2008, 721-725.
12 Alotaibi Y A, Alghamdi M & Alotaiby F, Speech recognition system of Arabic digits based on a telephony Arabic corpus, in Proc 4th Int Conf, ICISP 2010 (Trois-Rivières, QC, Canada) 30 Jun - 2 Jul 2010.
13 Muhammad G, Alotaibi Y A & Huda M N, Automatic speech recognition for Bangla digits, in Proc 12th Int Conf on Comput & Inform Technol (ICCIT 2009) (Dhaka, Bangladesh) 21-23 Dec 2009.
14 Kuria C & Balakrishnan K, Speech recognition of Malayalam numbers, in Int Symp on Innovations in Nat Comput (INC-2009) (Cochin) 2009.
15 Huang X, Alex A & Hon H W, Spoken Language Processing: A Guide to Theory, Algorithm and System Development (Prentice Hall, Upper Saddle River, New Jersey) 2001.
16 Jurafsky D & Martin J H, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edn, 2007: http://www.cs.colorado.edu/~martin/slp2.html.