
Comparison of Particle Swarm Optimization and Genetic Algorithm for HMM Training

Fengqin Yang
College of Computer Science & Technology, Jilin University, Changchun, Jilin, 130012, China
College of Computer Science, Northeast Normal University, Changchun, Jilin, 130117, China

Changhai Zhang
College of Computer Science & Technology, Jilin University, Changchun, Jilin, 130012, China

Tieli Sun
College of Computer Science, Northeast Normal University, Changchun, Jilin, 130117, China

978-1-4244-2175-6/08/$25.00 ©2008 IEEE

Abstract

The Hidden Markov Model (HMM) is the dominant technology in speech recognition, and the problem of optimizing its model parameters is of great interest to researchers in this area. The Baum-Welch (BW) algorithm is a popular estimation method because of its reliability and efficiency; however, it is easily trapped in a local optimum. Recently, the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) have attracted considerable attention among modern heuristic optimization techniques. Since both approaches search for a solution to a given objective function but employ different strategies and computational effort, it is appropriate to compare their performance. This paper presents the application and performance comparison of PSO and GA for continuous HMM optimization in continuous speech recognition. The experimental results demonstrate that PSO is superior to GA with respect to recognition performance.

1. Introduction

As a statistical method, the Hidden Markov Model (HMM) is widely used for speech recognition. Training an HMM is computationally hard, and there is no known exact method that can guarantee optimal training within reasonable computing time. The Baum-Welch (BW) algorithm is a powerful training method. It is fast, but because its search is local, it frequently converges to a local optimum.

Another possibility is to use stochastic search methods. Particle Swarm Optimization (PSO) is a population-based search algorithm in which each individual is referred to as a particle and represents a candidate solution [1]. Each particle flies through the search space with an adaptable velocity that is dynamically modified according to its own flying experience and to the flying experience of the other particles. PSO has been applied to HMM training in isolated word recognition [2]. The Genetic Algorithm (GA) maintains a population of solutions coded as chromosomes. Using evolution-inspired operators such as fitness-based selection, crossover and mutation, the best solutions are modified and passed on to the next generation. In this way, the population as a whole moves towards better solutions, ideally towards the global optimum. GA has been applied to HMM training in isolated word recognition [3][4].

The goal of this article is to compare PSO and GA for continuous HMM optimization in continuous speech recognition. The HMMs optimized by the two techniques are compared in terms of recognition capability.

2. Hidden Markov Model and HMM

The formal specification of a continuous HMM is completely characterized by the following model parameters [3][5]:

1. $N$, the number of states in the model.
2. $M$, the number of mixture components in the random function.
3. $A$, the state transition probability distribution, $A = \{a_{ij}\}$, where $a_{ij}$ is the transition probability from state $i$ to state $j$. That is,

$$a_{ij} = P[q_{t+1} = j \mid q_t = i], \quad 1 \le i, j \le N \qquad (1)$$

where $q_t$ is the state at time $t$, and $a_{ij}$ must satisfy the following constraints:

$$a_{ij} \ge 0, \quad 1 \le i, j \le N \qquad (2a)$$

$$\sum_{j=1}^{N} a_{ij} = 1, \quad 1 \le i \le N \qquad (2b)$$

4. $B$, the output probability distribution, $B = \{b_j(O)\}$, where $b_j(O)$ is the random function associated with state $j$. The general representation of $b_j(O)$ is a finite mixture of Gaussian distributions of the form

$$b_j(O) = \sum_{m=1}^{M} c_{jm}\, G[O, \mu_{jm}, V_{jm}], \quad 1 \le j \le N \qquad (3)$$

where $O$ is the observation vector, $c_{jm}$ is the mixture coefficient for the $m$th mixture in state $j$, and $G$ is the Gaussian distribution with mean vector $\mu_{jm}$ and covariance matrix $V_{jm}$ for the $m$th mixture component in state $j$. The mixture coefficients $c_{jm}$ satisfy the stochastic constraints

$$c_{jm} \ge 0, \quad 1 \le j \le N, \; 1 \le m \le M \qquad (4a)$$

$$\sum_{m=1}^{M} c_{jm} = 1, \quad 1 \le j \le N \qquad (4b)$$

5. $\pi$, the initial state distribution, $\pi = \{\pi_i\}$, where

$$\pi_i = P[q_1 = i], \quad 1 \le i \le N \qquad (5)$$

We use the compact notation $\lambda = (A, B, \pi)$ to indicate the complete parameter set of the model.

Given an observation sequence $O = O_1, O_2, \ldots, O_T$, the parameters of the HMM can be estimated to maximize the model likelihood $P[O \mid \lambda]$. This is achieved using the BW algorithm. However, the BW algorithm is liable to become trapped at a local optimum. For more details on the BW algorithm, see e.g. [5].

3. Experimental procedure

In continuous speech recognition, each HMM generally corresponds to a sub-word unit such as a phone. However, the training data for continuous speech must consist of continuous utterances, and in general the boundaries dividing the segments of speech that correspond to each underlying sub-word model in the sequence are not known. We therefore use embedded training [6], which trains several models from a single source of data by updating all models simultaneously and does not need labeled boundaries when appropriate initial models are available.

3.1. Encoding mechanism

In PSO and GA for HMM embedded training, an HMM is encoded into a string of real numbers. The real number string has the form shown in Fig. 1, where the constant $V_{dim}$ is the size of the observation vector, $p$ is the number of phones, and the other variables correspond to the elements of the HMM described in Section 2: $mv_{i,j,k}$ is the $k$th scalar in the mean vector of the $j$th mixture of the random function in state $i$, and $cv_{i,j,k,l}$ is the element in the $k$th row and the $l$th column of the covariance matrix of the $j$th mixture of the random function in state $i$. The superscript $(n)$ in Fig. 1 indexes the phone model, $n = 1, \ldots, p$.

[Diagram: for each phone model $n = 1, \ldots, p$, the string contains $a^{(n)}_{11} \ldots a^{(n)}_{NN}$, then $c^{(n)}_{11} \ldots c^{(n)}_{NM}$, then $mv^{(n)}_{111} \ldots mv^{(n)}_{NMV_{dim}}$, then $cv^{(n)}_{1111} \ldots cv^{(n)}_{NMV_{dim}V_{dim}}$.]
Fig. 1. The representation of an HMM.

3.2. Fitness function

The fitness function is defined as

$$f(\lambda_i) = \sum_{r=1}^{R} \log P(O_r \mid \lambda_i) \qquad (6)$$

where $\lambda_i$ is the $i$th HMM, $R$ is the number of observation sequences and $O_r$ is the $r$th observation sequence; $P(O_r \mid \lambda_i)$ may be computed by the forward or backward probabilities [5].

3.3. The velocity and position update rule

At each iteration, the particle movement is computed as follows:

$$x_i(t+1) \leftarrow x_i(t) + v_i(t) \qquad (7)$$

$$v_i(t+1) \leftarrow \omega\, v_i(t) + c_1\, \mathrm{rand}_1\, (pbest_i(t) - x_i(t)) + c_2\, \mathrm{rand}_2\, (gbest(t) - x_i(t)) \qquad (8)$$

In Eqs. (7) and (8), $x_i(t)$ is the position of particle $i$ at time $t$, $v_i(t)$ is the velocity of particle $i$ at time $t$,
$pbest_i(t)$ is the best position found by particle $i$ itself so far, $gbest(t)$ is the best position found by the whole swarm so far, $\omega$ is an inertia weight scaling the previous time step velocity, $c_1$ and $c_2$ are two acceleration coefficients that scale the influence of the best personal position of the particle ($pbest_i(t)$) and the best global position ($gbest(t)$), and $\mathrm{rand}_1$ and $\mathrm{rand}_2$ are random variables between 0 and 1.

3.4. The hybrid training algorithms

To improve the convergence speed of PSO-based and GA-based HMM training, we combine the PSO algorithm with the BW algorithm to form a hybrid algorithm (PSOBW), and combine the GA with the BW algorithm to form another hybrid algorithm (GABW). Every ten generations, the hybrid algorithms apply eight iterations of the BW algorithm to the individuals in the population so that the fitness value of each individual is improved.

4. Experimental results

The training set consists of 180 sentences of continuous digit strings extracted from the training set of the Census (AN4) database. The test set consists of 33 sentences of continuous digit strings extracted from the test set of the Census (AN4) database. The features are 39-dimensional Mel-frequency cepstral coefficients (MFCC): 12 cepstral coefficients plus the 0th cepstral parameter, together with their first and second derivatives (13 × 3 = 39). Each HMM is a 3-state left-to-right model with one mixture component in the random function. Table 1 shows the parameters used for the PSO and GA optimization techniques.

Table 1. Parameters used for PSO and GA.

PSO parameter    Value      GA parameter       Value
Swarm size       18         Population size    18
$\omega$         0.7298     Crossover rate     0.95
$c_1$            1.49618    Mutation rate      0.2
$c_2$            1.49618

We evaluate the algorithms in terms of the HTK evaluation criteria [6]. The BW algorithm terminates when the growth rate of the fitness between two successive iterations is less than or equal to 0.015%; in our experiment, this happens after 10 iterations. The PSOBW and GABW algorithms also terminate after 10 iterations. Experimental results are shown in Table 2 and Table 3. To illustrate how the evaluation criteria change, Fig. 4 plots the evaluation criteria on the training set versus iteration count, and Fig. 5 plots the evaluation criteria on the test set versus iteration count. From Fig. 4 and Fig. 5, it is apparent that the PSOBW algorithm outperforms both the BW algorithm and the GABW algorithm in Sentence Correct rate, Word Accuracy rate and Word Correct rate at each iteration.

[Fig. 4 panels plot GABW, BW and PSOBW against the iteration count (1 to 10).]
Fig. 4(a). The Sentence Correct rate on the training set.
Fig. 4(b). The Word Correct rate on the training set.

Table 2. Recognition performance of the algorithms on the training set (%).

Algorithm    Sentence Correct    Word Correct    Word Accuracy
BW           40.56               90.70           75.24
PSOBW        62.22               92.97           87.03
GABW         36.11               87.78           72.22

Table 3. Recognition performance of the algorithms on the test set (%).

Algorithm    Sentence Correct    Word Correct    Word Accuracy
BW           27.27               88.37           67.44
PSOBW        60.61               91.86           84.30
GABW         36.36               85.47           68.02
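The Sentence Correct, Word Correct and Word Accuracy columns above follow the HTK evaluation criteria [6]. The sketch below is a minimal illustration, not the HTK implementation, of how such rates can be computed from reference and recognized word strings. With $N$ reference words, $S$ substitutions, $D$ deletions and $I$ insertions, the hit count is $H = N - D - S$, Word Correct is $H/N$ and Word Accuracy is $(H - I)/N$; Sentence Correct is the share of sentences recognized without any error. The function names `align_counts` and `htk_style_scores` and the unit-cost alignment are our own assumptions; HTK's HResults tool performs its own alignment, so its counts could differ in tie cases.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Counts:
    """Word-level error counts (N, S, D, I) as used in HTK-style scoring."""
    n: int = 0  # number of reference words
    s: int = 0  # substitutions
    d: int = 0  # deletions
    i: int = 0  # insertions


def align_counts(ref: Sequence[str], hyp: Sequence[str]) -> Counts:
    """Count S, D, I using a unit-cost Levenshtein alignment (an assumption;
    HResults uses its own alignment)."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # cost[r][h] = minimum number of edits aligning ref[:r] with hyp[:h]
    cost = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        cost[r][0] = r
    for h in range(cols):
        cost[0][h] = h
    for r in range(1, rows):
        for h in range(1, cols):
            diag = cost[r - 1][h - 1] + int(ref[r - 1] != hyp[h - 1])
            cost[r][h] = min(diag, cost[r - 1][h] + 1, cost[r][h - 1] + 1)
    # Walk back from the bottom-right cell, classifying each edit.
    counts = Counts(n=len(ref))
    r, h = len(ref), len(hyp)
    while r > 0 or h > 0:
        if r > 0 and h > 0:
            mismatch = int(ref[r - 1] != hyp[h - 1])
            if cost[r][h] == cost[r - 1][h - 1] + mismatch:
                counts.s += mismatch  # 0 for a match, 1 for a substitution
                r, h = r - 1, h - 1
                continue
        if r > 0 and cost[r][h] == cost[r - 1][h] + 1:
            counts.d += 1  # reference word missing from the hypothesis
            r -= 1
        else:
            counts.i += 1  # extra word in the hypothesis
            h -= 1
    return counts


def htk_style_scores(
    pairs: List[Tuple[Sequence[str], Sequence[str]]]
) -> Tuple[float, float, float]:
    """pairs holds one (reference words, recognized words) tuple per sentence.
    Returns (sentence_correct, word_correct, word_accuracy) in percent."""
    total = Counts()
    exact = 0
    for ref, hyp in pairs:
        c = align_counts(ref, hyp)
        total.n += c.n
        total.s += c.s
        total.d += c.d
        total.i += c.i
        exact += int(list(ref) == list(hyp))  # sentence with no errors
    hits = total.n - total.d - total.s  # H = N - D - S
    sentence_correct = 100.0 * exact / len(pairs)
    word_correct = 100.0 * hits / total.n  # H / N
    word_accuracy = 100.0 * (hits - total.i) / total.n  # (H - I) / N
    return sentence_correct, word_correct, word_accuracy
```

For example, scoring three digit sentences where one is recognized exactly, one has an extra word inserted and one has a word deleted yields 8 reference words with $H = 7$ and $I = 1$, so Word Correct is 87.5% and Word Accuracy is 75.0%. Because insertions lower only the accuracy figure, Word Accuracy is always at most Word Correct, which matches every row of Tables 2 and 3. As a consistency check on the Sentence Correct column, 60.61% of the 33 test sentences corresponds to 20 fully correct sentences (20/33 ≈ 0.6061).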
Fig. 4(c). The Word Accuracy rate on the training set.

[Fig. 5 panels plot GABW, BW and PSOBW against the iteration count (1 to 10).]
Fig. 5(a). The Sentence Correct rate on the test set.
Fig. 5(b). The Word Correct rate on the test set.
Fig. 5(c). The Word Accuracy rate on the test set.

5. Conclusions

This work compares PSO and GA for HMM optimization. The hybrid algorithm based on PSO and BW is found to be superior to both the BW algorithm and the hybrid algorithm based on GA and BW in terms of recognition ability.

Acknowledgements

I would like to express my thanks and deepest appreciation to Prof. Jigui Sun. This work is partially supported by the Science Foundation for Young Teachers of Northeast Normal University (No. 20061006) and the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20050183065, No. 20070183057).

References

[1] S. Panda and N. P. Padhy. Comparison of particle swarm optimization and genetic algorithm for FACTS-based controller design. Applied Soft Computing, 2007, doi:10.1016/j.asoc.2007.10.009.
[2] L. Xue, J. Yin, Z. Ji and L. Jiang. A Particle Swarm Optimization for Hidden Markov Model Training. In Proceedings of the 8th International Conference on Signal Processing, pp. 16-20, 2006.
[3] S. Kwong, C. W. Chau, K. F. Man and K. S. Tang. Optimisation of HMM topology and its model parameters by genetic algorithms. Pattern Recognition, 34(2), pp. 509-522, 2001.
[4] C. W. Chau, S. Kwong, C. K. Diu and W. R. Fahrner. Optimization of HMM by a genetic algorithm. In Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 1727-1730, 1997.
[5] L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), pp. 257-285, 1989.
[6] S. Young, J. Jansen, J. Odell, D. Ollason and P. Woodland. The HTK Book, version 3.3. Distributed with the HTK toolkit.