The script dhmm_em_demo.m gives an example of how to learn an HMM with discrete
outputs. Let there be Q=2 states and O=3 output symbols. We create random stochastic
matrices as follows.
O = 3;
Q = 2;
prior0 = normalise(rand(Q,1));
transmat0 = mk_stochastic(rand(Q,Q));
obsmat0 = mk_stochastic(rand(Q,O));
Now we sample nex=20 sequences of length T=10 each from this model, to use as training
data.
T=10;
nex=20;
data = dhmm_sample(prior0, transmat0, obsmat0, nex, T);
Here data is 20x10. Now we make a random guess as to what the parameters are,
prior1 = normalise(rand(Q,1));
transmat1 = mk_stochastic(rand(Q,Q));
obsmat1 = mk_stochastic(rand(Q,O));
and improve our guess using 5 iterations of EM...
[LL, prior2, transmat2, obsmat2] = dhmm_em(data, prior1, transmat1, obsmat1, 'max_iter', 5);
LL(t) is the log-likelihood after iteration t, so we can plot the learning curve.
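For example, a minimal way to visualize it (plot, xlabel, ylabel are standard MATLAB; LL comes from the dhmm_em call above):
plot(LL, '-o');            % log-likelihood after each EM iteration
xlabel('EM iteration');
ylabel('log-likelihood');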
Sequence classification
To evaluate the log-likelihood of a trained model given test data, proceed as follows:
loglik = dhmm_logprob(data, prior2, transmat2, obsmat2)
Note: the discrete alphabet is assumed to be {1, 2, ..., O}, where O = size(obsmat, 2). Hence
data cannot contain any 0s.
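For example, if your raw symbols happen to be 0-based (0, ..., O-1), a one-line sketch of the required remapping (assuming data holds integer symbols):
data = data + 1;   % shift symbols from {0,...,O-1} to {1,...,O}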
To classify a sequence into one of k classes, train up k HMMs, one per class, and then
compute the log-likelihood that each model gives to the test sequence; if the i'th model is the
most likely, then declare the class of the sequence to be class i.
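A minimal sketch of that procedure, assuming train_data{c} holds the training sequences for class c and test_seq is a single 1xT test sequence (these variable names are illustrative, not part of the toolbox):
nclasses = length(train_data);
loglik = zeros(1, nclasses);
for c = 1:nclasses
  % random initial guess for class c, as above
  prior1 = normalise(rand(Q,1));
  transmat1 = mk_stochastic(rand(Q,Q));
  obsmat1 = mk_stochastic(rand(Q,O));
  % train one HMM per class
  [LLtrain, priors{c}, transmats{c}, obsmats{c}] = ...
      dhmm_em(train_data{c}, prior1, transmat1, obsmat1, 'max_iter', 10);
  % score the test sequence under this class's model
  loglik(c) = dhmm_logprob(test_seq, priors{c}, transmats{c}, obsmats{c});
end
[best_ll, class] = max(loglik);   % declare the class whose model is most likely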
Computing the most probable sequence (Viterbi)
First you need to evaluate B(i,t) = P(y_t | Q_t=i) for all t,i:
B = multinomial_prob(data, obsmat);
Then you can use
[path] = viterbi_path(prior, transmat, B)
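For instance, to decode one of the training sequences with the parameters estimated above (a small usage sketch; prior2, transmat2, obsmat2 are the EM estimates from the earlier call):
ex = 1;                                       % pick one training sequence
B = multinomial_prob(data(ex,:), obsmat2);    % B(i,t) = P(y_t | Q_t=i)
path = viterbi_path(prior2, transmat2, B);    % 1xT most probable state sequence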
What if the log-likelihood becomes positive?
It is possible for p(x) > 1 if p(x) is a probability density function, such as a Gaussian. (The requirements for a density are p(x) >= 0 for all x and int_x p(x) dx = 1.) In practice this usually means your covariance is shrinking to a point/delta function, so you should increase the width of the prior (see below), or constrain the matrix to be spherical or diagonal, or clamp it to a large fixed constant (not learn it at all). It is also very helpful to ensure the components of the data vectors have small and comparable magnitudes (use e.g., KPMstats/standardize).
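As a side note, here is a minimal standardization sketch in plain MATLAB (an alternative if you do not have KPMstats/standardize to hand; X is assumed to be a D x N matrix whose columns are observation vectors):
mu = mean(X, 2);                      % per-dimension mean
sd = std(X, 0, 2); sd(sd == 0) = 1;   % per-dimension std, guarding against zero spread
X = (X - repmat(mu, 1, size(X,2))) ./ repmat(sd, 1, size(X,2));   % zero mean, unit variance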
This is a well-known pathology of maximum likelihood estimation for Gaussian mixtures: the
global optimum may place one mixture component on a single data point, and give it 0
covariance, and hence infinite likelihood. One usually relies on the fact that EM cannot find
the global optimum to avoid such pathologies.
What if the log-likelihood decreases during EM?
Since I implicitly add a prior to every covariance matrix (see below), what increases is loglik + log(prior), but what I print is just loglik, which may occasionally decrease. This suggests that one of your mixture components is not getting enough data. Try a better initialization or fewer clusters (states).
What if the covariance matrix becomes singular?
Estimates of the covariance matrix often become singular if you have too little data, or if too few points are assigned to a cluster center due to a bad initialization of the means. In this case, you should constrain the covariance to be spherical or diagonal, or adjust the prior (see below), or try a better initialization.
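If you are training the continuous-output models, one way to impose the spherical/diagonal constraint is the EM routine's covariance-type option; a hedged sketch (the mhmm_em call and its 'cov_type' argument are assumptions about the toolbox interface, since only mhmm_logprob is shown in this section):
[LL, prior2, transmat2, mu2, Sigma2, mixmat2] = ...
    mhmm_em(data, prior1, transmat1, mu1, Sigma1, mixmat1, ...
            'max_iter', 10, 'cov_type', 'diag');   % constrain each covariance to be diagonal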
How do I add a prior to the covariance matrix?
Sequence classification
To classify a sequence (e.g., of speech) into one of k classes (e.g., the digits 0-9), proceed as
in the DHMM case above, but use the following procedure to compute likelihood:
loglik = mhmm_logprob(data, prior, transmat, mu, Sigma, mixmat);
Computing the most probable sequence (Viterbi)
First you need to evaluate B(i,t) = P(y_t | Q_t=i) for all t,i:
B = mixgauss_prob(data(:,:,ex), mu, Sigma, mixmat);
where data(:,:,ex) is an O x T matrix, O being the dimensionality of each observation vector. Finally, use
[path] = viterbi_path(prior, transmat, B);