
How to use the HMM toolbox

HMMs with discrete outputs


Maximum likelihood parameter estimation using EM (Baum-Welch)

The script dhmm_em_demo.m gives an example of how to learn an HMM with discrete
outputs. Let there be Q=2 states and O=3 output symbols. We create random stochastic
matrices as follows.
O = 3;
Q = 2;
prior0 = normalise(rand(Q,1));
transmat0 = mk_stochastic(rand(Q,Q));
obsmat0 = mk_stochastic(rand(Q,O));
Now we sample nex=20 sequences of length T=10 each from this model, to use as training
data.
T=10;
nex=20;
data = dhmm_sample(prior0, transmat0, obsmat0, nex, T);
Here data is 20x10. Now we make a random guess as to what the parameters are,
prior1 = normalise(rand(Q,1));
transmat1 = mk_stochastic(rand(Q,Q));
obsmat1 = mk_stochastic(rand(Q,O));
and improve our guess using 5 iterations of EM...
[LL, prior2, transmat2, obsmat2] = dhmm_em(data, prior1, transmat1, ...
    obsmat1, 'max_iter', 5);
LL(t) is the log-likelihood after iteration t, so we can plot the learning curve.
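For example, a minimal way to plot the learning curve from the LL vector returned above:
plot(LL, 'o-');
xlabel('iteration');
ylabel('log-likelihood');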

Sequence classification

To evaluate the log-likelihood of a trained model given test data, proceed as follows:
loglik = dhmm_logprob(data, prior2, transmat2, obsmat2)
Note: the discrete alphabet is assumed to be {1, 2, ..., O}, where O = size(obsmat, 2). Hence
data cannot contain any 0s.
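If your symbols happen to be 0-based (i.e., in {0, ..., O-1}), a simple fix is to shift them into the required range before training or scoring:
data = data + 1; % map symbols {0,...,O-1} to {1,...,O}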

To classify a sequence into one of k classes, train up k HMMs, one per class, and then
compute the log-likelihood that each model gives to the test sequence; if the i'th model is the
most likely, then declare the class of the sequence to be class i.
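As an illustrative sketch (here train_data{c}, holding the training sequences for class c, and test_seq, a single test sequence, are hypothetical variables, not part of the toolbox):
nclasses = length(train_data);
hmms = cell(1, nclasses);
for c = 1:nclasses
  % random initial guess, as in the training example above
  prior0 = normalise(rand(Q,1));
  transmat0 = mk_stochastic(rand(Q,Q));
  obsmat0 = mk_stochastic(rand(Q,O));
  [LL, prior, transmat, obsmat] = dhmm_em(train_data{c}, prior0, transmat0, obsmat0, 'max_iter', 10);
  hmms{c} = struct('prior', prior, 'transmat', transmat, 'obsmat', obsmat);
end
loglik = zeros(1, nclasses);
for c = 1:nclasses
  loglik(c) = dhmm_logprob(test_seq, hmms{c}.prior, hmms{c}.transmat, hmms{c}.obsmat);
end
[junk, best_class] = max(loglik); % declare the test sequence to belong to best_class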

Computing the most probable sequence (Viterbi)

First you need to evaluate B(i,t) = P(y_t | Q_t=i) for all t,i:
B = multinomial_prob(data, obsmat);
Then you can use
[path] = viterbi_path(prior, transmat, B)
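For example, to decode the first training sequence from the example above using the learned parameters:
B = multinomial_prob(data(1,:), obsmat2);
path = viterbi_path(prior2, transmat2, B);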

HMMs with mixture of Gaussians outputs


Maximum likelihood parameter estimation using EM (Baum-Welch)
Let us generate nex=50 vector-valued sequences of length T=50; each vector has size O=2.
O = 2;
T = 50;
nex = 50;
data = randn(O,T,nex);
Now let us fit a mixture of M=2 Gaussians for each of the Q=2 states using K-means.
M = 2;
Q = 2;
left_right = 0;
cov_type = 'full';  % covariance type used by mixgauss_init below

prior0 = normalise(rand(Q,1));
transmat0 = mk_stochastic(rand(Q,Q));

[mu0, Sigma0] = mixgauss_init(Q*M, reshape(data, [O T*nex]), cov_type);
mu0 = reshape(mu0, [O Q M]);
Sigma0 = reshape(Sigma0, [O O Q M]);
mixmat0 = mk_stochastic(rand(Q,M));

Finally, let us improve these parameter estimates using EM.


[LL, prior1, transmat1, mu1, Sigma1, mixmat1] = ...
mhmm_em(data, prior0, transmat0, mu0, Sigma0, mixmat0, 'max_iter', 2);
Since EM only finds a local optimum, good initialisation is crucial. The initialisation
procedure illustrated above is very crude, and is probably not adequate for real
applications... See BNT (the Bayes Net Toolbox) for a real-world example of EM with
mixtures of Gaussians.
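For example, you can check the fit by evaluating the log-likelihood of the training data under the learned parameters (mhmm_logprob is described below), and plot the learning curve as in the discrete case:
loglik = mhmm_logprob(data, prior1, transmat1, mu1, Sigma1, mixmat1);
plot(LL, 'o-');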

What to do if the log-likelihood becomes positive?

It is possible for p(x) > 1 if p(x) is a probability density function, such as a Gaussian. (The
requirements for a density are p(x) >= 0 for all x and int p(x) dx = 1.) In practice this usually
means your covariance is shrinking to a point/delta function, so you should increase the width
of the prior (see below), or constrain the matrix to be spherical or diagonal, or clamp it to a
large fixed constant (not learn it at all). It is also very helpful to ensure the components of the
data vectors have small and comparable magnitudes (use e.g., KPMstats/standardize).
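For example, a minimal sketch of standardizing each dimension by hand (this mimics, but is not necessarily identical to, what KPMstats/standardize does), using the O x T x nex data array from the example above:
X = reshape(data, [O T*nex]);   % all observation vectors as columns
m = mean(X, 2);
s = std(X, 0, 2);
X = (X - repmat(m, 1, T*nex)) ./ repmat(s, 1, T*nex);  % zero mean, unit std per dimension
data = reshape(X, [O T nex]);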

This is a well-known pathology of maximum likelihood estimation for Gaussian mixtures: the
global optimum may place one mixture component on a single data point, and give it 0
covariance, and hence infinite likelihood. One usually relies on the fact that EM cannot find
the global optimum to avoid such pathologies.

What to do if the log-likelihood decreases during EM?

Since I implicitly add a prior to every covariance matrix (see below), what increases is loglik
+ log(prior), but what I print is just loglik, which may occasionally decrease. This suggests
that one of your mixture components is not getting enough data. Try a better initialization or
fewer clusters (states).

What to do if the covariance matrix becomes singular?

Estimates of the covariance matrix often become singular if you have too little data, or if too
few points are assigned to a cluster center due to a bad initialization of the means. In this case,
you should constrain the covariance to be spherical or diagonal, or adjust the prior (see
below), or try a better initialization.
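For example, assuming mhmm_em accepts a 'cov_type' option (as mixgauss_Mstep does; check the function header), diagonal covariances could be requested like this:
[LL, prior1, transmat1, mu1, Sigma1, mixmat1] = ...
    mhmm_em(data, prior0, transmat0, mu0, Sigma0, mixmat0, 'max_iter', 10, 'cov_type', 'diag');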

How do I add a prior to the covariance matrix?

Buried inside of KPMstats/mixgauss_Mstep you will see that cov_prior is initialized to
0.01*I. This is added to the maximum likelihood estimate after every M step. To change this,
you will need to modify the mhmm_em function so it calls mixgauss_Mstep with a different
value.

Sequence classification

To classify a sequence (e.g., of speech) into one of k classes (e.g., the digits 0-9), proceed as
in the DHMM case above, but use the following procedure to compute likelihood:
loglik = mhmm_logprob(data, prior, transmat, mu, Sigma, mixmat);

Computing the most probable sequence (Viterbi)

First you need to evaluate B(i,t) = P(y_t | Q_t=i) for all t,i:
B = mixgauss_prob(data(:,:,ex), mu, Sigma, mixmat);
where data(:,:,ex) is an O x T matrix containing the ex'th sequence (O is the size of the observation vector). Finally, use
[path] = viterbi_path(prior, transmat, B);

HMMs with Gaussian outputs


This is just like the mixture of Gaussians case, except we have M=1, and hence there is no
mixing matrix.
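For example, reusing the setup from the mixture-of-Gaussians section, a minimal sketch (assuming a Q x 1 matrix of ones is accepted as the trivial mixing matrix when M=1):
M = 1;
[mu0, Sigma0] = mixgauss_init(Q*M, reshape(data, [O T*nex]), cov_type);
mu0 = reshape(mu0, [O Q M]);
Sigma0 = reshape(Sigma0, [O O Q M]);
mixmat0 = ones(Q,1);  % trivial mixing matrix for M=1
[LL, prior1, transmat1, mu1, Sigma1, mixmat1] = ...
    mhmm_em(data, prior0, transmat0, mu0, Sigma0, mixmat0, 'max_iter', 5);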

Online EM for discrete HMMs/POMDPs


For some applications (e.g., reinforcement learning/adaptive control), it is necessary to learn
a model online. The script dhmm_em_online_demo gives an example of how to do this.
