
Markov and Hidden Markov Models

Venu Madhav Reddy Amudala


3rd year ECE, SASTRA
avmreddy@gmail.com

Kota Praveen Kumar


3rd year ECE, SASTRA
praveen16104@yahoo.co.in

Abstract
Pattern recognition is of interest in several fields, and Markov models offer a particularly
powerful technique for solving such problems. Hidden Markov models (HMMs) go
a step further, and can paradoxically sometimes be used to find patterns even when it is
not entirely clear as to what one is looking for. In this article we introduce the basic
ideas behind these models and illustrate them through applications to examples. We
construct a Markov model for distinguishing between two languages which use a
common script (as a prototypical problem, with straightforward applications to
analyzing biological sequences, say), and a Hidden Markov model for detecting whether
or not a fair coin was used in a coin toss (as a means of uncovering match fixing, for
instance).

1. Introduction
The use of Markov models and HMMs in pattern recognition is best described
through examples and applications, and in this article, we will try to show how such
analysis can be used in order to detect patterns in symbol strings. The symbol strings we
will deal with could arise from languages (such as English or Italian), from biological
macromolecules
(such as DNA or RNA) or from games of chance, such as the throw of dice or the toss of
coins.
Markov models are extensively used in bioinformatics to analyze DNA, RNA and protein
sequences. Currently, these provide the most powerful methods of finding genes as well
as other biologically important features such as promoter elements, CpG islands, etc.
In the next two sections we construct a model that can be taught to distinguish
between English and Italian. In Section 3 we discuss Hidden Markov models and some of the
standard algorithms used. The application we discuss there is, regrettably, topical: how can
one figure out whether a (hypothetical) cricket match is fixed or played fairly?

2. Markov Chains
Consider a system which, at any instant t, can exist in one of N distinct states, S_1, S_2, S_3,
. . . , S_N, and let the state at time t be denoted q_t. At the next instant, the system makes a
transition to the state q_{t+1}. Say the system outputs the symbol O_j when it is in state S_j.
The result of the above model is a symbol string

$$O_{q_1}\, O_{q_2}\, O_{q_3} \cdots O_{q_L},$$

which is output from the Markov chain.

The “probability” of the symbol string or of the Markov chain is

$$P(q_1, q_2, \ldots, q_L) = P(q_L \mid q_{L-1}, \ldots, q_1)\, P(q_{L-1} \mid q_{L-2}, \ldots, q_1) \cdots P(q_2 \mid q_1)\, P(q_1),$$

where, in standard notation, P(X) denotes the probability of an event, and P(X | Y) denotes
the conditional probability of event X, given that event Y has occurred. In a first-order
Markov model the probability of a given symbol is taken to depend only on the
immediately preceding symbol,

$$P(q_t \mid q_{t-1}, \ldots, q_1) = P(q_t \mid q_{t-1}),$$

while for a Markov model of order n, the dependence is on the n preceding symbols, namely

$$P(q_t \mid q_{t-1}, \ldots, q_1) = P(q_t \mid q_{t-1}, q_{t-2}, \ldots, q_{t-n}).$$

Thus, the probability of a symbol string of length L arising from a first-order
Markov process is

$$P(q_1, q_2, \ldots, q_L) = P(q_1) \prod_{t=2}^{L} P(q_t \mid q_{t-1}).$$
In the case where the conditional probabilities are time independent, these are
termed state transition probabilities,

$$a_{ij} = P(q_t = S_j \mid q_{t-1} = S_i).$$

These have the property that they are all nonnegative, a_{ij} >= 0, and sum to unity,

$$\sum_{j=1}^{N} a_{ij} = 1.$$

The probability of a given string can thus be computed as

$$P(q_1, q_2, \ldots, q_L) = P(S_{q_1}) \prod_{t=2}^{L} a_{q_{t-1} q_t},$$

P(S_i) being defined as the probability of the system starting in state S_i. One can imagine a
special “begin” state, denoted by 0, say, which does not output a symbol but
merely makes a transition to the first state, so that P(S_{q_1}) = a_{0 q_1}. Similarly, an “end”
state can be included to terminate a symbol string.
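As a quick illustration of the string-probability formula, the following Python sketch evaluates it for a small chain; the two-state alphabet and all numerical values are invented for illustration and are not from the article.

```python
import numpy as np

# Hypothetical two-state chain over the symbols "a" and "b" (illustrative numbers only).
states = ["a", "b"]
idx = {s: i for i, s in enumerate(states)}

start = np.array([0.6, 0.4])              # P(S_i): probability of starting in each state
A = np.array([[0.7, 0.3],                 # a_ij = P(q_t = S_j | q_{t-1} = S_i); rows sum to 1
              [0.2, 0.8]])

def string_probability(symbols):
    """P(q_1, ..., q_L) = P(S_{q_1}) * prod_t a_{q_{t-1} q_t}."""
    p = start[idx[symbols[0]]]
    for prev, cur in zip(symbols, symbols[1:]):
        p *= A[idx[prev], idx[cur]]
    return p

print(string_probability("aab"))          # 0.6 * 0.7 * 0.3 = 0.126
```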
Knowledge of all the transition probabilities, the a_{ij}'s, uniquely characterizes the model.
An example of a Markov chain is shown in the figure below.

2.1 A First-Order Markov Language Parser


We use a first-order Markov chain to implement a tool for distinguishing between English
and Italian texts. To construct a first-order Markov model for the English language,
one proceeds as follows. Ignoring punctuation, accents, numerals and the distinction
between upper and lower case letters, one can consider the system to consist of 27
states, namely one for each letter of the alphabet and one for the word delimiter, the
blank space. The parameters of the model, the transition probabilities, are determined through
“training”: a corpus, a sufficiently large and representative sample of the English
language, is analyzed and the 27 × 27 matrix of probabilities is computed.
Since the Italian language also uses the same alphabet, a model for Italian would
also consist of the same states, but the transition probabilities would differ and would
have to be determined from an Italian corpus. Denoting the transition probabilities in the
two languages by a^E_{ij} and a^I_{ij} respectively, one can estimate the probability that a
given finite symbol string (a word) x_1 x_2 ... x_L is generated by one language or the other:

$$P_E(x_1, \ldots, x_L) = P_E(x_1) \prod_{t=2}^{L} a^{E}_{x_{t-1} x_t}$$

and

$$P_I(x_1, \ldots, x_L) = P_I(x_1) \prod_{t=2}^{L} a^{I}_{x_{t-1} x_t}.$$

The ratio of the two probabilities gives the likelihood as to which model is more accurate
for a given word. In many applications, it is easier to compute the score, or the
log-likelihood ratio,

$$\mathrm{Score} = \log \frac{P_E(x_1, \ldots, x_L)}{P_I(x_1, \ldots, x_L)} = \log \frac{P_E(x_1)}{P_I(x_1)} + \sum_{t=2}^{L} \log \frac{a^{E}_{x_{t-1} x_t}}{a^{I}_{x_{t-1} x_t}},$$

so that a positive score is indicative of the word being English, while a negative
score indicates Italian.
We apply this Markov model to the following (not entirely hypothetical) problem: Given
a document that consists of random lengths of English and Italian texts interspersed, we
need to separate it into texts consisting of a single language.
The first step is to construct state transition matrices for both the English and Italian
languages; the training sets used are the English and Italian Bibles, both of
which are available on the Internet. This gives the two 27 × 27 matrices, a^E and a^I.
(In order not to rule out infrequent transitions just because they did not occur in
the training sets, it is customary to add a “pseudocount” to avoid zeros in the
matrices, especially since ratios and logarithms are taken.) The matrices can be
made larger by including special characters, punctuation and separate upper and lower
case letters to get more accuracy. The begin and end states are ignored since their
effect is negligible.
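A minimal sketch of this training step is given below; the corpus file names, the pseudocount value and the helper function name are illustrative assumptions, not details from the article.

```python
import re
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz "   # 26 letters plus the blank word delimiter
IDX = {c: i for i, c in enumerate(ALPHABET)}

def train_transition_matrix(path, pseudocount=1.0):
    """Estimate the 27 x 27 matrix a_ij = P(next character j | current character i) from a corpus."""
    text = open(path, encoding="utf-8").read().lower()
    text = re.sub(r"[^a-z ]+", " ", text)    # drop punctuation, accents, numerals
    text = re.sub(r" +", " ", text)          # collapse runs of whitespace to a single blank

    counts = np.full((27, 27), pseudocount)  # pseudocounts keep rare transitions nonzero
    for prev, cur in zip(text, text[1:]):
        counts[IDX[prev], IDX[cur]] += 1
    return counts / counts.sum(axis=1, keepdims=True)

# a_E = train_transition_matrix("english.txt")
# a_I = train_transition_matrix("italian.txt")
```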



The text to be deciphered is analyzed as follows. The score defined above is computed in a
sliding window of N letters. Namely, one calculates the score for the first N characters,
from position 1 to N, then from 2 to N+1, and so on, so that the m-th window runs
from character m to m + N - 1. From a graph of the score, Score(m), versus the
window location m, it is easy to read off the portions of the text that correspond
to English (positive scores) and Italian (negative scores), while in windows that include texts in both
languages, the scores hover near zero. Occasionally there are positive scores over a small
length within a region of largely negative scores or vice versa: these can occur due
to ambiguities in the model (as, for example, in analyzing the phrase “I ate pepperoni
pizza this afternoon”, but any program can be trained to ignore these). The shape of the
score graph will depend, naturally, on the window length N, as well as the slide
length, but with some training, these can be adjusted to give optimal results. An
example of such an analysis is shown in the figure, and sample results for different window sizes are
shown in the table below.
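A sketch of the sliding-window scoring, reusing the IDX alphabet map and the matrices from the training sketch above (the function name, the default window size and the treatment of window boundaries are illustrative assumptions):

```python
import numpy as np

def sliding_scores(text, a_E, a_I, N=100):
    """Log-likelihood-ratio score Score(m) for each window of N characters."""
    log_ratio = np.log(a_E) - np.log(a_I)
    scores = []
    for m in range(len(text) - N + 1):
        window = text[m:m + N]
        scores.append(sum(log_ratio[IDX[p], IDX[c]] for p, c in zip(window, window[1:])))
    return scores                            # positive: English-like, negative: Italian-like
```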

As can be appreciated, such analysis works well if each state (English or Italian) persists
for a considerable duration, but if the interspersed parts are very small, detecting them
is more difficult.



3. Hidden Markov Models
Hidden Markov models were originally developed in the field of speech
recognition. These can be adapted and applied to more difficult pattern recognition
problems. The principal difference between Markov models and their hidden
counterparts is that there is a separation between the state of the system and the symbol
that the model emits. In the example just discussed, we can
imagine that there are two states corresponding to the English and Italian languages.
In each state, the model outputs the same set of symbols (the letters a, b, c, . . .), but with
different probabilities, characteristic of the state that the system is in.
Thus an HMM is a “doubly embedded stochastic process” in which the states are
not known a priori, hence “hidden”: the observed symbol is a probabilistic function
of the state.
Formally, an HMM is characterized by the following parameters:
1. N, the number of distinct states in the model. Generally the model is such that one
can go from one state to all others (ergodicity), and as before we denote the states
as {S} = {S_1, S_2, S_3, ..., S_N} and the state at time t as q_t.
2. The state transition probability matrix A = {a_{ij}}, with

$$a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i).$$

3. M, the number of distinct observable symbols per state. The observable symbol set
is denoted by {X} = {X_1, X_2, ..., X_M}.
4. The emission probabilities e_j(k), namely the probability of emitting the observable
symbol X_k when the model is in the state S_j,

$$e_j(k) = P(\text{symbol } X_k \text{ at time } t \mid q_t = S_j).$$

5. The initial state distribution P = {P_i}, where

$$P_i = P(q_1 = S_i), \qquad \sum_{i=1}^{N} P_i = 1.$$

Complete specification of the hidden Markov model requires specification of all
the above parameters.
The separation between the state and the observed symbol leads to the following
expression for the probability of observing a sequence of symbols {x} given the path or
itinerary {S}, namely the sequence of states q_1, q_2, ..., q_L that the system goes through,

$$P(\{x\}, \{S\}) = a_{0 q_1} \prod_{t=1}^{L} e_{q_t}(x_t)\, a_{q_t q_{t+1}}, \qquad q_{L+1} = 0.$$

The interpretation of the above equation is as follows. Starting from the begin
state, denoted 0, the system makes a transition to state q_1 with probability a_{0 q_1}. The
symbol x_1 is emitted with probability e_{q_1}(x_1), while the system makes a transition from
state q_1 to q_2 with probability a_{q_1 q_2}, emits symbol x_2 with probability e_{q_2}(x_2), and so
on, till state q_L is reached, and the model stops by making a transition to the end state,
also denoted by 0.
The figure below provides an example of an HMM used to represent gene sequences. The
three states C1, C2 and C3 are the different positions of bases in a codon, and constitute the
hidden states of the HMM; for each state the symbols a, t, g, c are the observable
symbols. The two dummy states, start and end, denote the beginning and end of the gene.
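For concreteness, such a codon model could be written down as follows; the cyclic transition structure and all numerical values here are assumptions made for illustration (the article's figure specifies the structure, including the start and end dummy states, but not these numbers).

```python
import numpy as np

hidden_states = ["C1", "C2", "C3"]           # codon positions (hidden)
observables = ["a", "t", "g", "c"]           # bases (observed)

# Assumed cyclic structure C1 -> C2 -> C3 -> C1, ignoring the start and end dummy states.
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])

# Emission probabilities e_j(k): each codon position has its own base composition (invented values).
E = np.array([[0.30, 0.20, 0.30, 0.20],      # C1
              [0.25, 0.25, 0.25, 0.25],      # C2
              [0.20, 0.30, 0.20, 0.30]])     # C3
```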

The observed symbol sequences can thus arise from any one of the possible paths,
though the probabilities will be very different. It thus becomes necessary to try to find the
most probable path or itinerary through the state space that could give rise to the
observed symbol sequence. Trying to identify this most probable path by enumerating all
the possible paths and identifying that with the largest P ({x}, {S}) is computationally an
intractable task, but there is a method based on a dynamic programming technique that
makes this task easier. This is the Viterbi Algorithm, which we now describe.
The Viterbi algorithm computes the most probable path recursively. If V_k(i) is the
probability of the most probable path ending in state S_k with observed symbol x_i, then the
probability for extending the symbol sequence by one, to make the observation x_{i+1},
is given by

$$V_l(i+1) = e_l(x_{i+1}) \max_k \left[ V_k(i)\, a_{kl} \right].$$

Namely, one can make a transition from state S_k to any other state S_l, and then emit the
symbol x_{i+1}; one chooses that state S_k for which V_l(i+1) is maximized. All sequences
start in state 0, the begin state, so the initial condition is V_0(0) = 1, and the Viterbi scores
can be recursively computed, as in the sketch below.
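This is a minimal Python sketch of the recursion just described; the log-space formulation, the variable names and the omission of the end-state transition are our own choices for the illustration, not details taken from the article.

```python
import numpy as np

def viterbi(obs, a0, A, E):
    """Most probable state path for an observation sequence.

    obs : list of symbol indices x_1 ... x_L
    a0  : a_{0k}, transition probabilities out of the begin state (length N)
    A   : a_{kl}, N x N state transition matrix
    E   : e_k(x), N x M emission probability matrix
    """
    N, L = len(a0), len(obs)
    V = np.zeros((L, N))                     # log Viterbi scores V_k(i)
    ptr = np.zeros((L, N), dtype=int)        # back-pointers for the traceback

    V[0] = np.log(a0) + np.log(E[:, obs[0]])
    for i in range(1, L):
        for l in range(N):
            cand = V[i - 1] + np.log(A[:, l])            # V_k(i-1) * a_{kl}, in log space
            ptr[i, l] = int(np.argmax(cand))
            V[i, l] = np.log(E[l, obs[i]]) + cand.max()

    path = [int(np.argmax(V[-1]))]                       # traceback of the most probable path
    for i in range(L - 1, 0, -1):
        path.append(ptr[i, path[-1]])
    return path[::-1]
```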



3.1 A Hidden Markov “Match Fixing” Detector
Can one use a Hidden Markov model to detect match fixing? In this example we show
how one could in principle (given enough data, that is) determine whether or not
there has been some foul play in a series of cricket matches.
The scenario is the following. Two cricket teams, named I and P, say, play a set of 200
matches against each other. Under normal circumstances, the teams are evenly matched,
so either of them can win with probability 0.5. However, there is a powerful bookie
lobby that can persuade team P to throw the match by “fixing” it, namely by giving them
some alternate inducement, as a consequence of which, in such a fixed match, team I
wins with probability 0.8.

One does not know which match is fixed, but we do know that, thanks to surveillance and
other safeguards, the bookies can communicate with the captain of team P only
sporadically, and thus, to start with, they have a probability 0.05 of giving the signal
to start fixing the forthcoming matches. The signal to stop fixing can be given a bit
more easily, with probability 0.1. (That is, the probability of not fixing a match is
0.95, but having started fixing, all subsequent matches are fixed with probability 0.9.)

Say the results of which team won are available over a series of 200 matches played under
the same set of circumstances (same teams, same bookies).


Can one determine which of the matches was fixed?
It turns out that one can set up an HMM for this problem quite simply. Clearly, what
is hidden here is the state: the match can either be fixed (F) or not (N), and in either
state, the winner is either I or P, albeit with different probabilities. All this
information is encoded in the state diagram of the model.
Using the Viterbi algorithm, it is quite a simple task to figure out which path (through the
state space) was followed; the result is given below, along with a listing of the
“True state” (which was obtained in a signed confession from the bookies in
question as to which of the matches were really fixed).
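As a sketch of how this decoding would be set up, reusing the viterbi() function from the previous section; the short observation string below is a made-up placeholder rather than the actual sequence of 200 results, and the begin-state probabilities are assumed to behave like the not-fixed state.

```python
import numpy as np

states = ["N", "F"]                          # N = not fixed, F = fixed (hidden states)
win = {"I": 0, "P": 1}                       # observed winner of each match

a0 = np.array([0.95, 0.05])                  # assumption: begin state behaves like the not-fixed state
A  = np.array([[0.95, 0.05],                 # N -> N, N -> F
               [0.10, 0.90]])                # F -> N, F -> F
E  = np.array([[0.5, 0.5],                   # fair match: either team wins with probability 0.5
               [0.8, 0.2]])                  # fixed match: team I wins with probability 0.8

# Placeholder results; the article's actual sequence of 200 outcomes is not reproduced here.
results = "IPIPPIIIIIIIPIIPIPPI"
obs = [win[r] for r in results]

path = viterbi(obs, a0, A, E)                # viterbi() as sketched in the previous section
print("".join(states[s] for s in path))      # inferred N/F label for each match
```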

As can be seen, the HMM can pick up the fixed matches with a fair degree of accuracy
(only 19 errors in 200 matches) in this example. Of course, a lot of data was available:
the probabilities of each team winning in each state, the probabilities of the fixing, etc.,
all of which make it quite a simple task to figure out the turn of events. HMMs can do
more: the model parameters can be estimated even when the probabilities are not given
explicitly, by a method of training using the forward and backward algorithms.



4. Applications of Hidden Markov Models
HMMs were first used in the early 1980s in the field of speech recognition. Since then,
because many complex problems can be easily represented and solved
using HMMs, they have found application in a number of areas related to pattern
recognition. We discuss a couple of them below.

Speech Recognition:
In the development of a speech recognizer, we assume that we have a vocabulary of V
words to be recognized by the device. Each word is modeled by a distinct HMM. We also
assume that for each word in the vocabulary there are k occurrences of the spoken word,
which form the training set. For each spoken word, the respective observation sequences are
extracted through some appropriate representation of the spectral and temporal characteristics
of the word. Thus the implementation of speech recognition consists of two basic steps:

1. For each word v in the vocabulary, an HMM, which we denote λ_v, is constructed and the
model parameters (A, B, π) are calculated using the Baum-Welch algorithm. (Arbitrary
parameter values are taken at the start; these are then used to calculate new parameter
values, and this process continues till the change in values becomes infinitesimal.) Thus we
have V distinct HMMs created for the vocabulary.

2. When an unknown word is to be recognized, first the observation sequence O is extracted
via feature analysis of the speech, in the same representation form as that used for the
training set. Then the V model likelihoods are computed for all models using the forward
(or backward) algorithm,

$$P(O \mid \lambda_v), \qquad 1 \le v \le V.$$

Finally the word corresponding to the highest model likelihood is selected and is said to
be recognized, that is,

$$v^{*} = \arg\max_{1 \le v \le V} P(O \mid \lambda_v).$$
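A sketch of this recognition step is given below; the data structure (each word model stored as a tuple of initial, transition and emission probabilities) and the helper function names are illustrative assumptions, not the article's notation.

```python
import numpy as np

def forward_log_likelihood(obs, start, A, E):
    """log P(O | model) via the forward algorithm (no rescaling, so suitable only for short O)."""
    alpha = start * E[:, obs[0]]             # alpha_1(j) = P_j * e_j(x_1)
    for x in obs[1:]:
        alpha = (alpha @ A) * E[:, x]        # alpha_t(j) = sum_i alpha_{t-1}(i) a_ij * e_j(x_t)
    return float(np.log(alpha.sum()))

def recognize(obs, word_models):
    """Pick the vocabulary word whose HMM assigns the highest likelihood to the observation."""
    scores = {w: forward_log_likelihood(obs, *m) for w, m in word_models.items()}
    return max(scores, key=scores.get)
```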

The basic concepts of speech recognition can be extended to incorporate audiovisual
pattern evaluation of the log-likelihood.

Handwriting Recognition:
HMMs have also been applied to handwriting recognition, but till now handwriting
recognition has been restricted to a limited vocabulary. There is, however, work going
on towards the development of a generic handwriting tool using a two-level Viterbi algorithm.

Conclusion
The use of Markov models and HMMs in pattern recognition is best described
through examples and applications, and in this article, we tried to show how such
analysis can be used in order to detect patterns in symbol strings. Hidden Markov models
were originally developed in the field of speech recognition. These can be adapted and
applied to more difficult pattern recognition problems. Thus Markov models and
HMMs may help new, highly efficient and robust pattern recognition techniques emerge.

