Abstract
Pattern recognition is of interest in several fields, and Markov models offer a particularly powerful technique for solving such problems. Hidden Markov models (HMMs) go a step further, and can, perhaps paradoxically, be used to find patterns even when it is not entirely clear what one is looking for. In this article we introduce the basic ideas behind these models and illustrate them through applications to examples. We construct a Markov model for distinguishing between two languages which use a common script (as a prototypical problem, with straightforward applications to analyzing biological sequences, say), and a Hidden Markov model for detecting whether or not a fair coin was used in a coin toss (as a means of uncovering match fixing, for instance).
1. Introduction
The use of Markov models and HMMs in pattern recognition is best described
through examples and applications, and in this article, we will try to show how such
analysis can be used in order to detect patterns in symbol strings. The symbol strings we will deal with could arise from languages (such as English or Italian), from biological macromolecules (such as DNA or RNA), or from games of chance, such as the throw of dice or the toss of coins.
Markov models are extensively used in bioinformatics to analyze DNA, RNA and protein
sequences. Currently, these provide the most powerful methods of finding genes as well
as other biologically important features such as promoter elements, CpG islands, etc.
In the next two sections we construct a model that can be taught to distinguish between English and Italian. In Section 4 we discuss Hidden Markov models and some of the standard algorithms used. The application we discuss is regrettably topical: how can one figure out whether a (hypothetical) cricket match is fixed or played fairly?
2. Markov Chains
Consider a system which, at any instant t, can exist in one of N distinct states, S_1, S_2, ..., S_N, and let the state at instant t be denoted q_t. At the next instant, the system makes a transition to state q_{t+1}. Say the system outputs the symbol O_t when it is in state q_t. The result is a string of symbols, whose probability can be decomposed into conditional probabilities,

P(O_1 O_2 \ldots O_L) = P(O_1) \, P(O_2 | O_1) \, P(O_3 | O_1 O_2) \cdots

where, in standard notation, P(X) denotes the probability of an event, and P(X | Y) denotes the conditional probability of event X, given that event Y has occurred. In a first-order Markov model the probability of a given symbol is taken to depend only on the immediately preceding symbol,

P(O_i | O_1 O_2 \ldots O_{i-1}) = P(O_i | O_{i-1}),

while for a Markov model of order n, the dependence is on the n preceding symbols, namely

P(O_i | O_1 O_2 \ldots O_{i-1}) = P(O_i | O_{i-n} \ldots O_{i-1}).

Thus, the probability of a symbol string of length L arising from a first-order Markov process is

P(O_1 O_2 \ldots O_L) = P(O_1) \prod_{i=2}^{L} P(O_i | O_{i-1}).

In the case where the conditional probabilities are time independent, these are termed state transition probabilities,

a_{ij} = P(q_{t+1} = S_j | q_t = S_i),

P(S_i) being defined as the probability of the system starting in state S_i. One can imagine a special "begin" state, denoted S_0, say, which does not output a symbol but merely makes a transition to the first state, so that P(q_1 = S_i) = a_{0i}. Similarly, an "end" state can be included to terminate a symbol string. Knowledge of all the transition probabilities, the a_{ij}'s, uniquely characterizes the model.
An example of a Markov chain is shown in the following figure.
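To make this concrete, here is a minimal Python sketch of scoring a symbol string under a first-order Markov model; the dictionary representation and the toy two-symbol alphabet are our own choices, not part of the original formulation:

import math

# Transition probabilities a[s1][s2] = P(next symbol = s2 | current symbol = s1).
# These numbers are purely illustrative.
a = {
    "A": {"A": 0.1, "B": 0.9},
    "B": {"A": 0.5, "B": 0.5},
}
start = {"A": 0.6, "B": 0.4}  # P(S_i): probability of starting in each state

def log_probability(string):
    """Log probability of a symbol string under the first-order Markov model."""
    logp = math.log(start[string[0]])
    for prev, cur in zip(string, string[1:]):
        logp += math.log(a[prev][cur])
    return logp

print(log_probability("ABBA"))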
3. Distinguishing English from Italian

Suppose we have two first-order Markov models, one for English and one for Italian, with transition matrices a^E and a^I respectively. The probability of a word x = x_1 x_2 \ldots x_L under the English model is

P(x | E) = \prod_{i=1}^{L-1} a^E_{x_i x_{i+1}},

and under the Italian model,

P(x | I) = \prod_{i=1}^{L-1} a^I_{x_i x_{i+1}}.

The ratio of the two probabilities indicates which model better accounts for a given word. In many applications, it is easier to compute the score, or log-likelihood ratio,

S(x) = \log \frac{P(x | E)}{P(x | I)} = \sum_{i=1}^{L-1} \log \frac{a^E_{x_i x_{i+1}}}{a^I_{x_i x_{i+1}}},

so that a positive score is indicative of the word being English, while a negative score indicates Italian.
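A minimal Python sketch of this scoring step, assuming the two transition matrices are available as nested dictionaries (the names trans_en and trans_it, and all the numbers, are hypothetical):

import math

# Hypothetical transition probabilities over a toy two-letter alphabet;
# real models would use 27 x 27 matrices trained on corpora.
trans_en = {"a": {"a": 0.2, "b": 0.8}, "b": {"a": 0.7, "b": 0.3}}
trans_it = {"a": {"a": 0.6, "b": 0.4}, "b": {"a": 0.9, "b": 0.1}}

def score(word):
    """Log-likelihood ratio: positive suggests English, negative Italian."""
    return sum(math.log(trans_en[p][c] / trans_it[p][c])
               for p, c in zip(word, word[1:]))

print(score("abba"))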
We apply this Markov model to the following (not entirely hypothetical) problem: Given
a document that consists of random lengths of English and Italian texts interspersed, we
need to separate it into texts consisting of a single language.
The first step is to construct state transition matrices for both the English and the Italian languages; the training sets used are the English and Italian Bibles, both of which are available on the Internet. This gives the two 27 × 27 matrices, a^E and a^I (the 26 letters of the alphabet plus the space character).
(In order not to rule out infrequent transitions just because they did not occur in the training sets, it is customary to add a 'pseudocount' to avoid zeros in the matrices, especially since ratios and logarithms are taken.) The matrices can be made larger, by including special characters, punctuation, and separate upper and lower case letters, to get more accuracy. The begin and end states are ignored since their effect is negligible.
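A minimal sketch of this training step in Python, assuming the corpus has already been reduced to lowercase letters and spaces (the pseudocount of 1 is one conventional choice, not something fixed by the method):

ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # 26 letters plus the space character

def train_transition_matrix(text, pseudocount=1.0):
    """Estimate first-order transition probabilities from a training text."""
    # Start every cell at the pseudocount so no transition has zero probability.
    counts = {s1: {s2: pseudocount for s2 in ALPHABET} for s1 in ALPHABET}
    for prev, cur in zip(text, text[1:]):
        if prev in counts and cur in counts:
            counts[prev][cur] += 1
    # Normalize each row so the probabilities out of each state sum to 1.
    for row in counts.values():
        total = sum(row.values())
        for s2 in row:
            row[s2] /= total
    return counts

trans_en = train_transition_matrix("in the beginning god created the heaven and the earth")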
As can be appreciated, such analysis works well if each segment (English or Italian) persists for a considerable length, but if the interspersed parts are very short, detecting them is more difficult.
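One simple way to carry out the separation, sketched below, is to slide a fixed-length window over the document and label each position by the sign of its score; the window length of 20 characters is an arbitrary choice of ours:

import math

# Toy transition matrices as in the earlier sketches (hypothetical numbers).
trans_en = {"a": {"a": 0.2, "b": 0.8}, "b": {"a": 0.7, "b": 0.3}}
trans_it = {"a": {"a": 0.6, "b": 0.4}, "b": {"a": 0.9, "b": 0.1}}

def score(word):
    return sum(math.log(trans_en[p][c] / trans_it[p][c])
               for p, c in zip(word, word[1:]))

def classify_windows(text, window=20):
    """Label each window of the text 'E' or 'I' by the sign of its score."""
    return ["E" if score(text[i:i + window]) > 0 else "I"
            for i in range(len(text) - window + 1)]

print(classify_windows("abbaabbaabbaabababababab"))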
4. Hidden Markov Models

A Hidden Markov model is characterized by the following elements:
1. N, the number of states in the model, S_1, S_2, ..., S_N.
2. The state transition probabilities a_{ij}, as defined above.
3. M, the number of distinct observable symbols per state. The observable symbol set is denoted by {X_1, X_2, ..., X_M}.
4. The emission probability e_j(k), namely the probability of emitting the observable symbol X_k when the model is in the state S_j.
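In code, these elements can be bundled as plain data. A minimal sketch, with a hypothetical two-state model (a fair coin and a biased coin) that we reuse in the sketches below:

# A Hidden Markov model as plain data: states, begin-state probabilities,
# transition probabilities a_ij, and emission probabilities e_j(k).
# The two-state fair/biased coin model and its numbers are hypothetical.
hmm = {
    "states": ["fair", "biased"],
    "start": {"fair": 0.5, "biased": 0.5},
    "trans": {"fair": {"fair": 0.9, "biased": 0.1},
              "biased": {"fair": 0.1, "biased": 0.9}},
    "emit": {"fair": {"H": 0.5, "T": 0.5},
             "biased": {"H": 0.8, "T": 0.2}},
}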
In an HMM, the state sequence itself is hidden; only the emitted symbols are observed. The observed symbol sequence can thus arise from any one of the possible state paths, though the probabilities will be very different. It thus becomes necessary to find the most probable path, or itinerary, through the state space that could give rise to the observed symbol sequence. Trying to identify this most probable path by enumerating all the possible paths (there are N^L of them for a sequence of length L) and identifying the one with the largest P({x}, {S}) is a computationally intractable task, but there is a method based on a dynamic programming technique that makes this task easier. This is the Viterbi algorithm, which we now describe.
The Viterbi algorithm computes the most probable path recursively. If V_k(i) is the probability of the most probable path ending in state S_k with observed symbol x_i, then the probability for extending the symbol sequence by one, to make the observation x_{i+1}, is given by

V_l(i+1) = e_l(x_{i+1}) \max_k ( V_k(i) \, a_{kl} ).

Namely, one can make a transition from state S_k to any other state S_l, and then emit the symbol x_{i+1}; one chooses that preceding state S_k for which V_l(i+1) is maximized. All sequences start in state S_0, the begin state, so the initial condition is V_0(0) = 1, and the Viterbi scores can be recursively computed; the most probable path is then recovered by tracing back the maximizing choices.
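A minimal Python sketch of this recursion, using log probabilities to avoid numerical underflow (the two-state fair/biased coin model and all numbers are hypothetical):

import math

# Hypothetical two-state HMM: a fair coin and a biased coin.
states = ["fair", "biased"]
start = {"fair": 0.5, "biased": 0.5}
trans = {"fair": {"fair": 0.9, "biased": 0.1},
         "biased": {"fair": 0.1, "biased": 0.9}}
emit = {"fair": {"H": 0.5, "T": 0.5},
        "biased": {"H": 0.8, "T": 0.2}}

def viterbi(obs):
    """Most probable state path for an observed symbol sequence."""
    # V[k]: log probability of the best path so far ending in state k.
    V = {k: math.log(start[k]) + math.log(emit[k][obs[0]]) for k in states}
    back = []  # back[i][l]: best predecessor of state l at step i + 1
    for x in obs[1:]:
        ptr, V_new = {}, {}
        for l in states:
            best = max(states, key=lambda k: V[k] + math.log(trans[k][l]))
            ptr[l] = best
            V_new[l] = V[best] + math.log(trans[best][l]) + math.log(emit[l][x])
        back.append(ptr)
        V = V_new
    # Trace back the maximizing choices from the best final state.
    path = [max(states, key=V.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi("HHHHHTHTHH"))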
As can be seen, the HMM can pick up the fixed matches with a fair degree of accuracy (only 19 errors in 200 matches) in this example. Of course, a lot of data was available: the probabilities of each team winning in each state, the probabilities of the fixing, etc., all of which make it quite a simple task to figure out the turn of events. HMMs can do more: even when the probabilities are not given explicitly, they can be estimated by a method of training using the forward and backward algorithms described above.
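For concreteness, here is a minimal sketch of the forward recursion, which computes the total probability of an observed sequence by summing over all state paths rather than maximizing over them (the model is the same hypothetical coin model as above):

# Forward algorithm: total probability of the observations under the model,
# summed over all state paths. Same hypothetical coin model as above.
states = ["fair", "biased"]
start = {"fair": 0.5, "biased": 0.5}
trans = {"fair": {"fair": 0.9, "biased": 0.1},
         "biased": {"fair": 0.1, "biased": 0.9}}
emit = {"fair": {"H": 0.5, "T": 0.5},
        "biased": {"H": 0.8, "T": 0.2}}

def forward(obs):
    """P(obs | model), obtained by the forward recursion."""
    f = {k: start[k] * emit[k][obs[0]] for k in states}
    for x in obs[1:]:
        f = {l: sum(f[k] * trans[k][l] for k in states) * emit[l][x]
             for l in states}
    return sum(f.values())

print(forward("HHTHHHHHTH"))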
Speech Recognition:
In the development of a speech recognizer, we assume that we have a vocabulary of V
words to be recognized by the device. Each word is modeled by a distinct HMM. We also
assume that for each word in the vocabulary there are k occurrences of the spoken word,
which form the training set. For each spoken word, the respective observations are extracted through some appropriate representation of the spectral and temporal characteristics of the word. Thus the implementation of speech recognition consists of two basic steps:
1. For each word v in the vocabulary, an HMM is constructed and the model parameters (A, B, π) are calculated using the Baum-Welch algorithm (arbitrary parameter values are taken at the start; these are then used to calculate new parameter values, and this process continues until the change in values becomes negligible). Thus we have V distinct HMMs created for the vocabulary.
2. For each unknown word to be recognized, the observation sequence is extracted, its likelihood under each of the V word models is computed using the forward algorithm, and the word whose model gives the highest likelihood is selected, as sketched below.
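A minimal sketch of this recognition step, assuming each word model is stored as plain data as in the earlier sketches; the word_models dictionary, its two one-state models, and all numbers are hypothetical:

def forward(model, obs):
    """P(obs | model) via the forward recursion."""
    states = model["states"]
    f = {k: model["start"][k] * model["emit"][k][obs[0]] for k in states}
    for x in obs[1:]:
        f = {l: sum(f[k] * model["trans"][k][l] for k in states) * model["emit"][l][x]
             for l in states}
    return sum(f.values())

# Hypothetical one-state acoustic models over a toy two-symbol alphabet.
word_models = {
    "yes": {"states": ["s1"], "start": {"s1": 1.0},
            "trans": {"s1": {"s1": 1.0}}, "emit": {"s1": {"a": 0.9, "b": 0.1}}},
    "no":  {"states": ["s1"], "start": {"s1": 1.0},
            "trans": {"s1": {"s1": 1.0}}, "emit": {"s1": {"a": 0.2, "b": 0.8}}},
}

def recognize(obs):
    """Return the vocabulary word whose HMM gives obs the highest likelihood."""
    return max(word_models, key=lambda w: forward(word_models[w], obs))

print(recognize("aaab"))  # the 'yes' model assigns this the higher likelihood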
Conclusion
The use of Markov models and HMMs in pattern recognition is best described through examples and applications, and in this article we have tried to show how such analysis can be used to detect patterns in symbol strings. Hidden Markov models were originally developed in the field of speech recognition, and they can be adapted and applied to more difficult pattern recognition problems. Thus Markov models and HMMs may help new, highly efficient, and robust pattern recognition techniques emerge.