
Assignment I

Pattern Recognition and Neural Networks
ME (SP), EE Dept.
Sr. No. 04-03-01-35-41-13-1-10668

Document Classifier: Naive Bayes Method

1. Problem definition:

(i) Implement a naive Bayes classifier with a bag-of-words representation.
(ii) Estimate the model for the above implementation using MLE and a Bayesian method.

2. Implementation:

(i) The document is represented as a feature vector of dimension 43880, with every feature component assumed independent. Each component is the number of times the corresponding word appears in the document. The Bayes classifier assigns the feature vector to the class with the maximum a posteriori probability. The prior is taken to be equal for each class (of the given documents, 150 are from class one, wheat, and the rest are from the other class, i.e. corn). Using the priors and the class-conditional density of each class, we compute the (log) likelihood ratio. Further, a 0-1 cost function is assumed.
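
A minimal sketch of this decision rule (a Python illustration, not the original code; with equal priors the log prior ratio vanishes from the log-likelihood ratio):

    import numpy as np

    def classify(x, log_p1, log_p2):
        # x: bag-of-words count vector of length 43880
        # log_p1, log_p2: log class-conditional word probabilities
        # With equal priors and 0-1 cost, the MAP rule reduces to the sign
        # of the log-likelihood ratio: sum_j x_j * (log p1_j - log p2_j).
        llr = np.dot(x, log_p1 - log_p2)
        return 1 if llr >= 0 else 2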
(ii)

To implement such a classifier we need to estimate the class-conditional densities. Two approaches were adopted (a brief code sketch of both follows item c below):

a. ML estimation of the class-conditional densities (nj/n, where n is the total number of words in the training set and nj is the number of times word j occurs in the training set): in this case we considered the set of training documents (75 documents) as a bag of words, and estimating the density is equivalent to finding the frequency of each word (i.e. feature component).
b. Bayesian estimation ((nj + aj)/(n + a0), where we choose the aj, with a0 = 43880, as parameters of the prior probabilities for the words). One can assume higher prior probabilities for the words "wheat" and "corn" (i.e. given no data, we assume a priori that a document about wheat will contain the word "wheat" more often), based on the knowledge that we need to classify documents which are about either wheat or corn. So:

Under class 1: aj = 32125 for j = 42833, else aj = 0.25.
Under class 2: aj = 21417 for j = 9332, else aj = 0.5.
c. With the Bayesian estimate, if we get a novel feature vector containing a word that was not in our training sample, we still assign it a probability, i.e. 0.25/(n + a0) under class 1 or 0.5/(n + a0) under class 2. This was not the case with MLE, where we included a feature component in the likelihood ratio only if it had nonzero probability under both classes.
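
As referenced above, a minimal Python sketch of the two estimates, assuming counts holds the summed word counts over a class's training documents (the word indices are taken from the assignment; 0- vs 1-based indexing is an assumption here):

    import numpy as np

    a0 = 43880.0                                     # prior strength, as chosen above
    a1 = np.full(43880, 0.25); a1[42833] = 32125.0   # prior under class 1 (wheat)
    a2 = np.full(43880, 0.5);  a2[9332]  = 21417.0   # prior under class 2 (corn)

    def mle_estimate(counts):
        # (a) ML estimate: n_j / n. Words unseen in training get probability 0.
        return counts / counts.sum()

    def bayes_estimate(counts, a):
        # (b) Bayesian (posterior-mean) estimate: (n_j + a_j) / (n + a0).
        return (counts + a) / (counts.sum() + a0)

    # (c) Behaviour on a word unseen in training (hypothetical counts):
    counts = np.zeros(43880); counts[0] = 100.0   # only word 0 observed
    mle_estimate(counts)[1]          # 0.0 -> excluded from the likelihood ratio
    bayes_estimate(counts, a1)[1]    # 0.25 / (100 + 43880) -> still positive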
(iii)

The samples for training were drawn randomly from the given set, and in each case the performance of the classifier was tested on the remaining samples. The plots below show the errors on the test data under various independent draws of training samples, for various sizes of the training set. Multiple runs were used to see the effect of the random choice of training data.
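
A sketch of this experimental loop, continuing the sketches above (names such as run_experiment, X, y, est1, est2 are illustrative, not from the original; it assumes strictly positive word probabilities, as the Bayesian estimate provides, while the MLE variant would additionally drop words with zero probability under either class, as in (c) above):

    import numpy as np

    def run_experiment(X, y, train_sizes, n_runs, est1, est2, seed=0):
        # X: (n_docs, 43880) count matrix; y: labels in {1, 2}
        rng = np.random.default_rng(seed)
        errors = np.zeros((len(train_sizes), n_runs))
        for i, m in enumerate(train_sizes):
            for r in range(n_runs):
                idx = rng.permutation(len(y))   # an independent random draw
                tr, te = idx[:m], idx[m:]       # train on m docs, test on the rest
                log_p1 = np.log(est1(X[tr][y[tr] == 1].sum(axis=0)))
                log_p2 = np.log(est2(X[tr][y[tr] == 2].sum(axis=0)))
                pred = np.array([classify(x, log_p1, log_p2) for x in X[te]])
                errors[i, r] = np.mean(pred != y[te])
        return errors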

(iv) In the next set of plots the training set was fixed at 75 documents, taken randomly from the given data. The error was computed for the classifier implemented in each case (both for the MLE model and the Bayesian-estimated model). The following plots show that the choice of training data among the available data does matter in the estimation, and hence in the classifier design. The question is one of choosing the data that best describes the conditional densities.
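
In terms of the sketch above, this experiment corresponds to a call like the following (n_runs = 20 is a hypothetical choice, not from the original):

    # Fixed training size of 75 documents, repeated over independent random draws
    errs = run_experiment(X, y, train_sizes=[75], n_runs=20,
                          est1=lambda c: bayes_estimate(c, a1),
                          est2=lambda c: bayes_estimate(c, a2))
    # The spread of errs across runs shows how sensitive the estimated densities,
    # and hence the classifier, are to which 75 documents are drawn for training.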
