ME(SP)
EE dept
Sr no-04-03-01-35-41-13-1-10668
1.
Problem definition:(i)
(ii)
2.
(i)
Under class 1
aj = 32125 for j = 42833; otherwise aj = 0.25.
Under class 2
aj = 21417 for j = 9332; otherwise aj = 0.5.
c. In this case, if we get a novel feature vector containing a word
that was not in our training sample, we still assign that word a
probability, i.e. 0.25/(n+a0) under class 1 or 0.5/(n+a0) under
class 2. This was not the case with MLE, where a feature component
was included in the likelihood ratio only if it had nonzero
probability under both classes.
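The difference between the two estimates for an unseen word can be sketched as follows. This is a minimal illustration, assuming the Bayesian estimate is the usual posterior mean (nj + aj)/(n + a0) with a0 equal to the sum of all aj. The word counts, the five-word vocabulary, and the function names are hypothetical and chosen only to keep the arithmetic small; only the value aj = 0.25 is taken from the class 1 setting above.

```python
def mle_estimate(counts, word, n):
    """Maximum likelihood estimate: relative frequency of the word in the class."""
    return counts.get(word, 0) / n


def bayes_estimate(counts, word, n, alpha, a0):
    """Posterior-mean estimate (nj + aj) / (n + a0); assumed form, see lead-in."""
    return (counts.get(word, 0) + alpha[word]) / (n + a0)


# Hypothetical toy vocabulary and class-1 training counts.
vocabulary = ["signal", "filter", "noise", "matrix", "novel"]
counts = {"signal": 3, "filter": 1}            # "novel" never appears in training
n = sum(counts.values())                        # total word count in the class

# Every word gets the default prior weight 0.25 in this toy setup.
alpha = {word: 0.25 for word in vocabulary}
a0 = sum(alpha.values())

p_mle = mle_estimate(counts, "novel", n)        # zero: word unseen in training
p_bayes = bayes_estimate(counts, "novel", n, alpha, a0)  # equals 0.25 / (n + a0)

# The smoothed estimates still form a valid distribution over the vocabulary.
total_mass = sum(bayes_estimate(counts, w, n, alpha, a0) for w in vocabulary)
```

With the MLE the unseen word has probability zero, so it has to be dropped from the likelihood ratio, whereas the smoothed estimate keeps a small positive value and the word can contribute under both classes.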
(iv)
The training samples were drawn randomly from the given set, and in
each case the performance of the classifier was tested on the
remaining samples. The following plots show the error on the test data
for several independent draws of the training samples at various
training-set sizes. Multiple runs were used to see the effect of
choosing the training data randomly.
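The evaluation procedure described above can be sketched as a repeated random hold-out loop. This is only an outline under stated assumptions: the function name holdout_errors, the fit/predict interface, the majority-label stand-in classifier, and the toy data are all hypothetical and are not the classifier or data used in this report.

```python
import random


def holdout_errors(data, train_size, n_runs, fit, predict, seed=0):
    """Return the test error rate for n_runs independent random splits.

    data is a list of (features, label) pairs; the samples not drawn for
    training are used for testing, as in the procedure described above.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n_runs):
        shuffled = data[:]              # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        train, test = shuffled[:train_size], shuffled[train_size:]
        model = fit(train)
        wrong = sum(1 for x, y in test if predict(model, x) != y)
        errors.append(wrong / len(test))
    return errors


# Hypothetical stand-in classifier: always predict the majority training label.
def fit_majority(train):
    labels = [y for _, y in train]
    return max(set(labels), key=labels.count)


def predict_majority(model, x):
    return model


# Hypothetical toy data: 30 samples, labels 1 and 2.
toy_data = [(i, 1 if i % 3 else 2) for i in range(30)]
errors = holdout_errors(toy_data, train_size=10, n_runs=5,
                        fit=fit_majority, predict=predict_majority)
```

Calling the function for several values of train_size and plotting the resulting error lists gives the kind of error-versus-training-size picture discussed here; the spread across runs at a fixed size reflects the effect of the random draw.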
(v)
In the next set of plots the training set size was fixed at 75
documents, drawn randomly from the given data. The error was computed
for the classifier implemented in each case (both for the MLE model
and the Bayesian estimated model). The following plots show that the
choice of training data among the available data does matter in the
estimation, and hence in the classifier design. The question is one of
choosing the data that best describes the class-conditional densities.