ABSTRACT:
In the health domain, a critical issue is the prediction of disease at an early stage. Prediction
of disease depends mainly on the experience of the physician, so many machine learning
approaches have been applied to it. Existing approaches concentrate on either
prediction or feature selection. The aim of this paper is to present
the effect of data size and feature set on the prediction of disease in the health domain using
Naive Bayes. The results show how each attribute, or combination of attributes, behaves on
datasets of different sizes.
1. INTRODUCTION
In the health domain, diagnosis of disease is a very challenging task. Earlier prediction can
be made based on some lab tests. Using the lab test report, the physician decides whether
the patient has the disease or not. However, prediction of disease by a physician depends
mainly on experience. If the physician has more experience, then he may predict well; if the
physician has less experience, then he may predict wrongly. To overcome this problem,
machine learning has many approaches, such as KNN, SVM, and ANN, to
given the class, and it assumes that no

Where P(c|x) is the posterior probability of the class (high-risk or low-risk) given the
predictors, calculated as in (2); P(c) is the prior probability of the class; P(x|c) is the
likelihood, which is the probability of the predictor given the class; and P(x) is the

particular class. This is defined as

F-Measure (F): the harmonic mean of the precision value p and the recall value r.
The F-Measure is calculated with respect to a particular class. It is defined as

$$F = \frac{2 \cdot r \cdot p}{r + p}$$
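The F-Measure definition can be illustrated with a minimal Python sketch. The function name `f_measure`, the class labels, and the toy prediction lists are assumptions made for illustration only; they are not data or code from this paper.

```python
def f_measure(y_true, y_pred, positive):
    """Return (precision, recall, F-measure) with respect to one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of recall (r) and precision (p): F = 2*r*p / (r + p)
    return precision, recall, 2 * recall * precision / (recall + precision)


# Hypothetical labels, used only to exercise the formula.
y_true = ["diabetic", "diabetic", "non-diabetic", "non-diabetic", "diabetic"]
y_pred = ["diabetic", "non-diabetic", "non-diabetic", "diabetic", "diabetic"]
p, r, f = f_measure(y_true, y_pred, positive="diabetic")
print(p, r, f)
```

Calling the function once per class gives the per-class F-Measure; the zero checks only guard against division by zero when a class is never predicted or never present.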
accuracy. Patients are classified into one of two classes: (i) diabetic or (ii) non-diabetic.
We use 10-fold cross-validation in training and then apply the model to our testing set.

Consider 100% of the data, meaning the full set of instances; then, for each possible subset
of features (for example, if there are 8 attributes then 2^8

In this paper, we examine the effect of data size on the feature set using the Naive Bayes
classifier.

For each attribute set, for example, if there are 8 attributes, then 2^8 - 1 = 256 - 1 = 255
subsets are possible. For each subset a graph is generated, which shows the performance of each

The following figure shows the effect of features (0, 1, 2, 4) on different data sizes.
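The subset count above can be checked with a short Python sketch that enumerates every non-empty subset of attribute indices and pairs each one with a data size. The helper names `nonempty_subsets` and `take_fraction`, the list of fractions, and the placeholder dataset of 200 rows are assumptions for illustration; only the 100% case and the 8-attribute example come from the text, and the classifier training step is left out.

```python
from itertools import combinations


def nonempty_subsets(n_attributes):
    """Yield every non-empty subset of attribute indices 0..n_attributes-1."""
    for size in range(1, n_attributes + 1):
        yield from combinations(range(n_attributes), size)


def take_fraction(rows, fraction):
    """Return the leading `fraction` of the rows (1.0 means all instances)."""
    return rows[: round(len(rows) * fraction)]


subsets = list(nonempty_subsets(8))
print(len(subsets))  # 2**8 - 1 = 255

# Hypothetical data sizes and placeholder rows; not values from the paper.
fractions = [0.25, 0.5, 0.75, 1.0]
rows = list(range(200))

# One experiment per (data size, feature subset) pair; a classifier would be
# trained and scored for each pair at this point.
experiments = [
    (len(take_fraction(rows, fraction)), subset)
    for fraction in fractions
    for subset in subsets
]
print(len(experiments))  # 4 * 255 = 1020
```

The feature set (0, 1, 2, 4) mentioned in the text is one of the 255 tuples produced, so its performance can be looked up for each data size once a score is attached to every pair.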
common attribute affect for prediction also

Fuzzy Logic Approach for