Escolar Documentos
Profissional Documentos
Cultura Documentos
NCAICN-2013, PRMITR,Badnera
257
International Journal Of Computer Science And Applications Vol. 6, No.2, Apr 2013 ISSN: 0974-1011 (Open Access)
Some standards and terms: For above confusion matrix, true positives for class
a=’YES’ is 33 while false positives is 72 whereas, for
1. True positive (TP): If the outcome from a class b=’NO’, true positives is 170 and false positives
prediction is p and the actual value is also p, is 25 i.e. diagonal elements of matrix 33+170 =203
then it is called a true positive. represents the correct instances classified and other
2. False positive (FP): However if the actual elements 25+72 = 97 represents the incorrect
value is n then it is said to be a false instances.
positive.
3. Precision and recall: Precision is the fraction True positive rate = diagonal element/ sum of
of retrieved instances that are relevant, while relevant row
recall is the fraction of relevant instances
that are retrieved. Both precision and recall False positive rate = non-diagonal element/ sum of
are therefore based on an understanding and relevant row
measure of relevance. Precision can be seen
as a measure of exactness or quality,
whereas recall is a measure of completeness Hence,
or quantity. Recall is nothing but the true TP rate for class a = 33/(33+72) = 0.314
positive rate for the class [10][11].
FP rate for class a = 25/(25+170) = 0.128
TP rate for class b = 170/(25+170) = 0.871
In this paper, we have used weka (Waikato
environment for knowledge analysis) tool for FP rate for class b = 72/(33+72) = 0.685
comparison of naive bayes and J48 algorithm and Average TP rate = 0.677
calculating efficiency based on accuracy regarding
correct and incorrect instances generated with Average FP rate = 0.491
NCAICN-2013, PRMITR,Badnera
258
International Journal Of Computer Science And Applications Vol. 6, No.2, Apr 2013 ISSN: 0974-1011 (Open Access)
Cost/Benefit analysis :
Fig. 3 shows the Cost of Naïve Bayes for class YES
= 105
NCAICN-2013, PRMITR,Badnera
259
International Journal Of Computer Science And Applications Vol. 6, No.2, Apr 2013 ISSN: 0974-1011 (Open Access)
REFERENCES
[1] Margaret H. Danham,S. Sridhar, ” Data mining,
Introductory and Advanced Topics”, Person
education , 1st ed., 2006
[2] Wenke Lee, Salvatore J. Stolfo, Kui W. Mok, “A
Fig. 4 Cost analysis for class NO Data Mining Framework for Building Intrusion
Though results of cost and benefit analysis for Detection Models”
mortgage is same for J48 and Naïve Bayes, but in [3] Aman Kumar Sharma, Suruchi Sahni, “A
case of gender cost/benefit analysis for J48 is lesser Comparative Study of Classification Algorithms for
than that of Naïve Bayes as shown below. Spam Email Data Analysis”, IJCSE, Vol. 3, No. 5,
2011, pp. 1890-1895
NCAICN-2013, PRMITR,Badnera
260
International Journal Of Computer Science And Applications Vol. 6, No.2, Apr 2013 ISSN: 0974-1011 (Open Access)
[7] Seongwook Youn, Dennis McLeod, “A [10] Hong Hu, Jiuyong Li, Ashley Plank, “A
Comparative Study for Email Classification” Comparative Study of Classification Methods for
Microarray Data Analysis”, published in CRPIT, Vol.
[8] Anshul Goyal, Rajni Mehta, “Performance 61, 2006.
Comparison of Naïve Bayes and J48 Classification
Algorithms”, IJAER, Vol. 7, No. 11, 2012, pp. [11] Milan Kumari, Sunila Godara, “Comparative
Study of Data Mining Classification Methods in
[9] Xiang yang Li, Nong Ye, “A Supervised cardiovascular Disease Prediction”, IJCST, Vol. 2,
Clustering and Classification Algorithm for Mining Issue 2, 2011, pp. 304-308
Data With Mixed Variables”, IEEE
TRANSACTIONS ON SYSTEMS, MAN, AND [12]http://www.cs.bme.hu/~kiskat/adatb/bank-data-
CYBERNETICS, Vol. 36, No. 2, 2006, pp. 396-406 train.arff
NCAICN-2013, PRMITR,Badnera
261