IRIS DATASET CLUSTERING AND SPAM EMAIL SEPARATION
Using K-Means and J48

AKASH.M.SHAHZAD
Roll No:172303
Iris dataset clustering using K-Means (Output)
=== Run information ===

Scheme: weka.clusterers.SimpleKMeans -init 0 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 -N 3 -A "weka.core.EuclideanDistance -R first-last" -I 500 -num-slots 1 -S 10

Relation: iris

Instances: 150

Attributes: 5

sepallength

sepalwidth

petallength

petalwidth

Ignored:

class

Test mode: Classes to clusters evaluation on training data

=== Clustering model (full training set) ===

kMeans

======

Number of iterations: 6

Within cluster sum of squared errors: 6.998114004826762

Initial starting points (random):

Cluster 0: 6.1,2.9,4.7,1.4

Cluster 1: 6.2,2.9,4.3,1.3

Cluster 2: 6.9,3.1,5.1,2.3
Missing values globally replaced with mean/mode

Final cluster centroids:

                           Cluster#
Attribute     Full Data          0          1          2
                (150.0)     (61.0)     (50.0)     (39.0)
=========================================================
sepallength      5.8433     5.8885      5.006     6.8462
sepalwidth        3.054     2.7377      3.418     3.0821
petallength      3.7587     4.3967      1.464     5.7026
petalwidth       1.1987      1.418      0.244     2.0795

Time taken to build model (full training data) : 0.01 seconds
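The centroid table can be read as an assignment rule: an instance belongs to the cluster whose centroid is nearest. The following is a minimal Python sketch of that rule, not Weka's implementation. It assumes plain Euclidean distance on the raw values (Weka's EuclideanDistance normalizes attribute ranges unless told otherwise, so its distances differ in scale), and `sample` is a hypothetical measurement that does not come from the run above.

```python
import math

# Final centroids copied from the table above, in the order
# (sepallength, sepalwidth, petallength, petalwidth).
CENTROIDS = {
    0: (5.8885, 2.7377, 4.3967, 1.418),
    1: (5.006, 3.418, 1.464, 0.244),
    2: (6.8462, 3.0821, 5.7026, 2.0795),
}


def nearest_cluster(instance):
    """Return the id of the centroid closest to `instance`.

    Assumption: raw (unnormalized) Euclidean distance.
    """
    return min(CENTROIDS, key=lambda c: math.dist(instance, CENTROIDS[c]))


# Hypothetical measurement used only for illustration.
sample = (5.1, 3.5, 1.4, 0.2)
print(nearest_cluster(sample))
```

A short-petalled flower like `sample` lands in cluster 1, the cluster with the smallest petal centroid.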

=== Model and evaluation on training set ===

Clustered Instances

0 61 ( 41%)

1 50 ( 33%)

2 39 ( 26%)
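The cluster sizes and centroids reported above are mutually consistent: the "Full Data" column is the size-weighted mean of the three cluster centroids, and the percentages are the sizes divided by 150. A small Python check, using only numbers printed in this output (the variable names are illustrative):

```python
# Cluster sizes and final centroids as printed in the output above.
SIZES = {0: 61, 1: 50, 2: 39}
CENTROIDS = {
    0: (5.8885, 2.7377, 4.3967, 1.418),
    1: (5.006, 3.418, 1.464, 0.244),
    2: (6.8462, 3.0821, 5.7026, 2.0795),
}

total = sum(SIZES.values())  # 150 instances

# Size-weighted mean of the centroids, one value per attribute.
full_data = [
    sum(SIZES[c] * CENTROIDS[c][i] for c in SIZES) / total
    for i in range(4)
]

# Share of instances per cluster, rounded to whole percent as Weka prints it.
shares = {c: round(100 * n / total) for c, n in SIZES.items()}

print([round(v, 4) for v in full_data])
print(shares)
```

Up to rounding of the printed centroids, `full_data` reproduces the Full Data column (5.8433, 3.054, 3.7587, 1.1987), and `shares` gives 41, 33 and 26 percent.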

Class attribute: class

Classes to Clusters:

  0  1  2  <-- assigned to cluster
  0 50  0 | Iris-setosa
 47  0  3 | Iris-versicolor
 14  0 36 | Iris-virginica

Cluster 0 <-- Iris-versicolor

Cluster 1 <-- Iris-setosa

Cluster 2 <-- Iris-virginica

Incorrectly clustered instances : 17.0 11.3333 %
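The error figure follows from the classes-to-clusters matrix: each cluster is mapped to one class, and every instance outside its cluster's class counts as incorrect. The sketch below recomputes it in Python. The brute-force search over all mappings is an assumption made for illustration and is not necessarily how Weka finds the mapping; it arrives at the same assignment that the output reports.

```python
from itertools import permutations

# Rows: true class. Columns: cluster 0, 1, 2 (from the matrix above).
COUNTS = {
    "Iris-setosa": [0, 50, 0],
    "Iris-versicolor": [47, 0, 3],
    "Iris-virginica": [14, 0, 36],
}


def matched(mapping):
    """Instances whose class equals the class mapped to their cluster.

    mapping[k] is the class name assigned to cluster k.
    """
    return sum(COUNTS[cls][k] for k, cls in enumerate(mapping))


# Try every one-to-one cluster-to-class mapping and keep the best one.
best = max(permutations(COUNTS), key=matched)

total = sum(sum(row) for row in COUNTS.values())
incorrect = total - matched(best)
error_pct = 100 * incorrect / total

print(best)
print(incorrect, round(error_pct, 4))
```

The 17 errors are the 3 versicolor instances placed in cluster 2 plus the 14 virginica instances placed in cluster 0; setosa is separated perfectly.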

Graphical representation (Overview): [figures (1) to (25)]

Spam mail separation using J48 tree
=== Run information ===

Scheme: weka.classifiers.meta.FilteredClassifier -F "weka.filters.unsupervised.attribute.StringToWordVector -R first-last -W 1000 -prune-rate -1.0 -N 0 -stemmer weka.core.stemmers.NullStemmer -stopwords-handler weka.core.stopwords.Null -M 1 -tokenizer \"weka.core.tokenizers.WordTokenizer -delimiters \\\" \\\\r\\\\n\\\\t.,;:\\\\\\\'\\\\\\\"()?!\\\"\"" -S 1 -W weka.classifiers.trees.J48 -- -C 0.25 -M 2

Relation: F__Spam_Mails

Instances: 5180

Attributes: 2

text

@@class@@

Test mode: user supplied test set: size unknown (reading incrementally)

=== Classifier model (full training set) ===

FilteredClassifier using weka.classifiers.trees.J48 -C 0.25 -M 2 on data filtered through weka.filters.unsupervised.attribute.StringToWordVector -R 1 -W 1000 -prune-rate -1.0 -N 0 -stemmer weka.core.stemmers.NullStemmer -stopwords-handler weka.core.stopwords.Null -M 1 -tokenizer "weka.core.tokenizers.WordTokenizer -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\""
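The filter settings describe how each mail's text becomes numeric attributes before J48 sees it: the text is split on the listed delimiter characters (space, \r, \n, \t and . , ; : ' " ( ) ? !), no stemming or stopword removal is applied, and each kept word becomes one attribute. The Python sketch below only imitates that idea. The function names and the tiny `vocabulary` are hypothetical, word presence (0/1) is assumed as the attribute value, and Weka's own selection of the words to keep (the -W 1000 setting) is not reproduced.

```python
import re

# Delimiter characters taken from the -delimiters option above.
DELIMITERS = " \r\n\t.,;:'\"()?!"
_SPLIT = re.compile("[" + re.escape(DELIMITERS) + "]+")


def tokenize(text):
    """Split text on the delimiter set; case is kept, empty tokens dropped."""
    return [tok for tok in _SPLIT.split(text) if tok]


def to_word_vector(text, vocabulary):
    """Return 1/0 per vocabulary word depending on whether it occurs in text.

    Assumption: word presence rather than counts.
    """
    present = set(tokenize(text))
    return [1 if word in present else 0 for word in vocabulary]


# Hypothetical vocabulary and message, for illustration only.
vocabulary = ["free", "meeting", "2004"]
message = "Claim your free prize (offer ends 2004)!"
print(tokenize(message))
print(to_word_vector(message, vocabulary))
```

A tree test on such an attribute then reads as "does this word occur in the mail", which is the form of the attribute-greater-than-zero tests in a J48 tree built on this data.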

Filtered Header

@relation 'F__Spam_Mails-weka.filters.unsupervised.attribute.StringToWordVector-R1-W1000-prune-rate-1.0-N0-stemmerweka.core.stemmers.NullStemmer-stopwords-handlerweka.core.stopwords.Null-M1-tokenizerweka.core.tokenizers.WordTokenizer -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\"'

@attribute @@class@@ {ham,norm,spam}

2004 > 0: spam (122.0/1.0)

Number of Leaves : 145

Size of the tree : 289


Time taken to build model: 92.97 seconds

=== Evaluation on test set ===

Time taken to test model on supplied test set: 1.14 seconds

=== Summary ===

Correctly Classified Instances 5096 98.3784 %

Incorrectly Classified Instances 84 1.6216 %

Kappa statistic 0.9609

Mean absolute error 0.0193

Root mean squared error 0.0983

Relative absolute error 7.0001 %

Root relative squared error 26.4624 %

Total Number of Instances 5180

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                 0.987    0.023    0.990      0.987    0.989      0.961    0.993     0.995     ham
                 0.625    0.000    1.000      0.625    0.769      0.790    0.991     0.737     norm
                 0.979    0.013    0.968      0.979    0.973      0.962    0.994     0.987     spam
Weighted Avg.    0.984    0.020    0.984      0.984    0.984      0.961    0.993     0.993

=== Confusion Matrix ===

    a    b    c   <-- classified as
 3623    0   49 |    a = ham
    3    5    0 |    b = norm
   32    0 1468 |    c = spam
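The summary figures can be recomputed from the confusion matrix alone. The Python sketch below derives accuracy, per-class precision and recall, and Cohen's kappa using the standard textbook formulas; it is a cross-check on the numbers above, not Weka's evaluation code, and the variable names are illustrative.

```python
LABELS = ["ham", "norm", "spam"]

# Rows: actual class. Columns: predicted class (from the matrix above).
MATRIX = [
    [3623, 0, 49],
    [3, 5, 0],
    [32, 0, 1468],
]

n = len(LABELS)
total = sum(sum(row) for row in MATRIX)
correct = sum(MATRIX[i][i] for i in range(n))
accuracy = correct / total

row_totals = [sum(row) for row in MATRIX]                        # actual counts
col_totals = [sum(row[j] for row in MATRIX) for j in range(n)]   # predicted counts

recall = {LABELS[i]: MATRIX[i][i] / row_totals[i] for i in range(n)}
precision = {LABELS[i]: MATRIX[i][i] / col_totals[i] for i in range(n)}

# Cohen's kappa: agreement beyond what the marginals would give by chance.
expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
kappa = (accuracy - expected) / (1 - expected)

print(correct, total, round(100 * accuracy, 4))
print({k: round(v, 3) for k, v in precision.items()})
print({k: round(v, 3) for k, v in recall.items()})
print(round(kappa, 4))
```

The results agree with the reported 5096 of 5180 correct (98.3784 %) and kappa 0.9609. They also show why the norm class scores a recall of only 0.625: it has just 8 instances, and 3 of them are classified as ham.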
