Iris Dataset Clustering and Spam Email Separation

IRIS DATASET CLUSTERING AND SPAM EMAIL SEPARATION
Using K-Means and J48
AKASH.M.SHAHZAD
Roll No: 172303
Iris dataset clustering using K-Means (Output)
=== Run information ===
Scheme: weka.clusterers.SimpleKMeans -init 0 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 -N 3 -A "weka.core.EuclideanDistance -R first-last" -I 500 -num-slots 1 -S 10
Relation: iris
Instances: 150
Attributes: 5
    sepallength
    sepalwidth
    petallength
    petalwidth
Ignored:
    class
Test mode: Classes to clusters evaluation on training data
=== Clustering model (full training set) ===
kMeans
======
Number of iterations: 6
Within cluster sum of squared errors: 6.998114004826762
Initial starting points (random):
Cluster 0: 6.1,2.9,4.7,1.4
Cluster 1: 6.2,2.9,4.3,1.3
Cluster 2: 6.9,3.1,5.1,2.3
Missing values globally replaced with mean/mode
Final cluster centroids:
                          Cluster#
Attribute     Full Data          0          1          2
                (150.0)     (61.0)     (50.0)     (39.0)
=========================================================
sepallength      5.8433     5.8885      5.006     6.8462
sepalwidth        3.054     2.7377      3.418     3.0821
petallength      3.7587     4.3967      1.464     5.7026
petalwidth       1.1987      1.418      0.244     2.0795
Time taken to build model (full training data) : 0.01 seconds
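As a sanity check on the centroid table, the size-weighted average of the three cluster centroids should reproduce the Full Data column, because each centroid is the mean of the instances in its cluster. The following is a minimal Python sketch of that check. It uses only the numbers printed above; the variable and function names are illustrative and are not part of Weka.

```python
# Cluster sizes and centroids copied from the Weka output above.
sizes = [61, 50, 39]
centroids = {
    "sepallength": [5.8885, 5.006, 6.8462],
    "sepalwidth": [2.7377, 3.418, 3.0821],
    "petallength": [4.3967, 1.464, 5.7026],
    "petalwidth": [1.418, 0.244, 2.0795],
}
# "Full Data" column as reported by Weka.
full_data = {
    "sepallength": 5.8433,
    "sepalwidth": 3.054,
    "petallength": 3.7587,
    "petalwidth": 1.1987,
}


def weighted_mean(values, weights):
    """Average of values, weighted by the cluster sizes."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


recomputed = {
    name: weighted_mean(values, sizes) for name, values in centroids.items()
}

for name, value in recomputed.items():
    print(f"{name:12s} recomputed={value:.4f} reported={full_data[name]:.4f}")
```

Small differences in the last digit are expected, since the printed centroids are already rounded.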
=== Model and evaluation on training set ===
Clustered Instances
0 61 ( 41%)
1 50 ( 33%)
2 39 ( 26%)
Class attribute: class
Classes to Clusters:
  0  1  2  <-- assigned to cluster
  0 50  0 | Iris-setosa
 47  0  3 | Iris-versicolor
 14  0 36 | Iris-virginica
Cluster 0 <-- Iris-versicolor
Cluster 1 <-- Iris-setosa
Cluster 2 <-- Iris-virginica
Incorrectly clustered instances : 17.0 11.3333 %
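The error figure above can be reproduced from the classes-to-clusters matrix. The sketch below tries every one-to-one assignment of clusters to classes and keeps the one with the most matching instances. This brute-force search is an assumption made for illustration, not a description of Weka's internal procedure; for three clusters it only has to examine six assignments.

```python
from itertools import permutations

classes = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
# Rows are classes, columns are clusters 0, 1, 2 (from the output above).
counts = [
    [0, 50, 0],
    [47, 0, 3],
    [14, 0, 36],
]
total = sum(sum(row) for row in counts)


def correct_for(mapping):
    """Instances matched when cluster k is labelled with class mapping[k]."""
    return sum(counts[cls][cluster] for cluster, cls in enumerate(mapping))


# Pick the cluster-to-class assignment with the most matching instances.
best = max(permutations(range(len(classes))), key=correct_for)
errors = total - correct_for(best)
error_pct = 100.0 * errors / total

for cluster, cls in enumerate(best):
    print(f"Cluster {cluster} <-- {classes[cls]}")
print(f"Incorrectly clustered instances: {errors} ({error_pct:.4f} %)")
```

The 17 errors are the 3 Iris-versicolor instances placed in cluster 2 plus the 14 Iris-virginica instances placed in cluster 0.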
Graphical representation (Overview):

[Figures (1) to (25) not reproduced]
Spam mail separation using J48 tree
=== Run information ===
Scheme: weka.classifiers.meta.FilteredClassifier -F "weka.filters.unsupervised.attribute.StringToWordVector -R first-last -W 1000 -prune-rate -1.0 -N 0 -stemmer weka.core.stemmers.NullStemmer -stopwords-handler weka.core.stopwords.Null -M 1 -tokenizer \"weka.core.tokenizers.WordTokenizer -delimiters \\\" \\\\r\\\\n\\\\t.,;:\\\\\\\'\\\\\\\"()?!\\\"\"" -S 1 -W weka.classifiers.trees.J48 -- -C 0.25 -M 2
Relation: F__Spam_Mails
Instances: 5180
Attributes: 2
text
@@class@@
Test mode: user supplied test set: size unknown (reading incrementally)
=== Classifier model (full training set) ===
FilteredClassifier using weka.classifiers.trees.J48 -C 0.25 -M 2 on data filtered through weka.filters.unsupervised.attribute.StringToWordVector -R 1 -W 1000 -prune-rate -1.0 -N 0 -stemmer weka.core.stemmers.NullStemmer -stopwords-handler weka.core.stopwords.Null -M 1 -tokenizer "weka.core.tokenizers.WordTokenizer -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\""
Filtered Header
@relation 'F__Spam_Mails-weka.filters.unsupervised.attribute.StringToWordVector-R1-W1000-prune-rate-1.0-N0-stemmerweka.core.stemmers.NullStemmer-stopwords-handlerweka.core.stopwords.Null-M1-tokenizerweka.core.tokenizers.WordTokenizer -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\"'
@attribute @@class@@ {ham,norm,spam}
2004 > 0: spam (122.0/1.0)
Number of Leaves : 145
Size of the tree : 289

Time taken to build model: 92.97 seconds
=== Evaluation on test set ===
Time taken to test model on supplied test set: 1.14 seconds
=== Summary ===
Correctly Classified Instances        5096        98.3784 %
Incorrectly Classified Instances        84         1.6216 %
Kappa statistic                          0.9609
Mean absolute error                      0.0193
Root mean squared error                  0.0983
Relative absolute error                  7.0001 %
Root relative squared error             26.4624 %
Total Number of Instances             5180
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
               0.987    0.023    0.990      0.987   0.989      0.961  0.993     0.995     ham
               0.625    0.000    1.000      0.625   0.769      0.790  0.991     0.737     norm
               0.979    0.013    0.968      0.979   0.973      0.962  0.994     0.987     spam
Weighted Avg.  0.984    0.020    0.984      0.984   0.984      0.961  0.993     0.993
=== Confusion Matrix ===
    a    b    c   <-- classified as
 3623    0   49 |    a = ham
    3    5    0 |    b = norm
   32    0 1468 |    c = spam
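Several of the figures reported above follow directly from this confusion matrix. The Python sketch below recomputes the accuracy, the kappa statistic, and the per-class TP rate (recall), FP rate, precision, and F-measure using their standard definitions. It is an independent check on the printed numbers, not Weka code, and the variable names are illustrative.

```python
labels = ["ham", "norm", "spam"]
# Rows are actual classes, columns are predicted classes (from the output above).
matrix = [
    [3623, 0, 49],
    [3, 5, 0],
    [32, 0, 1468],
]
n = len(labels)
total = sum(sum(row) for row in matrix)
row_totals = [sum(row) for row in matrix]  # actual count per class
col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # predicted
correct = sum(matrix[i][i] for i in range(n))

accuracy = correct / total
# Agreement expected by chance, from the row and column marginals.
expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
kappa = (accuracy - expected) / (1 - expected)

per_class = {}
for i, label in enumerate(labels):
    tp = matrix[i][i]
    recall = tp / row_totals[i]  # TP rate
    precision = tp / col_totals[i]
    fp_rate = (col_totals[i] - tp) / (total - row_totals[i])
    f_measure = 2 * precision * recall / (precision + recall)
    per_class[label] = (recall, fp_rate, precision, f_measure)

print(f"Correctly classified: {correct} ({100 * accuracy:.4f} %)")
print(f"Kappa statistic: {kappa:.4f}")
for label, (recall, fp_rate, precision, f_measure) in per_class.items():
    print(f"{label:5s} TP={recall:.3f} FP={fp_rate:.3f} "
          f"P={precision:.3f} F={f_measure:.3f}")
```

The low recall for norm (0.625) comes from only 8 instances of that class in the test set, 3 of which were classified as ham.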
