
Chapter 6. Classification and Prediction

Classification by decision tree induction

Bayesian classification

Rule-based classification

Classification by back propagation

Support Vector Machines (SVM)

Associative classification
Rule Generation from Decision Tree
Decision tree classifiers are a popular method of
classification because they are easy to understand.

However, a decision tree can become large and difficult
to interpret.

Compared with a decision tree, IF-THEN rules
may be easier for humans to understand, particularly
when the decision tree is very large.
Rule Generation from Decision Tree
Rules are easier to understand than large trees
One rule is created for each path from the root to a
leaf
Each attribute-value pair along a path forms a
conjunction: the leaf holds the class prediction
Rule Generation from Decision Tree
Example: Rule extraction from our buys_computer decision-tree
IF age = young AND student = no THEN buys_computer = no
IF age = young AND student = yes THEN buys_computer = yes
IF age = mid-age THEN buys_computer = yes
IF age = old AND credit_rating = fair THEN buys_computer = yes
IF age = old AND credit_rating = excellent THEN buys_computer = no
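The path-to-rule conversion above can be sketched in Python. The nested-dict tree encoding and the extract_rules helper are illustrative assumptions (not from the slides); the leaf labels follow the buys_computer data used later in this deck.

```python
# Assumed encoding: {attribute: {value: subtree_or_class_label}}.
# The tree below is the classic buys_computer decision tree.
tree = {
    "age": {
        "young":   {"student": {"no": "no", "yes": "yes"}},
        "mid-age": "yes",
        "old":     {"credit_rating": {"fair": "yes", "excellent": "no"}},
    }
}

def extract_rules(node, conditions=()):
    """Walk every root-to-leaf path; each path becomes one IF-THEN rule."""
    if not isinstance(node, dict):  # leaf: holds the class prediction
        conds = " AND ".join(f"{a} = {v}" for a, v in conditions)
        return [f"IF {conds} THEN buys_computer = {node}"]
    (attr, branches), = node.items()
    rules = []
    for value, subtree in branches.items():
        rules += extract_rules(subtree, conditions + ((attr, value),))
    return rules

for rule in extract_rules(tree):
    print(rule)
```

Each attribute-value pair on a path contributes one conjunct, so the recursion simply accumulates conditions until it reaches a leaf.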
Rule Generation from Decision Tree
Rules are mutually exclusive and exhaustive.

Mutually exclusive: rules cannot conflict,
because no two rules will be triggered for the
same tuple.

Exhaustive: there is one rule for each possible
attribute-value combination, so the set of
rules does not require a default rule.
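These two properties can be checked mechanically. In this sketch the dict-based rule representation and the matching_rules helper are assumptions for illustration: for any tuple, exactly one rule should fire.

```python
# Assumed representation: each rule is (dict of attribute tests, predicted label).
rules = [
    ({"age": "young", "student": "no"},            "no"),
    ({"age": "young", "student": "yes"},           "yes"),
    ({"age": "mid-age"},                           "yes"),
    ({"age": "old", "credit_rating": "fair"},      "yes"),
    ({"age": "old", "credit_rating": "excellent"}, "no"),
]

def matching_rules(tuple_, rules):
    """Return the labels of all rules whose tests the tuple satisfies."""
    return [label for conds, label in rules
            if all(tuple_.get(a) == v for a, v in conds.items())]

tuple_ = {"age": "old", "student": "yes", "credit_rating": "fair"}
matches = matching_rules(tuple_, rules)
assert len(matches) == 1  # exactly one rule fires: exclusive and exhaustive
print(matches[0])         # -> yes
```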
Chapter 6. Classification and Prediction

Classification by decision tree induction

Bayesian classification

Rule-based classification

Classification by back propagation

Support Vector Machines (SVM)

Associative classification
Association Classification
Association rules show strong associations between
items that occur frequently in a given data set.

The discovery of association rules is based on
frequent itemset mining.

The general idea of association classification is
that we can search for strong associations between
frequent patterns and class labels.
Association Classification
All association rules must satisfy certain
criteria regarding their:

Support: the proportion of the data set
that they actually represent

Confidence: their accuracy
Association Classification
Association rules can have any number of items in the
rule antecedent (left-hand side) and any number of items
in the rule consequent (right-hand side).

However, in association classification we are only
interested in association rules of the form
p1 ^ p2 ^ ... => A_class, where the consequent is a
single class label.
Association Classification
age = young ^ credit = ok => buys_computer = yes
[support = 20%, confidence = 93%]

The percentage of tuples in D satisfying the rule's
antecedent and having class label C is called the
support of R.

A support of 20% for this association rule means that
20% of the customers in D are young, have an OK credit
rating, and belong to the class buys_computer = yes.

The confidence of R is its accuracy: among the tuples
that satisfy the antecedent, the percentage that
actually have class label C.
Association Classification

age      income   student   credit_rating   buys_computer
<=30     high     no        fair            no
<=30     high     no        excellent       no
31..40   high     no        fair            yes
>40      medium   no        fair            yes
>40      low      yes       fair            yes
>40      low      yes       excellent       no
31..40   low      yes       excellent       yes
<=30     medium   no        fair            no
<=30     low      yes       fair            yes
>40      medium   yes       fair            yes
<=30     medium   yes       excellent       yes
31..40   medium   no        excellent       yes
31..40   high     yes       fair            yes
>40      medium   no        excellent       no

Regard each row as one transaction.
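The support and confidence measures defined earlier can be computed directly on this table. A minimal sketch, assuming the 14 tuples are stored as (age, income, student, credit_rating, buys_computer) rows; the support_confidence helper and the example rule are illustrative:

```python
data = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]

def support_confidence(antecedent, label, rows):
    """support = fraction of all rows matching antecedent AND label;
    confidence = among rows matching the antecedent, fraction with label."""
    matches = [r for r in rows if all(r[i] == v for i, v in antecedent.items())]
    hits = [r for r in matches if r[4] == label]
    return len(hits) / len(rows), len(hits) / len(matches)

# Example rule: age = 31..40 => buys_computer = yes
sup, conf = support_confidence({0: "31..40"}, "yes", data)
print(f"support = {sup:.2%}, confidence = {conf:.2%}")  # support = 28.57%, confidence = 100.00%
```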
Association Classification

Item encoding:
A1: age <= 30
A2: age between 31 and 40
A3: age > 40
B1: high income
B2: medium income
B3: low income
C1: not a student
C2: student
D1: fair credit
D2: excellent credit
Y: buys computer
N: doesn't buy computer

Encoded transactions:
A1 B1 C1 D1 N
A1 B1 C1 D2 N
A2 B1 C1 D1 Y
A3 B2 C1 D1 Y
A3 B3 C2 D1 Y
A3 B3 C2 D2 N
A2 B3 C2 D2 Y
A1 B2 C1 D1 N
A1 B3 C2 D1 Y
A3 B2 C2 D1 Y
A1 B2 C2 D2 Y
A2 B2 C1 D2 Y
A2 B1 C2 D1 Y
A3 B2 C1 D2 N
Association Classification
Let the minimum support be 20%.
14 * 20% = 2.8, so the minimum support count is 3.
Association Classification
1-item support counts:
A1: 5   A2: 4   A3: 5
B1: 4   B2: 6   B3: 4
C1: 7   C2: 7
D1: 8   D2: 6
Y: 9    N: 5

Every item meets the minimum support count of 3.
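The counting step can be sketched with Python's Counter over the encoded transactions above; the literal transaction strings are copied from the previous slide.

```python
from collections import Counter

# Count 1-item occurrences and keep those meeting the minimum
# support count of 3 (= ceil of 14 * 20%).
transactions = [
    "A1 B1 C1 D1 N", "A1 B1 C1 D2 N", "A2 B1 C1 D1 Y", "A3 B2 C1 D1 Y",
    "A3 B3 C2 D1 Y", "A3 B3 C2 D2 N", "A2 B3 C2 D2 Y", "A1 B2 C1 D1 N",
    "A1 B3 C2 D1 Y", "A3 B2 C2 D1 Y", "A1 B2 C2 D2 Y", "A2 B2 C1 D2 Y",
    "A2 B1 C2 D1 Y", "A3 B2 C1 D2 N",
]
counts = Counter(item for t in transactions for item in t.split())
frequent = {item: n for item, n in counts.items() if n >= 3}
print(frequent)  # here every item is frequent, matching the counts on the slide
```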
Association Classification
Generate all 2-item combinations:

That is a lot of combinations!
Association Classification
A2 => Y
Support: 4/14   Confidence(A2 => Y): 4/4 = 100%

A1 ^ C1 => N
Support: 3/14   Confidence(A1 ^ C1 => N): 3/3 = 100%

A1 ^ C2 => Y
Support: 2/14, which is below the minimum support
count of 3, so this pattern is pruned.
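The whole mining step can be sketched end to end, assuming the encoded transactions above. This illustrative version enumerates 1- and 2-item antecedents, applies the minimum support count of 3, and prints only the fully confident rules; the helper names and 100%-confidence filter are assumptions, not from the slides.

```python
from collections import Counter
from itertools import combinations

transactions = [
    "A1 B1 C1 D1 N", "A1 B1 C1 D2 N", "A2 B1 C1 D1 Y", "A3 B2 C1 D1 Y",
    "A3 B3 C2 D1 Y", "A3 B3 C2 D2 N", "A2 B3 C2 D2 Y", "A1 B2 C1 D1 N",
    "A1 B3 C2 D1 Y", "A3 B2 C2 D1 Y", "A1 B2 C2 D2 Y", "A2 B2 C1 D2 Y",
    "A2 B1 C2 D1 Y", "A3 B2 C1 D2 N",
]

ante_counts = Counter()  # how often each antecedent occurs at all
rule_counts = Counter()  # how often it occurs together with the class label
for t in transactions:
    *items, label = t.split()
    for k in (1, 2):
        for ante in combinations(items, k):
            ante_counts[ante] += 1
            rule_counts[(ante, label)] += 1

MIN_COUNT = 3  # minimum support count from the previous slide
for (ante, label), n in sorted(rule_counts.items()):
    if n >= MIN_COUNT and n == ante_counts[ante]:
        print(f"{' ^ '.join(ante)} => {label}"
              f"  [support {n}/14, confidence {n}/{ante_counts[ante]}]")
```

Running this reproduces the slide's rules (A2 => Y with confidence 4/4, A1 ^ C1 => N with confidence 3/3) and drops A1 ^ C2 => Y, whose support count of 2 falls below the threshold.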
