
Decision Tree

Under the Guidance of


Dr. C.N. Ravi Kumar
HOD, Dept. of Computer Science
SJCE, Mysore

Nitin Jain
5VZ07SSZ04
IV Sem Software Engineering
SJCE Mysore
1
Content
• Decision Tree
• The Construction Principle
• The Best Split
• Splitting Indices
• Splitting Criteria
• Decision Tree Construction Algorithms
• CART
• ID3
• C4.5

2
Content

• CHAID
• Decision Tree Construction with Presorting
• Rain Forest
• Approximate Method
• CLOUDS
• BOAT
• Pruning Technique
• Integration of Pruning and Construction
• Conclusion

3
Decision Tree
• It is a classification scheme which generates a tree and a set of rules, representing the model of the different classes, from a given data set.
• Training set: used to derive the classifier
• Test set: used to measure the accuracy of the classifier

4
Example of a decision tree

5
Rules to classify

6
A Decision Tree

7
Rule 1 = 50%
Rule 2 = 50%
Rule 3 = 66%

8
Example 2

9
Advantages of Decision Tree
Classification
• Easy to understand.
• Can handle both numerical & categorical attributes.
• Provides a clear indication of which fields are important for prediction and classification.

10
Shortcomings of Decision Tree
Classification

• Some decision tree algorithms can deal only with binary-valued attributes.
• The process of growing the tree is computationally expensive.

11
Principles to Construct the Tree
• Construction Phase: the initial decision tree is constructed in this phase.
• Pruning Phase: in this stage the lower branches are removed to improve the performance.
• Processing the pruned tree to further improve its performance.

12
OVERFIT
• A tree T is said to overfit the training data if there exists some other tree T’ which is a simplification of T, such that T has a smaller error than T’ over the training set but T’ has a smaller error than T over the entire distribution of instances.

13
OVERFIT Leads to
• Overfitting leads to difficulties when there is noise in the training data, or when the number of training examples is too small. It results in:

• An incorrect model
• Increased space requirements and more computational resources
• Unnecessary collection of features
• A model that is difficult to comprehend
14
The Best Split
• Evaluation of the splits for each attribute and selection of the best split; determination of the splitting attribute.
• Determination of the splitting condition on the selected splitting attribute, and
• Partitioning of the data using the best split.

15
The Best Split
• The splitting depends on whether the domain of the attribute is numerical or categorical.
• The best split is the one which does the best job of separating the records into groups in which a single class predominates.
• But how do we determine the BEST one?

16
Splitting INDICES
• ENTROPY
• INFORMATION FOR PARTITION
• GAIN
• GINI INDEX

17
ENTROPY
• Entropy provides an information-theoretic approach to measuring the goodness of a split.
• Entropy(P) = -[p1 log(p1) + p2 log(p2) + … + pn log(pn)]
• Entropy gives an idea of which attribute the tree should split on.
• The class values are ‘yes’ and ‘no’ in our example.

18
Example Entropy
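The figure from the original slide is not reproduced here. As a minimal sketch, assuming the classic 14-record weather training set (9 ‘yes’ and 5 ‘no’ records), the entropy can be computed in C as follows:

#include <stdio.h>
#include <math.h>

/* Entropy of a class distribution given by counts[0..n-1], in bits. */
double entropy(const int counts[], int n)
{
    int total = 0;
    double e = 0.0;
    for (int i = 0; i < n; i++)
        total += counts[i];
    for (int i = 0; i < n; i++) {
        if (counts[i] == 0)
            continue;                          /* treat 0*log(0) as 0 */
        double p = (double) counts[i] / total;
        e -= p * log2(p);
    }
    return e;
}

int main(void)
{
    int counts[2] = {9, 5};                    /* assumed example: 9 'yes', 5 'no' records */
    printf("Entropy = %.3f bits\n", entropy(counts, 2));   /* prints about 0.940 */
    return 0;
}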

19
INFORMATION FOR PARTITION
• If T is partitioned, based on the value of a non-class attribute X, into sets T1, T2, …, Tn, then the information needed to identify the class of an element of T becomes the weighted average of the information needed to identify the class of an element of Ti.
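The formula itself appears as a figure in the original slide and is not reproduced; the standard weighted-average form (as used in ID3/C4.5) is:

Info(X,T) = Σi (|Ti| / |T|) × Info(Ti)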

20
GAIN
• The information gain represents the difference between the information needed to identify an element of T and the information needed to identify an element of T after the value of attribute X has been obtained.
GAIN(X,T) = Info(T) - Info(X,T)
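As an added worked example (not from the original slides), using the classic 14-record weather data: Info(T) ≈ 0.940 bits and Info(Outlook,T) ≈ 0.694 bits, so GAIN(Outlook,T) ≈ 0.940 - 0.694 = 0.246 bits, which makes Outlook the most informative first split.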

21
GAIN RATIO
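The formula on the original slide is not reproduced; the standard definition is:

GainRatio(X,T) = Gain(X,T) / SplitInfo(X,T), where
SplitInfo(X,T) = -Σi (|Ti| / |T|) log2(|Ti| / |T|)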

GINI INDEX
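The formula on the original slide is not reproduced; the standard definition, for class proportions p1, …, pn at a node T, is:

Gini(T) = 1 - Σj pj^2

and for a binary split of T into T1 and T2:

Gini_split(T) = (|T1|/|T|) Gini(T1) + (|T2|/|T|) Gini(T2)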

22
CART (Classification And Regression Tree)
• CART (Classification And Regression Tree) is one of the popular methods of building decision trees in the machine learning community.
• CART builds a binary decision tree by splitting the records at each node according to a function of a single attribute.
• CART uses the gini index for determining the best split (a small sketch follows this slide).
• The initial split produces two nodes, each of which we now attempt to split in the same manner as the root node.
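A minimal sketch, in C, of how a candidate binary split could be scored with the gini index (an illustration, not CART’s actual implementation):

/* Gini index of a node whose class distribution is counts[0..n-1]. */
double gini(const int counts[], int n)
{
    int total = 0;
    double g = 1.0;
    for (int i = 0; i < n; i++)
        total += counts[i];
    if (total == 0)
        return 0.0;
    for (int i = 0; i < n; i++) {
        double p = (double) counts[i] / total;
        g -= p * p;
    }
    return g;
}

/* Weighted gini of a binary split into a left and a right child;
   the candidate split with the lowest value is the best one. */
double gini_split(const int left[], const int right[], int n)
{
    int nl = 0, nr = 0;
    for (int i = 0; i < n; i++) {
        nl += left[i];
        nr += right[i];
    }
    return (nl * gini(left, n) + nr * gini(right, n)) / (double)(nl + nr);
}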
23
CART
• At the end of the tree-growing process, every record of the training set has been assigned to some leaf of the full decision tree. Each leaf can now be assigned a class and an error rate. The error rate of a leaf node is the percentage of incorrect classifications at that node. The error rate of an entire decision tree is a weighted sum of the error rates of all the leaves: each leaf’s contribution to the total is the error rate at that leaf multiplied by the probability that a record will end up there.
24
ID3
• Quinlan introduced ID3, the Iterative Dichotomizer 3, for constructing decision trees from data.
• In ID3, each node corresponds to a splitting attribute and each arc is a possible value of that attribute. At each node the splitting attribute is selected to be the most informative among the attributes not yet considered in the path from the root.
• Entropy is used to measure how informative a node is. The algorithm uses the criterion of information gain to determine the goodness of a split. The attribute with the greatest information gain is taken as the splitting attribute, and the data set is split for all distinct values of the attribute (a minimal sketch of this selection step follows).
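A minimal sketch of this selection step in C, assuming the information gain of each attribute has already been computed (the names are illustrative, not from ID3’s code):

/* Return the index of the not-yet-used attribute with the largest
   information gain, i.e., ID3's choice of splitting attribute;
   returns -1 if no attribute remains. */
int best_attribute(const double gain[], const int used[], int n_attributes)
{
    int best = -1;
    for (int i = 0; i < n_attributes; i++)
        if (!used[i] && (best == -1 || gain[i] > gain[best]))
            best = i;
    return best;
}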
25
C4.5
• C4.5 is an extension of ID3 that accounts for
unavailable values, continuous attribute value
ranges, pruning of decision trees and rule
derivation.
• In building a decision tree, we can deal with training sets that have records with unknown attribute values by evaluating the gain, or the gain ratio, for an attribute using only those records where the attribute value is available.
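As an added note (not from the original slides), Quinlan’s C4.5 then scales the gain computed over the known-value records by the fraction F of records for which the value of X is known:

Gain(X,T) = F × (Info(Tknown) - Info(X,Tknown)), where Tknown is the subset of T with a known value of X.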
26
CHAID

• CHAID, proposed by Kass in 1980, is a derivative of AID (Automatic Interaction Detection), proposed by Hartigan in 1975. CHAID attempts to stop growing the tree before overfitting occurs, whereas the above algorithms generate a fully grown tree and then carry out pruning as a post-processing step. In that sense, CHAID avoids the pruning phase.
27
Rain Forest

• RainForest, like other algorithms, is a top-down decision tree induction scheme. It is similar to SPRINT, but instead of attribute lists it uses a different structure called the AVC-set. The attribute list has one tuple corresponding to each record of the training data set, whereas the AVC-set has one entry for each distinct value of an attribute. Thus, the size of the AVC-set for any attribute is much smaller (a sketch of such an entry follows).
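A minimal sketch of what an AVC-set entry might look like in C (illustrative names and a two-class assumption, not RainForest’s actual data structures):

#define NUM_CLASSES 2                 /* assumed: two-class problem         */

/* One AVC-set entry: a distinct attribute value with its class counts.
   The AVC-set of an attribute is an array of these, one per distinct
   value, which is far smaller than a per-record attribute list. */
struct avc_entry {
    int attribute_value;              /* a distinct value of the attribute  */
    int class_count[NUM_CLASSES];     /* records of each class having it    */
};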
28
CLOUDS (Classification of Large or Out-of-core Data Sets)
• CLOUDS is a kind of approximate version of the SPRINT method. It also uses a breadth-first strategy to build the decision tree. CLOUDS uses the gini index for evaluating the split index of the attributes. There are different criteria for splitting the categorical and numerical attributes. The method of finding the gini index for the categorical attributes is the same as that used in the SPRINT algorithm; categorical attributes do not require any sorting, as the count matrix is used instead. We discuss the steps in which CLOUDS differs from the SPRINT method.

29
BOAT (Bootstrap Optimistic Algorithm for Tree Construction)

• BOAT (Bootstrap Optimistic Algorithm for Tree Construction) is another approximate algorithm based on sampling.
• As the name suggests, it is based on a well-known statistical principle called bootstrapping.
• We take a very large sample D from the training data set T, so that D fits into the main memory.
• Now, using the bootstrapping technique, we take many small samples with replacement from D as D1, D2, …, Dk and construct, by any of the known methods, decision trees DT1, DT2, …, DTk, respectively, for each of these samples (a minimal resampling sketch follows this slide).
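A minimal sketch of the resampling step in C (an illustration; records are represented by their indices, and the names are not from BOAT’s implementation):

#include <stdlib.h>

/* Draw a bootstrap sample Di of size m, with replacement, from the
   in-memory sample D of size n. */
void bootstrap_sample(const int D[], int n, int Di[], int m)
{
    for (int j = 0; j < m; j++)
        Di[j] = D[rand() % n];   /* independent draws, so a record may repeat */
}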
30
BOAT

• From these trees, called bootstrap trees, we construct a single decision tree for the sample data set D. We call this tree the sample tree.

31
Pruning Technique
• The decision tree built using the training set deals correctly with most of the records in the training set. This is inherent to the way it was built.
• Overfitting is one of the unavoidable situations that may arise from such construction. Moreover, if the tree becomes very deep, lopsided or bushy, the rules yielded by the tree become unmanageable and difficult to comprehend. Therefore, a pruning phase is invoked after the construction to arrive at a (sort of) optimal decision tree.
32
COST OF ENCODING TREE
• The cost of encoding the structure of the tree: the structure of the tree can be encoded using a single bit per node, in order to specify whether a node of the tree is an internal node or a leaf node. A non-leaf node can be represented by 1 and a leaf node by 0.
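For example (an added illustration), one full binary tree with three internal nodes and four leaves is encoded in preorder as the 7-bit string 1100100.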

33
Now that we have formulated the cost of a tree, we next turn our attention to computing the minimum-cost subtree of the tree constructed in the building phase. The main idea is that if the minimum cost at a node is the cost of the node itself, then splitting the node would result in a higher cost, and hence the subtree at this node is removed (a pruning sketch follows).
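A minimal sketch of this bottom-up comparison in C, under a simplified cost model in which an internal node costs one structure bit plus the cost of its children, and each node carries a precomputed cost of being encoded as a leaf (an illustration, not the exact encoding described above):

#include <stddef.h>

struct node {
    struct node *left, *right;   /* both NULL for a leaf node              */
    double leaf_cost;            /* cost of encoding this node as a leaf   */
};

/* Return the minimum encoding cost of the subtree rooted at t, removing
   the subtree whenever keeping it is not cheaper than a single leaf. */
double prune(struct node *t)
{
    if (t->left == NULL && t->right == NULL)
        return t->leaf_cost;
    double split_cost = 1.0 + prune(t->left) + prune(t->right);
    if (t->leaf_cost <= split_cost) {
        t->left = t->right = NULL;   /* splitting costs more, so prune here */
        return t->leaf_cost;
    }
    return split_cost;
}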

34
Integration of Pruning and
Construction
• A natural question that arises is whether we can incorporate the pruning step within the construction process of the decision tree. In other words, instead of pruning being a post-processing step after the tree is fully built, is it possible to identify, right at the time of construction, a node that need not be expanded because it is likely to be pruned eventually?
35
SUMMARY: AN IDEAL
ALGORITHM
• After studying different strategies for building a decision tree for a large database, let us try to build an ideal decision tree algorithm based on these principles. We are given a training data set.
• First of all, we must discover the concept hierarchy of each of the attributes. This technique is essentially similar to dimension modelling.
• This study helps in removing certain attributes from the database.
36
AN IDEAL ALGORITHM
• As we move up this hierarchy, some of the numerical attributes may turn into categorical ones.
• Then we identify the numerical and categorical attributes. Just having a numerical data type does not necessarily imply that the attribute is numerical (for example, pincode).
• We take a large sample of the training set so that the sample fits into memory. Sampling involves select operations (tuple sampling) and project operations (attribute removal).
• Construct the attribute list for each attribute, and a generalized attribute list for each attribute, based on the dimension hierarchy.

37
/* Decision tree for the 'play'/'no play' example, written as a chain of rules. */
#include <stdio.h>

int main(void)
{
    int h, t, windy, out;

    printf("1 for Sunny\n2 for Overcast\n3 for Rain\n");
    scanf("%d", &out);
    printf("ENTER THE TEMP, HUMIDITY: ");
    scanf("%d%d", &t, &h);          /* temperature is read but not used by the tree */
    printf("ENTER 1 for WINDY, 0 for not windy: ");
    scanf("%d", &windy);

    if (out == 1 && h <= 75)
        printf("PLAY\n");
    else if (out == 1 && h > 75)
        printf("NO PLAY\n");
    else if (out == 2)
        printf("PLAY\n");
    else if (out == 3 && windy == 0)
        printf("PLAY\n");
    else if (out == 3 && windy == 1)
        printf("NO PLAY\n");
    return 0;
}

38
/* The same decision tree, with each outlook branch handled in its own block. */
#include <stdio.h>

int main(void)
{
    int humi, windy;
    int out;

    printf("Enter 1 for Sunny\n2 for Overcast\n3 for Rain\n");
    scanf("%d", &out);

    if (out == 1) {
        printf("ENTER THE HUMIDITY: ");
        scanf("%d", &humi);
        if (humi > 75)
            printf("no play\n");
        else
            printf("play\n");
    }

    if (out == 2) {
        printf("play\n");
    }

    if (out == 3) {
        printf("enter wind condition: 1 for windy, 0 for no wind: ");
        scanf("%d", &windy);
        if (windy == 0)
            printf("play\n");
        else
            printf("no play\n");
    }
    return 0;
}

39
BIBLIOGRAPHY
• Data Mining & Warehousing by Arun K. Pujari
• SPRINT: A Scalable Parallel Classifier for Data Mining by John Shafer, Rakesh Agrawal and Manish Mehta, IBM Almaden Research Center
• Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation, from Introduction to Data Mining by Tan, Steinbach and Kumar
• Advanced Data Mining Techniques by David L. Olson and Dursun Delen
• Data Mining: Concepts and Techniques by Jiawei Han and Micheline Kamber
• Google, Wikipedia and other web resources

40
THANK YOU

41
