ISSN: 2319:2682
Priyanka Gupta
Assistant Professor
Department of Computer Science
University of Delhi
New Delhi, India
shinepriyanka@gmail.com
common decision tree algorithms in data mining, which use different splitting criteria for splitting the node at each level to form a homogeneous node (i.e., one whose objects all belong to the same category).
II DECISION TREE
A decision tree is a classifier expressed as a recursive partition of the instance space. The decision tree consists of nodes that form a rooted tree, meaning it is a directed tree with a node called the root that has no incoming edges. All other nodes have exactly one incoming edge. A node with outgoing edges is called an internal or test node. All other nodes are called leaves (also known as terminal or decision nodes). In a decision tree, each internal node splits the instance space into two or more sub-spaces according to a certain discrete function of the input attribute values. Each leaf is assigned to one class representing the most appropriate target value. Instances are classified by navigating them from the root of the tree down to a leaf, according to the outcome of the tests along the path [2].
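
To make this structure concrete, here is a minimal Python sketch of such a tree (the names TreeNode and classify are ours, purely for illustration, not from the paper): an internal node stores the attribute it tests and one child per attribute value, a leaf stores a class label, and classification walks from the root down to a leaf.

class TreeNode:
    def __init__(self, attribute=None, label=None):
        self.attribute = attribute   # attribute tested at an internal (test) node
        self.children = {}           # maps an attribute value to a child TreeNode
        self.label = label           # class label if this node is a leaf

    def is_leaf(self):
        return self.label is not None

def classify(node, instance):
    # Navigate from the root down to a leaf along the outcomes of the tests.
    while not node.is_leaf():
        node = node.children[instance[node.attribute]]
    return node.label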
A decision tree is a tree in which each branch node represents a choice between a number of alternatives, and each leaf node represents a classification or decision, as shown in Fig: 1 below, in which each attribute name is represented by a rectangle, attribute values by ovals, and the class (decision) by diamonds. Decision trees are commonly used for gaining information for the purpose of decision-making. A decision tree starts with a root node; from this node, the tree is grown by splitting each node recursively according to the decision tree learning algorithm. The final result is a decision tree in which each branch represents a possible scenario of decision and its outcome [2].
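
The recursive splitting described above can be sketched as follows, building on the TreeNode sketch shown earlier. This is only an outline: choose_best_attribute is a hypothetical helper that would rank attributes by a splitting criterion such as information gain, and the base cases for an exhausted attribute list are omitted for brevity.

def build_tree(examples, attributes, target="Play"):
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:
        # The node is homogeneous: all examples share one class, so make a leaf.
        return TreeNode(label=labels[0])
    # Hypothetical helper: pick the attribute whose split scores best.
    best = choose_best_attribute(examples, attributes, target)
    node = TreeNode(attribute=best)
    remaining = [a for a in attributes if a != best]
    for value in sorted({e[best] for e in examples}):
        subset = [e for e in examples if e[best] == value]
        node.children[value] = build_tree(subset, remaining, target)
    return node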
Table: 1 The weather data set

Day  Outlook   Temperature  Humidity  Wind    Play (CLASS)
1    Sunny     Hot          High      Weak    No
2    Sunny     Hot          High      Strong  No
3    Overcast  Hot          High      Weak    Yes
4    Rain      Mild         High      Weak    Yes
5    Rain      Cool         Normal    Weak    Yes
6    Rain      Cool         Normal    Strong  No
7    Overcast  Cool         Normal    Strong  Yes
8    Sunny     Mild         High      Weak    No
9    Sunny     Cool         Normal    Weak    Yes
10   Rain      Mild         Normal    Weak    Yes
11   Sunny     Mild         Normal    Strong  Yes
12   Overcast  Mild         High      Strong  Yes
13   Overcast  Hot          Normal    Weak    Yes
14   Rain      Mild         High      Strong  No
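
For the snippets that follow, the data set of Table 1 can be written as plain Python records (column names exactly as in the table; the variable names are ours):

COLUMNS = ["Outlook", "Temperature", "Humidity", "Wind", "Play"]
ROWS = [
    ("Sunny",    "Hot",  "High",   "Weak",   "No"),
    ("Sunny",    "Hot",  "High",   "Strong", "No"),
    ("Overcast", "Hot",  "High",   "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny",    "Mild", "High",   "Weak",   "No"),
    ("Sunny",    "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "Normal", "Weak",   "Yes"),
    ("Sunny",    "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High",   "Strong", "Yes"),
    ("Overcast", "Hot",  "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Strong", "No"),
]
DATA = [dict(zip(COLUMNS, row)) for row in ROWS]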
Figure: 1 Decision tree for the weather data set. The root node OUTLOOK splits into Sunny, Overcast and Rain: the Sunny branch tests HUMIDITY (High → NO, Normal → YES), the Overcast branch is the leaf YES, and the Rain branch tests WIND (Strong → NO, Weak → YES).
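
The tree of Figure 1 transcribes directly into nested conditionals (a sketch; the function name play_tennis is ours, not from the paper):

def play_tennis(outlook, humidity, wind):
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    if outlook == "Rain":
        return "Yes" if wind == "Weak" else "No"

print(play_tennis("Sunny", "High", "Weak"))   # day 1 of Table 1 -> "No"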
Consider three candidate nodes with the following class distributions:

Node  CLASS NO  CLASS YES
N1    0         6
N2    1         5
N3    3         3

Figure: 2 Computing Impurity Measures

The three impurity measures evaluate these nodes as follows (with the usual convention that 0·log2(0) is taken as 0):

Entropy:
  Node N1: -(0/6)log2(0/6) - (6/6)log2(6/6) = 0
  Node N2: -(1/6)log2(1/6) - (5/6)log2(5/6) = 0.650
  Node N3: -(3/6)log2(3/6) - (3/6)log2(3/6) = 1

Gini Index:
  Node N1: 1 - (0/6)² - (6/6)² = 0
  Node N2: 1 - (1/6)² - (5/6)² = 0.278
  Node N3: 1 - (3/6)² - (3/6)² = 0.5

Classification error:
  Node N1: 1 - max[0/6, 6/6] = 0
  Node N2: 1 - max[1/6, 5/6] = 0.167
  Node N3: 1 - max[3/6, 3/6] = 0.5

Table: 2
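
The values in Table 2 can be checked with a few lines of Python (the function names are ours, for illustration only):

import math

def entropy(counts):
    total = sum(counts)
    # Terms with a zero count are skipped, i.e. 0*log2(0) is taken as 0.
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

def gini(counts):
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def classification_error(counts):
    return 1 - max(counts) / sum(counts)

for name, counts in [("N1", [0, 6]), ("N2", [1, 5]), ("N3", [3, 3])]:
    print(name, round(entropy(counts), 3), round(gini(counts), 3),
          round(classification_error(counts), 3))
# Reproduces Table 2: N1 0.0 0.0 0.0 / N2 0.65 0.278 0.167 / N3 1.0 0.5 0.5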
Characteristic      ID3                       C4.5                      CART
Splitting Criteria  Information Gain          Gain Ratio                Twoing Criteria
Attribute type      Handles only              Handles both categorical  Handles both categorical
                    categorical values        and numeric values        and numeric values
Missing values      Does not handle           Handles missing values    Handles missing values
                    missing values
Pruning Strategy    No pruning done           Error-based pruning       Cost-complexity pruning
                                              is used                   is used
Outlier Detection   Susceptible to outliers   Susceptible to outliers   Can handle outliers

Table: 3 Comparison of the ID3, C4.5 and CART algorithms
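
To illustrate the first row of the table, here is a sketch of ID3's information gain and C4.5's gain ratio, written against records like the DATA list built from Table 1 (the function names are ours; a computation of CART's other criterion, the Gini index, was shown after Table 2):

import math
from collections import Counter

def _entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target="Play"):
    # ID3's criterion: entropy reduction obtained by splitting on `attribute`.
    base = _entropy([r[target] for r in rows])
    n = len(rows)
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / n * _entropy(subset)
    return base - remainder

def gain_ratio(rows, attribute, target="Play"):
    # C4.5's criterion: information gain normalised by the split information,
    # which penalises attributes with many distinct values.
    split_info = _entropy([r[attribute] for r in rows])
    return information_gain(rows, attribute, target) / split_info if split_info else 0.0

# On the Table 1 data: information_gain(DATA, "Outlook") ≈ 0.246 and
# gain_ratio(DATA, "Outlook") ≈ 0.156.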
V CONCLUSION
In this paper we have studied the basic properties of the decision tree algorithms, which provides a better understanding of these algorithms. We can apply them to different types of data sets having different types of values and properties, and we can attain the best result by knowing which algorithm will perform best on a specific type of data set.
REFERENCES
[1] Wei Peng, Juhua Chen and Haiping Zhou, An Implementation of ID3 Decision Tree Learning Algorithm, University of New South Wales, School of Computer Science & Engineering, Sydney, NSW 2032, Australia.
[2] Lior Rokach and Oded Maimon, Chapter 9: Decision Trees, Department of Industrial Engineering, Tel-Aviv University.
[3] Vipin Kumar, Pang-Ning Tan and Michael Steinbach, Introduction to Data Mining, Pearson.
[4] Ajanthan Rajalingam, Damandeep Matharu, Kobi Vinayagamoorthy, Narinderpal Ghoman, Data Mining Course Project: Income Analysis, SFWR ENG/COM SCI 4TF3, December 10, 2002.
Authors Profile

Sonia Singh received the M.Sc. degree in Computer Science from Delhi University in 2012 and the B.Sc. (H) Computer Science degree from Delhi University in 2010. She is currently working as an Assistant Professor at Delhi University.

Priyanka Gupta received the M.Sc. degree in Computer Science from Delhi University in 2012 and the B.Sc. (H) Computer Science degree from Delhi University in 2009. She is currently working as an Assistant Professor at Delhi University.