
Naïve Bayes Classifier

Ke Chen
http://intranet.cs.man.ac.uk/mlo/comp20411/
Modified and extended by Longin Jan Latecki
latecki@temple.edu

Outline
Background
Probability Basics
Probabilistic Classification
Naïve Bayes
Example: Play Tennis
Relevant Issues
Conclusions
2

Background
There are three methods to establish a classifier:

a) Model a classification rule directly
Examples: k-NN, decision trees, perceptron, SVM

b) Model the probability of class memberships given input data
Example: multi-layered perceptron with the cross-entropy cost

c) Make a probabilistic model of the data within each class
Examples: naïve Bayes, model-based classifiers

a) and b) are examples of discriminative classification
c) is an example of generative classification
b) and c) are both examples of probabilistic classification

Probability Basics
Prior, conditional and joint probability

Prior probability: P(X)

Conditional probability: P(X1|X2), P(X2|X1)

Joint probability: X = (X1, X2), P(X) = P(X1, X2)

Relationship: P(X1, X2) = P(X2|X1)P(X1) = P(X1|X2)P(X2)

Independence: P(X2|X1) = P(X2), P(X1|X2) = P(X1), P(X1, X2) = P(X1)P(X2)

Bayesian Rule

P(C|X) = P(X|C)P(C) / P(X)
Posterior = Likelihood × Prior / Evidence
4
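
As a quick numerical illustration of the rule above, here is a minimal Python sketch that computes a two-class posterior from an assumed prior and class-conditional likelihoods; all numbers are hypothetical and chosen only to show the arithmetic.

```python
# Minimal sketch of Bayes' rule: posterior = likelihood * prior / evidence.
# All numbers are hypothetical and serve only to illustrate the formula.

prior = {"C1": 0.3, "C2": 0.7}        # P(C)
likelihood = {"C1": 0.8, "C2": 0.1}   # P(X = x | C)

# Evidence P(X = x) via the law of total probability.
evidence = sum(likelihood[c] * prior[c] for c in prior)

posterior = {c: likelihood[c] * prior[c] / evidence for c in prior}
print(posterior)  # {'C1': 0.774..., 'C2': 0.225...}
```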

Example by Dieter Fox

Probabilistic Classification
Establishing a probabilistic model for classification

Discriminative model: P(C|X), C = c1, ..., cL, X = (X1, ..., Xn)

Generative model: P(X|C), C = c1, ..., cL, X = (X1, ..., Xn)

MAP classification rule

MAP: Maximum A Posteriori

Assign x to c* if P(C = c*|X = x) > P(C = c|X = x) for all c ≠ c*, c = c1, ..., cL

Generative classification with the MAP rule

Apply the Bayesian rule to convert: P(C|X) = P(X|C)P(C) / P(X) ∝ P(X|C)P(C)
8
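
A small sketch of generative classification with the MAP rule: since the evidence P(X) is the same for every class, it can be dropped and the class maximizing P(X|C)P(C) is chosen. The priors and class-conditional values below are hypothetical.

```python
# MAP classification with a generative model: choose argmax_c P(x | c) * P(c).
# The evidence P(x) is constant across classes, so it can be ignored.
# All numbers are hypothetical.

prior = {"c1": 0.5, "c2": 0.3, "c3": 0.2}                  # P(C)
class_conditional = {"c1": 0.02, "c2": 0.10, "c3": 0.05}   # P(X = x | C)

scores = {c: class_conditional[c] * prior[c] for c in prior}
c_star = max(scores, key=scores.get)
print(scores, "->", c_star)  # c2 wins with score 0.03
```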

Feature Histograms

[Figure: class-conditional feature histograms P(x) for two classes C1 and C2]
Slide by Stephen Marsland

Posterior Probability

[Figure: posterior probability P(C|x) as a function of x]
Slide by Stephen Marsland

Naïve Bayes
Bayes classification
P(C|X) ∝ P(X|C)P(C) = P(X1, ..., Xn|C)P(C)
Difficulty: learning the joint probability P(X1, ..., Xn|C)

Naïve Bayes classification

Making the assumption that all input attributes are conditionally independent given the class:
P(X1, X2, ..., Xn|C) = P(X1|X2, ..., Xn; C) P(X2, ..., Xn|C)
                     = P(X1|C) P(X2, ..., Xn|C)
                     = P(X1|C) P(X2|C) ... P(Xn|C)

MAP classification rule:
[P(x1|c*) ... P(xn|c*)] P(c*) > [P(x1|c) ... P(xn|c)] P(c), c ≠ c*, c = c1, ..., cL
11
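
A minimal sketch of the score implied by this factorization, assuming the per-attribute conditionals are stored in a lookup table keyed by (attribute, value, class); the function and table names are ours, not from the slides.

```python
# Naive Bayes score of class c for instance x: P(c) * prod_j P(x_j | c).
# 'prior' maps class -> P(c); 'cond_probs' maps (attribute, value, class) -> P(value | class).
# Both tables are assumed to have been estimated beforehand (hypothetical names).

def nb_score(x, c, prior, cond_probs):
    """Unnormalized posterior of class c for an instance x given as {attribute: value}."""
    score = prior[c]
    for attribute, value in x.items():
        score *= cond_probs[(attribute, value, c)]
    return score

# MAP rule: pick the class with the largest score, e.g.
# c_star = max(prior, key=lambda c: nb_score(x, c, prior, cond_probs))
```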

Naïve Bayes
Naïve Bayes Algorithm (for discrete input attributes)

Learning Phase: Given a training set S,
For each target value ci (ci = c1, ..., cL)
  P̂(C = ci) ← estimate P(C = ci) with the examples in S;
  For every attribute value ajk of each attribute Xj (j = 1, ..., n; k = 1, ..., Nj)
    P̂(Xj = ajk|C = ci) ← estimate P(Xj = ajk|C = ci) with the examples in S;
Output: conditional probability tables; for Xj, Nj × L elements

Test Phase: Given an unknown instance X = (a1, ..., an),
Look up the tables to assign the label c* to X if
[P̂(a1|c*) ... P̂(an|c*)] P̂(c*) > [P̂(a1|c) ... P̂(an|c)] P̂(c), c ≠ c*, c = c1, ..., cL
12
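
The learning and test phases above can be sketched compactly in Python, assuming the training set is a list of (attribute dict, class label) pairs; the function and variable names are ours, and this is only an illustrative sketch of the counting estimates, not the slides' own code.

```python
from collections import defaultdict

def learn_naive_bayes(training_set):
    """Learning phase: estimate P(C = ci) and P(Xj = ajk | C = ci) by counting."""
    class_counts = defaultdict(int)   # number of examples with label ci
    value_counts = defaultdict(int)   # number of examples with Xj = ajk and label ci
    for attributes, label in training_set:
        class_counts[label] += 1
        for attribute, value in attributes.items():
            value_counts[(attribute, value, label)] += 1
    total = len(training_set)
    prior = {c: n / total for c, n in class_counts.items()}
    cond = {(a, v, c): n / class_counts[c] for (a, v, c), n in value_counts.items()}
    return prior, cond

def classify(x, prior, cond):
    """Test phase: assign the label that maximizes P(c) * prod_j P(xj | c)."""
    best_label, best_score = None, -1.0
    for c in prior:
        score = prior[c]
        for attribute, value in x.items():
            score *= cond.get((attribute, value, c), 0.0)  # unseen value -> 0 (see slide 16)
        if score > best_score:
            best_label, best_score = c, score
    return best_label
```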

Example
Example: Play Tennis

13

Learning Phase

P(Outlook=o|Play=b)
Outlook     Play=Yes  Play=No
Sunny       2/9       3/5
Overcast    4/9       0/5
Rain        3/9       2/5

P(Temperature=t|Play=b)
Temperature  Play=Yes  Play=No
Hot          2/9       2/5
Mild         4/9       2/5
Cool         3/9       1/5

P(Humidity=h|Play=b)
Humidity    Play=Yes  Play=No
High        3/9       4/5
Normal      6/9       1/5

P(Wind=w|Play=b)
Wind        Play=Yes  Play=No
Strong      3/9       3/5
Weak        6/9       2/5

P(Play=Yes) = 9/14
P(Play=No) = 5/14
14
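
For reference, the same tables and priors written as plain Python dictionaries (exact fractions, as in the tables above); this is just an alternative representation of the learned model.

```python
from fractions import Fraction as F

# Priors
prior = {"Yes": F(9, 14), "No": F(5, 14)}

# Conditional probability tables, one entry per (attribute, value)
cond = {
    ("Outlook", "Sunny"):    {"Yes": F(2, 9), "No": F(3, 5)},
    ("Outlook", "Overcast"): {"Yes": F(4, 9), "No": F(0, 5)},
    ("Outlook", "Rain"):     {"Yes": F(3, 9), "No": F(2, 5)},
    ("Temperature", "Hot"):  {"Yes": F(2, 9), "No": F(2, 5)},
    ("Temperature", "Mild"): {"Yes": F(4, 9), "No": F(2, 5)},
    ("Temperature", "Cool"): {"Yes": F(3, 9), "No": F(1, 5)},
    ("Humidity", "High"):    {"Yes": F(3, 9), "No": F(4, 5)},
    ("Humidity", "Normal"):  {"Yes": F(6, 9), "No": F(1, 5)},
    ("Wind", "Strong"):      {"Yes": F(3, 9), "No": F(3, 5)},
    ("Wind", "Weak"):        {"Yes": F(6, 9), "No": F(2, 5)},
}
```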

Example
Test Phase

Given a new instance,
x = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)

Look up the tables:
P(Outlook=Sunny|Play=Yes) = 2/9          P(Outlook=Sunny|Play=No) = 3/5
P(Temperature=Cool|Play=Yes) = 3/9       P(Temperature=Cool|Play=No) = 1/5
P(Humidity=High|Play=Yes) = 3/9          P(Humidity=High|Play=No) = 4/5
P(Wind=Strong|Play=Yes) = 3/9            P(Wind=Strong|Play=No) = 3/5
P(Play=Yes) = 9/14                       P(Play=No) = 5/14

MAP rule:
P(Yes|x) ∝ [P(Sunny|Yes)P(Cool|Yes)P(High|Yes)P(Strong|Yes)]P(Play=Yes) = 0.0053
P(No|x)  ∝ [P(Sunny|No)P(Cool|No)P(High|No)P(Strong|No)]P(Play=No) = 0.0206

Since P(Yes|x) < P(No|x), we label x as No.
15
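
The two products above can be checked directly with exact fractions; a short sketch:

```python
from fractions import Fraction as F

# Looked-up values for x = (Sunny, Cool, High, Strong), taken from the tables on slide 14.
p_yes = F(2, 9) * F(3, 9) * F(3, 9) * F(3, 9) * F(9, 14)  # P(x|Yes) * P(Play=Yes)
p_no  = F(3, 5) * F(1, 5) * F(4, 5) * F(3, 5) * F(5, 14)  # P(x|No)  * P(Play=No)

print(float(p_yes))  # 0.00529... (the 0.0053 above)
print(float(p_no))   # 0.02057... (the 0.0206 above)
print("No" if p_no > p_yes else "Yes")  # -> No
```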

Relevant Issues
Violation of Independence Assumption

For many real-world tasks,
P(X1, ..., Xn|C) ≠ P(X1|C) ... P(Xn|C)

Nevertheless, naïve Bayes works surprisingly well anyway!

Zero conditional probability Problem

If no example contains the attribute value Xj = ajk, then P̂(Xj = ajk|C = ci) = 0.
In this circumstance, P̂(x1|ci) ... P̂(ajk|ci) ... P̂(xn|ci) = 0 during test.

For a remedy, conditional probabilities are estimated with Laplace smoothing:

P̂(Xj = ajk|C = ci) = (nc + mp) / (n + m)

nc : number of training examples for which Xj = ajk and C = ci
n  : number of training examples for which C = ci
p  : prior estimate (usually, p = 1/t for t possible values of Xj)
m  : weight given to the prior (number of "virtual" examples, m ≥ 1)
16
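
A minimal sketch of the smoothed estimate (nc + mp) / (n + m); the counts in the usage line are hypothetical and not taken from the Play-Tennis tables.

```python
def smoothed_estimate(n_c, n, p, m):
    """Laplace-smoothed estimate of P(Xj = ajk | C = ci): (nc + m*p) / (n + m)."""
    return (n_c + m * p) / (n + m)

# Hypothetical counts: the value never co-occurs with the class (n_c = 0),
# the class has n = 8 examples, the attribute has 4 possible values (p = 1/4), m = 1.
print(smoothed_estimate(n_c=0, n=8, p=1/4, m=1))  # about 0.028 instead of 0
```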

Homework
Redo the test on Slide 15 using the formula on
Slide 16 with m=1.
Compute P(Play=Yes|x) and P(Play=No|x) with
m=0 and with m=1 for
x=(Outlook=Overcast, Temperature=Cool, Humidity=High, Wind=Strong)

Does the result change?

17

Relevant Issues
Continuous-valued Input Attributes

A continuous attribute has infinitely many possible values, so its conditional probability cannot be tabulated.

Conditional probability is instead modeled with the normal distribution:

P̂(Xj|C = ci) = 1 / (√(2π) σji) · exp( -(Xj - μji)² / (2 σji²) )

μji : mean (average) of the attribute values Xj of the examples for which C = ci
σji : standard deviation of the attribute values Xj of the examples for which C = ci

Learning Phase: for X = (X1, ..., Xn), C = c1, ..., cL
Output: n × L normal distributions and P(C = ci), i = 1, ..., L

Test Phase: for an unknown instance X = (X1, ..., Xn)
Calculate the conditional probabilities with all the normal distributions
Apply the MAP rule to make a decision
18
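
A sketch of the Gaussian class-conditional density and of estimating μji and σji for one attribute and one class; the sample values are hypothetical.

```python
import math

def gaussian_conditional(x, mu, sigma):
    """P_hat(Xj = x | C = ci) under a normal distribution with mean mu and std sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def estimate_parameters(values):
    """Mean and standard deviation of one attribute over the examples of one class."""
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    return mu, sigma

# Hypothetical attribute values observed for one class:
mu, sigma = estimate_parameters([64.0, 68.0, 69.0, 71.0, 75.0])
print(gaussian_conditional(66.0, mu, sigma))  # class-conditional density at x = 66
```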

Conclusions
Naïve Bayes is based on the independence assumption

Training is very easy and fast; it just requires considering each attribute in each class separately

Testing is straightforward; it just requires looking up tables or calculating conditional probabilities with normal distributions

A popular generative model

Performance is competitive with most state-of-the-art classifiers even when the independence assumption is violated

Many successful applications, e.g., spam mail filtering

Apart from classification, naïve Bayes can do more

19
