
Naïve Bayes Classifier

Ke Chen
http://intranet.cs.man.ac.uk/mlo/comp20411/
Modified and extended by Longin Jan Latecki
latecki@temple.edu

Outline
Background
Probability Basics
Probabilistic Classification
Naïve Bayes
Example: Play Tennis
Relevant Issues
Conclusions
2

Background
There are three methods to establish a classifier:

a) Model a classification rule directly
Examples: k-NN, decision trees, perceptron, SVM

b) Model the probability of class memberships given input data
Example: multi-layered perceptron with the cross-entropy cost

c) Make a probabilistic model of the data within each class
Examples: naïve Bayes, model-based classifiers

a) and b) are examples of discriminative classification
c) is an example of generative classification
b) and c) are both examples of probabilistic classification

Probability Basics
Prior, conditional and joint probability

Prior probability: P(X)

Conditional probability: P(X1|X2), P(X2|X1)

Joint probability: X = (X1, X2), P(X) = P(X1, X2)

Relationship: P(X1, X2) = P(X2|X1)P(X1) = P(X1|X2)P(X2)

Independence: P(X2|X1) = P(X2), P(X1|X2) = P(X1), P(X1, X2) = P(X1)P(X2)

Bayesian Rule

P(C|X) = P(X|C)P(C) / P(X)
Posterior = Likelihood × Prior / Evidence
4
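
As a quick numerical illustration of the rule above, here is a minimal Python sketch that computes a two-class posterior from an assumed prior and class-conditional likelihoods; all numbers are hypothetical and chosen only to show the arithmetic.

```python
# Minimal sketch of Bayes' rule: posterior = likelihood * prior / evidence.
# All numbers are hypothetical and serve only to illustrate the formula.

prior = {"C1": 0.3, "C2": 0.7}        # P(C)
likelihood = {"C1": 0.8, "C2": 0.1}   # P(X = x | C)

# Evidence P(X = x) via the law of total probability.
evidence = sum(likelihood[c] * prior[c] for c in prior)

posterior = {c: likelihood[c] * prior[c] / evidence for c in prior}
print(posterior)  # {'C1': 0.774..., 'C2': 0.225...}
```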

Example by Dieter Fox

Probabilistic Classification
Establishing a probabilistic model for classification

Discriminative model: P(C|X), C = c1, ..., cL, X = (X1, ..., Xn)

Generative model: P(X|C), C = c1, ..., cL, X = (X1, ..., Xn)

MAP classification rule

MAP: Maximum A Posteriori

Assign x to c* if P(C = c*|X = x) > P(C = c|X = x) for all c ≠ c*, c = c1, ..., cL

Generative classification with the MAP rule

Apply the Bayesian rule to convert: P(C|X) = P(X|C)P(C) / P(X) ∝ P(X|C)P(C)
8
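
A small sketch of generative classification with the MAP rule: since the evidence P(X) is the same for every class, it can be dropped and the class maximizing P(X|C)P(C) is chosen. The priors and class-conditional values below are hypothetical.

```python
# MAP classification with a generative model: choose argmax_c P(x | c) * P(c).
# The evidence P(x) is constant across classes, so it can be ignored.
# All numbers are hypothetical.

prior = {"c1": 0.5, "c2": 0.3, "c3": 0.2}                  # P(C)
class_conditional = {"c1": 0.02, "c2": 0.10, "c3": 0.05}   # P(X = x | C)

scores = {c: class_conditional[c] * prior[c] for c in prior}
c_star = max(scores, key=scores.get)
print(scores, "->", c_star)  # c2 wins with score 0.03
```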

Feature Histograms

[Figure: class-conditional feature histograms P(x) for two classes C1 and C2]
Slide by Stephen Marsland

Posterior Probability

[Figure: posterior probability P(C|x) as a function of x]
Slide by Stephen Marsland

Naïve Bayes
Bayes classification
P(C|X) ∝ P(X|C)P(C) = P(X1, ..., Xn|C)P(C)
Difficulty: learning the joint probability P(X1, ..., Xn|C)

Naïve Bayes classification

Making the assumption that all input attributes are conditionally independent given the class:
P(X1, X2, ..., Xn|C) = P(X1|X2, ..., Xn; C) P(X2, ..., Xn|C)
                     = P(X1|C) P(X2, ..., Xn|C)
                     = P(X1|C) P(X2|C) ... P(Xn|C)

MAP classification rule:
[P(x1|c*) ... P(xn|c*)] P(c*) > [P(x1|c) ... P(xn|c)] P(c), c ≠ c*, c = c1, ..., cL
11
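
A minimal sketch of the score implied by this factorization, assuming the per-attribute conditionals are stored in a lookup table keyed by (attribute, value, class); the function and table names are ours, not from the slides.

```python
# Naive Bayes score of class c for instance x: P(c) * prod_j P(x_j | c).
# 'prior' maps class -> P(c); 'cond_probs' maps (attribute, value, class) -> P(value | class).
# Both tables are assumed to have been estimated beforehand (hypothetical names).

def nb_score(x, c, prior, cond_probs):
    """Unnormalized posterior of class c for an instance x given as {attribute: value}."""
    score = prior[c]
    for attribute, value in x.items():
        score *= cond_probs[(attribute, value, c)]
    return score

# MAP rule: pick the class with the largest score, e.g.
# c_star = max(prior, key=lambda c: nb_score(x, c, prior, cond_probs))
```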

Naïve Bayes
Naïve Bayes Algorithm (for discrete input attributes)

Learning Phase: Given a training set S,
For each target value ci (ci = c1, ..., cL)
  P̂(C = ci) ← estimate P(C = ci) with the examples in S;
  For every attribute value ajk of each attribute Xj (j = 1, ..., n; k = 1, ..., Nj)
    P̂(Xj = ajk|C = ci) ← estimate P(Xj = ajk|C = ci) with the examples in S;
Output: conditional probability tables; for Xj, Nj × L elements

Test Phase: Given an unknown instance X = (a1, ..., an),
Look up the tables to assign the label c* to X if
[P̂(a1|c*) ... P̂(an|c*)] P̂(c*) > [P̂(a1|c) ... P̂(an|c)] P̂(c), c ≠ c*, c = c1, ..., cL
12
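
The learning and test phases above can be sketched compactly in Python, assuming the training set is a list of (attribute dict, class label) pairs; the function and variable names are ours, and this is only an illustrative sketch of the counting estimates, not the slides' own code.

```python
from collections import defaultdict

def learn_naive_bayes(training_set):
    """Learning phase: estimate P(C = ci) and P(Xj = ajk | C = ci) by counting."""
    class_counts = defaultdict(int)   # number of examples with label ci
    value_counts = defaultdict(int)   # number of examples with Xj = ajk and label ci
    for attributes, label in training_set:
        class_counts[label] += 1
        for attribute, value in attributes.items():
            value_counts[(attribute, value, label)] += 1
    total = len(training_set)
    prior = {c: n / total for c, n in class_counts.items()}
    cond = {(a, v, c): n / class_counts[c] for (a, v, c), n in value_counts.items()}
    return prior, cond

def classify(x, prior, cond):
    """Test phase: assign the label that maximizes P(c) * prod_j P(xj | c)."""
    best_label, best_score = None, -1.0
    for c in prior:
        score = prior[c]
        for attribute, value in x.items():
            score *= cond.get((attribute, value, c), 0.0)  # unseen value -> 0 (see slide 16)
        if score > best_score:
            best_label, best_score = c, score
    return best_label
```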

Example
Example: Play Tennis

13

Learning Phase

P(Outlook=o|Play=b)
Outlook     Play=Yes  Play=No
Sunny       2/9       3/5
Overcast    4/9       0/5
Rain        3/9       2/5

P(Temperature=t|Play=b)
Temperature  Play=Yes  Play=No
Hot          2/9       2/5
Mild         4/9       2/5
Cool         3/9       1/5

P(Humidity=h|Play=b)
Humidity    Play=Yes  Play=No
High        3/9       4/5
Normal      6/9       1/5

P(Wind=w|Play=b)
Wind        Play=Yes  Play=No
Strong      3/9       3/5
Weak        6/9       2/5

P(Play=Yes) = 9/14
P(Play=No) = 5/14
14
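
For reference, the same tables and priors written as plain Python dictionaries (exact fractions, as in the tables above); this is just an alternative representation of the learned model.

```python
from fractions import Fraction as F

# Priors
prior = {"Yes": F(9, 14), "No": F(5, 14)}

# Conditional probability tables, one entry per (attribute, value)
cond = {
    ("Outlook", "Sunny"):    {"Yes": F(2, 9), "No": F(3, 5)},
    ("Outlook", "Overcast"): {"Yes": F(4, 9), "No": F(0, 5)},
    ("Outlook", "Rain"):     {"Yes": F(3, 9), "No": F(2, 5)},
    ("Temperature", "Hot"):  {"Yes": F(2, 9), "No": F(2, 5)},
    ("Temperature", "Mild"): {"Yes": F(4, 9), "No": F(2, 5)},
    ("Temperature", "Cool"): {"Yes": F(3, 9), "No": F(1, 5)},
    ("Humidity", "High"):    {"Yes": F(3, 9), "No": F(4, 5)},
    ("Humidity", "Normal"):  {"Yes": F(6, 9), "No": F(1, 5)},
    ("Wind", "Strong"):      {"Yes": F(3, 9), "No": F(3, 5)},
    ("Wind", "Weak"):        {"Yes": F(6, 9), "No": F(2, 5)},
}
```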

Example
Test Phase

Given a new instance,
x = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)

Look up the tables:
P(Outlook=Sunny|Play=Yes) = 2/9          P(Outlook=Sunny|Play=No) = 3/5
P(Temperature=Cool|Play=Yes) = 3/9       P(Temperature=Cool|Play=No) = 1/5
P(Humidity=High|Play=Yes) = 3/9          P(Humidity=High|Play=No) = 4/5
P(Wind=Strong|Play=Yes) = 3/9            P(Wind=Strong|Play=No) = 3/5
P(Play=Yes) = 9/14                       P(Play=No) = 5/14

MAP rule:
P(Yes|x) ∝ [P(Sunny|Yes)P(Cool|Yes)P(High|Yes)P(Strong|Yes)]P(Play=Yes) = 0.0053
P(No|x)  ∝ [P(Sunny|No)P(Cool|No)P(High|No)P(Strong|No)]P(Play=No) = 0.0206

Since P(Yes|x) < P(No|x), we label x as No.
15
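
The two products above can be checked directly with exact fractions; a short sketch:

```python
from fractions import Fraction as F

# Looked-up values for x = (Sunny, Cool, High, Strong), taken from the tables on slide 14.
p_yes = F(2, 9) * F(3, 9) * F(3, 9) * F(3, 9) * F(9, 14)  # P(x|Yes) * P(Play=Yes)
p_no  = F(3, 5) * F(1, 5) * F(4, 5) * F(3, 5) * F(5, 14)  # P(x|No)  * P(Play=No)

print(float(p_yes))  # 0.00529... (the 0.0053 above)
print(float(p_no))   # 0.02057... (the 0.0206 above)
print("No" if p_no > p_yes else "Yes")  # -> No
```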

Relevant Issues
Violation of Independence Assumption

For many real-world tasks,
P(X1, ..., Xn|C) ≠ P(X1|C) ... P(Xn|C)

Nevertheless, naïve Bayes works surprisingly well anyway!

Zero conditional probability Problem

If no example contains the attribute value Xj = ajk, then P̂(Xj = ajk|C = ci) = 0.
In this circumstance, P̂(x1|ci) ... P̂(ajk|ci) ... P̂(xn|ci) = 0 during test.

For a remedy, conditional probabilities are estimated with Laplace smoothing:

P̂(Xj = ajk|C = ci) = (nc + mp) / (n + m)

nc : number of training examples for which Xj = ajk and C = ci
n  : number of training examples for which C = ci
p  : prior estimate (usually, p = 1/t for t possible values of Xj)
m  : weight given to the prior (number of "virtual" examples, m ≥ 1)
16
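
A minimal sketch of the smoothed estimate (nc + mp) / (n + m); the counts in the usage line are hypothetical and not taken from the Play-Tennis tables.

```python
def smoothed_estimate(n_c, n, p, m):
    """Laplace-smoothed estimate of P(Xj = ajk | C = ci): (nc + m*p) / (n + m)."""
    return (n_c + m * p) / (n + m)

# Hypothetical counts: the value never co-occurs with the class (n_c = 0),
# the class has n = 8 examples, the attribute has 4 possible values (p = 1/4), m = 1.
print(smoothed_estimate(n_c=0, n=8, p=1/4, m=1))  # about 0.028 instead of 0
```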

Homework
Redo the test on Slide 15 using the formula on
Slide 16 with m=1.
Compute P(Play=Yes|x) and P(Play=No|x) with
m=0 and with m=1 for
x=(Outlook=Overcast, Temperature=Cool, Humidity=High, Wind=Strong)

Does the result change?

17

Relevant Issues
Continuous-valued Input Attributes

A continuous attribute has infinitely many possible values, so its conditional probability cannot be tabulated.

Conditional probability is instead modeled with the normal distribution:

P̂(Xj|C = ci) = 1 / (√(2π) σji) · exp( -(Xj - μji)² / (2 σji²) )

μji : mean (average) of the attribute values Xj of the examples for which C = ci
σji : standard deviation of the attribute values Xj of the examples for which C = ci

Learning Phase: for X = (X1, ..., Xn), C = c1, ..., cL
Output: n × L normal distributions and P(C = ci), i = 1, ..., L

Test Phase: for an unknown instance X = (X1, ..., Xn)
Calculate the conditional probabilities with all the normal distributions
Apply the MAP rule to make a decision
18
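
A sketch of the Gaussian class-conditional density and of estimating μji and σji for one attribute and one class; the sample values are hypothetical.

```python
import math

def gaussian_conditional(x, mu, sigma):
    """P_hat(Xj = x | C = ci) under a normal distribution with mean mu and std sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def estimate_parameters(values):
    """Mean and standard deviation of one attribute over the examples of one class."""
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    return mu, sigma

# Hypothetical attribute values observed for one class:
mu, sigma = estimate_parameters([64.0, 68.0, 69.0, 71.0, 75.0])
print(gaussian_conditional(66.0, mu, sigma))  # class-conditional density at x = 66
```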

Conclusions
Naïve Bayes is based on the independence assumption

Training is very easy and fast; it just requires considering each attribute in each class separately

Testing is straightforward; it just requires looking up tables or calculating conditional probabilities with normal distributions

A popular generative model

Performance is competitive with most state-of-the-art classifiers even when the independence assumption is violated

Many successful applications, e.g., spam mail filtering

Apart from classification, naïve Bayes can do more

19
