
Lecture 8: Graphical Models II
Machine Learning

Andrew Rosenberg

March 5, 2010


Today

Graphical Models
Naive Bayes classification
Conditional Probability Tables (CPTs)
Inference in Graphical Models and Belief Propagation


Recap of Graphical Models

Graphical Models: a graphical representation of the dependency relationships between random variables.

[Figure: example directed graph over x0, ..., x5]


Topological Graph
Graphical models factorize probabilities:

$$p(x_0, \dots, x_{n-1}) = \prod_{i=0}^{n-1} p(x_i \mid \mathrm{pa}_i) = \prod_{i=0}^{n-1} p(x_i \mid \pi_i)$$

where $\pi_i$ denotes the parents of node $x_i$.

Nodes are generally topologically ordered so that parents come before children.

[Figure: example directed graph over x0, ..., x5]
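To make the factorization concrete, here is a minimal sketch that evaluates a joint probability as a product of per-node conditionals. The dict-based CPT encoding, the toy graph, and the `joint_prob` helper are illustrative assumptions, not from the lecture.

```python
# A minimal sketch (not from the lecture): evaluating a factorized joint.
# Each node's CPT maps (value, tuple-of-parent-values) -> probability.

def joint_prob(assignment, parents, cpts):
    """p(x_0, ..., x_{n-1}) = prod_i p(x_i | pi_i), per the factorization above."""
    p = 1.0
    for node, value in assignment.items():
        parent_vals = tuple(assignment[q] for q in parents[node])
        p *= cpts[node][(value, parent_vals)]
    return p

# Toy two-node chain x0 -> x1 with binary variables.
parents = {"x0": [], "x1": ["x0"]}
cpts = {
    "x0": {(0, ()): 0.4, (1, ()): 0.6},
    "x1": {(0, (0,)): 0.9, (1, (0,)): 0.1,
           (0, (1,)): 0.2, (1, (1,)): 0.8},
}

print(joint_prob({"x0": 1, "x1": 1}, parents, cpts))  # 0.6 * 0.8 = 0.48
```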


Plate Notation of a Graphical Model

Recall the Naive Bayes graphical model: [Figure: class node y with children x0, ..., xn]

There can be many variables x_i. Plate notation gives a compact representation of models like this:

[Figure: plate notation, with the node x_i drawn once inside a plate indicating repetition over i]

Naive Bayes Example

Flu  Fever  Sinus  Ache  Swell  Head
 Y     L      Y     Y      Y     N
 N     M      N     N      N     N
 Y     H      N     N      Y     Y
 Y     M      Y     N      N     Y

[Figure: Naive Bayes graph, class node y with children x0, ..., x4]

Naive Bayes Example

(Training data as above.)

[Figure: Flu with children Fev, Sin, Ach, Swe, Hea]

Estimate the class prior from the Flu column:

p(flu):  Y = .75,  N = .25

Naive Bayes Example

(Training data as above.)

p(flu):  Y = .75,  N = .25

p(fever | flu):
           L    M    H
  flu=Y   .33  .33  .33
  flu=N    0    1    0

Naive Bayes Example

(Training data as above.)

p(flu):  Y = .75,  N = .25

p(sinus | flu):
           Y    N
  flu=Y   .67  .33
  flu=N    0    1

Naive Bayes Example

(Training data as above.)

p(flu):  Y = .75,  N = .25

p(ache | flu):
           Y    N
  flu=Y   .33  .67
  flu=N    0    1

Naive Bayes Example

(Training data as above.)

p(flu):  Y = .75,  N = .25

p(swell | flu):
           Y    N
  flu=Y   .67  .33
  flu=N    0    1

Naive Bayes Example

(Training data as above.)

p(flu):  Y = .75,  N = .25

p(head | flu):
           Y    N
  flu=Y   .67  .33
  flu=N    0    1

Naive Bayes Example

Flu  Fever  Sinus  Ache  Swell  Head
 Y     L      Y     Y      Y     N
 N     M      N     N      N     N
 Y     H      N     N      Y     Y
 Y     M      Y     N      N     Y
 ?     M      N     N      N     N

[Figure: Flu with children Fev, Sin, Ach, Swe, Hea]

Find p(flu | fever, sinus, ache, swell, head):

p(flu = Y) · p(fev = M | flu = Y) · p(sin = N | flu = Y) · p(ach = N | flu = Y) · p(swe = N | flu = Y) · p(head = N | flu = Y)

p(flu):  Y = .75,  N = .25

Naive Bayes Example

(Data as above: the new instance has Fever = M, Sinus = N, Ache = N, Swell = N, Head = N.)

Find p(flu | fever, sinus, ache, swell, head):

.75 · p(fev = M | flu = Y) · p(sin = N | flu = Y) · p(ach = N | flu = Y) · p(swe = N | flu = Y) · p(head = N | flu = Y)

p(fever | flu):
           L    M    H
  flu=Y   .33  .33  .33
  flu=N    0    1    0

Naive Bayes Example

(Data as above.)

Find p(flu | fever, sinus, ache, swell, head):

.75 · .33 · p(sin = N | flu = Y) · p(ach = N | flu = Y) · p(swe = N | flu = Y) · p(head = N | flu = Y)

p(sinus | flu):
           Y    N
  flu=Y   .67  .33
  flu=N    0    1

Naive Bayes Example

(Data as above.)

Find p(flu | fever, sinus, ache, swell, head):

.75 · .33 · .33 · p(ach = N | flu = Y) · p(swe = N | flu = Y) · p(head = N | flu = Y)

p(ache | flu):
           Y    N
  flu=Y   .33  .67
  flu=N    0    1

Naive Bayes Example

(Data as above.)

Find p(flu | fever, sinus, ache, swell, head):

.75 · .33 · .33 · .67 · p(swe = N | flu = Y) · p(head = N | flu = Y)

p(swell | flu):
           Y    N
  flu=Y   .67  .33
  flu=N    0    1

Naive Bayes Example

(Data as above.)

Find p(flu | fever, sinus, ache, swell, head):

.75 · .33 · .33 · .67 · .33 · p(head = N | flu = Y)

p(head | flu):
           Y    N
  flu=Y   .67  .33
  flu=N    0    1

Naive Bayes Example

(Data as above.)

[Figure: Flu with children Fev, Sin, Ach, Swe, Hea]

Find p(flu | fever, sinus, ache, swell, head):

.75 · .33 · .33 · .67 · .33 · .33 = 0.0060
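The running product above can be reproduced in a few lines. A minimal sketch, with the CPT values copied from the tables and all names (the dicts, the `score` function) illustrative:

```python
# Reproduces the slide's product; CPT dicts and names are illustrative.
prior = {"Y": 0.75, "N": 0.25}
cpts = {  # cpts[feature][flu][value] = p(value | flu), from the tables above
    "fever": {"Y": {"L": .33, "M": .33, "H": .33}, "N": {"L": 0, "M": 1, "H": 0}},
    "sinus": {"Y": {"Y": .67, "N": .33}, "N": {"Y": 0, "N": 1}},
    "ache":  {"Y": {"Y": .33, "N": .67}, "N": {"Y": 0, "N": 1}},
    "swell": {"Y": {"Y": .67, "N": .33}, "N": {"Y": 0, "N": 1}},
    "head":  {"Y": {"Y": .67, "N": .33}, "N": {"Y": 0, "N": 1}},
}

def score(flu, evidence):
    """Unnormalized p(flu, evidence) = p(flu) * prod_i p(x_i | flu)."""
    p = prior[flu]
    for feature, value in evidence.items():
        p *= cpts[feature][flu][value]
    return p

evidence = {"fever": "M", "sinus": "N", "ache": "N", "swell": "N", "head": "N"}
print(round(score("Y", evidence), 4))  # 0.006, matching the slide
print(score("N", evidence))            # .25 * 1 * 1 * 1 * 1 * 1 = 0.25
```

Note that the same tables give the flu = N score as .25, since every feature of the query instance matches the single flu = N training row.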

Completely Observed graphical models

Suppose we have observations for every node:

Flu  Fever  Sinus  Ache  Swell  Head
 Y     L      Y     Y      Y     N
 N     M      N     N      N     N
 Y     H      N     N      Y     Y
 Y     M      Y     N      N     Y

In the simplest, least general graph, assume every variable is independent and train 6 separate models. [Figure: six disconnected nodes Fl, Fe, Si, Ac, Sw, He]

In the second simplest (and most general) graph, assume no independence and build a single 6-dimensional table of counts, divided by the total count. [Figure: the six nodes with dependencies among all of them] (A sketch of the count-table approach follows.)
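A minimal sketch of the fully general approach (the variable names and storage choice are mine, not the lecture's): build a joint table of counts over all six variables and normalize by the total count.

```python
from collections import Counter

# Rows of the training table above, as 6-tuples (Flu, Fever, Sinus, Ache, Swell, Head).
data = [
    ("Y", "L", "Y", "Y", "Y", "N"),
    ("N", "M", "N", "N", "N", "N"),
    ("Y", "H", "N", "N", "Y", "Y"),
    ("Y", "M", "Y", "N", "N", "Y"),
]

counts = Counter(data)  # the 6-dimensional table, stored sparsely
joint = {row: c / len(data) for row, c in counts.items()}  # divide by total count

print(joint[("Y", "L", "Y", "Y", "Y", "N")])  # 0.25
```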


Maximum Likelihood Conditional Probability Tables

Consider this graphical model: [Figure: directed graph over x0, ..., x5]

Each node has a conditional probability table $\theta_i$. Given the tables, we have a pdf

$$p(x \mid \theta) = \prod_{i=0}^{M-1} p(x_i \mid \pi_i, \theta_i)$$

We have $M$ variables in $x$ and $N$ data points $\mathbf{X}$. Maximum (log) likelihood:

$$\theta^* = \operatorname{argmax}_\theta \ln p(\mathbf{X} \mid \theta) = \operatorname{argmax}_\theta \sum_{n=0}^{N-1} \ln p(X_n \mid \theta) = \operatorname{argmax}_\theta \sum_{n=0}^{N-1} \sum_{i=0}^{M-1} \ln p(x_{in} \mid \theta_i)$$

Maximum Likelihood CPTs

First, Kronecker's delta function:

$$\delta(x_n, x_m) = \begin{cases} 1 & \text{if } x_n = x_m \\ 0 & \text{otherwise} \end{cases}$$

Counts, the number of times something appears in the data:

$$m(x_i) = \sum_{n=0}^{N-1} \delta(x_i, x_{in}) \qquad m(X) = \sum_{n=0}^{N-1} \delta(X, X_n)$$

$$N = \sum_{x_1} m(x_1) = \sum_{x_1} \sum_{x_2} m(x_1, x_2) = \sum_{x_1} \sum_{x_2} \sum_{x_3} m(x_1, x_2, x_3) = \dots$$
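As a tiny illustration (the names are mine, not the lecture's), the count function $m(\cdot)$ is just a sum of delta comparisons over the data:

```python
# Counts via Kronecker's delta: m(v) = sum_n delta(v, x_n).
delta = lambda a, b: 1 if a == b else 0

def m(value, samples):
    """Number of times `value` appears in the data."""
    return sum(delta(value, x) for x in samples)

fever = ["L", "M", "H", "M"]  # the Fever column from the running example
print(m("M", fever))  # 2
```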

Maximum likelihood CPTs

$$\begin{aligned}
l(\theta) &= \sum_{n=0}^{N-1} \ln p(X_n \mid \theta) = \sum_{n=0}^{N-1} \ln \prod_{X} p(X \mid \theta)^{\delta(X_n, X)} \\
&= \sum_{X} \sum_{n=0}^{N-1} \delta(X_n, X) \ln p(X \mid \theta) = \sum_{X} m(X) \ln p(X \mid \theta) \\
&= \sum_{X} m(X) \ln \prod_{i=0}^{M-1} p(x_i \mid \pi_i, \theta_i) = \sum_{i=0}^{M-1} \sum_{X} m(X) \ln p(x_i \mid \pi_i, \theta_i) \\
&= \sum_{i=0}^{M-1} \sum_{x_i, \pi_i} \sum_{X \setminus \{x_i, \pi_i\}} m(X) \ln p(x_i \mid \pi_i, \theta_i) = \sum_{i=0}^{M-1} \sum_{x_i, \pi_i} m(x_i, \pi_i) \ln p(x_i \mid \pi_i, \theta_i)
\end{aligned}$$

Define a function $\theta(x_i, \pi_i) = p(x_i \mid \pi_i, \theta_i)$, with the constraint

$$\sum_{x_i} \theta(x_i, \pi_i) = 1$$

ML With Lagrange Multipliers

Lagrange multipliers: to maximize $f(x, y)$ subject to $g(x, y) = c$, maximize $f(x, y) - \lambda (g(x, y) - c)$.

$$\begin{aligned}
l(\theta) &= \sum_{i=0}^{M-1} \sum_{x_i} \sum_{\pi_i} m(x_i, \pi_i) \ln \theta(x_i, \pi_i) \\
\bar{l}(\theta) &= l(\theta) - \sum_{i=0}^{M-1} \sum_{\pi_i} \lambda_{\pi_i} \left( \sum_{x_i} \theta(x_i, \pi_i) - 1 \right) \\
\frac{\partial \bar{l}(\theta)}{\partial \theta(x_i, \pi_i)} &= \frac{m(x_i, \pi_i)}{\theta(x_i, \pi_i)} - \lambda_{\pi_i} = 0 \\
\theta(x_i, \pi_i) &= \frac{m(x_i, \pi_i)}{\lambda_{\pi_i}}
\end{aligned}$$

Applying the constraint $\sum_{x_i} \theta(x_i, \pi_i) = 1$ gives $\lambda_{\pi_i} = \sum_{x_i} m(x_i, \pi_i) = m(\pi_i)$, so

$$\theta(x_i, \pi_i) = \frac{m(x_i, \pi_i)}{m(\pi_i)}$$

Counts!

Maximum A Posteriori CPT training

For the Bayesians, MAP estimation leads to:

$$\theta(x_i, \pi_i) = \frac{m(x_i, \pi_i) + \alpha}{m(\pi_i) + \alpha |x_i|}$$

where $\alpha$ is the pseudocount contributed by the prior and $|x_i|$ is the number of values $x_i$ can take.

Example of maximum likelihood.

Flu (x0)  Fever (x1)  Sinus (x2)  Ache (x3)  Swell (x4)  Head (x5)
   Y          L           Y           Y          Y           N
   N          M           N           N          N           N
   Y          H           N           N          Y           Y
   Y          M           Y           N          N           Y

[Figure: directed graph with x0 → x1, x0 → x2, x1 → x3, x2 → x4, and x1, x4 → x5]

$$\theta(x_i, \pi_i) = \frac{m(x_i, \pi_i)}{m(\pi_i)}$$

(A sketch of this estimator on the table above follows.)
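A minimal sketch (function and variable names are mine, not the lecture's) that computes $\theta(x_i, \pi_i) = m(x_i, \pi_i) / m(\pi_i)$ by counting, shown for p(x3 | x1), i.e. ache given fever, since x1 is x3's parent in the graph above:

```python
from collections import Counter

# Training table above; columns x0..x5 = Flu, Fever, Sinus, Ache, Swell, Head.
data = [
    ("Y", "L", "Y", "Y", "Y", "N"),
    ("N", "M", "N", "N", "N", "N"),
    ("Y", "H", "N", "N", "Y", "Y"),
    ("Y", "M", "Y", "N", "N", "Y"),
]

def ml_cpt(data, child, parents):
    """theta(x_i, pi_i) = m(x_i, pi_i) / m(pi_i): normalized counts."""
    m_joint = Counter((row[child], tuple(row[p] for p in parents)) for row in data)
    m_parent = Counter(tuple(row[p] for p in parents) for row in data)
    return {(x, pa): c / m_parent[pa] for (x, pa), c in m_joint.items()}

print(ml_cpt(data, child=3, parents=[1]))
# {('Y', ('L',)): 1.0, ('N', ('M',)): 1.0, ('N', ('H',)): 1.0}
```

For the MAP variant above, add $\alpha$ to each joint count and $\alpha |x_i|$ to each parent-count denominator.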

Conditional Dependence Test.

We also want to be able to check conditional independencies in a graphical model. E.g., is achiness (x3) independent of flu (x0) given fever (x1)? Is achiness (x3) independent of sinus infection (x2) given fever (x1)?

$$p(x) = p(x_0) p(x_1 \mid x_0) p(x_2 \mid x_0) p(x_3 \mid x_1) p(x_4 \mid x_2) p(x_5 \mid x_1, x_4)$$

$$p(x_3 \mid x_0, x_1, x_2) = \frac{p(x_0, x_1, x_2, x_3)}{p(x_0, x_1, x_2)} = \frac{p(x_0) p(x_1 \mid x_0) p(x_2 \mid x_0) p(x_3 \mid x_1)}{p(x_0) p(x_1 \mid x_0) p(x_2 \mid x_0)} = p(x_3 \mid x_1)$$

So $x_3 \perp \{x_0, x_2\} \mid x_1$. No problem, right? What about $x_0 \perp x_5 \mid \{x_1, x_2\}$?


D-separation and Bayes Ball

[Figure: directed graph over x0, ..., x5]

Intuition: nodes are separated, or blocked, by sets of nodes. Example: nodes x1 and x2 block the path from x0 to x5, so x0 ⊥ x5 | {x1, x2}.

While this is true in undirected graphs, it is not in directed graphs. We need more than simple separation: we need directed separation, D-separation. D-separation is computed using the Bayes Ball algorithm, which allows us to prove general statements of the form xa ⊥ xb | xc.

Bayes Ball Algorithm

To test xa ⊥ xb | xc:

Shade the nodes in xc.
Place a ball at each node in xa.
Bounce the balls around the graph according to a set of rules.
If no ball reaches xb, then xa ⊥ xb | xc is true; otherwise it is false.

Balls can travel along or against edge directions. Pick any path and test whether the ball goes through or bounces back.


Ten Rules of Bayes Ball

[Figure: the ten Bayes Ball rules for when a ball passes through a node or bounces back]

Bayes Ball Example - I

Is x0 ⊥ x4 | x2?

[Figure: directed graph over x0, ..., x5]

Bayes Ball Example - II

Is x0 ⊥ x5 | {x1, x2}?

[Figure: directed graph over x0, ..., x5]

(A sketch that checks both example queries follows.)

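The two example queries can be checked mechanically. Below is a minimal d-separation sketch, not the lecture's implementation: instead of literally bouncing balls, it enumerates undirected paths and applies the chain/fork/collider blocking rules. All function names are illustrative.

```python
def descendants(children, node):
    """All nodes reachable from `node` by following edges forward."""
    out, stack = set(), [node]
    while stack:
        for c in children.get(stack.pop(), ()):
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def d_separated(edges, a, b, given):
    """True if every undirected path from a to b is blocked given `given`."""
    children, neighbors = {}, {}
    for u, v in edges:  # directed edge u -> v
        children.setdefault(u, set()).add(v)
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    def blocked(path):
        for prev, mid, nxt in zip(path, path[1:], path[2:]):
            if mid in children.get(prev, ()) and mid in children.get(nxt, ()):
                # Collider (prev -> mid <- nxt): blocks unless mid or one of
                # its descendants is observed.
                if mid not in given and not (descendants(children, mid) & given):
                    return True
            elif mid in given:
                # Chain or fork: blocks when the middle node is observed.
                return True
        return False

    def paths(node, seen):
        if node == b:
            yield seen
            return
        for n in neighbors.get(node, ()):
            if n not in seen:
                yield from paths(n, seen + [n])

    return all(blocked(p) for p in paths(a, [a]))

# The lecture's example graph.
edges = [("x0", "x1"), ("x0", "x2"), ("x1", "x3"),
         ("x2", "x4"), ("x1", "x5"), ("x4", "x5")]

print(d_separated(edges, "x0", "x4", {"x2"}))        # Example I: True
print(d_separated(edges, "x0", "x5", {"x1", "x2"}))  # Example II: True
```

Both queries come out True under the blocking rules: every path from x0 is cut either by an observed chain/fork node (x1 or x2) or by the unobserved collider x5.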

Undirected Graphs

What if we allow undirected graphs? What do they correspond to? Not cause/effect or trigger/response, but general dependence.

Example: image pixels, where each pixel is a Bernoulli variable. We can define a probability over all of the pixels, p(x_{1,1}, ..., x_{1,M}, ..., x_{M,1}, ..., x_{M,M}). Bright pixels have bright neighbors. There are no parents, just probabilities. Grid models like this are called Markov Random Fields.

Undirected Graphs

[Figure: undirected graph over w, x, y, z]

Undirected separation is easy: to check xa ⊥ xb | xc, check graph reachability between xa and xb without going through nodes in xc. (A minimal reachability sketch follows.)
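A minimal sketch of the undirected check (names and the toy graph are illustrative): xa ⊥ xb | xc holds exactly when xb is unreachable from xa once the nodes in xc are removed.

```python
def u_separated(edges, a, b, given):
    """True if every path from a to b passes through a node in `given`."""
    adj = {}
    for u, v in edges:  # undirected edge {u, v}
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, stack = {a}, [a]
    while stack:  # graph search that refuses to enter observed nodes
        for n in adj.get(stack.pop(), ()):
            if n == b:
                return False
            if n not in seen and n not in given:
                seen.add(n)
                stack.append(n)
    return True

# Toy 4-cycle w - x - z - y - w: conditioning on {w, z} separates x from y.
edges = [("w", "x"), ("x", "z"), ("z", "y"), ("y", "w")]
print(u_separated(edges, "x", "y", {"w", "z"}))  # True
```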


Bye

Next
Representing probabilities in Undirected Graphs.

