
Lecture 8: Graphical Models II
Machine Learning

Andrew Rosenberg

March 5, 2010


Today

Graphical Models
Naive Bayes classification
Conditional Probability Tables (CPTs)
Inference in Graphical Models and Belief Propagation


Recap of Graphical Models

Graphical Models: a graphical representation of the dependency relationships between random variables.

[Figure: example directed graph over x0, ..., x5]


Topological Graph
Graphical models factorize probabilities:

$$p(x_0, \dots, x_{n-1}) = \prod_{i=0}^{n-1} p(x_i \mid \mathrm{pa}_i) = \prod_{i=0}^{n-1} p(x_i \mid \pi_i)$$

where $\pi_i$ denotes the parents of node $x_i$.

Nodes are generally topologically ordered so that parents come before children.

[Figure: example directed graph over x0, ..., x5]
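To make the factorization concrete, here is a minimal sketch that evaluates a joint probability as a product of per-node conditionals. The dict-based CPT encoding, the toy graph, and the `joint_prob` helper are illustrative assumptions, not from the lecture.

```python
# A minimal sketch (not from the lecture): evaluating a factorized joint.
# Each node's CPT maps (value, tuple-of-parent-values) -> probability.

def joint_prob(assignment, parents, cpts):
    """p(x_0, ..., x_{n-1}) = prod_i p(x_i | pi_i), per the factorization above."""
    p = 1.0
    for node, value in assignment.items():
        parent_vals = tuple(assignment[q] for q in parents[node])
        p *= cpts[node][(value, parent_vals)]
    return p

# Toy two-node chain x0 -> x1 with binary variables.
parents = {"x0": [], "x1": ["x0"]}
cpts = {
    "x0": {(0, ()): 0.4, (1, ()): 0.6},
    "x1": {(0, (0,)): 0.9, (1, (0,)): 0.1,
           (0, (1,)): 0.2, (1, (1,)): 0.8},
}

print(joint_prob({"x0": 1, "x1": 1}, parents, cpts))  # 0.6 * 0.8 = 0.48
```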


Plate Notation of a Graphical Model

Recall the Naive Bayes graphical model: [Figure: class node y with children x0, ..., xn]

There can be many variables x_i. Plate notation gives a compact representation of models like this:

[Figure: plate notation, with the node x_i drawn once inside a plate indicating repetition over i]

Naive Bayes Example

Flu  Fever  Sinus  Ache  Swell  Head
 Y     L      Y     Y      Y     N
 N     M      N     N      N     N
 Y     H      N     N      Y     Y
 Y     M      Y     N      N     Y

[Figure: Naive Bayes graph, class node y with children x0, ..., x4]

Naive Bayes Example

(Training data as above.)

[Figure: Flu with children Fev, Sin, Ach, Swe, Hea]

Estimate the class prior from the Flu column:

p(flu):  Y = .75,  N = .25

Naive Bayes Example

(Training data as above.)

p(flu):  Y = .75,  N = .25

p(fever | flu):
           L    M    H
  flu=Y   .33  .33  .33
  flu=N    0    1    0

Naive Bayes Example

(Training data as above.)

p(flu):  Y = .75,  N = .25

p(sinus | flu):
           Y    N
  flu=Y   .67  .33
  flu=N    0    1

Naive Bayes Example

(Training data as above.)

p(flu):  Y = .75,  N = .25

p(ache | flu):
           Y    N
  flu=Y   .33  .67
  flu=N    0    1

Naive Bayes Example

(Training data as above.)

p(flu):  Y = .75,  N = .25

p(swell | flu):
           Y    N
  flu=Y   .67  .33
  flu=N    0    1

Naive Bayes Example

(Training data as above.)

p(flu):  Y = .75,  N = .25

p(head | flu):
           Y    N
  flu=Y   .67  .33
  flu=N    0    1

Naive Bayes Example

Flu  Fever  Sinus  Ache  Swell  Head
 Y     L      Y     Y      Y     N
 N     M      N     N      N     N
 Y     H      N     N      Y     Y
 Y     M      Y     N      N     Y
 ?     M      N     N      N     N

[Figure: Flu with children Fev, Sin, Ach, Swe, Hea]

Find p(flu | fever, sinus, ache, swell, head):

p(flu = Y) · p(fev = M | flu = Y) · p(sin = N | flu = Y) · p(ach = N | flu = Y) · p(swe = N | flu = Y) · p(head = N | flu = Y)

p(flu):  Y = .75,  N = .25

Naive Bayes Example

(Data as above: the new instance has Fever = M, Sinus = N, Ache = N, Swell = N, Head = N.)

Find p(flu | fever, sinus, ache, swell, head):

.75 · p(fev = M | flu = Y) · p(sin = N | flu = Y) · p(ach = N | flu = Y) · p(swe = N | flu = Y) · p(head = N | flu = Y)

p(fever | flu):
           L    M    H
  flu=Y   .33  .33  .33
  flu=N    0    1    0

Naive Bayes Example

(Data as above.)

Find p(flu | fever, sinus, ache, swell, head):

.75 · .33 · p(sin = N | flu = Y) · p(ach = N | flu = Y) · p(swe = N | flu = Y) · p(head = N | flu = Y)

p(sinus | flu):
           Y    N
  flu=Y   .67  .33
  flu=N    0    1

Naive Bayes Example

(Data as above.)

Find p(flu | fever, sinus, ache, swell, head):

.75 · .33 · .33 · p(ach = N | flu = Y) · p(swe = N | flu = Y) · p(head = N | flu = Y)

p(ache | flu):
           Y    N
  flu=Y   .33  .67
  flu=N    0    1

Naive Bayes Example

(Data as above.)

Find p(flu | fever, sinus, ache, swell, head):

.75 · .33 · .33 · .67 · p(swe = N | flu = Y) · p(head = N | flu = Y)

p(swell | flu):
           Y    N
  flu=Y   .67  .33
  flu=N    0    1

Naive Bayes Example

(Data as above.)

Find p(flu | fever, sinus, ache, swell, head):

.75 · .33 · .33 · .67 · .33 · p(head = N | flu = Y)

p(head | flu):
           Y    N
  flu=Y   .67  .33
  flu=N    0    1

Naive Bayes Example

(Data as above.)

[Figure: Flu with children Fev, Sin, Ach, Swe, Hea]

Find p(flu | fever, sinus, ache, swell, head):

.75 · .33 · .33 · .67 · .33 · .33 = 0.0060
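The running product above can be reproduced in a few lines. A minimal sketch, with the CPT values copied from the tables and all names (the dicts, the `score` function) illustrative:

```python
# Reproduces the slide's product; CPT dicts and names are illustrative.
prior = {"Y": 0.75, "N": 0.25}
cpts = {  # cpts[feature][flu][value] = p(value | flu), from the tables above
    "fever": {"Y": {"L": .33, "M": .33, "H": .33}, "N": {"L": 0, "M": 1, "H": 0}},
    "sinus": {"Y": {"Y": .67, "N": .33}, "N": {"Y": 0, "N": 1}},
    "ache":  {"Y": {"Y": .33, "N": .67}, "N": {"Y": 0, "N": 1}},
    "swell": {"Y": {"Y": .67, "N": .33}, "N": {"Y": 0, "N": 1}},
    "head":  {"Y": {"Y": .67, "N": .33}, "N": {"Y": 0, "N": 1}},
}

def score(flu, evidence):
    """Unnormalized p(flu, evidence) = p(flu) * prod_i p(x_i | flu)."""
    p = prior[flu]
    for feature, value in evidence.items():
        p *= cpts[feature][flu][value]
    return p

evidence = {"fever": "M", "sinus": "N", "ache": "N", "swell": "N", "head": "N"}
print(round(score("Y", evidence), 4))  # 0.006, matching the slide
print(score("N", evidence))            # .25 * 1 * 1 * 1 * 1 * 1 = 0.25
```

Note that the same tables give the flu = N score as .25, since every feature of the query instance matches the single flu = N training row.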

Completely Observed graphical models

Suppose we have observations for every node:

Flu  Fever  Sinus  Ache  Swell  Head
 Y     L      Y     Y      Y     N
 N     M      N     N      N     N
 Y     H      N     N      Y     Y
 Y     M      Y     N      N     Y

In the simplest, least general graph, assume every variable is independent and train 6 separate models. [Figure: six disconnected nodes Fl, Fe, Si, Ac, Sw, He]

In the second simplest (and most general) graph, assume no independence and build a single 6-dimensional table of counts, divided by the total count. [Figure: the six nodes with dependencies among all of them] (A sketch of the count-table approach follows.)
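A minimal sketch of the fully general approach (the variable names and storage choice are mine, not the lecture's): build a joint table of counts over all six variables and normalize by the total count.

```python
from collections import Counter

# Rows of the training table above, as 6-tuples (Flu, Fever, Sinus, Ache, Swell, Head).
data = [
    ("Y", "L", "Y", "Y", "Y", "N"),
    ("N", "M", "N", "N", "N", "N"),
    ("Y", "H", "N", "N", "Y", "Y"),
    ("Y", "M", "Y", "N", "N", "Y"),
]

counts = Counter(data)  # the 6-dimensional table, stored sparsely
joint = {row: c / len(data) for row, c in counts.items()}  # divide by total count

print(joint[("Y", "L", "Y", "Y", "Y", "N")])  # 0.25
```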


Maximum Likelihood Conditional Probability Tables

Consider this graphical model: [Figure: directed graph over x0, ..., x5]

Each node has a conditional probability table $\theta_i$. Given the tables, we have a pdf

$$p(x \mid \theta) = \prod_{i=0}^{M-1} p(x_i \mid \pi_i, \theta_i)$$

We have $M$ variables in $x$ and $N$ data points $\mathbf{X}$. Maximum (log) likelihood:

$$\theta^* = \operatorname{argmax}_\theta \ln p(\mathbf{X} \mid \theta) = \operatorname{argmax}_\theta \sum_{n=0}^{N-1} \ln p(X_n \mid \theta) = \operatorname{argmax}_\theta \sum_{n=0}^{N-1} \sum_{i=0}^{M-1} \ln p(x_{in} \mid \theta_i)$$

Maximum Likelihood CPTs

First, Kronecker's delta function:

$$\delta(x_n, x_m) = \begin{cases} 1 & \text{if } x_n = x_m \\ 0 & \text{otherwise} \end{cases}$$

Counts, the number of times something appears in the data:

$$m(x_i) = \sum_{n=0}^{N-1} \delta(x_i, x_{in}) \qquad m(X) = \sum_{n=0}^{N-1} \delta(X, X_n)$$

$$N = \sum_{x_1} m(x_1) = \sum_{x_1} \sum_{x_2} m(x_1, x_2) = \sum_{x_1} \sum_{x_2} \sum_{x_3} m(x_1, x_2, x_3) = \dots$$
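As a tiny illustration (the names are mine, not the lecture's), the count function $m(\cdot)$ is just a sum of delta comparisons over the data:

```python
# Counts via Kronecker's delta: m(v) = sum_n delta(v, x_n).
delta = lambda a, b: 1 if a == b else 0

def m(value, samples):
    """Number of times `value` appears in the data."""
    return sum(delta(value, x) for x in samples)

fever = ["L", "M", "H", "M"]  # the Fever column from the running example
print(m("M", fever))  # 2
```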

Maximum likelihood CPTs

$$\begin{aligned}
l(\theta) &= \sum_{n=0}^{N-1} \ln p(X_n \mid \theta) = \sum_{n=0}^{N-1} \ln \prod_{X} p(X \mid \theta)^{\delta(X_n, X)} \\
&= \sum_{X} \sum_{n=0}^{N-1} \delta(X_n, X) \ln p(X \mid \theta) = \sum_{X} m(X) \ln p(X \mid \theta) \\
&= \sum_{X} m(X) \ln \prod_{i=0}^{M-1} p(x_i \mid \pi_i, \theta_i) = \sum_{i=0}^{M-1} \sum_{X} m(X) \ln p(x_i \mid \pi_i, \theta_i) \\
&= \sum_{i=0}^{M-1} \sum_{x_i, \pi_i} \sum_{X \setminus \{x_i, \pi_i\}} m(X) \ln p(x_i \mid \pi_i, \theta_i) = \sum_{i=0}^{M-1} \sum_{x_i, \pi_i} m(x_i, \pi_i) \ln p(x_i \mid \pi_i, \theta_i)
\end{aligned}$$

Define a function $\theta(x_i, \pi_i) = p(x_i \mid \pi_i, \theta_i)$, with the constraint

$$\sum_{x_i} \theta(x_i, \pi_i) = 1$$

ML With Lagrange Multipliers

Lagrange multipliers: to maximize $f(x, y)$ subject to $g(x, y) = c$, maximize $f(x, y) - \lambda (g(x, y) - c)$.

$$\begin{aligned}
l(\theta) &= \sum_{i=0}^{M-1} \sum_{x_i} \sum_{\pi_i} m(x_i, \pi_i) \ln \theta(x_i, \pi_i) \\
\bar{l}(\theta) &= l(\theta) - \sum_{i=0}^{M-1} \sum_{\pi_i} \lambda_{\pi_i} \left( \sum_{x_i} \theta(x_i, \pi_i) - 1 \right) \\
\frac{\partial \bar{l}(\theta)}{\partial \theta(x_i, \pi_i)} &= \frac{m(x_i, \pi_i)}{\theta(x_i, \pi_i)} - \lambda_{\pi_i} = 0 \\
\theta(x_i, \pi_i) &= \frac{m(x_i, \pi_i)}{\lambda_{\pi_i}}
\end{aligned}$$

Applying the constraint $\sum_{x_i} \theta(x_i, \pi_i) = 1$ gives $\lambda_{\pi_i} = \sum_{x_i} m(x_i, \pi_i) = m(\pi_i)$, so

$$\theta(x_i, \pi_i) = \frac{m(x_i, \pi_i)}{m(\pi_i)}$$

Counts!

Maximum A Posteriori CPT training

For the Bayesians, MAP estimation leads to:

$$\theta(x_i, \pi_i) = \frac{m(x_i, \pi_i) + \alpha}{m(\pi_i) + \alpha |x_i|}$$

where $\alpha$ is the pseudocount contributed by the prior and $|x_i|$ is the number of values $x_i$ can take.

Example of maximum likelihood.

Flu (x0)  Fever (x1)  Sinus (x2)  Ache (x3)  Swell (x4)  Head (x5)
   Y          L           Y           Y          Y           N
   N          M           N           N          N           N
   Y          H           N           N          Y           Y
   Y          M           Y           N          N           Y

[Figure: directed graph with x0 → x1, x0 → x2, x1 → x3, x2 → x4, and x1, x4 → x5]

$$\theta(x_i, \pi_i) = \frac{m(x_i, \pi_i)}{m(\pi_i)}$$

(A sketch of this estimator on the table above follows.)
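A minimal sketch (function and variable names are mine, not the lecture's) that computes $\theta(x_i, \pi_i) = m(x_i, \pi_i) / m(\pi_i)$ by counting, shown for p(x3 | x1), i.e. ache given fever, since x1 is x3's parent in the graph above:

```python
from collections import Counter

# Training table above; columns x0..x5 = Flu, Fever, Sinus, Ache, Swell, Head.
data = [
    ("Y", "L", "Y", "Y", "Y", "N"),
    ("N", "M", "N", "N", "N", "N"),
    ("Y", "H", "N", "N", "Y", "Y"),
    ("Y", "M", "Y", "N", "N", "Y"),
]

def ml_cpt(data, child, parents):
    """theta(x_i, pi_i) = m(x_i, pi_i) / m(pi_i): normalized counts."""
    m_joint = Counter((row[child], tuple(row[p] for p in parents)) for row in data)
    m_parent = Counter(tuple(row[p] for p in parents) for row in data)
    return {(x, pa): c / m_parent[pa] for (x, pa), c in m_joint.items()}

print(ml_cpt(data, child=3, parents=[1]))
# {('Y', ('L',)): 1.0, ('N', ('M',)): 1.0, ('N', ('H',)): 1.0}
```

For the MAP variant above, add $\alpha$ to each joint count and $\alpha |x_i|$ to each parent-count denominator.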

Conditional Dependence Test.

We also want to be able to check conditional independencies in a graphical model. E.g., is achiness (x3) independent of flu (x0) given fever (x1)? Is achiness (x3) independent of sinus infection (x2) given fever (x1)?

$$p(x) = p(x_0) p(x_1 \mid x_0) p(x_2 \mid x_0) p(x_3 \mid x_1) p(x_4 \mid x_2) p(x_5 \mid x_1, x_4)$$

$$p(x_3 \mid x_0, x_1, x_2) = \frac{p(x_0, x_1, x_2, x_3)}{p(x_0, x_1, x_2)} = \frac{p(x_0) p(x_1 \mid x_0) p(x_2 \mid x_0) p(x_3 \mid x_1)}{p(x_0) p(x_1 \mid x_0) p(x_2 \mid x_0)} = p(x_3 \mid x_1)$$

So $x_3 \perp \{x_0, x_2\} \mid x_1$. No problem, right? What about $x_0 \perp x_5 \mid \{x_1, x_2\}$?


D-separation and Bayes Ball

[Figure: directed graph over x0, ..., x5]

Intuition: nodes are separated, or blocked, by sets of nodes. Example: nodes x1 and x2 block the path from x0 to x5, so x0 ⊥ x5 | {x1, x2}.

While this is true in undirected graphs, it is not in directed graphs. We need more than simple separation: we need directed separation, D-separation. D-separation is computed using the Bayes Ball algorithm, which allows us to prove general statements of the form xa ⊥ xb | xc.

Bayes Ball Algorithm

To test xa ⊥ xb | xc:

Shade the nodes in xc.
Place a ball at each node in xa.
Bounce the balls around the graph according to a set of rules.
If no ball reaches xb, then xa ⊥ xb | xc is true; otherwise it is false.

Balls can travel along or against edge directions. Pick any path and test whether the ball goes through or bounces back.


Ten Rules of Bayes Ball

[Figure: the ten Bayes Ball rules for when a ball passes through a node or bounces back]

Bayes Ball Example - I

Is x0 ⊥ x4 | x2?

[Figure: directed graph over x0, ..., x5]

Bayes Ball Example - II

Is x0 ⊥ x5 | {x1, x2}?

[Figure: directed graph over x0, ..., x5]

(A sketch that checks both example queries follows.)

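The two example queries can be checked mechanically. Below is a minimal d-separation sketch, not the lecture's implementation: instead of literally bouncing balls, it enumerates undirected paths and applies the chain/fork/collider blocking rules. All function names are illustrative.

```python
def descendants(children, node):
    """All nodes reachable from `node` by following edges forward."""
    out, stack = set(), [node]
    while stack:
        for c in children.get(stack.pop(), ()):
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def d_separated(edges, a, b, given):
    """True if every undirected path from a to b is blocked given `given`."""
    children, neighbors = {}, {}
    for u, v in edges:  # directed edge u -> v
        children.setdefault(u, set()).add(v)
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    def blocked(path):
        for prev, mid, nxt in zip(path, path[1:], path[2:]):
            if mid in children.get(prev, ()) and mid in children.get(nxt, ()):
                # Collider (prev -> mid <- nxt): blocks unless mid or one of
                # its descendants is observed.
                if mid not in given and not (descendants(children, mid) & given):
                    return True
            elif mid in given:
                # Chain or fork: blocks when the middle node is observed.
                return True
        return False

    def paths(node, seen):
        if node == b:
            yield seen
            return
        for n in neighbors.get(node, ()):
            if n not in seen:
                yield from paths(n, seen + [n])

    return all(blocked(p) for p in paths(a, [a]))

# The lecture's example graph.
edges = [("x0", "x1"), ("x0", "x2"), ("x1", "x3"),
         ("x2", "x4"), ("x1", "x5"), ("x4", "x5")]

print(d_separated(edges, "x0", "x4", {"x2"}))        # Example I: True
print(d_separated(edges, "x0", "x5", {"x1", "x2"}))  # Example II: True
```

Both queries come out True under the blocking rules: every path from x0 is cut either by an observed chain/fork node (x1 or x2) or by the unobserved collider x5.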

Undirected Graphs

What if we allow undirected graphs? What do they correspond to? Not cause/effect or trigger/response, but general dependence.

Example: image pixels, where each pixel is a Bernoulli variable. We can define a probability over all of the pixels, p(x_{1,1}, ..., x_{1,M}, ..., x_{M,1}, ..., x_{M,M}). Bright pixels have bright neighbors. There are no parents, just probabilities. Grid models like this are called Markov Random Fields.

Undirected Graphs

[Figure: undirected graph over w, x, y, z]

Undirected separation is easy: to check xa ⊥ xb | xc, check graph reachability between xa and xb without going through nodes in xc. (A minimal reachability sketch follows.)
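A minimal sketch of the undirected check (names and the toy graph are illustrative): xa ⊥ xb | xc holds exactly when xb is unreachable from xa once the nodes in xc are removed.

```python
def u_separated(edges, a, b, given):
    """True if every path from a to b passes through a node in `given`."""
    adj = {}
    for u, v in edges:  # undirected edge {u, v}
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, stack = {a}, [a]
    while stack:  # graph search that refuses to enter observed nodes
        for n in adj.get(stack.pop(), ()):
            if n == b:
                return False
            if n not in seen and n not in given:
                seen.add(n)
                stack.append(n)
    return True

# Toy 4-cycle w - x - z - y - w: conditioning on {w, z} separates x from y.
edges = [("w", "x"), ("x", "z"), ("z", "y"), ("y", "w")]
print(u_separated(edges, "x", "y", {"w", "z"}))  # True
```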


Bye

Next
Representing probabilities in Undirected Graphs.

