Andrew Rosenberg
March 5, 2010
Today
Graphical Models
Naive Bayes classification
Conditional Probability Tables (CPTs)
Inference in Graphical Models and Belief Propagation
Graphical Models

Graphical representation of the dependency relationships between random variables.
[Figure: a directed graph over nodes x0–x5]
Topological Graph
Graphical models factorize probabilities:

p(x_0, …, x_{n-1}) = ∏_{i=0}^{n-1} p(x_i | pa_i) = ∏_{i=0}^{n-1} p(x_i | π_i)

Nodes are generally topologically ordered so that parents come before children.

[Figure: the directed graph over x0–x5, topologically ordered]
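As a concrete illustration of this factorization, a three-node chain x0 → x1 → x2 with made-up binary CPTs can be multiplied out node by node (the CPT values here are hypothetical, chosen only for the sketch):

```python
# Sketch of the chain-rule factorization p(x0, x1, x2) = p(x0) p(x1|x0) p(x2|x1)
# for a toy directed chain x0 -> x1 -> x2 with made-up binary CPTs.

p_x0 = {0: 0.6, 1: 0.4}                             # p(x0)
p_x1 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}   # p(x1 | x0), indexed [x0][x1]
p_x2 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}   # p(x2 | x1), indexed [x1][x2]

def joint(x0, x1, x2):
    """p(x0, x1, x2) as the product of each node given its parents."""
    return p_x0[x0] * p_x1[x0][x1] * p_x2[x1][x2]

# Because each CPT row sums to 1, the factorized joint is a valid
# distribution: summing over all assignments gives 1.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(total)  # 1.0 (up to float rounding)
```

Any joint over the chain can be evaluated this way without ever materializing the full 2×2×2 table.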
There can be many variables x_i. Plate notation gives a compact representation of such models:

[Figure: plate notation, with node x_i drawn inside a plate]
Flu    Y  N  Y  Y
Fever  L  M  H  M
Sinus  Y  N  N  Y
Ache   Y  N  N  N
Swell  Y  N  Y  N
Head   N  N  Y  Y

[Figure: one node per variable, x0–x5]
[Figure: naive Bayes graph, with Flu as the parent of each symptom node]
p(sinus | flu), estimated from the training table:

            sinus=Y  sinus=N
  flu = Y     .67      .33
  flu = N      0        1
p(ache | flu), estimated from the training table:

            ache=Y  ache=N
  flu = Y    .33     .67
  flu = N     0       1
p(swell | flu), estimated from the training table:

            swell=Y  swell=N
  flu = Y     .67      .33
  flu = N      0        1
p(head | flu), estimated from the training table:

            head=Y  head=N
  flu = Y    .67     .33
  flu = N     0       1
Find p(flu | fever, sinus, ache, swell, head).

p(flu = Y) · p(fev = M | flu = Y) · p(sin = N | flu = Y) · p(ach = N | flu = Y) · p(swe = N | flu = Y) · p(head = N | flu = Y)

p(flu):   Y .75   N .25
Find p(flu | fever, sinus, ache, swell, head).

.75 · p(fev = M | flu = Y) · p(sin = N | flu = Y) · p(ach = N | flu = Y) · p(swe = N | flu = Y) · p(head = N | flu = Y)

p(fever | flu):
             L    M    H
  flu = Y  .33  .33  .33
  flu = N    0    1    0
Find p(flu | fever, sinus, ache, swell, head).

.75 · .33 · p(sin = N | flu = Y) · p(ach = N | flu = Y) · p(swe = N | flu = Y) · p(head = N | flu = Y)

p(sinus | flu):
             Y    N
  flu = Y  .67  .33
  flu = N    0    1
Find p(flu | fever, sinus, ache, swell, head).

.75 · .33 · .33 · p(ach = N | flu = Y) · p(swe = N | flu = Y) · p(head = N | flu = Y)

p(ache | flu):
             Y    N
  flu = Y  .33  .67
  flu = N    0    1
Find p(flu | fever, sinus, ache, swell, head).

.75 · .33 · .33 · .67 · p(swe = N | flu = Y) · p(head = N | flu = Y)

p(swell | flu):
             Y    N
  flu = Y  .67  .33
  flu = N    0    1
Find p(flu | fever, sinus, ache, swell, head).

.75 · .33 · .33 · .67 · .33 · p(head = N | flu = Y)

p(head | flu):
             Y    N
  flu = Y  .67  .33
  flu = N    0    1
Flu    Y  N  Y  Y  ?
Fever  L  M  H  M  M
Sinus  Y  N  N  Y  N
Ache   Y  N  N  N  N
Swell  Y  N  Y  N  N
Head   N  N  Y  Y  N

Find p(flu | fever, sinus, ache, swell, head).

.75 · .33 · .33 · .67 · .33 · .33 = 0.0060

(This product is p(flu = Y, evidence); the corresponding product for flu = N is .25, and normalizing the two gives the posterior.)
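The whole computation can be sketched in code. The training rows and the query column come from the table above; the function names and structure are illustrative, not part of the slides:

```python
# Naive Bayes inference over the four training rows.
# Symptoms are conditionally independent given Flu, so
# p(flu | evidence) is proportional to p(flu) * product_i p(symptom_i | flu).

data = [  # (Flu, Fever, Sinus, Ache, Swell, Head)
    ("Y", "L", "Y", "Y", "Y", "N"),
    ("N", "M", "N", "N", "N", "N"),
    ("Y", "H", "N", "N", "Y", "Y"),
    ("Y", "M", "Y", "N", "N", "Y"),
]

def p_flu(v):
    return sum(row[0] == v for row in data) / len(data)

def p_symptom(i, value, flu):
    """p(symptom i = value | Flu = flu), estimated by counting."""
    rows = [row for row in data if row[0] == flu]
    return sum(row[1 + i] == value for row in rows) / len(rows)

evidence = ("M", "N", "N", "N", "N")  # the query column: Fever=M, the rest N

scores = {}
for flu in ("Y", "N"):
    s = p_flu(flu)
    for i, v in enumerate(evidence):
        s *= p_symptom(i, v, flu)
    scores[flu] = s

print(round(scores["Y"], 4))  # 0.0062 (the slide's 0.0060 rounds each factor to .33/.67)
print(round(scores["Y"] / (scores["Y"] + scores["N"]), 3))  # posterior p(flu=Y | evidence)
```

Note that the unrounded score for flu = Y is .75 · (1/3)⁴ · (2/3) ≈ 0.0062, slightly above the slide's rounded 0.0060.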
In the simplest, least general graph, assume all the variables are independent: train 6 separate models.

[Figure: six disconnected nodes Fl, Fe, Si, Ac, Sw, He]

In the most general graph, assume no independence at all: build a 6-dimensional table of counts and divide by the total count.

[Figure: six fully connected nodes Fl, Fe, Si, Ac, Sw, He]
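To make the size difference concrete, here is a small illustrative count of free parameters for the two extremes, plus the naive Bayes model in between, for the six variables above (Fever has 3 values, the rest 2):

```python
from math import prod

# Free parameters for each modeling extreme over the six variables.
card = {"Flu": 2, "Fever": 3, "Sinus": 2, "Ache": 2, "Swell": 2, "Head": 2}

independent = sum(k - 1 for k in card.values())          # six separate models
full_joint = prod(card.values()) - 1                     # one big joint table
naive_bayes = (card["Flu"] - 1) + sum(                   # Flu -> each symptom
    (card[v] - 1) * card["Flu"] for v in card if v != "Flu")

print(independent, naive_bayes, full_joint)  # 7 13 95
```

The naive Bayes structure buys conditional dependence on Flu for only a handful more parameters than full independence, while the unrestricted joint grows multiplicatively.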
Each node has a conditional probability table θ_i. Given the tables, we have a pdf:

p(x | Θ) = ∏_{i=0}^{M-1} p(x_i | π_i, θ_i)

Maximum likelihood training:

Θ* = argmax_Θ ln p(X | Θ)
   = argmax_Θ Σ_{n=0}^{N-1} ln p(X_n | Θ)
   = argmax_Θ Σ_{n=0}^{N-1} Σ_{i=0}^{M-1} ln p(x_{in} | π_i, θ_i)
Define the indicator δ(x_i, x_{in}) = 1 if x_i = x_{in}, and 0 otherwise.

Counts: m(X) = Σ_{n=0}^{N-1} δ(X, X_n)

Marginalizing counts recovers the sample size:

N = Σ_{x_1} m(x_1) = Σ_{x_1} Σ_{x_2} m(x_1, x_2) = Σ_{x_1} Σ_{x_2} Σ_{x_3} m(x_1, x_2, x_3) = …
l(Θ) = Σ_{n=0}^{N-1} ln p(X_n | Θ)
     = Σ_{n=0}^{N-1} Σ_X δ(X_n, X) ln p(X | Θ)
     = Σ_X m(X) ln p(X | Θ)
     = Σ_X m(X) ln ∏_{i=0}^{M-1} p(x_i | π_i, θ_i)
     = Σ_{i=0}^{M-1} Σ_X m(X) ln p(x_i | π_i, θ_i)
     = Σ_{i=0}^{M-1} Σ_{x_i, π_i} ( Σ_{X \ x_i \ π_i} m(X) ) ln p(x_i | π_i, θ_i)
     = Σ_{i=0}^{M-1} Σ_{x_i, π_i} m(x_i, π_i) ln p(x_i | π_i, θ_i)

subject to the constraint Σ_{x_i} θ(x_i, π_i) = 1 for each i and each parent configuration π_i.
Maximize with a Lagrange multiplier for each normalization constraint:

Σ_{i=0}^{M-1} Σ_{x_i} Σ_{π_i} m(x_i, π_i) ln θ(x_i, π_i) + Σ_i Σ_{π_i} λ_{π_i} ( Σ_{x_i} θ(x_i, π_i) − 1 )

Setting the derivative with respect to θ(x_i, π_i) to zero gives the maximum likelihood estimate:

θ(x_i, π_i) = m(x_i, π_i) / m(π_i)

For the Bayesians, MAP estimation leads to smoothed counts:

θ(x_i, π_i) = (m(x_i, π_i) + α) / (m(π_i) + α |x_i|)
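The counting estimate above is a one-liner per parent configuration. A minimal sketch, with an optional pseudo-count α for the smoothed MAP-style variant (the function name and data layout are illustrative):

```python
from collections import Counter

# ML estimate of a CPT by counting: theta(x_i, pa_i) = m(x_i, pa_i) / m(pa_i),
# optionally smoothed with a pseudo-count alpha (add-alpha / Laplace style).

def estimate_cpt(samples, alpha=0.0, values=("Y", "N")):
    """samples: list of (child, parent) pairs. Returns p(child | parent)."""
    m_joint = Counter(samples)                 # m(x_i, pa_i)
    m_parent = Counter(p for _, p in samples)  # m(pa_i)
    return {
        (x, p): (m_joint[(x, p)] + alpha) / (m_parent[p] + alpha * len(values))
        for p in m_parent
        for x in values
    }

# (Sinus, Flu) pairs from the four training rows:
pairs = [("Y", "Y"), ("N", "N"), ("N", "Y"), ("Y", "Y")]
cpt = estimate_cpt(pairs)
print(round(cpt[("Y", "Y")], 2), round(cpt[("N", "Y")], 2))   # 0.67 0.33
print(estimate_cpt(pairs, alpha=1.0)[("Y", "Y")])             # 0.6 with add-one smoothing
```

With α = 0 this reproduces the p(sinus | flu) table from the worked example; with α > 0 no probability is ever estimated as exactly zero.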
Flu (x0)    Y  N  Y  Y
Fever (x1)  L  M  H  M
Sinus (x2)  Y  N  N  Y
Ache (x3)   Y  N  N  N
Swell (x4)  Y  N  Y  N
Head (x5)   N  N  Y  Y

[Figure: naive Bayes graph, x0 → {x1, x2, x3, x4, x5}]

θ(x_i, π_i) = m(x_i, π_i) / m(π_i)
[Figure: directed graph over x0–x5]

Intuition: nodes are separated, or blocked, by sets of nodes.
Example: if nodes x2 and x3 block the path from x0 to x5, then x0 ⊥ x5 | x2, x3.
[Figure: directed graph over x0–x5]

Intuition: nodes are separated, or blocked, by sets of nodes.
Example: if nodes x2 and x3 block the path from x0 to x5, then x0 ⊥ x5 | x2, x3.

While this is true in undirected graphs, it is not in directed graphs. We need more than simple separation: we need directed separation, or D-separation. D-separation is computed using the Bayes Ball algorithm, which allows us to prove general statements of the form x_a ⊥ x_b | x_c.
To test x_a ⊥ x_b | x_c with Bayes Ball:
- Shade the nodes x_c.
- Place a ball at each node in x_a.
- Bounce the balls around the graph according to a set of rules.
- If no ball reaches x_b, then x_a ⊥ x_b | x_c; otherwise the statement is false.

Balls can travel along or against edge directions. Pick any path and test whether the ball passes through or bounces back.
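Bayes Ball can be implemented as a breadth-first search over (node, direction) states; this is a sketch of the standard "reachable" formulation, run here on toy chain and collider graphs rather than the slides' figure:

```python
from collections import deque

# d-separation via the Bayes Ball / reachable procedure: bounce a "ball"
# through the DAG from a, tracking whether it arrives at a node travelling
# up (from a child) or down (from a parent).

def d_separated(parents, a, b, given):
    """True if a is d-separated from b given `given`, in the DAG {node: [parents]}."""
    children = {n: [] for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].append(n)

    # Phase 1: ancestors of the evidence set (including the evidence itself),
    # used to detect active v-structures.
    anc, stack = set(), list(given)
    while stack:
        n = stack.pop()
        if n not in anc:
            anc.add(n)
            stack.extend(parents[n])

    # Phase 2: BFS over (node, direction) states starting from a.
    visited, reachable = set(), set()
    queue = deque([(a, "up")])
    while queue:
        node, direction = queue.popleft()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node not in given:
            reachable.add(node)
        if direction == "up" and node not in given:
            for p in parents[node]:
                queue.append((p, "up"))
            for c in children[node]:
                queue.append((c, "down"))
        elif direction == "down":
            if node not in given:
                for c in children[node]:
                    queue.append((c, "down"))
            if node in anc:  # evidence at or below this node activates the v-structure
                for p in parents[node]:
                    queue.append((p, "up"))
    return b not in reachable

# Chain x0 -> x1 -> x2: conditioning on the middle node blocks the path.
chain = {"x0": [], "x1": ["x0"], "x2": ["x1"]}
print(d_separated(chain, "x0", "x2", {"x1"}))    # True
# Collider x0 -> x2 <- x1: observing the collider activates the path.
collider = {"x0": [], "x1": [], "x2": ["x0", "x1"]}
print(d_separated(collider, "x0", "x1", set()))  # True
print(d_separated(collider, "x0", "x1", {"x2"})) # False
```

The collider case is exactly where plain undirected separation gives the wrong answer, which is why the directed rules are needed.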
x0 ⊥ x4 | x2?

[Figure: directed graph over x0–x5]
x0 ⊥ x5 | x1, x2?

[Figure: directed graph over x0–x5]
Undirected Graphs
What if we allow undirected graphs? What do they correspond to? It's not cause/effect, or trigger/response, but rather general dependence.

Example: image pixels, where each pixel is a Bernoulli variable. We can have a probability over all pixels p(x_{11}, …, x_{1M}, …, x_{M1}, …, x_{MM}). Bright pixels have bright neighbors. There are no parents, just probabilities. Grid models like this are called Markov Random Fields.
Undirected Graphs
[Figure: undirected graph over w, x, y, z]

z ⊥ y | {w, x}

Undirected separation is easy. To check x_a ⊥ x_b | x_c, check graph reachability of x_a and x_b without going through nodes in x_c.
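The reachability check above is a plain graph search. A minimal sketch, on a small undirected graph whose edges are assumed for illustration (z–w, z–x, w–y, x–y):

```python
# Undirected separation as plain graph reachability: x_a is separated from
# x_b given x_c iff every path from a to b passes through the set `given`.

def u_separated(neighbors, a, b, given):
    """DFS from a that never enters nodes in `given`; separated if b is unreached."""
    seen, stack = set(given), [a]
    while stack:
        n = stack.pop()
        if n == b:
            return False
        if n not in seen:
            seen.add(n)
            stack.extend(neighbors[n])
    return True

# Diamond-shaped example: z - w - y and z - x - y.
graph = {"w": ["z", "y"], "x": ["z", "y"], "y": ["w", "x"], "z": ["w", "x"]}
print(u_separated(graph, "z", "y", {"w", "x"}))  # True: both paths blocked
print(u_separated(graph, "z", "y", {"w"}))       # False: the path through x is open
```

Unlike the directed case, no extra machinery is needed; there are no v-structures in undirected graphs.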
Bye
Next: Representing probabilities in Undirected Graphs.