
Nicholson & Korb

Bayesian AI

AI'99, Sydney
6 December 1999

Ann E. Nicholson and Kevin B. Korb
School of Computer Science and Software Engineering
Monash University, Clayton, VIC 3168, AUSTRALIA
{ann,korb}@csse.monash.edu.au

Overview

1. Introduction to Bayesian AI (20 min)
2. Bayesian networks (50 min)
   Break (10 min)
3. Applications (50 min)
   Break (10 min)
4. Learning Bayesian networks (50 min)
5. Current research issues (10 min)
6. Bayesian Net Lab (60 min: Optional)
7. Dinner (Optional)

Bayesian AI Tutorial

Introduction to Bayesian AI

• Reasoning under uncertainty
• Probabilities
• Alternative formalisms
  – Fuzzy logic
  – MYCIN's certainty factors
  – Default Logic
• Bayesian philosophy
  – Dutch book arguments
  – Bayes' Theorem
  – Conditionalization
  – Confirmation theory
• Bayesian decision theory
• Towards a Bayesian AI

Reasoning under Uncertainty

Uncertainty: the quality or state of being not clearly known. This encompasses most of what we understand about the world, and most of what we would like our AI systems to understand. It distinguishes deductive knowledge (e.g., mathematics) from inductive belief (e.g., science).

Sources of uncertainty:

• Ignorance (which side of this coin is up?)
• Physical randomness (which side of this coin will land up?)
• Vagueness (which tribe am I closest to genetically? Picts? Angles? Saxons? Celts?)


Probabilities

The classic approach to reasoning under uncertainty (Blaise Pascal and Fermat).

Kolmogorov's Axioms:

1. P(U) = 1
2. ∀X ⊆ U, P(X) ≥ 0
3. ∀X, Y ⊆ U, if X ∩ Y = ∅ then P(X ∨ Y) = P(X) + P(Y)

Conditional probability: P(X|Y) = P(X ∧ Y) / P(Y)

Independence: X ⊥ Y iff P(X|Y) = P(X)

Fuzzy Logic

Designed to cope with vagueness: is Fido a Labrador or a Shepherd?

Fuzzy set theory: m(Fido ∈ Labrador) = m(Fido ∈ Shepherd) = 0.5

Extended to fuzzy logic, which takes intermediate truth values: T(Labrador(Fido)) = 0.5.

Combination rules:

• T(p ∧ q) = min(T(p), T(q))
• T(p ∨ q) = max(T(p), T(q))
• T(¬p) = 1 − T(p)

Not suitable for coping with randomness or ignorance. Obviously not:

    Uncertainty(inclement weather) = max(Uncertainty(rain), Uncertainty(hail), ...)
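The fuzzy combination rules above are easy to state in code. The sketch below uses invented rain/hail numbers (they appear nowhere in the slides) to illustrate the objection: for random events, the max rule understates the chance of the disjunction compared with the probabilistic calculation.

```python
# Fuzzy truth-value combination rules from the slide.
def t_and(tp, tq):
    return min(tp, tq)        # T(p AND q)

def t_or(tp, tq):
    return max(tp, tq)        # T(p OR q)

def t_not(tp):
    return 1.0 - tp           # T(NOT p)

# Hypothetical chances of rain and hail on some day (illustrative only):
p_rain, p_hail = 0.3, 0.2

# Max rule treats "inclement weather" as no more likely than rain alone.
fuzzy_inclement = t_or(p_rain, p_hail)

# Probability theory (assuming independence) gives a strictly larger value:
prob_inclement = p_rain + p_hail - p_rain * p_hail
```

With these numbers the max rule yields 0.3 while the probabilistic disjunction yields 0.44, which is the point of the slide's closing remark.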

MYCIN's Certainty Factors

An uncertainty formalism developed for the early expert system MYCIN (Buchanan and Shortliffe, 1984). Elicit for (h, e):

• measure of belief: MB(h, e) ∈ [0, 1]
• measure of disbelief: MD(h, e) ∈ [0, 1]

    CF(h, e) = MB(h, e) − MD(h, e) ∈ [−1, 1]

Special functions are provided for combining evidence.

Problems:

• No semantics was ever given for 'belief'/'disbelief'.
• Heckerman (1986) proved that the restrictions required for a probabilistic semantics imply absurd independence assumptions.

Default Logic

Intended to reflect "stereotypical" reasoning under uncertainty (Reiter 1980). Example:

    Bird(Tweety) : Bird(x) → Flies(x)
    ---------------------------------
    Flies(Tweety)

Problems:

• The best semantics for default rules are probabilistic (Pearl 1988, Korb 1995).
• Mishandles combinations of low-probability events. E.g.,

    ApplyForJob(me) : ApplyForJob(x) → Reject(x)
    --------------------------------------------
    Reject(me)

I.e., the dole always looks better than applying for a job!
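The slide mentions "special functions provided for combining evidence" without stating them. The parallel-combination rule usually attributed to EMYCIN is sketched below; treat the exact form as an assumption drawn from secondary accounts of MYCIN rather than from this tutorial.

```python
def cf_combine(cf1, cf2):
    """Combine two certainty factors for the same hypothesis
    (the commonly cited EMYCIN parallel-combination rule)."""
    if cf1 >= 0 and cf2 >= 0:
        # Both supportive: second factor fills part of the remaining gap.
        return cf1 + cf2 * (1 - cf1)
    if cf1 < 0 and cf2 < 0:
        # Both disconfirming: symmetric to the positive case.
        return cf1 + cf2 * (1 + cf1)
    # Mixed evidence: normalize by the smaller magnitude.
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))

combined_pos = cf_combine(0.6, 0.5)    # two supportive pieces of evidence
combined_neg = cf_combine(-0.6, -0.5)  # two disconfirming pieces
combined_mix = cf_combine(0.6, -0.5)   # conflicting evidence
```

Heckerman's criticism applies precisely to rules of this shape: the combination is order-independent only under strong implicit independence assumptions.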


Probability Theory

So, why not use probability theory to represent uncertainty? That's what it was invented for... dealing with physical randomness and degrees of ignorance. Furthermore, if you make bets which violate probability theory, you are subject to Dutch books:

A Dutch book is a sequence of "fair" bets which collectively guarantee a loss.

Fair bets are bets based upon the standard odds-probability relation:

    O(h) = P(h) / (1 − P(h))
    P(h) = O(h) / (1 + O(h))

A Dutch Book

Payoff table on a bet for h (Odds = p/(1 − p); S = betting unit):

    h | Payoff
    T | $(1 − p) × S
    F | −$p × S

Given a fair bet, the expected value from such a payoff is always $0.

Now, let's violate the probability axioms. Example: say P(A) = −0.1 (violating A2). Payoff table against A (the inverse of the table for A), with S = 1:

    ¬A | Payoff
    T  | $pS = −$0.10
    F  | −$(1 − p)S = −$1.10

Either way, the bettor is guaranteed a loss.
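The payoff table can be checked mechanically. The sketch below encodes the fair-bet payoffs from the slide and shows that a coherent probability gives expected value zero, while the incoherent P(A) = −0.1 loses on both outcomes of the bet against A:

```python
def payoff_for(p, outcome, S=1.0):
    """Payoff on a unit bet *for* h at price p: win (1-p)S if h, lose pS if not."""
    return (1 - p) * S if outcome else -p * S

def payoff_against(p, a_outcome, S=1.0):
    """Betting against A is betting for (not A) at price 1 - p."""
    return payoff_for(1 - p, not a_outcome, S)

# Coherent case: at a fair price (here p = 0.4) expected value is $0.
ev = 0.4 * payoff_for(0.4, True) + 0.6 * payoff_for(0.4, False)

# Incoherent case from the slide: P(A) = -0.1 violates axiom A2.
p = -0.1
loss_if_a = payoff_against(p, True)       # -$(1 - p) = -$1.10
loss_if_not_a = payoff_against(p, False)  # $p = -$0.10
```

Both payoffs are negative, so the sequence of "fair" bets guarantees a loss: a Dutch book.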

Bayes' Theorem; Conditionalization

Due to Reverend Thomas Bayes (1764):

    P(h|e) = P(e|h) P(h) / P(e)

Conditionalization: P'(h) = P(h|e)

Or, read Bayes' theorem as:

    Posterior = Likelihood × Prior / Prob of evidence

Assumptions:

1. Joint priors over {h_i} and e exist.
2. Total evidence: e, and only e, is learned.

Bayesian Decision Theory

(Frank Ramsey, 1931)

Decision making under uncertainty: what action to take (plan to adopt) when the future state of the world is not known.

Bayesian answer: find the utility of each possible outcome (action-state pair) and take the action that maximizes expected utility.

Example:

    Action         | Rain (p = .4) | Shine (1 − p = .6)
    Take umbrella  | 30            | 10
    Leave umbrella | −100          | 50

Expected utilities:

    E(Take umbrella) = (30)(.4) + (10)(.6) = 18
    E(Leave umbrella) = (−100)(.4) + (50)(.6) = −10
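The expected-utility calculation in the umbrella table is a one-liner; a minimal sketch, using the table's utilities:

```python
def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes)

p_rain = 0.4
eu_take = expected_utility([(p_rain, 30), (1 - p_rain, 10)])
eu_leave = expected_utility([(p_rain, -100), (1 - p_rain, 50)])

# The Bayesian recommendation is the action with maximal expected utility.
best = "take" if eu_take > eu_leave else "leave"
```

Here taking the umbrella (EU = 18) beats leaving it (EU = −10), so the agent takes it.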


Bayesian AI

A Bayesian conception of an AI is: an autonomous agent which

• has a utility structure (preferences)
• can learn about its world and the relation between its actions and future states (probabilities)
• maximizes its expected utility

The techniques used in learning about the world are (primarily) statistical... hence Bayesian data mining.

Bayesian Networks: Overview

• Syntax
• Semantics
• Evaluation methods
• Influence diagrams (Decision Networks)
• Dynamic Bayesian Networks

Bayesian Networks

• A data structure which represents the dependence between variables;
• gives a concise specification of the joint probability distribution.
• A Bayesian network is a graph in which the following holds:
  1. A set of random variables makes up the nodes in the network.
  2. A set of directed links or arrows connects pairs of nodes.
  3. Each node has a conditional probability table that quantifies the effects the parents have on the node.
  4. The graph is directed and acyclic (a DAG), i.e. it has no directed cycles.

Example: Earthquake

(Pearl, R&N)

• You have a new burglar alarm installed.
• It is reliable about detecting burglary, but also responds to minor earthquakes.
• Two neighbours (John, Mary) promise to call you at work when they hear the alarm.
  – John always calls when he hears the alarm, but confuses the alarm with the phone ringing (and calls then also).
  – Mary likes loud music and sometimes misses the alarm!
• Given evidence about who has and hasn't called, estimate the probability of a burglary.


Earthquake Example: Network Structure

[Figure: Burglary and Earthquake are parents of Alarm; Alarm is the parent of JohnCalls and MaryCalls.]

    P(B) = 0.001        P(E) = 0.002

    B E | P(A|B,E)
    T T | 0.95
    T F | 0.94
    F T | 0.29
    F F | 0.001

    A | P(J|A)          A | P(M|A)
    T | 0.90            T | 0.70
    F | 0.05            F | 0.01

Earthquake Example: Notes

• Assumptions: John and Mary don't perceive burglary directly; they do not feel minor earthquakes.
• Note: there is no info about loud music, or about the telephone ringing and confusing John. This is summarised in the uncertainty in the links from Alarm to JohnCalls and MaryCalls.
• Once the topology is specified, we need to specify a conditional probability table (CPT) for each node.
  – Each row contains the conditional probability of each node value for a conditioning case.
  – Each row must sum to 1.
  – A table for a Boolean variable with n Boolean parents contains 2^(n+1) probabilities.
  – A node with no parents has one row (the prior probabilities).

Semantics of Bayesian Networks

• A (more compact) representation of the joint probability distribution.
  – Helpful in understanding how to construct the network.
• An encoding of a collection of conditional independence statements.
  – Helpful in understanding how to design inference procedures.

Representing the Joint Probability Distribution

    P(X1 = x1, X2 = x2, ..., Xn = xn)
      = P(x1, x2, ..., xn)
      = P(x1) × P(x2|x1) × ... × P(xn|x1 ∧ ... ∧ xn−1)
      = Π_i P(xi | x1 ∧ ... ∧ xi−1)
      = Π_i P(xi | π(Xi))

Example:

    P(J ∧ M ∧ A ∧ ¬B ∧ ¬E)
      = P(J|A) P(M|A) P(A|¬B ∧ ¬E) P(¬B) P(¬E)
      = 0.9 × 0.7 × 0.001 × 0.999 × 0.998 ≈ 0.00063
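The factorization above can be checked directly. A minimal sketch using the CPT values of the Pearl / Russell & Norvig version of the earthquake example (priors P(B) = 0.001, P(E) = 0.002):

```python
# CPTs for the earthquake network, each entry giving P(var = True | parents).
p_b = 0.001                      # P(Burglary)
p_e = 0.002                      # P(Earthquake)
p_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm | B, E)
p_j = {True: 0.90, False: 0.05}  # P(JohnCalls | Alarm)
p_m = {True: 0.70, False: 0.01}  # P(MaryCalls | Alarm)

# P(J, M, A, not-B, not-E) via the factorization  prod_i P(x_i | parents(X_i)):
joint = (p_j[True] * p_m[True] * p_a[(False, False)]
         * (1 - p_b) * (1 - p_e))
```

The product comes to about 0.00063: the alarm going off with neither a burglary nor an earthquake, and both neighbours calling, is a very improbable joint event.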


Network Construction

1. Choose the set of relevant variables Xi that describe the domain.
2. Choose an ordering for the variables.
3. While there are variables left:
   (a) Pick a variable Xi and add a node to the network for it.
   (b) Set π(Xi) to some minimal set of nodes already in the net such that the conditional independence property is satisfied:

       P(Xi | Xi−1, ..., X1) = P(Xi | π(Xi))

   (c) Define the CPT for Xi.

Compactness and Node Ordering

• The compactness of a BN is an example of a locally structured (or sparse) system.
• The correct order in which to add nodes is to add the "root causes" first, then the variables they influence, and so on until the "leaves" are reached.
• Examples of wrong orderings (which still represent the same joint distribution):

1. MaryCalls, JohnCalls, Alarm, Burglary, Earthquake.

[Figure: the denser network produced by this ordering.]

Compactness and Node Ordering (cont.)

2. MaryCalls, JohnCalls, Earthquake, Burglary, Alarm.

[Figure: the still denser network produced by this ordering.]

More probabilities than the full joint! See below for why.

Conditional Independence: Causal Chains

Causal chains give rise to conditional independence:

    A → B → C

    P(C|A ∧ B) = P(C|B)

Example:

• A = Jack's flu
• B = severe cough
• C = Jill's flu


Conditional Independence: Common Causes

Common causes (or ancestors) also give rise to conditional independence:

    A ← B → C

    P(C|A ∧ B) = P(C|B)

Example:

• A = Jack's flu
• B = Joe's flu
• C = Jill's flu

Conditional Dependence: Common Effects

Common effects (or their descendants) give rise to conditional dependence:

    A → B ← C

    P(A|C ∧ B) ≠ P(A|B)

Example:

• A = flu
• B = severe cough
• C = tuberculosis

Given a severe cough, flu "explains away" tuberculosis.
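Explaining away can be made numeric. The CPT below is invented for illustration (the slide gives no numbers for the flu/cough/TB collider); the qualitative effect does not depend on the particular values:

```python
# Hypothetical parameters for the collider flu -> cough <- TB.
p_flu, p_tb = 0.10, 0.01          # marginally independent causes
p_cough = {(True, True): 0.95, (True, False): 0.80,
           (False, True): 0.90, (False, False): 0.05}  # P(cough | flu, tb)

def joint(flu, tb, cough):
    pf = p_flu if flu else 1 - p_flu
    pt = p_tb if tb else 1 - p_tb
    pc = p_cough[(flu, tb)] if cough else 1 - p_cough[(flu, tb)]
    return pf * pt * pc

# P(tb | cough): the cough raises belief in TB above its 0.01 prior.
num = sum(joint(f, True, True) for f in (True, False))
den = sum(joint(f, t, True) for f in (True, False) for t in (True, False))
p_tb_cough = num / den

# P(tb | cough, flu): once flu is known, it explains the cough,
# and belief in TB drops back toward its prior.
p_tb_cough_flu = joint(True, True, True) / (
    joint(True, True, True) + joint(True, False, True))
```

With these numbers, P(tb | cough) ≈ 0.07 but P(tb | cough, flu) ≈ 0.01: learning about one cause of the shared effect lowers belief in the other.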

D-separation

• A graph-theoretic criterion of conditional independence.
• We can determine whether a set of nodes X is independent of another set Y given a set of evidence nodes E, i.e., X ⊥ Y | E.
• Earthquake example:

[Figure: Burglary and Earthquake are parents of Alarm; Alarm is the parent of JohnCalls and MaryCalls.]

Causal Ordering

Why does variable order affect network density? Because

• using the causal order allows direct representation of conditional independencies, while
• violating the causal order requires new arcs to re-establish conditional independencies.


Causal Ordering (cont'd)

[Figure: Flu and TB are parents of Cough.]

Flu and TB are marginally independent.

Given the ordering Cough, Flu, TB:

[Figure: Cough is a parent of Flu and of TB, with an extra arc between Flu and TB (explaining away).]

The marginal independence of Flu and TB must be re-established by adding Flu → TB or Flu ← TB.

Inference in Bayesian Networks

• The basic task for any probabilistic inference system: compute the posterior probability distribution for a set of query variables, given values for some evidence variables.
• Also called belief updating.
• Types of inference:

[Figure: four patterns of query (Q) and evidence (E) nodes: diagnostic, causal, intercausal (explaining away), and mixed.]

Kinds of Inference

• Diagnostic inferences: from effects to causes.
  P(Burglary|JohnCalls)
• Causal inferences: from causes to effects.
  P(JohnCalls|Burglary)
  P(MaryCalls|Burglary)
• Intercausal inferences: between causes of a common effect.
  P(Burglary|Alarm)
  P(Burglary|Alarm ∧ Earthquake)
• Mixed inference: combining two or more of the above.
  P(Alarm|JohnCalls ∧ ¬Earthquake)
  P(Burglary|JohnCalls ∧ ¬Earthquake)

Inference Algorithms: Overview

• Exact inference
  – Trees and polytrees: message-passing algorithm
  – Multiply-connected networks: clustering
• Approximate inference
  – Large, complex networks:
    · stochastic simulation
    · other approximation methods
• In the general case, both sorts of inference are computationally complex ("NP-hard").
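For a network this small, any of the queries above can be answered by brute-force enumeration of the full joint (the exact algorithms exist precisely to avoid this exponential enumeration in larger networks). A sketch of the diagnostic query P(Burglary | JohnCalls), again using the Pearl / Russell & Norvig CPT values:

```python
from itertools import product

# Earthquake network CPTs: P(var = True | parents).
p_b, p_e = 0.001, 0.002
p_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
p_j = {True: 0.90, False: 0.05}
p_m = {True: 0.70, False: 0.01}

def joint(b, e, a, j, m):
    """Full joint probability via the network factorization."""
    def pr(p_true, val):                 # P(var = val) from P(var = True)
        return p_true if val else 1 - p_true
    return (pr(p_b, b) * pr(p_e, e) * pr(p_a[(b, e)], a)
            * pr(p_j[a], j) * pr(p_m[a], m))

# P(Burglary | JohnCalls) = P(B, J) / P(J), marginalizing the rest.
num = sum(joint(True, e, a, True, m)
          for e, a, m in product([True, False], repeat=3))
den = sum(joint(b, e, a, True, m)
          for b, e, a, m in product([True, False], repeat=4))
p_b_given_j = num / den
```

The answer is about 0.016: John calling raises belief in a burglary well above its 0.001 prior, but false alarms and phone confusion keep it small.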


Message Passing Example

[Figure: a polytree version of the example. Burglary and Earthquake are parents of Alarm; PhoneRings and Alarm are parents of JohnCalls; Alarm is the parent of MaryCalls. CPTs: P(B) = 0.001, P(E) = 0.002, P(Ph) = 0.05; P(A|B,E) = 0.95, 0.94, 0.29, 0.001 for (T,T), (T,F), (F,T), (F,F); P(J|Ph,A) = 0.95, 0.5, 0.90, 0.01 for (T,T), (T,F), (F,T), (F,F); P(M|A): 0.70 (T), 0.01 (F). The annotations show the λ and π messages passed between nodes, e.g. π(B) = (.001, .999), λ(B) = (1, 1), bel(B) = (.001, .999); bel(Ph) = (.05, .95); evidence vectors λ(J) = (1, 1), λ(M) = (1, 0).]

Inference in Multiply Connected Networks

• Networks where two nodes are connected by more than one path:
  – two or more possible causes share a common ancestor, or
  – one variable can influence another through more than one causal mechanism.
• Example: the Cancer network.

[Figure: A (Metastatic Cancer) is a parent of B (Increased total serum calcium) and C (Brain tumour); B and C are parents of D (Coma); C is a parent of E (Severe Headaches).]

• Message passing doesn't work: evidence gets "counted twice".

Clustering Methods

• Transform the network into a probabilistically equivalent polytree by merging (clustering) the offending nodes.
• Cancer example: a new node Z combines B and C.

[Figure: A is the parent of Z = B,C; Z is the parent of D and E.]

    P(z|a) = P(b, c|a) = P(b|a) P(c|a)
    P(e|z) = P(e|b, c) = P(e|c)
    P(d|z) = P(d|b, c)

Clustering Methods (cont.)

• The Jensen join-tree version (Jensen, 1996) is currently the most efficient algorithm in this class (e.g. used in Hugin and Netica).
• Network evaluation is done in two stages:
  – Compile into a join-tree
    · may be slow
    · may require too much memory if the original network is highly connected
  – Do belief updating in the join-tree (usually fast)
• Caveat: clustered nodes have increased complexity; updates may be computationally complex.
Approximate Inference with Stochastic Simulation

• Use the network to generate a large number of cases that are consistent with the network distribution.
• Evaluation may not converge to the exact values (in reasonable time).
• Usually converges to close to the exact solution quickly if the evidence is not too unlikely.
• Performs better when evidence is nearer to root nodes; however, in real domains evidence tends to be near the leaves (Nicholson & Jitnah, 1998).

Making Decisions

• Bayesian networks can be extended to support decision making.
• Preferences between different outcomes of various plans: utility theory.
• Decision theory = utility theory + probability theory.
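One simple instance of stochastic simulation (a sketch, not the specific algorithm the slide has in mind) is forward sampling with rejection: sample each node from its CPT in causal order, discard samples inconsistent with the evidence, and estimate the query from the rest. For the earthquake network and the query P(Burglary | JohnCalls):

```python
import random

# Earthquake network CPTs (Pearl / Russell & Norvig values).
p_b, p_e = 0.001, 0.002
p_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
p_j = {True: 0.90, False: 0.05}

def sample(rng):
    """Forward-sample one case in causal (topological) order."""
    b = rng.random() < p_b
    e = rng.random() < p_e
    a = rng.random() < p_a[(b, e)]
    j = rng.random() < p_j[a]
    return b, j

rng = random.Random(0)
kept = burglaries = 0
for _ in range(200_000):
    b, j = sample(rng)
    if j:                      # reject cases inconsistent with evidence J = True
        kept += 1
        burglaries += b
estimate = burglaries / kept   # approximates P(Burglary | JohnCalls)
```

This also illustrates the slide's caveat: the evidence J = True has probability around 0.05, so roughly 95% of the generated cases are thrown away, and rarer evidence would make convergence far slower.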

Decision Networks

A decision network represents information about

• the agent's current state
• its possible actions
• the state that will result from the agent's action
• the utility of that state

Also called influence diagrams (Howard & Matheson, 1981).

Types of Nodes

Chance nodes (ovals): represent random variables (as in Bayesian networks). Each has an associated CPT. Parents can be decision nodes and other chance nodes.

Decision nodes (rectangles): represent points where the decision maker has a choice of actions.

Utility nodes (diamonds): represent the agent's utility function (also called value nodes in the literature). Parents are the variables describing the outcome state that directly affect utility. Each has an associated table representing a multi-attribute utility function.


Example: Umbrella

[Figure: Weather is the parent of Forecast; Weather and the decision node Take Umbrella are parents of the utility node U.]

    P(Weather = Rain) = 0.3

    P(Forecast = Rainy | Weather = Rain) = 0.60
    P(Forecast = Cloudy | Weather = Rain) = 0.25
    P(Forecast = Sunny | Weather = Rain) = 0.15

    P(Forecast = Rainy | Weather = NoRain) = 0.1
    P(Forecast = Cloudy | Weather = NoRain) = 0.2
    P(Forecast = Sunny | Weather = NoRain) = 0.7

    U(NoRain, TakeUmbrella) = 20
    U(NoRain, LeaveAtHome) = 100
    U(Rain, TakeUmbrella) = 70
    U(Rain, LeaveAtHome) = 0

Evaluating Decision Networks: Algorithm

1. Set the evidence variables for the current state.
2. For each possible value of the decision node:
   (a) Set the decision node to that value.
   (b) Calculate the posterior probabilities for the parent nodes of the utility node (as for BNs).
   (c) Calculate the resulting (expected) utility for the action.
3. Return the action with the highest expected utility.

This is simple for a single decision, less so when executing several actions in sequence (i.e. a plan).
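The algorithm above can be run by hand on the umbrella network. A sketch for the evidence Forecast = Rainy (the choice of evidence value is mine; the slide does not fix one):

```python
# Umbrella decision network parameters, from the slide.
p_rain = 0.3
p_forecast = {"Rain":   {"Rainy": 0.60, "Cloudy": 0.25, "Sunny": 0.15},
              "NoRain": {"Rainy": 0.10, "Cloudy": 0.20, "Sunny": 0.70}}
utility = {("Rain", "Take"): 70, ("Rain", "Leave"): 0,
           ("NoRain", "Take"): 20, ("NoRain", "Leave"): 100}

forecast = "Rainy"                    # step 1: set the evidence

# Step 2(b): posterior over Weather given the forecast (Bayes' theorem).
num_rain = p_rain * p_forecast["Rain"][forecast]
num_norain = (1 - p_rain) * p_forecast["NoRain"][forecast]
post_rain = num_rain / (num_rain + num_norain)

# Step 2(c): expected utility of each action under that posterior.
eu = {action: post_rain * utility[("Rain", action)]
              + (1 - post_rain) * utility[("NoRain", action)]
      for action in ("Take", "Leave")}

best = max(eu, key=eu.get)            # step 3: pick the best action
```

A rainy forecast pushes P(Rain) up to 0.72, giving EU(Take) = 56 against EU(Leave) = 28, so the network recommends taking the umbrella.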

Dynamic Belief Networks

[Figure: a chain of nodes State t−2 ... State t+2 (the state evolution model), each state emitting an Obs node (the sensor model).]

• The values of state variables at time t depend only on the values at t−1.
• Can calculate distributions for S_{t+1} and further: probabilistic projection.
• Can be done using standard BN updating algorithms.
• This type of DBN gets very large, very quickly.
• Usually only two time slices of the network are kept.

Dynamic Decision Network

• Similarly, decision networks can be extended to include temporal aspects.
• Sequence of decisions taken = plan.

[Figure: decision nodes D_t, D_{t+1}, D_{t+2} feeding a chain of State t ... State t+3 nodes with observations Obs t ... Obs t+3 and a utility node U_{t+3}.]


Uses of Bayesian Networks

1. Calculating the belief in query variables given values for evidence variables (above).
2. Predicting values of dependent variables given values for independent variables.
3. Decision making based on the probabilities in the network and on the agent's utilities (influence diagrams [Howard and Matheson 1981]).
4. Deciding which additional evidence should be observed in order to gain useful information.
5. Sensitivity analysis to test the impact of changes in probabilities or utilities on decisions.

Bayesian Networks: Summary

• Bayes' rule allows unknown probabilities to be computed from known ones.
• Conditional independence (due to causal relationships) allows efficient updating.
• Bayesian networks are a natural way to represent conditional independence info.
  – Links between nodes: qualitative aspects;
  – conditional probability tables: quantitative aspects.
• Inference means computing the probability distribution for a set of query variables, given a set of evidence variables.
• Inference in Bayesian networks is very flexible: evidence can be entered about any node and beliefs updated in any other nodes.
• The speed of inference in practice depends on the structure of the network: how many loops; the numbers of parents; the location of the evidence and query nodes.
• Bayesian networks can be extended with decision nodes and utility nodes to support decision making: decision networks or influence diagrams.
• Bayesian and decision networks can be extended to allow explicit reasoning about changes over time.

Applications: Overview

• (Simple) Example Networks
• Applications
  – Medical Decision Making: survey of applications
  – Planning and Plan Recognition
  – Natural Language Generation (NAG)
  – Bayesian poker
• Deployed Bayesian Networks (see handout for details)
• BN Software
• Web Resources


Example: Cancer

Metastatic cancer is a possible cause of a brain tumour and is also an explanation for increased total serum calcium. In turn, either of these could explain a patient falling into a coma. Severe headache is also possibly associated with a brain tumour. (Example from (Pearl, 1988).)

[Figure: A (Metastatic Cancer) is a parent of B (Increased total serum calcium) and C (Brain tumour); B and C are parents of D (Coma); C is a parent of E (Severe Headaches).]

    P(a) = 0.2
    P(b|a) = 0.80      P(b|¬a) = 0.20
    P(c|a) = 0.20      P(c|¬a) = 0.05
    P(d|b,c) = 0.80    P(d|¬b,c) = 0.80
    P(d|b,¬c) = 0.80   P(d|¬b,¬c) = 0.05
    P(e|c) = 0.80      P(e|¬c) = 0.60

Example: Asia

A patient presents to a doctor with shortness of breath. The doctor considers that the possible causes are tuberculosis, lung cancer and bronchitis. Additional relevant information includes whether the patient has recently visited Asia (where tuberculosis is more prevalent) and whether or not the patient is a smoker (which increases the chances of cancer and bronchitis). A positive X-ray would indicate either TB or lung cancer. (Example from (Lauritzen, 1988).)

[Figure: "visit to Asia" is a parent of tuberculosis; smoking is a parent of lung cancer and bronchitis; tuberculosis and lung cancer are parents of "either tub or lung cancer", which is a parent of "positive X-ray" and dyspnoea; bronchitis is also a parent of dyspnoea.]

Example: A Lecturer's Life

Dr. Ann Nicholson spends 60% of her work time in her office. The rest of her work time is spent elsewhere. When Ann is in her office, half the time her light is off (when she is trying to hide from students and get some real work done). When she is not in her office, she leaves her light on only 5% of the time. 80% of the time she is in her office, Ann is logged onto the computer. Because she sometimes logs onto the computer from home, 10% of the time she is not in her office, she is still logged onto the computer. Suppose a student checks Dr. Nicholson's login status and sees that she is logged on. What effect does this have on the student's belief that Dr. Nicholson's light is on? (Example from (Nicholson, 1999))

[Figure: in-office is the parent of lights-on and logged-on.]

Probabilistic Reasoning in Medicine

• See handout from (Dean et al., 1993).
• Simplest tree-structured network for diagnostic reasoning:
  – H = disease hypothesis; F = findings (symptoms, test results)
• Multiply-connected network (QMR structure):
  – B = background information (e.g. age, sex of patient)
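The lecturer's-life query is a small exercise in updating through a common cause, and can be worked directly from the story's numbers:

```python
# In-office (O) is the common cause of lights-on (L) and logged-on (C).
p_o = 0.6
p_l = {True: 0.5, False: 0.05}   # P(lights-on | O)
p_c = {True: 0.8, False: 0.10}   # P(logged-on | O)

# Prior belief that the light is on:
prior_l = p_l[True] * p_o + p_l[False] * (1 - p_o)

# Evidence logged-on = True: update O by Bayes' theorem...
post_o = (p_c[True] * p_o) / (p_c[True] * p_o + p_c[False] * (1 - p_o))

# ...then propagate the updated belief down to L.
post_l = p_l[True] * post_o + p_l[False] * (1 - post_o)
```

Seeing that Ann is logged on raises the belief that she is in her office from 0.6 to about 0.92, and hence the belief that her light is on from 0.32 to about 0.47.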


Medical Applications

• Pathfinder case study: see handout using material from (Russell & Norvig, 1995, pp. 457-458).
• QMR (Quick Medical Reference): 600 diseases, 4,000 findings, 40,000 arcs (Dean & Wellman, 1991).
• MUNIN (Andreassen et al., 1989): neuromuscular disorders, about 1000 nodes; exact computation < 5 seconds.
• Glucose prediction and insulin dose adjustment (a DBN application) (Andreassen et al., 1991).
• CPSC project (Pradham et al., 1994):
  – 448 nodes, 906 links, 8254 conditional probability values
  – LW algorithm: answers in 35 mins (1994)

• Application of LW to medical diagnosis (Shwe & Cooper, 1990).
• Forecasting sleep apnea (Dagum et al., 1993).
• ALARM (Beinlich et al., 1989): 37 nodes, 42 arcs. (See Netica examples.)

[Figure: the ALARM network, 37 nodes (MinVolSet, VentMach, Disconnect, PulmEmbolus, Intubation, VentTube, KinkedTube, ..., BP), annotated with arc strengths.]

Robot Navigation and Tracking

• An example of a Dynamic Decision Network.
• Dean & Wellman, 1991.


Nicholson & Korb 57 Nicholson & Korb 58

Plan Recognition Applications Traffic Monitoring:


BATmobile
 Keyhole plan recognition in an Adventure
game (Albrecht et al., 1998).
A A A A A A A A
 (Forbes et al., 1995)
0 1 2 3 0 1 2 3

 Example of a DBN
Q Q’ Q Q’

L L L L L L L L
0 1 2 3 0 1 2 3

(a) mainModel (b) indepModel

A A A A
0 1 2 3 Q Q’

Q Q’ L L L L
0 1 2 3

(c) actionModel (d) locationModel

 Traffic plan recognition (Pynadeth&Wellman,


1995).

Bayesian AI Tutorial Bayesian AI Tutorial

Natural Language Generation

NAG (McConachy et al., 1999), A Nice Argument Generator, uses two Bayesian networks to generate and assess natural language arguments:

Normative Model: represents our best understanding of the domain; proper (constrained) Bayesian updating, given premises.

User Model: represents our best understanding of the human; Bayesian updating modified to reflect human biases (e.g., overconfidence; Korb, McConachy, Zukerman, 1997).

The BNs are embedded in a semantic hierarchy, which

• supports attentional modeling
• supports constrained updating

[Figure: a two-layer semantic network sitting above a Bayesian network. Higher-level concepts like 'motivation' and 'ability' connect to lower-level concepts like 'Grade Point Average', which connect to propositions such as [publications authored by person X cited >5 times].]


Bayesian Poker

• (Korb et al., 1999)
• Poker is ideal for testing automated reasoning under uncertainty:
  – physical randomisation
  – incomplete hand information
  – incomplete opponent info (strategies, bluffing, etc.)
• Bayesian networks are a good representation for complex game playing.
• Our Bayesian Poker Player (BPP) plays 5-card stud poker at the level of a good amateur human player. To play:

    telnet indy13.cs.monash.edu.au
    login: poker
    password: maverick

Bayesian Poker BN

• The Bayesian network provides an estimate of winning at any point in the hand.
• Betting curves based on pot-odds are used to determine the action (bet/call, pass or fold).

[Figure: BPP Win depends on OPP Final and BPP Final; matrices M_{C|F} relate final to current hand types (OPP Current, BPP Current); M_{A|C} and M_{U|C} relate the opponent's current hand type to the observation nodes OPP Action and OPP Upcards.]

Bayesian Poker BN (cont.)

Hand Types

• The initial 9 hand types are too coarse.
• We use a finer granularity for the most common hands (busted and a pair):
  – low, medium, Q-high, K-high, A-high
  – results in 17 hand types

• Different networks (matrices) are used for each round.
• OPP Current, BPP Current: (partial) hand types with the cards dealt so far.
• OPP Final, BPP Final: hand types after all 5 cards are dealt.
• Observation nodes:
  – OPP Upcards: all the opponent's cards except the first are visible to BPP.
  – OPP Action: BPP knows the opponent's action.

Conditional Probability Matrices

• M_{A|C}: the probability of the opponent's action given current hand type, learned from observed showdown data.
• M_{U|C} and M_{C|F}: estimated by dealing out 10^7 poker hands.

Belief Updating: since the network is a polytree, a simple, fast propagation updating algorithm is used.


Nicholson & Korb 65 Nicholson & Korb 66

Current Status, Possible Deployed BNs


Extensions
 From Web Site database: See handout for
 BPP outperforms automated opponents, is details.
fairly even with ave amateur humans, and
loses to experienced humans.
 TRACS: Predicting reliability of military
vehicles.
 Learning the OPP Action CPTs does not (yet)
 Andes: intelligent tutoring system for
appear to improve performance.
physics.
 BN Improvements
 Distributed Virtual Agents advising online
– Refine action nodes users on web sites.
– Further refinement of hand types
 Information extraction from natural
– Improve network structure language text
– Adding bluffing to the opponent model
 DXPLAIN: decision support for medical
– Improved learning of opponent model diagnosis.
 More complex poker: multi-opponent games,  Illiad: teaching tool for medical students.
table stake games.
 Microsoft Health Produce: “find by symptom”
 DBN model to represent changes over time
feature.

Bayesian AI Tutorial Bayesian AI Tutorial

• Weapons scheduling.
• Monitoring power generation.
• Processor fault diagnosis.
• Knowledge Industries applications: (a) in medicine: sleep disorders, pathology, trauma care, hand and wrist evaluations, dermatology, and home-based health evaluations; (b) in capital equipment: locomotives, gas-turbine engines for aircraft and land-based power production, the space shuttle, and office equipment.
• Software debugging.
• Vista: decision support system used at NASA Mission Control Center.
• MS: (a) Answer Wizard (Office 95), information retrieval; (b) Print Troubleshooter; (c) Aladdin, troubleshooting customer support.

BN Software: Issues

• Functionality
  – especially application vs API
• Price
  – many free demo versions, or free for educational use
  – commercial licence costs
• Availability (platforms)
• Quality
  – GUI
  – documentation and help
• Leading edge
• Robustness
  – software
  – company


BN Software

• Analytica: www.lumina.com
• Hugin: www.hugin.com
• Netica: www.norsys.com

The above 3 are available during the tutorial lab session.

• JavaBayes: http://www.cs.cmu.edu/~javabayes/Home/
• Many other packages (see next slide)

Web Resources

• Bayesian Belief Network site (Russell Greiner):
  www.cs.ualberta.ca/~greiner/bn.html
• Bayesian Network Repository (Nir Friedman):
  www-nt.cs.berkeley.edu/home/nir/public_html/Repository/index.htm
• Summary of BN software and links to software sites (Kevin Murphy):
  http.cs.berkeley.edu/~murphyk/Bayes/bnsoft.html

Applications: Summary

• Various BN structures are available to compactly and accurately represent certain types of domain features.
• Bayesian networks have been used for a wide range of AI applications.
• Robust and easy-to-use Bayesian network software is now readily available.

Learning Bayesian Networks

• Linear and Discrete Models
• Learning Network Parameters
  – Linear Coefficients
  – Learning Probability Tables
• Learning Causal Structure
• Conditional Independence Learning
  – Statistical Equivalence
  – TETRAD II
• Bayesian Learning of Bayesian Networks
  – Cooper & Herskovits: K2
  – Learning Variable Order
  – Statistical Equivalence Learners
• Full Causal Learners
• Minimum Encoding Methods
  – Lam & Bacchus's MDL learner
  – MML metrics
  – MML search algorithms
  – MML Sampling
• Empirical Results

Linear and Discrete Models

Linear models: used in biology and the social sciences since Sewall Wright (1921). Linear models represent causal relationships as sets of linear functions of "independent" variables.

[Figure: X1 and X2 are parents of X3.]

Equivalently (assuming linear parameters):

    X3 = a13 X1 + a23 X2 + ε3

Discrete models: "Bayesian nets" replace the vectors of linear coefficients with CPTs.

Learning Linear Parameters

Maximum likelihood methods have been available since Wright's path model analysis (1921). Equivalent methods:

• Simon-Blalock method (Simon, 1954; Blalock, 1964)
• Ordinary least squares multiple regression (OLS)
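A minimal OLS sketch for the model above, solving the normal equations by hand for the two path coefficients. The data here are synthetic and noise-free (my own illustrative numbers), so the true coefficients are recovered exactly; with noisy data the same equations give the maximum likelihood estimates:

```python
# Recover a13, a23 in  X3 = a13*X1 + a23*X2 + e  (no intercept).
x1 = [0.5, -1.2, 2.0, 0.3, -0.7, 1.5]
x2 = [1.0, 0.4, -0.6, 2.2, -1.1, 0.8]
true_a13, true_a23 = 2.0, 3.0
x3 = [true_a13 * u + true_a23 * v for u, v in zip(x1, x2)]  # noise omitted

# Sums of squares and cross-products for the normal equations.
s11 = sum(u * u for u in x1)
s22 = sum(v * v for v in x2)
s12 = sum(u * v for u, v in zip(x1, x2))
s13 = sum(u * w for u, w in zip(x1, x3))
s23 = sum(v * w for v, w in zip(x2, x3))

# Solve the 2x2 system  [s11 s12; s12 s22] [a13; a23] = [s13; s23].
det = s11 * s22 - s12 * s12
a13 = (s13 * s22 - s23 * s12) / det
a23 = (s23 * s11 - s13 * s12) / det
```

This is exactly the multiple-regression computation referred to in the slide, reduced to its two-predictor special case.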

Learning Conditional Probability Tables

Spiegelhalter & Lauritzen (1990):

• assume parameter independence
• each CPT cell i corresponds to a parameter in a Dirichlet distribution

    D[α1, ..., αi, ..., αK]

  over the node's K values (one such distribution per parent instantiation)
• the probability of outcome i is αi / Σ_{k=1..K} αk
• observing outcome i updates D to D[α1, ..., αi + 1, ..., αK]

Others are looking at learning without parameter independence. E.g.:

• Dual log-linear and full CPT models (Neil, Wallace, Korb 1999).
• Decision trees to learn structure within CPTs (Boutilier et al. 1996).
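The Dirichlet update rule above amounts to simple counting. A sketch for a Boolean node (one parent instantiation) with a uniform D[1, 1] prior:

```python
# Dirichlet parameters for the node's two values (True, False).
alphas = [1.0, 1.0]                 # uniform prior D[1, 1]

def prob(alphas, i):
    """Current estimate of P(outcome i): alpha_i / sum_k alpha_k."""
    return alphas[i] / sum(alphas)

def observe(alphas, i):
    """Seeing outcome i takes D[.., alpha_i, ..] to D[.., alpha_i + 1, ..]."""
    alphas[i] += 1.0

p_before = prob(alphas, 0)          # 0.5 under the uniform prior
for outcome in [0, 0, 0, 1]:        # three observations of True, one of False
    observe(alphas, outcome)
p_after = prob(alphas, 0)           # (1 + 3) / (2 + 4) = 2/3
```

The posterior estimate moves from 0.5 toward the observed frequency 3/4, tempered by the prior counts.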


Nicholson & Korb 77 Nicholson & Korb 78

Learning Causal Structure

This is the real problem; parameterizing models is essentially numerical computing.

There are two basic methods:

• Learning from conditional independencies (CI learning)

• Learning using a scoring metric (Metric learning)

CI learning (Verma and Pearl, 1991)

Suppose you have an Oracle who can answer yes or no to any question of the type:

    X ⊥ Y | S ?

Then you can learn the correct causal model, up to statistical equivalence.

Statistical Equivalence

Verma and Pearl's rules identify the set of causal models which are statistically equivalent —

    Two causal models H1 and H2 are statistically equivalent iff they contain the same variables and joint samples over them provide no statistical grounds for preferring one over the other.

Examples

• All fully connected models are equivalent.
• A → B → C and A ← B ← C.
• A → B → D ← C and A ← B → D ← C.
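The equivalence examples above can be checked mechanically with the Verma & Pearl graphical criterion (same skeleton, same v-structures); a sketch with dags represented as parent-set dictionaries (a hypothetical encoding):

```python
def skeleton(dag):
    # Undirected adjacencies of a dag given as {child: set_of_parents}.
    return {frozenset((p, c)) for c, ps in dag.items() for p in ps}

def v_structures(dag):
    # Colliders a -> c <- b where a and b are not themselves adjacent.
    skel = skeleton(dag)
    vs = set()
    for c, ps in dag.items():
        for a in ps:
            for b in ps:
                if a < b and frozenset((a, b)) not in skel:
                    vs.add((a, c, b))
    return vs

def equivalent(d1, d2):
    # Same skeleton and same v-structures => statistically equivalent.
    return skeleton(d1) == skeleton(d2) and v_structures(d1) == v_structures(d2)

# The chain A -> B -> C is equivalent to A <- B <- C ...
chain    = {"A": set(), "B": {"A"}, "C": {"B"}}
rchain   = {"A": {"B"}, "B": {"C"}, "C": set()}
# ... but not to the collider A -> B <- C.
collider = {"A": set(), "B": {"A", "C"}, "C": set()}

print(equivalent(chain, rchain))    # True
print(equivalent(chain, collider))  # False
```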


Statistical Equivalence

Chickering (1995):

• Any two causal models over the same variables which have the same skeleton (undirected arcs) and the same directed v-structures are statistically equivalent.

• If H1 and H2 are statistically equivalent, then they have the same maximum likelihoods relative to any joint samples:

      max_θ1 P(e | H1, θ1) = max_θ2 P(e | H2, θ2)

  where θi is a parameterization of Hi.

TETRAD II

— Spirtes, Glymour and Scheines (1993)

Replace the Oracle with statistical tests:

• for linear models, a significance test on partial correlation:

      X ⊥ Y | S  iff  ρ_XY·S = 0

• for discrete models, a χ² test on the difference between CPT counts expected with independence (Ei) and observed (Oi):

      X ⊥ Y | S  iff  Σ_i O_i ln(O_i / E_i) ≈ 0
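The discrete-case quantity above is (up to a factor of 2) the G² statistic, 2 Σ_i O_i ln(O_i/E_i); a sketch for unconditional X ⊥ Y from a two-way table of counts (the tables are hypothetical, and in TETRAD-style use the test is applied within each stratum of the conditioning set S):

```python
import numpy as np

def g_squared(table):
    """G² = 2 Σ O ln(O/E) for a two-way contingency table of observed counts."""
    O = np.asarray(table, dtype=float)
    row = O.sum(axis=1, keepdims=True)
    col = O.sum(axis=0, keepdims=True)
    E = row * col / O.sum()      # counts expected under independence
    mask = O > 0                 # 0·ln(0) is taken as 0
    return 2.0 * np.sum(O[mask] * np.log(O[mask] / E[mask]))

# Hypothetical joint counts for (X, Y): exactly independent vs strongly dependent.
independent = [[250, 250], [250, 250]]
dependent   = [[400, 100], [100, 400]]

print(g_squared(independent))  # 0.0: no evidence against X ⊥ Y
print(g_squared(dependent))    # large: reject independence
```

Asymptotically the statistic is χ²-distributed under independence, so it can be referred to a χ² table for a significance level.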


TETRAD II

• Asymptotically finds causal structure to within the statistical equivalence class of the true model.

• Requires larger sample sizes than MML (Dai, Korb, Wallace & Wu, 1997):

      Statistical tests are not robust given weak causal interactions and/or small samples.

• Cheap, and easy to use.

Bayesian LBN: Cooper & Herskovits

— Cooper & Herskovits (1991, 1992)

Compute P(hi | e) by brute force, under the assumptions:

1. All variables are discrete.
2. Samples are i.i.d.
3. No missing values.
4. All values of child variables are uniformly distributed.
5. Priors over hypotheses are uniform.

With these assumptions, Cooper & Herskovits reduce the computation of P_CH(h, e) to a polynomial-time counting problem.


Cooper & Herskovits

But the hypothesis space is exponential; they go for dramatic simplification:

6. Assume we know the temporal ordering of the variables.

Now for any pair of variables:

• either they are connected by an arc
• or they are not.

Further, cycles are impossible.

New hypothesis space has size only 2^((n² − n)/2) (still exponential).

Algorithm "K2" does a greedy search through this reduced space.

Learning Variable Order

Reliance upon a given variable order is a major drawback to K2

    And many other algorithms (Buntine 1991, Bouckert 1994, Suzuki 1996, Madigan & Raftery 1994)

What's wrong with that?

• We want autonomous AI (data mining). If experts can order the variables they can likely supply models.

• Determining variable ordering is half the problem. If we know A comes before B, the only remaining issue is whether there is a link between the two.

• The number of orderings consistent with dags is apparently exponential (Brightwell & Winkler 1990). So iterating over all possible orderings will not scale up.
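K2's greedy search through the order-restricted space can be sketched as below; `toy_score` is a hypothetical stand-in for the Cooper & Herskovits metric, which in the real algorithm is computed from the data counts:

```python
def k2(order, score, max_parents=2):
    """K2-style greedy search: variables come in a fixed (given) temporal
    order; for each variable, repeatedly add the predecessor that most
    improves its parent-set score, stopping when no addition helps."""
    parents = {v: set() for v in order}
    for i, v in enumerate(order):
        candidates = set(order[:i])      # only predecessors: no cycles possible
        best = score(v, parents[v])
        improved = True
        while improved and candidates and len(parents[v]) < max_parents:
            improved = False
            gains = {c: score(v, parents[v] | {c}) for c in candidates}
            c, s = max(gains.items(), key=lambda kv: kv[1])
            if s > best:
                parents[v].add(c)
                candidates.remove(c)
                best, improved = s, True
    return parents

# Hypothetical score rewarding recovery of the known structure A -> C <- B.
true_parents = {"A": set(), "B": set(), "C": {"A", "B"}}
def toy_score(v, ps):
    return len(ps & true_parents[v]) - 2 * len(ps - true_parents[v])

learned = k2(["A", "B", "C"], toy_score)
print(learned)
```

Note how the given ordering does half the work: candidate parents are restricted to predecessors, so acyclicity never needs checking.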


Statistical Equivalence Learners

Heckerman & Geiger (1995) advocate learning only up to statistical equivalence classes (à la TETRAD II).

    Since observational data cannot distinguish btw equivalent models, there's no point trying to go further.

⇒ Madigan, Andersson, Perlman & Volinsky (1996) follow this advice, use uniform prior over equivalence classes.

⇒ Geiger and Heckerman (1994) define Bayesian metrics for linear and discrete equivalence classes of models (BGe and BDe).

Statistical Equivalence Learners

Wallace & Korb (1999): This is not right!

• These are causal models; they are distinguishable on experimental data.
  – Failure to collect some data is no reason to change prior probabilities. E.g., if your thermometer topped out at 35°, you wouldn't treat ≥ 35° and 34° as equally likely.

• Not all equivalence classes are created equal:

      {A ← B → C, A → B → C, A ← B ← C}
      {A → B ← C}

• Within classes some dags should have greater priors than others. E.g.,

      LightsOn → InOffice → LoggedOn  v.
      LightsOn ← InOffice → LoggedOn


Full Causal Learners

So. . . a full causal learner is an algorithm that:

1. Learns causal connectedness.
2. Learns v-structures.
   Hence, learns equivalence classes.
3. Learns full variable order.
   Hence, learns full causal structure (order + connectedness).

• TETRAD II: 1, 2.
• Madigan et al.: 1, 2.
• Cooper & Herskovits' K2: 1.
• Lam and Bacchus MDL: 1, 2 (partial), 3 (partial).
• Wallace, Neil, Korb MML: 1, 2, 3.

MDL

Minimum Description Length (MDL) inference —

• Invented by Rissanen (1978), based upon Minimum Message Length (MML) invented by Wallace (Wallace and Boulton, 1968).

• Plays trade-off btw
  – model simplicity
  – model fit to the data
  by minimizing the length of a joint description of model and data given the model.


Lam & Bacchus (1993)

MDL encoding of causal models:

• Network:

      Σ_{i=1}^n [ k_i log(n) + d(s_i − 1) Π_{j∈π(i)} s_j ]

  – k_i log(n) for specifying k_i parents for the ith node
  – d(s_i − 1) Π_{j=1}^{k_i} s_j for specifying the CPT:
      d is the fixed bit-length per probability
      s_i is the number of states for node i

• Data given network:

      N Σ_{i=1}^n H(X_i) − N Σ_{i=1}^n M(X_i; π(i))

  – M(X_i; π(i)) is mutual information btw X_i and its parent set
  – H(X_i) is entropy of variable X_i

(NB: This code is not efficient. E.g., treats every node as equally likely to be a parent; assumes knowledge of all k_i.)

Lam & Bacchus

Search algorithm:

• Initial constraints taken from domain expert: partial variable order, direct connections

• Greedy search: every possible arc addition is tested, best MDL measure used to add one (Note: no arcs are deleted)

• Local arcs checked for improved MDL via arc reversal

• Iterate until MDL fails to improve

⇒ Results similar to K2, but without full variable ordering
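The network part of the encoding above can be computed directly; a sketch, with the per-probability bit length d and the example network both hypothetical:

```python
import math

def network_description_length(n_states, parents, d=10):
    """Lam & Bacchus style network encoding length in bits:
    sum_i [ k_i log n + d (s_i - 1) prod_{j in parents(i)} s_j ],
    where s_i is the number of states of node i and k_i = |parents(i)|."""
    n = len(n_states)
    total = 0.0
    for i, ps in parents.items():
        k_i = len(ps)
        cpt_cells = 1
        for j in ps:
            cpt_cells *= n_states[j]   # parent configurations for node i
        total += k_i * math.log2(n) + d * (n_states[i] - 1) * cpt_cells
    return total

# Hypothetical 3-node network A -> C <- B, all variables binary.
n_states = {"A": 2, "B": 2, "C": 2}
parents = {"A": [], "B": [], "C": ["A", "B"]}
print(network_description_length(n_states, parents))  # ≈ 63.17 bits
```

A denser network pays both in the k_i log n parent lists and in the exponentially growing CPT term, which is exactly the simplicity/fit trade-off MDL plays against the data term.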


MML

Minimum Message Length (Wallace & Boulton 1968) uses Shannon's measure of information:

    I(m) = −log P(m)

Applied in reverse, we can compute P(h, e) from I(h, e).

Given an efficient joint encoding method for the hypothesis & evidence space (i.e., satisfying Shannon's law), MML:

    Searches {h_i} for that hypothesis h that minimizes I(h) + I(e|h).

Equivalent to that h that maximizes P(h)P(e|h) — i.e., P(h|e).

The other significant difference from MDL: MML takes parameter estimation seriously.

MML Metric for Linear Models

• Network:

      log n! + n(n−1)/2 − log E

  – log n! for variable order
  – n(n−1)/2 for connectivity
  – −log E restores efficiency by subtracting the cost of selecting a linear extension

• Parameters given dag h:

      −Σ_j log [ f(θ_j | h) / √F(θ_j) ]

  where θ_j are the parameters for X_j and F(θ_j) is the Fisher information. f(θ_j | h) is assumed to be N(0, σ_j).

  (Cf. with MDL's fixed length for parms)


MML Metric for Linear Models

• Sample for X_j given h and θ_j:

      −log P(e | h, θ_j) = −Σ_{k=1}^K log [ (1/√(2π σ_j²)) e^(−Δ_jk² / 2σ_j²) ]

  where K is the number of sample values and Δ_jk is the difference between the observed value of X_j and its linear prediction.

MML Metric for discrete models

We can use P_CH(h_i, e) (from Cooper & Herskovits) to define an MML metric for discrete models.

Difference between MML and Bayesian metrics:

    MML partitions the parameter space and selects optimal parameters.

Equivalent to a penalty of (1/2) log(πe/6) per parameter (Wallace & Freeman 1987); hence:

    I(e, h_i) = Σ_j (s_j/2) log(πe/6) − log P_CH(h_i, e)    (1)

Applied in MML Sampling algorithm.


MML search algorithms

MML metrics need to be combined with search. This has been done three ways:

1. Wallace, Korb, Dai (1996): greedy search (linear).
   • Brute force computation of linear extensions (small models only).

2. Neil and Korb (1999): genetic algorithms (linear).
   • Asymptotic estimator of linear extensions
   • GA chromosomes = causal models
   • Genetic operators manipulate them
   • Selection pressure is based on MML

3. Wallace and Korb (1999): MML sampling (linear, discrete).
   • Stochastic sampling through space of totally ordered causal models
   • No counting of linear extensions required

MML Sampling

Search space of totally ordered models (TOMs).

Sampled via a Metropolis algorithm (Metropolis et al. 1953).

From current model M, find the next model M′ by:

• Randomly select a variable; attempt to swap order with its predecessor.

• Or, randomly select a pair; attempt to add/delete an arc.

Attempts succeed whenever P(M′)/P(M) > U (per MML metric), where U is uniformly random from [0, 1].
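The acceptance rule above is a standard Metropolis step; a sketch over a toy one-dimensional model space, with a hypothetical unnormalized posterior standing in for the MML metric:

```python
import random

def metropolis_step(m, propose, posterior, rng=random):
    """One Metropolis move: accept proposal m' whenever
    posterior(m') / posterior(m) > U, with U uniform on [0, 1]."""
    m_new = propose(m)
    if posterior(m_new) / posterior(m) > rng.random():
        return m_new
    return m

# Toy integer-valued model space with a hypothetical unnormalized posterior.
def posterior(x):
    return 2.0 ** (-abs(x))        # peaked at x = 0

def propose(x):
    return x + random.choice([-1, 1])

random.seed(1)
x, visits = 5, {}
for _ in range(20000):
    x = metropolis_step(x, propose, posterior)
    visits[x] = visits.get(x, 0) + 1

# In the long run, states are visited in proportion to their posterior:
print(visits[0] > visits[2] > visits[4])
```

Since P(M′)/P(M) > U is accepted with probability min(1, P(M′)/P(M)), this is the usual Metropolis acceptance rule, and the chain's visit frequencies estimate the posterior, just as the next slide describes for TOMs.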


MML Sampling

Metropolis: this procedure samples TOMs with a frequency proportional to their posterior probability.

To find the posterior of dag h: keep count of visits to all TOMs consistent with h

    Estimated by counting visits to all TOMs with identical max likelihoods to h

Output: Probabilities of

• Top dags
• Top statistical equivalence classes
• Top MML equivalence classes

Empirical Results

A weakness in this area — and AI generally.

• Paper publications based upon very small models, loose comparisons.

• ALARM net often used — everything gets it to within 1 or 2 arcs.

Neil and Korb (1999) compared MML and BGe (Heckerman & Geiger's Bayesian metric over equivalence classes), using identical GA search over linear models:

• On KL distance and topological distance from the true model, MML and BGe performed nearly the same.

• On test prediction accuracy on strict effect nodes (those with no children), MML clearly outperformed BGe.


Current Research Issues

• size and complexity
• difficulties with elicitation
• combinations of discrete and continuous (i.e. mixing node types)
• Learning issues
  – Missing data
  – Latent variables
  – Experimental data
  – Learning CPT structure
  – Multi-structure models
    · continuous & discrete
    · CPTs w/ & w/o parm independence

(Other) Limitations

• inappropriate problems (deterministic systems, legal rules)


References

Introduction to Bayesian AI

T. Bayes (1764) "An Essay Towards Solving a Problem in the Doctrine of Chances." Phil Trans of the Royal Soc of London. Reprinted in Biometrika, 45 (1958), 296-315.

B. Buchanan and E. Shortliffe (eds.) (1984) Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project. Addison-Wesley.

B. de Finetti (1964) "Foresight: Its Logical Laws, Its Subjective Sources," in Kyburg and Smokler (eds.) Studies in Subjective Probability. NY: Wiley.

D. Heckerman (1986) "Probabilistic Interpretations for MYCIN's Certainty Factors," in L.N. Kanal and J.F. Lemmer (eds.) Uncertainty in Artificial Intelligence. North-Holland.

C. Howson and P. Urbach (1993) Scientific Reasoning: The Bayesian Approach. Open Court.
A MODERN REVIEW OF BAYESIAN THEORY.

K.B. Korb (1995) "Inductive learning and defeasible inference," Jrn for Experimental and Theoretical AI, 7, 291-324.

R. Neapolitan (1990) Probabilistic Reasoning in Expert Systems. Wiley.
CHAPTERS 1, 2 AND 4 COVER SOME OF THE RELEVANT HISTORY.

J. Pearl (1988) Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann.

F.P. Ramsey (1931) "Truth and Probability" in The Foundations of Mathematics and Other Essays. NY: Humanities Press.
THE ORIGIN OF MODERN BAYESIANISM. INCLUDES LOTTERY-BASED ELICITATION AND DUTCH-BOOK ARGUMENTS FOR THE USE OF PROBABILITIES.

R. Reiter (1980) "A logic for default reasoning," Artificial Intelligence, 13, 81-132.

J. von Neumann and O. Morgenstern (1947) Theory of Games and Economic Behavior, 2nd ed. Princeton Univ.
STANDARD REFERENCE ON ELICITING UTILITIES VIA LOTTERIES.

Bayesian Networks

E. Charniak (1991) "Bayesian Networks Without Tears", Artificial Intelligence Magazine, pp. 50-63, Vol 12.
AN ELEMENTARY INTRODUCTION.

D. D'Ambrosio (1999) "Inference in Bayesian Networks". Artificial Intelligence Magazine, Vol 20, No. 2.

P. Haddaway (1999) "An Overview of Some Recent Developments in Bayesian Problem-Solving Techniques". Artificial Intelligence Magazine, Vol 20, No. 2.

Howard & Matheson (1981) "Influence Diagrams," Principles and Applications of Decision Analysis.

F.V. Jensen (1996) An Introduction to Bayesian Networks, Springer.

R. Neapolitan (1990) Probabilistic Reasoning in Expert Systems. Wiley.
SIMILAR COVERAGE TO THAT OF PEARL; MORE EMPHASIS ON PRACTICAL ALGORITHMS FOR NETWORK UPDATING.

J. Pearl (1988) Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann.
THIS IS THE CLASSIC TEXT INTRODUCING BAYESIAN NETWORKS TO THE AI COMMUNITY.

Poole, D., Mackworth, A., and Goebel, R. (1998) Computational Intelligence: a logical approach. Oxford University Press.

Russell & Norvig (1995) Artificial Intelligence: A Modern Approach, Prentice Hall.

Applications

D.W. Albrecht, I. Zukerman and A.E. Nicholson (1998) "Bayesian Models for Keyhole Plan Recognition in an Adventure Game". User Modeling and User-Adapted Interaction, 8(1-2), 5-47, Kluwer Academic Publishers.

S. Andreassen, F.V. Jensen, S.K. Andersen, B. Falck, U. Kjærulff, M. Woldbye, A.R. Sørensen, A. Rosenfalck and F. Jensen (1989) "MUNIN — An Expert EMG Assistant", Computer-Aided Electromyography and Expert Systems, Chapter 21, J.E. Desmedt (Ed.), Elsevier.

S.A. Andreassen, J.J. Benn, R. Hovorka, K.G. Olesen and R.E. Carson (1991) "A Probabilistic Approach to Glucose Prediction and Insulin Dose Adjustment: Description of Metabolic Model and Pilot Evaluation Study".

I. Beinlich, H. Suermondt, R. Chavez and G. Cooper (1992) "The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks", Proc. of the 2nd European Conf. on Artificial Intelligence in Medicine, pp. 689-693.

T.L. Dean and M.P. Wellman (1991) Planning and Control, Morgan Kaufmann.

T.L. Dean, J. Allen and J. Aloimonos (1994) Artificial Intelligence: Theory and Practice, Benjamin/Cummings.

P. Dagum, A. Galper and E. Horvitz (1992) "Dynamic Network Models for Forecasting", Proceedings of the 8th Conference on Uncertainty in Artificial Intelligence, pp. 41-48.

J. Forbes, T. Huang, K. Kanazawa and S. Russell (1995) "The BATmobile: Towards a Bayesian Automated Taxi", Proceedings of the 14th Int. Joint Conf. on Artificial Intelligence (IJCAI'95), pp. 1878-1885.

S.L. Lauritzen and D.J. Spiegelhalter (1988) "Local Computations with Probabilities on Graphical Structures and their Application to Expert Systems", Journal of the Royal Statistical Society, 50(2), pp. 157-224.

A.E. Nicholson (1999) "CSE2309/3309 Artificial Intelligence, Monash University, Lecture Notes", http://www.csse.monash.edu.au/~annn/2-3309.html.

M. Pradham, G. Provan, B. Middleton and M. Henrion (1994) "Knowledge engineering for large belief networks", Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence.

D. Pynadeth and M.P. Wellman (1995) "Accounting for Context in Plan Recognition, with Application to Traffic Monitoring", Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence, pp. 472-481.

M. Shwe and G. Cooper (1990) "An Empirical Analysis of Likelihood-Weighting Simulation on a Large, Multiply Connected Belief Network", Proceedings of the Sixth Workshop on Uncertainty in Artificial Intelligence, pp. 498-508.

L.C. van der Gaag, S. Renooij, C.L.M. Witteman, B.M.P. Aleman and B.G. Taal (1999) "How to Elicit Many Probabilities", Laskey & Prade (eds) UAI99, 647-654.

Zukerman, I., McConachy, R., Korb, K. and Pickett, D. (1999) "Exploratory Interaction with a Bayesian Argumentation System," in IJCAI-99 Proceedings – the Sixteenth International Joint Conference on Artificial Intelligence, pp. 1294-1299, Stockholm, Sweden, Morgan Kaufmann.

McConachy et al (1999)

Learning Bayesian Networks

H. Blalock (1964) Causal Inference in Nonexperimental Research. University of North Carolina.

R. Bouckeart (1994) Probabilistic network construction using the minimum description length principle. Technical Report RUU-CS-94-27, Dept of Computer Science, Utrecht University.

C. Boutillier, N. Friedman, M. Goldszmidt, D. Koller (1996) "Context-specific independence in Bayesian networks," in Horvitz & Jensen (eds.) UAI 1996, 115-123.

G. Brightwell and P. Winkler (1990) Counting linear extensions is #P-complete. Technical Report DIMACS 90-49, Dept of Computer Science, Rutgers Univ.

W. Buntine (1991) "Theory refinement on Bayesian networks," in D'Ambrosio, Smets and Bonissone (eds.) UAI 1991, 52-69.

W. Buntine (1996) "A Guide to the Literature on Learning Probabilistic Networks from Data," IEEE Transactions on Knowledge and Data Engineering, 8, 195-210.

D.M. Chickering (1995) "A Transformational Characterization of Equivalent Bayesian Network Structures," in P. Besnard and S. Hanks (eds.) Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (pp. 87-98). San Francisco: Morgan Kaufmann.
STATISTICAL EQUIVALENCE.

G.F. Cooper and E. Herskovits (1991) "A Bayesian Method for Constructing Bayesian Belief Networks from Databases," in D'Ambrosio, Smets and Bonissone (eds.) UAI 1991, 86-94.

G.F. Cooper and E. Herskovits (1992) "A Bayesian Method for the Induction of Probabilistic Networks from Data," Machine Learning, 9, 309-347.
AN EARLY BAYESIAN CAUSAL DISCOVERY METHOD.

H. Dai, K.B. Korb, C.S. Wallace and X. Wu (1997) "A study of causal discovery with weak links and small samples." Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI), pp. 1304-1309. Morgan Kaufmann.

N. Friedman (1997) "The Bayesian Structural EM Algorithm," in D. Geiger and P.P. Shenoy (eds.) Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence (pp. 129-138). San Francisco: Morgan Kaufmann.

Geiger and Heckerman (1994) "Learning Gaussian networks," in Lopez de Mantaras and Poole (eds.) UAI 1994, 235-243.

D. Heckerman and D. Geiger (1995) "Learning Bayesian networks: A unification for discrete and Gaussian domains," in Besnard and Hanks (eds.) UAI 1995, 274-284.

D. Heckerman, D. Geiger, and D.M. Chickering (1995) "Learning Bayesian Networks: The Combination of Knowledge and Statistical Data," Machine Learning, 20, 197-243.
BAYESIAN LEARNING OF STATISTICAL EQUIVALENCE CLASSES.

K. Korb (1999) "Probabilistic Causal Structure" in H. Sankey (ed.) Causation and Laws of Nature: Australasian Studies in History and Philosophy of Science 14. Kluwer Academic.
INTRODUCTION TO THE RELEVANT PHILOSOPHY OF CAUSATION FOR LEARNING BAYESIAN NETWORKS.

P. Krause (1998) Learning Probabilistic Networks. http://www.auai.org/bayesUSKrause.ps.gz
BASIC INTRODUCTION TO BNS, PARAMETERIZATION AND LEARNING CAUSAL STRUCTURE.

W. Lam and F. Bacchus (1993) "Learning Bayesian belief networks: An approach based on the MDL principle," Jrn Comp Intelligence, 10, 269-293.

D. Madigan, S.A. Andersson, M.D. Perlman & C.T. Volinsky (1996) "Bayesian model averaging and model selection for Markov equivalence classes of acyclic digraphs," Comm in Statistics: Theory and Methods, 25, 2493-2519.

D. Madigan and A.E. Raftery (1994) "Model selection and accounting for model uncertainty in graphical models using Occam's window," Jrn Amer Stat Assoc, 89, 1535-1546.

N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller and E. Teller (1953) "Equations of state calculations by fast computing machines," Jrn Chemical Physics, 21, 1087-1091.

J.R. Neil and K.B. Korb (1999) "The Evolution of Causal Models: A Comparison of Bayesian Metrics and Structure Priors," in N. Zhong and L. Zhous (eds.) Methodologies for Knowledge Discovery and Data Mining: Third Pacific-Asia Conference (pp. 432-437). Springer Verlag.
GENETIC ALGORITHMS FOR CAUSAL DISCOVERY; STRUCTURE PRIORS.

J.R. Neil, C.S. Wallace and K.B. Korb (1999) "Learning Bayesian networks with restricted causal interactions," in Laskey and Prade (eds.) UAI 99, 486-493.

J. Rissanen (1978) "Modeling by shortest data description," Automatica, 14, 465-471.

H. Simon (1954) "Spurious Correlation: A Causal Interpretation," Jrn Amer Stat Assoc, 49, 467-479.

D. Spiegelhalter & S. Lauritzen (1990) "Sequential Updating of Conditional Probabilities on Directed Graphical Structures," Networks, 20, 579-605.

P. Spirtes, C. Glymour and R. Scheines (1990) "Causality from Probability," in J.E. Tiles, G.T. McKee and G.C. Dean (eds.) Evolving Knowledge in Natural Science and Artificial Intelligence. London: Pitman.
AN ELEMENTARY INTRODUCTION TO STRUCTURE LEARNING VIA CONDITIONAL INDEPENDENCE.

P. Spirtes, C. Glymour and R. Scheines (1993) Causation, Prediction and Search: Lecture Notes in Statistics 81. Springer Verlag.
A THOROUGH PRESENTATION OF THE ORTHODOX STATISTICAL APPROACH TO LEARNING CAUSAL STRUCTURE.

J. Suzuki (1996) "Learning Bayesian Belief Networks Based on the Minimum Description Length Principle," in L. Saitta (ed.) Proceedings of the Thirteenth International Conference on Machine Learning (pp. 462-470). San Francisco: Morgan Kaufmann.

T.S. Verma and J. Pearl (1991) "Equivalence and Synthesis of Causal Models," in P. Bonissone, M. Henrion, L. Kanal and J.F. Lemmer (eds.) Uncertainty in Artificial Intelligence 6 (pp. 255-268). Elsevier.
THE GRAPHICAL CRITERION FOR STATISTICAL EQUIVALENCE.

C.S. Wallace and D. Boulton (1968) "An information measure for classification," Computer Jrn, 11, 185-194.

C.S. Wallace and P.R. Freeman (1987) "Estimation and inference by compact coding," Jrn Royal Stat Soc (Series B), 49, 240-252.

C.S. Wallace and K.B. Korb (1999) "Learning Linear Causal Models by MML Sampling," in A. Gammerman (ed.) Causal Models and Intelligent Data Management. Springer Verlag.
SAMPLING APPROACH TO LEARNING CAUSAL MODELS; DISCUSSION OF STRUCTURE PRIORS.

C.S. Wallace, K.B. Korb, and H. Dai (1996) "Causal Discovery via MML," in L. Saitta (ed.) Proceedings of the Thirteenth International Conference on Machine Learning (pp. 516-524). San Francisco: Morgan Kaufmann.
INTRODUCES AN MML METRIC FOR CAUSAL MODELS.

S. Wright (1921) "Correlation and Causation," Jrn Agricultural Research, 20, 557-585.

S. Wright (1934) "The Method of Path Coefficients," Annals of Mathematical Statistics, 5, 161-215.

Current Research

Bayesian Network URL's