Bayesian AI
AI'99, Sydney
6 December 1999
Ann E. Nicholson and Kevin B. Korb

Overview
1. Introduction to Bayesian AI (20 min)
2. Bayesian networks (50 min)
Break (10 min)
3. Applications (50 min)
7. Dinner (Optional)

Fuzzy Logic
A Dutch book is a sequence of "fair" bets which collectively guarantee a loss.

Fair bets are bets based upon the standard odds-probability relation:

    O(h) = P(h) / (1 - P(h))
    P(h) = O(h) / (1 + O(h))

Now, let's violate the probability axioms.

Example: say P(A) = -0.1 (violating A2).

Payoff table against A (inverse of: for A), with S = 1:

    A false:  payoff  p*S       = -$0.10
    A true:   payoff -(1 - p)*S = -$1.10

Either way, the bettor is guaranteed a loss.
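A tiny Python sketch makes the arithmetic concrete. The function names and the "against A" payoff convention are my own (spelled out in the comments); the stake S = 1 and the violating value p = -0.1 follow the slide:

```python
def odds(p):
    """Odds from probability: O(h) = P(h) / (1 - P(h))."""
    return p / (1.0 - p)

def prob(o):
    """Probability from odds: P(h) = O(h) / (1 + O(h))."""
    return o / (1.0 + o)

def payoff_against(p, stake, a_true):
    """Net payoff for taking the 'against A' side of a bet priced at p.
    Convention: win p*stake if A is false, lose (1-p)*stake if A is true."""
    return -(1.0 - p) * stake if a_true else p * stake

# A coherent price is harmless; a negative "probability" loses both ways:
print(payoff_against(-0.1, 1, True))   # -1.1
print(payoff_against(-0.1, 1, False))  # -0.1
```

With any legitimate p in [0, 1] one of the two outcomes is non-negative; only the axiom-violating price produces a sure loss.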
Bayesian AI

The techniques used in learning about the world are (primarily) statistical... Hence: Bayesian networks.
Bayesian Networks

- Data structure which represents the dependence between variables;
- Gives a concise specification of the joint probability distribution.

A Bayesian network is a graph in which the following holds:
1. A set of random variables makes up the nodes in the network.
2. A set of directed links or arrows connects pairs of nodes.
3. Each node has a conditional probability table that quantifies the effects the parents have on the node.
4. The graph is directed and acyclic (a DAG), i.e. there are no directed cycles.

Example: Earthquake (Pearl, R&N)

You have a new burglar alarm installed. It is reliable about detecting burglary, but also responds to minor earthquakes. Two neighbours (John, Mary) promise to call you at work when they hear the alarm.
- John always calls when he hears the alarm, but confuses the alarm with the phone ringing (and calls then also).
- Mary likes loud music and sometimes misses the alarm!

Given evidence about who has and hasn't called, estimate the probability of a burglary.
Network structure: Burglary and Earthquake are parents of Alarm; Alarm is the parent of JohnCalls and MaryCalls.

    Burglary:   P(B) = 0.001
    Earthquake: P(E) = 0.002

    Alarm:  B  E  P(A|B,E)
            T  T  0.95
            T  F  0.94
            F  T  0.29
            F  F  0.001

    JohnCalls:  A  P(J|A)        MaryCalls:  A  P(M|A)
                T  0.90                      T  0.70
                F  0.05                      F  0.01

Once the topology is specified, we need to specify a conditional probability table (CPT) for each node:
- Each row contains the conditional probability of each node value for a conditioning case.
- Each row must sum to 1.
- A table for a Boolean variable with n Boolean parents contains 2^(n+1) probabilities.
- A node with no parents has one row (the prior probabilities).
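The burglary query posed above can be answered by brute-force enumeration over the hidden variables. This is a minimal sketch, not the tutorial's code; the priors P(B) = 0.001 and P(E) = 0.002 are an assumption taken from the message-passing figure later in these notes (this copy garbles the digits on the CPT slide):

```python
from itertools import product

# Earthquake network: B, E -> A -> J, M
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}   # P(J=T | A)
P_M = {True: 0.70, False: 0.01}   # P(M=T | A)

def joint(b, e, a, j, m):
    """Chain-rule joint: P(b) P(e) P(a|b,e) P(j|a) P(m|a)."""
    f = lambda p, x: p if x else 1.0 - p
    return (f(P_B, b) * f(P_E, e) * f(P_A[(b, e)], a)
            * f(P_J[a], j) * f(P_M[a], m))

def posterior_burglary(j, m):
    """P(B=T | J=j, M=m), summing out the hidden variables E and A."""
    def score(b):
        return sum(joint(b, e, a, j, m)
                   for e, a in product([True, False], repeat=2))
    return score(True) / (score(True) + score(False))

print(round(posterior_burglary(True, True), 4))  # 0.2842
```

Note how weak the evidence of both neighbours calling is: the posterior on burglary rises from 0.001 to only about 0.28.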
A (more compact) representation of the joint probability distribution:

    P(x1, ..., xn) = P(x1) P(x2|x1) ... P(xn|x1 ∧ ... ∧ x(n-1))

Network construction:
2. Choose an ordering for the variables.
3. While there are variables left:
   (a) Pick a variable Xi and add a node to the network for it.
   (b) Set Parents(Xi) to some minimal set of nodes already in the net such that the conditional independence property is satisfied.

The correct order to add nodes is to add the "root causes" first, then the variables they influence, and so on until the "leaves" are reached.

Examples of wrong orderings (which still represent the same joint distribution):
1. MaryCalls, JohnCalls, Alarm, Burglary, Earthquake.

[Figure: the network produced by ordering 1.]
Compactness and Node Ordering (cont.)

[Figure: the network produced by a poor ordering over MaryCalls, JohnCalls, Earthquake, Burglary, Alarm.] See below for why.

Conditional Independence: Causal Chains

Causal chains give rise to conditional independence:

    A -> B -> C

    P(C | A ∧ B) = P(C | B)

Example: B = severe cough, C = Jill's flu.
Common causes (or ancestors) also give rise to conditional independence:

    A <- B -> C

    P(C | A ∧ B) = P(C | B)

Example: A = Jack's flu, B = Joe's flu, C = Jill's flu.

Common effects (or their descendants) give rise to conditional dependence:

    A -> B <- C

    P(A | C ∧ B) ≠ P(A | B)

Example: A = flu, B = severe cough, C = tuberculosis.

Given a severe cough, flu "explains away" tuberculosis.
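The explaining-away effect can be shown numerically on the flu/TB/cough collider. The CPT numbers below are purely illustrative (not from the tutorial); the point is only the direction of the change:

```python
# Hypothetical numbers for the collider flu -> cough <- tb:
P_flu, P_tb = 0.05, 0.01
P_cough = {(True, True): 0.95, (True, False): 0.80,
           (False, True): 0.70, (False, False): 0.05}  # P(cough | flu, tb)

def p_flu_given(cough=True, tb=None):
    """Posterior P(flu | evidence) by enumerating the tiny joint."""
    f = lambda p, x: p if x else 1.0 - p
    tb_vals = [tb] if tb is not None else [True, False]
    def score(flu):
        return sum(f(P_flu, flu) * f(P_tb, t) * f(P_cough[(flu, t)], cough)
                   for t in tb_vals)
    return score(True) / (score(True) + score(False))

print(p_flu_given(cough=True))           # ~0.43: cough raises belief in flu
print(p_flu_given(cough=True, tb=True))  # ~0.07: TB "explains away" flu
```

Flu and TB are marginally independent, but once the common effect (cough) is observed, learning TB sharply lowers the probability of flu.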
Causal Ordering (cont'd)

Given the ordering: Cough, Flu, TB:

[Figure: the network built with this ordering.]

Marginal independence of Flu and TB must be re-established by adding Flu -> TB or TB -> Flu.

D-separation

Inference in Bayesian Networks

Types of Inference: diagnostic, predictive, intercausal (explaining away), and combined.

[Figure: query (Q) and evidence (E) node configurations for each type of inference.]

Diagnostic inferences: from effects to causes, e.g. P(Burglary | JohnCalls).

Inference Algorithms: Overview
Earthquake example extended with a PhoneRings node (a second parent of JohnCalls):

    P(Ph) = 0.05

    Alarm:  B  E  P(A|B,E)       JohnCalls:  Ph A  P(J|Ph,A)
            T  T  0.95                       T  T  0.95
            T  F  0.94                       T  F  0.50
            F  T  0.29                       F  T  0.90
            F  F  0.001                      F  F  0.01

    MaryCalls:  A  P(M|A)
                T  0.70
                F  0.01

Multiply-connected networks: networks where two nodes are connected by more than one path.
- Two or more possible causes which share a common ancestor.
- One variable can influence another through more than one causal mechanism.

Example: Cancer network.
Example: Cancer network
Metastatic Cancer
π(Β) = (.001,.999) π(Ε) = (.002,.998) A
λ (Β) = (1,1) λ (Ε) = (1,1)
bel(B) = (.001, .999) bel(E) = (.002, .998)
Brain tumour
B λ A (B) E B C
λ A (E)
bel(Ph) = (.05, .95) Increased total
π A (B) π A (E)
π(Ph) = (.05,.95) serum calcium
D E
λ(Ph) = (1,1) Ph λ J (Ph) A
λ J (A) π M(A) Severe Headaches
π J (Ph) λ M(A) Coma
π J (A)
J M
Message passing doesn’t work - evidence gets
λ (J) = (1,1) λ (M) = (1,0)
“counted twice”
Clustering methods

Transform the network into a probabilistically equivalent polytree by merging (clustering) the offending nodes.

Cancer example: new node Z combining B and C:

    A -> Z -> D,  Z -> E    (Z = B,C)

    P(z|a) = P(b,c|a) = P(b|a) P(c|a)
    P(e|z) = P(e|b,c) = P(e|c)
    P(d|z) = P(d|b,c)

Clustering methods (cont.)

The Jensen join-tree version (Jensen, 1996) is currently the most efficient algorithm in this class (e.g. used in Hugin, Netica).

Network evaluation is done in two stages:
- Compile into a join-tree:
  - may be slow;
  - may require too much memory if the original network is highly connected.
- Do belief updating in the join-tree (usually fast).

Caveat: clustered nodes have increased complexity; updates may be computationally complex.
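Building the merged node's CPT from the original CPTs is mechanical, because B and C are conditionally independent given their common parent A. A small sketch, with illustrative CPT values that are my assumption (the slide does not give them):

```python
from itertools import product

# Assumed values for the cancer net, for illustration only:
P_b = {True: 0.80, False: 0.20}   # P(B = inc. serum calcium | A = cancer)
P_c = {True: 0.20, False: 0.05}   # P(C = brain tumour       | A = cancer)

def cpt_z(a):
    """Row of the merged CPT: P(Z=(b,c) | a) = P(b|a) P(c|a),
    valid because B and C share only the parent A."""
    f = lambda p, x: p if x else 1.0 - p
    return {(b, c): f(P_b[a], b) * f(P_c[a], c)
            for b, c in product([True, False], repeat=2)}

row = cpt_z(True)
print(row)                # four entries, one per joint state (b, c)
print(sum(row.values()))  # each row of the merged CPT still sums to 1
```

The price of clustering is visible here: Z has 4 states where B and C each had 2, and merged nodes grow exponentially in the number of variables clustered.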
Stochastic simulation: evaluation may not converge to exact values (in reasonable time).

Decision Networks

A decision network represents information about:
- the agent's current state;
- preferences between the different outcomes of various plans.

Types of Nodes

Chance nodes (ovals): represent random variables (same as in Bayesian networks). Each has an associated CPT. Parents can be decision nodes and other chance nodes.

Example: Umbrella

    P(Weather = Rain) = 0.3

    P(Forecast = Rainy  | Weather = Rain)   = 0.60
    P(Forecast = Cloudy | Weather = Rain)   = 0.25
    P(Forecast = Sunny  | Weather = Rain)   = 0.15
    P(Forecast = Rainy  | Weather = NoRain) = 0.10
    P(Forecast = Cloudy | Weather = NoRain) = 0.20
    P(Forecast = Sunny  | Weather = NoRain) = 0.70

    U(NoRain, TakeUmbrella) = 20
    U(NoRain, LeaveAtHome)  = 100
    U(Rain,   TakeUmbrella) = 70
    U(Rain,   LeaveAtHome)  = 0

Evaluating Decision Networks: Algorithm

2. For each value of the decision node:
   (a) Set the decision node to that value.
   (b) Calculate the posterior probabilities for the parent nodes of the utility node (as for BNs).
   (c) Calculate the resulting (expected) utility for the action.
3. Return the action with the highest expected utility.

Simple for a single decision, less so when executing several actions in sequence (i.e. a plan).
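Steps (a)-(c) can be run directly on the umbrella numbers. A minimal sketch using the slide's probabilities and utilities (the action labels "take"/"leave" are my shorthand):

```python
# Umbrella decision network, numbers from the slide.
P_rain = 0.3
P_forecast = {"rain":   {"rainy": 0.60, "cloudy": 0.25, "sunny": 0.15},
              "norain": {"rainy": 0.10, "cloudy": 0.20, "sunny": 0.70}}
U = {("rain", "take"): 70, ("rain", "leave"): 0,
     ("norain", "take"): 20, ("norain", "leave"): 100}

def best_action(forecast):
    """Posterior over Weather given the forecast (step b), then the
    expected utility of each action (step c) and the argmax (step 3)."""
    pr = P_rain * P_forecast["rain"][forecast]
    pn = (1 - P_rain) * P_forecast["norain"][forecast]
    p_rain = pr / (pr + pn)
    eu = {a: p_rain * U[("rain", a)] + (1 - p_rain) * U[("norain", a)]
          for a in ("take", "leave")}
    return max(eu, key=eu.get), eu

print(best_action("rainy"))  # take the umbrella: EU 56 vs 28
print(best_action("sunny"))  # leave it at home
```

For a rainy forecast the posterior P(Rain) is 0.72, so taking the umbrella (EU 56) beats leaving it (EU 28).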
Dynamic Bayesian Networks

The values of the state variables at time t depend only on the values at t-1.

[Figure: a DBN unrolled over time, with observation nodes Obs_t, Obs_t+1, Obs_t+2, Obs_t+3; the sensor model links each state to its observation.]

This type of DBN gets very large, very quickly.

Similarly, decision networks can be extended to include temporal aspects. The sequence of decisions taken = plan.
Applications: Overview
BN Software
Web Resources
Example: Cancer

Metastatic cancer is a possible cause of a brain tumor and is also an explanation for increased total serum calcium. In turn, either of these could explain a patient falling into a coma. Severe headache is also possibly associated with a brain tumor. (Example from (Pearl, 1988).)

[Figure: A = metastatic cancer; B = increased total serum calcium; C = brain tumour; D = coma; E = severe headaches. P(a) = 0.2.]

Example: Asia

A patient presents to a doctor with shortness of breath. The doctor considers that the possible causes are tuberculosis, lung cancer and bronchitis. Other relevant information is whether the patient has recently visited Asia (where tuberculosis is more prevalent) and whether or not the patient is a smoker (which increases the chances of cancer and bronchitis). A positive X-ray would indicate either TB or lung cancer. (Example from (Lauritzen, 1988).)

[Figure: the Asia network, with root nodes "visit to Asia" and "smoking".]
Example: A Lecturer's Life

Dr. Ann Nicholson spends 60% of her work time in her office. The rest of her work time is spent elsewhere. When Ann is in her office, half the time her light is off (when she is trying to hide from students and get some real work done). When she is not in her office, she leaves her light on only 5% of the time. 80% of the time she is in her office, Ann is logged onto the computer. Because she sometimes logs onto the computer from home, 10% of the time she is not in her office, she is still logged onto the computer. Suppose a student checks Dr. Nicholson's login status and sees that she is logged on. What effect does this have on the student's belief that Dr. Nicholson's light is on? (Example from (Nicholson, 1999).)

[Figure: in-office, with children lights-on and logged-on.]

Probabilistic reasoning in medicine

See handout from (Dean et al., 1993).

- Simplest tree-structured network for diagnostic reasoning: H = disease hypothesis; F = findings (symptoms, test results).
- Multiply-connected network (QMR structure).
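The lecturer example above is small enough to work by hand, and a short sketch confirms it. The network is a tree (in-office is the root; lights-on and logged-on are conditionally independent given it), so the update is a single Bayes step followed by prediction:

```python
# "A Lecturer's Life": numbers from the story above.
P_office = 0.6
P_light = {True: 0.50, False: 0.05}   # P(lights-on | in-office)
P_login = {True: 0.80, False: 0.10}   # P(logged-on | in-office)

def p_light_given_login():
    """P(lights-on | logged-on): update the root, then predict."""
    po = P_office * P_login[True]
    pn = (1 - P_office) * P_login[False]
    p_office = po / (po + pn)          # 0.48 / 0.52, about 0.923
    return p_office * P_light[True] + (1 - p_office) * P_light[False]

prior = P_office * P_light[True] + (1 - P_office) * P_light[False]
print(prior)                  # 0.32
print(p_light_given_login())  # about 0.465
```

Seeing the login raises the belief that the light is on from 0.32 to roughly 0.47: evidence flows up from logged-on to in-office and back down to lights-on.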
Medical Applications

(Dean & Wellman, 1991.)

[Figure: the ALARM network for ICU monitoring (Beinlich et al., 1992), with nodes including MinVolSet, VentMach, Disconnect, PulmEmbolus, Intubation, VentTube, KinkedTube, PAP, Shunt, Press, VentLung, FiO2, VentAlv, MinVol, PVSat, ArtCO2, Anaphylaxis, ExpCO2, InsuffAnesth, SaO2, TPR, Catechol and BP, annotated with arc strengths.]
Example of a DBN

[Figure: a DBN unrolled over four time slices, with state nodes Q, Q' and per-slice nodes L0...L3 and A0...A3.]
Normative Model: represents our best understanding of the domain; proper premises.

User Model: represents our best understanding of the human; constrained updating.

[Figure: a semantic network layered over a two-layer Bayesian network. Propositions, e.g. [publications authored by person X cited >5 times], feed a 1st layer of lower-level concepts like 'Grade Point Average', which feeds a 2nd layer of higher-level concepts like 'motivation' or 'ability'.]
Bayesian Poker
(Korb et al., 1999)

Poker is ideal for testing automated reasoning under uncertainty:
- Physical randomisation.
- Incomplete hand information.
- Incomplete opponent information (strategies, bluffing, etc).

BPP plays stud poker at the level of a good amateur human player. To play:

    telnet indy13.cs.monash.edu.au
    login: poker
    password: maverick

Bayesian Poker BN

The Bayesian network provides an estimate of winning at any point in the hand. Betting curves based on pot-odds are used to determine the action (bet/call, pass or fold).

[Figure: network with BPP Win; OPP Current and BPP Current hand types; matrices M_A|C and M_U|C; observation nodes OPP Action and OPP Upcards.]

Bayesian Poker BN (cont.)

Hand types: the initial 9 hand types proved too coarse.

Observation nodes:
- OPP Upcards: all of the opponent's cards except the first are visible to BPP.
- OPP Action: BPP knows the opponent's action.

M_A|C, the probability of the opponent's action given current hand type, is learned from observed showdown data. M_U|C and M_C|F are estimated by dealing out 10^7 poker hands.
Web Resources

BN Software

Bayesian networks have been used for a wide range of applications.

Learning causal models:
- Cooper & Herskovits: K2

[Figure: X1, X2 -> X3.]

Equivalent methods: the Simon-Blalock method (Simon, 1954; Blalock, 1964). Equivalently (assuming linear parameters), ordinary least squares (OLS) multiple regression:

    X3 = a13 X1 + a23 X2 + e3
Learning Conditional Probability Tables

Spiegelhalter & Lauritzen (1990): treat each CPT row as a vector of Dirichlet counts D[1, ..., K]; observing state i updates it to D[1, ..., i+1, ..., K], i.e. the i-th count is simply incremented.

Others are looking at learning without parameter independence.
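The sequential updating scheme above reduces to bookkeeping. A minimal sketch, assuming (as an illustration, not from the slide) uniform prior counts of 1 and taking the CPT entry as the posterior mean:

```python
def update(counts, i):
    """Observe state i for this parent configuration: increment D[i]."""
    counts = list(counts)
    counts[i] += 1
    return counts

def cpt_row(counts):
    """CPT entries as posterior means D[i] / sum(D)."""
    total = sum(counts)
    return [c / total for c in counts]

D = [1, 1, 1]               # K = 3 states, uniform Dirichlet prior (assumed)
for obs in [0, 0, 2, 0]:    # four observed cases for one conditioning case
    D = update(D, obs)
print(D)                    # [4, 1, 2]
print(cpt_row(D))           # posterior means; the row sums to 1
```

Parameter independence is what licenses updating each conditioning case's counts separately; without it, an observation would have to shift several rows at once.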
Suppose you have an Oracle who can answer yes or no to any question of the type: is X independent of Y given S? Then you can learn the correct causal model, up to statistical equivalence.

Statistical Equivalence

- All fully connected models are equivalent.
- A -> B -> C and A <- B <- C are equivalent.
- A -> B -> D <- C and A <- B -> D <- C are equivalent.

Statistical Equivalence Learners
2. Learns v-structures; hence, learns equivalence classes.
3. Learns full variable order; hence, learns full causal structure (order + connectedness).

Wallace & Korb (1999): this is not right!

- TETRAD II: 1, 2.
- Madigan et al.: 1, 2.
- Cooper & Herskovits' K2: 1.
- Lam and Bacchus MDL: 1, 2 (partial), 3 (partial).

MDL

- Invented by Rissanen (1978), based upon Minimum Message Length (MML), invented by Wallace (Wallace and Boulton, 1968).
- Plays a trade-off between model simplicity and model fit to the data, by minimizing the length of a joint description of the model and of the data given the model.
Model description length:
- k_i log(n) bits for specifying the k_i parents of the i-th node;
- d (s_i - 1) ∏_{j=1..k_i} s_j bits for specifying the CPT, where d is the fixed bit-length per probability and s_i is the number of states for node i.

Data given the network:

    N Σ_{i=1..n} H(X_i) - N Σ_{i=1..n} M(X_i; Π(i))

- M(X_i; Π(i)) is the mutual information between X_i and its parent set;
- H(X_i) is the entropy of variable X_i.

Search procedure:
- Initial constraints taken from a domain expert: partial variable order, direct connections.
- Greedy search: every possible arc addition is tested, and the best MDL measure is used to add one (note: no arcs are deleted).
- Local arcs are checked for improved MDL via arc reversal.
- Iterate until MDL fails to improve.

=> Results similar to K2, but without requiring a full variable ordering.

(NB: this code is not efficient. E.g., it treats every node as equally likely to be a parent, and it assumes knowledge of all k_i.)
MML

Minimum Message Length (Wallace & Boulton, 1968) uses Shannon's measure of information.

MML Metric for Linear Models

Network:

3. Wallace and Korb (1999): MML sampling (linear, discrete).
- Stochastic sampling through the space of totally ordered causal models (TOMs).
- No counting of linear extensions required.
- Attempts succeed whenever P(M')/P(M) > U (per the MML metric), where U is uniformly random from [0, 1].
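The acceptance rule above is the Metropolis step, and it can be sketched generically. This is an illustration only: the "models" below are five indices with assumed scores standing in for TOM posteriors, not real causal models, and the uniform proposal is my simplification:

```python
import random

# Toy unnormalised "posterior" over five model indices; only ratios matter.
score = [1.0, 2.0, 4.0, 2.0, 1.0]

def metropolis(steps, seed=0):
    """Sample models with frequency proportional to score, via the rule:
    accept the proposed move when score(M')/score(M) > U, U ~ Uniform[0,1]."""
    rng = random.Random(seed)
    m = 0
    visits = [0] * len(score)
    for _ in range(steps):
        m2 = rng.randrange(len(score))           # symmetric (uniform) proposal
        if score[m2] / score[m] > rng.random():  # Metropolis acceptance
            m = m2
        visits[m] += 1
    return [v / steps for v in visits]

freqs = metropolis(100_000)
print(freqs)  # close to the normalised scores [0.1, 0.2, 0.4, 0.2, 0.1]
```

Because the proposal is symmetric, the visit frequencies converge to the (normalised) scores, which is exactly the property exploited when estimating DAG posteriors by counting visits to consistent TOMs.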
MML Sampling

Metropolis: this procedure samples TOMs with a frequency proportional to their posterior probability.

To find the posterior of a DAG h: keep count of the visits to all TOMs consistent with h. Estimated by counting visits to all TOMs with identical maximum likelihoods to h.

Output: probabilities of
- top DAGs;
- top statistical equivalence classes.

Empirical Results

A weakness in this area (and in AI generally): published papers are based upon very small models and loose comparisons. The ALARM net is often used, and everything gets it to within 1 or 2 arcs.

Neil and Korb (1999) compared MML and BGe (Heckerman & Geiger's Bayesian metric over equivalence classes), using identical GA search over linear models: on KL distance and topological distance from the true model, MML and BGe performed nearly the same.
Current Research

- Missing data.
- Latent variables.
- Experimental data.
- Learning CPT structure.
- Multi-structure models: continuous & discrete; CPTs with & without parameter independence.
- Inappropriate problems (deterministic systems, legal rules).
References

I. Beinlich, H. Suermondt, R. Chavez and G. Cooper (1992) "The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks", Proc. of the 2nd European Conf. on Artificial Intelligence in Medicine, pp. 689-693.

E. Charniak (1991) "Bayesian Networks Without Tears", Artificial Intelligence Magazine, Vol 12, pp. 50-63. AN ELEMENTARY INTRODUCTION.

P. Dagum, A. Galper and E. Horvitz (1992) "Dynamic Network Models for Forecasting", Proceedings of the 8th Conference on Uncertainty in Artificial Intelligence, pp. 41-48.

T.L. Dean and M.P. Wellman (1991) Planning and Control, Morgan Kaufmann.

T.L. Dean, J. Allen and J. Aloimonos (1994) Artificial Intelligence: Theory and Practice, Benjamin/Cummings.

J. Forbes, T. Huang, K. Kanazawa and S. Russell (1995) "The BATmobile: Towards a Bayesian Automated Taxi", Proceedings of the 14th Int. Joint Conf. on Artificial Intelligence (IJCAI'95), pp. 1878-1885.

K.B. Korb (1995) "Inductive learning and defeasible inference," Jrn for Experimental and Theoretical AI, 7, 291-324.

S.L. Lauritzen and D.J. Spiegelhalter (1988) "Local Computations with Probabilities on Graphical Structures and their Application to Expert Systems", Journal of the Royal Statistical Society, 50(2), pp. 157-224.

J. Pearl (1988) Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann. THIS IS THE CLASSIC TEXT INTRODUCING BAYESIAN NETWORKS TO THE AI COMMUNITY.

Poole, D., Mackworth, A., and Goebel, R. (1998) Computational Intelligence: A Logical Approach. Oxford University Press.

Russell & Norvig (1995) Artificial Intelligence: A Modern Approach, Prentice Hall.

M. Shwe and G. Cooper (1990) "An Empirical Analysis of Likelihood-Weighting Simulation on a Large, Multiply Connected Belief Network", Proceedings of the Sixth Workshop on Uncertainty in Artificial Intelligence, pp. 498-508.

L.C. van der Gaag, S. Renooij, C.L.M. Witteman, B.M.P. Aleman and B.G. Taal (1999) "How to Elicit Many Probabilities", in Laskey & Prade (eds.) UAI99, 647-654.

Zukerman, I., McConachy, R., Korb, K. and Pickett, D. (1999) "Exploratory Interaction with a Bayesian Argumentation System," IJCAI-99 Proceedings, pp. 1294-1299, Stockholm, Sweden, Morgan Kaufmann.

G. Brightwell and P. Winkler (1990) "Counting linear extensions is #P-complete." Technical Report DIMACS 90-49, Dept of Computer Science, Rutgers Univ.

W. Buntine (1991) "Theory refinement on Bayesian networks," in D'Ambrosio, Smets and Bonissone (eds.) UAI 1991, 52-69.

W. Buntine (1996) "A Guide to the Literature on Learning Probabilistic Networks from Data," IEEE Transactions on Knowledge and Data Engineering, 8, 195-210.

D.M. Chickering (1995) "A Transformational Characterization of Equivalent Bayesian Network Structures," in P. Besnard and S. Hanks (eds.) Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pp. 87-98. San Francisco: Morgan Kaufmann. STATISTICAL EQUIVALENCE.

G.F. Cooper and E. Herskovits (1991) "A Bayesian Method for Constructing Bayesian Belief Networks from Databases," in D'Ambrosio, Smets and Bonissone (eds.) UAI 1991, 86-94.

G.F. Cooper and E. Herskovits (1992) "A Bayesian Method for the Induction of Probabilistic Networks from Data," Machine Learning, 9, 309-347. AN EARLY BAYESIAN CAUSAL DISCOVERY METHOD.

H. Dai, K.B. Korb, C.S. Wallace and X. Wu (1997) "A study of causal discovery with weak links and small samples," Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI), pp. 1304-1309. Morgan Kaufmann.

N. Friedman (1997) "The Bayesian Structural EM Algorithm," in D. Geiger and P.P. Shenoy (eds.) Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence, pp. 129-138. San Francisco: Morgan Kaufmann.

Geiger and Heckerman (1994) "Learning Gaussian networks," in Lopez de Mantaras and Poole (eds.) UAI 1994, 235-243.

D. Heckerman and D. Geiger (1995) "Learning Bayesian networks: A unification for discrete and Gaussian domains," in Besnard and Hanks (eds.) UAI 1995, 274-284.

D. Heckerman, D. Geiger, and D.M. Chickering (1995) "Learning Bayesian Networks: The Combination of Knowledge and Statistical Data," Machine Learning, 20, 197-243. BAYESIAN LEARNING OF STATISTICAL EQUIVALENCE CLASSES.

K. Korb (1999) "Probabilistic Causal Structure," in H. Sankey (ed.) Causation and Laws of Nature: Australasian Studies in History and Philosophy of Science 14. Kluwer Academic. INTRODUCTION TO THE RELEVANT PHILOSOPHY OF CAUSATION FOR LEARNING BAYESIAN NETWORKS.

P. Krause (1998) Learning Probabilistic Networks. http://www.auai.org/bayesUSKrause.ps.gz BASIC INTRODUCTION TO BNS, PARAMETERIZATION AND LEARNING CAUSAL STRUCTURE.

W. Lam and F. Bacchus (1993) "Learning Bayesian belief networks: An approach based on the MDL principle," Jrn Comp Intelligence, 10, 269-293.

D. Madigan, S.A. Andersson, M.D. Perlman & C.T. Volinsky (1996) "Bayesian model averaging and model selection for Markov equivalence classes of acyclic digraphs," Comm in Statistics: Theory and Methods, 25, 2493-2519.

D. Madigan and A.E. Raftery (1994) "Model selection and accounting for model uncertainty in graphical models using Occam's window," Jrn Amer Stat Assoc, 89, 1535-1546.

McConachy et al (1999).

N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller and E. Teller (1953) "Equations of state calculations by fast computing machines," Jrn Chemical Physics, 21, 1087-1091.

J.R. Neil and K.B. Korb (1999) "The Evolution of Causal Models: A Comparison of Bayesian Metrics and Structure Priors," in N. Zhong and L. Zhou (eds.) Methodologies for Knowledge Discovery and Data Mining: Third Pacific-Asia Conference, pp. 432-437. Springer Verlag. GENETIC ALGORITHMS FOR CAUSAL DISCOVERY; STRUCTURE PRIORS.

J.R. Neil, C.S. Wallace and K.B. Korb (1999) "Learning Bayesian networks with restricted causal interactions," in Laskey and Prade (eds.) UAI 99, 486-493.

J. Rissanen (1978) "Modeling by shortest data description," Automatica, 14, 465-471.

H. Simon (1954) "Spurious Correlation: A Causal Interpretation," Jrn Amer Stat Assoc, 49, 467-479.

D. Spiegelhalter & S. Lauritzen (1990) "Sequential Updating of Conditional Probabilities on Directed Graphical Structures," Networks, 20, 579-605.

P. Spirtes, C. Glymour and R. Scheines (1990) "Causality from Probability," in J.E. Tiles, G.T. McKee and G.C. Dean (eds.) Evolving Knowledge in Natural Science and Artificial Intelligence. London: Pitman. AN ELEMENTARY INTRODUCTION TO STRUCTURE LEARNING VIA CONDITIONAL INDEPENDENCE.

P. Spirtes, C. Glymour and R. Scheines (1993) Causation, Prediction and Search: Lecture Notes in Statistics 81. Springer Verlag. A THOROUGH PRESENTATION OF THE ORTHODOX STATISTICAL APPROACH TO LEARNING CAUSAL STRUCTURE.

J. Suzuki (1996) "Learning Bayesian Belief Networks Based on the Minimum Description Length Principle," in L. Saitta (ed.) Proceedings of the Thirteenth International Conference on Machine Learning, pp. 462-470. San Francisco: Morgan Kaufmann.

T.S. Verma and J. Pearl (1991) "Equivalence and Synthesis of Causal Models," in P. Bonissone, M. Henrion, L. Kanal and J.F. Lemmer (eds.) Uncertainty in Artificial Intelligence 6, pp. 255-268. Elsevier. THE GRAPHICAL CRITERION FOR STATISTICAL EQUIVALENCE.

C.S. Wallace and D. Boulton (1968) "An information measure for classification," Computer Jrn, 11, 185-194.

C.S. Wallace, K.B. Korb, and H. Dai (1996) "Causal Discovery via MML," in L. Saitta (ed.) Proceedings of the Thirteenth International Conference on Machine Learning, pp. 516-524. San Francisco: Morgan Kaufmann. INTRODUCES AN MML METRIC FOR CAUSAL MODELS.

S. Wright (1921) "Correlation and Causation," Jrn Agricultural Research, 20, 557-585.

S. Wright (1934) "The Method of Path Coefficients," Annals of Mathematical Statistics, 5, 161-215.