Escolar Documentos
Profissional Documentos
Cultura Documentos
=
A p
A B p
A B p
=
Negnevitsky, Pearson Education, 2005 12
Hence,
or
Substituting the last equation into the equation
yields the Bayesian rule:
A p A B p A B p =
A p A B p B A p =
B p
B A p
B A p
=
Negnevitsky, Pearson Education, 2005 13
where:
p(A,B) is the conditional probability that event A
occurs given that event B has occurred;
p(B,A) is the conditional probability of event B
occurring given that event A has occurred;
p(A) is the probability of event A occurring;
p(B) is the probability of event B occurring.
Bayesian rule
B p
A p A B p
B A p
=
Negnevitsky, Pearson Education, 2005 14
The joint probability
i
n
i
i
n
i
i
B p B A p B A p =
= = 1 1
A
B
4
B
3
B
1
B
2
Negnevitsky, Pearson Education, 2005 15
If the occurrence of event A depends on only two
mutually exclusive events, B and NOT B , we obtain:
where is the logical function NOT.
Similarly,
Substituting this equation into the Bayesian rule yields:
p(A) = p(A,B) p(B) + p(AB) p(B)
p(B) = p(BA) p(A) + p(B,A) p(A)
A p A B p A p A B p
A p A B p
B A p
+
=
Negnevitsky, Pearson Education, 2005 16
Suppose all rules in the knowledge base are
represented in the following form:
This rule implies that if event E occurs, then the
probability that event H will occur is p.
In expert systems, H usually represents a hypothesis
and E denotes evidence to support this hypothesis.
Bayesian reasoning
IF E is true
THEN H is true {with probability p}
Negnevitsky, Pearson Education, 2005 17
The Bayesian rule expressed in terms of hypotheses
and evidence looks like this:
where:
p(H) is the prior probability of hypothesis H being true;
p(E,H) is the probability that hypothesis H being true will
result in evidence E;
p(H) is the prior probability of hypothesis H being
false;
p(E,H ) is the probability of finding evidence E even
when hypothesis H is false.
H p H E p H p H E p
H p H E p
E H p
+
=
Negnevitsky, Pearson Education, 2005 18
In expert systems, the probabilities required to solve
a problem are provided by experts. An expert
determines the prior probabilities for possible
hypotheses p(H) and p(H), and also the
conditional probabilities for observing evidence E
if hypothesis H is true, p(E,H), and if hypothesis H
is false, p(E,H).
Users provide information about the evidence
observed and the expert system computes p(H,E) for
hypothesis H in light of the user-supplied evidence
E. Probability p(H,E) is called the posterior
probability of hypothesis H upon observing
evidence E.
Negnevitsky, Pearson Education, 2005 19
We can take into account both multiple hypotheses
H
1
, H
2
,..., H
m
and multiple evidences E
1
, E
2
,..., E
n
.
The hypotheses as well as the evidences must be
mutually exclusive and exhaustive.
Single evidence E and multiple hypotheses follow:
Multiple evidences and multiple hypotheses follow:
=
m
k
k k
i i
i
H p H E p
H p H E p
E H p
1
=
m
k
k k n
i i n
n i
H p H E . . . E E p
H p H E . . . E E p
E . . . E E H p
1
2 1
2 1
2 1
Negnevitsky, Pearson Education, 2005 20
This requires to obtain the conditional probabilities
of all possible combinations of evidences for all
hypotheses, and thus places an enormous burden
on the expert.
Therefore, in expert systems, conditional
independence among different evidences assumed.
Thus, instead of the unworkable equation, we
attain:
=
m
k
k k n k k
i i n i i
n i
H p H E p . . . H E p H E p
H p H E p H E p H E p
E . . . E E H p
1
2 1
2 1
2 1
. . .
=
Negnevitsky, Pearson Education, 2005 21
Let us consider a simple example.
Suppose an expert, given three conditionally
independent evidences E
1
, E
2
and E
3
, creates three
mutually exclusive and exhaustive hypotheses H
1
, H
2
and H
3
, and provides prior probabilities for these
hypotheses p(H
1
), p(H
2
) and p(H
3
), respectively.
The expert also determines the conditional
probabilities of observing each evidence for all
possible hypotheses.
Ranking potentially true hypotheses
Negnevitsky, Pearson Education, 2005 22
The prior and conditional probabilities
Assume that we first observe evidence E
3
. The expert
system computes the posterior probabilities for all
hypotheses as
H y p o t h esi s
Probabilit y
= 1 i = 2 i = 3 i
0.40
0.9
0.6
0.3
0.35
0.0
0.7
0.8
0.25
0.7
0.9
0.5
i
H p
i
H E p
1
i
H E p
2
i
H E p
3
Negnevitsky, Pearson Education, 2005 23
After evidence E
3
is observed, belief in hypothesis H
1
decreases and becomes equal to belief in hypothesis
H
2
. Belief in hypothesis H
3
increases and even nearly
reaches beliefs in hypotheses H
1
and H
2
.
Thus,
3 2, 1, = ,
3
1
3
3
3
i
H p H E p
H p H E p
E H p
k
k k
i i
i
=
0.34
25 . 0 .9 0 + 35 . 0 7 . 0 + 0.40 0.6
0.40 0.6
3 1
=
= E H p
0.34
25 . 0 .9 0 + 35 . 0 7 . 0 + 0.40 0.6
35 . 0 7 . 0
3 2
=
= E H p
0.32
25 . 0 .9 0 + 35 . 0 7 . 0 + 0.40 0.6
25 . 0 9 . 0
3 3
=
= E H p
Negnevitsky, Pearson Education, 2005 24
Suppose now that we observe evidence E
1
. The
posterior probabilities are calculated as
Hence,
3 2, 1, = ,
3
1
3 1
3 1
3 1
i
H p H E p H E p
H p H E p H E p
E E H p
k
k k k
i i i
i
=
=
0.19
25 . 0 0.5 + 35 . 0 7 . 0 0.8 + 0.40 0.6 0.3
0.40 0.6 0.3
3 1 1
=
= E E H p
0.52
25 . 0 0.5 + 35 . 0 7 . 0 0.8 + 0.40 0.6 0.3
35 . 0 7 . 0 0.8
3 1 2
=
= E E H p
0.29
25 . 0 0.5 + 35 . 0 7 . 0 0.8 + 0.40 0.6 0.3
25 . 0 9 . 0 0.5
3 1 3
=
= E E H p
Hypothesis H
2
has now become the most likely one.
Negnevitsky, Pearson Education, 2005 25
After observing evidence E
2
, the final posterior
probabilities for all hypotheses are calculated:
Although the initial ranking was H
1
, H
2
and H
3
, only
hypotheses H
1
and H
3
remain under consideration
after all evidences (E
1
, E
2
and E
3
) were observed.
3 2, 1, = ,
3
1
3 2 1
3 2 1
3 2 1
i
H p H E p H E p H E p
H p H E p H E p H E p
E E E H p
k
k k k k
i i i i
i
=
=
0.45
25 . 0 9 . 0 0.5 0.7
0.7
0.7
0.5
+ .35 0 7 . 0 0.0 0.8 + 0.40 0.6 0.9 0.3
0.40 0.6 0.9 0.3
3 2 1 1
=
= E E E H p
0
25 . 0 9 . 0 + .35 0 7 . 0 0.0 0.8 + 0.40 0.6 0.9 0.3
35 . 0 7 . 0 0.0 0.8
3 2 1 2
=
= E E E H p
0.55
25 . 0 9 . 0 0.5 + .35 0 7 . 0 0.0 0.8 + 0.40 0.6 0.9 0.3
25 . 0 9 . 0 0.7 0.5
3 2 1 3
=
= E E E H p
Negnevitsky, Pearson Education, 2005 26
The framework for Bayesian reasoning requires
probability values as primary inputs. The
assessment of these values usually involves human
judgement. However, psychological research
shows that humans cannot elicit probability values
consistent with the Bayesian rules.
This suggests that the conditional probabilities may
be inconsistent with the prior probabilities given by
the expert.
Bias of the Bayesian method
Negnevitsky, Pearson Education, 2005 27
Consider, for example, a car that does not start and
makes odd noises when you press the starter. The
conditional probability of the starter being faulty if
the car makes odd noises may be expressed as:
IF the symptom is odd noises
THEN the starter is bad {with probability 0.7}
Consider, for example, a car that does not start and
makes odd noises when you press the starter. The
conditional probability of the starter being faulty if
the car makes odd noises may be expressed as:
P(starter is not bad,odd noises) =
= p(starter is good,odd noises) = 10.7 = 0.3
Negnevitsky, Pearson Education, 2005 28
Therefore, we can obtain a companion rule that states
IF the symptom is odd noises
THEN the starter is good {with probability 0.3}
Domain experts do not deal with conditional
probabilities and often deny the very existence of the
hidden implicit probability (0.3 in our example).
We would also use available statistical information
and empirical studies to derive the following rules:
IF the starter is bad
THEN the symptom is odd noises {probability 0.85}
IF the starter is bad
THEN the symptom is not odd noises {probability 0.15}
Negnevitsky, Pearson Education, 2005 29
To use the Bayesian rule, we still need the prior
probability, the probability that the starter is bad if
the car does not start. Suppose, the expert supplies us
the value of 5 per cent. Now we can apply the
Bayesian rule to obtain:
The number obtained is significantly lower
than the experts estimate of 0.7 given at the
beginning of this section.
The reason for the inconsistency is that the expert
made different assumptions when assessing the
conditional and prior probabilities.
0.23
0.95 0.15 + 0.05 0.85
0.05 0.85
=
MB(H, E) =
1
max |1, 0|
p(H)
max |p(H
,
E), p(H)|
p(H)
otherwise
if p(H) = 0
MD(H, E) =
1
min |1, 0| p(H)
min |p(H,E), p(H)| p(H)
otherwise
Negnevitsky, Pearson Education, 2005 34
The values of MB(H, E) and MD(H, E) range
between 0 and 1. The strength of belief or
disbelief in hypothesis H depends on the kind of
evidence E observed. Some facts may increase the
strength of belief, but some increase the strength of
disbelief.
The total strength of belief or disbelief in a
hypothesis:
E H, MD , E H, MB min -
E H, MD E H, MB
= cf
1
E
2
E
n
) = min [cf (E
1
), cf (E
2
),...,cf (E
n
)]
cf
For example,
IF sky is clear
AND the forecast is sunny
THEN the action is wear sunglasses {cf 0.8}
and the certainty of sky is clear is 0.9 and the certainty of the
forecast of sunny is 0.7, then
cf (H,E
1
E
2
) = min [0.9, 0.7] 0.8 = 0.7 0.8 = 0.56
IF <evidenceE
1
>
.
.
.
AND <evidenceE
n
>
THEN
<
hypothesisH>
{ cf }
Negnevitsky, Pearson Education, 2005 38
For disjunctive rules such as
the certainty of hypothesis H , is established as follows:
cf (H,E
1
E
2
E
n
) = max [cf (E
1
), cf (E
2
),...,cf (E
n
)] cf
For example,
IF sky is overcast
OR the forecast is rain
THEN the action is take an umbrella {cf 0.9}
and the certainty of sky is overcast is 0.6 and the certainty of the
forecast of rain is 0.8, then
cf (H,E
1
E
2
) = max [0.6, 0.8] 0.9 = 0.8 0.9 = 0.72
IF <evidenceE
1
>
.
.
.
OR <evidenceE
n
>
THEN <hypothesi s H> { cf }
Negnevitsky, Pearson Education, 2005 39
When the same consequent is obtained as a result of
the execution of two or more rules, the individual
certainty factors of these rules must be merged to
give a combined certainty factor for a hypothesis.
Suppose the knowledge base consists of the following
rules:
Rule 1: IF A is X
THEN C is Z {cf 0.8}
Rule 2: IF B is Y
THEN C is Z {cf 0.6}
What certainty should be assigned to object C
having value Z if both Rule 1 and Rule 2 are fired?
Negnevitsky, Pearson Education, 2005 40
Common sense suggests that, if we have two
pieces of evidence (A is X and B is Y) from
different sources (Rule 1 and Rule 2) supporting
the same hypothesis (C is Z), then the confidence
in this hypothesis should increase and become
stronger than if only one piece of evidence had
been obtained.
Negnevitsky, Pearson Education, 2005 41
To calculate a combined certainty factor we can
use the following equation:
where:
cf
1
is the confidence in hypothesis H established by Rule 1;
cf
2
is the confidence in hypothesis H established by Rule 2;
,cf
1
, and ,cf
2
, are absolute magnitudes of cf
1
and cf
2
,
respectively.
if cf 0 andcf 0
cf
1
+
cf
2
(1
cf
1
)
cf (cf
1
, cf
2
) =
1 min |,cf
1
,, ,cf
2
,|
cf
1
+ cf
2
cf
1
+
cf
2
(1+
cf
1
)
if cf 0 or cf 0
if cf < 0 andcf < 0
Negnevitsky, Pearson Education, 2005 42
The certainty factors theory provides a practical
alternative to Bayesian reasoning. The heuristic
manner of combining certainty factors is different
from the manner in which they would be combined
if they were probabilities. The certainty theory is
not mathematically pure but does mimic the
thinking process of a human expert.
Negnevitsky, Pearson Education, 2005 43
Comparison of Bayesian reasoning
and certainty factors
Probability theory is the oldest and best-established
technique to deal with inexact knowledge and
random data. It works well in such areas as
forecasting and planning, where statistical data is
usually available and accurate probability
statements can be made.
Negnevitsky, Pearson Education, 2005 44
However, in many areas of possible applications of
expert systems, reliable statistical information is not
available or we cannot assume the conditional
independence of evidence. As a result, many
researchers have found the Bayesian method
unsuitable for their work. This dissatisfaction
motivated the development of the certainty factors
theory.
Although the certainty factors approach lacks the
mathematical correctness of the probability theory,
it outperforms subjective Bayesian reasoning in
such areas as diagnostics.
Negnevitsky, Pearson Education, 2005 45
Certainty factors are used in cases where the
probabilities are not known or are too difficult or
expensive to obtain. The evidential reasoning
mechanism can manage incrementally acquired
evidence, the conjunction and disjunction of
hypotheses, as well as evidences with different
degrees of belief.
The certainty factors approach also provides better
explanations of the control flow through a rule-
based expert system.
Negnevitsky, Pearson Education, 2005 46
The Bayesian method is likely to be the most
appropriate if reliable statistical data exists, the
knowledge engineer is able to lead, and the expert is
available for serious decision-analytical
conversations.
In the absence of any of the specified conditions,
the Bayesian approach might be too arbitrary and
even biased to produce meaningful results.
The Bayesian belief propagation is of exponential
complexity, and thus is impractical for large
knowledge bases.
Negnevitsky, Pearson Education, 2005 47
Bayesian Classification
Bayesian classifiers are statistical classifiers. They
can predict class membership probabilities, such
as the probability that a given sample belongs to a
particular class.
Nave Bayesian classifiers assume that the effect
of an attribute value on a given class is
independent of values of the other attributes. (class
conditional independence)
Bayes Theorem
P(H|X)=P(X|H)P(H) / P(X)
Negnevitsky, Pearson Education, 2005 48
Nave Bayesian Classification
Each data sample is represented as X=(x
1
,x
2
,,x
n
)
Suppose that there are m classes, C
1
,C
2
,C
m
. Given an
unknown data sample, X, the classifier will predict that X
belongs to the class having the highest posterior
probability.
X assigned to C
i
if and only if
P(C
i
|X)>P(C
j
|X) for 1 j m, j=i.
P(C
i
|X)=P(X|C
i
)P(C
i
) / P(X)
P(X) is constant for all classes
P(C
i
)=s
i
/ s
Assume class conditional independence
[
=
=
n
k
i k i
C x P C X P
1
) | ( ) | (
Negnevitsky, Pearson Education, 2005 49
Example
RID Age Income Student Credit_rating Class: buys_computer
1 <=30 High No Fair No
2 <=30 High No Excellent No
3 31..40 High No Fair Yes
4 >40 Medium No Fair Yes
5 >40 Low Yes Fair Yes
6 >40 Low Yes Excellent No
7 31..40 Low Yes Excellent Yes
8 <=30 Medium No Fair No
9 <=30 Low Yes Fair Yes
10 >40 Medium Yes Fair Yes
11 <=30 Medium Yes Excellent Yes
12 31..40 Medium No Excellent Yes
13 31..40 High Yes Fair Yes
14 >40 Medium No Excellent No
Negnevitsky, Pearson Education, 2005 50
Nave Bayesian Classification(cont.)
X=(age=<=30,income=medium, student=yes,credit_rating=fair)
Maximize P(X|Ci)P(Ci) for i=1,2.
P(buys_computer=yes)=9/14=0.643
P(buys_computer=no)=5/14=0.357
P(age=<30|buy_computer=yes)=2/9=0.222
P(age=<30|buy_computer=no)=3/5=0.6
P(income=medium|buys_computer=yes)=4/9=0.444
P(income=medium|buys_computer=no)=2/5=0.4
P(student=yes|buys_computer=yes)=6/9=0.667
P(student=yes|buys_computer=no)=1/5=0.2
P(credit_rating=fair|buys_computer=yes)=6/9=0.667
P(credit_rating=fair|buys_computer=no)=2/5=0.4
P(X|buys_computer=yes)=0.222*0.444*0.667*0.667=0.044
P(X|buys_computer=no)=0.6*0.4*0.2*0.4=0.019
P(X|buys_computer=yes)P(buys_computer=yes)=0.044*0.643=0.028
P(X|buys_computer=no)P(buys_computer=no)=0.019*0.357=0.007