Oct 15th, 2001. Copyright © 2001, Andrew W. Moore
Bayes Nets for representing
and reasoning about
uncertainty
Andrew W. Moore
Associate Professor
School of Computer Science
Carnegie Mellon University
www.cs.cmu.edu/~awm
awm@cs.cmu.edu
412-268-7599
Note to other teachers and users of
these slides. Andrew would be delighted
if you found this source material useful in
giving your own lectures. Feel free to use
these slides verbatim, or to modify them
to fit your own needs. PowerPoint
originals are available. If you make use
of a significant portion of these slides in
your own lecture, please include this
message, or the following link to the
source repository of Andrew’s tutorials:
http://www.cs.cmu.edu/~awm/tutorials .
Comments and corrections gratefully
received.
Bayes Nets: Slide 2Copyright © 2001, Andrew W. Moore
What we’ll discuss
• Recall the numerous and dramatic benefits
of Joint Distributions for describing uncertain
worlds
• Reel with terror at the problem with using
Joint Distributions
• Discover how Bayes Net methodology
allows us to build Joint Distributions in
manageable chunks
• Discover there’s still a lurking problem…
• …Start to solve that problem
Bayes Nets: Slide 3Copyright © 2001, Andrew W. Moore
Why this matters
• In Andrew’s opinion, the most important
technology in the Machine Learning / AI field
to have emerged in the last 10 years.
• A clean, clear, manageable language and
methodology for expressing what you’re
certain and uncertain about
• Already, many practical applications in
medicine, factories, helpdesks:
P(this problem | these symptoms)
anomalousness of this observation
choosing next diagnostic test | these observations
Bayes Nets: Slide 4Copyright © 2001, Andrew W. Moore
Why this matters
• In Andrew’s opinion, the most important
technology in the Machine Learning / AI field
to have emerged in the last 10 years.
• A clean, clear, manageable language and
methodology for expressing what you’re
certain and uncertain about
• Already, many practical applications in
medicine, factories, helpdesks:
P(this problem | these symptoms)
anomalousness of this observation
choosing next diagnostic test | these observations
Anomaly
Detection
Inference
Active Data
Collection
Bayes Nets: Slide 5Copyright © 2001, Andrew W. Moore
Ways to deal with Uncertainty
• Three-valued logic: True / False / Maybe
• Fuzzy logic (truth values between 0 and 1)
• Non-monotonic reasoning (especially
focused on Penguin informatics)
• Dempster-Shafer theory (and an extension
known as quasi-Bayesian theory)
• Possibilistic Logic
• Probability
Bayes Nets: Slide 6Copyright © 2001, Andrew W. Moore
Discrete Random Variables
• A is a Boolean-valued random variable if A
denotes an event, and there is some degree
of uncertainty as to whether A occurs.
• Examples
• A = The US president in 2023 will be male
• A = You wake up tomorrow with a headache
• A = You have Ebola
Bayes Nets: Slide 7Copyright © 2001, Andrew W. Moore
Probabilities
• We write P(A) as “the fraction of possible
worlds in which A is true”
• We could at this point spend 2 hours on the
philosophy of this.
• But we won’t.
Bayes Nets: Slide 8Copyright © 2001, Andrew W. Moore
Visualizing A
Event space of
all possible
worlds
Its area is 1
Worlds in which A is False
Worlds in which
A is true
P(A) = Area of
reddish oval
Bayes Nets: Slide 9Copyright © 2001, Andrew W. Moore
Interpreting the axioms
• 0 <= P(A) <= 1
• P(True) = 1
• P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)
The area of A can’t get
any smaller than 0
And a zero area would
mean no world could
ever have A true
Bayes Nets: Slide 10Copyright © 2001, Andrew W. Moore
Interpreting the axioms
• 0 <= P(A) <= 1
• P(True) = 1
• P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)
The area of A can’t get
any bigger than 1
And an area of 1 would
mean all worlds will have
A true
Bayes Nets: Slide 11Copyright © 2001, Andrew W. Moore
Interpreting the axioms
• 0 <= P(A) <= 1
• P(True) = 1
• P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)
A
B
Bayes Nets: Slide 12Copyright © 2001, Andrew W. Moore
Interpreting the axioms
• 0 <= P(A) <= 1
• P(True) = 1
• P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)
A
B
P(A or B)
P(A and B)
Simple addition and subtraction
Bayes Nets: Slide 13Copyright © 2001, Andrew W. Moore
These Axioms are Not to be
Trifled With
• There have been attempts to do different
methodologies for uncertainty
• Fuzzy Logic
• Three-valued logic
• Dempster-Shafer
• Non-monotonic reasoning
• But the axioms of probability are the only
system with this property:
If you gamble using them you can’t be unfairly exploited by
an opponent using some other system [de Finetti 1931]
Bayes Nets: Slide 14Copyright © 2001, Andrew W. Moore
Theorems from the Axioms
• 0 <= P(A) <= 1, P(True) = 1, P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)
From these we can prove:
P(not A) = P(~A) = 1-P(A)
• How?
Bayes Nets: Slide 15Copyright © 2001, Andrew W. Moore
Side Note
• I am inflicting these proofs on you for two
reasons:
1. These kinds of manipulations will need to be
second nature to you if you use probabilistic
analytics in depth
2. Suffering is good for you
Bayes Nets: Slide 16Copyright © 2001, Andrew W. Moore
Another important theorem
• 0 <= P(A) <= 1, P(True) = 1, P(False) = 0
• P(A or B) = P(A) + P(B) - P(A and B)
From these we can prove:
P(A) = P(A ^ B) + P(A ^ ~B)
• How?
Bayes Nets: Slide 17Copyright © 2001, Andrew W. Moore
Conditional Probability
• P(A|B) = Fraction of worlds in which B is true
that also have A true
F
H
H = “Have a headache”
F = “Coming down with Flu”
P(H) = 1/10
P(F) = 1/40
P(H|F) = 1/2
“Headaches are rare and flu
is rarer, but if you’re coming
down with ‘flu there’s a 50-
50 chance you’ll have a
headache.”
Bayes Nets: Slide 18Copyright © 2001, Andrew W. Moore
Conditional Probability
F
H
H = “Have a headache”
F = “Coming down with Flu”
P(H) = 1/10
P(F) = 1/40
P(H|F) = 1/2
P(H|F) = Fraction of flu-inflicted
worlds in which you have a
headache
= #worlds with flu and headache
------------------------------------
#worlds with flu
= Area of “H and F” region
------------------------------
Area of “F” region
= P(H ^ F)
-----------
P(F)
Bayes Nets: Slide 19Copyright © 2001, Andrew W. Moore
Definition of Conditional Probability
P(A|B) = P(A ^ B) / P(B)
Corollary: The Chain Rule
P(A ^ B) = P(A|B) P(B)
Bayes Nets: Slide 20Copyright © 2001, Andrew W. Moore
Bayes Rule
P(B|A) = P(A ^ B) / P(A) = P(A|B) P(B) / P(A)
This is Bayes Rule
Bayes, Thomas (1763) An essay
towards solving a problem in the doctrine
of chances. Philosophical Transactions
of the Royal Society of London, 53:370-
418
Bayes Nets: Slide 21Copyright © 2001, Andrew W. Moore
Using Bayes Rule to Gamble
The “Win” envelope
has a dollar and four
beads in it
$1.00
The “Lose” envelope
has three beads and
no money
Trivial question: someone draws an envelope at random and offers to
sell it to you. How much should you pay?
R R B B R B B
Bayes Nets: Slide 22Copyright © 2001, Andrew W. Moore
Using Bayes Rule to Gamble
The “Win” envelope
has a dollar and four
beads in it
$1.00
The “Lose” envelope
has three beads and
no money
Interesting question: before deciding, you are allowed to see one bead
drawn from the envelope.
Suppose it’s black: How much should you pay?
Suppose it’s red: How much should you pay?
Bayes Nets: Slide 23Copyright © 2001, Andrew W. Moore
Calculation…
$1.00
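A hedged sketch of the calculation in Python (the bead counts are read off the bead picture above and are an assumption, not stated numerically on the slide: Win holds two red and two black beads, Lose holds one red and two black; each envelope is drawn with probability 1/2):

# Bayes Rule for the envelope game (bead counts assumed as stated above).
p_win = 0.5
p_black_given_win, p_black_given_lose = 2/4, 2/3
p_red_given_win, p_red_given_lose = 2/4, 1/3

# P(Win | bead colour) = P(colour | Win) P(Win) / P(colour)
p_black = p_black_given_win * p_win + p_black_given_lose * (1 - p_win)
p_red = p_red_given_win * p_win + p_red_given_lose * (1 - p_win)
print(p_black_given_win * p_win / p_black)  # P(Win | black) ~ 0.43, so pay at most about 43 cents
print(p_red_given_win * p_win / p_red)      # P(Win | red)   = 0.60, so pay at most 60 cents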
Bayes Nets: Slide 24Copyright © 2001, Andrew W. Moore
Multivalued Random Variables
• Suppose A can take on more than 2 values
• A is a random variable with arity k if it can
take on exactly one value out of {v1,v2, .. vk}
• Thus…
P(A=vi ^ A=vj) = 0 if i ≠ j
P(A=v1 or A=v2 or … or A=vk) = 1
Bayes Nets: Slide 25Copyright © 2001, Andrew W. Moore
An easy fact about Multivalued Random Variables:
• Using the axioms of probability…
0 <= P(A) <= 1, P(True) = 1, P(False) = 0
P(A or B) = P(A) + P(B) - P(A and B)
• And assuming that A obeys…
P(A=vi ^ A=vj) = 0 if i ≠ j
P(A=v1 or A=v2 or … or A=vk) = 1
• It’s easy to prove that
P(A=v1 or A=v2 or … or A=vi) = Σ_{j=1}^{i} P(A=vj)
Bayes Nets: Slide 26Copyright © 2001, Andrew W. Moore
An easy fact about Multivalued Random Variables:
• Using the axioms of probability…
0 <= P(A) <= 1, P(True) = 1, P(False) = 0
P(A or B) = P(A) + P(B) - P(A and B)
• And assuming that A obeys…
P(A=vi ^ A=vj) = 0 if i ≠ j
P(A=v1 or A=v2 or … or A=vk) = 1
• It’s easy to prove that
P(A=v1 or A=v2 or … or A=vi) = Σ_{j=1}^{i} P(A=vj)
• And thus we can prove
Σ_{j=1}^{k} P(A=vj) = 1
Bayes Nets: Slide 27Copyright © 2001, Andrew W. Moore
Another fact about Multivalued Random Variables:
• Using the axioms of probability…
0 <= P(A) <= 1, P(True) = 1, P(False) = 0
P(A or B) = P(A) + P(B) - P(A and B)
• And assuming that A obeys…
P(A=vi ^ A=vj) = 0 if i ≠ j
P(A=v1 or A=v2 or … or A=vk) = 1
• It’s easy to prove that
P(B ^ [A=v1 or A=v2 or … or A=vi]) = Σ_{j=1}^{i} P(B ^ A=vj)
Bayes Nets: Slide 28Copyright © 2001, Andrew W. Moore
Another fact about Multivalued Random Variables:
• Using the axioms of probability…
0 <= P(A) <= 1, P(True) = 1, P(False) = 0
P(A or B) = P(A) + P(B) - P(A and B)
• And assuming that A obeys…
P(A=vi ^ A=vj) = 0 if i ≠ j
P(A=v1 or A=v2 or … or A=vk) = 1
• It’s easy to prove that
P(B ^ [A=v1 or A=v2 or … or A=vi]) = Σ_{j=1}^{i} P(B ^ A=vj)
• And thus we can prove
P(B) = Σ_{j=1}^{k} P(B ^ A=vj)
Bayes Nets: Slide 29Copyright © 2001, Andrew W. Moore
More General Forms of Bayes Rule
P(A|B) = P(B|A) P(A) / P(B)
       = P(B|A) P(A) / [ P(B|A) P(A) + P(B|~A) P(~A) ]

P(A|B ^ X) = P(B|A ^ X) P(A|X) / P(B|X)
Bayes Nets: Slide 30Copyright © 2001, Andrew W. Moore
More General Forms of Bayes Rule



P(A=vi | B) = P(B | A=vi) P(A=vi) / Σ_{k=1}^{nA} P(B | A=vk) P(A=vk)
Bayes Nets: Slide 31Copyright © 2001, Andrew W. Moore
Useful Easy-to-prove facts
P(A|B) + P(~A|B) = 1
Σ_{k=1}^{nA} P(A=vk | B) = 1
Bayes Nets: Slide 32Copyright © 2001, Andrew W. Moore
The Joint Distribution
Recipe for making a joint distribution
of M variables:
Example: Boolean
variables A, B, C
Bayes Nets: Slide 33Copyright © 2001, Andrew W. Moore
The Joint Distribution
Recipe for making a joint distribution
of M variables:
1. Make a truth table listing all
combinations of values of your
variables (if there are M Boolean
variables then the table will have
2^M rows).
Example: Boolean
variables A, B, C
A B C
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
Bayes Nets: Slide 34Copyright © 2001, Andrew W. Moore
The Joint Distribution
Recipe for making a joint distribution
of M variables:
1. Make a truth table listing all
combinations of values of your
variables (if there are M Boolean
variables then the table will have
2^M rows).
2. For each combination of values,
say how probable it is.
Example: Boolean
variables A, B, C
A B C Prob
0 0 0 0.30
0 0 1 0.05
0 1 0 0.10
0 1 1 0.05
1 0 0 0.05
1 0 1 0.10
1 1 0 0.25
1 1 1 0.10
Bayes Nets: Slide 35Copyright © 2001, Andrew W. Moore
The Joint Distribution
Recipe for making a joint distribution
of M variables:
1. Make a truth table listing all
combinations of values of your
variables (if there are M Boolean
variables then the table will have
2^M rows).
2. For each combination of values,
say how probable it is.
3. If you subscribe to the axioms of
probability, those numbers must
sum to 1.
Example: Boolean
variables A, B, C
A B C Prob
0 0 0 0.30
0 0 1 0.05
0 1 0 0.10
0 1 1 0.05
1 0 0 0.05
1 0 1 0.10
1 1 0 0.25
1 1 1 0.10
[Venn diagram: the same eight probabilities drawn as the areas of the regions formed by A, B and C]
Bayes Nets: Slide 36Copyright © 2001, Andrew W. Moore
Using the
Joint
Once you have the JD you can ask for the
probability of any logical expression
involving your attributes:
P(E) = Σ_{rows matching E} P(row)
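A minimal sketch of this sum-over-matching-rows recipe in Python, using the Boolean A, B, C joint table from the earlier slides (the function and variable names are illustrative, not part of the original slides):

# The joint distribution over Boolean A, B, C from the truth-table slide.
joint = {
    (0, 0, 0): 0.30, (0, 0, 1): 0.05, (0, 1, 0): 0.10, (0, 1, 1): 0.05,
    (1, 0, 0): 0.05, (1, 0, 1): 0.10, (1, 1, 0): 0.25, (1, 1, 1): 0.10,
}

def prob(event):
    # P(E) = sum of P(row) over all rows in which the expression E is true
    return sum(p for row, p in joint.items() if event(*row))

print(prob(lambda a, b, c: a == 1))            # P(A)      = 0.50
print(prob(lambda a, b, c: a == 1 or b == 1))  # P(A or B) = 0.65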
Bayes Nets: Slide 37Copyright © 2001, Andrew W. Moore
Using the
Joint
P(Poor Male) = 0.4654

P(E) = Σ_{rows matching E} P(row)
Bayes Nets: Slide 38Copyright © 2001, Andrew W. Moore
Using the
Joint
P(Poor) = 0.7604

P(E) = Σ_{rows matching E} P(row)
Bayes Nets: Slide 39Copyright © 2001, Andrew W. Moore
Inference
with the
Joint





P(E1 | E2) = P(E1 ^ E2) / P(E2)
           = Σ_{rows matching E1 and E2} P(row) / Σ_{rows matching E2} P(row)
Bayes Nets: Slide 40Copyright © 2001, Andrew W. Moore
Inference
with the
Joint





P(E1 | E2) = P(E1 ^ E2) / P(E2)
           = Σ_{rows matching E1 and E2} P(row) / Σ_{rows matching E2} P(row)
P(Male | Poor) = 0.4654 / 0.7604 = 0.612
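Continuing the same sketch, a conditional probability is just two row sums (the P(Male | Poor) figure above comes from the census joint, which is not reproduced in the text, so the check below uses the Boolean A, B, C table instead):

def cond_prob(e1, e2):
    # P(E1 | E2) = P(E1 ^ E2) / P(E2), each term a sum over matching rows
    return prob(lambda *r: e1(*r) and e2(*r)) / prob(e2)

print(cond_prob(lambda a, b, c: a == 1, lambda a, b, c: b == 1))  # P(A | B) = 0.35 / 0.50 = 0.70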
Bayes Nets: Slide 41Copyright © 2001, Andrew W. Moore
Joint distributions
• Good news
Once you have a joint
distribution, you can
ask important
questions about
stuff that involves a
lot of uncertainty
• Bad news
Impossible to create
for more than about
ten attributes
because there are
so many numbers
needed when you
build the damn thing.
Bayes Nets: Slide 42Copyright © 2001, Andrew W. Moore
Using fewer numbers
Suppose there are two events:
• M: Manuela teaches the class (otherwise it’s Andrew)
• S: It is sunny
The joint p.d.f. for these events contains four entries.
If we want to build the joint p.d.f. we’ll have to invent those
four numbers. OR WILL WE??
• We don’t have to specify the bottom-level conjunctive
events such as P(~M^S) IF…
• …instead it may sometimes be more convenient for us
to specify things like: P(M), P(S).
But just P(M) and P(S) don’t derive the joint distribution. So
you can’t answer all questions.
Bayes Nets: Slide 43Copyright © 2001, Andrew W. Moore
Using fewer numbers
Suppose there are two events:
• M: Manuela teaches the class (otherwise it’s Andrew)
• S: It is sunny
The joint p.d.f. for these events contains four entries.
If we want to build the joint p.d.f. we’ll have to invent those
four numbers. OR WILL WE??
• We don’t have to specify the bottom-level conjunctive
events such as P(~M^S) IF…
• …instead it may sometimes be more convenient for us
to specify things like: P(M), P(S).
But just P(M) and P(S) don’t derive the joint distribution. So
you can’t answer all questions.
Bayes Nets: Slide 44Copyright © 2001, Andrew W. Moore
Independence
“The sunshine levels do not depend on and do not
influence who is teaching.”
This can be specified very simply:
P(S | M) = P(S)
This is a powerful statement!
It required extra domain knowledge. A different kind
of knowledge than numerical probabilities. It needed
an understanding of causation.
Bayes Nets: Slide 45Copyright © 2001, Andrew W. Moore
Independence
From P(S | M) = P(S), the rules of probability imply: (can
you prove these?)
• P(~S | M) = P(~S)
• P(M | S) = P(M)
• P(M ^ S) = P(M) P(S)
• P(~M ^ S) = P(~M) P(S), P(M ^ ~S) = P(M) P(~S),
P(~M ^ ~S) = P(~M) P(~S)
Bayes Nets: Slide 46Copyright © 2001, Andrew W. Moore
Independence
From P(S | M) = P(S), the rules of probability imply: (can
you prove these?)
• P(~S | M) = P(~S)
• P(M | S) = P(M)
• P(M ^ S) = P(M) P(S)
• P(~M ^ S) = P(~M) P(S), P(M ^ ~S) = P(M) P(~S),
P(~M ^ ~S) = P(~M) P(~S)
And in general:
P(M=u ^ S=v) = P(M=u) P(S=v)
for each of the four combinations of
u=True/False
v=True/False
Bayes Nets: Slide 47Copyright © 2001, Andrew W. Moore
Independence
We’ve stated:
P(M) = 0.6
P(S) = 0.3
P(S | M) = P(S)
M S Prob
T T
T F
F T
F F
And since we now have the joint pdf, we can make
any queries we like.
From these statements, we can
derive the full joint pdf.
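For reference, a quick check of the four table entries under independence (a sketch; the numbers follow directly from P(M)=0.6, P(S)=0.3 and P(M=u ^ S=v) = P(M=u) P(S=v)):

p_m, p_s = 0.6, 0.3
table = {(m, s): (p_m if m else 1 - p_m) * (p_s if s else 1 - p_s)
         for m in (True, False) for s in (True, False)}
print({k: round(v, 2) for k, v in table.items()})
# {(True, True): 0.18, (True, False): 0.42, (False, True): 0.12, (False, False): 0.28}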
Bayes Nets: Slide 48Copyright © 2001, Andrew W. Moore
A more interesting case
• M : Manuela teaches the class
• S : It is sunny
• L : The lecturer arrives slightly late.
Assume both lecturers are sometimes delayed by bad
weather. Andrew is more likely to arrive late than Manuela.
Bayes Nets: Slide 49Copyright © 2001, Andrew W. Moore
A more interesting case
• M : Manuela teaches the class
• S : It is sunny
• L : The lecturer arrives slightly late.
Assume both lecturers are sometimes delayed by bad
weather. Andrew is more likely to arrive late than Manuela.
Let’s begin with writing down knowledge we’re happy about:
P(S | M) = P(S), P(S) = 0.3, P(M) = 0.6
Lateness is not independent of the weather and is not
independent of the lecturer.
Bayes Nets: Slide 50Copyright © 2001, Andrew W. Moore
A more interesting case
• M : Manuela teaches the class
• S : It is sunny
• L : The lecturer arrives slightly late.
Assume both lecturers are sometimes delayed by bad
weather. Andrew is more likely to arrive late than Manuela.
Let’s begin with writing down knowledge we’re happy about:
P(S | M) = P(S), P(S) = 0.3, P(M) = 0.6
Lateness is not independent of the weather and is not
independent of the lecturer.
We already know the Joint of S and M, so all we need now is
P(L | S=u, M=v)
in the 4 cases of u/v = True/False.
Bayes Nets: Slide 51Copyright © 2001, Andrew W. Moore
A more interesting case
• M : Manuela teaches the class
• S : It is sunny
• L : The lecturer arrives slightly late.
Assume both lecturers are sometimes delayed by bad
weather. Andrew is more likely to arrive late than Manuela.
P(S | M) = P(S)
P(S) = 0.3
P(M) = 0.6
P(L | M ^ S) = 0.05
P(L | M ^ ~S) = 0.1
P(L | ~M ^ S) = 0.1
P(L | ~M ^ ~S) = 0.2
Now we can derive a full joint
p.d.f. with a “mere” six numbers
instead of seven*
*Savings are larger for larger numbers of variables.
Bayes Nets: Slide 52Copyright © 2001, Andrew W. Moore
A more interesting case
• M : Manuela teaches the class
• S : It is sunny
• L : The lecturer arrives slightly late.
Assume both lecturers are sometimes delayed by bad
weather. Andrew is more likely to arrive late than Manuela.
P(S | M) = P(S)
P(S) = 0.3
P(M) = 0.6
P(L | M ^ S) = 0.05
P(L | M ^ ~S) = 0.1
P(L | ~M ^ S) = 0.1
P(L | ~M ^ ~S) = 0.2
Question: Express
P(L=x ^ M=y ^ S=z)
in terms that only need the above
expressions, where x,y and z may
each be True or False.
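One way to write the answer (an illustrative sketch, not part of the original slides): by the chain rule and the independence of M and S, P(L=x ^ M=y ^ S=z) = P(L=x | M=y ^ S=z) * P(M=y) * P(S=z), with each factor read off the list above. In Python:

p_s, p_m = 0.3, 0.6
p_l_given = {(True, True): 0.05, (True, False): 0.1,    # keyed by (M, S)
             (False, True): 0.1, (False, False): 0.2}

def joint_lms(l, m, s):
    p_ms = (p_m if m else 1 - p_m) * (p_s if s else 1 - p_s)
    p_l = p_l_given[(m, s)]
    return (p_l if l else 1 - p_l) * p_ms

print(joint_lms(True, False, True))  # P(L ^ ~M ^ S) = 0.1 * 0.4 * 0.3 = 0.012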
Bayes Nets: Slide 53Copyright © 2001, Andrew W. Moore
A bit of notation
P(S | M) = P(S)
P(S) = 0.3
P(M) = 0.6
P(L | M ^ S) = 0.05
P(L | M ^ ~S) = 0.1
P(L | ~M ^ S) = 0.1
P(L | ~M ^ ~S) = 0.2
S M
L
P(S)=0.3
P(M)=0.6
P(L|M^S)=0.05
P(L|M^~S)=0.1
P(L|~M^S)=0.1
P(L|~M^~S)=0.2
Bayes Nets: Slide 54Copyright © 2001, Andrew W. Moore
A bit of notation
P(S | M) = P(S)
P(S) = 0.3
P(M) = 0.6
P(L | M ^ S) = 0.05
P(L | M ^ ~S) = 0.1
P(L | ~M ^ S) = 0.1
P(L | ~M ^ ~S) = 0.2
S M
L
P(S)=0.3
P(M)=0.6
P(L|M^S)=0.05
P(L|M^~S)=0.1
P(L|~M^S)=0.1
P(L|~M^~S)=0.2
Read the absence of an arrow
between S and M to mean “it
would not help me predict M if I
knew the value of S”
Read the two arrows into L to
mean that if I want to know the
value of L it may help me to
know M and to know S.
This kind of stuff will be
thoroughly formalized later
Bayes Nets: Slide 55Copyright © 2001, Andrew W. Moore
An even cuter trick
Suppose we have these three events:
• M : Lecture taught by Manuela
• L : Lecturer arrives late
• R : Lecture concerns robots
Suppose:
• Andrew has a higher chance of being late than Manuela.
• Andrew has a higher chance of giving robotics lectures.
What kind of independence can we find?
How about:
• P(L | M) = P(L) ?
• P(R | M) = P(R) ?
• P(L | R) = P(L) ?
Bayes Nets: Slide 56Copyright © 2001, Andrew W. Moore
Conditional independence
Once you know who the lecturer is, then whether
they arrive late doesn’t affect whether the lecture
concerns robots.
P(R | M, L) = P(R | M) and
P(R | ~M, L) = P(R | ~M)
We express this in the following way:
“R and L are conditionally independent given M”
M
L R
Given knowledge of M,
knowing anything else in
the diagram won’t help
us with L, etc.
..which is also notated
by the following
diagram.
Bayes Nets: Slide 57Copyright © 2001, Andrew W. Moore
Conditional Independence formalized
R and L are conditionally independent given M if
for all x,y,z in {T,F}:
P(R=x | M=y ^ L=z) = P(R=x | M=y)
More generally:
Let S1 and S2 and S3 be sets of variables.
Set-of-variables S1 and set-of-variables S2 are
conditionally independent given S3 if for all
assignments of values to the variables in the sets,
P(S1’s assignments | S2’s assignments & S3’s assignments) =
P(S1’s assignments | S3’s assignments)
Bayes Nets: Slide 58Copyright © 2001, Andrew W. Moore
Example:
R and L are conditionally independent given M if
for all x,y,z in {T,F}:
P(R=x | M=y ^ L=z) = P(R=x | M=y)
More generally:
Let S1 and S2 and S3 be sets of variables.
Set-of-variables S1 and set-of-variables S2 are
conditionally independent given S3 if for all
assignments of values to the variables in the sets,
P(S1’s assignments | S2’s assignments & S3’s assignments) =
P(S1’s assignments | S3’s assignments)
“Shoe-size is conditionally independent of Glove-size given
height weight and age”
means
forall s,g,h,w,a
P(ShoeSize=s|Height=h,Weight=w,Age=a)
=
P(ShoeSize=s|Height=h,Weight=w,Age=a,GloveSize=g)
Bayes Nets: Slide 59Copyright © 2001, Andrew W. Moore
Example:
R and L are conditionally independent given M if
for all x,y,z in {T,F}:
P(R=x | M=y ^ L=z) = P(R=x | M=y)
More generally:
Let S1 and S2 and S3 be sets of variables.
Set-of-variables S1 and set-of-variables S2 are
conditionally independent given S3 if for all
assignments of values to the variables in the sets,
P(S1’s assignments | S2’s assignments & S3’s assignments) =
P(S1’s assignments | S3’s assignments)
“Shoe-size is conditionally independent of Glove-size given
height weight and age”
does not mean
forall s,g,h
P(ShoeSize=s|Height=h)
=
P(ShoeSize=s|Height=h, GloveSize=g)
Bayes Nets: Slide 60Copyright © 2001, Andrew W. Moore
Conditional
independence
M
L R
We can write down P(M). And then, since we know
L is only directly influenced by M, we can write
down the values of P(L|M) and P(L|~M) and know
we’ve fully specified L’s behavior. Ditto for R.
P(M) = 0.6
P(L | M) = 0.085
P(L | ~M) = 0.17
P(R | M) = 0.3
P(R | ~M) = 0.6
‘R and L conditionally
independent given M’
Bayes Nets: Slide 61Copyright © 2001, Andrew W. Moore
Conditional independence
M
L R
P(M) = 0.6
P(L | M) = 0.085
P(L | ~M) = 0.17
P(R | M) = 0.3
P(R | ~M) = 0.6
Conditional Independence:
P(R | M, L) = P(R | M),
P(R | ~M, L) = P(R | ~M)
Again, we can obtain any member of the Joint
prob dist that we desire:
P(L=x ^ R=y ^ M=z) =
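Completing the expression (a sketch): with R and L conditionally independent given M, P(L=x ^ R=y ^ M=z) = P(L=x | M=z) * P(R=y | M=z) * P(M=z). A quick check in Python:

p_m = 0.6
p_l_given_m = {True: 0.085, False: 0.17}
p_r_given_m = {True: 0.3, False: 0.6}

def joint_lrm(l, r, m):
    pl = p_l_given_m[m] if l else 1 - p_l_given_m[m]
    pr = p_r_given_m[m] if r else 1 - p_r_given_m[m]
    return pl * pr * (p_m if m else 1 - p_m)

print(joint_lrm(True, True, False))  # P(L ^ R ^ ~M) = 0.17 * 0.6 * 0.4 = 0.0408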
Bayes Nets: Slide 62Copyright © 2001, Andrew W. Moore
Assume five variables
T: The lecture started by 10:35
L: The lecturer arrives late
R: The lecture concerns robots
M: The lecturer is Manuela
S: It is sunny
• T only directly influenced by L (i.e. T is
conditionally independent of R,M,S given L)
• L only directly influenced by M and S (i.e. L is
conditionally independent of R given M & S)
• R only directly influenced by M (i.e. R is
conditionally independent of L,S, given M)
• M and S are independent
Bayes Nets: Slide 63Copyright © 2001, Andrew W. Moore
Making a Bayes net
S M
R
L
T
Step One: add variables.
• Just choose the variables you’d like to be included in the
net.
T: The lecture started by 10:35
L: The lecturer arrives late
R: The lecture concerns robots
M: The lecturer is Manuela
S: It is sunny
Bayes Nets: Slide 64Copyright © 2001, Andrew W. Moore
Making a Bayes net
S M
R
L
T
Step Two: add links.
• The link structure must be acyclic.
• If node X is given parents Q1,Q2,..Qn you are promising
that any variable that’s a non-descendant of X is
conditionally independent of X given {Q1,Q2,..Qn}
T: The lecture started by 10:35
L: The lecturer arrives late
R: The lecture concerns robots
M: The lecturer is Manuela
S: It is sunny
Bayes Nets: Slide 65Copyright © 2001, Andrew W. Moore
Making a Bayes net
S M
R
L
T
P(S)=0.3
P(M)=0.6
P(R|M)=0.3
P(R|~M)=0.6
P(T|L)=0.3
P(T|~L)=0.8
P(L|M^S)=0.05
P(L|M^~S)=0.1
P(L|~M^S)=0.1
P(L|~M^~S)=0.2
Step Three: add a probability table for each node.
• The table for node X must list P(X|Parent Values) for each
possible combination of parent values
T: The lecture started by 10:35
L: The lecturer arrives late
R: The lecture concerns robots
M: The lecturer is Manuela
S: It is sunny
Bayes Nets: Slide 66Copyright © 2001, Andrew W. Moore
Making a Bayes net
S M
R
L
T
P(S)=0.3
P(M)=0.6
P(R|M)=0.3
P(R|~M)=0.6
P(T|L)=0.3
P(T|~L)=0.8
P(L|M^S)=0.05
P(L|M^~S)=0.1
P(L|~M^S)=0.1
P(L|~M^~S)=0.2
• Two unconnected variables may still be correlated
• Each node is conditionally independent of all non-
descendants in the tree, given its parents.
• You can deduce many other conditional independence
relations from a Bayes net. See the next lecture.
T: The lecture started by 10:35
L: The lecturer arrives late
R: The lecture concerns robots
M: The lecturer is Manuela
S: It is sunny
Bayes Nets: Slide 67Copyright © 2001, Andrew W. Moore
Bayes Nets Formalized
A Bayes net (also called a belief network) is an
augmented directed acyclic graph, represented by
the pair (V, E) where:
• V is a set of vertices.
• E is a set of directed edges joining vertices. No
loops of any length are allowed.
Each vertex in V contains the following information:
• The name of a random variable
• A probability distribution table indicating how the
probability of this variable’s values depends on
all possible combinations of parental values.
Bayes Nets: Slide 68Copyright © 2001, Andrew W. Moore
Building a Bayes Net
1. Choose a set of relevant variables.
2. Choose an ordering for them
3. Assume they’re called X1 .. Xm (where X1 is the
first in the ordering, X2 is the second, etc)
4. For i = 1 to m:
1. Add the Xi node to the network
2. Set Parents(Xi ) to be a minimal subset of
{X1…Xi-1} such that we have conditional
independence of Xi and all other members of
{X1…Xi-1} given Parents(Xi )
3. Define the probability table of
P(Xi = k | Assignments of Parents(Xi)).
Bayes Nets: Slide 69Copyright © 2001, Andrew W. Moore
Example Bayes Net Building
Suppose we’re building a nuclear power station.
There are the following random variables:
GRL : Gauge Reads Low.
CTL : Core temperature is low.
FG : Gauge is faulty.
FA : Alarm is faulty
AS : Alarm sounds
• If alarm working properly, the alarm is meant to
sound if the gauge stops reading a low temp.
• If gauge working properly, the gauge is meant to
read the temp of the core.
Bayes Nets: Slide 70Copyright © 2001, Andrew W. Moore
Computing a Joint Entry
How to compute an entry in a joint distribution?
E.G: What is P(S ^ ~M ^ L ^ ~R ^ T)?
S M
R
L
T
P(S)=0.3
P(M)=0.6
P(R|M)=0.3
P(R|~M)=0.6
P(T|L)=0.3
P(T|~L)=0.8
P(L|M^S)=0.05
P(L|M^~S)=0.1
P(L|~M^S)=0.1
P(L|~M^~S)=0.2
Bayes Nets: Slide 71Copyright © 2001, Andrew W. Moore
Computing with Bayes Net
P(T ^ ~R ^ L ^ ~M ^ S) =
P(T | ~R ^ L ^ ~M ^ S) * P(~R ^ L ^ ~M ^ S) =
P(T | L) * P(~R ^ L ^ ~M ^ S) =
P(T | L) * P(~R | L ^ ~M ^ S) * P(L ^ ~M ^ S) =
P(T | L) * P(~R | ~M) * P(L ^ ~M ^ S) =
P(T | L) * P(~R | ~M) * P(L | ~M ^ S) * P(~M ^ S) =
P(T | L) * P(~R | ~M) * P(L | ~M ^ S) * P(~M | S) * P(S) =
P(T | L) * P(~R | ~M) * P(L | ~M ^ S) * P(~M) * P(S).
S M
R
L
T
P(S)=0.3
P(M)=0.6
P(R|M)=0.3
P(R|~M)=0.6
P(T|L)=0.3
P(T|~L)=0.8
P(L|M^S)=0.05
P(L|M^~S)=0.1
P(L|~M^S)=0.1
P(L|~M^~S)=0.2
Bayes Nets: Slide 72Copyright © 2001, Andrew W. Moore
The general case
P(X1=x1 ^ X2=x2 ^ … ^ Xn-1=xn-1 ^ Xn=xn) =
P(Xn=xn ^ Xn-1=xn-1 ^ … ^ X2=x2 ^ X1=x1) =
P(Xn=xn | Xn-1=xn-1 ^ … ^ X2=x2 ^ X1=x1) * P(Xn-1=xn-1 ^ … ^ X2=x2 ^ X1=x1) =
P(Xn=xn | Xn-1=xn-1 ^ … ^ X2=x2 ^ X1=x1) * P(Xn-1=xn-1 | Xn-2=xn-2 ^ … ^ X2=x2 ^ X1=x1) *
P(Xn-2=xn-2 ^ … ^ X2=x2 ^ X1=x1) =
:
:
= Π_{i=1}^{n} P(Xi=xi | Xi-1=xi-1 ^ … ^ X1=x1)
= Π_{i=1}^{n} P(Xi=xi | Assignments of Parents(Xi))
So any entry in the joint pdf table can be computed. And so any
conditional probability can be computed.
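A sketch of this product in code, for the S, M, L, R, T net of the earlier slides (the dictionary encoding and function names are illustrative):

parents = {'S': [], 'M': [], 'L': ['M', 'S'], 'R': ['M'], 'T': ['L']}
cpt = {  # P(node = True | parent assignment), keyed by the tuple of parent values
    'S': {(): 0.3}, 'M': {(): 0.6},
    'L': {(True, True): 0.05, (True, False): 0.1,
          (False, True): 0.1, (False, False): 0.2},
    'R': {(True,): 0.3, (False,): 0.6},
    'T': {(True,): 0.3, (False,): 0.8},
}

def joint_entry(assignment):
    # P(X1=x1 ^ ... ^ Xn=xn) = product over i of P(Xi=xi | parents' values)
    p = 1.0
    for var, value in assignment.items():
        p_true = cpt[var][tuple(assignment[q] for q in parents[var])]
        p *= p_true if value else 1 - p_true
    return p

print(joint_entry({'S': True, 'M': False, 'L': True, 'R': False, 'T': True}))
# = 0.3 * 0.4 * 0.1 * 0.4 * 0.3 = 0.00144, matching the worked example above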
Bayes Nets: Slide 73Copyright © 2001, Andrew W. Moore
Where are we now?
• We have a methodology for building Bayes nets.
• We don’t require exponential storage to hold our probability
table. Only exponential in the maximum number of parents
of any node.
• We can compute probabilities of any given assignment of
truth values to the variables. And we can do it in time
linear with the number of nodes.
• So we can also compute answers to any questions.
E.G. What could we do to compute P(R | T, ~S)?
S M
R
L
T
P(S)=0.3
P(M)=0.6
P(R|M)=0.3
P(R|~M)=0.6
P(T|L)=0.3
P(T|~L)=0.8
P(L|M^S)=0.05
P(L|M^~S)=0.1
P(L|~M^S)=0.1
P(L|~M^~S)=0.2
Bayes Nets: Slide 74Copyright © 2001, Andrew W. Moore
Where are we now?
• We have a methodology for building Bayes nets.
• We don’t require exponential storage to hold our probability
table. Only exponential in the maximum number of parents
of any node.
• We can compute probabilities of any given assignment of
truth values to the variables. And we can do it in time
linear with the number of nodes.
• So we can also compute answers to any questions.
E.G. What could we do to compute P(R | T, ~S)?
S M
R
L
T
P(S)=0.3
P(M)=0.6
P(R|M)=0.3
P(R|~M)=0.6
P(T|L)=0.3
P(T|~L)=0.8
P(L|M^S)=0.05
P(L|M^~S)=0.1
P(L|~M^S)=0.1
P(L|~M^~S)=0.2
Step 1: Compute P(R ^ T ^ ~S)
Step 2: Compute P(~R ^ T ^ ~S)
Step 3: Return
P(R ^ T ^ ~S)
-------------------------------------
P(R ^ T ^ ~S)+ P(~R ^ T ^ ~S)
Bayes Nets: Slide 75Copyright © 2001, Andrew W. Moore
Where are we now?
• We have a methodology for building Bayes nets.
• We don’t require exponential storage to hold our probability
table. Only exponential in the maximum number of parents
of any node.
• We can compute probabilities of any given assignment of
truth values to the variables. And we can do it in time
linear with the number of nodes.
• So we can also compute answers to any questions.
E.G. What could we do to compute P(R | T, ~S)?
S M
R
L
T
P(S)=0.3
P(M)=0.6
P(R|M)=0.3
P(R|~M)=0.6
P(T|L)=0.3
P(T|~L)=0.8
P(L|M^S)=0.05
P(L|M^~S)=0.1
P(L|~M^S)=0.1
P(L|~M^~S)=0.2
Step 1: Compute P(R ^ T ^ ~S)
Step 2: Compute P(~R ^ T ^ ~S)
Step 3: Return
P(R ^ T ^ ~S)
-------------------------------------
P(R ^ T ^ ~S)+ P(~R ^ T ^ ~S)
Sum of all the rows in the Joint
that match R ^ T ^ ~S
Sum of all the rows in the Joint
that match ~R ^ T ^ ~S
Bayes Nets: Slide 76Copyright © 2001, Andrew W. Moore
Where are we now?
• We have a methodology for building Bayes nets.
• We don’t require exponential storage to hold our probability
table. Only exponential in the maximum number of parents
of any node.
• We can compute probabilities of any given assignment of
truth values to the variables. And we can do it in time
linear with the number of nodes.
• So we can also compute answers to any questions.
E.G. What could we do to compute P(R | T, ~S)?
S M
R
L
T
P(S)=0.3
P(M)=0.6
P(R|M)=0.3
P(R|~M)=0.6
P(T|L)=0.3
P(T|~L)=0.8
P(L|M^S)=0.05
P(L|M^~S)=0.1
P(L|~M^S)=0.1
P(L|~M^~S)=0.2
Step 1: Compute P(R ^ T ^ ~S)
Step 2: Compute P(~R ^ T ^ ~S)
Step 3: Return
P(R ^ T ^ ~S)
-------------------------------------
P(R ^ T ^ ~S)+ P(~R ^ T ^ ~S)
Sum of all the rows in the Joint
that match R ^ T ^ ~S
Sum of all the rows in the Joint
that match ~R ^ T ^ ~S
Each of these obtained by
the “computing a joint
probability entry” method of
the earlier slides
4 joint computes
4 joint computes
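A sketch of these three steps in code, reusing the parents/cpt encoding and joint_entry from the sketch after “The general case” above (each sum enumerates the 4 joint entries over the free variables L and M):

from itertools import product

def sum_matching(fixed):
    total = 0.0
    free = [v for v in parents if v not in fixed]
    for values in product([True, False], repeat=len(free)):
        total += joint_entry(dict(fixed, **dict(zip(free, values))))
    return total

num = sum_matching({'R': True, 'T': True, 'S': False})        # P(R ^ T ^ ~S)
den = num + sum_matching({'R': False, 'T': True, 'S': False})  # + P(~R ^ T ^ ~S)
print(num / den)  # P(R | T ^ ~S)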
Bayes Nets: Slide 77Copyright © 2001, Andrew W. Moore
The good news
We can do inference. We can compute any
conditional probability:
P( Some variable | Some other variable values )





P(E1 | E2) = P(E1 ^ E2) / P(E2)
           = Σ_{joint entries matching E1 and E2} P(joint entry) / Σ_{joint entries matching E2} P(joint entry)
Bayes Nets: Slide 78Copyright © 2001, Andrew W. Moore
The good news
We can do inference. We can compute any
conditional probability:
P( Some variable | Some other variable values )





P(E1 | E2) = P(E1 ^ E2) / P(E2)
           = Σ_{joint entries matching E1 and E2} P(joint entry) / Σ_{joint entries matching E2} P(joint entry)
Suppose you have m binary-valued variables in your Bayes
Net and expression E2 mentions k variables.
How much work is the above computation?
Bayes Nets: Slide 79Copyright © 2001, Andrew W. Moore
The sad, bad news
Conditional probabilities by enumerating all matching entries
in the joint are expensive:
Exponential in the number of variables.
Bayes Nets: Slide 80Copyright © 2001, Andrew W. Moore
The sad, bad news
Conditional probabilities by enumerating all matching entries
in the joint are expensive:
Exponential in the number of variables.
But perhaps there are faster ways of querying Bayes nets?
• In fact, if I ever ask you to manually do a Bayes Net
inference, you’ll find there are often many tricks to save you
time.
• So we’ve just got to program our computer to do those tricks
too, right?
Bayes Nets: Slide 81Copyright © 2001, Andrew W. Moore
The sad, bad news
Conditional probabilities by enumerating all matching entries
in the joint are expensive:
Exponential in the number of variables.
But perhaps there are faster ways of querying Bayes nets?
• In fact, if I ever ask you to manually do a Bayes Net
inference, you’ll find there are often many tricks to save you
time.
• So we’ve just got to program our computer to do those tricks
too, right?
Sadder and worse news:
General querying of Bayes nets is NP-complete.
Bayes Nets: Slide 82Copyright © 2001, Andrew W. Moore
Bayes nets inference algorithms
A poly-tree is a directed acyclic graph in which no two nodes have more than one
path between them.
A poly tree / Not a poly tree (but still a legal Bayes net)
[Diagrams omitted: example graphs over S, M, L, R, T and over X1…X5 illustrating a poly-tree and a non-poly-tree that is still a legal Bayes net]
• If net is a poly-tree, there is a linear-time algorithm (see a later
Andrew lecture).
• The best general-case algorithms convert a general net to a poly-
tree (often at huge expense) and call the poly-tree algorithm.
• Another popular, practical approach (doesn’t assume poly-tree):
Stochastic Simulation.
Bayes Nets: Slide 83Copyright © 2001, Andrew W. Moore
Sampling from the Joint Distribution
It’s pretty easy to generate a set of variable-assignments at random with
the same probability as the underlying joint distribution.
How?
S M
R
L
T
P(S)=0.3
P(M)=0.6
P(R|M)=0.3
P(R|~M)=0.6
P(T|L)=0.3
P(T|~L)=0.8
P(L|M^S)=0.05
P(L|M^~S)=0.1
P(L|~M^S)=0.1
P(L|~M^~S)=0.2
Bayes Nets: Slide 84Copyright © 2001, Andrew W. Moore
Sampling from the Joint Distribution
1. Randomly choose S. S = True with prob 0.3
2. Randomly choose M. M = True with prob 0.6
3. Randomly choose L. The probability that L is true
depends on the assignments of S and M. E.G. if steps
1 and 2 had produced S=True, M=False, then
probability that L is true is 0.1
4. Randomly choose R. Probability depends on M.
5. Randomly choose T. Probability depends on L
S M
R
L
T
P(S)=0.3
P(M)=0.6
P(R|M)=0.3
P(R|~M)=0.6
P(T|L)=0.3
P(T|~L)=0.8
P(L|M^S)=0.05
P(L|M^~S)=0.1
P(L|~M^S)=0.1
P(L|~M^~S)=0.2
Bayes Nets: Slide 85Copyright © 2001, Andrew W. Moore
A general sampling algorithm
Let’s generalize the example on the previous slide to a general Bayes Net.
As in Slides 16-17 , call the variables X1 .. Xn, where Parents(Xi) must be a
subset of {X1 .. Xi-1}.
For i=1 to n:
1. Find parents, if any, of Xi. Assume n(i) parents. Call them Xp(i,1), Xp(i,2),
…Xp(i,n(i)).
2. Recall the values that those parents were randomly given: xp(i,1), xp(i,2),
…xp(i,n(i)).
3. Look up in the lookup-table for:
P(Xi=True | Xp(i,1)=xp(i,1), Xp(i,2)=xp(i,2), … Xp(i,n(i))=xp(i,n(i)))
4. Randomly set xi=True according to this probability
x1, x2,…xn are now a sample from the joint distribution of X1, X2,…Xn.
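A sketch of this sampler for the lecture net, reusing the parents/cpt dictionaries from the earlier joint_entry sketch (names are illustrative):

import random

def sample_joint():
    # Sample each node given its already-sampled parents, in a topological order.
    x = {}
    for var in ['S', 'M', 'L', 'R', 'T']:
        p_true = cpt[var][tuple(x[q] for q in parents[var])]
        x[var] = random.random() < p_true
    return x

print(sample_joint())  # e.g. {'S': False, 'M': True, 'L': False, 'R': True, 'T': True}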
Bayes Nets: Slide 86Copyright © 2001, Andrew W. Moore
Stochastic Simulation Example
Someone wants to know P(R = True | T = True ^ S = False )
We’ll do lots of random samplings and count the number of
occurrences of the following:
• Nc : Num. samples in which T=True and S=False.
• Ns : Num. samples in which R=True, T=True and S=False.
• N : Number of random samplings
Now if N is big enough:
Nc /N is a good estimate of P(T=True and S=False).
Ns /N is a good estimate of P(R=True ,T=True , S=False).
P(RT^~S) = P(R^T^~S)/P(T^~S), so Ns / Nc can be a good
estimate of P(RT^~S).
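The counting estimate, continuing the sampler sketched above (Nc and Ns as defined on this slide; Nc must end up greater than zero for the ratio to make sense):

N, Nc, Ns = 100_000, 0, 0
for _ in range(N):
    x = sample_joint()
    if x['T'] and not x['S']:
        Nc += 1
        if x['R']:
            Ns += 1
print(Ns / Nc)  # approaches P(R | T ^ ~S) as N grows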
Bayes Nets: Slide 87Copyright © 2001, Andrew W. Moore
General Stochastic Simulation
Someone wants to know P(E1 | E2 )
We’ll do lots of random samplings and count the number of
occurrences of the following:
• Nc : Num. samples in which E2 holds
• Ns : Num. samples in which both E1 and E2 hold
• N : Number of random samplings
Now if N is big enough:
Nc /N is a good estimate of P(E2).
Ns /N is a good estimate of P(E1 , E2).
P(E1 | E2) = P(E1 ^ E2)/P(E2), so Ns / Nc can be a good estimate
of P(E1 | E2).
Bayes Nets: Slide 88Copyright © 2001, Andrew W. Moore
Likelihood weighting
Problem with Stochastic Sampling:
With lots of constraints in E2, or unlikely events in E2, most of the
simulations will be thrown away (they’ll have no effect on Nc or Ns).
Imagine we’re part way through our simulation.
In E2 we have the constraint Xi = v
We’re just about to generate a value for Xi at random. Given the values
assigned to the parents, we see that P(Xi = v | parents) = p .
Now we know that with stochastic sampling:
• we’ll generate “Xi = v” proportion p of the time, and proceed.
• And we’ll generate a different value proportion 1-p of the time, and the
simulation will be wasted.
Instead, always generate Xi = v, but weight the answer by weight “p” to
compensate.
Bayes Nets: Slide 89Copyright © 2001, Andrew W. Moore
Likelihood weighting
Set Nc :=0, Ns :=0
1. Generate a random assignment of all variables that
matches E2. This process returns a weight w.
2. Define w to be the probability that this assignment would
have been generated instead of an unmatching
assignment during its generation in the original
algorithm. Fact: w is a product of all likelihood factors
involved in the generation.
3. Nc := Nc + w
4. If our sample matches E1 then Ns := Ns + w
5. Go to 1
Again, Ns / Nc estimates P(E1 | E2 )
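A sketch of the weighted estimator for the same kind of query, again reusing the parents/cpt encoding from above (evidence variables are forced to their observed values and the weight collects the corresponding likelihood factors):

import random

def likelihood_weighting(evidence, query_var, n_samples=100_000):
    Nc = Ns = 0.0
    for _ in range(n_samples):
        x, w = {}, 1.0
        for var in ['S', 'M', 'L', 'R', 'T']:            # topological order
            p_true = cpt[var][tuple(x[q] for q in parents[var])]
            if var in evidence:
                x[var] = evidence[var]                   # force the evidence value
                w *= p_true if evidence[var] else 1 - p_true
            else:
                x[var] = random.random() < p_true
        Nc += w
        if x[query_var]:
            Ns += w
    return Ns / Nc

print(likelihood_weighting({'T': True, 'S': False}, 'R'))  # estimates P(R | T ^ ~S)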
Bayes Nets: Slide 90Copyright © 2001, Andrew W. Moore
Case Study I
Pathfinder system. (Heckerman 1991, Probabilistic Similarity Networks,
MIT Press, Cambridge MA).
• Diagnostic system for lymph-node diseases.
• 60 diseases and 100 symptoms and test-results.
• 14,000 probabilities
• Expert consulted to make net.
• 8 hours to determine variables.
• 35 hours for net topology.
• 40 hours for probability table values.
• Apparently, the experts found it quite easy to invent the causal links
and probabilities.
• Pathfinder is now outperforming the world experts in diagnosis. Being
extended to several dozen other medical domains.
Bayes Nets: Slide 91Copyright © 2001, Andrew W. Moore
Questions
• What are the strengths of probabilistic networks
compared with propositional logic?
• What are the weaknesses of probabilistic networks
compared with propositional logic?
• What are the strengths of probabilistic networks
compared with predicate logic?
• What are the weaknesses of probabilistic networks
compared with predicate logic?
• (How) could predicate logic and probabilistic
networks be combined?
Bayes Nets: Slide 92Copyright © 2001, Andrew W. Moore
What you should know
• The meanings and importance of independence
and conditional independence.
• The definition of a Bayes net.
• Computing probabilities of assignments of
variables (i.e. members of the joint p.d.f.) with a
Bayes net.
• The slow (exponential) method for computing
arbitrary, conditional probabilities.
• The stochastic simulation method and likelihood
weighting.

  • 9. Bayes Nets: Slide 9Copyright © 2001, Andrew W. Moore Interpreting the axioms • 0 <= P(A) <= 1 • P(True) = 1 • P(False) = 0 • P(A or B) = P(A) + P(B) - P(A and B) The area of A can’t get any smaller than 0 And a zero area would mean no world could ever have A true
  • 10. Bayes Nets: Slide 10Copyright © 2001, Andrew W. Moore Interpreting the axioms • 0 <= P(A) <= 1 • P(True) = 1 • P(False) = 0 • P(A or B) = P(A) + P(B) - P(A and B) The area of A can’t get any bigger than 1 And an area of 1 would mean all worlds will have A true
  • 11. Bayes Nets: Slide 11Copyright © 2001, Andrew W. Moore Interpreting the axioms • 0 <= P(A) <= 1 • P(True) = 1 • P(False) = 0 • P(A or B) = P(A) + P(B) - P(A and B) A B
  • 12. Bayes Nets: Slide 12Copyright © 2001, Andrew W. Moore Interpreting the axioms • 0 <= P(A) <= 1 • P(True) = 1 • P(False) = 0 • P(A or B) = P(A) + P(B) - P(A and B) A B P(A or B) BP(A and B) Simple addition and subtraction
  • 13. Bayes Nets: Slide 13Copyright © 2001, Andrew W. Moore These Axioms are Not to be Trifled With • There have been attempts to do different methodologies for uncertainty • Fuzzy Logic • Three-valued logic • Dempster-Shafer • Non-monotonic reasoning • But the axioms of probability are the only system with this property: If you gamble using them you can’t be unfairly exploited by an opponent using some other system [de Finetti 1931]
  • 14. Bayes Nets: Slide 14Copyright © 2001, Andrew W. Moore Theorems from the Axioms • 0 <= P(A) <= 1, P(True) = 1, P(False) = 0 • P(A or B) = P(A) + P(B) - P(A and B) From these we can prove: P(not A) = P(~A) = 1-P(A) • How?
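One way to answer the “How?” above: A or ~A is True and A and ~A is False, so applying the fourth axiom to the pair A, ~A gives 1 = P(True) = P(A or ~A) = P(A) + P(~A) - P(A and ~A) = P(A) + P(~A) - 0, and rearranging yields P(~A) = 1 - P(A).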
  • 15. Bayes Nets: Slide 15Copyright © 2001, Andrew W. Moore Side Note • I am inflicting these proofs on you for two reasons: 1. These kind of manipulations will need to be second nature to you if you use probabilistic analytics in depth 2. Suffering is good for you
  • 16. Bayes Nets: Slide 16Copyright © 2001, Andrew W. Moore Another important theorem • 0 <= P(A) <= 1, P(True) = 1, P(False) = 0 • P(A or B) = P(A) + P(B) - P(A and B) From these we can prove: P(A) = P(A ^ B) + P(A ^ ~B) • How?
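A sketch of the proof the slide asks for: A is the same event as (A ^ B) or (A ^ ~B), and those two events are mutually exclusive, so P(A) = P(A ^ B) + P(A ^ ~B) - P((A ^ B) ^ (A ^ ~B)) = P(A ^ B) + P(A ^ ~B) - P(False) = P(A ^ B) + P(A ^ ~B).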
  • 17. Bayes Nets: Slide 17Copyright © 2001, Andrew W. Moore Conditional Probability • P(A|B) = Fraction of worlds in which B is true that also have A true F H H = “Have a headache” F = “Coming down with Flu” P(H) = 1/10 P(F) = 1/40 P(H|F) = 1/2 “Headaches are rare and flu is rarer, but if you’re coming down with ‘flu there’s a 50- 50 chance you’ll have a headache.”
  • 18. Bayes Nets: Slide 18Copyright © 2001, Andrew W. Moore Conditional Probability F H H = “Have a headache” F = “Coming down with Flu” P(H) = 1/10 P(F) = 1/40 P(H|F) = 1/2 P(H|F) = Fraction of flu-inflicted worlds in which you have a headache = #worlds with flu and headache ------------------------------------ #worlds with flu = Area of “H and F” region ------------------------------ Area of “F” region = P(H ^ F) ----------- P(F)
  • 19. Bayes Nets: Slide 19Copyright © 2001, Andrew W. Moore Definition of Conditional Probability P(A ^ B) P(A|B) = ----------- P(B) Corollary: The Chain Rule P(A ^ B) = P(A|B) P(B)
  • 20. Bayes Nets: Slide 20Copyright © 2001, Andrew W. Moore Bayes Rule P(A ^ B) P(A|B) P(B) P(B|A) = ----------- = --------------- P(A) P(A) This is Bayes Rule Bayes, Thomas (1763) An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:370- 418
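A quick worked example, plugging the headache/flu numbers from slides 17-18 into the rule: P(F | H) = P(H | F) P(F) / P(H) = (1/2)(1/40) / (1/10) = 1/8. So although flu makes a headache quite likely, observing a headache only raises the probability of flu from 1/40 to 1/8.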
  • 21. Bayes Nets: Slide 21Copyright © 2001, Andrew W. Moore Using Bayes Rule to Gamble The “Win” envelope has a dollar and four beads in it $1.00 The “Lose” envelope has three beads and no money Trivial question: someone draws an envelope at random and offers to sell it to you. How much should you pay? R R B B R B B
  • 22. Bayes Nets: Slide 22Copyright © 2001, Andrew W. Moore Using Bayes Rule to Gamble The “Win” envelope has a dollar and four beads in it $1.00 The “Lose” envelope has three beads and no money Interesting question: before deciding, you are allowed to see one bead drawn from the envelope. Suppose it’s black: How much should you pay? Suppose it’s red: How much should you pay?
  • 23. Bayes Nets: Slide 23Copyright © 2001, Andrew W. Moore Calculation… $1.00
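The calculation the slide leaves blank can be worked through with Bayes Rule. If, as the bead list on slide 21 suggests, the Win envelope holds two red and two black beads and the Lose envelope one red and two black beads (an assumption about the picture, not stated in the text), then P(Win | black) = P(black | Win) P(Win) / [ P(black | Win) P(Win) + P(black | Lose) P(Lose) ] = (1/2)(1/2) / [ (1/2)(1/2) + (2/3)(1/2) ] = 3/7 ≈ 0.43, so after seeing a black bead the envelope is worth about $0.43. By the same reasoning P(Win | red) = (1/2)(1/2) / [ (1/2)(1/2) + (1/3)(1/2) ] = 3/5 = 0.60, so after a red bead it is worth about $0.60.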
  • 24. Bayes Nets: Slide 24Copyright © 2001, Andrew W. Moore Multivalued Random Variables • Suppose A can take on more than 2 values • A is a random variable with arity k if it can take on exactly one value out of {v1,v2, .. vk} • Thus… P(A=vi ^ A=vj) = 0 if i ≠ j, and P(A=v1 or A=v2 or … or A=vk) = 1
  • 25. Bayes Nets: Slide 25Copyright © 2001, Andrew W. Moore An easy fact about Multivalued Random Variables: • Using the axioms of probability… 0 <= P(A) <= 1, P(True) = 1, P(False) = 0, P(A or B) = P(A) + P(B) - P(A and B) • And assuming that A obeys… P(A=vi ^ A=vj) = 0 if i ≠ j, and P(A=v1 or A=v2 or … or A=vk) = 1 • It’s easy to prove that P(A=v1 or A=v2 or … or A=vi) = Σ_{j=1..i} P(A=vj)
  • 26. Bayes Nets: Slide 26Copyright © 2001, Andrew W. Moore An easy fact about Multivalued Random Variables: • Using the axioms of probability… 0 <= P(A) <= 1, P(True) = 1, P(False) = 0, P(A or B) = P(A) + P(B) - P(A and B) • And assuming that A obeys… P(A=vi ^ A=vj) = 0 if i ≠ j, and P(A=v1 or A=v2 or … or A=vk) = 1 • It’s easy to prove that P(A=v1 or A=v2 or … or A=vi) = Σ_{j=1..i} P(A=vj) • And thus we can prove Σ_{j=1..k} P(A=vj) = 1
  • 27. Bayes Nets: Slide 27Copyright © 2001, Andrew W. Moore Another fact about Multivalued Random Variables: • Using the axioms of probability… 0 <= P(A) <= 1, P(True) = 1, P(False) = 0, P(A or B) = P(A) + P(B) - P(A and B) • And assuming that A obeys… P(A=vi ^ A=vj) = 0 if i ≠ j, and P(A=v1 or A=v2 or … or A=vk) = 1 • It’s easy to prove that P(B ^ [A=v1 or A=v2 or … or A=vi]) = Σ_{j=1..i} P(B ^ A=vj)
  • 28. Bayes Nets: Slide 28Copyright © 2001, Andrew W. Moore Another fact about Multivalued Random Variables: • Using the axioms of probability… 0 <= P(A) <= 1, P(True) = 1, P(False) = 0, P(A or B) = P(A) + P(B) - P(A and B) • And assuming that A obeys… P(A=vi ^ A=vj) = 0 if i ≠ j, and P(A=v1 or A=v2 or … or A=vk) = 1 • It’s easy to prove that P(B ^ [A=v1 or A=v2 or … or A=vi]) = Σ_{j=1..i} P(B ^ A=vj) • And thus we can prove P(B) = Σ_{j=1..k} P(B ^ A=vj)
  • 29. Bayes Nets: Slide 29Copyright © 2001, Andrew W. Moore More General Forms of Bayes Rule P(A|B) = P(B|A) P(A) / [ P(B|A) P(A) + P(B|~A) P(~A) ] and P(A|B ^ X) = P(B|A ^ X) P(A ^ X) / P(B ^ X)
  • 30. Bayes Nets: Slide 30Copyright © 2001, Andrew W. Moore More General Forms of Bayes Rule P(A=vi | B) = P(B | A=vi) P(A=vi) / Σ_{k=1..nA} P(B | A=vk) P(A=vk)
  • 31. Bayes Nets: Slide 31Copyright © 2001, Andrew W. Moore Useful Easy-to-prove facts P(A|B) + P(~A|B) = 1 and Σ_{k=1..nA} P(A=vk | B) = 1
  • 32. Bayes Nets: Slide 32Copyright © 2001, Andrew W. Moore The Joint Distribution Recipe for making a joint distribution of M variables: Example: Boolean variables A, B, C
  • 33. Bayes Nets: Slide 33Copyright © 2001, Andrew W. Moore The Joint Distribution Recipe for making a joint distribution of M variables: 1. Make a truth table listing all combinations of values of your variables (if there are M Boolean variables then the table will have 2^M rows). Example: Boolean variables A, B, C, giving the rows (A, B, C) = 000, 001, 010, 011, 100, 101, 110, 111
  • 34. Bayes Nets: Slide 34Copyright © 2001, Andrew W. Moore The Joint Distribution Recipe for making a joint distribution of M variables: 1. Make a truth table listing all combinations of values of your variables (if there are M Boolean variables then the table will have 2^M rows). 2. For each combination of values, say how probable it is. Example: Boolean variables A, B, C
    A B C Prob
    0 0 0 0.30
    0 0 1 0.05
    0 1 0 0.10
    0 1 1 0.05
    1 0 0 0.05
    1 0 1 0.10
    1 1 0 0.25
    1 1 1 0.10
  • 35. Bayes Nets: Slide 35Copyright © 2001, Andrew W. Moore The Joint Distribution Recipe for making a joint distribution of M variables: 1. Make a truth table listing all combinations of values of your variables (if there are M Boolean variables then the table will have 2^M rows). 2. For each combination of values, say how probable it is. 3. If you subscribe to the axioms of probability, those numbers must sum to 1. Example: Boolean variables A, B, C
    A B C Prob
    0 0 0 0.30
    0 0 1 0.05
    0 1 0 0.10
    0 1 1 0.05
    1 0 0 0.05
    1 0 1 0.10
    1 1 0 0.25
    1 1 1 0.10
    [Venn diagram of A, B, C showing these eight probabilities as region areas]
  • 36. Bayes Nets: Slide 36Copyright © 2001, Andrew W. Moore Using the Joint Once you have the JD you can ask for the probability of any logical expression involving your attributes: P(E) = Σ_{rows matching E} P(row)
  • 37. Bayes Nets: Slide 37Copyright © 2001, Andrew W. Moore Using the Joint P(Poor ^ Male) = 0.4654, using P(E) = Σ_{rows matching E} P(row)
  • 38. Bayes Nets: Slide 38Copyright © 2001, Andrew W. Moore Using the Joint P(Poor) = 0.7604, using P(E) = Σ_{rows matching E} P(row)
  • 39. Bayes Nets: Slide 39Copyright © 2001, Andrew W. Moore Inference with the Joint P(E1 | E2) = P(E1 ^ E2) / P(E2) = Σ_{rows matching E1 and E2} P(row) / Σ_{rows matching E2} P(row)
  • 40. Bayes Nets: Slide 40Copyright © 2001, Andrew W. Moore Inference with the Joint P(E1 | E2) = P(E1 ^ E2) / P(E2) = Σ_{rows matching E1 and E2} P(row) / Σ_{rows matching E2} P(row), e.g. P(Male | Poor) = 0.4654 / 0.7604 = 0.612
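To make the row-summing recipe concrete, here is a minimal Python sketch (my own illustration, not part of the slides) that stores the slide 34 joint over A, B, C as a dictionary and answers P(E) and P(E1 | E2) queries by summing matching rows:

```python
# Joint distribution over (A, B, C) from slide 34.
joint = {
    (0, 0, 0): 0.30, (0, 0, 1): 0.05, (0, 1, 0): 0.10, (0, 1, 1): 0.05,
    (1, 0, 0): 0.05, (1, 0, 1): 0.10, (1, 1, 0): 0.25, (1, 1, 1): 0.10,
}

def prob(event):
    """P(E): sum P(row) over all rows matching the predicate `event`."""
    return sum(p for row, p in joint.items() if event(row))

def cond_prob(e1, e2):
    """P(E1 | E2) = P(E1 ^ E2) / P(E2), both computed by summing matching rows."""
    return prob(lambda r: e1(r) and e2(r)) / prob(e2)

assert abs(sum(joint.values()) - 1.0) < 1e-9        # the numbers must sum to 1

print(prob(lambda r: r[0] == 1))                     # P(A) = 0.50
print(cond_prob(lambda r: r[1] == 1,                 # P(B | A) = 0.35 / 0.50 = 0.7
                lambda r: r[0] == 1))
```

The same two functions, pointed at a larger table, would reproduce numbers like the P(Male | Poor) = 0.612 query above.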
  • 41. Bayes Nets: Slide 41Copyright © 2001, Andrew W. Moore Joint distributions • Good news Once you have a joint distribution, you can ask important questions about stuff that involves a lot of uncertainty • Bad news Impossible to create for more than about ten attributes because there are so many numbers needed when you build the damn thing.
  • 42. Bayes Nets: Slide 42Copyright © 2001, Andrew W. Moore Using fewer numbers Suppose there are two events: • M: Manuela teaches the class (otherwise it’s Andrew) • S: It is sunny The joint p.d.f. for these events contain four entries. If we want to build the joint p.d.f. we’ll have to invent those four numbers. OR WILL WE?? • We don’t have to specify with bottom level conjunctive events such as P(~M^S) IF… • …instead it may sometimes be more convenient for us to specify things like: P(M), P(S). But just P(M) and P(S) don’t derive the joint distribution. So you can’t answer all questions.
  • 43. Bayes Nets: Slide 43Copyright © 2001, Andrew W. Moore Using fewer numbers Suppose there are two events: • M: Manuela teaches the class (otherwise it’s Andrew) • S: It is sunny The joint p.d.f. for these events contain four entries. If we want to build the joint p.d.f. we’ll have to invent those four numbers. OR WILL WE?? • We don’t have to specify with bottom level conjunctive events such as P(~M^S) IF… • …instead it may sometimes be more convenient for us to specify things like: P(M), P(S). But just P(M) and P(S) don’t derive the joint distribution. So you can’t answer all questions.
  • 44. Bayes Nets: Slide 44Copyright © 2001, Andrew W. Moore Independence “The sunshine levels do not depend on and do not influence who is teaching.” This can be specified very simply: P(S  M) = P(S) This is a powerful statement! It required extra domain knowledge. A different kind of knowledge than numerical probabilities. It needed an understanding of causation.
  • 45. Bayes Nets: Slide 45Copyright © 2001, Andrew W. Moore Independence From P(S | M) = P(S), the rules of probability imply: (can you prove these?) • P(~S | M) = P(~S) • P(M | S) = P(M) • P(M ^ S) = P(M) P(S) • P(~M ^ S) = P(~M) P(S), P(M ^ ~S) = P(M) P(~S), P(~M ^ ~S) = P(~M) P(~S)
  • 46. Bayes Nets: Slide 46Copyright © 2001, Andrew W. Moore Independence From P(S | M) = P(S), the rules of probability imply: (can you prove these?) • P(~S | M) = P(~S) • P(M | S) = P(M) • P(M ^ S) = P(M) P(S) • P(~M ^ S) = P(~M) P(S), P(M ^ ~S) = P(M) P(~S), P(~M ^ ~S) = P(~M) P(~S) And in general: P(M=u ^ S=v) = P(M=u) P(S=v) for each of the four combinations of u=True/False, v=True/False
  • 47. Bayes Nets: Slide 47Copyright © 2001, Andrew W. Moore Independence We’ve stated: P(M) = 0.6 P(S) = 0.3 P(S  M) = P(S) M S Prob T T T F F T F F And since we now have the joint pdf, we can make any queries we like. From these statements, we can derive the full joint pdf.
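Filling in the table slide 47 leaves blank, using P(M ^ S) = P(M) P(S) and its variants from slide 45: P(M ^ S) = 0.6 × 0.3 = 0.18, P(M ^ ~S) = 0.6 × 0.7 = 0.42, P(~M ^ S) = 0.4 × 0.3 = 0.12, P(~M ^ ~S) = 0.4 × 0.7 = 0.28, which indeed sum to 1.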
  • 48. Bayes Nets: Slide 48Copyright © 2001, Andrew W. Moore A more interesting case • M : Manuela teaches the class • S : It is sunny • L : The lecturer arrives slightly late. Assume both lecturers are sometimes delayed by bad weather. Andrew is more likely to arrive late than Manuela.
  • 49. Bayes Nets: Slide 49Copyright © 2001, Andrew W. Moore A more interesting case • M : Manuela teaches the class • S : It is sunny • L : The lecturer arrives slightly late. Assume both lecturers are sometimes delayed by bad weather. Andrew is more likely to arrive late than Manuela. Let’s begin with writing down knowledge we’re happy about: P(S  M) = P(S), P(S) = 0.3, P(M) = 0.6 Lateness is not independent of the weather and is not independent of the lecturer.
  • 50. Bayes Nets: Slide 50Copyright © 2001, Andrew W. Moore A more interesting case • M : Manuela teaches the class • S : It is sunny • L : The lecturer arrives slightly late. Assume both lecturers are sometimes delayed by bad weather. Andrew is more likely to arrive late than Manuela. Let’s begin with writing down knowledge we’re happy about: P(S  M) = P(S), P(S) = 0.3, P(M) = 0.6 Lateness is not independent of the weather and is not independent of the lecturer. We already know the Joint of S and M, so all we need now is P(L  S=u, M=v) in the 4 cases of u/v = True/False.
  • 51. Bayes Nets: Slide 51Copyright © 2001, Andrew W. Moore A more interesting case • M : Manuela teaches the class • S : It is sunny • L : The lecturer arrives slightly late. Assume both lecturers are sometimes delayed by bad weather. Andrew is more likely to arrive late than Manuela. P(S  M) = P(S) P(S) = 0.3 P(M) = 0.6 P(L  M ^ S) = 0.05 P(L  M ^ ~S) = 0.1 P(L  ~M ^ S) = 0.1 P(L  ~M ^ ~S) = 0.2 Now we can derive a full joint p.d.f. with a “mere” six numbers instead of seven* *Savings are larger for larger numbers of variables.
  • 52. Bayes Nets: Slide 52Copyright © 2001, Andrew W. Moore A more interesting case • M : Manuela teaches the class • S : It is sunny • L : The lecturer arrives slightly late. Assume both lecturers are sometimes delayed by bad weather. Andrew is more likely to arrive late than Manuela. P(S  M) = P(S) P(S) = 0.3 P(M) = 0.6 P(L  M ^ S) = 0.05 P(L  M ^ ~S) = 0.1 P(L  ~M ^ S) = 0.1 P(L  ~M ^ ~S) = 0.2 Question: Express P(L=x ^ M=y ^ S=z) in terms that only need the above expressions, where x,y and z may each be True or False.
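One answer to the question on slide 52: because S and M are independent and L is specified given both, P(L=x ^ M=y ^ S=z) = P(L=x | M=y ^ S=z) P(M=y) P(S=z). For instance P(L ^ ~M ^ S) = P(L | ~M ^ S) P(~M) P(S) = 0.1 × 0.4 × 0.3 = 0.012.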
  • 53. Bayes Nets: Slide 53Copyright © 2001, Andrew W. Moore A bit of notation P(S  M) = P(S) P(S) = 0.3 P(M) = 0.6 P(L  M ^ S) = 0.05 P(L  M ^ ~S) = 0.1 P(L  ~M ^ S) = 0.1 P(L  ~M ^ ~S) = 0.2 S M L P(s)=0.3 P(M)=0.6 P(LM^S)=0.05 P(LM^~S)=0.1 P(L~M^S)=0.1 P(L~M^~S)=0.2
  • 54. Bayes Nets: Slide 54Copyright © 2001, Andrew W. Moore A bit of notation P(S | M) = P(S) P(S) = 0.3 P(M) = 0.6 P(L | M ^ S) = 0.05 P(L | M ^ ~S) = 0.1 P(L | ~M ^ S) = 0.1 P(L | ~M ^ ~S) = 0.2 S M L P(S)=0.3 P(M)=0.6 P(L|M^S)=0.05 P(L|M^~S)=0.1 P(L|~M^S)=0.1 P(L|~M^~S)=0.2 Read the absence of an arrow between S and M to mean “it would not help me predict M if I knew the value of S” Read the two arrows into L to mean that if I want to know the value of L it may help me to know M and to know S. This kind of stuff will be thoroughly formalized later
  • 55. Bayes Nets: Slide 55Copyright © 2001, Andrew W. Moore An even cuter trick Suppose we have these three events: • M : Lecture taught by Manuela • L : Lecturer arrives late • R : Lecture concerns robots Suppose: • Andrew has a higher chance of being late than Manuela. • Andrew has a higher chance of giving robotics lectures. What kind of independence can we find? How about: • P(L  M) = P(L) ? • P(R  M) = P(R) ? • P(L  R) = P(L) ?
  • 56. Bayes Nets: Slide 56Copyright © 2001, Andrew W. Moore Conditional independence Once you know who the lecturer is, then whether they arrive late doesn’t affect whether the lecture concerns robots. P(R  M,L) = P(R  M) and P(R  ~M,L) = P(R  ~M) We express this in the following way: “R and L are conditionally independent given M” M L R Given knowledge of M, knowing anything else in the diagram won’t help us with L, etc. ..which is also notated by the following diagram.
  • 57. Bayes Nets: Slide 57Copyright © 2001, Andrew W. Moore Conditional Independence formalized R and L are conditionally independent given M if for all x,y,z in {T,F}: P(R=x  M=y ^ L=z) = P(R=x  M=y) More generally: Let S1 and S2 and S3 be sets of variables. Set-of-variables S1 and set-of-variables S2 are conditionally independent given S3 if for all assignments of values to the variables in the sets, P(S1’s assignments  S2’s assignments & S3’s assignments)= P(S1’s assignments  S3’s assignments)
  • 58. Bayes Nets: Slide 58Copyright © 2001, Andrew W. Moore Example: R and L are conditionally independent given M if for all x,y,z in {T,F}: P(R=x  M=y ^ L=z) = P(R=x  M=y) More generally: Let S1 and S2 and S3 be sets of variables. Set-of-variables S1 and set-of-variables S2 are conditionally independent given S3 if for all assignments of values to the variables in the sets, P(S1’s assignments  S2’s assignments & S3’s assignments)= P(S1’s assignments  S3’s assignments) “Shoe-size is conditionally independent of Glove-size given height weight and age” means forall s,g,h,w,a P(ShoeSize=s|Height=h,Weight=w,Age=a) = P(ShoeSize=s|Height=h,Weight=w,Age=a,GloveSize=g)
  • 59. Bayes Nets: Slide 59Copyright © 2001, Andrew W. Moore Example: R and L are conditionally independent given M if for all x,y,z in {T,F}: P(R=x  M=y ^ L=z) = P(R=x  M=y) More generally: Let S1 and S2 and S3 be sets of variables. Set-of-variables S1 and set-of-variables S2 are conditionally independent given S3 if for all assignments of values to the variables in the sets, P(S1’s assignments  S2’s assignments & S3’s assignments)= P(S1’s assignments  S3’s assignments) “Shoe-size is conditionally independent of Glove-size given height weight and age” does not mean forall s,g,h P(ShoeSize=s|Height=h) = P(ShoeSize=s|Height=h, GloveSize=g)
  • 60. Bayes Nets: Slide 60Copyright © 2001, Andrew W. Moore Conditional independence M L R We can write down P(M). And then, since we know L is only directly influenced by M, we can write down the values of P(LM) and P(L~M) and know we’ve fully specified L’s behavior. Ditto for R. P(M) = 0.6 P(L  M) = 0.085 P(L  ~M) = 0.17 P(R  M) = 0.3 P(R  ~M) = 0.6 ‘R and L conditionally independent given M’
  • 61. Bayes Nets: Slide 61Copyright © 2001, Andrew W. Moore Conditional independence M L R P(M) = 0.6 P(L  M) = 0.085 P(L  ~M) = 0.17 P(R  M) = 0.3 P(R  ~M) = 0.6 Conditional Independence: P(RM,L) = P(RM), P(R~M,L) = P(R~M) Again, we can obtain any member of the Joint prob dist that we desire: P(L=x ^ R=y ^ M=z) =
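Completing the expression on slide 61: conditional independence of R and L given M lets the joint factor as P(L=x ^ R=y ^ M=z) = P(L=x | M=z) P(R=y | M=z) P(M=z). For example P(L ^ R ^ M) = 0.085 × 0.3 × 0.6 = 0.0153.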
  • 62. Bayes Nets: Slide 62Copyright © 2001, Andrew W. Moore Assume five variables T: The lecture started by 10:35 L: The lecturer arrives late R: The lecture concerns robots M: The lecturer is Manuela S: It is sunny • T only directly influenced by L (i.e. T is conditionally independent of R,M,S given L) • L only directly influenced by M and S (i.e. L is conditionally independent of R given M & S) • R only directly influenced by M (i.e. R is conditionally independent of L,S, given M) • M and S are independent
  • 63. Bayes Nets: Slide 63Copyright © 2001, Andrew W. Moore Making a Bayes net S M R L T Step One: add variables. • Just choose the variables you’d like to be included in the net. T: The lecture started by 10:35 L: The lecturer arrives late R: The lecture concerns robots M: The lecturer is Manuela S: It is sunny
  • 64. Bayes Nets: Slide 64Copyright © 2001, Andrew W. Moore Making a Bayes net S M R L T Step Two: add links. • The link structure must be acyclic. • If node X is given parents Q1,Q2,..Qn you are promising that any variable that’s a non-descendent of X is conditionally independent of X given {Q1,Q2,..Qn} T: The lecture started by 10:35 L: The lecturer arrives late R: The lecture concerns robots M: The lecturer is Manuela S: It is sunny
  • 65. Bayes Nets: Slide 65Copyright © 2001, Andrew W. Moore Making a Bayes net S M R L T P(s)=0.3 P(M)=0.6 P(RM)=0.3 P(R~M)=0.6 P(TL)=0.3 P(T~L)=0.8 P(LM^S)=0.05 P(LM^~S)=0.1 P(L~M^S)=0.1 P(L~M^~S)=0.2 Step Three: add a probability table for each node. • The table for node X must list P(X|Parent Values) for each possible combination of parent values T: The lecture started by 10:35 L: The lecturer arrives late R: The lecture concerns robots M: The lecturer is Manuela S: It is sunny
  • 66. Bayes Nets: Slide 66Copyright © 2001, Andrew W. Moore Making a Bayes net S M R L T P(s)=0.3 P(M)=0.6 P(RM)=0.3 P(R~M)=0.6 P(TL)=0.3 P(T~L)=0.8 P(LM^S)=0.05 P(LM^~S)=0.1 P(L~M^S)=0.1 P(L~M^~S)=0.2 • Two unconnected variables may still be correlated • Each node is conditionally independent of all non- descendants in the tree, given its parents. • You can deduce many other conditional independence relations from a Bayes net. See the next lecture. T: The lecture started by 10:35 L: The lecturer arrives late R: The lecture concerns robots M: The lecturer is Manuela S: It is sunny
  • 67. Bayes Nets: Slide 67Copyright © 2001, Andrew W. Moore Bayes Nets Formalized A Bayes net (also called a belief network) is an augmented directed acyclic graph, represented by the pair V , E where: • V is a set of vertices. • E is a set of directed edges joining vertices. No loops of any length are allowed. Each vertex in V contains the following information: • The name of a random variable • A probability distribution table indicating how the probability of this variable’s values depends on all possible combinations of parental values.
  • 68. Bayes Nets: Slide 68Copyright © 2001, Andrew W. Moore Building a Bayes Net 1. Choose a set of relevant variables. 2. Choose an ordering for them 3. Assume they’re called X1 .. Xm (where X1 is the first in the ordering, X2 is the second, etc) 4. For i = 1 to m: 1. Add the Xi node to the network 2. Set Parents(Xi) to be a minimal subset of {X1…Xi-1} such that we have conditional independence of Xi and all other members of {X1…Xi-1} given Parents(Xi) 3. Define the probability table of P(Xi = k | Assignments of Parents(Xi)).
  • 69. Bayes Nets: Slide 69Copyright © 2001, Andrew W. Moore Example Bayes Net Building Suppose we’re building a nuclear power station. There are the following random variables: GRL : Gauge Reads Low. CTL : Core temperature is low. FG : Gauge is faulty. FA : Alarm is faulty AS : Alarm sounds • If alarm working properly, the alarm is meant to sound if the gauge stops reading a low temp. • If gauge working properly, the gauge is meant to read the temp of the core.
  • 70. Bayes Nets: Slide 70Copyright © 2001, Andrew W. Moore Computing a Joint Entry How to compute an entry in a joint distribution? E.G: What is P(S ^ ~M ^ L ^ ~R ^ T)? S M R L T P(S)=0.3 P(M)=0.6 P(R|M)=0.3 P(R|~M)=0.6 P(T|L)=0.3 P(T|~L)=0.8 P(L|M^S)=0.05 P(L|M^~S)=0.1 P(L|~M^S)=0.1 P(L|~M^~S)=0.2
  • 71. Bayes Nets: Slide 71Copyright © 2001, Andrew W. Moore Computing with Bayes Net P(T ^ ~R ^ L ^ ~M ^ S) = P(T | ~R ^ L ^ ~M ^ S) * P(~R ^ L ^ ~M ^ S) = P(T | L) * P(~R ^ L ^ ~M ^ S) = P(T | L) * P(~R | L ^ ~M ^ S) * P(L ^ ~M ^ S) = P(T | L) * P(~R | ~M) * P(L ^ ~M ^ S) = P(T | L) * P(~R | ~M) * P(L | ~M ^ S) * P(~M ^ S) = P(T | L) * P(~R | ~M) * P(L | ~M ^ S) * P(~M | S) * P(S) = P(T | L) * P(~R | ~M) * P(L | ~M ^ S) * P(~M) * P(S). S M R L T P(S)=0.3 P(M)=0.6 P(R|M)=0.3 P(R|~M)=0.6 P(T|L)=0.3 P(T|~L)=0.8 P(L|M^S)=0.05 P(L|M^~S)=0.1 P(L|~M^S)=0.1 P(L|~M^~S)=0.2
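Plugging in the table values from the net, that product is 0.3 × 0.4 × 0.1 × 0.4 × 0.3 = 0.00144.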
  • 72. Bayes Nets: Slide 72Copyright © 2001, Andrew W. Moore The general case P(X1=x1 ^ X2=x2 ^ … ^ Xn-1=xn-1 ^ Xn=xn) = P(Xn=xn ^ Xn-1=xn-1 ^ … ^ X2=x2 ^ X1=x1) = P(Xn=xn | Xn-1=xn-1 ^ … ^ X2=x2 ^ X1=x1) * P(Xn-1=xn-1 ^ … ^ X2=x2 ^ X1=x1) = P(Xn=xn | Xn-1=xn-1 ^ … ^ X2=x2 ^ X1=x1) * P(Xn-1=xn-1 | Xn-2=xn-2 ^ … ^ X2=x2 ^ X1=x1) * P(Xn-2=xn-2 ^ … ^ X2=x2 ^ X1=x1) = … = Π_{i=1..n} P(Xi=xi | Xi-1=xi-1 ^ … ^ X1=x1) = Π_{i=1..n} P(Xi=xi | Assignments of Parents(Xi)) So any entry in the joint pdf table can be computed. And so any conditional probability can be computed.
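The product formula above is short enough to turn directly into code. Here is a minimal Python sketch for the S, M, L, R, T net of slide 65 (the dictionary layout and function name are my own, not from the slides):

```python
# Toy Bayes net from slide 65: parents and conditional probability tables.
# cpt[X] maps a tuple of parent values to P(X=True | those parent values).
parents = {"S": (), "M": (), "L": ("M", "S"), "R": ("M",), "T": ("L",)}
cpt = {
    "S": {(): 0.3},
    "M": {(): 0.6},
    "L": {(True, True): 0.05, (True, False): 0.1,
          (False, True): 0.1,  (False, False): 0.2},
    "R": {(True,): 0.3, (False,): 0.6},
    "T": {(True,): 0.3, (False,): 0.8},
}
order = ["S", "M", "L", "R", "T"]          # parents always precede children

def joint_entry(assignment):
    """P(X1=x1 ^ ... ^ Xn=xn) = product over i of P(Xi=xi | parents(Xi))."""
    p = 1.0
    for var in order:
        parent_vals = tuple(assignment[q] for q in parents[var])
        p_true = cpt[var][parent_vals]
        p *= p_true if assignment[var] else (1.0 - p_true)
    return p

# P(T ^ ~R ^ L ^ ~M ^ S) from slide 71: 0.3 * 0.4 * 0.1 * 0.4 * 0.3 = 0.00144
print(joint_entry({"S": True, "M": False, "L": True, "R": False, "T": True}))
```

Because each factor only needs a node's parents, storage grows exponentially only in the largest number of parents, which is the point slide 73 makes next.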
  • 73. Bayes Nets: Slide 73Copyright © 2001, Andrew W. Moore Where are we now? • We have a methodology for building Bayes nets. • We don’t require exponential storage to hold our probability table. Only exponential in the maximum number of parents of any node. • We can compute probabilities of any given assignment of truth values to the variables. And we can do it in time linear with the number of nodes. • So we can also compute answers to any questions. E.G. What could we do to compute P(R  T,~S)? S M R L T P(s)=0.3 P(M)=0.6 P(RM)=0.3 P(R~M)=0.6 P(TL)=0.3 P(T~L)=0.8 P(LM^S)=0.05 P(LM^~S)=0.1 P(L~M^S)=0.1 P(L~M^~S)=0.2
  • 74. Bayes Nets: Slide 74Copyright © 2001, Andrew W. Moore Where are we now? • We have a methodology for building Bayes nets. • We don’t require exponential storage to hold our probability table. Only exponential in the maximum number of parents of any node. • We can compute probabilities of any given assignment of truth values to the variables. And we can do it in time linear with the number of nodes. • So we can also compute answers to any questions. E.G. What could we do to compute P(R  T,~S)? S M R L T P(s)=0.3 P(M)=0.6 P(RM)=0.3 P(R~M)=0.6 P(TL)=0.3 P(T~L)=0.8 P(LM^S)=0.05 P(LM^~S)=0.1 P(L~M^S)=0.1 P(L~M^~S)=0.2 Step 1: Compute P(R ^ T ^ ~S) Step 2: Compute P(~R ^ T ^ ~S) Step 3: Return P(R ^ T ^ ~S) ------------------------------------- P(R ^ T ^ ~S)+ P(~R ^ T ^ ~S)
  • 75. Bayes Nets: Slide 75Copyright © 2001, Andrew W. Moore Where are we now? • We have a methodology for building Bayes nets. • We don’t require exponential storage to hold our probability table. Only exponential in the maximum number of parents of any node. • We can compute probabilities of any given assignment of truth values to the variables. And we can do it in time linear with the number of nodes. • So we can also compute answers to any questions. E.G. What could we do to compute P(R  T,~S)? S M R L T P(s)=0.3 P(M)=0.6 P(RM)=0.3 P(R~M)=0.6 P(TL)=0.3 P(T~L)=0.8 P(LM^S)=0.05 P(LM^~S)=0.1 P(L~M^S)=0.1 P(L~M^~S)=0.2 Step 1: Compute P(R ^ T ^ ~S) Step 2: Compute P(~R ^ T ^ ~S) Step 3: Return P(R ^ T ^ ~S) ------------------------------------- P(R ^ T ^ ~S)+ P(~R ^ T ^ ~S) Sum of all the rows in the Joint that match R ^ T ^ ~S Sum of all the rows in the Joint that match ~R ^ T ^ ~S
  • 76. Bayes Nets: Slide 76Copyright © 2001, Andrew W. Moore Where are we now? • We have a methodology for building Bayes nets. • We don’t require exponential storage to hold our probability table. Only exponential in the maximum number of parents of any node. • We can compute probabilities of any given assignment of truth values to the variables. And we can do it in time linear with the number of nodes. • So we can also compute answers to any questions. E.G. What could we do to compute P(R  T,~S)? S M R L T P(s)=0.3 P(M)=0.6 P(RM)=0.3 P(R~M)=0.6 P(TL)=0.3 P(T~L)=0.8 P(LM^S)=0.05 P(LM^~S)=0.1 P(L~M^S)=0.1 P(L~M^~S)=0.2 Step 1: Compute P(R ^ T ^ ~S) Step 2: Compute P(~R ^ T ^ ~S) Step 3: Return P(R ^ T ^ ~S) ------------------------------------- P(R ^ T ^ ~S)+ P(~R ^ T ^ ~S) Sum of all the rows in the Joint that match R ^ T ^ ~S Sum of all the rows in the Joint that match ~R ^ T ^ ~S Each of these obtained by the “computing a joint probability entry” method of the earlier slides 4 joint computes 4 joint computes
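Carrying slide 74's three steps through with the toy numbers (each of the eight joint entries is a five-factor product as on slide 71), the four rows matching R ^ T ^ ~S sum to roughly 0.212 and the four matching ~R ^ T ^ ~S to roughly 0.299, so P(R | T, ~S) ≈ 0.212 / 0.511 ≈ 0.415.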
  • 77. Bayes Nets: Slide 77Copyright © 2001, Andrew W. Moore The good news We can do inference. We can compute any conditional probability: P( Some variable | Some other variable values ) P(E1 | E2) = P(E1 ^ E2) / P(E2) = Σ_{joint entries matching E1 and E2} P(joint entry) / Σ_{joint entries matching E2} P(joint entry)
  • 78. Bayes Nets: Slide 78Copyright © 2001, Andrew W. Moore The good news We can do inference. We can compute any conditional probability: P( Some variable | Some other variable values ) P(E1 | E2) = P(E1 ^ E2) / P(E2) = Σ_{joint entries matching E1 and E2} P(joint entry) / Σ_{joint entries matching E2} P(joint entry) Suppose you have m binary-valued variables in your Bayes Net and expression E2 mentions k variables. How much work is the above computation?
  • 79. Bayes Nets: Slide 79Copyright © 2001, Andrew W. Moore The sad, bad news Conditional probabilities by enumerating all matching entries in the joint are expensive: Exponential in the number of variables.
  • 80. Bayes Nets: Slide 80Copyright © 2001, Andrew W. Moore The sad, bad news Conditional probabilities by enumerating all matching entries in the joint are expensive: Exponential in the number of variables. But perhaps there are faster ways of querying Bayes nets? • In fact, if I ever ask you to manually do a Bayes Net inference, you’ll find there are often many tricks to save you time. • So we’ve just got to program our computer to do those tricks too, right?
  • 81. Bayes Nets: Slide 81Copyright © 2001, Andrew W. Moore The sad, bad news Conditional probabilities by enumerating all matching entries in the joint are expensive: Exponential in the number of variables. But perhaps there are faster ways of querying Bayes nets? • In fact, if I ever ask you to manually do a Bayes Net inference, you’ll find there are often many tricks to save you time. • So we’ve just got to program our computer to do those tricks too, right? Sadder and worse news: General querying of Bayes nets is NP-complete.
  • 82. Bayes Nets: Slide 82Copyright © 2001, Andrew W. Moore Bayes nets inference algorithms A poly-tree is a directed acyclic graph in which no two nodes have more than one path between them. A poly tree Not a poly tree (but still a legal Bayes net) S RL T L T MSM R X1 X2 X4 X3 X5 X1 X2 X3 X5 X4 • If net is a poly-tree, there is a linear-time algorithm (see a later Andrew lecture). • The best general-case algorithms convert a general net to a poly- tree (often at huge expense) and calls the poly-tree algorithm. • Another popular, practical approach (doesn’t assume poly-tree): Stochastic Simulation.
  • 83. Bayes Nets: Slide 83Copyright © 2001, Andrew W. Moore Sampling from the Joint Distribution It’s pretty easy to generate a set of variable-assignments at random with the same probability as the underlying joint distribution. How? S M R L T P(s)=0.3 P(M)=0.6 P(RM)=0.3 P(R~M)=0.6 P(TL)=0.3 P(T~L)=0.8 P(LM^S)=0.05 P(LM^~S)=0.1 P(L~M^S)=0.1 P(L~M^~S)=0.2
  • 84. Bayes Nets: Slide 84Copyright © 2001, Andrew W. Moore Sampling from the Joint Distribution 1. Randomly choose S. S = True with prob 0.3 2. Randomly choose M. M = True with prob 0.6 3. Randomly choose L. The probability that L is true depends on the assignments of S and M. E.G. if steps 1 and 2 had produced S=True, M=False, then probability that L is true is 0.1 4. Randomly choose R. Probability depends on M. 5. Randomly choose T. Probability depends on L S M R L T P(s)=0.3 P(M)=0.6 P(RM)=0.3 P(R~M)=0.6 P(TL)=0.3 P(T~L)=0.8 P(LM^S)=0.05 P(LM^~S)=0.1 P(L~M^S)=0.1 P(L~M^~S)=0.2
  • 85. Bayes Nets: Slide 85Copyright © 2001, Andrew W. Moore A general sampling algorithm Let’s generalize the example on the previous slide to a general Bayes Net. As in Slides 16-17 , call the variables X1 .. Xn, where Parents(Xi) must be a subset of {X1 .. Xi-1}. For i=1 to n: 1. Find parents, if any, of Xi. Assume n(i) parents. Call them Xp(i,1), Xp(i,2), …Xp(i,n(i)). 2. Recall the values that those parents were randomly given: xp(i,1), xp(i,2), …xp(i,n(i)). 3. Look up in the lookup-table for: P(Xi=True  Xp(i,1)=xp(i,1),Xp(i,2)=xp(i,2)…Xp(i,n(i))=xp(i,n(i))) 4. Randomly set xi=True according to this probability x1, x2,…xn are now a sample from the joint distribution of X1, X2,…Xn.
  • 86. Bayes Nets: Slide 86Copyright © 2001, Andrew W. Moore Stochastic Simulation Example Someone wants to know P(R = True  T = True ^ S = False ) We’ll do lots of random samplings and count the number of occurrences of the following: • Nc : Num. samples in which T=True and S=False. • Ns : Num. samples in which R=True, T=True and S=False. • N : Number of random samplings Now if N is big enough: Nc /N is a good estimate of P(T=True and S=False). Ns /N is a good estimate of P(R=True ,T=True , S=False). P(RT^~S) = P(R^T^~S)/P(T^~S), so Ns / Nc can be a good estimate of P(RT^~S).
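A compact Python sketch of the sampling procedure on slides 84-85 and the counting estimate on slide 86. The toy net tables are re-declared so the snippet runs on its own, and all identifiers are my own illustration, not code from the slides:

```python
import random

parents = {"S": (), "M": (), "L": ("M", "S"), "R": ("M",), "T": ("L",)}
cpt = {"S": {(): 0.3}, "M": {(): 0.6},
       "L": {(True, True): 0.05, (True, False): 0.1,
             (False, True): 0.1, (False, False): 0.2},
       "R": {(True,): 0.3, (False,): 0.6},
       "T": {(True,): 0.3, (False,): 0.8}}
order = ["S", "M", "L", "R", "T"]        # parents always precede children

def sample():
    """Draw one assignment from the joint by sampling each node given its parents."""
    x = {}
    for var in order:
        p_true = cpt[var][tuple(x[q] for q in parents[var])]
        x[var] = random.random() < p_true
    return x

# Estimate P(R | T ^ ~S) as on slide 86: Ns / Nc over many samples.
random.seed(0)
nc = ns = 0
for _ in range(200_000):
    x = sample()
    if x["T"] and not x["S"]:
        nc += 1
        if x["R"]:
            ns += 1
print(ns / nc)      # should come out close to the exact value of about 0.415
```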
  • 87. Bayes Nets: Slide 87Copyright © 2001, Andrew W. Moore General Stochastic Simulation Someone wants to know P(E1  E2 ) We’ll do lots of random samplings and count the number of occurrences of the following: • Nc : Num. samples in which E2 • Ns : Num. samples in which E1 and E2 • N : Number of random samplings Now if N is big enough: Nc /N is a good estimate of P(E2). Ns /N is a good estimate of P(E1 , E2). P(E1  E2) = P(E1^ E2)/P(E2), so Ns / Nc can be a good estimate of P(E1 E2).
  • 88. Bayes Nets: Slide 88Copyright © 2001, Andrew W. Moore Likelihood weighting Problem with Stochastic Sampling: With lots of constraints in E, or unlikely events in E, then most of the simulations will be thrown away, (they’ll have no effect on Nc, or Ns). Imagine we’re part way through our simulation. In E2 we have the constraint Xi = v We’re just about to generate a value for Xi at random. Given the values assigned to the parents, we see that P(Xi = v  parents) = p . Now we know that with stochastic sampling: • we’ll generate “Xi = v” proportion p of the time, and proceed. • And we’ll generate a different value proportion 1-p of the time, and the simulation will be wasted. Instead, always generate Xi = v, but weight the answer by weight “p” to compensate.
  • 89. Bayes Nets: Slide 89Copyright © 2001, Andrew W. Moore Likelihood weighting Set Nc := 0, Ns := 0 1. Generate a random assignment of all variables that matches E2. This process returns a weight w. 2. Define w to be the probability that this assignment would have been generated instead of an unmatching assignment during its generation in the original algorithm. Fact: w is a product of all likelihood factors involved in the generation. 3. Nc := Nc + w 4. If our sample matches E1 then Ns := Ns + w 5. Go to 1 Again, Ns / Nc estimates P(E1 | E2)
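A sketch of that weighted loop for the same P(R | T ^ ~S) query. Again this is a self-contained illustration with my own identifiers, re-declaring the toy net so it runs standalone:

```python
import random

parents = {"S": (), "M": (), "L": ("M", "S"), "R": ("M",), "T": ("L",)}
cpt = {"S": {(): 0.3}, "M": {(): 0.6},
       "L": {(True, True): 0.05, (True, False): 0.1,
             (False, True): 0.1, (False, False): 0.2},
       "R": {(True,): 0.3, (False,): 0.6},
       "T": {(True,): 0.3, (False,): 0.8}}
order = ["S", "M", "L", "R", "T"]
evidence = {"T": True, "S": False}           # E2: the constraint T ^ ~S

def weighted_sample():
    """Clamp evidence variables to their observed values; weight by their likelihood."""
    x, w = {}, 1.0
    for var in order:
        p_true = cpt[var][tuple(x[q] for q in parents[var])]
        if var in evidence:
            x[var] = evidence[var]
            w *= p_true if evidence[var] else (1.0 - p_true)
        else:
            x[var] = random.random() < p_true
    return x, w

random.seed(0)
nc = ns = 0.0
for _ in range(200_000):
    x, w = weighted_sample()
    nc += w                                   # every sample matches E2, with weight w
    if x["R"]:                                # E1: R = True
        ns += w
print(ns / nc)                                # estimates P(R | T ^ ~S), about 0.415
```

Unlike the rejection-style counting on the previous slides, no sample is wasted here: every iteration contributes its weight to Nc, which is exactly the point of the trick described above.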
  • 90. Bayes Nets: Slide 90Copyright © 2001, Andrew W. Moore Case Study I Pathfinder system. (Heckerman 1991, Probabilistic Similarity Networks, MIT Press, Cambridge MA). • Diagnostic system for lymph-node diseases. • 60 diseases and 100 symptoms and test-results. • 14,000 probabilities • Expert consulted to make net. • 8 hours to determine variables. • 35 hours for net topology. • 40 hours for probability table values. • Apparently, the experts found it quite easy to invent the causal links and probabilities. • Pathfinder is now outperforming the world experts in diagnosis. Being extended to several dozen other medical domains.
  • 91. Bayes Nets: Slide 91Copyright © 2001, Andrew W. Moore Questions • What are the strengths of probabilistic networks compared with propositional logic? • What are the weaknesses of probabilistic networks compared with propositional logic? • What are the strengths of probabilistic networks compared with predicate logic? • What are the weaknesses of probabilistic networks compared with predicate logic? • (How) could predicate logic and probabilistic networks be combined?
  • 92. Bayes Nets: Slide 92Copyright © 2001, Andrew W. Moore What you should know • The meanings and importance of independence and conditional independence. • The definition of a Bayes net. • Computing probabilities of assignments of variables (i.e. members of the joint p.d.f.) with a Bayes net. • The slow (exponential) method for computing arbitrary, conditional probabilities. • The stochastic simulation method and likelihood weighting.