AI, Uncertainty & Bayesian Networks
2015-03-10 / 03-12
Kim, Byoung-Hee
Biointelligence Laboratory
Seoul National University
http://bi.snu.ac.kr
Artificial Intelligence: Cognitive Agents
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 2
A Bayesian network is a graphical model
for probabilistic relationships among a set of variables
ACM Turing Award
 The ‘Nobel Prize of Computing’
 2011 Winner: Judea Pearl (UCLA)
 For fundamental contributions to artificial intelligence
through the development of a calculus for probabilistic
and causal reasoning
 Invention of Bayesian networks
 Pearl's accomplishments have “redefined the term
'thinking machine’” over the past 30 years
 BN mimics “the neural activities of the human brain,
constantly exchanging messages without benefit of a
supervisor”
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 3
Notes
 Related chapters in the textbook (AIMA 3rd ed. by Russell and Norvig)
 Ch. 13 Quantifying Uncertainty
 Ch. 14 Probabilistic Reasoning (14.1~14.2)
 Ch. 20 Learning Probabilistic Models (20.1~20.2)
 On the reference ‘A Tutorial on Learning with Bayesian
Networks by David Heckerman’
 It is very technical but covers insights and comprehensive
background on Bayesian networks
 This lecture covers the ‘Introduction’ section
 This lecture is an easier introductory tutorial
 Both the textbook content and Heckerman’s tutorial are fairly
mathematical
 This lecture covers basic concepts and tools to understand
Bayesian networks
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 4
Contents
 Bayesian Networks: Introduction
 Motivating example
 Decomposing a joint distribution of variables
 d-separation
 A mini Turing test in causal conversation
 Correlation & causation
 AI & Uncertainty
 Bayesian Networks in Detail
 d-separation: revisited & details
 Probability & Bayesian
 Inference in & learning Bayesian networks
 BN as AI tools and advantages
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 5
Bayesian Networks:
Introduction
Causality, Dependency
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr
 From correlation to causality
 Qualitative methods
 Quantitative methods
 Granger causality index
 http://www.google.com/trends/
 Google’s CausalImpact
 R package for causal inference in time series
 Official posting: http://google-opensource.blogspot.kr/2014/09/causalimpact-new-open-source-package.html
 Introductory article (in English): https://gigaom.com/2014/09/11/google-has-open-sourced-a-tool-for-inferring-cause-from-correlations/
7
Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 8
Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
Assuming binary states for all the variables
Ex) Season: dry or rainy
Ex) Sprinkler: ON or OFF
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 9
Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 10
Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
(directed acyclic graph)
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 11
Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 12
Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 13
Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 14
Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 15
Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 16
Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 17
Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 18
Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 19
Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 20
Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 21
Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 22
Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 23
Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 24
Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
Correlation & Causation
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 25
Correlation does not imply causation
A chart that correlates the number of pirates with global temperature: the two variables are correlated, but one does not cause the other.
A chart showing a correlation between ice cream consumption and crime; the actual common cause is temperature.
Example
 Designing a Bayesian Network
 My own design of conditional probability tables
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 26
Structure: SEASON → SPRINKLER, SEASON → RAIN, SPRINKLER → WET, RAIN → WET, WET → SLIPPERY
P(SEASON): DRY 0.6, RAINY 0.4
P(SPRINKLER | SEASON): ON 0.8 / OFF 0.2 given DRY; ON 0.1 / OFF 0.9 given RAINY
P(RAIN | SEASON): YES 0.3 / NO 0.7 given DRY; YES 0.8 / NO 0.2 given RAINY
P(WET | SPRINKLER, RAIN): YES 0.8 / NO 0.2 when RAIN = YES; YES 0.1 / NO 0.9 when RAIN = NO (same for SPRINKLER ON and OFF)
P(SLIPPERY | WET): YES 0.8 / NO 0.2 given WET = YES; YES 0.1 / NO 0.9 given WET = NO
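To make the tables concrete, here is a minimal Python sketch (my own illustration, not the GeNIe/SMILE API) that encodes these CPTs and answers a query by enumeration, i.e., by summing the joint distribution over all unobserved variables:

```python
# CPTs of the sprinkler network above, encoded as plain dictionaries.
P_SEASON = {'DRY': 0.6, 'RAINY': 0.4}
P_SPRINKLER = {'DRY': {'ON': 0.8, 'OFF': 0.2},        # P(SPRINKLER | SEASON)
               'RAINY': {'ON': 0.1, 'OFF': 0.9}}
P_RAIN = {'DRY': {'YES': 0.3, 'NO': 0.7},             # P(RAIN | SEASON)
          'RAINY': {'YES': 0.8, 'NO': 0.2}}
P_WET_YES = {('ON', 'YES'): 0.8, ('ON', 'NO'): 0.1,   # P(WET=YES | SPRINKLER, RAIN)
             ('OFF', 'YES'): 0.8, ('OFF', 'NO'): 0.1}
P_SLIPPERY_YES = {'YES': 0.8, 'NO': 0.1}              # P(SLIPPERY=YES | WET)

def joint(s, sp, r, w, sl):
    """Joint probability: product of each node's CPT entry given its parents."""
    p_w = P_WET_YES[(sp, r)] if w == 'YES' else 1 - P_WET_YES[(sp, r)]
    p_sl = P_SLIPPERY_YES[w] if sl == 'YES' else 1 - P_SLIPPERY_YES[w]
    return P_SEASON[s] * P_SPRINKLER[s][sp] * P_RAIN[s][r] * p_w * p_sl

# Query: P(RAIN = YES | SLIPPERY = YES), by enumerating the hidden variables.
num = den = 0.0
for s in P_SEASON:
    for sp in ('ON', 'OFF'):
        for r in ('YES', 'NO'):
            for w in ('YES', 'NO'):
                p = joint(s, sp, r, w, 'YES')
                den += p
                if r == 'YES':
                    num += p
print('P(RAIN=YES | SLIPPERY=YES) =', num / den)
```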
Example
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 27
 Designing a Bayesian Network
 Tool: GeNIe (https://dslpitt.org/genie/)
 GeNIe (Graphical Network Interface) is the graphical interface to SMILE,
a fully portable Bayesian inference engine in C++
 Inference based on the designed Bayesian Network (screenshot: an example query Q1 and its answer A1)
AI & Uncertainty
Probability
 Probability plays a central role in modern pattern
recognition.
 The main tool to deal with uncertainties
 All of the probabilistic inference and learning amount to
repeated application of the sum rule and the product rule
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 29
Random Variables: variables + probability
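For reference, the two rules written out (standard notation; a restatement added here, not from the slide itself):

```latex
% Sum rule: marginalize the joint over Y.  Product rule: factor the joint.
\begin{align*}
  \text{sum rule:}\quad     & p(X) = \sum_{Y} p(X, Y) \\
  \text{product rule:}\quad & p(X, Y) = p(Y \mid X)\, p(X)
\end{align*}
```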
Artificial Intelligence (AI)
 The objective of AI is to build intelligent computers
 We want intelligent, adaptive, robust behavior
 Often, hand programming is not possible.
 Solution? Get the computer to program itself, by
showing it examples of the behavior we want!
 This is the learning approach to AI.
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 30
(example images: a cat, a car)
Artificial Intelligence (AI)
 (Traditional) AI
 Knowledge & reasoning; work with facts/assertions;
develop rules of logical inference
 Planning: work with applicability/effects of actions;
develop searches for actions which achieve goals/avert
disasters.
 Expert systems: develop by hand a set of rules for
examining inputs, updating internal states and
generating outputs
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 31
Artificial Intelligence (AI)
 Probabilistic AI
 emphasis on noisy measurements, approximation in hard
cases, learning, algorithmic issues.
 The power of learning
 Automatic system building
 old expert systems needed hand coding of knowledge and
of output semantics
 learning automatically constructs rules and supports all
types of queries
 Probabilistic databases
 traditional DB technology cannot answer queries about
items that were never loaded into the dataset
 UAI models are like probabilistic databases
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 32
Uncertainty and Artificial Intelligence
(UAI)
 Probabilistic methods can be used to:
 make decisions given partial information about the world
 account for noisy sensors or actuators
 explain phenomena not part of our models
 describe inherently stochastic behavior in the world
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 33
Other Names for UAI
 Machine learning (ML), data mining, applied
statistics, adaptive (stochastic) signal processing,
probabilistic planning/reasoning...
 Some differences:
 Data mining almost always uses large data sets,
statistics almost always small ones
 Data mining, planning, decision theory often have no
internal parameters to be learned
 Statistics often has no algorithm to run!
 ML/UAI algorithms are rarely online and rarely scale to
huge data (changing now).
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 34
Learning in AI
 Learning is most useful
 when the structure of the task is not well understood
 but can be characterized by a dataset with strong
statistical regularity
 Also useful in adaptive or dynamic situations when
the task (or its parameters) are constantly
changing
 Currently, these are challenging topics of machine
learning and data mining research
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 35
Probabilistic AI
 Let inputs=X, correct answers=Y, outputs of our
machine=Z
 Learning: estimation of p(X, Y)
 The central object of interest is the joint distribution
 The main difficulty is compactly representing it and
robustly learning its shape given noisy samples
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 36
Probabilistic Graphical Models (PGMs)
 Probabilistic graphical models represent large joint
distributions compactly using a set of “local”
relationships specified by a graph
 Each random variable in our model corresponds
to a graph node.
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 37
Probabilistic Graphical Models (PGMs)
 There are useful properties in using
probabilistic graphical models
 A simple way to visualize the structure of a probabilistic
model
 Insights into the properties of the model
 Complex computations (for inference and learning) can
be expressed in terms of graphical manipulations, with the
underlying mathematical expressions carried along implicitly
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 38
39
Directed graph vs. undirected graph
 Both (probabilistic) graphical models
 Specify a factorization (how to express the joint distribution)
 Define a set of conditional independence properties
Bayesian Networks (BN): parent–child links, local conditional distributions
Markov Random Field (MRF): maximal cliques, potential functions
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr
Bayesian Networks in Detail
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 41
(DAG)
Designing a Bayesian Network Model
 TakeHeart II: Decision support system for clinical cardiovascular risk
assessment
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 42
Inference in a Bayesian Network Model
 Given an assignment of a subset of variables (evidence) in a BN,
estimate the posterior distribution over another subset of unobserved
variables of interest.
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 43
Inferences viewed as
message passing along the network
Bayesian Networks
 The joint distribution defined by a graph is given by the
product of a conditional distribution of each node
conditioned on their parent nodes.
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 44
$p(x_1, x_2, \ldots, x_7) = \prod_{k=1}^{7} p(x_k \mid \mathrm{Pa}(x_k))$, and in general $p(\mathbf{x}) = \prod_{k=1}^{K} p(x_k \mid \mathrm{Pa}(x_k))$  ($\mathrm{Pa}(x_k)$ denotes the set of parents of $x_k$)
ex) the seven-node DAG in the figure; see the expansion below
* Without a given DAG structure, the usual chain rule can be applied to get
the joint distribution, but the computational cost is much higher.
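If the figure on this slide is the seven-node example from Bishop's PRML (Fig. 8.2), on which this lecture draws, the factorization expands to (the edge set is my assumption about the figure):

```latex
\begin{equation*}
p(x_1,\ldots,x_7) = p(x_1)\, p(x_2)\, p(x_3)\,
  p(x_4 \mid x_1, x_2, x_3)\, p(x_5 \mid x_1, x_3)\,
  p(x_6 \mid x_4)\, p(x_7 \mid x_4, x_5)
\end{equation*}
```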
Bayes’ Theorem
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 45
$p(Y \mid X) = \dfrac{p(X \mid Y)\,p(Y)}{p(X)}$, where the normalizing constant is $p(X) = \sum_{Y} p(X \mid Y)\,p(Y)$
posterior = likelihood × prior / normalizing constant
posterior ∝ likelihood × prior
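A quick numeric sanity check of the rule (the numbers are invented for illustration, echoing the sprinkler CPTs earlier):

```python
# Posterior P(Y|X) from a prior, a likelihood, and the normalizing constant.
p_y = 0.3                                   # prior P(Y), e.g. P(RAIN=YES) in the dry season
p_x_given_y = 0.8                           # likelihood P(X|Y), e.g. P(WET=YES | RAIN=YES)
p_x_given_not_y = 0.1                       # P(X | not Y)
p_x = p_x_given_y * p_y + p_x_given_not_y * (1 - p_y)   # sum rule: P(X) = 0.31
print('P(Y|X) =', p_x_given_y * p_y / p_x)  # Bayes: 0.24 / 0.31 ~= 0.774
```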
Bayes’ Theorem
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 46
Figure from Figure 1. in (Adams, et all, 2013)
obtained from http://journal.frontiersin.org/article/10.3389/fpsyt.2013.00047/full
Bayesian Probabilities
-Frequentist vs. Bayesian
 Likelihood: $p(\mathcal{D} \mid \mathbf{w})$
 Frequentist
 w: a fixed parameter determined by an ‘estimator’
 Maximum likelihood: Error function = $-\log p(\mathcal{D} \mid \mathbf{w})$
 Error bars: obtained from the distribution of possible data sets $\mathcal{D}$
 Bootstrap
 Cross-validation
 Bayesian
 a probability distribution over w: captures the uncertainty in the parameters
 Prior knowledge
 Noninformative (uniform) prior, Laplace correction in estimating priors
 Monte Carlo methods, variational Bayes, EP
$p(\mathbf{w} \mid \mathcal{D}) = \dfrac{p(\mathcal{D} \mid \mathbf{w})\,p(\mathbf{w})}{p(\mathcal{D})}$
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 47
Thomas Bayes
(See the article ‘WHERE DO PROBABILITIES COME FROM?’ on page 491 of the textbook (Russell and Norvig, 2010) for more discussion)
Conditional Independence
 Conditional independence simplifies both the structure of a model and
the computations
 An important feature of graphical models is that conditional
independence properties of the joint distribution can be read directly
from the graph without having to perform any analytical manipulations
 The general framework for this is called d-separation
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 48
Three example graphs – 1st case
 None of the variables are observed
 The variable c is observed
 The conditioned node ‘blocks’ the path from a to b,
causing a and b to become (conditionally) independent.
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 49
Node c is tail-to-tail
Three example graphs – 2nd case
 None of the variables are observed
 The variable c is observed
 The conditioned node ‘blocks’ the path from a to b,
causing a and b to become (conditionally) independent.
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 50
Node c is head-to-tail
Three example graphs – 3rd case
 None of the variables are observed
 The variable c is observed
 When node c is unobserved,
it ‘blocks’ the path and the variables a and b are independent.
 Conditioning on c ‘unblocks’ the path and renders a and b dependent.
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 51
Node c is head-to-head
Three example graphs - Fuel gauge example
 B – Battery, F-fuel, G-electric fuel gauge
 Checking the fuel gauge
 Does checking the battery also tell us something?
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 52
( Makes it more likely )
Makes it less likely than observation of fuel gauge only.
(rather unreliable fuel gauge)
(explaining away)
d-separation
 Tail-to-tail node or head-to-tail node
 The path is unblocked unless the node is observed, in
which case it blocks the path.
 Head-to-head node
 Blocks a path if it is unobserved; if the node itself, and/or
at least one of its descendants, is observed, the path
becomes unblocked.
 d-separation?
 All paths are blocked.
 The joint distribution will satisfy conditional
independence w.r.t. concerned variables.
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 53
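These blocking rules are mechanical enough to code directly. Below is a minimal sketch (the graph encoding and function names are my own, not from the lecture) that tests d-separation by enumerating undirected paths and applying the rules above:

```python
# A minimal d-separation checker. Assumed encoding: a dict mapping each
# node to the list of its parents.

def parents(g, v):
    return g.get(v, [])

def descendants(g, v):
    """All nodes reachable from v by following parent -> child edges."""
    out, stack = set(), [v]
    while stack:
        n = stack.pop()
        for child in (c for c in g if n in g.get(c, [])):
            if child not in out:
                out.add(child)
                stack.append(child)
    return out

def undirected_paths(g, a, b):
    """All simple paths between a and b, ignoring edge direction."""
    nodes = set(g) | {p for ps in g.values() for p in ps}
    adj = {n: set() for n in nodes}
    for c in g:
        for p in g[c]:
            adj[c].add(p)
            adj[p].add(c)
    def walk(path):
        if path[-1] == b:
            yield path
            return
        for n in adj[path[-1]] - set(path):
            yield from walk(path + [n])
    yield from walk([a])

def path_blocked(g, path, observed):
    for i in range(1, len(path) - 1):
        prev, node, nxt = path[i - 1], path[i], path[i + 1]
        if prev in parents(g, node) and nxt in parents(g, node):
            # head-to-head: blocks unless the node or a descendant is observed
            if node not in observed and not (descendants(g, node) & observed):
                return True
        elif node in observed:
            return True   # tail-to-tail or head-to-tail: blocks when observed
    return False

def d_separated(g, a, b, observed):
    observed = set(observed)
    return all(path_blocked(g, p, observed) for p in undirected_paths(g, a, b))

# Collider a -> c <- b: a, b independent a priori, dependent once c is observed.
g = {'c': ['a', 'b']}
print(d_separated(g, 'a', 'b', []))       # True  (path blocked at c)
print(d_separated(g, 'a', 'b', ['c']))    # False (conditioning on c unblocks)
```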
d-separation
 (a) a is dependent on b given c
 Head-to-head node e is unblocked, because a descendant c is in
the conditioning set.
 Tail-to-tail node f is unblocked
 (b) a is independent of b given f
 Head-to-head node e is blocked
 Tail-to-tail node f is blocked
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 54
d-separation
 Another example of conditional independence and d-separation: i.i.d.
(independent identically distributed) data
 Problem: finding posterior dist. for the mean of a univariate Gaussian dist.
 Every path is blocked and so the observations D={x1,…,xN} are independent
given the mean μ
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 55
(The observations are in general no longer independent!)
(independent)
d-separation
 Naïve Bayes model
 Key assumption: conditioned on the class z, the distribution of the
input variables x1,…, xD are independent.
 Given inputs {x1,…,xN} with their class labels,
we can fit the naïve Bayes model to the training data using
maximum likelihood, assuming that the data are drawn
independently from the model.
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 56
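As a concrete illustration, a minimal maximum-likelihood fit of a discrete naïve Bayes model (toy data and function names are my own):

```python
# Counts give the ML estimates of P(z) and of each P(x_d | z) directly.
from collections import Counter, defaultdict

def fit_naive_bayes(X, y):
    """X: list of feature tuples, y: list of class labels."""
    n = len(y)
    class_counts = Counter(y)
    prior = {z: c / n for z, c in class_counts.items()}
    cond = defaultdict(Counter)              # (d, z) -> Counter over values
    for xs, z in zip(X, y):
        for d, v in enumerate(xs):
            cond[(d, z)][v] += 1
    likelihood = {k: {v: c / class_counts[k[1]] for v, c in ctr.items()}
                  for k, ctr in cond.items()}
    return prior, likelihood

def predict(prior, likelihood, xs):
    scores = {}
    for z, pz in prior.items():
        p = pz
        for d, v in enumerate(xs):
            # conditional independence of the inputs given the class z
            p *= likelihood.get((d, z), {}).get(v, 0.0)
        scores[z] = p
    return max(scores, key=scores.get)

X = [('sunny', 'hot'), ('rainy', 'cool'), ('sunny', 'cool'), ('rainy', 'hot')]
y = ['out', 'in', 'out', 'in']
prior, lik = fit_naive_bayes(X, y)
print(predict(prior, lik, ('sunny', 'hot')))   # 'out'
```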
d-separation
 Markov blanket or Markov boundary
 When dealing with the conditional distribution of xi , consider the
minimal set of nodes that isolates xi from the rest of the graph.
 The set of nodes comprising parents, children, and co-parents is called
the Markov blanket.
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 57
Co-parents
parents
children
Probability Distributions
 Discrete variables
 Beta, Bernoulli, binomial
 Dirichlet, multinomial
 …
 Continuous variables
 Normal (Gaussian)
 Student-t
 …
 Exponential family & conjugacy
 Many probability densities on x can be represented in the same form
 There are conjugate families of density functions having the same form
 Beta & binomial
 Dirichlet & multinomial
 Normal & Normal
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 58
$p(\mathbf{x} \mid \boldsymbol{\eta}) = h(\mathbf{x})\,g(\boldsymbol{\eta})\exp\{\boldsymbol{\eta}^{T}\mathbf{u}(\mathbf{x})\}$
(diagram: a prior node F over the parameter of x; conjugate pairs beta–binomial, Dirichlet–multinomial, Gaussian–Gaussian)
Inference in Graphical Models
 Inference in graphical models
 Given evidences (some nodes are clamped to observed values)
 Wish to compute the posterior distributions of other nodes
 Inference algorithms in graphical structures
 Main idea: propagation of local messages
 Exact inference
 Sum-product algorithm, max-product algorithm, junction tree
algorithm
 Approximate inference
 Loopy belief propagation + message passing schedule
 Variational methods, sampling methods (Monte Carlo methods)
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 59
(figure: a graph over nodes A, B, C, D, E and its junction tree with cliques ABD, BCD, CDE)
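To give a feel for the message-passing idea, a minimal sketch on a three-node chain (the potentials are made-up numbers for illustration):

```python
# Sum-product on a chain x1 - x2 - x3 of binary variables: the marginal
# p(x2) is the (normalized) product of the forward and backward messages.
import numpy as np

p_x1 = np.array([0.6, 0.4])                  # p(x1)
p_x2_given_x1 = np.array([[0.7, 0.3],        # p(x2 | x1): rows index x1
                          [0.2, 0.8]])
p_x3_given_x2 = np.array([[0.9, 0.1],        # p(x3 | x2): rows index x2
                          [0.5, 0.5]])

msg_fwd = p_x1 @ p_x2_given_x1               # message from x1: sum out x1
msg_bwd = p_x3_given_x2.sum(axis=1)          # message from x3: sum out x3

marginal_x2 = msg_fwd * msg_bwd
marginal_x2 /= marginal_x2.sum()             # normalize
print('p(x2) =', marginal_x2)
```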
Learning Parameters of
Bayesian Networks
 Parameters
 probabilities in conditional probability tables (CPTs) for all the
variables in the network
 Learning parameters
 Assuming that the structure is fixed, i.e. designed or learned.
 We need data, i.e. observed instances
 Estimation based on relative frequencies from data + belief
 Example: coin toss. Estimation of ‘heads’ in various ways
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 60
(CPTs to be learned for the example network: P(SEASON) with entries DRY ?, RAINY ?, and P(RAIN | SEASON) with all four entries ?)
The principle of indifference: head and tail are equally probable, $P(\text{heads}) = 1/2$
If we tossed a coin 10,000 times and it landed
heads 3373 times, we would estimate the
probability of heads to be about 0.3373
Learning Parameters of
Bayesian Networks
 Learning parameters (continued)
 Estimation based on relative frequencies from data + belief
 Example: an A-match soccer game between Korea and Japan. How
probable, do you think, is it that Korea would win?
 A: 0.85 (a Korean), B: 0.3 (a Japanese)
 This probability is not a ratio, and it is not a relative
frequency because the game cannot be repeated many
times under the exact same conditions
 Degree of belief or subjective probability
 Usual method
 Estimate the probability distribution of a variable X
based on a relative frequency and belief concerning a
relative frequency
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 61
Learning Parameters of
Bayesian Networks
 Simple ‘counting’ solution (Bayesian point of view)
 Parameter estimation of a single node
 Assume local parameter independence
 For a binary variable (for example, a coin toss)
 prior: Beta distribution - Beta(a,b)
 after we have observed m heads and N−m tails, the posterior is
Beta(a+m, b+N−m) and $P(X = \text{head}) = (a+m)/(a+b+N)$
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 62
(conjugacy of the Beta and binomial distributions; diagram: beta prior over a binomial variable)
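A small sketch of this counting update (pure Python, my own illustration):

```python
# Beta-binomial update: Beta(a, b) prior, m heads out of N tosses observed.
def beta_binomial_update(a, b, m, N):
    a_post, b_post = a + m, b + (N - m)      # posterior: Beta(a+m, b+N-m)
    p_head = a_post / (a_post + b_post)      # posterior mean: (a+m)/(a+b+N)
    return a_post, b_post, p_head

# Uniform Beta(1, 1) prior plus the 10,000-toss example above: ~0.3373.
print(beta_binomial_update(1, 1, 3373, 10000))
```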
Learning Parameters of
Bayesian Networks
 Simple ‘counting’ solution (Bayesian point of view)
 For a multinomial variable (for example, a dice toss)
 prior: Dirichlet distribution – Dirichlet(a1,a2, …, ad)
 $P(X = k) = a_k / N$, where $N = \sum_k a_k$
 Observing state i: Dirichlet(a1, …, ai+1, …, ad)
 For an entire network
 We simply iterate over its nodes
 In the case of incomplete data
 In real data, many of the variable values may be incorrect or missing
 Usual approximating solution is given by Gibbs sampling or EM
(expectation maximization) technique
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 63
(conjugacy of Dirichlet and Multinomial distributions)
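And the analogous update for a multinomial variable (a hedged sketch; the die example is my own):

```python
# Dirichlet-multinomial update: add observed counts to the prior pseudo-counts.
def dirichlet_update(alphas, counts):
    post = [a + c for a, c in zip(alphas, counts)]
    total = sum(post)
    return post, [a / total for a in post]   # P(X = k) = a_k / sum_k a_k

# Symmetric Dirichlet(1, ..., 1) prior over a six-sided die plus toy counts.
post, probs = dirichlet_update([1] * 6, [3, 5, 2, 7, 4, 9])
print(probs)
```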
Learning Parameters of
Bayesian Networks
 Smoothing
 Another viewpoint
 Laplace smoothing or additive smoothing, given observed counts
$(x_1, x_2, \ldots, x_d)$ for the d states of a variable X
 From a Bayesian point of view, this corresponds to the expected
value of the posterior distribution, using a symmetric Dirichlet
distribution with parameter α as a prior.
 Additive smoothing is commonly a component of naive Bayes
classifiers.
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 64
$P(X = k) = \dfrac{x_k + \alpha}{N + \alpha d}, \quad k = 1, \ldots, d \quad (\alpha = \alpha_1 = \alpha_2 = \cdots = \alpha_d)$
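The same estimate as a small function (a sketch; the counts are made up):

```python
# Additive (Laplace) smoothing of observed counts for the d states of X.
def laplace_smooth(counts, alpha=1.0):
    N, d = sum(counts), len(counts)
    return [(c + alpha) / (N + alpha * d) for c in counts]

# A state with zero observations still gets nonzero probability.
print(laplace_smooth([0, 2, 8]))   # [1/13, 3/13, 9/13]
```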
Learning the Graph Structure
 Learning the graph structure itself from data requires
 A space of possible structures
 A measure that can be used to score each structure
 From a Bayesian viewpoint
 Tough points
 Marginalization over latent variables => challenging computational
problem
 Exploring the space of structures can also be problematic
 The # of different graph structures grows exponentially with the # of
nodes
 Usually we resort to heuristics
 Local score based, global score based, conditional independence test based, …
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 65
$P(m \mid \mathcal{D}) \propto P(m)\,P(\mathcal{D} \mid m)$ : score for each model m
Bayesian Networks as Tools for AI
 Learning
 Extracting and encoding knowledge from data
 Knowledge is represented as probabilistic relationships among variables
 Causal relationship
 Network of variables
 Common framework for machine learning models
 Supervised and unsupervised learning
 Knowledge Representation & Reasoning
 Bayesian networks can be constructed from prior knowledge alone
 Constructed model can be used for reasoning based on probabilistic inference methods
 Expert System
 Uncertain expert knowledge can be encoded into a Bayesian network
 DAG in a Bayesian network is hand-constructed by domain experts
 Then the conditional probabilities are assessed by the expert, learned from data, or
obtained using a combination of both techniques.
 Bayesian network-based expert systems are popular
 Planning
 In some different form, known as decision graphs or influence diagrams
 We do not cover this direction here
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 66
Advantages of Bayesian Networks for
Data Analysis
 Ability to handle missing data
 Because the model encodes dependencies among all variables
 Learning causal relationships
 Can be used to gain understanding about a problem domain
 Can be used to predict the consequences of intervention
 Having both causal and probabilistic semantics
 It is an ideal representation for combining prior knowledge (which comes in
causal form) and data
 Efficient and principled approach for avoiding the overfitting of data
 By Bayesian statistical methods in conjunction with Bayesian networks
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 67
(summary from the abstract of D. Heckerman’s Tutorial on BN)
(Read ‘Introduction’ section for detailed explanations)
References
 K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal
Inference
 S. Roweis, MLSS ’05 Lecture on Probabilistic Graphical Models
 Chapter 1, Chapter 2, Chapter 8 (Graphical Models), in Pattern
Recognition and Machine Learning by C.M. Bishop, 2006.
 David Heckerman, A Tutorial on Learning with Bayesian Networks.
 R.E. Neapolitan, Learning Bayesian Networks, Pearson Prentice Hall,
2004.
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 68
More Textbooks and Courses
 https://www.coursera.org/course/pgm :
Probabilistic Graphical Models by D. Koller
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 69
APPENDIX
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 70
Learning Parameters of
Bayesian Networks
 Case 1: X is a binary variable
 F: beta distribution, X: Bernoulli or binomial distribution
 Ex) F ~ Beta(a, b), then $P(X = 1) = a/N$ $(N = a + b)$
 Case 2: X is a multinomial variable
 F: Dirichlet distribution, X: multinomial distribution
 Ex) F ~ Dirichlet(a1, a2, …, ad), then $P(X = k) = a_k/N$, $N = \sum_k a_k$
 Laplace smoothing or additive smoothing given observed counts $(x_1, x_2, \ldots, x_d)$
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 71
(diagram: node F with child X)
The probability distribution of F represents our belief
concerning the relative frequency with which X equals k.
$P(X = k) = \dfrac{x_k + \alpha}{N + \alpha d}, \quad k = 1, \ldots, d \quad (\alpha = \alpha_1 = \alpha_2 = \cdots = \alpha_d)$
Graphical interpretation of
Bayes’ theorem
 Given structure: $p(x, y) = p(x)\,p(y \mid x)$
 We observe the value of y
 Goal: infer the posterior distribution over x, $p(x \mid y)$
 Marginal distribution $p(x)$: a prior over the latent variable x
 We can evaluate the marginal distribution $p(y) = \sum_{x'} p(y \mid x')\,p(x')$
 By Bayes’ theorem we can calculate $p(x \mid y) = \dfrac{p(y \mid x)\,p(x)}{p(y)}$
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 72
(figure: graphs (a), (b), (c))
d-separation
 Directed factorization
 Filtering: can a distribution be expressed in terms of the factorization implied by
the graph?
 If we present to the filter the set of all possible distributions p(x) over the set
of variables X, then the subset of distributions that are passed by the filter
will be denoted DF (Directed Factorization)
 Fully connected graph: The set DF will contain all possible distributions
 Fully disconnected graph: The joint distributions which factorize into the
product of the marginal distributions over the variables only.
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 73
Gaussian distribution
© 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 74
$\mathcal{N}(x \mid \mu, \sigma^2) = \dfrac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{-\dfrac{1}{2\sigma^2}(x - \mu)^2\right\}$
Multivariate Gaussian:
$\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \dfrac{1}{(2\pi)^{D/2}\,|\boldsymbol{\Sigma}|^{1/2}} \exp\left\{-\dfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu})\right\}$

Weitere ähnliche Inhalte

Was ist angesagt?

Problem solving agents
Problem solving agentsProblem solving agents
Problem solving agentsMegha Sharma
 
Inference in Bayesian Networks
Inference in Bayesian NetworksInference in Bayesian Networks
Inference in Bayesian Networksguestfee8698
 
Semantic nets in artificial intelligence
Semantic nets in artificial intelligenceSemantic nets in artificial intelligence
Semantic nets in artificial intelligenceharshita virwani
 
Ch 7 Knowledge Representation.pdf
Ch 7 Knowledge Representation.pdfCh 7 Knowledge Representation.pdf
Ch 7 Knowledge Representation.pdfKrishnaMadala1
 
Unit IV UNCERTAINITY AND STATISTICAL REASONING in AI K.Sundar,AP/CSE,VEC
Unit IV UNCERTAINITY AND STATISTICAL REASONING in AI K.Sundar,AP/CSE,VECUnit IV UNCERTAINITY AND STATISTICAL REASONING in AI K.Sundar,AP/CSE,VEC
Unit IV UNCERTAINITY AND STATISTICAL REASONING in AI K.Sundar,AP/CSE,VECsundarKanagaraj1
 
Heuristic Search Techniques {Artificial Intelligence}
Heuristic Search Techniques {Artificial Intelligence}Heuristic Search Techniques {Artificial Intelligence}
Heuristic Search Techniques {Artificial Intelligence}FellowBuddy.com
 
Uncertain knowledge and reasoning
Uncertain knowledge and reasoningUncertain knowledge and reasoning
Uncertain knowledge and reasoningShiwani Gupta
 
Logics for non monotonic reasoning-ai
Logics for non monotonic reasoning-aiLogics for non monotonic reasoning-ai
Logics for non monotonic reasoning-aiShaishavShah8
 
Heuristic Search Techniques Unit -II.ppt
Heuristic Search Techniques Unit -II.pptHeuristic Search Techniques Unit -II.ppt
Heuristic Search Techniques Unit -II.pptkarthikaparthasarath
 
Artificial Intelligence Notes Unit 1
Artificial Intelligence Notes Unit 1 Artificial Intelligence Notes Unit 1
Artificial Intelligence Notes Unit 1 DigiGurukul
 

Was ist angesagt? (20)

Uncertainty in AI
Uncertainty in AIUncertainty in AI
Uncertainty in AI
 
Problem solving agents
Problem solving agentsProblem solving agents
Problem solving agents
 
Inference in Bayesian Networks
Inference in Bayesian NetworksInference in Bayesian Networks
Inference in Bayesian Networks
 
Semantic nets in artificial intelligence
Semantic nets in artificial intelligenceSemantic nets in artificial intelligence
Semantic nets in artificial intelligence
 
Ch 7 Knowledge Representation.pdf
Ch 7 Knowledge Representation.pdfCh 7 Knowledge Representation.pdf
Ch 7 Knowledge Representation.pdf
 
Hill climbing
Hill climbingHill climbing
Hill climbing
 
And or graph
And or graphAnd or graph
And or graph
 
Unit IV UNCERTAINITY AND STATISTICAL REASONING in AI K.Sundar,AP/CSE,VEC
Unit IV UNCERTAINITY AND STATISTICAL REASONING in AI K.Sundar,AP/CSE,VECUnit IV UNCERTAINITY AND STATISTICAL REASONING in AI K.Sundar,AP/CSE,VEC
Unit IV UNCERTAINITY AND STATISTICAL REASONING in AI K.Sundar,AP/CSE,VEC
 
predicate logic example
predicate logic examplepredicate logic example
predicate logic example
 
Bayes Belief Networks
Bayes Belief NetworksBayes Belief Networks
Bayes Belief Networks
 
Heuristic Search Techniques {Artificial Intelligence}
Heuristic Search Techniques {Artificial Intelligence}Heuristic Search Techniques {Artificial Intelligence}
Heuristic Search Techniques {Artificial Intelligence}
 
Uncertain knowledge and reasoning
Uncertain knowledge and reasoningUncertain knowledge and reasoning
Uncertain knowledge and reasoning
 
Artificial Intelligence - Reasoning in Uncertain Situations
Artificial Intelligence - Reasoning in Uncertain SituationsArtificial Intelligence - Reasoning in Uncertain Situations
Artificial Intelligence - Reasoning in Uncertain Situations
 
Logics for non monotonic reasoning-ai
Logics for non monotonic reasoning-aiLogics for non monotonic reasoning-ai
Logics for non monotonic reasoning-ai
 
Dynamic Itemset Counting
Dynamic Itemset CountingDynamic Itemset Counting
Dynamic Itemset Counting
 
Heuristic Search Techniques Unit -II.ppt
Heuristic Search Techniques Unit -II.pptHeuristic Search Techniques Unit -II.ppt
Heuristic Search Techniques Unit -II.ppt
 
Structured Knowledge Representation
Structured Knowledge RepresentationStructured Knowledge Representation
Structured Knowledge Representation
 
Informed search
Informed searchInformed search
Informed search
 
Artificial Intelligence Notes Unit 1
Artificial Intelligence Notes Unit 1 Artificial Intelligence Notes Unit 1
Artificial Intelligence Notes Unit 1
 
Naive Bayes
Naive BayesNaive Bayes
Naive Bayes
 

Ähnlich wie Bayesian networks in AI

Jia-Bin Huang's Curriculum Vitae
Jia-Bin Huang's Curriculum VitaeJia-Bin Huang's Curriculum Vitae
Jia-Bin Huang's Curriculum VitaeJia-Bin Huang
 
IRJET- Car Defect Detection using Machine Learning for Insurance
IRJET- Car Defect Detection using Machine Learning for InsuranceIRJET- Car Defect Detection using Machine Learning for Insurance
IRJET- Car Defect Detection using Machine Learning for InsuranceIRJET Journal
 
Monitoring and Operational Data Analytics from a User Perspective at First Eu...
Monitoring and Operational Data Analytics from a User Perspective at First Eu...Monitoring and Operational Data Analytics from a User Perspective at First Eu...
Monitoring and Operational Data Analytics from a User Perspective at First Eu...University of Maribor
 
Software Sustainability Institute
Software Sustainability InstituteSoftware Sustainability Institute
Software Sustainability InstituteNeil Chue Hong
 
A Literature Survey on Image Linguistic Visual Question Answering
A Literature Survey on Image Linguistic Visual Question AnsweringA Literature Survey on Image Linguistic Visual Question Answering
A Literature Survey on Image Linguistic Visual Question AnsweringIRJET Journal
 
Suspicious Activity Detection
Suspicious Activity DetectionSuspicious Activity Detection
Suspicious Activity DetectionMushahid Ali
 
Motion capture for Animation
Motion capture for AnimationMotion capture for Animation
Motion capture for AnimationIRJET Journal
 
속담으로 풀어보는 오가닉 비즈니스
속담으로 풀어보는 오가닉 비즈니스속담으로 풀어보는 오가닉 비즈니스
속담으로 풀어보는 오가닉 비즈니스Sangkyu Rho
 
RESUME_neel (2) (1) (1)
RESUME_neel (2) (1) (1)RESUME_neel (2) (1) (1)
RESUME_neel (2) (1) (1)NEELAM MALI
 
Qu speaker series 14: Synthetic Data Generation in Finance
Qu speaker series 14: Synthetic Data Generation in FinanceQu speaker series 14: Synthetic Data Generation in Finance
Qu speaker series 14: Synthetic Data Generation in FinanceQuantUniversity
 
11.0003www.iiste.org call for paper.survey on wireless intelligent video surv...
11.0003www.iiste.org call for paper.survey on wireless intelligent video surv...11.0003www.iiste.org call for paper.survey on wireless intelligent video surv...
11.0003www.iiste.org call for paper.survey on wireless intelligent video surv...Alexander Decker
 
3.survey on wireless intelligent video surveillance system using moving objec...
3.survey on wireless intelligent video surveillance system using moving objec...3.survey on wireless intelligent video surveillance system using moving objec...
3.survey on wireless intelligent video surveillance system using moving objec...Alexander Decker
 
Covid Mask Detection and Social Distancing Using Raspberry pi
Covid Mask Detection and Social Distancing Using Raspberry piCovid Mask Detection and Social Distancing Using Raspberry pi
Covid Mask Detection and Social Distancing Using Raspberry piIRJET Journal
 
final internship ppt (1).pptx
final internship ppt (1).pptxfinal internship ppt (1).pptx
final internship ppt (1).pptxVidyags3
 
FACE MASK DETECTION USING ML
FACE MASK DETECTION USING MLFACE MASK DETECTION USING ML
FACE MASK DETECTION USING MLIRJET Journal
 
Smart manufacturing through cloud based-r-nabati--dr abdulbaghi ghaderzadeh
Smart manufacturing through cloud based-r-nabati--dr abdulbaghi ghaderzadehSmart manufacturing through cloud based-r-nabati--dr abdulbaghi ghaderzadeh
Smart manufacturing through cloud based-r-nabati--dr abdulbaghi ghaderzadehnabati
 
Isab 11 for_slideshare
Isab 11 for_slideshareIsab 11 for_slideshare
Isab 11 for_slideshareRichard Adams
 
Improved Knowledge from Data: Building an Immersive Data Analysis Platform
Improved Knowledge from Data: Building an Immersive Data Analysis PlatformImproved Knowledge from Data: Building an Immersive Data Analysis Platform
Improved Knowledge from Data: Building an Immersive Data Analysis PlatformFelipe Pedroso
 

Ähnlich wie Bayesian networks in AI (20)

CV _Manoj
CV _ManojCV _Manoj
CV _Manoj
 
Jia-Bin Huang's Curriculum Vitae
Jia-Bin Huang's Curriculum VitaeJia-Bin Huang's Curriculum Vitae
Jia-Bin Huang's Curriculum Vitae
 
IRJET- Car Defect Detection using Machine Learning for Insurance
IRJET- Car Defect Detection using Machine Learning for InsuranceIRJET- Car Defect Detection using Machine Learning for Insurance
IRJET- Car Defect Detection using Machine Learning for Insurance
 
Monitoring and Operational Data Analytics from a User Perspective at First Eu...
Monitoring and Operational Data Analytics from a User Perspective at First Eu...Monitoring and Operational Data Analytics from a User Perspective at First Eu...
Monitoring and Operational Data Analytics from a User Perspective at First Eu...
 
Software Sustainability Institute
Software Sustainability InstituteSoftware Sustainability Institute
Software Sustainability Institute
 
A Literature Survey on Image Linguistic Visual Question Answering
A Literature Survey on Image Linguistic Visual Question AnsweringA Literature Survey on Image Linguistic Visual Question Answering
A Literature Survey on Image Linguistic Visual Question Answering
 
Suspicious Activity Detection
Suspicious Activity DetectionSuspicious Activity Detection
Suspicious Activity Detection
 
Euler-time Diagrams
Euler-time DiagramsEuler-time Diagrams
Euler-time Diagrams
 
Motion capture for Animation
Motion capture for AnimationMotion capture for Animation
Motion capture for Animation
 
속담으로 풀어보는 오가닉 비즈니스
속담으로 풀어보는 오가닉 비즈니스속담으로 풀어보는 오가닉 비즈니스
속담으로 풀어보는 오가닉 비즈니스
 
RESUME_neel (2) (1) (1)
RESUME_neel (2) (1) (1)RESUME_neel (2) (1) (1)
RESUME_neel (2) (1) (1)
 
Qu speaker series 14: Synthetic Data Generation in Finance
Qu speaker series 14: Synthetic Data Generation in FinanceQu speaker series 14: Synthetic Data Generation in Finance
Qu speaker series 14: Synthetic Data Generation in Finance
 
11.0003www.iiste.org call for paper.survey on wireless intelligent video surv...
11.0003www.iiste.org call for paper.survey on wireless intelligent video surv...11.0003www.iiste.org call for paper.survey on wireless intelligent video surv...
11.0003www.iiste.org call for paper.survey on wireless intelligent video surv...
 
3.survey on wireless intelligent video surveillance system using moving objec...
3.survey on wireless intelligent video surveillance system using moving objec...3.survey on wireless intelligent video surveillance system using moving objec...
3.survey on wireless intelligent video surveillance system using moving objec...
 
Covid Mask Detection and Social Distancing Using Raspberry pi
Covid Mask Detection and Social Distancing Using Raspberry piCovid Mask Detection and Social Distancing Using Raspberry pi
Covid Mask Detection and Social Distancing Using Raspberry pi
 
final internship ppt (1).pptx
final internship ppt (1).pptxfinal internship ppt (1).pptx
final internship ppt (1).pptx
 
FACE MASK DETECTION USING ML
FACE MASK DETECTION USING MLFACE MASK DETECTION USING ML
FACE MASK DETECTION USING ML
 
Smart manufacturing through cloud based-r-nabati--dr abdulbaghi ghaderzadeh
Smart manufacturing through cloud based-r-nabati--dr abdulbaghi ghaderzadehSmart manufacturing through cloud based-r-nabati--dr abdulbaghi ghaderzadeh
Smart manufacturing through cloud based-r-nabati--dr abdulbaghi ghaderzadeh
 
Isab 11 for_slideshare
Isab 11 for_slideshareIsab 11 for_slideshare
Isab 11 for_slideshare
 
Improved Knowledge from Data: Building an Immersive Data Analysis Platform
Improved Knowledge from Data: Building an Immersive Data Analysis PlatformImproved Knowledge from Data: Building an Immersive Data Analysis Platform
Improved Knowledge from Data: Building an Immersive Data Analysis Platform
 

Kürzlich hochgeladen

GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 

Kürzlich hochgeladen (20)

GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 

Bayesian networks in AI

  • 1. AI, Uncertainty & Bayesian Networks 2015-03-10 / 03-12 Kim, Byoung-Hee Biointelligence Laboratory Seoul National University http://bi.snu.ac.kr Artificial Intelligence: Cognitive Agents
  • 2. © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 2 A Bayesian network is a graphical model for probabilistic relationships among a set of variables
  • 3. ACM Turing Award  Nobel Prize in Computing  2011 Winner: Judea Pearl (UCLA)  For fundamental contributions to artificial intelligence through the development of a calculus for probabilistic and causal reasoning  Invention of Bayesian networks  Pearl's accomplishments have “redefined the term 'thinking machine’” over the past 30 years  BN mimics “the neural activities of the human brain, constantly exchanging messages without benefit of a supervisor” © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 3
  • 4. Notes  Related chapters in the textbook (AIMA 3rd ed. by Russell and Norvig)  Ch. 13 Quantifying Uncertainty  Ch. 14 Probabilistic Reasoning (14.1~14.2)  Ch. 20 Learning Probabilistic Models (20.1~20.2)  On the reference ‘A Tutorial on Learning with Bayesian Networks by David Heckerman’  It is very technical but covers insights and comprehensive backgrounds on Bayesian networks  This lecture covers the ‘Introduction’ section  This lecture is an easier introductory tutorial  Both contents in the textbook and Heckerman’s tutorial is fairly mathematical  This lecture covers basic concepts and tools to understand Bayesian networks © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 4
  • 5. Contents  Bayesian Networks: Introduction  Motivating example  Decomposing a joint distribution of variables  d-separation  A mini Turing test in causal conversation  Correlation & causation  AI & Uncertainty  Bayesian Networks in Detail  d-separation: revisited & details  Probability & Bayesian  Inference in & learning Bayesian networks  BN as AI tools and advantages © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 5
  • 7. Causality, Dependency © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr  From correlation to causality  정성적 방법  정량적 방법  Granger causality index  http://www.google.com/trends/  Google의 CausalImpact  R package for causal inference in time series  Official posting: http://google- opensource.blogspot.kr/2014/09/causalimpact-new-open- source-package.html  소개 기사(영문): https://gigaom.com/2014/09/11/google- has-open-sourced-a-tool-for-inferring-cause-from- correlations/ 7 Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
  • 8. © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 8 Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference Assuming binary states for all the variables Ex) Season: dry or rainy Ex) Sprinkler: ON or OFF
  • 9. © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 9 Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
  • 10. © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 10 Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference (directed acyclic graph)
  • 11. © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 11 Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
  • 12. © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 12 Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
  • 13. © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 13 Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
  • 14. © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 14 Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
  • 15. © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 15 Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
  • 16. © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 16 Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
  • 17. © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 17 Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
  • 18. © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 18 Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
  • 19. © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 19 Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
  • 20. © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 20 Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
  • 21. © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 21 Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
  • 22. © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 22 Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
  • 23. © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 23 Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
  • 24. © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 24 Slide from K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference
  • 25. Correlation & Causation © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 25 Correlation does not imply causation A chart that, correlates the number of pirates with global temperature. The two variables are correlated, but one does not imply the other a correlation between ice cream consumption and crime, but shows that the actual cause is temperature.
  • 26. Example  Designing a Bayesian Network  My own design of conditional probability tables © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 26 DRY 0.6 RAINY 0.4 SEASON SEASON DRY RAINY DRY 0.8 0.1 RAINY 0.2 0.9 SEASON DRY RAINY YES 0.3 0.8 NO 0.7 0.2 WET YES NO YES 0.8 0.1 NO 0.2 0.9 SPRINKLER RAIN WET SLIPPERY SPRINKLER ON OFF RAIN YES NO YES NO YES 0.8 0.1 0.8 0.1 NO 0.2 0.9 0.2 0.9
  • 27. Example © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 27  Designing a Bayesian Network  Tool: GeNIe (https://dslpitt.org/genie/)  GeNIe (Graphical Network Interface) is the graphical interface to SMILE, a fully portable Bayesian inference engine in C++  Inference based on the designed Bayesian Network Q1 A1
  • 29. Probability  Probability plays a central role in modern pattern recognition.  The main tool to deal uncertainties  All of the probabilistic inference and learning amount to repeated application of the sum rule and the product rule © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 29 Random Variables: variables + probability
  • 30. Artificial Intelligence (AI)  The objective of AI is to build intelligent computers  We want intelligent, adaptive, robust behavior  Often hand programming not possible.  Solution? Get the computer to program itself, by showing it examples of the behavior we want!  This is the learning approach to AI. © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 30 cat car
  • 31. Artificial Intelligence (AI)  (Traditional) AI  Knowledge & reasoning; work with facts/assertions; develop rules of logical inference  Planning: work with applicability/effects of actions; develop searches for actions which achieve goals/avert disasters.  Expert systems: develop by hand a set of rules for examining inputs, updating internal states and generating outputs © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 31
  • 32. Artificial Intelligence (AI)  Probabilistic AI  emphasis on noisy measurements, approximation in hard cases, learning, algorithmic issues.  The power of learning  Automatic system building  old expert systems needed hand coding of knowledge and of output semantics  learning automatically constructs rules and supports all types of queries  Probabilistic databases  traditional DB technology cannot answer queries about items that were never loaded into the dataset  UAI models are like probabilistic databases © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 32
  • 33. Uncertainty and Artificial Intelligence (UAI)  Probabilistic methods can be used to:  make decisions given partial information about the world  account for noisy sensors or actuators  explain phenomena not part of our models  describe inherently stochastic behavior in the world © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 33
  • 34. Other Names for UAI  Machine learning (ML), data mining, applied statistics, adaptive (stochastic) signal processing, probabilistic planning/reasoning...  Some differences:  Data mining almost always uses large data sets, statistics almost always small ones  Data mining, planning, decision theory often have no internal parameters to be learned  Statistics often has no algorithm to run!  ML/UAI algorithms are rarely online and rarely scale to huge data (changing now). © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 34
  • 35. Learning in AI  Learning is most useful  when the structure of the task is not well understood  but can be characterized by a dataset with strong statistical regularity  Also useful in adaptive or dynamic situations when the task (or its parameters) is constantly changing  Currently, these are challenging topics of machine learning and data mining research © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 35
  • 36. Probabilistic AI  Let inputs=X, correct answers=Y, outputs of our machine=Z  Learning: estimation of p(X, Y)  The central object of interest is the joint distribution  The main difficulty is compactly representing it and robustly learning its shape given noisy samples © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 36
  • 37. Probabilistic Graphical Models (PGMs)  Probabilistic graphical models represent large joint distributions compactly using a set of “local” relationships specified by a graph  Each random variable in our model corresponds to a graph node. © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 37
  • 38. Probabilistic Graphical Models (PGMs)  There are useful properties in using probabilistic graphical models  A simple way to visualize the structure of a probabilistic model  Insights into the properties of the model  Complex computations (for inference and learning) can be expressed in terms of graphical manipulations, with the underlying mathematical expressions carried along implicitly © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 38
  • 39. 39 Directed graph vs. undirected graph  Both are (probabilistic) graphical models  Both specify a factorization (how to express the joint distribution)  Both define a set of conditional independence properties  Bayesian Networks (BN): directed; parent-child links, local conditional distributions  Markov Random Fields (MRF): undirected; maximal cliques, potential functions © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr
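In symbols (standard definitions, consistent with the BN factorization given on a later slide):

$$\text{BN:}\quad p(\mathbf{x}) = \prod_{k} p(x_k \mid \mathrm{Pa}(x_k)), \qquad \text{MRF:}\quad p(\mathbf{x}) = \frac{1}{Z} \prod_{C} \psi_C(\mathbf{x}_C)$$

where $\mathrm{Pa}(x_k)$ denotes the parents of node $x_k$, $C$ ranges over the maximal cliques, $\psi_C$ are the potential functions, and $Z$ is the normalizing constant (partition function).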
  • 41. © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 41 (DAG)
  • 42. Designing a Bayesian Network Model  TakeHeart II: Decision support system for clinical cardiovascular risk assessment © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 42
  • 43. Inference in a Bayesian Network Model  Given an assignment of a subset of variables (evidence) in a BN, estimate the posterior distribution over another subset of unobserved variables of interest. © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 43 Inference can be viewed as message passing along the network
  • 44. Bayesian Networks  The joint distribution defined by a graph is given by the product of a conditional distribution for each node, conditioned on its parent nodes:

$$p(\mathbf{x}) = \prod_{k=1}^{K} p(x_k \mid \mathrm{Pa}(x_k))$$

where $\mathrm{Pa}(x_k)$ denotes the set of parents of $x_k$.  ex) For the seven-node DAG on the slide: $p(x_1, \ldots, x_7) = p(x_1)\, p(x_2)\, p(x_3)\, p(x_4 \mid x_1, x_2, x_3)\, p(x_5 \mid x_1, x_3)\, p(x_6 \mid x_4)\, p(x_7 \mid x_4, x_5)$  * Without a given DAG structure, the usual chain rule can still be applied to obtain the joint distribution, but the computational cost is much higher. © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 44
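A quick illustration of the savings (our own arithmetic, assuming all seven variables are binary): the full joint table needs $2^7 - 1 = 127$ free parameters, whereas the factorization above needs only $1 + 1 + 1 + 2^3 + 2^2 + 2 + 2^2 = 21$, i.e. one free number per configuration of each node's parents.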
  • 45. Bayes’ Theorem © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 45

$$p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)}, \qquad p(X) = \sum_{Y} p(X \mid Y)\, p(Y)$$

Posterior = Likelihood × Prior / Normalizing constant, i.e. posterior ∝ likelihood × prior
  • 46. Bayes’ Theorem © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 46 Figure from Figure 1 in (Adams et al., 2013), obtained from http://journal.frontiersin.org/article/10.3389/fpsyt.2013.00047/full
  • 47. Bayesian Probabilities - Frequentist vs. Bayesian  Likelihood: $p(\mathcal{D} \mid \mathbf{w})$  Frequentist  w: a fixed parameter determined by an ‘estimator’  Maximum likelihood: error function = $-\log p(\mathcal{D} \mid \mathbf{w})$  Error bars: obtained from the distribution of possible data sets $\mathcal{D}$  Bootstrap  Cross-validation  Bayesian  a probability distribution over w expresses the uncertainty in the parameters: $p(\mathbf{w} \mid \mathcal{D}) = \dfrac{p(\mathcal{D} \mid \mathbf{w})\, p(\mathbf{w})}{p(\mathcal{D})}$  Prior knowledge  Noninformative (uniform) prior, Laplace correction in estimating priors  Monte Carlo methods, variational Bayes, EP (expectation propagation) © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 47 Thomas Bayes (See the article ‘Where do probabilities come from?’ on page 491 of the textbook (Russell and Norvig, 2010) for more discussion)
  • 48. Conditional Independence  Conditional independence simplifies both the structure of a model and the computations  An important feature of graphical models is that conditional independence properties of the joint distribution can be read directly from the graph without having to perform any analytical manipulations  The general framework for this is called d-separation © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 48
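Formally (the standard definition): $a$ is conditionally independent of $b$ given $c$, written $a \perp\!\!\!\perp b \mid c$, if and only if $p(a, b \mid c) = p(a \mid c)\, p(b \mid c)$, or equivalently $p(a \mid b, c) = p(a \mid c)$.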
  • 49. Three example graphs – 1st case  None of the variables are observed  The variable c is observed  The conditioned node ‘blocks’ the path from a to b, causing a and b to become (conditionally) independent. © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 49 Node c is tail-to-tail
  • 50. Three example graphs – 2nd case  None of the variables are observed  The variable c is observed  The conditioned node ‘blocks’ the path from a to b, causing a and b to become (conditionally) independent. © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 50 Node c is head-to-tail
  • 51. Three example graphs – 3rd case  None of the variables are observed  The variable c is observed  When node c is unobserved, it ‘blocks’ the path and the variables a and b are independent.  Conditioning on c ‘unblocks’ the path and renders a and b dependent. © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 51 Node c is head-to-head
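The factorizations behind the three cases (standard results, e.g. Bishop Ch. 8):

$$\text{tail-to-tail:}\ p(a,b,c) = p(a \mid c)\, p(b \mid c)\, p(c), \qquad \text{head-to-tail:}\ p(a,b,c) = p(a)\, p(c \mid a)\, p(b \mid c), \qquad \text{head-to-head:}\ p(a,b,c) = p(a)\, p(b)\, p(c \mid a, b)$$

In the first two cases, dividing by $p(c)$ shows $a \perp\!\!\!\perp b \mid c$; in the third, summing over $c$ shows $a \perp\!\!\!\perp b$ marginally, while conditioning on $c$ couples $a$ and $b$.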
  • 52. Three example graphs - Fuel gauge example  B: battery, F: fuel, G: electric fuel gauge (a rather unreliable one)  Observing that the fuel gauge reads empty makes an empty tank more likely  Does additionally checking the battery tell us anything? Yes: finding the battery flat makes an empty tank less likely than observing the fuel gauge alone, because the flat battery ‘explains away’ the empty reading (explaining away) © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 52
  • 53. d-separation  Tail-to-tail or head-to-tail node  Blocks a path if it is observed; otherwise the path is unblocked.  Head-to-head node  Blocks a path if it is unobserved; if the node itself, and/or at least one of its descendants, is observed, the path becomes unblocked.  d-separation?  All paths between the two sets of nodes are blocked.  The joint distribution then satisfies the corresponding conditional independence property. © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 53
  • 54. d-separation  (a) a is dependent on b given c  Head-to-head node e is unblocked, because its descendant c is in the conditioning set.  Tail-to-tail node f is unblocked  (b) a is independent of b given f  Head-to-head node e is blocked  Tail-to-tail node f is blocked © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 54
  • 55. d-separation  Another example of conditional independence and d-separation: i.i.d. (independent identically distributed) data  Problem: finding the posterior distribution for the mean μ of a univariate Gaussian distribution  Every path between observations is blocked when μ is in the conditioning set, so the observations D = {x1, …, xN} are independent given μ  (If μ is integrated out, the observations are in general no longer independent!) © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 55
  • 56. d-separation  Naïve Bayes model  Key assumption: conditioned on the class z, the distributions of the input variables x1, …, xD are independent.  Given inputs {x1, …, xN} with their class labels, we can fit the naïve Bayes model to the training data using maximum likelihood, assuming that the data are drawn independently from the model. © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 56
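In symbols (the standard naïve Bayes factorization, not lecture-specific notation):

$$p(\mathbf{x}, z) = p(z) \prod_{i=1}^{D} p(x_i \mid z)$$

so fitting the model only requires the class prior $p(z)$ and the per-feature conditionals $p(x_i \mid z)$.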
  • 57. d-separation  Markov blanket or Markov boundary  When dealing with the conditional distribution of xi given all the other nodes, consider the minimal set of nodes that isolates xi from the rest of the graph.  This set, comprising the node’s parents, children, and co-parents (the other parents of its children), is called the Markov blanket. © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 57
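Concretely (a standard consequence of the BN factorization; every factor not mentioning $x_i$ cancels):

$$p(x_i \mid \mathbf{x}_{\setminus i}) \propto p(x_i \mid \mathrm{Pa}(x_i)) \prod_{k \in \mathrm{ch}(i)} p(x_k \mid \mathrm{Pa}(x_k))$$

The surviving factors involve exactly the parents, children, and co-parents of $x_i$, which is why the Markov blanket isolates $x_i$ from the rest of the graph.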
  • 58. Probability Distributions  Discrete variables  Beta, Bernoulli, binomial  Dirichlet, multinomial  …  Continuous variables  Normal (Gaussian)  Student-t  …  Exponential family & conjugacy  Many probability densities on x can be written in the same exponential-family form: $p(\mathbf{x} \mid \boldsymbol{\eta}) = h(\mathbf{x})\, g(\boldsymbol{\eta}) \exp\{\boldsymbol{\eta}^{T} \mathbf{u}(\mathbf{x})\}$  There are conjugate families of density functions having the same functional form  Beta & binomial  Dirichlet & multinomial  Normal & Normal © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 58
  • 59. Inference in Graphical Models  Inference in graphical models  Given evidence (some nodes are clamped to observed values)  We wish to compute the posterior distributions of other nodes  Inference algorithms for graphical structures  Main idea: propagation of local messages  Exact inference  Sum-product algorithm, max-product algorithm, junction tree algorithm  Approximate inference  Loopy belief propagation + a message-passing schedule  Variational methods, sampling methods (Monte Carlo methods) © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 59 (figure: a five-node graph over A, B, C, D, E and its junction tree with cliques ABD, BCD, CDE)
  • 60. Learning Parameters of Bayesian Networks  Parameters  the probabilities in the conditional probability tables (CPTs) of all the variables in the network (in the sprinkler example: the unknown entries of P(SEASON) and P(RAIN | SEASON))  Learning parameters  Assuming that the structure is fixed, i.e. designed or learned  We need data, i.e. observed instances  Estimation based on relative frequencies from data + belief  Example: coin toss; estimating P(heads) in various ways  The principle of indifference: heads and tails are equally probable, so $P(\mathit{heads}) = 1/2$  If we tossed a coin 10,000 times and it landed heads 3373 times, we would estimate the probability of heads to be about 0.3373 © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 60
  • 61. Learning Parameters of Bayesian Networks  Learning parameters (continued)  Estimation based on relative frequencies from data + belief  Example: an A-match soccer game between Korea and Japan. How probable do you think a Korean win is?  A (a Korean): 0.85, B (a Japanese): 0.3  This probability is not a ratio, and it is not a relative frequency, because the game cannot be repeated many times under exactly the same conditions  Degree of belief, or subjective probability  Usual method  Estimate the probability distribution of a variable X based on a relative frequency and a belief concerning that relative frequency © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 61
  • 62. Learning Parameters of Bayesian Networks  Simple ‘counting’ solution (Bayesian point of view)  Parameter estimation for a single node  Assume local parameter independence  For a binary variable (for example, a coin toss)  prior: Beta distribution, Beta(a, b)  after we have observed m heads and N − m tails, the posterior is Beta(a + m, b + N − m) and $P(X = \mathit{head}) = \dfrac{a + m}{a + b + N}$ (conjugacy of the Beta and binomial distributions) © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 62
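A tiny numerical sketch of this conjugate update (the pseudo-counts and data below are our own assumed values, chosen only for illustration):

```python
# Beta-binomial conjugate update for P(heads); scipy is used only for the mean.
from scipy import stats

a, b = 2, 2          # Beta(a, b) prior pseudo-counts (assumed)
m, N = 7, 10         # observed data: 7 heads out of 10 tosses (assumed)

posterior = stats.beta(a + m, b + N - m)   # posterior is Beta(a+m, b+N-m)
print(posterior.mean())                    # (a+m)/(a+b+N) = 9/14 ≈ 0.643
```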
  • 63. Learning Parameters of Bayesian Networks  Simple ‘counting’ solution (Bayesian point of view)  For a multinomial variable (for example, a die toss)  prior: Dirichlet distribution, Dirichlet(a1, a2, …, ad)  $P(X = k) = a_k / N$, where $N = \sum_k a_k$  After observing state i: Dirichlet(a1, …, ai + 1, …, ad) (conjugacy of the Dirichlet and multinomial distributions)  For an entire network  We simply iterate over its nodes  In the case of incomplete data  In real data, many of the variable values may be incorrect or missing  The usual approximate solution is given by Gibbs sampling or the EM (expectation-maximization) technique © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 63
  • 64. Learning Parameters of Bayesian Networks  Smoothing  Another viewpoint  Laplace smoothing (additive smoothing), given observed counts $(x_1, x_2, \ldots, x_d)$ for the d states of a variable X: $P(X = k) = \dfrac{x_k + \alpha}{N + \alpha d}, \; k = 1, \ldots, d \; (\alpha = \alpha_1 = \alpha_2 = \cdots = \alpha_d)$  From a Bayesian point of view, this corresponds to the expected value of the posterior distribution, using a symmetric Dirichlet distribution with parameter α as a prior  Additive smoothing is commonly a component of naïve Bayes classifiers © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 64
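A minimal sketch of additive smoothing in plain Python (the function name and example counts are ours, not from the lecture):

```python
# Laplace (additive) smoothing: P(X=k) = (x_k + alpha) / (N + alpha * d).
def smoothed_estimate(counts, alpha=1.0):
    n = sum(counts)          # N: total number of observations
    d = len(counts)          # d: number of states of X
    return [(x + alpha) / (n + alpha * d) for x in counts]

# A state never observed still receives non-zero probability:
print(smoothed_estimate([7, 3, 0]))   # [8/13, 4/13, 1/13] ≈ [0.615, 0.308, 0.077]
```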
  • 65. Learning the Graph Structure  Learning the graph structure itself from data requires  A space of possible structures  A measure that can be used to score each structure  From a Bayesian viewpoint: compute a posterior score p(m | D) for each candidate model m  Tough points  Marginalization over latent variables => a challenging computational problem  Exploring the space of structures can also be problematic  The number of different graph structures grows super-exponentially with the number of nodes  Usually we resort to heuristics  Local score based, global score based, conditional independence test based, … © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 65
  • 66. Bayesian Networks as Tools for AI  Learning  Extracting and encoding knowledge from data  Knowledge is represented as probabilistic relationships among variables  Causal relationships  A network of variables  A common framework for machine learning models  Supervised and unsupervised learning  Knowledge Representation & Reasoning  Bayesian networks can be constructed from prior knowledge alone  The constructed model can be used for reasoning based on probabilistic inference methods  Expert Systems  Uncertain expert knowledge can be encoded into a Bayesian network  The DAG in a Bayesian network is hand-constructed by domain experts  The conditional probabilities are then assessed by the expert, learned from data, or obtained using a combination of both techniques  Bayesian network-based expert systems are popular  Planning  In a somewhat different form, known as decision graphs or influence diagrams  We do not cover this direction here © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 66
  • 67. Advantages of Bayesian Networks for Data Analysis  Ability to handle missing data  Because the model encodes dependencies among all variables  Learning causal relationships  Can be used to gain understanding of a problem domain  Can be used to predict the consequences of intervention  Having both causal and probabilistic semantics  An ideal representation for combining prior knowledge (which often comes in causal form) and data  An efficient and principled approach for avoiding the overfitting of data  By Bayesian statistical methods in conjunction with Bayesian networks © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 67 (summary from the abstract of D. Heckerman’s tutorial on BNs; read the ‘Introduction’ section for detailed explanations)
  • 68. References  K. Mohan & J. Pearl, UAI ’12 Tutorial on Graphical Models for Causal Inference  S. Roweis, MLSS ’05 Lecture on Probabilistic Graphical Models  C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006 (Chapters 1, 2, and 8 (Graphical Models))  D. Heckerman, A Tutorial on Learning with Bayesian Networks  R. E. Neapolitan, Learning Bayesian Networks, Pearson Prentice Hall, 2004 © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 68
  • 69. More Textbooks and Courses  https://www.coursera.org/course/pgm : Probabilistic Graphical Models by D. Koller © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 69
  • 70. APPENDIX © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 70
  • 71. Learning Parameters of Bayesian Networks  Case 1: X is a binary variable  F: beta distribution, X: Bernoulli or binomial distribution  Ex) If F ~ Beta(a, b), then $P(X = 1) = a/N$, where $N = a + b$  Case 2: X is a multinomial variable  F: Dirichlet distribution, X: multinomial distribution  Ex) If F ~ Dirichlet(a1, a2, …, ad), then $P(X = k) = a_k / N$, where $N = \sum_k a_k$  Laplace smoothing (additive smoothing) given observed counts $(x_1, x_2, \ldots, x_d)$: $P(X = k) = \dfrac{x_k + \alpha}{N + \alpha d}, \; k = 1, \ldots, d \; (\alpha = \alpha_1 = \alpha_2 = \cdots = \alpha_d)$ © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 71 F → X: the probability distribution of F represents our belief concerning the relative frequency with which X takes each of its values.
  • 72. Graphical interpretation of Bayes’ theorem  Given the structure $p(x, y) = p(x)\, p(y \mid x)$  We observe the value of y  Goal: infer the posterior distribution over x, $p(x \mid y)$  The marginal distribution $p(x)$ is a prior over the latent variable x  We can evaluate the marginal distribution $p(y) = \sum_{x'} p(y \mid x')\, p(x')$  By Bayes’ theorem we can then calculate $p(x \mid y) = \dfrac{p(y \mid x)\, p(x)}{p(y)}$ © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 72 (figure panels (a)-(c): the graph x → y, the observation of y, and the reversed dependence after inference)
  • 73. d-separation  Directed factorization  Filtering: can a given distribution be expressed in terms of the factorization implied by the graph?  If we present to the filter the set of all possible distributions p(x) over the set of variables X, then the subset of distributions that pass the filter is denoted DF (directed factorization)  Fully connected graph: the set DF contains all possible distributions  Fully disconnected graph: only the joint distributions which factorize into the product of the marginal distributions over the variables © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 73
  • 74. Gaussian distribution © 2014-2015, SNU CSE Biointelligence Lab., http://bi.snu.ac.kr 74

Univariate Gaussian: $\mathcal{N}(x \mid \mu, \sigma^2) = \dfrac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{-\dfrac{1}{2\sigma^2}(x - \mu)^2\right\}$

Multivariate Gaussian: $\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \dfrac{1}{(2\pi)^{D/2}\, |\boldsymbol{\Sigma}|^{1/2}} \exp\left\{-\dfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{T} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right\}$