Bayesian Network Modelling
with Examples in Genetics and Systems Biology
Marco Scutari
scutari@stats.ox.ac.uk
Department of Statistics
University of Oxford
September 29, 2016
What Are Bayesian Networks?
A Graph and a Probability Distribution
Bayesian networks (BNs) are defined by:
• a network structure, a directed acyclic graph G = (V, A), in which
each node vi ∈ V corresponds to a random variable Xi;
• a global probability distribution X with parameters Θ, which can
be factorised into smaller local probability distributions according to
the arcs aij ∈ A present in the graph.
The main role of the network structure is to express the conditional
independence relationships among the variables in the model through
graphical separation, thus specifying the factorisation of the global
distribution:
P(X) = ∏_{i=1}^{p} P(Xi | ΠXi ; ΘXi) where ΠXi = {parents of Xi}
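A minimal Python sketch of this factorisation (the CPTs are hypothetical,
not taken from the slides): evaluating the joint distribution reduces to one
lookup and one product term per node.

# A toy discrete BN over A -> C <- B, with hypothetical CPTs.
# P(A, B, C) = P(A) P(B) P(C | A, B): one local term per node.
cpt_A = {True: 0.3, False: 0.7}
cpt_B = {True: 0.6, False: 0.4}
cpt_C = {  # keyed by (a, b): P(C = True | A = a, B = b)
    (True, True): 0.9, (True, False): 0.5,
    (False, True): 0.4, (False, False): 0.1,
}

def joint(a, b, c):
    """Evaluate P(A = a, B = b, C = c) from the local distributions."""
    p_c = cpt_C[(a, b)] if c else 1 - cpt_C[(a, b)]
    return cpt_A[a] * cpt_B[b] * p_c

# Sanity check: the joint sums to one over all configurations.
assert abs(sum(joint(a, b, c)
               for a in (True, False)
               for b in (True, False)
               for c in (True, False)) - 1) < 1e-12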
Key Books to Reference
(Best perused as ebooks; the Koller & Friedman is ≈ 2½ inches thick.)
How the DAG Maps to the Probability Distribution
[Figure: a DAG over the nodes A, B, C, D, E, F; graphical separation in the
DAG corresponds to probabilistic independence in the distribution.]
Formally, the DAG is an independence map of the probability
distribution of X, with graphical separation (⊥⊥G) implying probabilistic
independence (⊥⊥P ).
Graphical Separation in DAGs (Fundamental Connections)
separation (undirected graphs)
d-separation (directed acyclic graphs)
[Figure: the fundamental connections over three nodes A, B, C: the serial,
divergent and convergent patterns, compared under separation and
d-separation.]
Graphical Separation in DAGs (General Case)
Now, in the general case we can extend the patterns from the
fundamental connections and apply them to every possible path between
A and B for a given C; this is how d-separation is defined.
If A, B and C are three disjoint subsets of nodes in a directed
acyclic graph G, then C is said to d-separate A from B,
denoted A ⊥⊥G B | C, if along every path between a node in
A and a node in B there is a node v satisfying one of the
following two conditions:
1. v has converging edges (i.e. there are two edges pointing
to v from the adjacent nodes in the path) and none of v
or its descendants (i.e. the nodes that can be reached
from v) are in C.
2. v is in C and does not have converging edges.
This definition clearly does not provide a computationally feasible
approach to assess d-separation; but there are other ways.
A Simple Algorithm to Check D-Separation (I)
[Figure: the example DAG over A, B, C, D, E, F and the subgraph containing
only the ancestors of A, E and B.]
Say we want to check whether A and E are d-separated by B. First, we
can drop all the nodes that are not ancestors (i.e. parents, parents’
parents, etc.) of A, E and B since each node only depends on its
parents.
A Simple Algorithm to Check D-Separation (II)
[Figure: the ancestral subgraph over A, B, C, E and the corresponding moral
graph.]
Transform the subgraph into its moral graph by
1. connecting all nodes that have one parent in common; and
2. removing all arc directions to obtain an undirected graph.
This transformation has the double effect of making the dependence
between parents explicit by “marrying” them and of allowing us to use
the classic definition of graphical separation.
A Simple Algorithm to Check D-Separation (III)
[Figure: the moral graph over A, B, C, E on which the search is performed.]
Finally, we can just perform e.g. a depth-first or breadth-first search and
see whether we can find an open path between A and E, that is, a path that
is not blocked by B.
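A minimal Python sketch of the whole three-step check (ancestral subgraph,
moral graph, breadth-first search), assuming the DAG is stored as a map from
each node to the set of its parents; all names are hypothetical.

from collections import deque

def d_separated(parents, x, y, given):
    """True if x and y are d-separated by the set `given`, following the
    slides: (1) keep only the ancestors of {x, y} and the conditioning set,
    (2) moralise, (3) search for an open path avoiding `given`."""
    # Step 1: ancestral subgraph.
    keep, stack = set(), [x, y, *given]
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, ()))
    # Step 2: moral graph -- marry co-parents, drop arc directions.
    adj = {v: set() for v in keep}
    for child in keep:
        pa = [p for p in parents.get(child, ()) if p in keep]
        for p in pa:
            adj[p].add(child)
            adj[child].add(p)
        for i, p in enumerate(pa):
            for q in pa[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    # Step 3: breadth-first search for a path not blocked by `given`.
    seen, queue = {x}, deque([x])
    while queue:
        for nxt in adj[queue.popleft()]:
            if nxt == y:
                return False  # open path found: not d-separated
            if nxt not in seen and nxt not in given:
                seen.add(nxt)
                queue.append(nxt)
    return True

# The DAG from the slides: A -> C <- B, C -> D, C -> E, D -> F.
dag = {"C": {"A", "B"}, "D": {"C"}, "E": {"C"}, "F": {"D"}}
print(d_separated(dag, "A", "E", {"C"}))  # True: B blocks nothing, C does
print(d_separated(dag, "A", "E", set()))  # False: A -> C -> E is open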
Completely D-Separating: Markov Blankets
[Figure: the Markov blanket of A, comprising its parents, its children and
the children's other parents (spouses).]
We can easily use the DAG to solve
the feature selection problem. The
set of nodes that graphically
isolates a target node from the rest
of the DAG is called its Markov
blanket and includes:
• its parents;
• its children;
• other nodes sharing a child.
Since ⊥⊥G implies ⊥⊥P , we can
restrict ourselves to the Markov
blanket to perform any kind of
inference on the target node, and
disregard the rest.
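A minimal Python sketch, again storing the DAG as a map from each node to
the set of its parents:

def markov_blanket(parents, target):
    """Parents, children, and the children's other parents (spouses)."""
    children = {v for v, pa in parents.items() if target in pa}
    spouses = {p for c in children for p in parents[c]} - {target}
    return set(parents.get(target, ())) | children | spouses

# For the DAG A -> C <- B, C -> D, C -> E, D -> F:
dag = {"C": {"A", "B"}, "D": {"C"}, "E": {"C"}, "F": {"D"}}
print(sorted(markov_blanket(dag, "C")))  # ['A', 'B', 'D', 'E']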
Different DAGs, Same Distribution: Topological Ordering
A DAG uniquely identifies a factorisation of P(X); the converse is not
true. Consider again the DAG on the left:
P(X) = P(A) P(B) P(C | A, B) P(D | C) P(E | C) P(F | D).
We can rearrange the dependencies using Bayes' theorem to obtain:
P(X) = P(A | B, C) P(B | C) P(C | D) P(D | F) P(E | C) P(F),
which gives the DAG on the right, with a different topological ordering.
[Figure: the two DAGs over A, B, C, D, E, F corresponding to the two
factorisations.]
Different DAGs, Same Distribution: Equivalence Classes
On a smaller scale, even keeping the same underlying undirected graph
we can reverse a number of arcs without changing the dependence
structure of X. Since the triplets A → B → C and A ← B → C are
probabilistically equivalent, we can reverse the directions of their arcs as
we like as long as we do not create any new v-structure (A → B ← C,
with no arc between A and C).
This means that we can group DAGs into equivalence classes that are
uniquely identified by the underlying undirected graph and the
v-structures. The directions of other arcs can be either:
• uniquely identifiable because one of the directions would introduce
cycles or new v-structures in the graph (compelled arcs);
• completely undetermined.
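A minimal Python sketch that enumerates the v-structures of a DAG stored as
a node-to-parents map; together with the skeleton, these identify the
equivalence class:

def v_structures(parents):
    """List the triplets (A, B, C) with A -> B <- C and no arc A - C."""
    def adjacent(a, c):
        return a in parents.get(c, set()) or c in parents.get(a, set())
    found = []
    for b, pa in parents.items():
        pa = sorted(pa)
        for i, a in enumerate(pa):
            for c in pa[i + 1:]:
                if not adjacent(a, c):
                    found.append((a, b, c))
    return found

# The DAG A -> C <- B, C -> D, C -> E, D -> F has a single v-structure:
dag = {"C": {"A", "B"}, "D": {"C"}, "E": {"C"}, "F": {"D"}}
print(v_structures(dag))  # [('A', 'C', 'B')]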
Completed Partially Directed Acyclic Graphs (CPDAGs)
[Figure: three equivalent DAGs over A, B, C, D, E, F and the CPDAG that
represents their common equivalence class.]
What About the Probability Distributions?
The second component of a BN is the probability distribution P(X).
The choice should be such that the BN:
• can be learned efficiently from data;
• is flexible (distributional assumptions should not be too strict);
• is easy to query to perform inference.
The three most common choices in the literature (by far) are:
• discrete BNs (DBNs), in which X and the Xi | ΠXi are
multinomial;
• Gaussian BNs (GBNs), in which X is multivariate normal and the
Xi | ΠXi are univariate normal;
• conditional linear Gaussian BNs (CLGBNs), in which X is a
mixture of multivariate normals and the Xi | ΠXi are either
multinomial, univariate normal or mixtures of normals.
It has been proved in the literature that exact inference is possible in
these three cases, hence their popularity.
Discrete Bayesian Networks
[Figure: the ASIA network over the binary variables visit to Asia?,
smoking?, tuberculosis?, lung cancer?, bronchitis?, either tuberculosis or
lung cancer?, positive X-ray? and dyspnoea?]
A classic example of DBN is
the ASIA network from
Lauritzen & Spiegelhalter
(1988), which includes a
collection of binary variables.
It describes a simple
diagnostic problem for
tuberculosis and lung cancer.
Total parameters of X: 2⁸ − 1 = 255
Conditional Probability Tables (CPTs)
[Figure: the CPT attached to each node of the ASIA network, one column for
each configuration of the values of its parents.]
The local distributions
Xi | ΠXi take the form
of conditional
probability tables for
each node given all the
configurations of the
values of its parents.
Overall parameters of
the Xi | ΠXi : 18
Gaussian Bayesian Networks
[Figure: the MARKS network over mechanics, vectors, algebra, analysis and
statistics.]
A classic example of GBN is the MARKS network from Mardia, Kent & Bibby
(1979), which describes the relationships between the marks on 5
math-related topics.
Assuming X ∼ N(µ, Σ), we can compute Ω = Σ⁻¹. Then Ωij = 0
implies Xi ⊥⊥P Xj | X \ {Xi, Xj}. The absence of an arc Xi → Xj in
the DAG implies Xi ⊥⊥G Xj | X \ {Xi, Xj}, which in turn implies
Xi ⊥⊥P Xj | X \ {Xi, Xj}.
Total parameters of X: 5 + 15 = 20
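A small numpy sketch of this correspondence, using a hypothetical
3-variable covariance matrix in which X1 and X3 are linked only through X2:

import numpy as np

# Hypothetical covariance (not the MARKS data): corr(X1, X3) equals
# corr(X1, X2) * corr(X2, X3), so X1 and X3 are independent given X2.
Sigma = np.array([[1.00, 0.50, 0.25],
                  [0.50, 1.00, 0.50],
                  [0.25, 0.50, 1.00]])
Omega = np.linalg.inv(Sigma)

# Partial correlations: rho_ij = -Omega_ij / sqrt(Omega_ii * Omega_jj).
d = np.sqrt(np.diag(Omega))
partial = -Omega / np.outer(d, d)
print(np.round(Omega, 3))    # the (1, 3) entry is zero ...
print(np.round(partial, 3))  # ... and so is the partial correlation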
Partial Correlations and Linear Regressions
The local distributions Xi | ΠXi take the form of linear regression
models with the ΠXi acting as regressors and with independent error
terms.
ALG = 50.60 + εALG, εALG ∼ N(0, 112.8)
ANL = −3.57 + 0.99 ALG + εANL, εANL ∼ N(0, 110.25)
MECH = −12.36 + 0.54 ALG + 0.46 VECT + εMECH, εMECH ∼ N(0, 195.2)
STAT = −11.19 + 0.76 ALG + 0.31 ANL + εSTAT, εSTAT ∼ N(0, 158.8)
VECT = 12.41 + 0.75 ALG + εVECT, εVECT ∼ N(0, 109.8)
(That is because Ωij ∝ βj in the regression for Xi, so βj ≠ 0 if and only if
Ωij ≠ 0. Also Ωij ∝ ρij, the partial correlation between Xi and Xj, so we are
implicitly assuming all probabilistic dependencies are linear.)
Overall parameters of the Xi | ΠXi : 11 + 5 = 16
Conditional Linear Gaussian Bayesian Networks
CLGBNs contain both discrete and continuous nodes, and combine
DBNs and GBNs as follows to obtain a mixture-of-Gaussians network:
• continuous nodes cannot be parents of discrete nodes;
• the local distribution of each discrete node is a CPT;
• the local distribution of each continuous node is a set of linear
regression models, one for each configuration of the discrete
parents, with the continuous parents acting as regressors.
[Figure: the RATS' WEIGHTS network over drug, sex, weight loss (week 1) and
weight loss (week 2).]
One of the classic examples is
the RATS’ WEIGHTS network
from Edwards (1995), which
describes weight loss in a drug
trial performed on rats.
Mixtures of Linear Regressions
The resulting local distribution for the first weight loss for drugs D1, D2
and D3 is:
W1,D1 = 7 + εD1, εD1 ∼ N(0, 2.5)
W1,D2 = 7.50 + εD2, εD2 ∼ N(0, 2)
W1,D3 = 14.75 + εD3, εD3 ∼ N(0, 11)
with just the intercepts since the node has no continuous parents. The
local distribution for the second loss is:
W2,D1 = 1.02 + 0.89 W1 + εD1, εD1 ∼ N(0, 3.2)
W2,D2 = −1.68 + 1.35 W1 + εD2, εD2 ∼ N(0, 4)
W2,D3 = −1.83 + 0.82 W1 + εD3, εD3 ∼ N(0, 1.9)
Overall, they look like random effect models with random intercepts and
random slopes.
Case Study:
A Protein Signalling Network
Source and Overview of the Data
Causal Protein-Signalling Networks Derived from
Multiparameter Single Cell Data
Karen Sachs, et al., Science, 308, 523 (2005);
DOI: 10.1126/science.1105809
That’s a landmark paper in applying BNs because it highlights the use of
interventional data; and because results are validated using existing literature.
The data consist in the 5400 simultaneous measurements of 11 phosphorylated
proteins and phospholypids:
ˆ 1800 data subject only to general stimolatory cues, so that the protein
signalling paths are active;
ˆ 600 data with with specific stimolatory/inhibitory cues for each of the
following 4 proteins: Mek, PIP2, Akt, PKA;
ˆ 1200 data with specific cues for PKA.
The goal of the analysis is to learn what relationships link these 11 proteins,
that is, the signalling pathways they are part of.
Analysis and Validated Network
[Figure: the validated network over Akt, Erk, Jnk, Mek, P38, PIP2, PIP3,
PKA, PKC, Plcg and Raf.]
1. Outliers were removed and the data
were discretised, since it was
impossible to model them with a
GBN.
2. A large number of DAGs were
learned and averaged to produce a
more robust model. The averaged
DAG was created using the arcs
present in at least 85% of the DAGs.
3. The validity of the averaged BN was
evaluated against established
signalling pathways from literature.
Bayesian Network Structure Learning
Learning a BN B = (G, Θ) from a data set D is performed in two steps:
P(B | D) = P(G, Θ | D) = P(G | D) · P(Θ | G, D),
where estimating P(G | D) is structure learning and estimating P(Θ | G, D)
is parameter learning.
In a Bayesian setting structure learning consists of finding the DAG with the
best P(G | D). We can decompose P(G | D) into
P(G | D) ∝ P(G) P(D | G) = P(G) ∫ P(D | G, Θ) P(Θ | G) dΘ
where P(G) is the prior distribution over the space of the DAGs and P(D | G)
is the marginal likelihood of the data given G averaged over all possible
parameter sets Θ; and then
P(D | G) = ∏_{i=1}^{N} ∫ P(Xi | ΠXi, ΘXi) P(ΘXi | ΠXi) dΘXi.
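For discrete BNs these node-wise integrals have a closed form under a
Dirichlet prior; here is a minimal Python sketch of the resulting node score
for the common BDeu prior, where the argument names and the flat data layout
are my own assumptions:

from math import lgamma
from collections import Counter

def bdeu_node_score(x, parent_configs, r, q, iss=1.0):
    """Log marginal likelihood of one discrete node: `x` holds the node's
    observed states, `parent_configs` the (tuple-valued) configuration of
    its parents for each sample, `r` and `q` the number of node states and
    parent configurations, `iss` the imaginary sample size."""
    a_jk = iss / (r * q)  # Dirichlet hyperparameter per (config, state) cell
    a_j = iss / q         # ... summed over the states of the node
    n_j = Counter(parent_configs)
    n_jk = Counter(zip(parent_configs, x))
    score = sum(lgamma(a_j) - lgamma(a_j + n) for n in n_j.values())
    score += sum(lgamma(a_jk + n) - lgamma(a_jk) for n in n_jk.values())
    return score

The network score is then just the sum of the node scores, which is what
makes the greedy search on the next slide cheap to update.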
The Hill-Climbing Algorithm
The most common score-based structure learning algorithm, in which
we are looking for the DAG that maximises a score such as the posterior
P(G | D) or BIC, is a greedy search such as hill-climbing:
1. Choose an initial DAG G, usually (but not necessarily) empty.
2. Compute the score of G, denoted as ScoreG = Score(G).
3. Set maxscore = ScoreG.
4. Repeat the following steps as long as maxscore increases:
4.1 for every possible arc addition, deletion or reversal not introducing
cycles:
4.1.1 compute the score of the modified DAG G∗, ScoreG∗ = Score(G∗);
4.1.2 if ScoreG∗ > ScoreG, set G = G∗ and ScoreG = ScoreG∗.
4.2 update maxscore with the new value of ScoreG.
5. Return the DAG G.
Only one local distribution changes in each step, which makes the algorithm
computationally efficient and easy to speed up with caching.
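A compact Python sketch of this search, assuming the DAG is a map from each
node to the set of its parents and score() is a user-supplied network score
(for instance BIC, or a sum of the node scores sketched earlier); all names
are hypothetical.

def is_acyclic(dag):
    """Depth-first cycle check on a node -> set-of-parents map."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {v: WHITE for v in dag}
    def has_cycle(v):
        colour[v] = GREY
        for p in dag[v]:
            if colour[p] == GREY or (colour[p] == WHITE and has_cycle(p)):
                return True
        colour[v] = BLACK
        return False
    return not any(colour[v] == WHITE and has_cycle(v) for v in dag)

def hill_climbing(nodes, score):
    """Greedy search over single-arc additions, deletions and reversals,
    accepting any modification that improves the score (step 4.1.2)."""
    dag = {v: set() for v in nodes}  # start from the empty DAG
    current = score(dag)
    improved = True
    while improved:
        improved = False
        for x in nodes:
            for y in nodes:
                if x == y:
                    continue
                candidate = {v: set(pa) for v, pa in dag.items()}
                if x in candidate[y]:
                    candidate[y].discard(x)  # deletion of x -> y ...
                    reversal = {v: set(pa) for v, pa in candidate.items()}
                    reversal[x].add(y)       # ... or its reversal
                    options = [candidate, reversal]
                else:
                    candidate[y].add(x)      # addition of x -> y
                    options = [candidate]
                for g in options:
                    if not is_acyclic(g):
                        continue
                    s = score(g)
                    if s > current:
                        dag, current, improved = g, s, True
    return dag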
Learning Multiple DAGs from the Data
Searching from different starting points increases our coverage of the
space of the possible DAGs; the frequency with which an arc appears is
a measure of the strength of the dependence.
Model Averaging for DAGs
[Figure: the ECDF of the arc strengths, with the significant arcs, the
estimated threshold and Sachs' threshold marked.]
Arcs with significant strength can be identified using a threshold estimated
from the data by minimising the distance between the observed ECDF and the
ideal, asymptotic one (the blue area in the right panel).
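A deliberately simplified Python sketch of this idea (plain grid search,
hypothetical names; the actual estimator is more refined): pick the
threshold whose 0-to-1 step function is closest in L1 distance to the
observed ECDF of the arc strengths.

import numpy as np

def strength_threshold(strengths, grid=1001):
    """Choose t so that the ideal ECDF, a step from 0 to 1 at t, is as
    close as possible to the observed ECDF of the arc strengths."""
    s = np.sort(np.asarray(strengths))
    xs = np.linspace(0, 1, grid)
    ecdf = np.searchsorted(s, xs, side="right") / len(s)
    dists = [np.mean(np.abs(ecdf - (xs >= t))) for t in xs]
    return xs[int(np.argmin(dists))]

# Arcs whose strength (frequency across the learned DAGs) exceeds the
# estimated threshold are kept in the averaged network.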
Combining Observational and Interventional Data
[Figure: the model learned without interventions and the model learned with
interventions, both over Akt, Erk, Jnk, Mek, P38, PIP2, PIP3, PKA, PKC,
Plcg and Raf.]
Observations must be scored taking into account the effects of the
interventions, which break biological pathways; the overall network score is a
mixture of scores adjusted for each experiment.
Using The Protein Network to Plan Experiments
This idea goes by the name of hypothesis generation: using a statistical
model to decide which follow-up experiments to perform. BNs are
especially easy to use for this because they automate the computation
of the probabilities of arbitrary events.
[Figure: P(Akt) and P(PKA) over the levels LOW, AVG and HIGH, without and
with intervention.]
Conditional Probability Queries
DBNs, GBNs, CLGBNs all have exact inference algorithms to compute
conditional probabilities for an event given some evidence. However,
approximate inference scales better and the same algorithms can be
used for all kinds of BNs. An example is likelihood weighting below.
Input: a BN B = (G, Θ), a query Q = q and a set of evidence E.
Output: an approximate estimate of P(Q | E, G, Θ).
1. Order the Xi according to the topological ordering in G.
2. Set wE = 0 and wE,q = 0.
3. For a suitably large number of samples x = (x1, . . . , xp):
3.1 generate xi, i = 1, . . . , p from Xi | ΠXi, fixing the values
e1, . . . , ek specified by the hard evidence E for Xi1, . . . , Xik;
3.2 compute the weight wx = ∏_{j=1}^{k} P(Xij = ej | ΠXij);
3.3 set wE = wE + wx;
3.4 if x includes Q = q, set wE,q = wE,q + wx.
4. Estimate P(Q | E, G, Θ) with wE,q/wE.
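A Python sketch of this algorithm, with the local distributions of the BN
abstracted behind two hypothetical callbacks, plus a toy two-node network
(hypothetical CPTs) whose exact answer is ≈ 0.69:

import random

def likelihood_weighting(order, sample, prob, evidence, query, n=10_000):
    """`order` is a topological ordering of the nodes; `sample(v, x)` draws
    a value for v given the partial assignment x; `prob(v, value, x)`
    returns P(v = value | parents of v in x)."""
    w_e = w_eq = 0.0
    for _ in range(n):
        x, w = {}, 1.0
        for v in order:
            if v in evidence:
                x[v] = evidence[v]
                w *= prob(v, evidence[v], x)  # weight by the evidence
            else:
                x[v] = sample(v, x)
        w_e += w
        if all(x[v] == value for v, value in query.items()):
            w_eq += w
    return w_eq / w_e

# Toy network Rain -> WetGrass with hypothetical CPTs.
parents = {"Rain": (), "WetGrass": ("Rain",)}
cpt = {"Rain": {(): 0.2}, "WetGrass": {(True,): 0.9, (False,): 0.1}}

def prob(v, value, x):
    p_true = cpt[v][tuple(x[p] for p in parents[v])]
    return p_true if value else 1 - p_true

def sample(v, x):
    return random.random() < prob(v, True, x)

# P(Rain | WetGrass = True) = 0.18 / 0.26, approximately 0.69.
print(likelihood_weighting(["Rain", "WetGrass"], sample, prob,
                           evidence={"WetGrass": True},
                           query={"Rain": True}))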
Case Study:
Genome-Wide Predictions
MAGIC Populations: Wheat and Rice
Sequence data (e.g. SNP markers) is routinely used in statistical genetics to
understand the genetic basis of human diseases, and to breed traits of
commercial interest in plants and animals. Multiparent (MAGIC) populations
are ideal for the latter. Here we consider two:
• A winter wheat population: 721 varieties, 16K markers, 7 traits.
• An indica rice population: 1087 varieties, 4K markers, 10 traits.
Phenotypic traits include flowering time, height, yield and a number of
disease scores, plus physical and quality traits for the grains in rice.
The goal of the analysis is to find the key markers controlling the traits
and the causal relationships between them, while keeping good predictive
accuracy.
Multiple Quantitative Trait Analysis Using Bayesian
Networks
Marco Scutari, et al., Genetics, 198, 129–137 (2014);
DOI: 10.1534/genetics.114.165704
Bayesian Networks for Selection and Association Studies
If we have a set of traits and markers for each variety, all we need are
the Markov blankets of the traits; most markers are discarded in the
process. Using common sense, we can make some assumptions:
• traits can depend on markers, but not vice versa;
• dependencies between traits should follow the order of the
respective measurements (e.g. longitudinal traits, traits measured
before and after harvest, etc.);
• dependencies in multiple kinds of genetic data (e.g. SNPs + gene
expression or SNPs + methylation) should follow the central
dogma of molecular biology.
Assumptions on the direction of the dependencies allow us to reduce
learning the Markov blankets to learning the parents and the children of
each trait, which is a much simpler task.
Parametric Assumptions
In the spirit of classic additive genetics models, we use a Gaussian BN.
Then the local distribution of each trait Ti is a linear regression model
Ti = µTi + ΠTi βTi + εTi
   = µTi + (Tj βTj + . . . + Tk βTk) [traits] + (Gl βGl + . . . + Gm βGm) [markers] + εTi
and the local distribution of each marker Gi is likewise
Gi = µGi + ΠGi βGi + εGi
   = µGi + (Gl βGl + . . . + Gm βGm) [markers] + εGi
in which the regressors (ΠTi or ΠGi ) are treated as fixed effects. ΠTi
can be interpreted as causal effects for the traits, ΠGi as markers being
in linkage disequilibrium with each other.
Learning the Bayesian Network (I)
1. Feature Selection.
1.1 Independently learn the parents and the children of each trait with
the SI-HITON-PC algorithm; children can only be other traits,
parents are mostly markers, spouses can be either. Both are selected
using the exact Student’s t test for partial correlations.
1.2 Drop all the markers that are not parents of any trait.
[Figure: the parents and children of each trait T1, T2, T3, T4; markers
that are not in the Markov blanket of any trait are redundant and dropped.]
Constraint-Based Structure Learning Algorithms
[Figure: a CPDAG over A, B, C, D, E, F; graphical separation corresponds to
conditional independence tests.]
The mapping between edges and conditional independence relationships
lies at the core of BNs; therefore, one way to learn the structure of a
BN is to check which such relationships hold using a suitable conditional
independence test. Such an approach results in a set of conditional
independence constraints that identify a single equivalence class.
The Semi-Interleaved HITON-PC Algorithm
Input: each trait Ti in turn, other traits (Tj) and all markers (Gl), a
significance threshold α.
Output: the set CPC of parents and children of Ti in the BN.
1. Perform a marginal independence test between Ti and each Tj (Ti ⊥⊥ Tj)
and Gl (Ti ⊥⊥ Gl) in turn.
2. Discard all Tj and Gl whose p-values are greater than α.
3. Set CPC = ∅.
4. For each Tj and Gl in order of increasing p-value:
4.1 Perform a conditional independence test between Ti and Tj/Gl
conditional on all possible subsets Z of the current CPC
(Ti ⊥⊥ Tj | Z ⊆ CPC or Ti ⊥⊥ Gl | Z ⊆ CPC).
4.2 If the p-value is smaller than α for all subsets then
CPC = CPC ∪ {Tj} or CPC = CPC ∪ {Gl}.
NOTE: the algorithm is defined for a generic independence test, you can plug
in any test that is appropriate for the data.
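A Python sketch of this algorithm, using the exact Student's t test for
partial correlations; the column-dict data layout and the cap on the size of
the conditioning sets are my own simplifications (the algorithm as stated
tests all subsets of the current CPC):

import numpy as np
from itertools import combinations
from scipy import stats

def partial_corr_pvalue(x, y, Z):
    """Exact t test for the partial correlation of x and y given the
    columns of Z (an (n, k) array, possibly with k = 0)."""
    n, k = len(x), (Z.shape[1] if Z.size else 0)
    if k:
        A = np.column_stack([np.ones(n), Z])  # regress out Z
        x = x - A @ np.linalg.lstsq(A, x, rcond=None)[0]
        y = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    r = np.corrcoef(x, y)[0, 1]
    t = r * np.sqrt((n - k - 2) / (1 - r ** 2))
    return 2 * stats.t.sf(abs(t), df=n - k - 2)

def si_hiton_pc(target, candidates, data, alpha=0.05, max_z=2):
    """Semi-Interleaved HITON-PC for one target: screen the candidates
    marginally, then admit them in order of increasing p-value unless some
    subset of the current CPC makes them independent of the target."""
    y = data[target]
    no_z = np.empty((len(y), 0))
    pvals = {c: partial_corr_pvalue(data[c], y, no_z) for c in candidates}
    cpc = []
    for c in sorted((c for c in candidates if pvals[c] <= alpha),
                    key=pvals.get):
        separated = False
        for size in range(1, min(len(cpc), max_z) + 1):
            for Z in combinations(cpc, size):
                Zm = np.column_stack([data[z] for z in Z])
                if partial_corr_pvalue(data[c], y, Zm) > alpha:
                    separated = True
                    break
            if separated:
                break
        if not separated:
            cpc.append(c)
    return cpc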
Learning the Bayesian Network (II)
2. Structure Learning. Learn the structure of the network from the nodes
selected in the previous step, setting the directions of the arcs according
to the assumptions above. The optimal structure can be identified with a
suitable goodness-of-fit criterion such as BIC. This follows the spirit of
other hybrid approaches that have been shown to perform well in the
literature.
[Figure: from the empty network to the learned network.]
Learning the Bayesian Network (III)
3. Parameter Learning. Learn the parameters: each local distribution is a
linear regression and the global distribution is a hierarchical linear model.
Typically least squares works well because SI-HITON-PC selects sets of
weakly correlated parents; ridge regression can be used otherwise.
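A minimal numpy sketch of fitting one Gaussian local distribution, with the
ridge penalty as an option; the names and the column-dict data layout are
hypothetical.

import numpy as np

def fit_local_distribution(child, parents, data, ridge=0.0):
    """Intercept, one coefficient per parent, and residual variance,
    by (optionally ridge-penalised) least squares."""
    y = data[child]
    X = np.column_stack([np.ones(len(y))] + [data[p] for p in parents])
    if ridge:
        P = ridge * np.eye(X.shape[1])
        P[0, 0] = 0.0  # do not shrink the intercept
        beta = np.linalg.solve(X.T @ X + P, X.T @ y)
    else:
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
    residuals = y - X @ beta
    sigma2 = residuals @ residuals / (len(y) - X.shape[1])
    return beta, sigma2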
[Figure: the learned network and its local distributions.]
Model Averaging and Assessing Predictive Accuracy
We perform all the above in 10 runs of 10-fold cross-validation to
• assess predictive accuracy with e.g. predictive correlation;
• obtain a set of DAGs to produce an averaged, de-noised consensus DAG
with model averaging.
WHEAT: a Bayesian Network (44 nodes, 66 arcs)
[Figure: the WHEAT network; the traits YR.GLASS, YR.FIELD, MIL, FUS, HT, FT
and YLD, grouped into physical traits of the plant and diseases, plus the
selected markers.]
RICE: a Bayesian Network (64 nodes, 102 arcs)
[Figure: the RICE network; the traits HT, FT, YLD, AMY, GL, GW, GTEMP,
CHALK, BROWN and SUB, grouped into physical traits of the plant, diseases,
abiotic stress, and physical and quality traits of the grains, plus the
selected markers.]
Predicting Traits for New Individuals
We can predict the traits:
1. from the averaged consensus
network;
2. from each of the 10 × 10 networks
we learn during cross-validation,
and average the predictions for
each new individual and trait.
Option 2 almost always provides better
accuracy than option 1, especially for
polygenic traits; the 10 × 10 networks can
cover the genome much better, and we
have to learn them anyway.
So: averaged network for interpretation,
ensemble of networks for predictions.
[Figure: cross-validated correlations for each trait in WHEAT and RICE,
comparing predictions from the averaged network with averaged predictions
from the individual networks (α = 0.05, ρC and ρG).]
Causal Relationships Between Traits
One of the key properties of BNs is their ability to
capture the direction of the causal relationships
among traits in the absence of latent confounders
(the experimental design behind the data collection
should take care of a number of them).
It works because each trait will have at least one
incoming arc from the markers, say Gl → Tj, and
then (Gl →) Tj ← Tk and (Gl →) Tj → Tk are not
probabilistically equivalent. So the network can
• suggest the direction of novel relationships;
• confirm the direction of known relationships,
troubleshooting the experimental design and
data collection.
[Figure: two WHEAT subnetworks showing the traits with their incoming arcs
from the markers and the arcs between traits.]
Spotting Confounding Effects
[Figure: the WHEAT subnetwork around HT, FUS and YLD.]
(WHEAT) Traits can interact in complex ways that may not
be obvious when they are studied individually,
but that can be explained by considering
neighbouring variables in the network.
An example: in the WHEAT data, the difference
in the mean YLD between the bottom and top
quartiles of the FUS disease scores is +0.08.
So apparently FUS is associated with increased YLD! What we are actually
measuring is the confounding effect of HT (FUS ← HT → YLD); conditional
on each quartile of HT, FUS has a negative effect on YLD ranging from -0.04
to -0.06. This is reassuring since it is known that susceptibility to fusarium is
positively related to HT, which in turn affects YLD.
Disentangling Pleiotropic Effects
When a marker is shown to be associated with
multiple traits, we should separate its direct
and indirect effects on each of the traits.
(Especially if the traits themselves are linked!)
Take for example G1533 in the RICE data set:
it is putatively causal for YLD, HT and FT.
[Figure: the RICE subnetwork around G1533, HT, FT and YLD.]
• The difference in mean between the two homozygotes is +4.5cm in HT, +2.28 weeks in
FT and +0.28 t/ha in YLD.
• Controlling for YLD and FT, the difference for HT halves (+2.1cm);
• Controlling for YLD and HT, the difference for FT is about the same (+2.3 weeks);
• Controlling for HT and FT, the difference for YLD halves (+0.16 t/ha).
So, the model suggests the marker is causal for FT and that the effect on the
other traits is partly indirect. This agrees with the p-values from an
independent association study (FT: 5.87e-28 < YLD: 4.18e-10, HT: 1e-11).
Identifying Causal (Sets of) Markers
Compared to additive regression models, BNs make it trivial to compute:
• posterior probability of association for a marker and a trait after all the
other markers and traits are taken into account to rule out linkage
disequilibrium, confounding, pleiotropy, etc.;
• expected allele counts nLOW and nHIGH for a marker and low/high values
of a set of traits (nLOW − nHIGH should be large if the marker tags a
causal variant and thus should show which allele is favourable).
             G1778  G3872  G4529  G1440  G5794
small grains  0.00   0.78   0.29   0.16   0.74
large grains  2.00   0.47   0.63   0.35   0.12

             G4668  G2764  G3927  G3992  G4432
small grains  0.24   0.29   0.18   0.09   0.00
large grains  0.17   0.00   0.62   0.29   0.82

small grains = bottom 10% GL, bottom 10% GW in the RICE data.
large grains = top 10% GL, top 10% GW.
Conclusions
Conclusions and Remarks
• BNs provide an intuitive representation of the relationships linking
heterogeneous sets of variables, which we can use for qualitative and
causal reasoning.
• We can learn BNs from data while including prior knowledge as
needed to improve the quality of the model.
• BNs subsume a large number of probabilistic models and thus can
readily incorporate other techniques from statistics and computer
science (via information theory).
• For most tasks we can start just by reusing state-of-the-art, general
purpose algorithms.
• Once learned, BNs provide a flexible tool for inference, while
maintaining good predictive power.
• Markov blankets are a valuable tool for feature selection, and make it
possible to learn just part of the BN depending on the goals of the
analysis.
That’s all, Thanks!
Marco Scutari University of Oxford

More Related Content

What's hot

Lecture7 xing fei-fei
Lecture7 xing fei-feiLecture7 xing fei-fei
Lecture7 xing fei-feiTianlu Wang
 
Prototype-based models in machine learning
Prototype-based models in machine learningPrototype-based models in machine learning
Prototype-based models in machine learningUniversity of Groningen
 
Reproducibility and differential analysis with selfish
Reproducibility and differential analysis with selfishReproducibility and differential analysis with selfish
Reproducibility and differential analysis with selfishtuxette
 
About functional SIR
About functional SIRAbout functional SIR
About functional SIRtuxette
 
ProbabilisticModeling20080411
ProbabilisticModeling20080411ProbabilisticModeling20080411
ProbabilisticModeling20080411Clay Stanek
 
A review on structure learning in GNN
A review on structure learning in GNNA review on structure learning in GNN
A review on structure learning in GNNtuxette
 
Differential analyses of structures in HiC data
Differential analyses of structures in HiC dataDifferential analyses of structures in HiC data
Differential analyses of structures in HiC datatuxette
 
Graph Signal Processing for Machine Learning A Review and New Perspectives - ...
Graph Signal Processing for Machine Learning A Review and New Perspectives - ...Graph Signal Processing for Machine Learning A Review and New Perspectives - ...
Graph Signal Processing for Machine Learning A Review and New Perspectives - ...lauratoni4
 
Deep Learning for Graphs
Deep Learning for GraphsDeep Learning for Graphs
Deep Learning for GraphsDeepLearningBlr
 
Kernel methods for data integration in systems biology
Kernel methods for data integration in systems biologyKernel methods for data integration in systems biology
Kernel methods for data integration in systems biologytuxette
 
(研究会輪読) Facial Landmark Detection by Deep Multi-task Learning
(研究会輪読) Facial Landmark Detection by Deep Multi-task Learning(研究会輪読) Facial Landmark Detection by Deep Multi-task Learning
(研究会輪読) Facial Landmark Detection by Deep Multi-task LearningMasahiro Suzuki
 
Random Forest for Big Data
Random Forest for Big DataRandom Forest for Big Data
Random Forest for Big Datatuxette
 
Bridging knowledge graphs_to_generate_scene_graphs
Bridging knowledge graphs_to_generate_scene_graphsBridging knowledge graphs_to_generate_scene_graphs
Bridging knowledge graphs_to_generate_scene_graphsWoen Yon Lai
 
Kernel methods for data integration in systems biology
Kernel methods for data integration in systems biology Kernel methods for data integration in systems biology
Kernel methods for data integration in systems biology tuxette
 
a paper reading of table recognition
a paper reading of table recognitiona paper reading of table recognition
a paper reading of table recognitionNing Lu
 
Massive parallelism with gpus for centrality ranking in complex networks
Massive parallelism with gpus for centrality ranking in complex networksMassive parallelism with gpus for centrality ranking in complex networks
Massive parallelism with gpus for centrality ranking in complex networksijcsit
 
Learning from (dis)similarity data
Learning from (dis)similarity dataLearning from (dis)similarity data
Learning from (dis)similarity datatuxette
 
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...acijjournal
 
A quantum-inspired optimization heuristic for the multiple sequence alignment...
A quantum-inspired optimization heuristic for the multiple sequence alignment...A quantum-inspired optimization heuristic for the multiple sequence alignment...
A quantum-inspired optimization heuristic for the multiple sequence alignment...Konstantinos Giannakis
 

What's hot (20)

Lecture7 xing fei-fei
Lecture7 xing fei-feiLecture7 xing fei-fei
Lecture7 xing fei-fei
 
Prototype-based models in machine learning
Prototype-based models in machine learningPrototype-based models in machine learning
Prototype-based models in machine learning
 
Reproducibility and differential analysis with selfish
Reproducibility and differential analysis with selfishReproducibility and differential analysis with selfish
Reproducibility and differential analysis with selfish
 
About functional SIR
About functional SIRAbout functional SIR
About functional SIR
 
ProbabilisticModeling20080411
ProbabilisticModeling20080411ProbabilisticModeling20080411
ProbabilisticModeling20080411
 
A review on structure learning in GNN
A review on structure learning in GNNA review on structure learning in GNN
A review on structure learning in GNN
 
Differential analyses of structures in HiC data
Differential analyses of structures in HiC dataDifferential analyses of structures in HiC data
Differential analyses of structures in HiC data
 
Graph Signal Processing for Machine Learning A Review and New Perspectives - ...
Graph Signal Processing for Machine Learning A Review and New Perspectives - ...Graph Signal Processing for Machine Learning A Review and New Perspectives - ...
Graph Signal Processing for Machine Learning A Review and New Perspectives - ...
 
Deep Learning for Graphs
Deep Learning for GraphsDeep Learning for Graphs
Deep Learning for Graphs
 
Kernel methods for data integration in systems biology
Kernel methods for data integration in systems biologyKernel methods for data integration in systems biology
Kernel methods for data integration in systems biology
 
(研究会輪読) Facial Landmark Detection by Deep Multi-task Learning
(研究会輪読) Facial Landmark Detection by Deep Multi-task Learning(研究会輪読) Facial Landmark Detection by Deep Multi-task Learning
(研究会輪読) Facial Landmark Detection by Deep Multi-task Learning
 
Random Forest for Big Data
Random Forest for Big DataRandom Forest for Big Data
Random Forest for Big Data
 
Bridging knowledge graphs_to_generate_scene_graphs
Bridging knowledge graphs_to_generate_scene_graphsBridging knowledge graphs_to_generate_scene_graphs
Bridging knowledge graphs_to_generate_scene_graphs
 
Kernel methods for data integration in systems biology
Kernel methods for data integration in systems biology Kernel methods for data integration in systems biology
Kernel methods for data integration in systems biology
 
a paper reading of table recognition
a paper reading of table recognitiona paper reading of table recognition
a paper reading of table recognition
 
Line
LineLine
Line
 
Massive parallelism with gpus for centrality ranking in complex networks
Massive parallelism with gpus for centrality ranking in complex networksMassive parallelism with gpus for centrality ranking in complex networks
Massive parallelism with gpus for centrality ranking in complex networks
 
Learning from (dis)similarity data
Learning from (dis)similarity dataLearning from (dis)similarity data
Learning from (dis)similarity data
 
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...
 
A quantum-inspired optimization heuristic for the multiple sequence alignment...
A quantum-inspired optimization heuristic for the multiple sequence alignment...A quantum-inspired optimization heuristic for the multiple sequence alignment...
A quantum-inspired optimization heuristic for the multiple sequence alignment...
 

Viewers also liked

Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...
Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...
Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...Bayes Nets meetup London
 
Twitter Author Prediction from Tweets using Bayesian Network
Twitter Author Prediction from Tweets using Bayesian NetworkTwitter Author Prediction from Tweets using Bayesian Network
Twitter Author Prediction from Tweets using Bayesian NetworkHendy Irawan
 
Bayesian networks and the search for causality
Bayesian networks and the search for causalityBayesian networks and the search for causality
Bayesian networks and the search for causalityBayes Nets meetup London
 
Ralf Herbrich - Introduction to Graphical models in Industry
Ralf Herbrich - Introduction to Graphical models in IndustryRalf Herbrich - Introduction to Graphical models in Industry
Ralf Herbrich - Introduction to Graphical models in IndustryBayes Nets meetup London
 
David Barber - Deep Nets, Bayes and the story of AI
David Barber - Deep Nets, Bayes and the story of AIDavid Barber - Deep Nets, Bayes and the story of AI
David Barber - Deep Nets, Bayes and the story of AIBayes Nets meetup London
 
Understanding your data with Bayesian networks (in Python) by Bartek Wilczyns...
Understanding your data with Bayesian networks (in Python) by Bartek Wilczyns...Understanding your data with Bayesian networks (in Python) by Bartek Wilczyns...
Understanding your data with Bayesian networks (in Python) by Bartek Wilczyns...PyData
 

Viewers also liked (6)

Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...
Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...
Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...
 
Twitter Author Prediction from Tweets using Bayesian Network
Twitter Author Prediction from Tweets using Bayesian NetworkTwitter Author Prediction from Tweets using Bayesian Network
Twitter Author Prediction from Tweets using Bayesian Network
 
Bayesian networks and the search for causality
Bayesian networks and the search for causalityBayesian networks and the search for causality
Bayesian networks and the search for causality
 
Ralf Herbrich - Introduction to Graphical models in Industry
Ralf Herbrich - Introduction to Graphical models in IndustryRalf Herbrich - Introduction to Graphical models in Industry
Ralf Herbrich - Introduction to Graphical models in Industry
 
David Barber - Deep Nets, Bayes and the story of AI
David Barber - Deep Nets, Bayes and the story of AIDavid Barber - Deep Nets, Bayes and the story of AI
David Barber - Deep Nets, Bayes and the story of AI
 
Understanding your data with Bayesian networks (in Python) by Bartek Wilczyns...
Understanding your data with Bayesian networks (in Python) by Bartek Wilczyns...Understanding your data with Bayesian networks (in Python) by Bartek Wilczyns...
Understanding your data with Bayesian networks (in Python) by Bartek Wilczyns...
 

Similar to Bayes Nets Meetup Sept 29th 2016 - Bayesian Network Modelling by Marco Scutari

Bayesian Networks - A Brief Introduction
Bayesian Networks - A Brief IntroductionBayesian Networks - A Brief Introduction
Bayesian Networks - A Brief IntroductionAdnan Masood
 
712201907
712201907712201907
712201907IJRAT
 
CORRELATION AND REGRESSION ANALYSIS FOR NODE BETWEENNESS CENTRALITY
CORRELATION AND REGRESSION ANALYSIS FOR NODE BETWEENNESS CENTRALITYCORRELATION AND REGRESSION ANALYSIS FOR NODE BETWEENNESS CENTRALITY
CORRELATION AND REGRESSION ANALYSIS FOR NODE BETWEENNESS CENTRALITYijfcstjournal
 
A simple and fast exact clustering algorithm defined for complex
A simple and fast exact clustering algorithm defined for complexA simple and fast exact clustering algorithm defined for complex
A simple and fast exact clustering algorithm defined for complexAlexander Decker
 
A simple and fast exact clustering algorithm defined for complex
A simple and fast exact clustering algorithm defined for complexA simple and fast exact clustering algorithm defined for complex
A simple and fast exact clustering algorithm defined for complexAlexander Decker
 
ALTERNATIVES TO BETWEENNESS CENTRALITY: A MEASURE OF CORRELATION COEFFICIENT
ALTERNATIVES TO BETWEENNESS CENTRALITY: A MEASURE OF CORRELATION COEFFICIENTALTERNATIVES TO BETWEENNESS CENTRALITY: A MEASURE OF CORRELATION COEFFICIENT
ALTERNATIVES TO BETWEENNESS CENTRALITY: A MEASURE OF CORRELATION COEFFICIENTcsandit
 
Probabilistic Reasoning
Probabilistic ReasoningProbabilistic Reasoning
Probabilistic ReasoningJunya Tanaka
 
Medicinal Applications of Quantum Computing
Medicinal Applications of Quantum ComputingMedicinal Applications of Quantum Computing
Medicinal Applications of Quantum ComputingbGeniusLLC
 
Xtc a practical topology control algorithm for ad hoc networks (synopsis)
Xtc a practical topology control algorithm for ad hoc networks (synopsis)Xtc a practical topology control algorithm for ad hoc networks (synopsis)
Xtc a practical topology control algorithm for ad hoc networks (synopsis)Mumbai Academisc
 
Bayesian network based modeling and reliability analysis of quantum cellular
Bayesian network based modeling and reliability analysis of quantum cellularBayesian network based modeling and reliability analysis of quantum cellular
Bayesian network based modeling and reliability analysis of quantum cellularIAEME Publication
 
Lecture 5b graphs and hashing
Lecture 5b graphs and hashingLecture 5b graphs and hashing
Lecture 5b graphs and hashingVictor Palmar
 
Paper Explained: Understanding the wiring evolution in differentiable neural ...
Paper Explained: Understanding the wiring evolution in differentiable neural ...Paper Explained: Understanding the wiring evolution in differentiable neural ...
Paper Explained: Understanding the wiring evolution in differentiable neural ...Devansh16
 
Finding Neighbors in Images Represented By Quadtree
Finding Neighbors in Images Represented By QuadtreeFinding Neighbors in Images Represented By Quadtree
Finding Neighbors in Images Represented By Quadtreeiosrjce
 

Similar to Bayes Nets Meetup Sept 29th 2016 - Bayesian Network Modelling by Marco Scutari (20)

Bayesian Networks - A Brief Introduction
Bayesian Networks - A Brief IntroductionBayesian Networks - A Brief Introduction
Bayesian Networks - A Brief Introduction
 
Lecture10 xing
Lecture10 xingLecture10 xing
Lecture10 xing
 
712201907
712201907712201907
712201907
 
CORRELATION AND REGRESSION ANALYSIS FOR NODE BETWEENNESS CENTRALITY
CORRELATION AND REGRESSION ANALYSIS FOR NODE BETWEENNESS CENTRALITYCORRELATION AND REGRESSION ANALYSIS FOR NODE BETWEENNESS CENTRALITY
CORRELATION AND REGRESSION ANALYSIS FOR NODE BETWEENNESS CENTRALITY
 
Causal Bayesian Networks
Causal Bayesian NetworksCausal Bayesian Networks
Causal Bayesian Networks
 
A simple and fast exact clustering algorithm defined for complex
A simple and fast exact clustering algorithm defined for complexA simple and fast exact clustering algorithm defined for complex
A simple and fast exact clustering algorithm defined for complex
 
A simple and fast exact clustering algorithm defined for complex
A simple and fast exact clustering algorithm defined for complexA simple and fast exact clustering algorithm defined for complex
A simple and fast exact clustering algorithm defined for complex
 
Project3.ppt
Project3.pptProject3.ppt
Project3.ppt
 
Bayesian network
Bayesian networkBayesian network
Bayesian network
 
ALTERNATIVES TO BETWEENNESS CENTRALITY: A MEASURE OF CORRELATION COEFFICIENT
ALTERNATIVES TO BETWEENNESS CENTRALITY: A MEASURE OF CORRELATION COEFFICIENTALTERNATIVES TO BETWEENNESS CENTRALITY: A MEASURE OF CORRELATION COEFFICIENT
ALTERNATIVES TO BETWEENNESS CENTRALITY: A MEASURE OF CORRELATION COEFFICIENT
 
Probabilistic Reasoning
Probabilistic ReasoningProbabilistic Reasoning
Probabilistic Reasoning
 
Medicinal Applications of Quantum Computing
Medicinal Applications of Quantum ComputingMedicinal Applications of Quantum Computing
Medicinal Applications of Quantum Computing
 
Xtc a practical topology control algorithm for ad hoc networks (synopsis)
Xtc a practical topology control algorithm for ad hoc networks (synopsis)Xtc a practical topology control algorithm for ad hoc networks (synopsis)
Xtc a practical topology control algorithm for ad hoc networks (synopsis)
 
Bayesian network based modeling and reliability analysis of quantum cellular
Bayesian network based modeling and reliability analysis of quantum cellularBayesian network based modeling and reliability analysis of quantum cellular
Bayesian network based modeling and reliability analysis of quantum cellular
 
Lecture 5b graphs and hashing
Lecture 5b graphs and hashingLecture 5b graphs and hashing
Lecture 5b graphs and hashing
 
Paper Explained: Understanding the wiring evolution in differentiable neural ...
Paper Explained: Understanding the wiring evolution in differentiable neural ...Paper Explained: Understanding the wiring evolution in differentiable neural ...
Paper Explained: Understanding the wiring evolution in differentiable neural ...
 
AI Lesson 28
AI Lesson 28AI Lesson 28
AI Lesson 28
 
Lesson 28
Lesson 28Lesson 28
Lesson 28
 
Finding Neighbors in Images Represented By Quadtree
Finding Neighbors in Images Represented By QuadtreeFinding Neighbors in Images Represented By Quadtree
Finding Neighbors in Images Represented By Quadtree
 
Entropy 19-00079
Entropy 19-00079Entropy 19-00079
Entropy 19-00079
 

Recently uploaded

Physical pharmacy unit 1 solubility of drugs.pptx
Physical pharmacy unit 1 solubility of drugs.pptxPhysical pharmacy unit 1 solubility of drugs.pptx
Physical pharmacy unit 1 solubility of drugs.pptxchetanvgh
 
J. Jeong, AAAI 2024, MLILAB, KAIST AI..
J. Jeong,  AAAI 2024, MLILAB, KAIST AI..J. Jeong,  AAAI 2024, MLILAB, KAIST AI..
J. Jeong, AAAI 2024, MLILAB, KAIST AI..MLILAB
 
chapter 28 Reproductive System Power Point
chapter 28 Reproductive System Power Pointchapter 28 Reproductive System Power Point
chapter 28 Reproductive System Power Pointchelseakland
 
DEVIATION OF REAL GAS FROM IDEAL BEHAVIOUR.pptx
DEVIATION OF REAL GAS FROM IDEAL BEHAVIOUR.pptxDEVIATION OF REAL GAS FROM IDEAL BEHAVIOUR.pptx
DEVIATION OF REAL GAS FROM IDEAL BEHAVIOUR.pptxAiswarya47
 
Principles of formulation of oral care products .pptx
Principles of formulation of oral care products .pptxPrinciples of formulation of oral care products .pptx
Principles of formulation of oral care products .pptxsakshinakade52
 
First Direct Imaging of a Kelvin–Helmholtz Instability by PSP/WISPR
First Direct Imaging of a Kelvin–Helmholtz Instability by PSP/WISPRFirst Direct Imaging of a Kelvin–Helmholtz Instability by PSP/WISPR
First Direct Imaging of a Kelvin–Helmholtz Instability by PSP/WISPRSérgio Sacani
 
agrarian-reforms-and-friar-lands (1).pptx
agrarian-reforms-and-friar-lands (1).pptxagrarian-reforms-and-friar-lands (1).pptx
agrarian-reforms-and-friar-lands (1).pptxMarvinAlegado
 
The Sun’s differential rotation is controlled by high- latitude baroclinicall...
The Sun’s differential rotation is controlled by high- latitude baroclinicall...The Sun’s differential rotation is controlled by high- latitude baroclinicall...
The Sun’s differential rotation is controlled by high- latitude baroclinicall...Sérgio Sacani
 
GNU Linux - Introduction and Administration
GNU Linux - Introduction and AdministrationGNU Linux - Introduction and Administration
GNU Linux - Introduction and AdministrationXavier de Pedro
 
igcse_edexcel_math-_gold_qp1_-igcse_9-1_.pdf
igcse_edexcel_math-_gold_qp1_-igcse_9-1_.pdfigcse_edexcel_math-_gold_qp1_-igcse_9-1_.pdf
igcse_edexcel_math-_gold_qp1_-igcse_9-1_.pdfTeenaSheikh
 
Anatomy of Duodenum- detailed ppt presentation for science students
Anatomy of Duodenum- detailed ppt presentation for science studentsAnatomy of Duodenum- detailed ppt presentation for science students
Anatomy of Duodenum- detailed ppt presentation for science studentsgargipatil3210
 
Presentation about adversarial image attacks
Presentation about adversarial image attacksPresentation about adversarial image attacks
Presentation about adversarial image attacksKoshinKhodiyar
 
Monggo Seeds Experiment Day 1 to Day 7.pdf
Monggo Seeds Experiment Day 1 to Day 7.pdfMonggo Seeds Experiment Day 1 to Day 7.pdf
Monggo Seeds Experiment Day 1 to Day 7.pdfRoseAnnAquino8
 
Myofunctional treatments Myobrace treatments and protocols
Myofunctional treatments  Myobrace treatments and protocolsMyofunctional treatments  Myobrace treatments and protocols
Myofunctional treatments Myobrace treatments and protocolsnjengakelvin23
 
The SAMI Galaxy Sur v ey: galaxy spin is more strongly correlated with stella...
The SAMI Galaxy Sur v ey: galaxy spin is more strongly correlated with stella...The SAMI Galaxy Sur v ey: galaxy spin is more strongly correlated with stella...
The SAMI Galaxy Sur v ey: galaxy spin is more strongly correlated with stella...Sérgio Sacani
 
Angiogenesis Use Cases Solved with IKOSA
Angiogenesis Use Cases Solved with IKOSAAngiogenesis Use Cases Solved with IKOSA
Angiogenesis Use Cases Solved with IKOSAKML Vision
 
Physical Science Module 1 -Stellar Nucleosynthesis.pdf
Physical Science Module 1 -Stellar Nucleosynthesis.pdfPhysical Science Module 1 -Stellar Nucleosynthesis.pdf
Physical Science Module 1 -Stellar Nucleosynthesis.pdfFrancisCursoDuque
 

Recently uploaded (20)

Physical pharmacy unit 1 solubility of drugs.pptx
Physical pharmacy unit 1 solubility of drugs.pptxPhysical pharmacy unit 1 solubility of drugs.pptx
Physical pharmacy unit 1 solubility of drugs.pptx
 
J. Jeong, AAAI 2024, MLILAB, KAIST AI..
J. Jeong,  AAAI 2024, MLILAB, KAIST AI..J. Jeong,  AAAI 2024, MLILAB, KAIST AI..
J. Jeong, AAAI 2024, MLILAB, KAIST AI..
 
chapter 28 Reproductive System Power Point
chapter 28 Reproductive System Power Pointchapter 28 Reproductive System Power Point
chapter 28 Reproductive System Power Point
 
DEVIATION OF REAL GAS FROM IDEAL BEHAVIOUR.pptx
DEVIATION OF REAL GAS FROM IDEAL BEHAVIOUR.pptxDEVIATION OF REAL GAS FROM IDEAL BEHAVIOUR.pptx
DEVIATION OF REAL GAS FROM IDEAL BEHAVIOUR.pptx
 
Principles of formulation of oral care products .pptx
Principles of formulation of oral care products .pptxPrinciples of formulation of oral care products .pptx
Principles of formulation of oral care products .pptx
 
First Direct Imaging of a Kelvin–Helmholtz Instability by PSP/WISPR
First Direct Imaging of a Kelvin–Helmholtz Instability by PSP/WISPRFirst Direct Imaging of a Kelvin–Helmholtz Instability by PSP/WISPR
First Direct Imaging of a Kelvin–Helmholtz Instability by PSP/WISPR
 
agrarian-reforms-and-friar-lands (1).pptx
agrarian-reforms-and-friar-lands (1).pptxagrarian-reforms-and-friar-lands (1).pptx
agrarian-reforms-and-friar-lands (1).pptx
 
The Sun’s differential rotation is controlled by high- latitude baroclinicall...
The Sun’s differential rotation is controlled by high- latitude baroclinicall...The Sun’s differential rotation is controlled by high- latitude baroclinicall...
The Sun’s differential rotation is controlled by high- latitude baroclinicall...
 
GNU Linux - Introduction and Administration
GNU Linux - Introduction and AdministrationGNU Linux - Introduction and Administration
GNU Linux - Introduction and Administration
 
igcse_edexcel_math-_gold_qp1_-igcse_9-1_.pdf
igcse_edexcel_math-_gold_qp1_-igcse_9-1_.pdfigcse_edexcel_math-_gold_qp1_-igcse_9-1_.pdf
igcse_edexcel_math-_gold_qp1_-igcse_9-1_.pdf
 
Anatomy of Duodenum- detailed ppt presentation for science students
Anatomy of Duodenum- detailed ppt presentation for science studentsAnatomy of Duodenum- detailed ppt presentation for science students
Anatomy of Duodenum- detailed ppt presentation for science students
 
Presentation about adversarial image attacks
Presentation about adversarial image attacksPresentation about adversarial image attacks
Presentation about adversarial image attacks
 
Monggo Seeds Experiment Day 1 to Day 7.pdf
Monggo Seeds Experiment Day 1 to Day 7.pdfMonggo Seeds Experiment Day 1 to Day 7.pdf
Monggo Seeds Experiment Day 1 to Day 7.pdf
 
Myofunctional treatments Myobrace treatments and protocols
Myofunctional treatments  Myobrace treatments and protocolsMyofunctional treatments  Myobrace treatments and protocols
Myofunctional treatments Myobrace treatments and protocols
 
The SAMI Galaxy Sur v ey: galaxy spin is more strongly correlated with stella...
The SAMI Galaxy Sur v ey: galaxy spin is more strongly correlated with stella...The SAMI Galaxy Sur v ey: galaxy spin is more strongly correlated with stella...
The SAMI Galaxy Sur v ey: galaxy spin is more strongly correlated with stella...
 
Solid - Phase Extraction By GUNTI SHASHIKANTH
Solid - Phase Extraction By GUNTI SHASHIKANTHSolid - Phase Extraction By GUNTI SHASHIKANTH
Solid - Phase Extraction By GUNTI SHASHIKANTH
 
Angiogenesis Use Cases Solved with IKOSA
Angiogenesis Use Cases Solved with IKOSAAngiogenesis Use Cases Solved with IKOSA
Angiogenesis Use Cases Solved with IKOSA
 
Battery Reasearch Reagents from TCI Chemicals
Battery Reasearch Reagents from TCI ChemicalsBattery Reasearch Reagents from TCI Chemicals
Battery Reasearch Reagents from TCI Chemicals
 
Physical Science Module 1 -Stellar Nucleosynthesis.pdf
Physical Science Module 1 -Stellar Nucleosynthesis.pdfPhysical Science Module 1 -Stellar Nucleosynthesis.pdf
Physical Science Module 1 -Stellar Nucleosynthesis.pdf
 
Principles and Rules of ICBN, IBC, The Hisory of ICBN
Principles and Rules of ICBN, IBC, The Hisory of ICBNPrinciples and Rules of ICBN, IBC, The Hisory of ICBN
Principles and Rules of ICBN, IBC, The Hisory of ICBN
 

Bayes Nets Meetup Sept 29th 2016 - Bayesian Network Modelling by Marco Scutari

[Figure: the moralised subgraph over A, B, C and E.]
Finally, we can just perform e.g. a depth-first or breadth-first search on the moral graph and see if we can find an open path between A and E, that is, a path that is not blocked by B; a minimal code sketch of the whole check follows.
Marco Scutari University of Oxford
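A minimal Python sketch of the three-step check (drop non-ancestors, moralise, search), assuming the DAG is stored as a dict mapping each node to the set of its parents; all function and variable names here are ours, not from the slides.

from collections import deque

def ancestors(dag, nodes):
    """All nodes reachable from `nodes` by repeatedly following parent links."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in dag[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(dag, a, b, given):
    """Check whether `a` and `b` are d-separated by the set `given`."""
    # 1. Keep only the ancestors of the nodes involved.
    keep = ancestors(dag, {a, b} | set(given))
    # 2. Moralise: join parents sharing a child, then drop arc directions.
    adj = {v: set() for v in keep}
    for v in keep:
        ps = [p for p in dag[v] if p in keep]
        for p in ps:
            adj[v].add(p); adj[p].add(v)      # undirected version of each arc
        for i, p in enumerate(ps):            # "marry" co-parents
            for q in ps[i + 1:]:
                adj[p].add(q); adj[q].add(p)
    # 3. Breadth-first search for a path from a to b that avoids `given`.
    seen, queue = {a}, deque([a])
    while queue:
        v = queue.popleft()
        if v == b:
            return False                      # an open path exists
        for w in adj[v] - seen:
            if w not in given:
                seen.add(w)
                queue.append(w)
    return True

# The DAG from the running example: A -> C <- B, C -> D, C -> E, D -> F.
dag = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}, "E": {"C"}, "F": {"D"}}
print(d_separated(dag, "A", "E", {"B"}))   # False: the path A -> C -> E is open
print(d_separated(dag, "A", "E", {"C"}))   # True: C blocks every path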
What Are Bayesian Networks?
Completely D-Separating: Markov Blankets
[Figure: the Markov blanket of A in a DAG over the nodes A to I, highlighting its parents, its children and the children's other parents (spouses).]
We can easily use the DAG to solve the feature selection problem. The set of nodes that graphically isolates a target node from the rest of the DAG is called its Markov blanket and includes:
ˆ its parents;
ˆ its children;
ˆ other nodes sharing a child.
Since ⊥⊥G implies ⊥⊥P, we can restrict ourselves to the Markov blanket to perform any kind of inference on the target node, and disregard the rest (see the sketch below).
Marco Scutari University of Oxford
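A small sketch of extracting a Markov blanket, reusing the dict-of-parent-sets representation from above; again the names are ours.

def markov_blanket(dag, target):
    """Parents, children and spouses of `target` in the DAG."""
    parents = set(dag[target])
    children = {v for v, ps in dag.items() if target in ps}
    spouses = {p for c in children for p in dag[c]} - {target}
    return parents | children | spouses

dag = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}, "E": {"C"}, "F": {"D"}}
print(markov_blanket(dag, "C"))   # {'A', 'B', 'D', 'E'}: parents and children, no spouses here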
What Are Bayesian Networks?
Different DAGs, Same Distribution: Topological Ordering
A DAG uniquely identifies a factorisation of P(X); the converse is not true. Consider again the DAG on the left:
P(X) = P(A) P(B) P(C | A, B) P(D | C) P(E | C) P(F | D).
We can rearrange the dependencies using Bayes theorem to obtain:
P(X) = P(A | B, C) P(B | C) P(C | D) P(D | F) P(E | C) P(F),
which gives the DAG on the right, with a different topological ordering.
[Figure: the two DAGs over A to F encoding these two factorisations.]
Marco Scutari University of Oxford
What Are Bayesian Networks?
Different DAGs, Same Distribution: Equivalence Classes
On a smaller scale, even keeping the same underlying undirected graph, we can reverse a number of arcs without changing the dependence structure of X. Since the triplets A → B → C and A ← B → C are probabilistically equivalent, we can reverse the directions of their arcs as we like as long as we do not create any new v-structure (A → B ← C, with no arc between A and C).
This means that we can group DAGs into equivalence classes that are uniquely identified by the underlying undirected graph and the v-structures. The directions of the other arcs can be either:
ˆ uniquely identifiable, because one of the directions would introduce cycles or new v-structures in the graph (compelled arcs);
ˆ completely undetermined.
A sketch of v-structure detection follows.
Marco Scutari University of Oxford
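A hedged sketch of enumerating the v-structures that, together with the skeleton, identify a DAG's equivalence class; same representation and invented names as before.

from itertools import combinations

def v_structures(dag):
    found = []
    for v, ps in dag.items():
        for a, b in combinations(sorted(ps), 2):
            # a -> v <- b is a v-structure only if a and b are not adjacent.
            if a not in dag[b] and b not in dag[a]:
                found.append((a, v, b))
    return found

dag = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}, "E": {"C"}, "F": {"D"}}
print(v_structures(dag))   # [('A', 'C', 'B')]: the only v-structure, A -> C <- B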
What Are Bayesian Networks?
Completed Partially Directed Acyclic Graphs (CPDAGs)
[Figure: two DAGs from the same equivalence class and the CPDAG that represents them, in which compelled arcs keep their direction and the remaining arcs are left undirected.]
Marco Scutari University of Oxford
What Are Bayesian Networks?
What About the Probability Distributions?
The second component of a BN is the probability distribution P(X). The choice should be such that the BN:
ˆ can be learned efficiently from data;
ˆ is flexible (distributional assumptions should not be too strict);
ˆ is easy to query to perform inference.
The three most common choices in the literature (by far) are:
ˆ discrete BNs (DBNs), in which X and the Xi | ΠXi are multinomial;
ˆ Gaussian BNs (GBNs), in which X is multivariate normal and the Xi | ΠXi are univariate normal;
ˆ conditional linear Gaussian BNs (CLGBNs), in which X is a mixture of multivariate normals and the Xi | ΠXi are either multinomial, univariate normal or mixtures of normals.
It has been proved in the literature that exact inference is possible in these three cases, hence their popularity.
Marco Scutari University of Oxford
What Are Bayesian Networks?
Discrete Bayesian Networks
[Figure: the ASIA DAG, with nodes "visit to Asia?", "smoking?", "tuberculosis?", "lung cancer?", "bronchitis?", "either tuberculosis or lung cancer?", "positive X-ray?" and "dyspnoea?".]
A classic example of a DBN is the ASIA network from Lauritzen & Spiegelhalter (1988), which includes a collection of binary variables. It describes a simple diagnostic problem for tuberculosis and lung cancer.
Total parameters of X: 2^8 − 1 = 255
Marco Scutari University of Oxford
What Are Bayesian Networks?
Conditional Probability Tables (CPTs)
[Figure: the ASIA network annotated with the CPT of each node given its parents.]
The local distributions Xi | ΠXi take the form of conditional probability tables for each node given all the configurations of the values of its parents.
Overall parameters of the Xi | ΠXi: 18
Marco Scutari University of Oxford
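A quick sketch contrasting the two parameter counts on ASIA, assuming the standard parent sets from Lauritzen & Spiegelhalter: the unrestricted joint over 8 binary variables versus the sum of the local CPT sizes.

import math

asia = {  # each node mapped to its parents
    "asia": [], "smoke": [],
    "tub": ["asia"], "lung": ["smoke"], "bronc": ["smoke"],
    "either": ["tub", "lung"],
    "xray": ["either"], "dysp": ["either", "bronc"],
}
levels = {v: 2 for v in asia}   # all variables are binary

print(math.prod(levels.values()) - 1)   # 255 free parameters for the unrestricted joint

local = sum((levels[v] - 1) * math.prod(levels[p] for p in ps)
            for v, ps in asia.items())
print(local)                            # 18 free parameters once the factorisation is imposed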
What Are Bayesian Networks?
Gaussian Bayesian Networks
[Figure: the MARKS DAG over the nodes mechanics, vectors, algebra, analysis and statistics.]
A classic example of a GBN is the MARKS network from Mardia, Kent & Bibby (1979), which describes the relationships between the marks on 5 math-related topics.
Assuming X ∼ N(µ, Σ), we can compute Ω = Σ^−1. Then Ωij = 0 implies Xi ⊥⊥P Xj | X \ {Xi, Xj}. The absence of an arc Xi → Xj in the DAG implies Xi ⊥⊥G Xj | X \ {Xi, Xj}, which in turn implies Xi ⊥⊥P Xj | X \ {Xi, Xj}.
Total parameters of X: 5 + 15 = 20
Marco Scutari University of Oxford
What Are Bayesian Networks?
Partial Correlations and Linear Regressions
The local distributions Xi | ΠXi take the form of linear regression models with the ΠXi acting as regressors and with independent error terms.
ALG = 50.60 + εALG, εALG ∼ N(0, 112.8)
ANL = −3.57 + 0.99 ALG + εANL, εANL ∼ N(0, 110.25)
MECH = −12.36 + 0.54 ALG + 0.46 VECT + εMECH, εMECH ∼ N(0, 195.2)
STAT = −11.19 + 0.76 ALG + 0.31 ANL + εSTAT, εSTAT ∼ N(0, 158.8)
VECT = 12.41 + 0.75 ALG + εVECT, εVECT ∼ N(0, 109.8)
(That is because Ωij ∝ βj for Xi, so βj > 0 if and only if Ωij > 0. Also Ωij ∝ ρij, the partial correlation between Xi and Xj, so we are implicitly assuming all probabilistic dependencies are linear.) A numerical sketch of this reading of Ω follows.
Overall parameters of the Xi | ΠXi: 11 + 5 = 16
Marco Scutari University of Oxford
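A numerical sketch of reading conditional independencies and partial correlations off the precision matrix; the covariance matrix below is invented purely for illustration, and numpy is assumed.

import numpy as np

rng = np.random.default_rng(42)
X = rng.multivariate_normal(
    mean=np.zeros(3),
    cov=[[1.0, 0.8, 0.4],
         [0.8, 1.0, 0.5],
         [0.4, 0.5, 1.0]],
    size=5000,
)

omega = np.linalg.inv(np.cov(X, rowvar=False))   # estimated precision matrix

# Partial correlations: rho_ij = -omega_ij / sqrt(omega_ii * omega_jj).
d = np.sqrt(np.diag(omega))
partial = -omega / np.outer(d, d)
np.fill_diagonal(partial, 1.0)
print(partial.round(2))   # near-zero entries suggest conditional independence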
What Are Bayesian Networks?
Conditional Linear Gaussian Bayesian Networks
CLGBNs contain both discrete and continuous nodes, and combine DBNs and GBNs as follows to obtain a mixture-of-Gaussians network:
ˆ continuous nodes cannot be parents of discrete nodes;
ˆ the local distribution of each discrete node is a CPT;
ˆ the local distribution of each continuous node is a set of linear regression models, one for each configuration of the discrete parents, with the continuous parents acting as regressors.
[Figure: the RATS' WEIGHTS DAG, with discrete nodes drug and sex and continuous nodes weight loss (week 1) and weight loss (week 2).]
One of the classic examples is the RATS' WEIGHTS network from Edwards (1995), which describes weight loss in a drug trial performed on rats.
Marco Scutari University of Oxford
What Are Bayesian Networks?
Mixtures of Linear Regressions
The resulting local distribution for the first weight loss for drugs D1, D2 and D3 is:
W1,D1 = 7 + εD1, εD1 ∼ N(0, 2.5)
W1,D2 = 7.50 + εD2, εD2 ∼ N(0, 2)
W1,D3 = 14.75 + εD3, εD3 ∼ N(0, 11)
with just the intercepts, since the node has no continuous parents. The local distribution for the second weight loss is:
W2,D1 = 1.02 + 0.89 W1 + εD1, εD1 ∼ N(0, 3.2)
W2,D2 = −1.68 + 1.35 W1 + εD2, εD2 ∼ N(0, 4)
W2,D3 = −1.83 + 0.82 W1 + εD3, εD3 ∼ N(0, 1.9)
Overall, they look like random effect models with random intercepts and random slopes.
Marco Scutari University of Oxford
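A sketch of how such a local distribution can be fitted: one least-squares regression per configuration of the discrete parent (drug), with the continuous parent (W1) as regressor. The data and the generating model below are simulated, not the trial data from the slide.

import numpy as np

rng = np.random.default_rng(1)
drug = rng.choice(["D1", "D2", "D3"], size=300)
w1 = rng.normal(10, 2, size=300)
w2 = 1.0 + 0.9 * w1 + rng.normal(0, 1.5, size=300)   # toy generating model

for d in ["D1", "D2", "D3"]:
    mask = drug == d
    X = np.column_stack([np.ones(mask.sum()), w1[mask]])
    beta, res, *_ = np.linalg.lstsq(X, w2[mask], rcond=None)
    sigma2 = res[0] / (mask.sum() - 2)               # residual variance estimate
    print(f"{d}: intercept={beta[0]:.2f}, slope={beta[1]:.2f}, var={sigma2:.2f}")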
Case Study: A Protein Signalling Network
Marco Scutari University of Oxford
Case Study: A Protein Signalling Network
Source and Overview of the Data
Causal Protein-Signalling Networks Derived from Multiparameter Single Cell Data
Karen Sachs, et al., Science, 308, 523 (2005); DOI: 10.1126/science.1105809
That is a landmark paper in applying BNs, because it highlights the use of interventional data and because its results are validated against the existing literature. The data consist of 5400 simultaneous measurements of 11 phosphorylated proteins and phospholipids:
ˆ 1800 data points subject only to general stimulatory cues, so that the protein signalling paths are active;
ˆ 600 data points with specific stimulatory/inhibitory cues for each of the following 4 proteins: Mek, PIP2, Akt, PKA;
ˆ 1200 data points with specific cues for PKA.
The goal of the analysis is to learn what relationships link these 11 proteins, that is, the signalling pathways they are part of.
Marco Scutari University of Oxford
Case Study: A Protein Signalling Network
Analysis and Validated Network
[Figure: the validated signalling network over Akt, Erk, Jnk, Mek, P38, PIP2, PIP3, PKA, PKC, Plcg and Raf.]
1. Outliers were removed and the data were discretised, since it was impossible to model them with a GBN.
2. A large number of DAGs were learned and averaged to produce a more robust model. The averaged DAG was created using the arcs present in at least 85% of the DAGs.
3. The validity of the averaged BN was evaluated against established signalling pathways from the literature.
Marco Scutari University of Oxford
Case Study: A Protein Signalling Network
Bayesian Network Structure Learning
Learning a BN B = (G, Θ) from a data set D is performed in two steps:
P(B | D) = P(G, Θ | D) = P(G | D) · P(Θ | G, D),
where the first factor corresponds to structure learning and the second to parameter learning. In a Bayesian setting, structure learning consists in finding the DAG with the best P(G | D). We can decompose P(G | D) into
P(G | D) ∝ P(G) P(D | G) = P(G) ∫ P(D | G, Θ) P(Θ | G) dΘ,
where P(G) is the prior distribution over the space of the DAGs and P(D | G) is the marginal likelihood of the data given G, averaged over all possible parameter sets Θ; and then
P(D | G) = ∏i=1..N ∫ P(Xi | ΠXi, ΘXi) P(ΘXi | ΠXi) dΘXi.
Marco Scutari University of Oxford
Case Study: A Protein Signalling Network
The Hill-Climbing Algorithm
The most common score-based structure learning algorithm, in which we are looking for the DAG that maximises a score such as the posterior P(G | D) or BIC, is a greedy search such as hill-climbing:
1. Choose an initial DAG G, usually (but not necessarily) empty.
2. Compute the score of G, denoted as ScoreG = Score(G).
3. Set maxscore = ScoreG.
4. Repeat the following steps as long as maxscore increases:
4.1 for every possible arc addition, deletion or reversal not introducing cycles:
4.1.1 compute the score of the modified DAG G∗, ScoreG∗ = Score(G∗);
4.1.2 if ScoreG∗ > ScoreG, set G = G∗ and ScoreG = ScoreG∗.
4.2 update maxscore with the new value of ScoreG.
5. Return the DAG G.
Only one local distribution changes in each step, which makes the algorithm computationally efficient and easy to speed up with caching. A minimal sketch of the search loop follows.
Marco Scutari University of Oxford
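A minimal sketch of the search loop above, assuming a user-supplied `score(dag)` callable (e.g. BIC); the toy score at the end is invented for the demo, and for simplicity the full score is recomputed at each move rather than cached.

from itertools import permutations

def is_acyclic(dag):
    """Kahn's algorithm: a graph is acyclic iff every node can be peeled off."""
    indeg = {v: len(ps) for v, ps in dag.items()}
    queue = [v for v, deg in indeg.items() if deg == 0]
    seen = 0
    while queue:
        v = queue.pop()
        seen += 1
        for w, ps in dag.items():
            if v in ps:
                indeg[w] -= 1
                if indeg[w] == 0:
                    queue.append(w)
    return seen == len(dag)

def neighbours(dag):
    """All DAGs one arc addition, deletion or reversal away from `dag`."""
    for a, b in permutations(dag, 2):
        if a in dag[b]:                               # the arc a -> b exists
            delete = {v: set(ps) for v, ps in dag.items()}
            delete[b].discard(a)
            yield delete                              # deletions are always acyclic
            reverse = {v: set(ps) for v, ps in delete.items()}
            reverse[a].add(b)
            if is_acyclic(reverse):
                yield reverse                         # the reversed arc b -> a
        elif b not in dag[a]:                         # no arc between a and b yet
            add = {v: set(ps) for v, ps in dag.items()}
            add[b].add(a)
            if is_acyclic(add):
                yield add                             # the added arc a -> b

def hill_climb(nodes, score):
    dag = {v: set() for v in nodes}                   # 1. start from the empty DAG
    best = score(dag)                                 # 2.-3. initial score
    improved = True
    while improved:                                   # 4. repeat while we improve
        improved = False
        for cand in neighbours(dag):
            s = score(cand)                           # 4.1 score each local move
            if s > best:                              # 4.2 keep the best move so far
                dag, best, improved = cand, s, True
    return dag                                        # 5. return the final DAG

# Toy run with an invented score that rewards the arc A -> C and penalises size.
toy_score = lambda d: ("A" in d["C"]) - 0.1 * sum(len(ps) for ps in d.values())
print(hill_climb(["A", "B", "C"], toy_score))         # {'A': set(), 'B': set(), 'C': {'A'}}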
Case Study: A Protein Signalling Network
Learning Multiple DAGs from the Data
Searching from different starting points increases our coverage of the space of the possible DAGs; the frequency with which an arc appears is a measure of the strength of the dependence.
Marco Scutari University of Oxford
Case Study: A Protein Signalling Network
Model Averaging for DAGs
[Figure: the empirical CDF of the arc strengths, with the threshold estimated from the data and the threshold used by Sachs et al. marked; significant arcs lie above the threshold.]
Arcs with significant strength can be identified using a threshold estimated from the data by minimising the distance between the observed ECDF and the ideal, asymptotic one (the blue area in the right panel).
Marco Scutari University of Oxford
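A much-simplified sketch of this idea (not the exact estimator from the paper): for each candidate cut-off, build an idealised one-step CDF that is flat below the cut-off and one at and above it, and keep the cut-off whose idealised CDF is closest in L1 distance to the observed ECDF. The arc strengths below are invented.

import numpy as np

def estimate_threshold(strengths):
    s = np.sort(np.asarray(strengths))
    grid = np.linspace(0.0, 1.0, 501)
    ecdf = np.searchsorted(s, grid, side="right") / len(s)
    best_t, best_d = s[0], np.inf
    for t in s:
        below = grid < t
        low = ecdf[below].mean() if below.any() else 0.0   # flat level below t
        ideal = np.where(below, low, 1.0)
        d = np.abs(ecdf - ideal).mean()                    # approximate L1 distance
        if d < best_d:
            best_t, best_d = t, d
    return best_t

# Invented arc strengths: many weakly supported arcs plus a few strong ones.
rng = np.random.default_rng(7)
strengths = np.concatenate([rng.beta(1, 10, 40), rng.beta(12, 1, 8)])
print(estimate_threshold(strengths))   # the estimated cut-off for these strengths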
Case Study: A Protein Signalling Network
Combining Observational and Interventional Data
[Figure: the DAGs learned without and with the interventions, over Akt, Erk, Jnk, Mek, P38, PIP2, PIP3, PKA, PKC, Plcg and Raf.]
Observations must be scored taking into account the effects of the interventions, which break biological pathways; the overall network score is a mixture of scores adjusted for each experiment.
Marco Scutari University of Oxford
Case Study: A Protein Signalling Network
Using The Protein Network to Plan Experiments
This idea goes by the name of hypothesis generation: using a statistical model to decide which follow-up experiments to perform. BNs are especially easy to use for this because they automate the computation of the probabilities of arbitrary events.
[Figure: barplots of P(Akt) and P(PKA) over the levels LOW, AVG and HIGH, with and without intervention.]
Marco Scutari University of Oxford
Case Study: A Protein Signalling Network
Conditional Probability Queries
DBNs, GBNs and CLGBNs all have exact inference algorithms to compute conditional probabilities for an event given some evidence. However, approximate inference scales better, and the same algorithms can be used for all kinds of BNs. An example is likelihood weighting, below.
Input: a BN B = (G, Θ), a query Q = q and a set of evidence E.
Output: an approximate estimate of P(Q | E, G, Θ).
1. Order the Xi according to the topological ordering in G.
2. Set wE = 0 and wE,q = 0.
3. For a suitably large number of samples x = (x1, . . . , xp):
3.1 generate x(i), i = 1, . . . , p from X(i) | ΠX(i), using the values e1, . . . , ek specified by the hard evidence E for Xi1, . . . , Xik;
3.2 compute the weight wx = ∏ P(Xi∗ = e∗ | ΠXi∗) over the evidence nodes;
3.3 set wE = wE + wx;
3.4 if x includes Q = q, set wE,q = wE,q + wx.
4. Estimate P(Q | E, G, Θ) with wE,q/wE.
Marco Scutari University of Oxford
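A self-contained sketch of likelihood weighting on an invented three-node chain A → B → C; the CPTs are made up, and the exact answer P(A = 1 | C = 1) ≈ 0.53 can be checked by hand for comparison.

import random

# Toy CPTs: P(A=1), P(B=1 | A) and P(C=1 | B).
p_a = 0.3
p_b = {0: 0.2, 1: 0.8}
p_c = {0: 0.1, 1: 0.7}

def likelihood_weighting(query_var, query_val, evidence, n=100_000):
    w_e = w_eq = 0.0
    for _ in range(n):
        x, w = {}, 1.0
        # Sample in topological order; clamp evidence nodes and multiply
        # their likelihood into the weight instead of sampling them.
        for var, dist in (("A", lambda s: p_a),
                          ("B", lambda s: p_b[s["A"]]),
                          ("C", lambda s: p_c[s["B"]])):
            p1 = dist(x)
            if var in evidence:
                x[var] = evidence[var]
                w *= p1 if evidence[var] == 1 else 1 - p1
            else:
                x[var] = 1 if random.random() < p1 else 0
        w_e += w
        if x[query_var] == query_val:
            w_eq += w
    return w_eq / w_e

random.seed(0)
print(likelihood_weighting("A", 1, {"C": 1}))   # approximates P(A = 1 | C = 1)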
Case Study: Genome-Wide Predictions
Marco Scutari University of Oxford
Case Study: Genome-Wide Predictions
MAGIC Populations: Wheat and Rice
Sequence data (e.g. SNP markers) are routinely used in statistical genetics to understand the genetic basis of human diseases, and to breed traits of commercial interest in plants and animals. Multiparent (MAGIC) populations are ideal for the latter. Here we consider two:
ˆ A winter wheat population: 721 varieties, 16K markers, 7 traits.
ˆ An indica rice population: 1087 varieties, 4K markers, 10 traits.
Phenotypic traits include flowering time, height, yield and a number of disease scores, plus physical and quality traits for grains in rice. The goal of the analysis is to find the key markers controlling the traits and the causal relationships between them, while keeping good predictive accuracy.
Multiple Quantitative Trait Analysis Using Bayesian Networks
Marco Scutari, et al., Genetics, 198, 129–137 (2014); DOI: 10.1534/genetics.114.165704
Marco Scutari University of Oxford
Case Study: Genome-Wide Predictions
Bayesian Networks for Selection and Association Studies
If we have a set of traits and markers for each variety, all we need are the Markov blankets of the traits; most markers are discarded in the process. Using common sense, we can make some assumptions:
ˆ traits can depend on markers, but not vice versa;
ˆ dependencies between traits should follow the order of the respective measurements (e.g. longitudinal traits, traits measured before and after harvest, etc.);
ˆ dependencies in multiple kinds of genetic data (e.g. SNPs + gene expression or SNPs + methylation) should follow the central dogma of molecular biology.
Assumptions on the direction of the dependencies allow us to reduce Markov blanket learning to learning the parents and the children of each trait, which is a much simpler task.
Marco Scutari University of Oxford
Case Study: Genome-Wide Predictions
Parametric Assumptions
In the spirit of classic additive genetics models, we use a Gaussian BN. Then the local distribution of each trait Ti is a linear regression model
Ti = µTi + ΠTi βTi + εTi = µTi + [Tj βTj + . . . + Tk βTk] (traits) + [Gl βGl + . . . + Gm βGm] (markers) + εTi
and the local distribution of each marker Gi is likewise
Gi = µGi + ΠGi βGi + εGi = µGi + [Gl βGl + . . . + Gm βGm] (markers) + εGi
in which the regressors (ΠTi or ΠGi) are treated as fixed effects. The ΠTi can be interpreted as causal effects for the traits, the ΠGi as markers being in linkage disequilibrium with each other. A small fitting sketch follows.
Marco Scutari University of Oxford
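A sketch of one such local distribution: a trait regressed on an upstream trait and two markers, fitted by ordinary least squares. All data and coefficients below are simulated; the names loosely follow the slide's notation.

import numpy as np

rng = np.random.default_rng(3)
n = 500
G_l = rng.integers(0, 3, size=n)        # marker allele counts (0, 1, 2)
G_m = rng.integers(0, 3, size=n)
T_j = rng.normal(size=n)                # an upstream trait
T_i = 1.0 + 0.5 * T_j + 0.8 * G_l - 0.3 * G_m + rng.normal(0, 0.5, size=n)

X = np.column_stack([np.ones(n), T_j, G_l, G_m])
beta, *_ = np.linalg.lstsq(X, T_i, rcond=None)
print(dict(zip(["mu", "beta_Tj", "beta_Gl", "beta_Gm"], beta.round(2))))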
Case Study: Genome-Wide Predictions
Learning the Bayesian Network (I)
1. Feature Selection.
1.1 Independently learn the parents and the children of each trait with the SI-HITON-PC algorithm; children can only be other traits, parents are mostly markers, spouses can be either. Both are selected using the exact Student's t test for partial correlations.
1.2 Drop all the markers that are not parents of any trait.
[Figure: the parents and children of each of the traits T1 to T4, with the redundant markers that are not in the Markov blanket of any trait greyed out.]
Marco Scutari University of Oxford
Case Study: Genome-Wide Predictions
Constraint-Based Structure Learning Algorithms
[Figure: a CPDAG over A to F, linking graphical separation to conditional independence tests.]
The mapping between edges and conditional independence relationships lies at the core of BNs; therefore, one way to learn the structure of a BN is to check which such relationships hold using a suitable conditional independence test. Such an approach results in a set of conditional independence constraints that identify a single equivalence class.
Marco Scutari University of Oxford
Case Study: Genome-Wide Predictions
The Semi-Interleaved HITON-PC Algorithm
Input: each trait Ti in turn, the other traits (Tj) and all markers (Gl), a significance threshold α.
Output: the set CPC of candidate parents and children of Ti in the BN.
1. Perform a marginal independence test between Ti and each Tj (Ti ⊥⊥ Tj) and Gl (Ti ⊥⊥ Gl) in turn.
2. Discard all Tj and Gl whose p-values are greater than α.
3. Set CPC = ∅.
4. For each Tj and Gl in order of increasing p-value:
4.1 Perform a conditional independence test between Ti and Tj/Gl conditional on all possible subsets Z of the current CPC (Ti ⊥⊥ Tj | Z ⊆ CPC or Ti ⊥⊥ Gl | Z ⊆ CPC).
4.2 If the p-value is smaller than α for all subsets, then set CPC = CPC ∪ {Tj} or CPC = CPC ∪ {Gl}.
NOTE: the algorithm is defined for a generic independence test, so you can plug in any test that is appropriate for the data. A code sketch follows.
Marco Scutari University of Oxford
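A hedged sketch of the forward phase described above, assuming a user-supplied `indep_test(target, candidate, conditioning_set)` that returns a p-value (e.g. the exact t test for partial correlations); the toy test in the demo is invented.

from itertools import chain, combinations

def subsets(s):
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def si_hiton_pc(target, candidates, indep_test, alpha=0.05):
    # 1.-2. Marginal screening: keep the candidates dependent on the target.
    pvals = {c: indep_test(target, c, ()) for c in candidates}
    kept = [c for c in candidates if pvals[c] <= alpha]
    # 3.-4. Grow CPC in order of increasing marginal p-value; a candidate
    # enters only if it stays dependent given every subset of the current CPC.
    cpc = []
    for c in sorted(kept, key=pvals.get):
        if all(indep_test(target, c, z) <= alpha for z in subsets(cpc)):
            cpc.append(c)
    return cpc

# Toy usage with a made-up test: "markers" 0 and 1 matter, the rest do not.
fake_p = lambda t, c, z: 0.001 if c in (0, 1) else 0.5
print(si_hiton_pc("T", range(5), fake_p))   # [0, 1]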
Case Study: Genome-Wide Predictions
Learning the Bayesian Network (II)
2. Structure Learning. Learn the structure of the network from the nodes selected in the previous step, setting the directions of the arcs according to the assumptions above. The optimal structure can be identified with a suitable goodness-of-fit criterion such as BIC. This follows the spirit of other hybrid approaches that have been shown to perform well in the literature.
[Figure: the empty network and the learned network.]
Marco Scutari University of Oxford
Case Study: Genome-Wide Predictions
Learning the Bayesian Network (III)
3. Parameter Learning. Learn the parameters: each local distribution is a linear regression and the global distribution is a hierarchical linear model. Typically least squares works well, because SI-HITON-PC selects sets of weakly correlated parents; ridge regression can be used otherwise.
[Figure: the learned network and its local distributions.]
Marco Scutari University of Oxford
Case Study: Genome-Wide Predictions
Model Averaging and Assessing Predictive Accuracy
We perform all the above in 10 runs of 10-fold cross-validation to:
ˆ assess predictive accuracy with e.g. predictive correlation;
ˆ obtain a set of DAGs to produce an averaged, de-noised consensus DAG with model averaging.
Marco Scutari University of Oxford
Case Study: Genome-Wide Predictions
WHEAT: a Bayesian Network (44 nodes, 66 arcs)
[Figure: the averaged wheat network, linking the physical traits of the plant (YLD, HT, FT) and the diseases (YR.GLASS, YR.FIELD, MIL, FUS) through the selected markers (G43 to G2953).]
Marco Scutari University of Oxford
Case Study: Genome-Wide Predictions
RICE: a Bayesian Network (64 nodes, 102 arcs)
[Figure: the averaged rice network, linking the physical traits of the plant (YLD, HT, FT), the diseases (BROWN), abiotic stress (SUB) and the physical and quality traits of the grains (AMY, GL, GW, GTEMP, CHALK) through the selected markers (G317 to G6123).]
Marco Scutari University of Oxford
Case Study: Genome-Wide Predictions
Predicting Traits for New Individuals
We can predict the traits:
1. from the averaged consensus network;
2. from each of the 10 × 10 networks we learn during cross-validation, averaging the predictions for each new individual and trait.
Option 2 almost always provides better accuracy than option 1, especially for polygenic traits; the 10 × 10 networks can cover the genome much better, and we have to learn them anyway. So: the averaged network for interpretation, the ensemble of networks for predictions.
[Figure: cross-validated correlations for each trait in wheat and rice, comparing predictions from the averaged network against averaged predictions from the ensemble (α = 0.05, ρC and ρG).]
Marco Scutari University of Oxford
Case Study: Genome-Wide Predictions
Causal Relationships Between Traits
One of the key properties of BNs is their ability to capture the direction of the causal relationships among traits in the absence of latent confounders (the experimental design behind the data collection should take care of a number of them). It works because each trait will have at least one incoming arc from the markers, say Gl → Tj, and then (Gl →) Tj ← Tk and (Gl →) Tj → Tk are not probabilistically equivalent. So the network can:
ˆ suggest the direction of novel relationships;
ˆ confirm the direction of known relationships, troubleshooting the experimental design and data collection.
[Figure: two subnetworks of the WHEAT network, showing the directions of the arcs among the traits and their markers.]
Marco Scutari University of Oxford
Case Study: Genome-Wide Predictions
Spotting Confounding Effects
[Figure: the subnetwork of the WHEAT network around HT, YLD and FUS.]
Traits can interact in complex ways that may not be obvious when they are studied individually, but that can be explained by considering neighbouring variables in the network.
An example: in the WHEAT data, the difference in the mean YLD between the bottom and top quartiles of the FUS disease scores is +0.08. So, apparently, FUS is associated with increased YLD! What we are actually measuring is the confounding effect of HT (FUS ← HT → YLD); conditional on each quartile of HT, FUS has a negative effect on YLD, ranging from −0.04 to −0.06. This is reassuring, since it is known that susceptibility to fusarium is positively related to HT, which in turn affects YLD.
Marco Scutari University of Oxford
Case Study: Genome-Wide Predictions
Disentangling Pleiotropic Effects
When a marker is shown to be associated with multiple traits, we should separate its direct and indirect effects on each of the traits. (Especially if the traits themselves are linked!) Take for example G1533 in the RICE data set: it is putatively causal for YLD, HT and FT.
[Figure: the subnetwork of the RICE network around G1533, HT, FT and YLD.]
ˆ The difference in mean between the two homozygotes is +4.5cm in HT, +2.28 weeks in FT and +0.28 t/ha in YLD.
ˆ Controlling for YLD and FT, the difference for HT halves (+2.1cm).
ˆ Controlling for YLD and HT, the difference for FT is about the same (+2.3 weeks).
ˆ Controlling for HT and FT, the difference for YLD halves (+0.16 t/ha).
So, the model suggests the marker is causal for FT and that its effect on the other traits is partly indirect. This agrees with the p-values from an independent association study (FT: 5.87e-28 < YLD: 4.18e-10, HT: 1e-11).
Marco Scutari University of Oxford
Case Study: Genome-Wide Predictions
Identifying Causal (Sets of) Markers
Compared to additive regression models, BNs make it trivial to compute:
ˆ the posterior probability of association for a marker and a trait after all the other markers and traits are taken into account, to rule out linkage disequilibrium, confounding, pleiotropy, etc.;
ˆ the expected allele counts nLOW and nHIGH for a marker and low/high values of a set of traits (nLOW − nHIGH should be large if the marker tags a causal variant, and thus should show which allele is favourable).

              G1778  G3872  G4529  G1440  G5794
small grains   0.00   0.78   0.29   0.16   0.74
large grains   2.00   0.47   0.63   0.35   0.12

              G4668  G2764  G3927  G3992  G4432
small grains   0.24   0.29   0.18   0.09   0.00
large grains   0.17   0.00   0.62   0.29   0.82

small grains = bottom 10% GL, bottom 10% GW in the RICE data; large grains = top 10% GL, top 10% GW.
Marco Scutari University of Oxford
Conclusions
Conclusions and Remarks
ˆ BNs provide an intuitive representation of the relationships linking heterogeneous sets of variables, which we can use for qualitative and causal reasoning.
ˆ We can learn BNs from data, while including prior knowledge as needed to improve the quality of the model.
ˆ BNs subsume a large number of probabilistic models, and thus can readily incorporate other techniques from statistics and computer science (via information theory).
ˆ For most tasks, we can start by just reusing state-of-the-art, general-purpose algorithms.
ˆ Once learned, BNs provide a flexible tool for inference, while maintaining good predictive power.
ˆ Markov blankets are a valuable tool for feature selection, and make it possible to learn just part of the BN, depending on the goals of the analysis.
Marco Scutari University of Oxford
  • 51. That’s all, Thanks! Marco Scutari University of Oxford