variational bayes in biophysics

bio+stats vbem/networks hierarchical
variational and hierarchical modeling
for biological data
chris wiggins
columbia
april 23, 2012
chris.wiggins@columbia.edu 4/23/12
Chris Wiggins
• APAM: Department of Applied Physics and Applied Mathematics;
• C2B2: Center for Computational Biology and Bioinformatics;
• CISB: Columbia University Initiative in Systems Biology
• ISDE: Institute for Data Sciences and Engineering
Columbia University
September 28, 2012

thanks. . .
- jake hofman (vbmod,vbfret)
- jonathan bronson (vbfret)
- jan-willem van de meent (hfret)
- ruben gonzalez (vbfret, hfret)
for more info:
- vbfret.sourceforge.net
- vbmod.sourceforge.net
- hfret.sourceforge.net (soon)
BMC bioinformatics, 2010;
PNAS 2009;
Biophysical Journal 2009;

1 biology and statistics
genomics
generative modeling
2 variational/biological networks
variational Bayesian expectation maximization
inference
model selection
3 hierarchical/time series
biological challenges
inference
model selection

biology and statistics:

genomics:
plos comp bio '10, nyas '07, bmc bioinfo '06a, bmc bioinfo '06b,
bioinfo '04, regulatory genomics '04, IEEE '05; NIPS (MLCB '03, '06)

generative modeling:
bmc bioinfo '10, PNAS '09, biophys j '09, PNAS '07, PNAS '06, NIPS
(MLCB '09, '10, '11), IEEE sig. proc. '12; plos one '08, cell '07,
prl '06, jcs '06, biophys j '06;
pami '08, prl '08, NIPS (MLCB '08), PNAS '05, bmc bioinfo '04

variational/biological networks:
- variational bayesian expectation maximization
- inference
- model selection

introduction formulation results extensions motivation history
introduction:
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
Hartwell, Hopfield, Leibler and Murray
Nature, vol. 402 (suppl.), 2 December 1999, www.nature.com

motivation:
community detection in networks
social networks
biological networks
problem: over-fitting/resolution limit

history:
by community
math/cs: spectral methods (Fiedler ’74, Shi + Malik ’00)
math/cs: clustering generally (Taskar, Koller, Getoor)
physics: modularity
common thread: test w/ stochastic block model (’76, ’83)
ergo: use as inference tool (Hastings 0604429,
Newman+Leicht 061148)

formulation:
generative model
maximum likelihood
maximum evidence
complexity control. . .
variational/mft. . .
algorithm
in physics: “test hamiltonian”
in ML: “variational Bayesian methods” (Jordan, MacKay)

generative model:
for each node, roll a K-sided die with bias π to choose
z_i ∈ {1, . . . , K}
for each edge (pair i ≠ j), flip a coin with bias ϑ+ if z_i = z_j, else ϑ−
draw edge if coin lands heads up
Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987)
[figure: graphical model; π → z_i, z_j → A_ij ← θ, plate over pairs i ≠ j]
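
The die-rolling and coin-flipping recipe above can be sketched in a few lines. This is a minimal illustration, not the vbmod code: the function name `sample_constrained_sbm` and the argument names `theta_in` (for ϑ+) and `theta_out` (for ϑ−) are made up here, and the graph is assumed undirected with no self-loops.

```python
import random

def sample_constrained_sbm(n_nodes, pi, theta_in, theta_out, seed=0):
    """Sample (z, A) from the constrained stochastic block model."""
    rng = random.Random(seed)
    k = len(pi)
    # Roll a K-sided die with bias pi for every node.
    z = rng.choices(range(k), weights=pi, k=n_nodes)
    # Flip one coin per unordered pair i < j; the bias depends only on
    # whether the two nodes share a module.
    A = [[0] * n_nodes for _ in range(n_nodes)]
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            bias = theta_in if z[i] == z[j] else theta_out
            if rng.random() < bias:
                A[i][j] = A[j][i] = 1  # undirected edge, no self-loops
    return z, A

z, A = sample_constrained_sbm(12, pi=[0.5, 0.5], theta_in=0.9, theta_out=0.05)
```

With `theta_in` much larger than `theta_out` the sampled adjacency matrix is close to block diagonal once nodes are sorted by `z`.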

generative model. . . (bis)
Die rolling, coin flipping, and priors, where the counts are:
- non-edges within modules
- edges within modules
- edges between modules
- non-edges between modules
- nodes in each module
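
The formula these counts belong to did not survive on this slide. One way to write the likelihood implied by the die-and-coin recipe is given below. The count symbols are labels introduced here, not necessarily the slide's: c_+ and c_- for edges and non-edges within modules, d_+ and d_- for edges and non-edges between modules, and n_μ for the number of nodes in module μ. θ_c and θ_d denote the within- and between-module coin biases.

```latex
p(A, \vec z \mid \vec\pi, \vec\theta)
  = \theta_c^{\,c_+} (1-\theta_c)^{\,c_-}\;
    \theta_d^{\,d_+} (1-\theta_d)^{\,d_-}\;
    \prod_{\mu=1}^{K} \pi_\mu^{\,n_\mu}
```

Conjugate priors (Beta for each coin bias, Dirichlet for the die) would be the natural choice for this form, though that is an assumption here rather than something stated on the slide.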

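max likelihood:
• Die rolling, coin flipping <-> infinite-range spin-glass Potts model:
$$H \equiv -\ln p(A, \vec z \mid \vec\pi, \vec\theta) = -\sum_{i>j} (J_L A_{ij} - J_G)\,\delta_{z_i,z_j} - \sum_{\mu=1}^{K} h_\mu \sum_{i=1}^{N} \delta_{z_i,\mu} + \text{(terms independent of } \vec z\text{)}$$
$$J_G \equiv \ln\frac{1-\theta_d}{1-\theta_c}, \qquad J_L \equiv \ln\frac{\theta_c}{\theta_d} + J_G, \qquad h_\mu \equiv \ln \pi_\mu$$
(θ_c and θ_d are the within- and between-module coin biases, written ϑ+ and ϑ− on the generative-model slide)
Extends Newman (2004, 2006), Hastings (2006), Bornholdt & Reichardt (2006)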
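
The correspondence between the coin-flipping likelihood and the Potts form can be checked numerically. The sketch below is an illustration only: it takes J_G = ln[(1 − θ_d)/(1 − θ_c)] and J_L = ln(θ_c/θ_d) + J_G, which is the assignment that makes the identity hold exactly, and it keeps the z-independent remainder explicit. The function names and the toy graph are made up for the example.

```python
import math

def log_joint_direct(A, z, pi, theta_c, theta_d):
    """ln p(A, z | pi, theta): one Bernoulli term per pair i > j."""
    total = sum(math.log(pi[zi]) for zi in z)
    for i in range(len(z)):
        for j in range(i):
            p = theta_c if z[i] == z[j] else theta_d
            total += math.log(p) if A[i][j] else math.log(1.0 - p)
    return total

def log_joint_potts(A, z, pi, theta_c, theta_d):
    """Same quantity written as the Potts energy plus a z-independent part."""
    J_G = math.log((1.0 - theta_d) / (1.0 - theta_c))
    J_L = math.log(theta_c / theta_d) + J_G
    minus_H = sum(math.log(pi[zi]) for zi in z)  # sum_mu h_mu * n_mu
    remainder = 0.0
    for i in range(len(z)):
        for j in range(i):
            if z[i] == z[j]:
                minus_H += J_L * A[i][j] - J_G
            # Terms that do not depend on the assignment z.
            remainder += math.log(theta_d) if A[i][j] else math.log(1.0 - theta_d)
    return minus_H + remainder

A = [[0, 1, 0, 0],
     [1, 0, 0, 1],
     [0, 0, 0, 1],
     [0, 1, 1, 0]]
z = [0, 0, 1, 1]
args = (A, z, [0.6, 0.4], 0.8, 0.1)
difference = log_joint_direct(*args) - log_joint_potts(*args)
```

Here `difference` is zero up to floating-point rounding for any graph, assignment, and parameter values with 0 < θ_d, θ_c < 1.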

max evidence:
Increasing complexity

max evidence:
http://research.microsoft.com/~minka/statlearn/demo/

max evidence:
cf. “BIC” Schwarz, 1978

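generative model:
for each node, roll a K-sided die with bias π to choose
z_i ∈ {1, . . . , K}
for each edge (pair i ≠ j), flip a coin with bias ϑ+ if z_i = z_j, else ϑ−
draw edge if coin lands heads up
[figure: graphical model; π → z_i, z_j → A_ij ← θ, plate over pairs i ≠ j, with hyperparameters c and n]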

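max likelihood:
• Infer distributions over spin assignments, coupling constants, and
chemical potentials and find number of occupied spin states
$$H \equiv -\ln p(A, \vec z \mid \vec\pi, \vec\theta) = -\sum_{i>j} (J_L A_{ij} - J_G)\,\delta_{z_i,z_j} - \sum_{\mu=1}^{K} h_\mu \sum_{i=1}^{N} \delta_{z_i,\mu} + \text{(terms independent of } \vec z\text{)}$$
$$J_G \equiv \ln\frac{1-\theta_d}{1-\theta_c}, \qquad J_L \equiv \ln\frac{\theta_c}{\theta_d} + J_G, \qquad h_\mu \equiv \ln \pi_\mu$$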

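max evidence:
$$H \equiv -\ln p(A, \vec z \mid \vec\pi, \vec\theta) = -\sum_{i>j} (J_L A_{ij} - J_G)\,\delta_{z_i,z_j} - \sum_{\mu=1}^{K} h_\mu \sum_{i=1}^{N} \delta_{z_i,\mu} + \text{(terms independent of } \vec z\text{)}$$
$$J_G \equiv \ln\frac{1-\theta_d}{1-\theta_c}, \qquad J_L \equiv \ln\frac{\theta_c}{\theta_d} + J_G, \qquad h_\mu \equiv \ln \pi_\mu$$
$$p(A \mid K) = \sum_{\vec z} \int d\vec\pi \int d\vec\theta\; p(A, \vec z, \vec\theta, \vec\pi) = \sum_{\vec z} \int d\vec\pi \int d\vec\theta\; e^{-H}\, p(\vec\pi)\, p(\vec\theta)$$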

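max evidence:
Extends Newman (2004, 2006), Hastings (2006), Bornholdt & Reichardt (2004 & 2006)
$$H \equiv -\ln p(A, \vec z \mid \vec\pi, \vec\theta) = -\sum_{i>j} (J_L A_{ij} - J_G)\,\delta_{z_i,z_j} - \sum_{\mu=1}^{K} h_\mu \sum_{i=1}^{N} \delta_{z_i,\mu} + \text{(terms independent of } \vec z\text{)}$$
$$J_G \equiv \ln\frac{1-\theta_d}{1-\theta_c}, \qquad J_L \equiv \ln\frac{\theta_c}{\theta_d} + J_G, \qquad h_\mu \equiv \ln \pi_\mu$$
$$p(A \mid K) = \sum_{\vec z} \int d\vec\pi \int d\vec\theta\; p(A, \vec z, \vec\theta, \vec\pi) = \sum_{\vec z} \int d\vec\pi \int d\vec\theta\; e^{-H}\, p(\vec\pi)\, p(\vec\theta)$$
Can do integrals, but sum is intractable, O(K^N); use mean-field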
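
To make the "integrals are easy, the sum is not" point concrete, here is a brute-force sketch for very small graphs. It assumes conjugate priors that the slide does not spell out: Beta(a, b) on each coin bias and a symmetric Dirichlet(alpha) on the die. The function names are hypothetical. The integrals over θ and π then reduce to ratios of Beta and Gamma functions, and only the sum over all K^N assignments remains.

```python
import itertools
import math

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_joint_marginal(A, z, K, a=1.0, b=1.0, alpha=1.0):
    """ln p(A, z | K) with theta_c, theta_d and pi integrated out analytically."""
    n = len(z)
    c_plus = c_minus = d_plus = d_minus = 0
    for i in range(n):
        for j in range(i):
            if z[i] == z[j]:
                c_plus += A[i][j]
                c_minus += 1 - A[i][j]
            else:
                d_plus += A[i][j]
                d_minus += 1 - A[i][j]
    out = log_beta(a + c_plus, b + c_minus) - log_beta(a, b)
    out += log_beta(a + d_plus, b + d_minus) - log_beta(a, b)
    # Dirichlet-multinomial term for the module occupancies n_mu.
    out += math.lgamma(K * alpha) - math.lgamma(K * alpha + n)
    for mu in range(K):
        n_mu = sum(1 for zi in z if zi == mu)
        out += math.lgamma(alpha + n_mu) - math.lgamma(alpha)
    return out

def evidence_brute_force(A, K):
    """p(A | K) by explicit summation over all K**N assignments."""
    n = len(A)
    return sum(math.exp(log_joint_marginal(A, z, K))
               for z in itertools.product(range(K), repeat=n))

# Two triangles joined by one edge: 6 nodes, so K**6 terms in the sum.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A[i][j] = A[j][i] = 1
evidence_by_K = {K: evidence_brute_force(A, K) for K in (1, 2, 3)}
```

The number of terms grows as K^N, so this is only usable as a check on toy graphs; that growth is what motivates the mean-field approximation.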

max evidence:
• Gibbs’/Jensen’s inequality (log of expected value bounds expected value of log) for any distribution q
$$p(A \mid K) = \sum_{\vec z} \int d\vec\pi \int d\vec\theta\; p(A, \vec z, \vec\theta, \vec\pi) = \sum_{\vec z} \int d\vec\pi \int d\vec\theta\; e^{-H}\, p(\vec\pi)\, p(\vec\theta)$$
Variational Bayes (MacKay, Jordan, Ghahramani, Jaakkola, Saul 1999; cf. Feynman 1972)
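
The inequality itself was an image and is not in the extracted text. A standard way to write it for this model is shown below. The sign convention for F (a free energy, so lower is better) is an assumption, chosen because it agrees with the later vbmod/vbsbm comparison in which the smaller F wins.

```latex
\ln p(A \mid K)
  = \ln \sum_{\vec z} \int d\vec\pi \int d\vec\theta\;
      q(\vec z, \vec\pi, \vec\theta)\,
      \frac{p(A, \vec z, \vec\theta, \vec\pi \mid K)}{q(\vec z, \vec\pi, \vec\theta)}
  \;\ge\;
    \sum_{\vec z} \int d\vec\pi \int d\vec\theta\;
      q(\vec z, \vec\pi, \vec\theta)\,
      \ln \frac{p(A, \vec z, \vec\theta, \vec\pi \mid K)}{q(\vec z, \vec\pi, \vec\theta)}
  \;\equiv\; -F\{q; A\}
```

Equality holds when q is the exact posterior over (z, π, θ), so minimizing F over a restricted family of q both tightens the bound on the evidence and yields an approximate posterior.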

max evidence:
why would you do this? (A1):
Beal, 2003

max evidence:
Beal, 2003

max evidence:
• Gibbs’/Jensen’s inequality (log of expected value bounds expected value of log) for any distribution q
Variational Bayes (MacKay, Jordan, Ghahramani, Jaakkola, Saul 1999; cf. Feynman 1972)
• F is a functional of q; find approximation to posterior by optimizing approximation to
evidence
• Take q(z, π, θ) = q(z)q(π)q(θ); Q_iμ is the probability that node i is in module μ
where expected counts are: [formulas not recovered from the slide]

algo:
where expected counts
are:

algo:

algo:
suggests hard limit in step 3; sparse in step 1
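
The algorithm steps on these slides were images and are not in the extracted text. As a stand-in, here is a sketch of coordinate-ascent mean-field updates derived directly from the constrained block model with q(z)q(π)q(θ) factorized. It is not the vbmod implementation. It assumes uniform Beta(1, 1) and Dirichlet(1) priors, uses hypothetical function names, and is written for clarity (dense loops, pure Python) rather than speed.

```python
import math
import random

def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252)))

def vb_constrained_sbm(A, K, n_iter=50, seed=0):
    """Mean-field VB for the constrained SBM; returns Q[i][mu]."""
    rng = random.Random(seed)
    n = len(A)
    n_pairs = n * (n - 1) / 2
    n_edges = sum(A[i][j] for i in range(n) for j in range(i))
    # Random soft initialisation of the assignment probabilities.
    Q = []
    for _ in range(n):
        row = [rng.random() + 0.1 for _ in range(K)]
        s = sum(row)
        Q.append([r / s for r in row])
    for _ in range(n_iter):
        # Expected counts under q(z).
        n_mu = [sum(Q[i][mu] for i in range(n)) for mu in range(K)]
        same = [[sum(Q[i][m] * Q[j][m] for m in range(K)) for j in range(i)]
                for i in range(n)]
        c_plus = sum(A[i][j] * same[i][j] for i in range(n) for j in range(i))
        c_minus = sum(same[i][j] for i in range(n) for j in range(i)) - c_plus
        d_plus = n_edges - c_plus
        d_minus = n_pairs - n_edges - c_minus
        # Posterior pseudo-counts; the "+ 1" is the assumed uniform prior.
        cp, cm, dp, dm = c_plus + 1, c_minus + 1, d_plus + 1, d_minus + 1
        nt = [x + 1 for x in n_mu]
        # Expected couplings <J_L>, <J_G> and fields <h_mu>.
        J_L = digamma(cp) - digamma(cm) - digamma(dp) + digamma(dm)
        J_G = digamma(dm) - digamma(dp + dm) - digamma(cm) + digamma(cp + cm)
        h = [digamma(x) - digamma(sum(nt)) for x in nt]
        # Update each node's assignment distribution in turn.
        for i in range(n):
            logq = []
            for mu in range(K):
                nbrs = sum(A[i][j] * Q[j][mu] for j in range(n) if j != i)
                others = sum(Q[j][mu] for j in range(n) if j != i)
                logq.append(J_L * nbrs - J_G * others + h[mu])
            top = max(logq)
            w = [math.exp(v - top) for v in logq]
            s = sum(w)
            Q[i] = [v / s for v in w]
    return Q

edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A[i][j] = A[j][i] = 1
Q = vb_constrained_sbm(A, K=2)
```

Mean-field updates can settle in local optima, so in practice one would run several random restarts and keep the solution with the best bound. Evaluating the bound itself is omitted from this sketch.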

results:
run time
consistency
required plot: good vs. easy
real data
karate
biology
american football

run time:
• Main loop runtime for 10^4 nodes in MATLAB ~30 seconds

consistency:

consistency:
[figure: p(θ+) and p(θ−) plotted against θ from 0 to 1; N=8, K=2, distribution after 2 iterations]

consistency:
• K=4?
• Automatic complexity control: probability of occupation for extraneous modules
goes to zero

consistency:
The “resolution limit” problem
[figure: the same 60-node network partitioned two ways: Variational Bayes and Girvan-Newman modularity]

consistency:
The “resolution limit” problem
[figure: K* versus Ktrue for Ktrue from 10 to 20, with the line K* = Ktrue; curves for Variational Bayes and Modularity optimization]
[figure: GN modularity versus Ktrue, "Resolution limit problem on ring of 4-node cliques"; curves for single-clique communities (correct) and double-clique communities (incorrect); GN modularity (Clauset’s algorithm)]
Girvan-Newman modularity or Potts model w/ fixed parameters suffers from a resolution limit,
where size of detected modules depends on network size
Fortunato et al. (2007), Kumpula et al. (2007)
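
The ring-of-cliques example can be reproduced with a direct modularity calculation. The sketch below builds the ring explicitly and evaluates the standard modularity formula for two hand-chosen partitions. It does not run any community-detection algorithm, and the function names are illustrative.

```python
def ring_of_cliques(n_cliques, clique_size=4):
    """Edge list: complete graphs joined in a ring by single edges."""
    edges = []
    for c in range(n_cliques):
        base = c * clique_size
        for a in range(clique_size):
            for b in range(a + 1, clique_size):
                edges.append((base + a, base + b))
        nxt = ((c + 1) % n_cliques) * clique_size
        edges.append((base + clique_size - 1, nxt))
    return edges

def modularity(edges, community):
    """Modularity Q = sum_c [ l_c / L - (d_c / 2L)^2 ]."""
    L = len(edges)
    internal, degree = {}, {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / L - (d / (2 * L)) ** 2
               for c, d in degree.items())

def compare(n_cliques, clique_size=4):
    """Modularity of one-clique-per-module vs. two-cliques-per-module."""
    edges = ring_of_cliques(n_cliques, clique_size)
    n_nodes = n_cliques * clique_size
    single = [node // clique_size for node in range(n_nodes)]
    double = [node // (2 * clique_size) for node in range(n_nodes)]
    return modularity(edges, single), modularity(edges, double)

results = {n: compare(n) for n in (10, 12, 14, 16, 18, 20)}
```

For 4-node cliques the two partitions score 6/7 − 1/n and 13/14 − 2/n, where n is the number of cliques, so the merged (incorrect) partition has the higher modularity once n exceeds 14. The preferred module size therefore depends on the size of the whole network.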

good vs easy:

real data:
• Correctly infer K=12 conferences
Validation: NCAA football schedule
[figure: network of 115 numbered nodes]
nodes: teams
edges: games
shape: conference
color: inferred module

real data:
APS march meeting 2008
superconductivity
(experimentalists)
Nanotubes, Graphene
superconductivity
(theorists)

extensions:
model extensions
- full SBM (done)
- hierarchical model $p(A_{ij} = 1 \mid z_i = z_j \pm 1)$
- hierarchical modeling (ensemble of graphs):
  $p(D \mid u, K) = \prod_{i=1}^{L} \int d\vartheta_i\; p(D \mid \vartheta_i)\, p(\vartheta_i \mid u, K)$
- $R^d$ embedding (latent are real)
- more ‘rigorous’ SOM?
- ‘correct’ for degree (allow variable affinity) (cf. Bader, Karrer)
algorithm extensions
- BP (see earlier talks)
- map-reduce
- cvmod (model selection via cross validation)

full SBM. . .
probability of edge depends only on block membership:
p(Aij |zi = µ, zj = ν) = ϑµν

full SBM. . .
• Nodes belong to “blocks” of
varying size
• Roll die for assignment of
nodes to blocks
• Probability of edge between two
nodes depends only on block
membership
• Flip (one of K^2) coins for edges
• Result: mixture of Erdős-Rényi
graphs
[figure: adjacency matrix, axes 0 to 120, nz = 2275]

full SBM. . .
vs

full SBM. . .
[figure: adjacency matrix, axes 0 to 120, nz = 2803]

full SBM. . .
>> vbsbm_vs_vbmod(0)
running vbmod ...
Elapsed time is 1.136925 seconds.
running vbsbm ...
Fmod=13089.158019 Fsbm=13144.445782
vbmod wins

full SBM. . .
>> vbsbm_vs_vbmod(0.25)
running vbmod ...
running vbsbm ...
Fmod=20457.142416 Fsbm=19457.306022
vbsbm wins

full SBM. . .
>> vbsbm_vs_vbmod(0.5)
running vbmod ...
running vbsbm ...
Fmod=26133.351210 Fsbm=23921.797625
vbsbm wins

full SBM. . .
• Using same framework we can compare the
constrained and full (unconstrained) stochastic block models via p(D|M,K*)
[figure, shown twice: win percentage for unconstrained model (0 to 1) versus perturbation to constrained model (0 to 0.1)]
[figures: three adjacency matrices, axes 0 to 120, with nz = 2100, nz = 2048 and nz = 2108]

extensions:
model extensions
full SBM (done)
[figure: graphical model; π → z_i, z_j → A_ij ← θ, plate over pairs i ≠ j, with hyperparameters c and n]

extensions:
model extensions
full SBM (done)
[figure: graphical model; π → z_i, z_j → A_ij ← θ, plate over pairs i ≠ j, with hyperparameters c and n and an additional plate L]

for more info. . .
code: MATLAB & python (inc. “full” SBM) (vbmod.sf.net)
paper: arxiv 08 / prl 08
Hofman soon to come (not by me)
code in C++, inc. full ‘vblabel propagation’ algo
twitter-scale analysis

hierarchical/time series:
- biological challenges
- inference
- model selection

hfret. . .
Jan-Willem van de Meent, Ruben Gonzalez, Chris Wiggins
Columbia University

Ramakrishnan et al – http://www.mrc-lmb.cam.ac.uk/ribo/

hfret. . .
FRET = cy5 / (cy3 + cy5)
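
As a small worked example of the ratio above, the function below converts paired donor (cy3) and acceptor (cy5) intensity traces into a FRET trace. The function name and the choice to return `None` where the total intensity is not positive are assumptions made for this illustration. Real traces would also need background and crosstalk corrections that are not shown.

```python
def fret_ratio(cy3, cy5):
    """FRET = cy5 / (cy3 + cy5) per time point; None where the total is not positive."""
    out = []
    for donor, acceptor in zip(cy3, cy5):
        total = donor + acceptor
        out.append(acceptor / total if total > 0 else None)
    return out

trace = fret_ratio([300, 100, 0], [100, 300, 0])
```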
Tinoco and Gonzalez, Genes Dev, 2011

hfret. . .
Tinoco and Gonzalez, Genes Dev, 2011

hfret. . .
Unbound
EF-G bound
Tinoco and Gonzalez, Genes Dev, 2011; Fei et al, PNAS, 2009
(short-lived GS1 states correspond to an EF-G + GDPNP binding event)

hfret. . .
Unbound
EF-G bound
Tinoco and Gonzalez, Genes Dev, 2011; Fei et al, PNAS, 2009

hfret. . .
1. Identify states
2. Estimate Kinetic Rates
3. Average over many time series
4. Detect subpopulations

hfret. . .
FRET Signal

hfret. . .
FRET Signal, Histogram

hfret. . .
FRET Signal, Histogram
Idea: Find probability of belonging to each state

hfret. . .
Expectation Maximization
1. calculate p(z | x, θi)
2. calculate θi+1 from p(z | x, θi)

hfret. . .
Log-Likelihood
$$L = \log p(x \mid \theta) = \log \sum_z p(x, z \mid \theta)$$
Expectation Maximization
1. calculate p(z | x, θi)
2. calculate θi+1 from p(z | x, θi)
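
The two EM steps can be illustrated on the histogram view of a FRET trace, treating the samples as independent draws from a mixture of Gaussians. This is a minimal sketch with made-up synthetic data and a simple deterministic initialisation. It ignores the time ordering, so it estimates state occupancies but not rates.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_gmm_1d(x, K=2, n_iter=100):
    """EM for a 1-D Gaussian mixture; returns (pi, mu, sigma, resp)."""
    lo, hi = min(x), max(x)
    mu = [lo + (hi - lo) * (k + 0.5) / K for k in range(K)]
    sigma = [(hi - lo) / (2 * K)] * K
    pi = [1.0 / K] * K
    resp = []
    for _ in range(n_iter):
        # Step 1: p(z | x, theta_i), the responsibility of each state.
        resp = []
        for xt in x:
            w = [pi[k] * normal_pdf(xt, mu[k], sigma[k]) for k in range(K)]
            s = sum(w)
            resp.append([wk / s for wk in w])
        # Step 2: theta_{i+1} from the responsibilities.
        for k in range(K):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(x)
            mu[k] = sum(r[k] * xt for r, xt in zip(resp, x)) / nk
            var = sum(r[k] * (xt - mu[k]) ** 2 for r, xt in zip(resp, x)) / nk
            sigma[k] = max(math.sqrt(var), 1e-6)
    return pi, mu, sigma, resp

# Synthetic two-state "FRET histogram" (values are illustrative only).
rng = random.Random(1)
x = [rng.gauss(0.2, 0.05) for _ in range(200)] + [rng.gauss(0.8, 0.05) for _ in range(200)]
pi_hat, mu_hat, sigma_hat, resp = em_gmm_1d(x)
```

Each row of `resp` is the probability that a sample belongs to each state, which is the quantity the "Idea" slide asks for.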

hfret. . .
Learned Truth
Accurate for occupancy of states,
not so good for rate estimates

hfret. . .
$p(x, z \mid \mu, \sigma, \pi) = p(x \mid z, \mu, \sigma)\, p(z \mid \pi)$

hfret. . .
probability of state depends on previous state
$p(z_{t+1} = l \mid z_t = k) = A_{kl}$

hfret. . .

hfret. . .
$p(z_1 = k) = \pi_k$

hfret. . .
$p(z_{t+1} = l \mid z_t = k) = A_{kl}$

hfret. . .
$p(x_t \mid z_t = k) = \mathcal{N}(x_t \mid \mu_k, \sigma_k)$
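
The three ingredients of the hidden Markov model (initial distribution π, transition matrix A, Gaussian emissions) are enough to evaluate the log-likelihood of a trace with the forward recursion. The sketch below uses the standard scaled forward pass. The function names and the numbers in the example are illustrative only.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def hmm_log_likelihood(x, pi, A, mu, sigma):
    """log p(x | theta) for a Gaussian-emission HMM via the scaled forward pass."""
    K = len(pi)
    # alpha[k] is proportional to p(x_1..x_t, z_t = k).
    alpha = [pi[k] * normal_pdf(x[0], mu[k], sigma[k]) for k in range(K)]
    log_lik = 0.0
    for t in range(len(x)):
        if t > 0:
            alpha = [sum(alpha[k] * A[k][l] for k in range(K))
                     * normal_pdf(x[t], mu[l], sigma[l]) for l in range(K)]
        scale = sum(alpha)          # p(x_t | x_1..x_{t-1})
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik

x = [0.21, 0.19, 0.78, 0.82, 0.20]
pi = [0.3, 0.7]
A = [[0.9, 0.1],
     [0.2, 0.8]]
mu, sigma = [0.2, 0.8], [0.05, 0.05]
log_lik = hmm_log_likelihood(x, pi, A, mu, sigma)
```

Rescaling `alpha` at every step avoids numerical underflow on long traces. When every row of A equals π the model reduces to the independent mixture model from the earlier slides.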

hfret. . .
Learned Real
We’ve learned:
parameters: θ = {µ, σ, π, A} states: p(z | x, θ)

hfret. . .
2 States 3 States

hfret. . .
Log-Likelihood
$$L = \log p(x \mid \theta) = \log \sum_z p(x, z \mid \theta)$$
Log-Evidence
$$L = \log p(x \mid u) = \log \sum_z \int d\theta\; p(x, z \mid \theta)\, p(\theta \mid u)$$
annotations added on successive builds: Prior, Ensemble
best model has highest average likelihood

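hfret. . .
Log-Evidence, Lower Bound
$$\log p(x \mid u) \;\ge\; L = \sum_z \int d\theta\; q(z)\, q(\theta \mid w) \log \frac{p(x, z, \theta \mid u)}{q(z)\, q(\theta \mid w)}$$
$$q(z)\, q(\theta \mid w) \simeq p(z, \theta \mid x)$$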

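hfret. . .
Lower bound tight for true posterior
$$L = \sum_z \int d\theta\; p(z, \theta \mid x) \log \frac{p(x, z, \theta \mid u)}{p(z, \theta \mid x)} = \sum_z \int d\theta\; p(z, \theta \mid x) \log p(x \mid u) = \log p(x \mid u)$$
$$L = \log p(x \mid u) - D_{\mathrm{KL}}\left[\, q(z)\, q(\theta \mid w) \,\|\, p(z, \theta \mid x) \,\right]$$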
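
The decomposition of the bound into log-evidence minus a KL divergence can be checked numerically on a toy problem. The sketch below drops the continuous parameters θ and keeps only a discrete latent z with three states, so every sum is explicit. The joint probabilities are made-up numbers and the function name is hypothetical.

```python
import math

def elbo_and_gap(joint, q):
    """joint[k] = p(x, z=k) for the observed x; q[k] = any distribution over z."""
    evidence = sum(joint)
    posterior = [j / evidence for j in joint]
    elbo = sum(qk * math.log(jk / qk) for qk, jk in zip(q, joint) if qk > 0)
    kl = sum(qk * math.log(qk / pk) for qk, pk in zip(q, posterior) if qk > 0)
    return elbo, math.log(evidence), kl

joint = [0.10, 0.05, 0.02]      # p(x, z=k) for one observed x (made-up numbers)
q = [0.5, 0.3, 0.2]             # an arbitrary approximating distribution
elbo, log_evidence, kl = elbo_and_gap(joint, q)

# Using the exact posterior as q makes the bound tight.
posterior = [j / sum(joint) for j in joint]
elbo_tight, _, kl_tight = elbo_and_gap(joint, posterior)
```

For any q the three returned numbers satisfy elbo = log_evidence − kl with kl ≥ 0, and the gap closes when q equals the posterior.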

hfret. . .
We’ve learned:
parameters: q(θ | w) states: p(z | x, θ)
VBEM Updates
$$\frac{\delta L_n}{\delta q(z_n)} = 0, \qquad \frac{\delta L_n}{\delta q(\theta_n)} = 0$$

hfret. . .
variability: photophysical/experimental

hfret. . .
Hierarchical Updates
$$\frac{\partial}{\partial u} \sum_n L_n = 0$$
“two-stage PEB model/CIHM” (Kass and Steffey, JASA 1989)

hfret. . .
Until Σ Ln converges:
1. Run VBEM on each trace
   Until Ln converges:
   • Update q(zn)
   • Update q(θn | wn)
2. Update p(θ | u)
VBEM Updates
$$\frac{\delta L_n}{\delta q(z_n)} = 0, \qquad \frac{\delta L_n}{\delta q(\theta_n)} = 0$$
Hierarchical Updates
$$\frac{\partial}{\partial u} \sum_n L_n = 0$$

hfret. . .
We’ve learned:
p(θn, zn | xn) ≃ q(θn) q(zn) (for each trace)
p(θ | u) (for ensemble)
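
The structure of the two-stage loop (per-trace inference inside, a shared prior update outside) can be shown on a much simpler model than the HMM used in hFRET. The sketch below is a toy stand-in, not the hFRET algorithm: each trace n has a single hidden level θ_n drawn from a shared prior N(m, v), and the samples are Gaussian around θ_n with known noise variance. In this conjugate case the per-trace posterior is exact, so stage 1 needs no inner iteration. All names and numbers are illustrative.

```python
import random

def empirical_bayes_normal(traces, noise_var, n_iter=200, tol=1e-10):
    """Two-stage loop for theta_n ~ N(m, v), x_nt ~ N(theta_n, noise_var)."""
    m, v = 0.0, 1.0
    post = []
    for _ in range(n_iter):
        # Stage 1: per-trace posterior q(theta_n) under the current prior (m, v).
        post = []
        for x in traces:
            precision = 1.0 / v + len(x) / noise_var
            mean = (m / v + sum(x) / noise_var) / precision
            post.append((mean, 1.0 / precision))
        # Stage 2: update the shared hyperparameters u = (m, v).
        new_m = sum(mean for mean, _ in post) / len(post)
        new_v = sum(var + (mean - new_m) ** 2 for mean, var in post) / len(post)
        converged = abs(new_m - m) + abs(new_v - v) < tol
        m, v = new_m, new_v
        if converged:
            break
    return m, v, post

# Synthetic ensemble: 40 traces whose levels scatter around 0.5.
rng = random.Random(0)
noise_sd = 0.05
traces = []
for _ in range(40):
    theta_n = rng.gauss(0.5, 0.1)
    traces.append([rng.gauss(theta_n, noise_sd) for _ in range(50)])
m_hat, v_hat, post = empirical_bayes_normal(traces, noise_var=noise_sd ** 2)
```

The returned `post` plays the role of the per-trace q(θ_n), and `(m_hat, v_hat)` the role of the ensemble-level p(θ | u).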

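hfret. . .
$$\xi_{ntkl} = p(z_{nt} = k,\; z_{n,t+1} = l \mid x_n)$$
1. Run mixture model on posterior counts
$$p(\xi_n \mid A) = \prod_{t,k,l} A_{kl}^{\,\xi_{ntkl}}$$
$$p(\xi_n \mid u_m) = \int dA\; p(A \mid u_m)\, p(\xi_n \mid A)$$
2. Rerun with M x K block-diagonal form
$$u^A = \begin{pmatrix} u^A_1 & & \\ & \ddots & \\ & & u^A_M \end{pmatrix}$$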
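
If each row of the transition matrix A is given a Dirichlet prior with pseudo-counts u (an assumption here; the slide does not state the form of p(A | u_m)), the integral over A has a closed form and the mixture step over subpopulations becomes a comparison of marginal likelihoods. The sketch below works with transition counts already summed over time. The function names, mixture weights, and example priors are all hypothetical.

```python
import math

def log_marginal_transitions(xi, u):
    """ln of the integral over A of p(A | u) * prod_kl A_kl ** xi_kl.

    xi[k][l]: expected transition counts summed over t.
    u[k][l]:  Dirichlet pseudo-counts for row k of A (assumed prior form).
    """
    total = 0.0
    for xi_row, u_row in zip(xi, u):
        total += math.lgamma(sum(u_row)) - math.lgamma(sum(u_row) + sum(xi_row))
        for x, a in zip(xi_row, u_row):
            total += math.lgamma(a + x) - math.lgamma(a)
    return total

def subpopulation_posterior(xi, u_list, weights):
    """Posterior probability of each subpopulation m for one trace."""
    logs = [math.log(w) + log_marginal_transitions(xi, u)
            for w, u in zip(weights, u_list)]
    top = max(logs)
    unnorm = [math.exp(v - top) for v in logs]
    s = sum(unnorm)
    return [v / s for v in unnorm]

u_slow = [[9.0, 1.0], [1.0, 9.0]]   # prior favouring rare transitions
u_fast = [[5.0, 5.0], [5.0, 5.0]]   # prior favouring frequent transitions
xi_n = [[90.0, 10.0], [10.0, 90.0]] # counts from a mostly self-transitioning trace
resp_n = subpopulation_posterior(xi_n, [u_slow, u_fast], weights=[0.5, 0.5])
```

For this example trace the first ("slow") subpopulation receives nearly all of the posterior weight.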

hfret. . .
[figure: τfast and τslow, with values 2, 4, 8, 16, 32, 64; the second build adds the label "reality"]

hfret. . .
no EF-G 50 nM EF-G 500 nM EF-G
Fei, Bronson, Hofman, Srinivas, Wiggins, Gonzalez, PNAS, 2009

hfret. . .
$$p(z_k) \sim e^{-G_k / k_B T}$$
$$\log p(z_k) - \log p(z_l) = -(G_k - G_l)/k_B T + \text{cst.}$$

hfret. . .
$$p(z_k) \sim e^{-G_k / k_B T}$$
$$\Delta\Delta G = \mathrm{logit}(p)$$
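
The Boltzmann relation turns state occupancies into free-energy differences in units of k_B T. A minimal sketch follows. The function names are made up, and the sign convention (G_k − G_l is negative when state k is more populated) is the one implied by p ∼ exp(−G / k_B T).

```python
import math

def delta_g_over_kT(p_k, p_l):
    """(G_k - G_l) / k_B T from state occupancies, using p ~ exp(-G / k_B T)."""
    return -(math.log(p_k) - math.log(p_l))

def logit(p):
    return math.log(p / (1.0 - p))

# Two-state example: state k occupied 75% of the time.
dG = delta_g_over_kT(0.75, 0.25)
```

For a two-state system with occupancies p and 1 − p, the free-energy difference equals −logit(p) under this convention, so a shift in logit(p) between two conditions (for example with and without a bound factor) reads directly as a ΔΔG in units of k_B T.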

hfret. . .
no EF-G: bound fraction and life-times

hfret. . .
5 nM EF-G: bound fraction and life-times

hfret. . .

model selection:

hfret. . .
Low Noise, Underfitted: Inf Out - Inf In

hfret. . .
Low Noise, Correct: Out vs In

hfret. . .
Low Noise, Overfitted: Inf Out - Inf In

hfret. . .
High Noise, Underfitted: Inf Out - Inf In

hfret. . .
High Noise, Correct: Inf Out - Inf In

hfret. . .
High Noise, Overfitted: Inf Out - Inf In

hfret. . .
the future, in progress:
X

traditional role of statistics in biophysics
“if your experiment needs
statistics, you ought to
have done a better
experiment”
-lord rutherford
