SlideShare ist ein Scribd-Unternehmen logo
1 von 139
bio+stats vbem/networks hierarchical
variational and hierarchical modeling
for biological data
chris wiggins
columbia
april 23, 2012
chris.wiggins@columbia.edu 4/23/12
Chris Wiggins
• APAM: Department of Applied Physics and Applied Mathematics;
• C2B2: Center for Computational Biology and Bioinformatics;
• CISB: Columbia University Initiative in Systems Biology
• ISDE: Institute for Data Sciences and Engineering
Columbia University
September 28, 2012
bio+stats vbem/networks hierarchical biological challenges inference model selection
thanks. . .
- jake hofman (vbmod,vbfret)
- jonathan bronson (vbfret)
- jan-willem van de meent (hfret)
- ruben gonzalez (vbfret, hfret)
for more info:
- vbfret.sourceforge.net
- vbmod.sourceforge.net
- hfret.sourceforge.net (soon)
chris.wiggins@columbia.edu 4/23/12
BMC bioinformatics, 2010;
PNAS 2009;
Biophysical Journal 2009;
bio+stats vbem/networks hierarchical
1 biology and statistics
genomics
generative modeling
2 variational/biological networks
variational Bayesian expectation maximization
inference
model selection
3 hierarchical/time series
biological challenges
inference
model selection
chris.wiggins@columbia.edu 4/23/12
bio+stats vbem/networks hierarchical genomics generative modeling
biology and statistics:
chris.wiggins@columbia.edu 4/23/12
bio+stats vbem/networks hierarchical genomics generative modeling
genomics:
chris.wiggins@columbia.edu 4/23/12
plos comp bio `10, nyas `07, bmc bioinfo `06a, bmc bioinfo `06b,
bioinfo `04, regulatory genomics 04, IEEE `05; NIPS (MLCB`03, `06 )
bio+stats vbem/networks hierarchical genomics generative modeling
generative modeling:
chris.wiggins@columbia.edu 4/23/12
bmc bioinfo '10, PNAS`09, biophys
j`09, PNAS`07, PNAS`06 NIPS
(MLCB '09, '10, '11) IEEE sig. proc.
`12; plos one `08,cell `07,prl `06, jcs
`06, biophys j `06
pami `08, prl `08, NIPS (MLCB08),
PNAS `05, bmc bioinfo `04
bio+stats vbem/networks hierarchical vbem inference model selection
variational/biological networks:
- variational bayesian expectation maximization
- inference
- model selection
chris.wiggins@columbia.edu 4/23/12
introduction formulation results extensions motivation history
introduction:
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
Hartwell, Hopfield, Leibler and Murray
NATURE|VOL 402 | SUPP | 2
DECEMBER 1999 | www.nature.com
introduction formulation results extensions motivation history
motivation:
community detection in networks
social networks
biological networks
problem: over-fitting/resolution limit
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
introduction formulation results extensions motivation history
history:
by community
math/cs: spectral methods (Fiedler ’74, Shi + Malik ’00)
math/cs: clustering generally (Taskar, Koller, Getoor)
physics: modularity
common thread: test w/ stochastic block model (’76, ’83)
ergo: use as inference tool (Hastings 0604429,
Newman+Liecht 061148)
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
introduction formulation results extensions generative model max likelihood max evidence algo
formulation:
generative model
maximum likelihood
maximum evidence
complexity control. . .
variational/mft. . .
algorithm
in physics: “test hamiltonian”
in ML “variational bayesian methods” (Jordan, Mackay)
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
introduction formulation results extensions generative model max likelihood max evidence algo
generative model:
foreach node roll K-sided die with bias π to choose
zi {1, . . . , K}
foreach edge flip coin with bias ϑ+ if zi = zj , else ϑ−
draw edge if coin lands heads up
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987)
i≠j
zi zj
Aij
π
θ
introduction formulation results extensions generative model max likelihood max evidence algo
generative model. . . (bis)
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
Die rolling, coin flipping, and priors: where counts are:
non-edges within
modules
edges within
modules
edges between
modules
non-edges
between modules
nodes in each
module
introduction formulation results extensions generative model max likelihood max evidence algo
max likelihood:
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
H ⇥ ln p(A, ⌦z|⌦⇤, ⌦⇥) =
i,j
(JLAij JG) zi,zj +
K
µ=1
hµ
N
i=1
zi,µ
JG ⇥ ln ⇥c/⇥d
JL ⇥ ln(1 ⇥d)/(1 ⇥c) + JG
hµ ⇥ ln µ
Extends Newman (2004, 2006), Hastings (2006), Bornholdt & Reichardt (2006)
•Die rolling, coin flipping <-> infinite-range spin-glass Potts model:
introduction formulation results extensions generative model max likelihood max evidence algo
formulation:
generative model
maximum likelihood
maximum evidence
complexity control. . .
variational/mft. . .
algorithm
in physics: “test hamiltonian”
in ML “variational bayesian methods” (Jordan, Mackay)
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
introduction formulation results extensions generative model max likelihood max evidence algo
formulation:
generative model
maximum likelihood
maximum evidence
complexity control. . .
variational/mft. . .
algorithm
in physics: “test hamiltonian”
in ML “variational bayesian methods” (Jordan, Mackay)
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
introduction formulation results extensions generative model max likelihood max evidence algo
max evidence:
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
Increasing complexity
introduction formulation results extensions generative model max likelihood max evidence algo
max evidence:
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
http://research.microsoft.com/~minka/statlearn/demo/
introduction formulation results extensions generative model max likelihood max evidence algo
max evidence:
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
introduction formulation results extensions generative model max likelihood max evidence algo
max evidence:
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
cf. “BIC” Schwartz, 1978
introduction formulation results extensions generative model max likelihood max evidence algo
generative model:
foreach node roll K-sided die with bias π to choose
zi {1, . . . , K}
foreach edge flip coin with bias ϑ+ if zi = zj , else ϑ−
draw edge if coin lands heads up
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987)
i≠j
zi zj
Aij
π
θ
introduction formulation results extensions generative model max likelihood max evidence algo
generative model:
foreach node roll K-sided die with bias π to choose
zi {1, . . . , K}
foreach edge flip coin with bias ϑ+ if zi = zj , else ϑ−
draw edge if coin lands heads up
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987)
i≠j
zi zj
Aij
π
θ c
n
introduction formulation results extensions generative model max likelihood max evidence algo
generative model. . . (bis)
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
Die rolling, coin flipping, and priors: where counts are:
non-edges within
modules
edges within
modules
edges between
modules
non-edges
between modules
nodes in each
module
introduction formulation results extensions generative model max likelihood max evidence algo
generative model. . . (bis)
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
Die rolling, coin flipping, and priors: where counts are:
non-edges within
modules
edges within
modules
edges between
modules
non-edges
between modules
nodes in each
module
introduction formulation results extensions generative model max likelihood max evidence algo
max likelihood:
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
Extends Newman (2004, 2006), Hastings (2006), Bornholdt & Reichardt (2006)
•Die rolling, coin flipping <-> infinite-range spin-glass Potts model:
•Infer distributions over spin assignments, coupling constants, and
chemical potentials and find number of occupied spin states
JG ⇥ ln ⇥c/⇥d
JL ⇥ ln(1 ⇥d)/(1 ⇥c) + JG
hµ ⇥ ln µ
•Die rolling, coin flipping <-> infinite-range spin-glass Potts model:
•Infer distributions over spin assignments, coupling constants, and
chemical potentials and find number of occupied spin states
H ⇥ ln p(A, ⌦z|⌦⇤, ⌦⇥) =
i,j
(JLAij JG) zi,zj +
K
µ=1
hµ
N
i=1
zi,µ
introduction formulation results extensions generative model max likelihood max evidence algo
max evidence:
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
Extends Newman (2004, 2006), Hastings (2006), Bornholdt & Reichardt (2006)
•Die rolling, coin flipping <-> infinite-range spin-glass Potts model:
•Infer distributions over spin assignments, coupling constants, and
chemical potentials and find number of occupied spin states
H ⇥ ln p(A, ⌦z|⌦⇤, ⌦⇥) =
i,j
(JLAij JG) zi,zj +
K
µ=1
hµ
N
i=1
zi,µ
JG ⇥ ln ⇥c/⇥d
JL ⇥ ln(1 ⇥d)/(1 ⇥c) + JG
hµ ⇥ ln µ
p(A|K) =
⇥z
⇥
d⌦
⇥
d⌦⇥ p(A,⌦z,⌦⇥, ⌦) =
⇥z
⇥
d⌦
⇥
d⌦⇥ e H
p(⌦)p(⌦⇥)
introduction formulation results extensions generative model max likelihood max evidence algo
max evidence:
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
Extends Newman (2004, 2006), Hastings (2006), Bornholdt & Reichardt (2004 & 2006)
•Die rolling, coin flipping <-> infinite-range spin-glass Potts model:
•Infer distributions over spin assignments, coupling constants, and
chemical potentials and find number of occupied spin states
H ⇥ ln p(A, ⌦z|⌦⇤, ⌦⇥) =
i,j
(JLAij JG) zi,zj +
K
µ=1
hµ
N
i=1
zi,µ
JG ⇥ ln ⇥c/⇥d
JL ⇥ ln(1 ⇥d)/(1 ⇥c) + JG
hµ ⇥ ln µ
p(A|K) =
⇥z
⇥
d⌦
⇥
d⌦⇥ p(A,⌦z,⌦⇥, ⌦) =
⇥z
⇥
d⌦
⇥
d⌦⇥ e H
p(⌦)p(⌦⇥)
Can do integrals,
but sum is
intractable, O(KN);
use mean-field
introduction formulation results extensions generative model max likelihood max evidence algo
max evidence:
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
• Gibbs’/Jensen’s inequality (log of expected value bounds expected value of log) for any distribution q
p(A|K) =
⇥z
⇥
d⌦
⇥
d⌦⇥ p(A,⌦z,⌦⇥, ⌦) =
⇥z
⇥
d⌦
⇥
d⌦⇥ e H
p(⌦)p(⌦⇥)
Variational Bayes (MacKay, Jordan, Ghahramani, Jaakola, Saul 1999; cf. Feynman 1972)
introduction formulation results extensions generative model max likelihood max evidence algo
max evidence:
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
why would you do this? (A1):
Beal, 2003
introduction formulation results extensions generative model max likelihood max evidence algo
max evidence:
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
why would you do this? (A2):
Beal, 2003
introduction formulation results extensions generative model max likelihood max evidence algo
max evidence:
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
why would you do this? (A3):
Beal, 2003
introduction formulation results extensions generative model max likelihood max evidence algo
max evidence:
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
• Gibbs’/Jensen’s inequality (log of expected value bounds expected value of log) for any distribution q
Variational Bayes (MacKay, Jordan, Ghahramani, Jaakola, Saul 1999; cf. Feynman 1972)
• F is a functional of q; find approximation to posterior by optimizing approximation to
evidence
• Take q(z, π, θ)=q(z)q(π)q(θ); Qiμ is probability node i in module μ where expected counts
are:
introduction formulation results extensions generative model max likelihood max evidence algo
algo:
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
where expected counts
are:
introduction formulation results extensions generative model max likelihood max evidence algo
algo:
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
where expected counts
are:
introduction formulation results extensions generative model max likelihood max evidence algo
algo:
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
introduction formulation results extensions generative model max likelihood max evidence algo
algo:
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
suggests hard limit in step 3; sparse in step 1
introduction formulation results extensions run time consistency good vs easy real data
results:
run time
consistency
required plot: good vs. easy
real data
karate
biology
american football
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
introduction formulation results extensions run time consistency good vs easy real data
run time:
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
• Main loop runtime for 104 nodes in MATLAB ~30 seconds
introduction formulation results extensions run time consistency good vs easy real data
consistency:
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
introduction formulation results extensions run time consistency good vs easy real data
consistency:
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
0
2
4
6
8
10
12
θ
N=8, K=2, distribution after 2 iterations
p(θ+
)
p(θ
−
)
introduction formulation results extensions run time consistency good vs easy real data
consistency:
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
• K=4?
• Automatic complexity control: probability of occupation for extraneous modules
goes to zero
introduction formulation results extensions run time consistency good vs easy real data
consistency:
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
• K=4?
• Automatic complexity control: probability of occupation for extraneous modules
goes to zero
introduction formulation results extensions run time consistency good vs easy real data
consistency:
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
The “resolution limit” problem
1
2
3
4
5
57
6
7
8 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
4041
42
43 44
45
46
47
48
49
50
51
52
53
54
55
56
58
59
60
1
2
3
4
5
57
6
7
8 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
4041
42
43 44
45
46
47
48
49
50
51
52
53
54
55
56
58
59
60
Variational Bayes
Girvan-Newman
modularity
introduction formulation results extensions run time consistency good vs easy real data
consistency:
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
The “resolution limit” problem
10 12 14 16 18 20
8
10
12
14
16
18
20
Ktrue
K*
K
*
=Ktrue
Variational Bayes
Modularity optimization
10 12 14 16 18 20
0.72
0.74
0.76
0.78
0.8
0.82
0.84
Ktrue
GNmodularity
Resolution limit problem on ring of 4−node cliques
Single−clique communities (correct)
Double−clique communities (incorrect)
GN modularity (Clauset’s algorithm)
Girvan-Newman modularity or Potts model w/ fixed parameters suffers from a resolution limit,
where size of detected modules depends on network size
Fortunato et. al. (2007), Kumpula et. al. (2007),
introduction formulation results extensions run time consistency good vs easy real data
good vs easy:
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
introduction formulation results extensions run time consistency good vs easy real data
real data:
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
• Correctly infer K=12 conferences
Validation: NCAA football schedule
1
2 3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45 46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
nodes: teams
edges: games
shape: conference
color: inferred module
introduction formulation results extensions run time consistency good vs easy real data
real data:
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
APS march meeting 2008
superconductivity
(experimentalists)
Nanotubes, Graphene
superconductivity
(theorists)
introduction formulation results extensions
extensions:
model extensions
full SBM (done)
hierarchical model p(Aij = 1|zi = zi ± 1)
hierarchical modeling (ensemble of graphs)
p(D|u, K) = ΠL
i dϑi p(D|ϑi )p(ϑi |u, K)
Rd
embedding (latent are real)
more ‘rigorous’ SOM?
‘correct’ for degree (allow variable affinity) (cf. Bader, Karrer)
algorithm extensions
BP (see earlier talks)
map-reduce
cvmod (model selection via cross validation)
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
introduction formulation results extensions
full SBM. . .
probability of edge depends only on block membership:
p(Aij |zi = µ, zj = ν) = ϑµν
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
introduction formulation results extensions
full SBM. . .
probability of edge depends only on block membership:
p(Aij |zi = µ, zj = ν) = ϑµν
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
• Nodes belong to “blocks” of
varying size
• Roll die for assignment of
nodes to blocks
• Probability of edge between two
nodes depends only on block
membership
• Flip (one of K2) coins for edges
• Result: mixture of Erdos-Renyi
graphs
0 20 40 60 80 100 120
0
20
40
60
80
100
120
nz = 2275
adjacency matrix
introduction formulation results extensions
full SBM. . .
probability of edge depends only on block membership:
p(Aij |zi = µ, zj = ν) = ϑµν
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
vs
introduction formulation results extensions
full SBM. . .
probability of edge depends only on block membership:
p(Aij |zi = µ, zj = ν) = ϑµν
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
0 20 40 60 80 100 120
0
20
40
60
80
100
120
nz = 2803
adjacency matrix
introduction formulation results extensions
full SBM. . .
probability of edge depends only on block membership:
p(Aij |zi = µ, zj = ν) = ϑµν
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
vs
introduction formulation results extensions
full SBM. . .
probability of edge depends only on block membership:
p(Aij |zi = µ, zj = ν) = ϑµν
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
>> vbsbm_vs_vbmod(0)
running vbmod ...
Elapsed time is 1.136925 seconds.
running vbsbm ...
Elapsed time is 1.398904 seconds.
Fmod=13089.158019 Fsbm=13144.445782
vbmod wins
introduction formulation results extensions
full SBM. . .
probability of edge depends only on block membership:
p(Aij |zi = µ, zj = ν) = ϑµν
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
>> vbsbm_vs_vbmod(0.25)
running vbmod ...
Elapsed time is 1.557298 seconds.
running vbsbm ...
Elapsed time is 1.759527 seconds.
Fmod=20457.142416 Fsbm=19457.306022
vbsbm wins
introduction formulation results extensions
full SBM. . .
probability of edge depends only on block membership:
p(Aij |zi = µ, zj = ν) = ϑµν
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
>> vbsbm_vs_vbmod(0.5)
running vbmod ...
Elapsed time is 2.624886 seconds.
running vbsbm ...
Elapsed time is 1.440242 seconds.
Fmod=26133.351210 Fsbm=23921.797625
vbsbm wins
introduction formulation results extensions
full SBM. . .
probability of edge depends only on block membership:
p(Aij |zi = µ, zj = ν) = ϑµν
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
• Using same framework we can compare the
unconstrained and full stochastic block models via p(D|M,K*)
0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
perturbation to constrained model
winpercentageforunconstrainedmodel
0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
perturbation to constrained model
winpercentageforunconstrainedmodel
0 20 40 60 80 100 120
0
20
40
60
80
100
120
nz = 2100
adjacency matrix
0 20 40 60 80 100 120
0
20
40
60
80
100
120
nz = 2048
adjacency matrix
0 20 40 60 80 100 120
0
20
40
60
80
100
120
nz = 2108
adjacency matrix
introduction formulation results extensions
extensions:
model extensions
full SBM (done)
hierarchical model p(Aij = 1|zi = zi ± 1)
hierarchical modeling (ensemble of graphs)
p(D|u, K) = ΠL
i dϑi p(D|ϑi )p(ϑi |u, K)
Rd
embedding (latent are real)
more ‘rigorous’ SOM?
‘correct’ for degree (allow variable affinity) (cf. Bader, Karrer)
algorithm extensions
BP (see earlier talks)
map-reduce
cvmod (model selection via cross validation)
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
introduction formulation results extensions
extensions:
model extensions
full SBM (done)
hierarchical model p(Aij = 1|zi = zi ± 1)
hierarchical modeling (ensemble of graphs)
p(D|u, K) = ΠL
i dϑi p(D|ϑi )p(ϑi |u, K)
Rd
embedding (latent are real)
more ‘rigorous’ SOM?
‘correct’ for degree (allow variable affinity) (cf. Bader, Karrer)
algorithm extensions
BP (see earlier talks)
map-reduce
cvmod (model selection via cross validation)
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
introduction formulation results extensions
extensions:
model extensions
full SBM (done)
hierarchical model p(Aij = 1|zi = zi ± 1)
hierarchical modeling (ensemble of graphs)
p(D|u, K) = ΠL
i dϑi p(D|ϑi )p(ϑi |u, K)
Rd
embedding (latent are real)
more ‘rigorous’ SOM?
‘correct’ for degree (allow variable affinity) (cf. Bader, Karrer)
algorithm extensions
BP (see earlier talks)
map-reduce
cvmod (model selection via cross validation)
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
introduction formulation results extensions
extensions:
model extensions
full SBM (done)
hierarchical model p(Aij = 1|zi = zi ± 1)
hierarchical modeling (ensemble of graphs)
p(D|u, K) = ΠL
i dϑi p(D|ϑi )p(ϑi |u, K)
Rd
embedding (latent are real)
more ‘rigorous’ SOM?
‘correct’ for degree (allow variable affinity) (cf. Bader, Karrer)
algorithm extensions
BP (see earlier talks)
map-reduce
cvmod (model selection via cross validation)
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987)
i≠j
zi
zj
Aij
π
θ c
n
introduction formulation results extensions
extensions:
model extensions
full SBM (done)
hierarchical model p(Aij = 1|zi = zi ± 1)
hierarchical modeling (ensemble of graphs)
p(D|u, K) = ΠL
i dϑi p(D|ϑi )p(ϑi |u, K)
Rd
embedding (latent are real)
more ‘rigorous’ SOM?
‘correct’ for degree (allow variable affinity) (cf. Bader, Karrer)
algorithm extensions
BP (see earlier talks)
map-reduce
cvmod (model selection via cross validation)
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987)
i≠j
zi
zj
Aij
π
θ c
n
L
introduction formulation results extensions
extensions:
model extensions
full SBM (done)
hierarchical model p(Aij = 1|zi = zi ± 1)
hierarchical modeling (ensemble of graphs)
p(D|u, K) = ΠL
i dϑi p(D|ϑi )p(ϑi |u, K)
Rd
embedding (latent are real)
more ‘rigorous’ SOM?
‘correct’ for degree (allow variable affinity) (cf. Bader, Karrer)
algorithm extensions
BP (see earlier talks)
map-reduce
cvmod (model selection via cross validation)
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
introduction formulation results extensions
for more info. . .
code: MATLAB & python (inc. “full” SBM) (vbmod.sf.net)
paper: arxiv 08 / prl 08
Hofman soon to come (not by me)
code in C++, inc. full ‘vblabel propagation’ algo
twitter-scale analysis
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
bio+stats vbem/networks hierarchical biological challenges inference model selection
hierarchical/time series:
- biological challenges
- inference
- model selection
chris.wiggins@columbia.edu 4/23/12
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
Jan-Willem van de Meent, Ruben Gonzalez, Chris Wiggins
Columbia University
Ramakrishnan et al – http://www.mrc-lmb.cam.ac.uk/ribo/
Ramakrishnan et al – http://www.mrc-lmb.cam.ac.uk/ribo/
Ramakrishnan et al – http://www.mrc-lmb.cam.ac.uk/ribo/
Ramakrishnan et al – http://www.mrc-lmb.cam.ac.uk/ribo/
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
FRET = cy5 / (cy3 + cy5)
Tinoco and Gonzalez, Genes Dev, 2011
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
Tinoco and Gonzalez, Genes Dev, 2011
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
Unbound
EF-G bound
Tinoco and Gonzalez, Genes Dev, 2011 Fei et al, PNAS, 2009
(short-lived GS1 states correspond to an EF-G + GDPNP binding event)
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
Unbound
EF-G bound
Tinoco and Gonzalez, Genes Dev, 2011 Fei et al, PNAS, 2009
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
1. Identify states
2. Estimate Kinetic Rates
3. Average over many time series
4. Detect subpopulations
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
FRET Signal
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
FRET SignalHistogram
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
FRET SignalHistogram
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
FRET SignalHistogram
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
FRET SignalHistogram
Idea: Find probability of belonging to each state
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
Expectation Maximization
1. calculate p(z | x, θi)
2. calculate θi+1 from p(z | x, θi)
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
Log-Likelihood
L = log p(x  θ) = log
z
p(x, z  θ)
Expectation Maximization
1. calculate p(z | x, θi)
2. calculate θi+1 from p(z | x, θi)
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
Learned Truth
Accurate for occupancy of states,
not so good for rate estimates
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
p(x, z  µ, σ, π) = p(x  z, µ, σ)p(z  π)
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
probability of state depends on previous state
p(zt+ =l  zt =k) = Akl
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
p(z =k) = πk
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
p(zt+ =l  zt =k) = Akl
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
p(xt  zt = k) = N(xt  µk , σk)
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
Learned Real
We’ve learned:
parameters: θ = {µ, σ, π, A} states: p(z | x, θ)
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
2 States 3 States
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
Log-Likelihood
L = log p(x  θ) = log
z
p(x, z  θ)
Log-Evidence
L = log p(x  u) = log
z
∫ dθ p(x, z  θ)p(θu)
Log-Evidence
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
Log-Likelihood
L = log p(x  θ) = log
z
p(x, z  θ)
Log-Evidence
L = log p(x  u) = log
z
∫ dθ p(x, z  θ)p(θu)
Prior
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
Log-Likelihood
L = log p(x  θ) = log
z
p(x, z  θ)
Log-Evidence
L = log p(x  u) = log
z
∫ dθ p(x, z  θ)p(θu)
Ensemble
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
Log-Likelihood
L = log p(x  θ) = log
z
p(x, z  θ)
Log-Evidence
best model has highest average likelihood
L = log p(x  u) = log
z
∫ dθ p(x, z  θ)p(θu)
Log-Evidence
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
Log-Evidence
31
L = log p(x  u) = log
z
∫ dθ p(x, z  θ)p(θu)
Lower Bound
L = 
z
∫ dθ q(z)q(θ  w)log
p(x, z, θ  u)
q(z)q(θ  w)

≥ log p(x  u)
q(z)q(θ  w)  p(z, θ  x)
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
31
Lower bound tight for true posterior
L = 
z
∫ dθ p(z, θ  x)log
p(x, z, θ  u)
p(z, θ  x)

= 
z
∫ dθ p(z, θ  x)log[p(x  u)]
= log p(x  u)
L = log p(x  u) − Dkl [q(z)q(θ  w)  p(z, θ  x)]
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
31
We’ve learned:
parameters: q(θ | w) states: p(z | x, θ)
δLn
δq(zn)
= 
VBEM Updates
δLn
δq(θn)
=
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
31
variability: photophysical/experimental
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
31
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
31
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
31
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
31
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
31
Hierarchical Updates
∂
∂u

n
Ln =
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
31
Hierarchical Updates
∂
∂u

n
Ln =  “two-stage PEB model/CIHM”-Kass  Steffey JASA 1989
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
2. Update p(θ | u)
Until Σ Ln converges
Until Ln converges
• Update q(zn)
• Update q(θn | wn)
1. Run VBEM on each trace
δLn
δq(zn)
= 
Hierarchical UpdatesVBEM Updates
δLn
δq(θn)
= 
∂
∂u

n
Ln =
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
2. Update p(θ | u)
Until Σ Ln converges
Until Ln converges
• Update q(zn)
• Update q(θn | wn)
1. Run VBEM on each trace
We’ve learned:
p(θn, zn | xn) ≃ q(θn) q(zn)
(for each trace)
p(θ | u)
(for ensemble)
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
Unbound
EF-G bound
Tinoco and Gonzalez, Genes Dev, 2011 Fei et al, PNAS, 2009
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
ξntkl = p(znt = k, znt+ = l  xn)
1. Run mixture model on posterior counts
p(ξnA) = 
tkl
Aξntkl
kl
p(ξn um) = ∫ dA p(Aum)p(ξn  A)
2. Rerun with M x K block-diagonal form
uA
=









uA

uA


uA
M
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
τfast
τslow
2
4
8
16
32
64
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
τfast
τslow
2
4
8
16
32
64
reality
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
no EF-G 50 nM EF-G 500 nM EF-G
Fei, Bronson, Hofman, Srinivas, Wiggins, Gonzalez, PNAS, 2009
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
no EF-G 50 nM EF-G 500 nM EF-G
p(zk) ∼ e−Gk kB T
log p(zk) − log p(zl ) = −(Gk − Gl )kBT + cst.
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
no EF-G 50 nM EF-G 500 nM EF-G
p(zk) ∼ e−Gk kB T
Δ∆G=logit(p)
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
no EF-Gbound fraction and life-times
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
5 nM EF-Gbound fraction and life-times
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
50 nM EF-Gbound fraction and life-times
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
250 nM EF-Gbound fraction and life-times
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
500 nM EF-Gbound fraction and life-times
bio+stats vbem/networks hierarchical biological challenges inference model selection
model selection:
chris.wiggins@columbia.edu 4/23/12
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
Low Noise, UnderfittedInf Out - Inf In
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
Low Noise, CorrectOut vs In
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
Low Noise, OverfittedInf Out - Inf In
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
High Noise, UnderfittedInf Out - Inf In
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
High Noise, CorrectInf Out - Inf In
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
High Noise, OverfittedInf Out - Inf In
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
High Noise, OverfittedInf Out - Inf In
bio+stats vbem/networks hierarchical biological challenges inference model selection
hfret. . .
chris.wiggins@columbia.edu 4/23/12
the future, in progress:
X
bio+stats vbem/networks hierarchical biological challenges inference model selection
thanks. . .
- jake hofman (vbmod,vbfret)
- jonathan bronson (vbfret)
- jan-willem van de meent (hfret)
- ruben gonzalez (vbfret, hfret)
for more info:
- vbfret.sourceforge.net
- vbmod.sourceforge.net
- hfret.sourceforge.net (soon)
chris.wiggins@columbia.edu 4/23/12
BMC bioinformatics, 2010;
PNAS 2009;
Biophysical Journal 2009;
traditional role of statistics in biophysics
“if your experiment needs
statistics, you ought to
have done a better
experiment”
-lord rutherford

Weitere ähnliche Inhalte

Andere mochten auch

Metrics that matter: Making the business case that documentation has value
Metrics that matter: Making the business case that documentation has valueMetrics that matter: Making the business case that documentation has value
Metrics that matter: Making the business case that documentation has valuePublishing Smarter
 
Introduction to OSGi (Tokyo JUG)
Introduction to OSGi (Tokyo JUG)Introduction to OSGi (Tokyo JUG)
Introduction to OSGi (Tokyo JUG)njbartlett
 
與太陽公公的七天對話
與太陽公公的七天對話與太陽公公的七天對話
與太陽公公的七天對話YesYou GotIt
 
Projeto: Uma careta para as drogas!
Projeto: Uma careta para as drogas!Projeto: Uma careta para as drogas!
Projeto: Uma careta para as drogas!Ivonilde Lima
 
Benchmarking the Accounting & Finance Function: 2014 Summary Presentation
Benchmarking the Accounting & Finance Function: 2014 Summary PresentationBenchmarking the Accounting & Finance Function: 2014 Summary Presentation
Benchmarking the Accounting & Finance Function: 2014 Summary PresentationRobert Half
 
Agile & Lean for Project Delivery
Agile & Lean for Project DeliveryAgile & Lean for Project Delivery
Agile & Lean for Project DeliveryMMT Digital
 
New Dell Notebooks
New Dell NotebooksNew Dell Notebooks
New Dell Notebooksjames bond
 

Andere mochten auch (8)

Metrics that matter: Making the business case that documentation has value
Metrics that matter: Making the business case that documentation has valueMetrics that matter: Making the business case that documentation has value
Metrics that matter: Making the business case that documentation has value
 
Introduction to OSGi (Tokyo JUG)
Introduction to OSGi (Tokyo JUG)Introduction to OSGi (Tokyo JUG)
Introduction to OSGi (Tokyo JUG)
 
與太陽公公的七天對話
與太陽公公的七天對話與太陽公公的七天對話
與太陽公公的七天對話
 
Projeto: Uma careta para as drogas!
Projeto: Uma careta para as drogas!Projeto: Uma careta para as drogas!
Projeto: Uma careta para as drogas!
 
Benchmarking the Accounting & Finance Function: 2014 Summary Presentation
Benchmarking the Accounting & Finance Function: 2014 Summary PresentationBenchmarking the Accounting & Finance Function: 2014 Summary Presentation
Benchmarking the Accounting & Finance Function: 2014 Summary Presentation
 
Agile & Lean for Project Delivery
Agile & Lean for Project DeliveryAgile & Lean for Project Delivery
Agile & Lean for Project Delivery
 
Student equity: policy and practice
Student equity: policy and practiceStudent equity: policy and practice
Student equity: policy and practice
 
New Dell Notebooks
New Dell NotebooksNew Dell Notebooks
New Dell Notebooks
 

Ähnlich wie variational bayes in biophysics

EDBT 2015: Summer School Overview
EDBT 2015: Summer School OverviewEDBT 2015: Summer School Overview
EDBT 2015: Summer School Overviewdgarijo
 
Knowledge graphs for knowing more and knowing for sure
Knowledge graphs for knowing more and knowing for sureKnowledge graphs for knowing more and knowing for sure
Knowledge graphs for knowing more and knowing for sureSteffen Staab
 
Lecture17 xing fei-fei
Lecture17 xing fei-feiLecture17 xing fei-fei
Lecture17 xing fei-feiTianlu Wang
 
Learning to assess Linked Data relationships using Genetic Programming
Learning to assess Linked Data relationships using Genetic ProgrammingLearning to assess Linked Data relationships using Genetic Programming
Learning to assess Linked Data relationships using Genetic ProgrammingVrije Universiteit Amsterdam
 
algorithmic-decisions, fairness, machine learning, provenance, transparency
algorithmic-decisions, fairness, machine learning, provenance, transparencyalgorithmic-decisions, fairness, machine learning, provenance, transparency
algorithmic-decisions, fairness, machine learning, provenance, transparencyPaolo Missier
 
Reviews on Deep Generative Models in the early days / GANs & VAEs paper review
Reviews on Deep Generative Models in the early days / GANs & VAEs paper reviewReviews on Deep Generative Models in the early days / GANs & VAEs paper review
Reviews on Deep Generative Models in the early days / GANs & VAEs paper reviewchangedaeoh
 
MOMENT: Temporal Meta-Fact Generation and Propagation in Knowledge Graphs
MOMENT: Temporal Meta-Fact Generation and Propagation in Knowledge GraphsMOMENT: Temporal Meta-Fact Generation and Propagation in Knowledge Graphs
MOMENT: Temporal Meta-Fact Generation and Propagation in Knowledge GraphsParis Sud University
 
The Nucleon Parton Distribution Functions from Lattice QCD
The Nucleon Parton Distribution Functions from Lattice QCDThe Nucleon Parton Distribution Functions from Lattice QCD
The Nucleon Parton Distribution Functions from Lattice QCDChristos Kallidonis
 
Output Units and Cost Function in FNN
Output Units and Cost Function in FNNOutput Units and Cost Function in FNN
Output Units and Cost Function in FNNLin JiaMing
 
slides of ABC talk at i-like workshop, Warwick, May 16
slides of ABC talk at i-like workshop, Warwick, May 16slides of ABC talk at i-like workshop, Warwick, May 16
slides of ABC talk at i-like workshop, Warwick, May 16Christian Robert
 
Studio 4 - workshop introduction
Studio 4 - workshop introductionStudio 4 - workshop introduction
Studio 4 - workshop introductionDanil Nagy
 
Object Detection Beyond Mask R-CNN and RetinaNet III
Object Detection Beyond Mask R-CNN and RetinaNet IIIObject Detection Beyond Mask R-CNN and RetinaNet III
Object Detection Beyond Mask R-CNN and RetinaNet IIIWanjin Yu
 
VLDB 2015 Tutorial: On Uncertain Graph Modeling and Queries
VLDB 2015 Tutorial: On Uncertain Graph Modeling and QueriesVLDB 2015 Tutorial: On Uncertain Graph Modeling and Queries
VLDB 2015 Tutorial: On Uncertain Graph Modeling and QueriesArijit Khan
 

Ähnlich wie variational bayes in biophysics (20)

EDBT 2015: Summer School Overview
EDBT 2015: Summer School OverviewEDBT 2015: Summer School Overview
EDBT 2015: Summer School Overview
 
Knowledge graphs for knowing more and knowing for sure
Knowledge graphs for knowing more and knowing for sureKnowledge graphs for knowing more and knowing for sure
Knowledge graphs for knowing more and knowing for sure
 
Lecture17 xing fei-fei
Lecture17 xing fei-feiLecture17 xing fei-fei
Lecture17 xing fei-fei
 
Learning to assess Linked Data relationships using Genetic Programming
Learning to assess Linked Data relationships using Genetic ProgrammingLearning to assess Linked Data relationships using Genetic Programming
Learning to assess Linked Data relationships using Genetic Programming
 
algorithmic-decisions, fairness, machine learning, provenance, transparency
algorithmic-decisions, fairness, machine learning, provenance, transparencyalgorithmic-decisions, fairness, machine learning, provenance, transparency
algorithmic-decisions, fairness, machine learning, provenance, transparency
 
Reviews on Deep Generative Models in the early days / GANs & VAEs paper review
Reviews on Deep Generative Models in the early days / GANs & VAEs paper reviewReviews on Deep Generative Models in the early days / GANs & VAEs paper review
Reviews on Deep Generative Models in the early days / GANs & VAEs paper review
 
MOMENT: Temporal Meta-Fact Generation and Propagation in Knowledge Graphs
MOMENT: Temporal Meta-Fact Generation and Propagation in Knowledge GraphsMOMENT: Temporal Meta-Fact Generation and Propagation in Knowledge Graphs
MOMENT: Temporal Meta-Fact Generation and Propagation in Knowledge Graphs
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
The Nucleon Parton Distribution Functions from Lattice QCD
The Nucleon Parton Distribution Functions from Lattice QCDThe Nucleon Parton Distribution Functions from Lattice QCD
The Nucleon Parton Distribution Functions from Lattice QCD
 
Output Units and Cost Function in FNN
Output Units and Cost Function in FNNOutput Units and Cost Function in FNN
Output Units and Cost Function in FNN
 
Why Data Science is a Science
Why Data Science is a ScienceWhy Data Science is a Science
Why Data Science is a Science
 
K-Means Algorithm
K-Means AlgorithmK-Means Algorithm
K-Means Algorithm
 
slides of ABC talk at i-like workshop, Warwick, May 16
slides of ABC talk at i-like workshop, Warwick, May 16slides of ABC talk at i-like workshop, Warwick, May 16
slides of ABC talk at i-like workshop, Warwick, May 16
 
Studio 4 - workshop introduction
Studio 4 - workshop introductionStudio 4 - workshop introduction
Studio 4 - workshop introduction
 
Lecture10 xing
Lecture10 xingLecture10 xing
Lecture10 xing
 
Object Detection Beyond Mask R-CNN and RetinaNet III
Object Detection Beyond Mask R-CNN and RetinaNet IIIObject Detection Beyond Mask R-CNN and RetinaNet III
Object Detection Beyond Mask R-CNN and RetinaNet III
 
Deep Generative Learning for All
Deep Generative Learning for AllDeep Generative Learning for All
Deep Generative Learning for All
 
Introduction to Genetic Algorithms
Introduction to Genetic AlgorithmsIntroduction to Genetic Algorithms
Introduction to Genetic Algorithms
 
VLDB 2015 Tutorial: On Uncertain Graph Modeling and Queries
VLDB 2015 Tutorial: On Uncertain Graph Modeling and QueriesVLDB 2015 Tutorial: On Uncertain Graph Modeling and Queries
VLDB 2015 Tutorial: On Uncertain Graph Modeling and Queries
 
Interactive High-Dimensional Visualization of Social Graphs
Interactive High-Dimensional Visualization of Social GraphsInteractive High-Dimensional Visualization of Social Graphs
Interactive High-Dimensional Visualization of Social Graphs
 

Mehr von chris wiggins

data science at the new york times
data science at the new york timesdata science at the new york times
data science at the new york timeschris wiggins
 
"data hum: a core approach to the ethics of data"
"data hum: a core approach to the ethics of data""data hum: a core approach to the ethics of data"
"data hum: a core approach to the ethics of data"chris wiggins
 
"data: past, present, and future" day 1 lecture 2020-01-20
"data: past, present, and future" day 1 lecture 2020-01-20"data: past, present, and future" day 1 lecture 2020-01-20
"data: past, present, and future" day 1 lecture 2020-01-20chris wiggins
 
a mission-driven approach to personalizing the customer journey
a mission-driven approach to personalizing the customer journeya mission-driven approach to personalizing the customer journey
a mission-driven approach to personalizing the customer journeychris wiggins
 
Data Science at The New York Times: what industry can learn from us; what we ...
Data Science at The New York Times: what industry can learn from us; what we ...Data Science at The New York Times: what industry can learn from us; what we ...
Data Science at The New York Times: what industry can learn from us; what we ...chris wiggins
 
Data Science at The New York Times
Data Science at The New York TimesData Science at The New York Times
Data Science at The New York Timeschris wiggins
 
history and ethics of data
history and ethics of datahistory and ethics of data
history and ethics of datachris wiggins
 
"data: past, present, and future" lecture 1 (intro) 1/22/19
"data: past, present, and future" lecture 1 (intro) 1/22/19"data: past, present, and future" lecture 1 (intro) 1/22/19
"data: past, present, and future" lecture 1 (intro) 1/22/19chris wiggins
 
"data: past, present, and future" lab 2 (EDA) notes by Prof. Matt Jones
"data: past, present, and future" lab 2 (EDA) notes by Prof. Matt Jones"data: past, present, and future" lab 2 (EDA) notes by Prof. Matt Jones
"data: past, present, and future" lab 2 (EDA) notes by Prof. Matt Joneschris wiggins
 
Data: Past, Present, and Future (Cornell Digital Life Seminar on Data Literac...
Data: Past, Present, and Future (Cornell Digital Life Seminar on Data Literac...Data: Past, Present, and Future (Cornell Digital Life Seminar on Data Literac...
Data: Past, Present, and Future (Cornell Digital Life Seminar on Data Literac...chris wiggins
 
Data: Past, Present, and Future (Lecture 1, Spring 2018)
Data: Past, Present, and Future (Lecture 1, Spring 2018)Data: Past, Present, and Future (Lecture 1, Spring 2018)
Data: Past, Present, and Future (Lecture 1, Spring 2018)chris wiggins
 
data science: past present & future [American Statistical Association (ASA) C...
data science: past present & future [American Statistical Association (ASA) C...data science: past present & future [American Statistical Association (ASA) C...
data science: past present & future [American Statistical Association (ASA) C...chris wiggins
 
Machine Learning Summer School 2016
Machine Learning Summer School 2016Machine Learning Summer School 2016
Machine Learning Summer School 2016chris wiggins
 
lean + design thinking in building data products
lean + design thinking in building data productslean + design thinking in building data products
lean + design thinking in building data productschris wiggins
 
data science @NYT ; inaugural Data Science Initiative Lecture
data science @NYT ; inaugural Data Science Initiative Lecturedata science @NYT ; inaugural Data Science Initiative Lecture
data science @NYT ; inaugural Data Science Initiative Lecturechris wiggins
 
data history / data science @ NYT
data history / data science @ NYTdata history / data science @ NYT
data history / data science @ NYTchris wiggins
 
data science history / data science @ NYT
data science history / data science @ NYTdata science history / data science @ NYT
data science history / data science @ NYTchris wiggins
 
data science: past, present, and future
data science: past, present, and futuredata science: past, present, and future
data science: past, present, and futurechris wiggins
 
Chris Wiggins: "engagement & reality"
Chris Wiggins: "engagement & reality"Chris Wiggins: "engagement & reality"
Chris Wiggins: "engagement & reality"chris wiggins
 
intro data science at NYT 2015-01-22
intro data science at NYT 2015-01-22intro data science at NYT 2015-01-22
intro data science at NYT 2015-01-22chris wiggins
 

Mehr von chris wiggins (20)

data science at the new york times
data science at the new york timesdata science at the new york times
data science at the new york times
 
"data hum: a core approach to the ethics of data"
"data hum: a core approach to the ethics of data""data hum: a core approach to the ethics of data"
"data hum: a core approach to the ethics of data"
 
"data: past, present, and future" day 1 lecture 2020-01-20
"data: past, present, and future" day 1 lecture 2020-01-20"data: past, present, and future" day 1 lecture 2020-01-20
"data: past, present, and future" day 1 lecture 2020-01-20
 
a mission-driven approach to personalizing the customer journey
a mission-driven approach to personalizing the customer journeya mission-driven approach to personalizing the customer journey
a mission-driven approach to personalizing the customer journey
 
Data Science at The New York Times: what industry can learn from us; what we ...
Data Science at The New York Times: what industry can learn from us; what we ...Data Science at The New York Times: what industry can learn from us; what we ...
Data Science at The New York Times: what industry can learn from us; what we ...
 
Data Science at The New York Times
Data Science at The New York TimesData Science at The New York Times
Data Science at The New York Times
 
history and ethics of data
history and ethics of datahistory and ethics of data
history and ethics of data
 
"data: past, present, and future" lecture 1 (intro) 1/22/19
"data: past, present, and future" lecture 1 (intro) 1/22/19"data: past, present, and future" lecture 1 (intro) 1/22/19
"data: past, present, and future" lecture 1 (intro) 1/22/19
 
"data: past, present, and future" lab 2 (EDA) notes by Prof. Matt Jones
"data: past, present, and future" lab 2 (EDA) notes by Prof. Matt Jones"data: past, present, and future" lab 2 (EDA) notes by Prof. Matt Jones
"data: past, present, and future" lab 2 (EDA) notes by Prof. Matt Jones
 
Data: Past, Present, and Future (Cornell Digital Life Seminar on Data Literac...
Data: Past, Present, and Future (Cornell Digital Life Seminar on Data Literac...Data: Past, Present, and Future (Cornell Digital Life Seminar on Data Literac...
Data: Past, Present, and Future (Cornell Digital Life Seminar on Data Literac...
 
Data: Past, Present, and Future (Lecture 1, Spring 2018)
Data: Past, Present, and Future (Lecture 1, Spring 2018)Data: Past, Present, and Future (Lecture 1, Spring 2018)
Data: Past, Present, and Future (Lecture 1, Spring 2018)
 
data science: past present & future [American Statistical Association (ASA) C...
data science: past present & future [American Statistical Association (ASA) C...data science: past present & future [American Statistical Association (ASA) C...
data science: past present & future [American Statistical Association (ASA) C...
 
Machine Learning Summer School 2016
Machine Learning Summer School 2016Machine Learning Summer School 2016
Machine Learning Summer School 2016
 
lean + design thinking in building data products
lean + design thinking in building data productslean + design thinking in building data products
lean + design thinking in building data products
 
data science @NYT ; inaugural Data Science Initiative Lecture
data science @NYT ; inaugural Data Science Initiative Lecturedata science @NYT ; inaugural Data Science Initiative Lecture
data science @NYT ; inaugural Data Science Initiative Lecture
 
data history / data science @ NYT
data history / data science @ NYTdata history / data science @ NYT
data history / data science @ NYT
 
data science history / data science @ NYT
data science history / data science @ NYTdata science history / data science @ NYT
data science history / data science @ NYT
 
data science: past, present, and future
data science: past, present, and futuredata science: past, present, and future
data science: past, present, and future
 
Chris Wiggins: "engagement & reality"
Chris Wiggins: "engagement & reality"Chris Wiggins: "engagement & reality"
Chris Wiggins: "engagement & reality"
 
intro data science at NYT 2015-01-22
intro data science at NYT 2015-01-22intro data science at NYT 2015-01-22
intro data science at NYT 2015-01-22
 

Kürzlich hochgeladen

Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersChitralekhaTherkar
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docxPoojaSen20
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfUmakantAnnand
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 

Kürzlich hochgeladen (20)

Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of Powders
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docx
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.Compdf
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 

variational bayes in biophysics

  • 1. bio+stats vbem/networks hierarchical variational and hierarchical modeling for biological data chris wiggins columbia april 23, 2012 chris.wiggins@columbia.edu 4/23/12 Chris Wiggins • APAM: Department of Applied Physics and Applied Mathematics; • C2B2: Center for Computational Biology and Bioinformatics; • CISB: Columbia University Initiative in Systems Biology • ISDE: Institute for Data Sciences and Engineering Columbia University September 28, 2012
  • 2. bio+stats vbem/networks hierarchical biological challenges inference model selection thanks. . . - jake hofman (vbmod,vbfret) - jonathan bronson (vbfret) - jan-willem van de meent (hfret) - ruben gonzalez (vbfret, hfret) for more info: - vbfret.sourceforge.net - vbmod.sourceforge.net - hfret.sourceforge.net (soon) chris.wiggins@columbia.edu 4/23/12 BMC bioinformatics, 2010; PNAS 2009; Biophysical Journal 2009;
  • 3. bio+stats vbem/networks hierarchical 1 biology and statistics genomics generative modeling 2 variational/biological networks variational Bayesian expectation maximization inference model selection 3 hierarchical/time series biological challenges inference model selection chris.wiggins@columbia.edu 4/23/12
  • 4. bio+stats vbem/networks hierarchical genomics generative modeling biology and statistics: chris.wiggins@columbia.edu 4/23/12
  • 5. bio+stats vbem/networks hierarchical genomics generative modeling genomics: chris.wiggins@columbia.edu 4/23/12 plos comp bio `10, nyas `07, bmc bioinfo `06a, bmc bioinfo `06b, bioinfo `04, regulatory genomics 04, IEEE `05; NIPS (MLCB`03, `06 )
  • 6. bio+stats vbem/networks hierarchical genomics generative modeling generative modeling: chris.wiggins@columbia.edu 4/23/12 bmc bioinfo '10, PNAS`09, biophys j`09, PNAS`07, PNAS`06 NIPS (MLCB '09, '10, '11) IEEE sig. proc. `12; plos one `08,cell `07,prl `06, jcs `06, biophys j `06 pami `08, prl `08, NIPS (MLCB08), PNAS `05, bmc bioinfo `04
  • 7. bio+stats vbem/networks hierarchical vbem inference model selection variational/biological networks: - variational bayesian expectation maximization - inference - model selection chris.wiggins@columbia.edu 4/23/12
  • 8. introduction formulation results extensions motivation history introduction: chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net Hartwell, Hopfield, Leibler and Murray NATURE|VOL 402 | SUPP | 2 DECEMBER 1999 | www.nature.com
  • 9. introduction formulation results extensions motivation history motivation: community detection in networks social networks biological networks problem: over-fitting/resolution limit chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
  • 10. introduction formulation results extensions motivation history history: by community math/cs: spectral methods (Fiedler ’74, Shi + Malik ’00) math/cs: clustering generally (Taskar, Koller, Getoor) physics: modularity common thread: test w/ stochastic block model (’76, ’83) ergo: use as inference tool (Hastings 0604429, Newman+Liecht 061148) chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
  • 11. introduction formulation results extensions generative model max likelihood max evidence algo formulation: generative model maximum likelihood maximum evidence complexity control. . . variational/mft. . . algorithm in physics: “test hamiltonian” in ML “variational bayesian methods” (Jordan, Mackay) chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
  • 12. introduction formulation results extensions generative model max likelihood max evidence algo generative model: foreach node roll K-sided die with bias π to choose zi {1, . . . , K} foreach edge flip coin with bias ϑ+ if zi = zj , else ϑ− draw edge if coin lands heads up chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987) i≠j zi zj Aij π θ
  • 13. introduction formulation results extensions generative model max likelihood max evidence algo generative model. . . (bis) chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net Die rolling, coin flipping, and priors: where counts are: non-edges within modules edges within modules edges between modules non-edges between modules nodes in each module
  • 14. introduction formulation results extensions generative model max likelihood max evidence algo max likelihood: chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net H ⇥ ln p(A, ⌦z|⌦⇤, ⌦⇥) = i,j (JLAij JG) zi,zj + K µ=1 hµ N i=1 zi,µ JG ⇥ ln ⇥c/⇥d JL ⇥ ln(1 ⇥d)/(1 ⇥c) + JG hµ ⇥ ln µ Extends Newman (2004, 2006), Hastings (2006), Bornholdt & Reichardt (2006) •Die rolling, coin flipping <-> infinite-range spin-glass Potts model:
  • 15. introduction formulation results extensions generative model max likelihood max evidence algo formulation: generative model maximum likelihood maximum evidence complexity control. . . variational/mft. . . algorithm in physics: “test hamiltonian” in ML “variational bayesian methods” (Jordan, Mackay) chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
  • 16. introduction formulation results extensions generative model max likelihood max evidence algo formulation: generative model maximum likelihood maximum evidence complexity control. . . variational/mft. . . algorithm in physics: “test hamiltonian” in ML “variational bayesian methods” (Jordan, Mackay) chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
  • 17. introduction formulation results extensions generative model max likelihood max evidence algo max evidence: chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net Increasing complexity
  • 18. introduction formulation results extensions generative model max likelihood max evidence algo max evidence: chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net http://research.microsoft.com/~minka/statlearn/demo/
  • 19. introduction formulation results extensions generative model max likelihood max evidence algo max evidence: chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
  • 20. introduction formulation results extensions generative model max likelihood max evidence algo max evidence: chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net cf. “BIC” Schwartz, 1978
  • 21. introduction formulation results extensions generative model max likelihood max evidence algo generative model: foreach node roll K-sided die with bias π to choose zi {1, . . . , K} foreach edge flip coin with bias ϑ+ if zi = zj , else ϑ− draw edge if coin lands heads up chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987) i≠j zi zj Aij π θ
  • 22. introduction formulation results extensions generative model max likelihood max evidence algo generative model: foreach node roll K-sided die with bias π to choose zi {1, . . . , K} foreach edge flip coin with bias ϑ+ if zi = zj , else ϑ− draw edge if coin lands heads up chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987) i≠j zi zj Aij π θ c n
  • 23. introduction formulation results extensions generative model max likelihood max evidence algo generative model. . . (bis) chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net Die rolling, coin flipping, and priors: where counts are: non-edges within modules edges within modules edges between modules non-edges between modules nodes in each module
  • 24. introduction formulation results extensions generative model max likelihood max evidence algo generative model. . . (bis) chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net Die rolling, coin flipping, and priors: where counts are: non-edges within modules edges within modules edges between modules non-edges between modules nodes in each module
  • 25. introduction formulation results extensions generative model max likelihood max evidence algo max likelihood: chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net Extends Newman (2004, 2006), Hastings (2006), Bornholdt & Reichardt (2006) •Die rolling, coin flipping <-> infinite-range spin-glass Potts model: •Infer distributions over spin assignments, coupling constants, and chemical potentials and find number of occupied spin states JG ⇥ ln ⇥c/⇥d JL ⇥ ln(1 ⇥d)/(1 ⇥c) + JG hµ ⇥ ln µ •Die rolling, coin flipping <-> infinite-range spin-glass Potts model: •Infer distributions over spin assignments, coupling constants, and chemical potentials and find number of occupied spin states H ⇥ ln p(A, ⌦z|⌦⇤, ⌦⇥) = i,j (JLAij JG) zi,zj + K µ=1 hµ N i=1 zi,µ
  • 26. introduction formulation results extensions generative model max likelihood max evidence algo max evidence: chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net Extends Newman (2004, 2006), Hastings (2006), Bornholdt & Reichardt (2006) •Die rolling, coin flipping <-> infinite-range spin-glass Potts model: •Infer distributions over spin assignments, coupling constants, and chemical potentials and find number of occupied spin states H ⇥ ln p(A, ⌦z|⌦⇤, ⌦⇥) = i,j (JLAij JG) zi,zj + K µ=1 hµ N i=1 zi,µ JG ⇥ ln ⇥c/⇥d JL ⇥ ln(1 ⇥d)/(1 ⇥c) + JG hµ ⇥ ln µ p(A|K) = ⇥z ⇥ d⌦ ⇥ d⌦⇥ p(A,⌦z,⌦⇥, ⌦) = ⇥z ⇥ d⌦ ⇥ d⌦⇥ e H p(⌦)p(⌦⇥)
  • 27. introduction formulation results extensions generative model max likelihood max evidence algo max evidence: chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net Extends Newman (2004, 2006), Hastings (2006), Bornholdt & Reichardt (2004 & 2006) •Die rolling, coin flipping <-> infinite-range spin-glass Potts model: •Infer distributions over spin assignments, coupling constants, and chemical potentials and find number of occupied spin states H ⇥ ln p(A, ⌦z|⌦⇤, ⌦⇥) = i,j (JLAij JG) zi,zj + K µ=1 hµ N i=1 zi,µ JG ⇥ ln ⇥c/⇥d JL ⇥ ln(1 ⇥d)/(1 ⇥c) + JG hµ ⇥ ln µ p(A|K) = ⇥z ⇥ d⌦ ⇥ d⌦⇥ p(A,⌦z,⌦⇥, ⌦) = ⇥z ⇥ d⌦ ⇥ d⌦⇥ e H p(⌦)p(⌦⇥) Can do integrals, but sum is intractable, O(KN); use mean-field
  • 28. introduction formulation results extensions generative model max likelihood max evidence algo max evidence: chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net • Gibbs’/Jensen’s inequality (log of expected value bounds expected value of log) for any distribution q p(A|K) = ⇥z ⇥ d⌦ ⇥ d⌦⇥ p(A,⌦z,⌦⇥, ⌦) = ⇥z ⇥ d⌦ ⇥ d⌦⇥ e H p(⌦)p(⌦⇥) Variational Bayes (MacKay, Jordan, Ghahramani, Jaakola, Saul 1999; cf. Feynman 1972)
  • 29. introduction formulation results extensions generative model max likelihood max evidence algo max evidence: chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net why would you do this? (A1): Beal, 2003
  • 30. introduction formulation results extensions generative model max likelihood max evidence algo max evidence: chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net why would you do this? (A2): Beal, 2003
  • 31. introduction formulation results extensions generative model max likelihood max evidence algo max evidence: chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net why would you do this? (A3): Beal, 2003
  • 32. introduction formulation results extensions generative model max likelihood max evidence algo max evidence: chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net • Gibbs’/Jensen’s inequality (log of expected value bounds expected value of log) for any distribution q Variational Bayes (MacKay, Jordan, Ghahramani, Jaakola, Saul 1999; cf. Feynman 1972) • F is a functional of q; find approximation to posterior by optimizing approximation to evidence • Take q(z, π, θ)=q(z)q(π)q(θ); Qiμ is probability node i in module μ where expected counts are:
  • 33. introduction formulation results extensions generative model max likelihood max evidence algo algo: chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net where expected counts are:
  • 34. introduction formulation results extensions generative model max likelihood max evidence algo algo: chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net where expected counts are:
  • 35. introduction formulation results extensions generative model max likelihood max evidence algo algo: chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
  • 36. introduction formulation results extensions generative model max likelihood max evidence algo algo: chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net suggests hard limit in step 3; sparse in step 1
  • 37. introduction formulation results extensions run time consistency good vs easy real data results: run time consistency required plot: good vs. easy real data karate biology american football chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
  • 38. introduction formulation results extensions run time consistency good vs easy real data run time: chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net • Main loop runtime for 104 nodes in MATLAB ~30 seconds
  • 39. introduction formulation results extensions run time consistency good vs easy real data consistency: chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
  • 40. introduction formulation results extensions run time consistency good vs easy real data consistency: chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 2 4 6 8 10 12 θ N=8, K=2, distribution after 2 iterations p(θ+ ) p(θ − )
  • 41. introduction formulation results extensions run time consistency good vs easy real data consistency: chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net • K=4? • Automatic complexity control: probability of occupation for extraneous modules goes to zero
  • 42. introduction formulation results extensions run time consistency good vs easy real data consistency: chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net • K=4? • Automatic complexity control: probability of occupation for extraneous modules goes to zero
  • 43. introduction formulation results extensions run time consistency good vs easy real data consistency: chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net The “resolution limit” problem 1 2 3 4 5 57 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 4041 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 58 59 60 1 2 3 4 5 57 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 4041 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 58 59 60 Variational Bayes Girvan-Newman modularity
  • 44. introduction formulation results extensions run time consistency good vs easy real data consistency: chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net The “resolution limit” problem 10 12 14 16 18 20 8 10 12 14 16 18 20 Ktrue K* K * =Ktrue Variational Bayes Modularity optimization 10 12 14 16 18 20 0.72 0.74 0.76 0.78 0.8 0.82 0.84 Ktrue GNmodularity Resolution limit problem on ring of 4−node cliques Single−clique communities (correct) Double−clique communities (incorrect) GN modularity (Clauset’s algorithm) Girvan-Newman modularity or Potts model w/ fixed parameters suffers from a resolution limit, where size of detected modules depends on network size Fortunato et. al. (2007), Kumpula et. al. (2007),
  • 45. introduction formulation results extensions run time consistency good vs easy real data good vs easy: chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
  • 46. introduction formulation results extensions run time consistency good vs easy real data real data: chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net • Correctly infer K=12 conferences Validation: NCAA football schedule 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 nodes: teams edges: games shape: conference color: inferred module
  • 47. introduction formulation results extensions run time consistency good vs easy real data real data: chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net APS march meeting 2008 superconductivity (experimentalists) Nanotubes, Graphene superconductivity (theorists)
  • 48. introduction formulation results extensions extensions: model extensions full SBM (done) hierarchical model p(Aij = 1|zi = zi ± 1) hierarchical modeling (ensemble of graphs) p(D|u, K) = ΠL i dϑi p(D|ϑi )p(ϑi |u, K) Rd embedding (latent are real) more ‘rigorous’ SOM? ‘correct’ for degree (allow variable affinity) (cf. Bader, Karrer) algorithm extensions BP (see earlier talks) map-reduce cvmod (model selection via cross validation) chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
  • 49. introduction formulation results extensions full SBM. . . probability of edge depends only on block membership: p(Aij |zi = µ, zj = ν) = ϑµν chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
  • 50. introduction formulation results extensions full SBM. . . probability of edge depends only on block membership: p(Aij |zi = µ, zj = ν) = ϑµν chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net • Nodes belong to “blocks” of varying size • Roll die for assignment of nodes to blocks • Probability of edge between two nodes depends only on block membership • Flip (one of K2) coins for edges • Result: mixture of Erdos-Renyi graphs 0 20 40 60 80 100 120 0 20 40 60 80 100 120 nz = 2275 adjacency matrix
  • 51. introduction formulation results extensions full SBM. . . probability of edge depends only on block membership: p(Aij |zi = µ, zj = ν) = ϑµν chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net vs
  • 52. introduction formulation results extensions full SBM. . . probability of edge depends only on block membership: p(Aij |zi = µ, zj = ν) = ϑµν chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net 0 20 40 60 80 100 120 0 20 40 60 80 100 120 nz = 2803 adjacency matrix
  • 53. introduction formulation results extensions full SBM. . . probability of edge depends only on block membership: p(Aij |zi = µ, zj = ν) = ϑµν chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net vs
  • 54. introduction formulation results extensions full SBM. . . probability of edge depends only on block membership: p(Aij |zi = µ, zj = ν) = ϑµν chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net >> vbsbm_vs_vbmod(0) running vbmod ... Elapsed time is 1.136925 seconds. running vbsbm ... Elapsed time is 1.398904 seconds. Fmod=13089.158019 Fsbm=13144.445782 vbmod wins
  • 55. introduction formulation results extensions full SBM. . . probability of edge depends only on block membership: p(Aij |zi = µ, zj = ν) = ϑµν chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net >> vbsbm_vs_vbmod(0.25) running vbmod ... Elapsed time is 1.557298 seconds. running vbsbm ... Elapsed time is 1.759527 seconds. Fmod=20457.142416 Fsbm=19457.306022 vbsbm wins
  • 56. introduction formulation results extensions full SBM. . . probability of edge depends only on block membership: p(Aij |zi = µ, zj = ν) = ϑµν chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net >> vbsbm_vs_vbmod(0.5) running vbmod ... Elapsed time is 2.624886 seconds. running vbsbm ... Elapsed time is 1.440242 seconds. Fmod=26133.351210 Fsbm=23921.797625 vbsbm wins
  • 57. introduction formulation results extensions full SBM. . . probability of edge depends only on block membership: p(Aij |zi = µ, zj = ν) = ϑµν chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net • Using same framework we can compare the unconstrained and full stochastic block models via p(D|M,K*) 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 perturbation to constrained model winpercentageforunconstrainedmodel 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 perturbation to constrained model winpercentageforunconstrainedmodel 0 20 40 60 80 100 120 0 20 40 60 80 100 120 nz = 2100 adjacency matrix 0 20 40 60 80 100 120 0 20 40 60 80 100 120 nz = 2048 adjacency matrix 0 20 40 60 80 100 120 0 20 40 60 80 100 120 nz = 2108 adjacency matrix
  • 58. introduction formulation results extensions extensions: model extensions full SBM (done) hierarchical model p(Aij = 1|zi = zi ± 1) hierarchical modeling (ensemble of graphs) p(D|u, K) = ΠL i dϑi p(D|ϑi )p(ϑi |u, K) Rd embedding (latent are real) more ‘rigorous’ SOM? ‘correct’ for degree (allow variable affinity) (cf. Bader, Karrer) algorithm extensions BP (see earlier talks) map-reduce cvmod (model selection via cross validation) chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
  • 59. introduction formulation results extensions extensions: model extensions full SBM (done) hierarchical model p(Aij = 1|zi = zi ± 1) hierarchical modeling (ensemble of graphs) p(D|u, K) = ΠL i dϑi p(D|ϑi )p(ϑi |u, K) Rd embedding (latent are real) more ‘rigorous’ SOM? ‘correct’ for degree (allow variable affinity) (cf. Bader, Karrer) algorithm extensions BP (see earlier talks) map-reduce cvmod (model selection via cross validation) chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
  • 60. introduction formulation results extensions extensions: model extensions full SBM (done) hierarchical model p(Aij = 1|zi = zi ± 1) hierarchical modeling (ensemble of graphs) p(D|u, K) = ΠL i dϑi p(D|ϑi )p(ϑi |u, K) Rd embedding (latent are real) more ‘rigorous’ SOM? ‘correct’ for degree (allow variable affinity) (cf. Bader, Karrer) algorithm extensions BP (see earlier talks) map-reduce cvmod (model selection via cross validation) chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
  • 61. introduction formulation results extensions extensions: model extensions full SBM (done) hierarchical model p(Aij = 1|zi = zi ± 1) hierarchical modeling (ensemble of graphs) p(D|u, K) = ΠL i dϑi p(D|ϑi )p(ϑi |u, K) Rd embedding (latent are real) more ‘rigorous’ SOM? ‘correct’ for degree (allow variable affinity) (cf. Bader, Karrer) algorithm extensions BP (see earlier talks) map-reduce cvmod (model selection via cross validation) chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987) i≠j zi zj Aij π θ c n
  • 62. introduction formulation results extensions extensions: model extensions full SBM (done) hierarchical model p(Aij = 1|zi = zi ± 1) hierarchical modeling (ensemble of graphs) p(D|u, K) = ΠL i dϑi p(D|ϑi )p(ϑi |u, K) Rd embedding (latent are real) more ‘rigorous’ SOM? ‘correct’ for degree (allow variable affinity) (cf. Bader, Karrer) algorithm extensions BP (see earlier talks) map-reduce cvmod (model selection via cross validation) chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987) i≠j zi zj Aij π θ c n L
  • 63. introduction formulation results extensions extensions: model extensions full SBM (done) hierarchical model p(Aij = 1|zi = zi ± 1) hierarchical modeling (ensemble of graphs) p(D|u, K) = ΠL i dϑi p(D|ϑi )p(ϑi |u, K) Rd embedding (latent are real) more ‘rigorous’ SOM? ‘correct’ for degree (allow variable affinity) (cf. Bader, Karrer) algorithm extensions BP (see earlier talks) map-reduce cvmod (model selection via cross validation) chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
  • 64. introduction formulation results extensions for more info. . . code: MATLAB & python (inc. “full” SBM) (vbmod.sf.net) paper: arxiv 08 / prl 08 Hofman soon to come (not by me) code in C++, inc. full ‘vblabel propagation’ algo twitter-scale analysis chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
  • 65. bio+stats vbem/networks hierarchical biological challenges inference model selection hierarchical/time series: - biological challenges - inference - model selection chris.wiggins@columbia.edu 4/23/12
  • 66. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 Jan-Willem van de Meent, Ruben Gonzalez, Chris Wiggins Columbia University
  • 67. Ramakrishnan et al – http://www.mrc-lmb.cam.ac.uk/ribo/
  • 68. Ramakrishnan et al – http://www.mrc-lmb.cam.ac.uk/ribo/
  • 69. Ramakrishnan et al – http://www.mrc-lmb.cam.ac.uk/ribo/
  • 70. Ramakrishnan et al – http://www.mrc-lmb.cam.ac.uk/ribo/
  • 71. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 FRET = cy5 / (cy3 + cy5) Tinoco and Gonzalez, Genes Dev, 2011
  • 72. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 Tinoco and Gonzalez, Genes Dev, 2011
  • 73. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 Unbound EF-G bound Tinoco and Gonzalez, Genes Dev, 2011 Fei et al, PNAS, 2009 (short-lived GS1 states correspond to an EF-G + GDPNP binding event)
  • 74. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 Unbound EF-G bound Tinoco and Gonzalez, Genes Dev, 2011 Fei et al, PNAS, 2009
  • 75. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 1. Identify states 2. Estimate Kinetic Rates 3. Average over many time series 4. Detect subpopulations
  • 76. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 FRET Signal
  • 77. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 FRET SignalHistogram
  • 78. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 FRET SignalHistogram
  • 79. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 FRET SignalHistogram
  • 80. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 FRET SignalHistogram Idea: Find probability of belonging to each state
  • 81. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 Expectation Maximization 1. calculate p(z | x, θi) 2. calculate θi+1 from p(z | x, θi)
  • 82. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 Log-Likelihood L = log p(x θ) = log z p(x, z θ) Expectation Maximization 1. calculate p(z | x, θi) 2. calculate θi+1 from p(z | x, θi)
  • 83. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 Learned Truth Accurate for occupancy of states, not so good for rate estimates
  • 84. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 p(x, z µ, σ, π) = p(x z, µ, σ)p(z π)
  • 85. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 probability of state depends on previous state p(zt+ =l zt =k) = Akl
  • 86. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12
  • 87. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12
  • 88. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12
  • 89. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 p(z =k) = πk
  • 90. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 p(zt+ =l zt =k) = Akl
  • 91. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 p(xt zt = k) = N(xt µk , σk)
  • 92. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 Learned Real We’ve learned: parameters: θ = {µ, σ, π, A} states: p(z | x, θ)
  • 93. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 2 States 3 States
  • 94. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 Log-Likelihood L = log p(x θ) = log z p(x, z θ) Log-Evidence L = log p(x u) = log z ∫ dθ p(x, z θ)p(θu) Log-Evidence
  • 95. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 Log-Likelihood L = log p(x θ) = log z p(x, z θ) Log-Evidence L = log p(x u) = log z ∫ dθ p(x, z θ)p(θu) Prior
  • 96. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 Log-Likelihood L = log p(x θ) = log z p(x, z θ) Log-Evidence L = log p(x u) = log z ∫ dθ p(x, z θ)p(θu) Ensemble
  • 97. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 Log-Likelihood L = log p(x θ) = log z p(x, z θ) Log-Evidence best model has highest average likelihood L = log p(x u) = log z ∫ dθ p(x, z θ)p(θu) Log-Evidence
  • 98. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 Log-Evidence 31 L = log p(x u) = log z ∫ dθ p(x, z θ)p(θu) Lower Bound L = z ∫ dθ q(z)q(θ w)log p(x, z, θ u) q(z)q(θ w) ≥ log p(x u) q(z)q(θ w) p(z, θ x)
  • 99. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 31 Lower bound tight for true posterior L = z ∫ dθ p(z, θ x)log p(x, z, θ u) p(z, θ x) = z ∫ dθ p(z, θ x)log[p(x u)] = log p(x u) L = log p(x u) − Dkl [q(z)q(θ w) p(z, θ x)]
  • 100. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 31 We’ve learned: parameters: q(θ | w) states: p(z | x, θ) δLn δq(zn) = VBEM Updates δLn δq(θn) =
  • 101. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 31 variability: photophysical/experimental
  • 102. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 31
  • 103. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 31
  • 104. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 31
  • 105. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 31
  • 106. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 31 Hierarchical Updates ∂ ∂u n Ln =
  • 107. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 31 Hierarchical Updates ∂ ∂u n Ln = “two-stage PEB model/CIHM”-Kass Steffey JASA 1989
  • 108. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 2. Update p(θ | u) Until Σ Ln converges Until Ln converges • Update q(zn) • Update q(θn | wn) 1. Run VBEM on each trace δLn δq(zn) = Hierarchical UpdatesVBEM Updates δLn δq(θn) = ∂ ∂u n Ln =
  • 109. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 2. Update p(θ | u) Until Σ Ln converges Until Ln converges • Update q(zn) • Update q(θn | wn) 1. Run VBEM on each trace We’ve learned: p(θn, zn | xn) ≃ q(θn) q(zn) (for each trace) p(θ | u) (for ensemble)
  • 110. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12
  • 111. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12
  • 112. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12
  • 113. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12
  • 114. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12
  • 115. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12
  • 116. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12
  • 117. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 Unbound EF-G bound Tinoco and Gonzalez, Genes Dev, 2011 Fei et al, PNAS, 2009
  • 118. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 ξntkl = p(znt = k, znt+ = l xn) 1. Run mixture model on posterior counts p(ξnA) = tkl Aξntkl kl p(ξn um) = ∫ dA p(Aum)p(ξn A) 2. Rerun with M x K block-diagonal form uA = uA uA uA M
  • 119. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 τfast τslow 2 4 8 16 32 64
  • 120. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 τfast τslow 2 4 8 16 32 64 reality
  • 121. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 no EF-G 50 nM EF-G 500 nM EF-G Fei, Bronson, Hofman, Srinivas, Wiggins, Gonzalez, PNAS, 2009
  • 122. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 no EF-G 50 nM EF-G 500 nM EF-G p(zk) ∼ e−Gk kB T log p(zk) − log p(zl ) = −(Gk − Gl )kBT + cst.
  • 123. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 no EF-G 50 nM EF-G 500 nM EF-G p(zk) ∼ e−Gk kB T Δ∆G=logit(p)
  • 124. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 no EF-Gbound fraction and life-times
  • 125. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 5 nM EF-Gbound fraction and life-times
  • 126. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 50 nM EF-Gbound fraction and life-times
  • 127. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 250 nM EF-Gbound fraction and life-times
  • 128. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 500 nM EF-Gbound fraction and life-times
  • 129. bio+stats vbem/networks hierarchical biological challenges inference model selection model selection: chris.wiggins@columbia.edu 4/23/12
  • 130. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 Low Noise, UnderfittedInf Out - Inf In
  • 131. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 Low Noise, CorrectOut vs In
  • 132. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 Low Noise, OverfittedInf Out - Inf In
  • 133. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 High Noise, UnderfittedInf Out - Inf In
  • 134. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 High Noise, CorrectInf Out - Inf In
  • 135. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 High Noise, OverfittedInf Out - Inf In
  • 136. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 High Noise, OverfittedInf Out - Inf In
  • 137. bio+stats vbem/networks hierarchical biological challenges inference model selection hfret. . . chris.wiggins@columbia.edu 4/23/12 the future, in progress: X
  • 138. bio+stats vbem/networks hierarchical biological challenges inference model selection thanks. . . - jake hofman (vbmod,vbfret) - jonathan bronson (vbfret) - jan-willem van de meent (hfret) - ruben gonzalez (vbfret, hfret) for more info: - vbfret.sourceforge.net - vbmod.sourceforge.net - hfret.sourceforge.net (soon) chris.wiggins@columbia.edu 4/23/12 BMC bioinformatics, 2010; PNAS 2009; Biophysical Journal 2009;
  • 139. traditional role of statistics in biophysics “if your experiment needs statistics, you ought to have done a better experiment” -lord rutherford