SlideShare ist ein Scribd-Unternehmen logo
1 von 36
Downloaden Sie, um offline zu lesen
d-VMP:
Distributed Variational Message Passing
Andrés R. Masegosa1, Ana M. Martínez2,
Helge Langseth1, Thomas D. Nielsen2, Antonio Salmerón3,
Darío Ramos-López3, Anders L. Madsen2,4
1Department of Computer Science, Aalborg University, Denmark
2 Department of Computer and Information Science,
The Norwegian University of Science and Technology, Norway
3Department of Mathematics, University of Almería, Spain
4 Hugin Expert A/S, Aalborg, Denmark
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 1
Outline
1 Motivation
2 Variational Message Passing
3 d-VMP
4 Experimental results
5 Conclusions
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 2
Outline
1 Motivation
2 Variational Message Passing
3 d-VMP
4 Experimental results
5 Conclusions
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 3
Motivation
Large
Imbalance
? values
Complex distributions
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 4
Motivation
Large
Imbalance
? values
Complex distributions
Attribute range
Density
0 1 2 3 4 5 6
0123456
●
●
●
●
SVI 1%
SVI 5%
SVI 10%
VMP 1%
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 5
Motivation
Goal: learn a generative model for a finantial dataset to monitor the
customers and make predictions for a single customer.
Xi Hi
θα
i = 1, . . . , N
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 6
Popular existing approach: SVI
Stochastic Variational Inference: iteratively updates the model
parameters based on subsampled data batches.
No estimation of all local hidden variables of the model.
No generation of lower bound.
Poor fit if batch of data is not representative from all data.
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 7
Our contribution:
d-VMP: a distributed message passing scheme.
Defined for a broader class of models (than SVI).
Better and faster convergence results compared to SVI.
Posterior over all local latent variables and lower bound.
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 8
Outline
1 Motivation
2 Variational Message Passing
3 d-VMP
4 Experimental results
5 Conclusions
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 9
Models:
Bayesian learning on iid. data using conjugate exponential BN models:
ln p(X) = ln hX + sX · η − AX (η)
Xi Hi
θα
i = 1, . . . , N
We want to calculate p(θ, H|D).
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 10
Variational Inference:
Approximate p(θ, H|D) (often intractable) by finding tractable
posterior distributions q ∈ Q by minimizing:
min
q(θ,H)∈Q
KL(q(θ, H)|p(θ, H|D)),
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 11
Variational Inference:
Approximate p(θ, H|D) (often intractable) by finding tractable
posterior distributions q ∈ Q by minimizing:
min
q(θ,H)∈Q
KL(q(θ, H)|p(θ, H|D)),
In the mean field variational approach, Q is assumed to fully
factorize:
q(θ, H) =
M
k=1
q(θk)
N
i=1
J
j=1
q(Hi,j ),
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 12
Variational Inference:
Variational Inference exploits:
ln P(D)
constant
= L(q(θ, H))
Maximize
+ KL(q(θ, H)|p(θ, H|D))
Minimize
,
Iterative coordinate ascent of the variational distributions.
Updates in the variational distribution of a variable only
involves variables in its Markov blanket.
Coordinate ascent algorithm formulated as a message passing
scheme.
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 13
Variational Message Passing, VMP:
Message from parent to child: moment parameters
(expectation of the sufficient statistics).
Message from child to parent: natural parameters
(based on the messages received from the co-parents).
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 14
Outline
1 Motivation
2 Variational Message Passing
3 d-VMP
4 Experimental results
5 Conclusions
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 15
Distributed optimization of the lower bound:
Master
θα
q(t)(θ)
X
H
θ1
i = 1, . . . , N
Slave 1
q(t)(H1)
X
H
θ2
i = 1, . . . , N
Slave 2
q(t)(H2)
X
H
θ3
i = 1, . . . , N
Slave 3
q(t)(H3)
q(t)(θ) is broadcasted to all the slave nodes.
maxq∈Q L(q(θ, H))
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 16
Distributed optimization of the lower bound:
Master
θα
q(t)(θ)
X
H
θ1
i = 1, . . . , N
Slave 1
q(t)(H1)
X
H
θ2
i = 1, . . . , N
Slave 2
q(t)(H2)
X
H
θ3
i = 1, . . . , N
Slave 3
q(t)(H3)
q(t+1)(H) = arg maxq(H) L(q(H), q(t)(θ))
maxq∈Q L(q(θ, H))
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 17
Distributed optimization of the lower bound:
Master
θα
q(t)(θ)
X
H
θ1
i = 1, . . . , N
Slave 1
q(t)(H1)
X
H
θ2
i = 1, . . . , N
Slave 2
q(t)(H2)
X
H
θ3
i = 1, . . . , N
Slave 3
q(t)(H3)
q(t+1)(Hn) = arg maxq(Hn) Ln(q(Hn), q(t)(θ))
maxq∈Q L(q(θ, H))
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 18
Distributed optimization of the lower bound:
Master
θα
q(t)(θ)
X
H
θ1
i = 1, . . . , N
Slave 1
q(t)(H1)
X
H
θ2
i = 1, . . . , N
Slave 2
q(t)(H2)
X
H
θ3
i = 1, . . . , N
Slave 3
q(t)(H3)
q(t+1)
(θ) = arg max
q(θ)
L(q(t)
(H), q(θ))
May be coupled
maxq∈Q L(q(θ, H))
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 19
Candidate solutions:
Resort to a generalized mean-field approximation as SVI: does
not factorize over the global parameters.
Prohibitive for models with a large number of global (coupled)
parameters, e.g. linear regression.
Our proposal: VMP as a distributed projected natural
gradient ascent algorithm (PGNA).
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 20
d-VMP as a projected natural gradient ascent
Insight 1: VMP can be expressed as a projected natural
gradient ascent algorithm.
η
(t+1)
X = η
(t)
X + ρX,t[ ˆ ηL(η(t)
)]+
X (1)
[·] is the projection operator.
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 21
d-VMP as a projected natural gradient ascent
Insight 2: The natural gradient of the lower bound can be
expressed as follows:
ˆ ηθ
L = mPa(θ)→θ + mHi →θ
The gradient can be computed in parallel.
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 22
d-VMP as a projected natural gradient ascent
Insight 3: Global parameters are “coupled” only if they belong
to each other’s Markov blanket.
Define a disjoint partition of the global parameters:
R = {J1, . . . , JS }
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 23
d-VMP as a projected natural gradient ascent
d-VMP is based on performing independent global updates
over the global parameters of each partition:
η
(t+1)
Jr
= η
(t)
Jr
+ ρr,t[ ˆ ηL(η(t)
)]+
Jr
ρr,t is the learning rate. If |Jr | = 1 then ρr,t = 1.
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 24
dVMP as a distributed PNGA algorithm:
Master
θα
η
(t+1)
Jr
X
H
θ1
i = 1, . . . , N
Slave 1
η
(t+1)
1
X
H
θ2
i = 1, . . . , N
Slave 2
η
(t+1)
2
X
H
θ3
i = 1, . . . , N
Slave 3
η
(t+1)
3
η
(t)
Jr
+ ρr,t[ˆηL(η(t)
)]+
Jr
For all Jr ∈ R
maxq∈Q L(q(θ, H))
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 25
Outline
1 Motivation
2 Variational Message Passing
3 d-VMP
4 Experimental results
5 Conclusions
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 26
Model fit to the data
Xij HijYi
Hi θ
α
j = 1, . . . , J
i = 1, . . . , N
Attribute range
Density
0 1 2 3 4 5 6
0123456
●
●
●
●
SVI 1%
SVI 5%
SVI 10%
VMP 1%
Representative sample of 55K clients (N) and 33 attributes (J).
“Unrolled” model of more than 3.5M nodes (75% latent variables).
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 27
Model fit to the data
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 28
Model fit to the data
0 500 1000 1500 2000
−1.5e+07−1.0e+07−5.0e+060.0e+00
Time (seconds)
Globallowerbound
Alg. BS(data%)/LR
SVI 1%/0.55
SVI 1%/0.75
SVI 1%/0.99
SVI 5%/0.55
SVI 5%/0.75
SVI 5%/0.99
SVI 10%/0.55
SVI 10%/0.75
SVI 10%/0.99
d−VMP
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 29
Test marginal log-likelihood
BS (% data) LR Log-Likel.
SVI
1 %
0.55 -180902.87
0.75 -298564.03
0.99 -426979.52
5 %
0.55 -177302.24
0.75 -333264.16
0.99 -628105.70
10 %
0.55 -347035.22
0.75 -397525.45
0.99 -538087.13
d-VMP 1.0 67265.34
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 30
Mixtures of learnt posteriors for one attribute
●
●
●
●
SVI 1%
SVI 5%
SVI 10%
VMP 1%
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 31
Scalability settings
Generated data set of 42 million samples per client and 12
variables.
“Unrolled” model of more than 1 billion (109) nodes
(75% latent variables).
AMIDST Toolbox with Apache Flink.
Amazon Web Services (AWS) as distributed computing
environment.
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 32
Scalability results
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 33
Outline
1 Motivation
2 Variational Message Passing
3 d-VMP
4 Experimental results
5 Conclusions
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 34
Conclusions
Variational methods can be scaled using distributed
computation instead of sampling techniques.
Bayesian learning in model with more than 1 billion nodes
(75% of hidden).
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 35
Thank you for your attention
Questions?
You can download our open source Java toolbox:
amidsttoolbox.com
Acknowledgments: This project has received funding from the European Union’s
Seventh Framework Programme for research, technological development and
demonstration under grant agreement no 619209
Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 36

Weitere ähnliche Inhalte

Was ist angesagt?

Markov Chain Monte Carlo Methods
Markov Chain Monte Carlo MethodsMarkov Chain Monte Carlo Methods
Markov Chain Monte Carlo MethodsFrancesco Casalegno
 
Contribution à l'étude du trafic routier sur réseaux à l'aide des équations d...
Contribution à l'étude du trafic routier sur réseaux à l'aide des équations d...Contribution à l'étude du trafic routier sur réseaux à l'aide des équations d...
Contribution à l'étude du trafic routier sur réseaux à l'aide des équations d...Guillaume Costeseque
 
Elementary Landscape Decomposition of the Hamiltonian Path Optimization Problem
Elementary Landscape Decomposition of the Hamiltonian Path Optimization ProblemElementary Landscape Decomposition of the Hamiltonian Path Optimization Problem
Elementary Landscape Decomposition of the Hamiltonian Path Optimization Problemjfrchicanog
 
Traffic flow modeling on road networks using Hamilton-Jacobi equations
Traffic flow modeling on road networks using Hamilton-Jacobi equationsTraffic flow modeling on road networks using Hamilton-Jacobi equations
Traffic flow modeling on road networks using Hamilton-Jacobi equationsGuillaume Costeseque
 
Quasi-Optimal Recombination Operator
Quasi-Optimal Recombination OperatorQuasi-Optimal Recombination Operator
Quasi-Optimal Recombination Operatorjfrchicanog
 
Metropolis-Hastings MCMC Short Tutorial
Metropolis-Hastings MCMC Short TutorialMetropolis-Hastings MCMC Short Tutorial
Metropolis-Hastings MCMC Short TutorialRalph Schlosser
 
Second order traffic flow models on networks
Second order traffic flow models on networksSecond order traffic flow models on networks
Second order traffic flow models on networksGuillaume Costeseque
 
ABC-Xian
ABC-XianABC-Xian
ABC-XianDeb Roy
 
Let's Practice What We Preach: Likelihood Methods for Monte Carlo Data
Let's Practice What We Preach: Likelihood Methods for Monte Carlo DataLet's Practice What We Preach: Likelihood Methods for Monte Carlo Data
Let's Practice What We Preach: Likelihood Methods for Monte Carlo DataChristian Robert
 
no U-turn sampler, a discussion of Hoffman & Gelman NUTS algorithm
no U-turn sampler, a discussion of Hoffman & Gelman NUTS algorithmno U-turn sampler, a discussion of Hoffman & Gelman NUTS algorithm
no U-turn sampler, a discussion of Hoffman & Gelman NUTS algorithmChristian Robert
 
A Unifying Review of Gaussian Linear Models (Roweis 1999)
A Unifying Review of Gaussian Linear Models (Roweis 1999)A Unifying Review of Gaussian Linear Models (Roweis 1999)
A Unifying Review of Gaussian Linear Models (Roweis 1999)Feynman Liang
 
RuleML2015: Learning Characteristic Rules in Geographic Information Systems
RuleML2015: Learning Characteristic Rules in Geographic Information SystemsRuleML2015: Learning Characteristic Rules in Geographic Information Systems
RuleML2015: Learning Characteristic Rules in Geographic Information SystemsRuleML
 
Jan Picek, Martin Schindler, Jan Kyselý, Romana Beranová: Statistical aspects...
Jan Picek, Martin Schindler, Jan Kyselý, Romana Beranová: Statistical aspects...Jan Picek, Martin Schindler, Jan Kyselý, Romana Beranová: Statistical aspects...
Jan Picek, Martin Schindler, Jan Kyselý, Romana Beranová: Statistical aspects...Jiří Šmída
 

Was ist angesagt? (19)

Climate Extremes Workshop - A Semiparametric Bayesian Clustering Model for S...
Climate Extremes Workshop -  A Semiparametric Bayesian Clustering Model for S...Climate Extremes Workshop -  A Semiparametric Bayesian Clustering Model for S...
Climate Extremes Workshop - A Semiparametric Bayesian Clustering Model for S...
 
Markov Chain Monte Carlo Methods
Markov Chain Monte Carlo MethodsMarkov Chain Monte Carlo Methods
Markov Chain Monte Carlo Methods
 
Contribution à l'étude du trafic routier sur réseaux à l'aide des équations d...
Contribution à l'étude du trafic routier sur réseaux à l'aide des équations d...Contribution à l'étude du trafic routier sur réseaux à l'aide des équations d...
Contribution à l'étude du trafic routier sur réseaux à l'aide des équations d...
 
Elementary Landscape Decomposition of the Hamiltonian Path Optimization Problem
Elementary Landscape Decomposition of the Hamiltonian Path Optimization ProblemElementary Landscape Decomposition of the Hamiltonian Path Optimization Problem
Elementary Landscape Decomposition of the Hamiltonian Path Optimization Problem
 
Traffic flow modeling on road networks using Hamilton-Jacobi equations
Traffic flow modeling on road networks using Hamilton-Jacobi equationsTraffic flow modeling on road networks using Hamilton-Jacobi equations
Traffic flow modeling on road networks using Hamilton-Jacobi equations
 
Quasi-Optimal Recombination Operator
Quasi-Optimal Recombination OperatorQuasi-Optimal Recombination Operator
Quasi-Optimal Recombination Operator
 
Metropolis-Hastings MCMC Short Tutorial
Metropolis-Hastings MCMC Short TutorialMetropolis-Hastings MCMC Short Tutorial
Metropolis-Hastings MCMC Short Tutorial
 
Second order traffic flow models on networks
Second order traffic flow models on networksSecond order traffic flow models on networks
Second order traffic flow models on networks
 
ABC-Xian
ABC-XianABC-Xian
ABC-Xian
 
Let's Practice What We Preach: Likelihood Methods for Monte Carlo Data
Let's Practice What We Preach: Likelihood Methods for Monte Carlo DataLet's Practice What We Preach: Likelihood Methods for Monte Carlo Data
Let's Practice What We Preach: Likelihood Methods for Monte Carlo Data
 
no U-turn sampler, a discussion of Hoffman & Gelman NUTS algorithm
no U-turn sampler, a discussion of Hoffman & Gelman NUTS algorithmno U-turn sampler, a discussion of Hoffman & Gelman NUTS algorithm
no U-turn sampler, a discussion of Hoffman & Gelman NUTS algorithm
 
Athens workshop on MCMC
Athens workshop on MCMCAthens workshop on MCMC
Athens workshop on MCMC
 
Planted Clique Research Paper
Planted Clique Research PaperPlanted Clique Research Paper
Planted Clique Research Paper
 
Randomization
RandomizationRandomization
Randomization
 
QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...
QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...
QMC: Transition Workshop - Probabilistic Integrators for Deterministic Differ...
 
A Unifying Review of Gaussian Linear Models (Roweis 1999)
A Unifying Review of Gaussian Linear Models (Roweis 1999)A Unifying Review of Gaussian Linear Models (Roweis 1999)
A Unifying Review of Gaussian Linear Models (Roweis 1999)
 
RuleML2015: Learning Characteristic Rules in Geographic Information Systems
RuleML2015: Learning Characteristic Rules in Geographic Information SystemsRuleML2015: Learning Characteristic Rules in Geographic Information Systems
RuleML2015: Learning Characteristic Rules in Geographic Information Systems
 
Dcm
DcmDcm
Dcm
 
Jan Picek, Martin Schindler, Jan Kyselý, Romana Beranová: Statistical aspects...
Jan Picek, Martin Schindler, Jan Kyselý, Romana Beranová: Statistical aspects...Jan Picek, Martin Schindler, Jan Kyselý, Romana Beranová: Statistical aspects...
Jan Picek, Martin Schindler, Jan Kyselý, Romana Beranová: Statistical aspects...
 

Andere mochten auch

Dynamic Bayesian modeling for risk prediction in credit operations (SCAI2015)
Dynamic Bayesian modeling for risk prediction in credit operations (SCAI2015)Dynamic Bayesian modeling for risk prediction in credit operations (SCAI2015)
Dynamic Bayesian modeling for risk prediction in credit operations (SCAI2015)AMIDST Toolbox
 
Finding Your People Story: How to Develop an Employer Brand That Attracts Ta...
Finding Your People Story:  How to Develop an Employer Brand That Attracts Ta...Finding Your People Story:  How to Develop an Employer Brand That Attracts Ta...
Finding Your People Story: How to Develop an Employer Brand That Attracts Ta...Chad Norman
 
Neo4j graphs in the real world - graph days d.c. - april 14, 2015
Neo4j   graphs in the real world - graph days d.c. - april 14, 2015Neo4j   graphs in the real world - graph days d.c. - april 14, 2015
Neo4j graphs in the real world - graph days d.c. - april 14, 2015Neo4j
 
Is Machine learning useful for Fraud Prevention?
Is Machine learning useful for Fraud Prevention?Is Machine learning useful for Fraud Prevention?
Is Machine learning useful for Fraud Prevention?Andrea Dal Pozzolo
 
Credit card fraud detection methods using Data-mining.pptx (2)
Credit card fraud detection methods using Data-mining.pptx (2)Credit card fraud detection methods using Data-mining.pptx (2)
Credit card fraud detection methods using Data-mining.pptx (2)k.surya kumar
 
Model building in credit card and loan approval
Model building in credit card and loan approval Model building in credit card and loan approval
Model building in credit card and loan approval Venkata Reddy Konasani
 
Types of databases
Types of databasesTypes of databases
Types of databasesPAQUIAAIZEL
 
Machine Learning for Fraud Detection
Machine Learning for Fraud DetectionMachine Learning for Fraud Detection
Machine Learning for Fraud DetectionNitesh Kumar
 
Credit scoring using Rattle and R
Credit scoring using Rattle and RCredit scoring using Rattle and R
Credit scoring using Rattle and RAyan Das
 

Andere mochten auch (11)

Dynamic Bayesian modeling for risk prediction in credit operations (SCAI2015)
Dynamic Bayesian modeling for risk prediction in credit operations (SCAI2015)Dynamic Bayesian modeling for risk prediction in credit operations (SCAI2015)
Dynamic Bayesian modeling for risk prediction in credit operations (SCAI2015)
 
Finding Your People Story: How to Develop an Employer Brand That Attracts Ta...
Finding Your People Story:  How to Develop an Employer Brand That Attracts Ta...Finding Your People Story:  How to Develop an Employer Brand That Attracts Ta...
Finding Your People Story: How to Develop an Employer Brand That Attracts Ta...
 
Healthcare fraud detection
Healthcare fraud detectionHealthcare fraud detection
Healthcare fraud detection
 
Successful Uses of R in Banking
Successful Uses of R in BankingSuccessful Uses of R in Banking
Successful Uses of R in Banking
 
Neo4j graphs in the real world - graph days d.c. - april 14, 2015
Neo4j   graphs in the real world - graph days d.c. - april 14, 2015Neo4j   graphs in the real world - graph days d.c. - april 14, 2015
Neo4j graphs in the real world - graph days d.c. - april 14, 2015
 
Is Machine learning useful for Fraud Prevention?
Is Machine learning useful for Fraud Prevention?Is Machine learning useful for Fraud Prevention?
Is Machine learning useful for Fraud Prevention?
 
Credit card fraud detection methods using Data-mining.pptx (2)
Credit card fraud detection methods using Data-mining.pptx (2)Credit card fraud detection methods using Data-mining.pptx (2)
Credit card fraud detection methods using Data-mining.pptx (2)
 
Model building in credit card and loan approval
Model building in credit card and loan approval Model building in credit card and loan approval
Model building in credit card and loan approval
 
Types of databases
Types of databasesTypes of databases
Types of databases
 
Machine Learning for Fraud Detection
Machine Learning for Fraud DetectionMachine Learning for Fraud Detection
Machine Learning for Fraud Detection
 
Credit scoring using Rattle and R
Credit scoring using Rattle and RCredit scoring using Rattle and R
Credit scoring using Rattle and R
 

Ähnlich wie Distributed Variational Message Passing for Large Probabilistic Graphical Models

One Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical ComputationOne Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical ComputationWork-Bench
 
Asynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and AlgorithmsAsynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and AlgorithmsFabian Pedregosa
 
Parallel Filter-Based Feature Selection Based on Balanced Incomplete Block De...
Parallel Filter-Based Feature Selection Based on Balanced Incomplete Block De...Parallel Filter-Based Feature Selection Based on Balanced Incomplete Block De...
Parallel Filter-Based Feature Selection Based on Balanced Incomplete Block De...AMIDST Toolbox
 
An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...
An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...
An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...NTNU
 
Regression on gaussian symbols
Regression on gaussian symbolsRegression on gaussian symbols
Regression on gaussian symbolsAxel de Romblay
 
The Gaussian Process Latent Variable Model (GPLVM)
The Gaussian Process Latent Variable Model (GPLVM)The Gaussian Process Latent Variable Model (GPLVM)
The Gaussian Process Latent Variable Model (GPLVM)James McMurray
 
Mining group correlations over data streams
Mining group correlations over data streamsMining group correlations over data streams
Mining group correlations over data streamsyuanchung
 
StatPhysPerspectives_AMALEA_Cetraro_AnnaCarbone.pdf
StatPhysPerspectives_AMALEA_Cetraro_AnnaCarbone.pdfStatPhysPerspectives_AMALEA_Cetraro_AnnaCarbone.pdf
StatPhysPerspectives_AMALEA_Cetraro_AnnaCarbone.pdfAnna Carbone
 
Second order traffic flow models on networks
Second order traffic flow models on networksSecond order traffic flow models on networks
Second order traffic flow models on networksGuillaume Costeseque
 
Accelerating Metropolis Hastings with Lightweight Inference Compilation
Accelerating Metropolis Hastings with Lightweight Inference CompilationAccelerating Metropolis Hastings with Lightweight Inference Compilation
Accelerating Metropolis Hastings with Lightweight Inference CompilationFeynman Liang
 
Stratified sampling and resampling for approximate Bayesian computation
Stratified sampling and resampling for approximate Bayesian computationStratified sampling and resampling for approximate Bayesian computation
Stratified sampling and resampling for approximate Bayesian computationUmberto Picchini
 
Probabilistic Modelling with Information Filtering Networks
Probabilistic Modelling with Information Filtering NetworksProbabilistic Modelling with Information Filtering Networks
Probabilistic Modelling with Information Filtering NetworksTomaso Aste
 
Anomalous diffusion in a finite world: time-scale dependent trajectory analysis
Anomalous diffusion in a finite world: time-scale dependent trajectory analysisAnomalous diffusion in a finite world: time-scale dependent trajectory analysis
Anomalous diffusion in a finite world: time-scale dependent trajectory analysiskhinsen
 
A quantum-inspired optimization heuristic for the multiple sequence alignment...
A quantum-inspired optimization heuristic for the multiple sequence alignment...A quantum-inspired optimization heuristic for the multiple sequence alignment...
A quantum-inspired optimization heuristic for the multiple sequence alignment...Konstantinos Giannakis
 
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural NetworksMasahiro Suzuki
 
Efficient Data Stream Classification via Probabilistic Adaptive Windows
Efficient Data Stream Classification via Probabilistic Adaptive WindowsEfficient Data Stream Classification via Probabilistic Adaptive Windows
Efficient Data Stream Classification via Probabilistic Adaptive WindowsAlbert Bifet
 

Ähnlich wie Distributed Variational Message Passing for Large Probabilistic Graphical Models (20)

SASA 2016
SASA 2016SASA 2016
SASA 2016
 
One Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical ComputationOne Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical Computation
 
GDRR Opening Workshop - Bayesian Inference for Common Cause Failure Rate Base...
GDRR Opening Workshop - Bayesian Inference for Common Cause Failure Rate Base...GDRR Opening Workshop - Bayesian Inference for Common Cause Failure Rate Base...
GDRR Opening Workshop - Bayesian Inference for Common Cause Failure Rate Base...
 
Asynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and AlgorithmsAsynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and Algorithms
 
Parallel Filter-Based Feature Selection Based on Balanced Incomplete Block De...
Parallel Filter-Based Feature Selection Based on Balanced Incomplete Block De...Parallel Filter-Based Feature Selection Based on Balanced Incomplete Block De...
Parallel Filter-Based Feature Selection Based on Balanced Incomplete Block De...
 
An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...
An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...
An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...
 
Regression on gaussian symbols
Regression on gaussian symbolsRegression on gaussian symbols
Regression on gaussian symbols
 
The Gaussian Process Latent Variable Model (GPLVM)
The Gaussian Process Latent Variable Model (GPLVM)The Gaussian Process Latent Variable Model (GPLVM)
The Gaussian Process Latent Variable Model (GPLVM)
 
Mining group correlations over data streams
Mining group correlations over data streamsMining group correlations over data streams
Mining group correlations over data streams
 
StatPhysPerspectives_AMALEA_Cetraro_AnnaCarbone.pdf
StatPhysPerspectives_AMALEA_Cetraro_AnnaCarbone.pdfStatPhysPerspectives_AMALEA_Cetraro_AnnaCarbone.pdf
StatPhysPerspectives_AMALEA_Cetraro_AnnaCarbone.pdf
 
Second order traffic flow models on networks
Second order traffic flow models on networksSecond order traffic flow models on networks
Second order traffic flow models on networks
 
Accelerating Metropolis Hastings with Lightweight Inference Compilation
Accelerating Metropolis Hastings with Lightweight Inference CompilationAccelerating Metropolis Hastings with Lightweight Inference Compilation
Accelerating Metropolis Hastings with Lightweight Inference Compilation
 
Final Report-1-(1)
Final Report-1-(1)Final Report-1-(1)
Final Report-1-(1)
 
Stratified sampling and resampling for approximate Bayesian computation
Stratified sampling and resampling for approximate Bayesian computationStratified sampling and resampling for approximate Bayesian computation
Stratified sampling and resampling for approximate Bayesian computation
 
Probabilistic Modelling with Information Filtering Networks
Probabilistic Modelling with Information Filtering NetworksProbabilistic Modelling with Information Filtering Networks
Probabilistic Modelling with Information Filtering Networks
 
QCD Phase Diagram
QCD Phase DiagramQCD Phase Diagram
QCD Phase Diagram
 
Anomalous diffusion in a finite world: time-scale dependent trajectory analysis
Anomalous diffusion in a finite world: time-scale dependent trajectory analysisAnomalous diffusion in a finite world: time-scale dependent trajectory analysis
Anomalous diffusion in a finite world: time-scale dependent trajectory analysis
 
A quantum-inspired optimization heuristic for the multiple sequence alignment...
A quantum-inspired optimization heuristic for the multiple sequence alignment...A quantum-inspired optimization heuristic for the multiple sequence alignment...
A quantum-inspired optimization heuristic for the multiple sequence alignment...
 
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
(DL輪読)Variational Dropout Sparsifies Deep Neural Networks
 
Efficient Data Stream Classification via Probabilistic Adaptive Windows
Efficient Data Stream Classification via Probabilistic Adaptive WindowsEfficient Data Stream Classification via Probabilistic Adaptive Windows
Efficient Data Stream Classification via Probabilistic Adaptive Windows
 

Kürzlich hochgeladen

A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfnehabiju2046
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptMAESTRELLAMesa2
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Types of different blotting techniques.pptx
Types of different blotting techniques.pptxTypes of different blotting techniques.pptx
Types of different blotting techniques.pptxkhadijarafiq2012
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 sciencefloriejanemacaya1
 

Kürzlich hochgeladen (20)

A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.ppt
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Types of different blotting techniques.pptx
Types of different blotting techniques.pptxTypes of different blotting techniques.pptx
Types of different blotting techniques.pptx
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 science
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 

Distributed Variational Message Passing for Large Probabilistic Graphical Models

  • 1. d-VMP: Distributed Variational Message Passing Andrés R. Masegosa1, Ana M. Martínez2, Helge Langseth1, Thomas D. Nielsen2, Antonio Salmerón3, Darío Ramos-López3, Anders L. Madsen2,4 1Department of Computer Science, Aalborg University, Denmark 2 Department of Computer and Information Science, The Norwegian University of Science and Technology, Norway 3Department of Mathematics, University of Almería, Spain 4 Hugin Expert A/S, Aalborg, Denmark Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 1
  • 2. Outline 1 Motivation 2 Variational Message Passing 3 d-VMP 4 Experimental results 5 Conclusions Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 2
  • 3. Outline 1 Motivation 2 Variational Message Passing 3 d-VMP 4 Experimental results 5 Conclusions Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 3
  • 4. Motivation Large Imbalance ? values Complex distributions Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 4
  • 5. Motivation Large Imbalance ? values Complex distributions Attribute range Density 0 1 2 3 4 5 6 0123456 ● ● ● ● SVI 1% SVI 5% SVI 10% VMP 1% Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 5
  • 6. Motivation Goal: learn a generative model for a finantial dataset to monitor the customers and make predictions for a single customer. Xi Hi θα i = 1, . . . , N Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 6
  • 7. Popular existing approach: SVI Stochastic Variational Inference: iteratively updates the model parameters based on subsampled data batches. No estimation of all local hidden variables of the model. No generation of lower bound. Poor fit if batch of data is not representative from all data. Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 7
  • 8. Our contribution: d-VMP: a distributed message passing scheme. Defined for a broader class of models (than SVI). Better and faster convergence results compared to SVI. Posterior over all local latent variables and lower bound. Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 8
  • 9. Outline 1 Motivation 2 Variational Message Passing 3 d-VMP 4 Experimental results 5 Conclusions Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 9
  • 10. Models: Bayesian learning on iid. data using conjugate exponential BN models: ln p(X) = ln hX + sX · η − AX (η) Xi Hi θα i = 1, . . . , N We want to calculate p(θ, H|D). Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 10
  • 11. Variational Inference: Approximate p(θ, H|D) (often intractable) by finding tractable posterior distributions q ∈ Q by minimizing: min q(θ,H)∈Q KL(q(θ, H)|p(θ, H|D)), Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 11
  • 12. Variational Inference: Approximate p(θ, H|D) (often intractable) by finding tractable posterior distributions q ∈ Q by minimizing: min q(θ,H)∈Q KL(q(θ, H)|p(θ, H|D)), In the mean field variational approach, Q is assumed to fully factorize: q(θ, H) = M k=1 q(θk) N i=1 J j=1 q(Hi,j ), Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 12
  • 13. Variational Inference: Variational Inference exploits: ln P(D) constant = L(q(θ, H)) Maximize + KL(q(θ, H)|p(θ, H|D)) Minimize , Iterative coordinate ascent of the variational distributions. Updates in the variational distribution of a variable only involves variables in its Markov blanket. Coordinate ascent algorithm formulated as a message passing scheme. Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 13
  • 14. Variational Message Passing, VMP: Message from parent to child: moment parameters (expectation of the sufficient statistics). Message from child to parent: natural parameters (based on the messages received from the co-parents). Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 14
  • 15. Outline 1 Motivation 2 Variational Message Passing 3 d-VMP 4 Experimental results 5 Conclusions Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 15
  • 16. Distributed optimization of the lower bound: Master θα q(t)(θ) X H θ1 i = 1, . . . , N Slave 1 q(t)(H1) X H θ2 i = 1, . . . , N Slave 2 q(t)(H2) X H θ3 i = 1, . . . , N Slave 3 q(t)(H3) q(t)(θ) is broadcasted to all the slave nodes. maxq∈Q L(q(θ, H)) Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 16
  • 17. Distributed optimization of the lower bound: Master θα q(t)(θ) X H θ1 i = 1, . . . , N Slave 1 q(t)(H1) X H θ2 i = 1, . . . , N Slave 2 q(t)(H2) X H θ3 i = 1, . . . , N Slave 3 q(t)(H3) q(t+1)(H) = arg maxq(H) L(q(H), q(t)(θ)) maxq∈Q L(q(θ, H)) Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 17
  • 18. Distributed optimization of the lower bound: Master θα q(t)(θ) X H θ1 i = 1, . . . , N Slave 1 q(t)(H1) X H θ2 i = 1, . . . , N Slave 2 q(t)(H2) X H θ3 i = 1, . . . , N Slave 3 q(t)(H3) q(t+1)(Hn) = arg maxq(Hn) Ln(q(Hn), q(t)(θ)) maxq∈Q L(q(θ, H)) Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 18
  • 19. Distributed optimization of the lower bound: Master θα q(t)(θ) X H θ1 i = 1, . . . , N Slave 1 q(t)(H1) X H θ2 i = 1, . . . , N Slave 2 q(t)(H2) X H θ3 i = 1, . . . , N Slave 3 q(t)(H3) q(t+1) (θ) = arg max q(θ) L(q(t) (H), q(θ)) May be coupled maxq∈Q L(q(θ, H)) Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 19
  • 20. Candidate solutions: Resort to a generalized mean-field approximation as SVI: does not factorize over the global parameters. Prohibitive for models with a large number of global (coupled) parameters, e.g. linear regression. Our proposal: VMP as a distributed projected natural gradient ascent algorithm (PGNA). Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 20
  • 21. d-VMP as a projected natural gradient ascent Insight 1: VMP can be expressed as a projected natural gradient ascent algorithm. η (t+1) X = η (t) X + ρX,t[ ˆ ηL(η(t) )]+ X (1) [·] is the projection operator. Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 21
  • 22. d-VMP as a projected natural gradient ascent Insight 2: The natural gradient of the lower bound can be expressed as follows: ˆ ηθ L = mPa(θ)→θ + mHi →θ The gradient can be computed in parallel. Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 22
  • 23. d-VMP as a projected natural gradient ascent Insight 3: Global parameters are “coupled” only if they belong to each other’s Markov blanket. Define a disjoint partition of the global parameters: R = {J1, . . . , JS } Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 23
  • 24. d-VMP as a projected natural gradient ascent d-VMP is based on performing independent global updates over the global parameters of each partition: η (t+1) Jr = η (t) Jr + ρr,t[ ˆ ηL(η(t) )]+ Jr ρr,t is the learning rate. If |Jr | = 1 then ρr,t = 1. Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 24
  • 25. dVMP as a distributed PNGA algorithm: Master θα η (t+1) Jr X H θ1 i = 1, . . . , N Slave 1 η (t+1) 1 X H θ2 i = 1, . . . , N Slave 2 η (t+1) 2 X H θ3 i = 1, . . . , N Slave 3 η (t+1) 3 η (t) Jr + ρr,t[ˆηL(η(t) )]+ Jr For all Jr ∈ R maxq∈Q L(q(θ, H)) Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 25
  • 26. Outline 1 Motivation 2 Variational Message Passing 3 d-VMP 4 Experimental results 5 Conclusions Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 26
  • 27. Model fit to the data Xij HijYi Hi θ α j = 1, . . . , J i = 1, . . . , N Attribute range Density 0 1 2 3 4 5 6 0123456 ● ● ● ● SVI 1% SVI 5% SVI 10% VMP 1% Representative sample of 55K clients (N) and 33 attributes (J). “Unrolled” model of more than 3.5M nodes (75% latent variables). Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 27
  • 28. Model fit to the data Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 28
  • 29. Model fit to the data 0 500 1000 1500 2000 −1.5e+07−1.0e+07−5.0e+060.0e+00 Time (seconds) Globallowerbound Alg. BS(data%)/LR SVI 1%/0.55 SVI 1%/0.75 SVI 1%/0.99 SVI 5%/0.55 SVI 5%/0.75 SVI 5%/0.99 SVI 10%/0.55 SVI 10%/0.75 SVI 10%/0.99 d−VMP Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 29
  • 30. Test marginal log-likelihood BS (% data) LR Log-Likel. SVI 1 % 0.55 -180902.87 0.75 -298564.03 0.99 -426979.52 5 % 0.55 -177302.24 0.75 -333264.16 0.99 -628105.70 10 % 0.55 -347035.22 0.75 -397525.45 0.99 -538087.13 d-VMP 1.0 67265.34 Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 30
  • 31. Mixtures of learnt posteriors for one attribute ● ● ● ● SVI 1% SVI 5% SVI 10% VMP 1% Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 31
  • 32. Scalability settings Generated data set of 42 million samples per client and 12 variables. “Unrolled” model of more than 1 billion (109) nodes (75% latent variables). AMIDST Toolbox with Apache Flink. Amazon Web Services (AWS) as distributed computing environment. Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 32
  • 33. Scalability results Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 33
  • 34. Outline 1 Motivation 2 Variational Message Passing 3 d-VMP 4 Experimental results 5 Conclusions Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 34
  • 35. Conclusions Variational methods can be scaled using distributed computation instead of sampling techniques. Bayesian learning in model with more than 1 billion nodes (75% of hidden). Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 35
  • 36. Thank you for your attention Questions? You can download our open source Java toolbox: amidsttoolbox.com Acknowledgments: This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no 619209 Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 36