Part 1: 2016-01-20
Part 2: 2016-02-10
Tomasz Kuśmierczyk
Session 5: Sampling & MCMC
Approximate and Scalable Inference for Complex
Probabilistic Models in Recommender Systems
Part 2: Inference Techniques
MCMC = Markov Chain Monte Carlo
MCMC ⊂ Sampling
Literature / Credits
● Szymon Jaroszewicz lectures on “Selected Advanced Topics in Machine Learning”
● Daphne Koller lectures on “Probabilistic Graphical Models” (https://class.coursera.org/pgm-003/lecture)
● Patrick Lam slides: http://www.people.fas.harvard.edu/~plam/teaching/methods/convergence/convergence_print.pdf
● Bishop’s book, ch. 11
● MacKay, David J.C. Information Theory, Inference and Learning Algorithms. Cambridge University Press, 2003. (http://www.inference.phy.cam.ac.uk/itprnn/book.pdf)
● R & JAGS online tutorials…
● …
Basics & motivation
Motivation: Monte Carlo for integration
http://mlg.eng.cam.ac.uk/zoubin/tut06/mcmc.pdf
Non-trivial posterior distribution (e.g., for BNs)
Sampling vs Variational Inference (previous seminar)
http://people.inf.ethz.ch/bkay/talks/Brodersen_2013_03_22.pdf
Sampling continued ...
● The accuracy of sampling-based estimates depends only on the variance of the quantity being estimated
● It does not depend directly on the dimensionality (having many variables is not a problem per se)
● In some cases we are able to break the curse of dimensionality
but
● Sampling gets much more difficult in higher dimensions
● Variance often increases as the dimension grows
● The accuracy of sampling-based methods grows only with the square root of the number of samples
Jaroszewicz
Sampling techniques - basic cases
● uniform -> pseudo-random number generator
● discrete distributions -> range matching with the help of a uniform draw (in time logarithmic in the number of outcomes)
● continuous -> inverse CDF (see the R sketch after this list)
● various ‘tricks’
● ...
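As a concrete illustration of the inverse-CDF case, a minimal R sketch (R matches the JAGS driver code later in the deck; the function name is ours): sampling from an Exponential(rate) distribution, whose CDF F(x) = 1 - exp(-rate*x) inverts to F^{-1}(u) = -log(1 - u)/rate.

# Inverse-CDF sampling: push a Uniform(0,1) draw through the inverse CDF.
inverse_cdf_exp <- function(n, rate = 1) {
  u <- runif(n)          # uniform pseudo-random numbers
  -log(1 - u) / rate     # inverse CDF of Exponential(rate)
}

x <- inverse_cdf_exp(10000, rate = 2)
mean(x)  # should be close to 1/rate = 0.5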
Sampling techniques (e.g., for BNs posterior)
● Ancestral Sampling (no evidence)
● Probabilistic Logic Sampling (like AS, but samples inconsistent with the evidence are discarded -> few samples generated)
● Likelihood weighting (estimates may be inaccurate + other problems)
● Importance Sampling
● (Adaptive) Rejection Sampling
● Sampling-Importance-Resampling
● Metropolis
● Metropolis-Hastings
● Gibbs Sampling
● Hamiltonian (hybrid) sampling
● Slice sampling
● and more...
Monte Carlo without Markov Chains
A few remarks
● there is no difference between sampling from normalized and non-normalized distributions
● non-normalized distributions are easy to evaluate for BNs
● in most cases (e.g., rejection sampling) we work with non-normalized distributions
● for simplicity, p(x) is used in the notation, but nothing changes for complicated posterior distributions
● the 1D case is presented, but the methods also work in the multi-dimensional case
Rejection sampling
Jaroszewicz, Bishop
[Figure: target density p(x) under the scaled envelope c q(x)]
Rejection sampling - proof
Jaroszewicz
Selection of c?
● c should be as small as possible to keep the rejection rate low
● but p(x) <= c q(x) must hold everywhere (a code sketch follows below)
● Adaptive Rejection Sampling for log-concave distributions
○ log-concave = the logarithm of the density is concave
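A minimal R sketch of plain rejection sampling (our own toy example, not from the slides): the target is an unnormalized standard normal and the envelope is a Cauchy, for which the smallest valid c is 2*pi*exp(-0.5) ≈ 3.81.

p_tilde <- function(x) exp(-x^2 / 2)   # unnormalized N(0, 1) density
c_const <- 2 * pi * exp(-0.5)          # smallest c with p_tilde(x) <= c * dcauchy(x)

rejection_sample <- function(n) {
  out <- numeric(0)
  while (length(out) < n) {
    x <- rcauchy(1)                                     # draw from the envelope q
    if (runif(1) < p_tilde(x) / (c_const * dcauchy(x))) # accept with prob p/(c q)
      out <- c(out, x)
  }
  out
}

x <- rejection_sample(5000)
c(mean(x), sd(x))  # should be close to (0, 1)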
Adaptive Rejection Sampling
Jaroszewicz
Rejection Sampling problems
● part of the samples is rejected (wasted effort)
● a tight “envelope” helps a bit
but
● in high dimensions (when there are many variables) the curse of dimensionality must be taken into account
● see Bishop’s example (for rejection sampling):
○ p(x) ~ N(0, s1)
○ q(x) ~ N(0, 1.01*s1)
○ D=1000
○ -> acceptance ratio 1/20000
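(Where the 1/20000 comes from, following Bishop’s analysis: the smallest valid constant is c = (1.01)^D = 1.01^1000 ≈ 2.1 · 10^4, so the acceptance rate is 1/c ≈ 4.8 · 10^-5, i.e., roughly 1/20000 - even for a nearly perfect envelope.)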
Markov Chains
What is a Markov Chain?
● A triple <S, P0, P>: a (possibly infinite) set S of possible states, an initial distribution P0 over states, and a transition matrix P (also written T)
● transition matrix - a matrix of probabilities Pij (Tij) that, being in state si at time t, we move to state sj at time t+1
● Markov property = the next state depends only on the current one
Jaroszewicz
Markov Chains - distribution over states
Jaroszewicz
Markov Chains - stationary distribution
Jaroszewicz
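These two slides are figures; the formulas behind them are standard: the distribution over states evolves as p_{t+1} = p_t P (so p_t = p_0 P^t), and a stationary distribution satisfies pi = pi P. A minimal R sketch on an assumed toy 3-state chain:

P <- matrix(c(0.5, 0.3, 0.2,
              0.2, 0.6, 0.2,
              0.1, 0.3, 0.6), nrow = 3, byrow = TRUE)  # rows sum to 1

p <- c(1, 0, 0)                  # initial distribution P0: start in state 1
for (t in 1:100) p <- p %*% P    # evolve the distribution over states
print(p)                         # has converged to the stationary distribution

# Cross-check: pi is the left eigenvector of P for eigenvalue 1, normalized.
e <- eigen(t(P))
pi_stat <- Re(e$vectors[, 1]) / sum(Re(e$vectors[, 1]))
print(pi_stat)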
Stationarity example
Daphne Koller
Stationarity from regularity
● If there exists a k such that, for every pair of states <si, sj>, the probability of getting from si to sj in exactly k steps is > 0 (the MC is regular), then the MC converges to a unique stationary distribution
● Sufficient conditions for regularity:
○ there is a path between every pair of states
○ for every state, there is a self-transition
Stationarity of irreducible, aperiodic MC
● Irreducible, aperiodic Markov chains always converge to a unique stationary
distribution
Reducibility
Jaroszewicz
Periodicity
Jaroszewicz
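(Standard definitions behind these two slides, since the figures are not reproduced here: a chain is irreducible if every state can be reached from every other state with positive probability; a state has period k if returns to it are possible only at multiples of k steps, and the chain is aperiodic if every state has period 1. Reducible or periodic chains need not converge to a unique stationary distribution.)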
Why talk about Markov Chains? -> MCMC
the idea:
● a Markov chain “jumps” over states
● states determine (BN) samples (later used for Monte Carlo)
○ for example: state ⇔ sample
but we need to ensure:
● the Markov chain converges to a stationary distribution (to be proven for each sampler)
● the distribution of the generated samples equals the required distribution (the BN posterior)
Properties
● Very general purpose
● Often easy to implement
● Good theoretical guarantees as t -> ∞
but:
● Lots of tunable parameters / design choices
● Can be quite slow to converge
● Difficult to tell whether it’s working
Metropolis-Hastings derivation
on the blackboard (a minimal code sketch of the resulting sampler follows below):
1. From detailed balance to stationarity
2. Proposal distribution and acceptance probability
3. From detailed balance to conditions on the acceptance probability
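Not the blackboard derivation itself, but a minimal R sketch of the resulting sampler in the random-walk special case (symmetric proposal, so the Hastings ratio q(x|x')/q(x'|x) cancels and the acceptance probability reduces to min(1, p(x')/p(x))); the target density here is an arbitrary assumption:

target <- function(x) exp(-x^2 / 2) * (1 + sin(3 * x)^2)  # any unnormalized density

metropolis <- function(n_iter, x0 = 0, step = 1) {
  x <- numeric(n_iter)
  x[1] <- x0
  for (t in 2:n_iter) {
    prop <- x[t - 1] + rnorm(1, sd = step)  # symmetric random-walk proposal
    # accept with probability min(1, target(prop) / target(current))
    x[t] <- if (runif(1) < target(prop) / target(x[t - 1])) prop else x[t - 1]
  }
  x
}

chain <- metropolis(10000)  # reused in the diagnostics sketches below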
Part 2
Dawn of Statistical Renaissance
Gibbs sampling
Gibbs sampling: Algorithm
Daphne Koller
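The algorithm slide is a figure; the idea is to resample each variable in turn from its full conditional given the current values of all the others. A minimal R sketch on a toy bivariate normal with correlation rho (our own example, where both full conditionals are known in closed form):

rho <- 0.8
n_iter <- 10000
x <- y <- numeric(n_iter)
for (t in 2:n_iter) {
  x[t] <- rnorm(1, mean = rho * y[t - 1], sd = sqrt(1 - rho^2))  # x | y
  y[t] <- rnorm(1, mean = rho * x[t],     sd = sqrt(1 - rho^2))  # y | x
}
cor(x, y)  # should be close to rho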
Does it work? - often
Under certain conditions, the stationary distribution of this Markov chain is the joint
distribution of the Bayesian network:
● A probability distribution P(X) is positive if P(X = x) > 0 for all x ∈ Dom(X).
● Theorem: If all conditional distributions in a Bayesian network are positive
(all probabilities are > 0) then a Gibbs sampler converges to the joint
distribution of the Bayesian network.
Gibbs properties
● Can handle evidence even with very low probability
● Works for all kinds of models, e.g. Markov networks, continuous variables
● Works very well in many practical cases
● overall is a very powerful and useful technique
● very popular nowadays
● has become another Swiss army knife for probabilistic inference
but
● Samples are not statistically independent (the statistics get difficult)
● Hard to give guarantees on results
Jaroszewicz
Gibbs problems - more exploratory chains needed
Jaroszewicz
Gibbs sampling: example
Bayesian PMF using MCMC
https://www.cs.toronto.edu/~amnih/papers/bpmf.pdf
Bayesian PMF using MCMC
Some useful formulas:
on the blackboard ...
Diagnostics
You never know with randomness...
Practical problems
● We only want to use samples drawn from a distribution close to p(x) - when the chain is already ‘mixing’
● At early iterations (before the chain has converged) we may be far from p(x) - we need ‘burn-in’ iterations
● Samples are correlated - we need thinning (take only every n-th sample); a snippet follows below
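A minimal sketch of both steps, assuming `chain` is a vector of consecutive draws (e.g., from the Metropolis sketch earlier):

burnin <- 1000
thin   <- 10
kept <- chain[-(1:burnin)]                     # drop burn-in iterations
kept <- kept[seq(1, length(kept), by = thin)]  # keep every thin-th sample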
Diagnostics
● Visual Inspection
● Geweke Diagnostic
○ tests whether the burn-in is sufficient
● Gelman and Rubin Diagnostic
○ may detect problems with disconnected sample spaces
● Raftery and Lewis Diagnostic
○ estimates the number of iterations and burn-in needed, based on a short pilot run
● Heidelberg and Welch Diagnostic
○ a test statistic for stationarity of the distribution (a coda sketch for all of these follows below)
http://www.people.fas.harvard.edu/~plam/teaching/methods/convergence/convergence_print.pdf
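All of these diagnostics are implemented in the R coda package; a sketch, again assuming the `chain` vector from the Metropolis example:

library(coda)
m <- mcmc(chain)    # wrap the raw draws as an mcmc object
traceplot(m)        # visual inspection
autocorr.plot(m)    # autocorrelation at increasing lags
geweke.diag(m)      # Z-score comparing early vs. late parts of the chain
raftery.diag(m)     # required iterations / burn-in from a pilot run
heidel.diag(m)      # stationarity test
# gelman.diag() needs several chains: gelman.diag(mcmc.list(m1, m2, ...))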
Visual inspection
http://www.people.fas.harvard.edu/~plam/teaching/methods/convergence/convergence_print.pdf
A multimodal distribution: it is hard to get from one mode to another. The chain is not mixing.
Autocorrelation (correlation between lagged samples)
http://www.people.fas.harvard.edu/~plam/teaching/methods/convergence/convergence_print.pdf
Geweke Diagnostic
● takes two nonoverlapping parts of the Markov chain
● compares the means of both parts, using a difference of means test
● to see if the two parts of the chain are from the same distribution (null
hypothesis).
● the test statistic is a standard Z-score with the standard errors adjusted for
autocorrelation.
Gelman and Rubin Diagnostic
1. Run m ≥ 2 chains of length 2n from overdispersed starting values.
2. Discard the first n draws in each chain.
3. Calculate the within-chain and between-chain variance.
http://www.people.fas.harvard.edu/~plam/teaching/methods/convergence/convergence_print.pdf
Gelman and Rubin Diagnostic 2
4. Calculate the estimated variance of the parameter as a weighted
sum of the within-chain and between-chain variance.
5. Calculate the potential scale reduction factor.
When R is high (say, greater than 1.1 or 1.2), we should run our chains longer to improve convergence to the stationary distribution.
http://www.people.fas.harvard.edu/~plam/teaching/methods/convergence/convergence_print.pdf
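(For reference, the standard formulas: with within-chain variance W and between-chain variance B computed from n retained draws per chain, the pooled estimate is Var_hat = (1 - 1/n) * W + (1/n) * B, and the potential scale reduction factor is R = sqrt(Var_hat / W).)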
Probabilistic programming
Probabilistic programming language
a programming language designed to:
● describe probabilistic models
● perform inference automatically even on complicated models
for example:
● PyMC
● BUGS / JAGS
● BayesPy
https://en.wikipedia.org/wiki/Probabilistic_programming_language
What’s inside?
● BUGS - Adaptive Rejection (AR) sampling
● JAGS - slice sampler (one variable at a time)
JAGS PMF-like example: model file
model{ #########START###########
  sv ~ dunif(0,100)
  su ~ dunif(0,100)
  s ~ dunif(0,100)
  tau <- 1/(s*s)
  tauv <- 1/(sv*sv)
  tauu <- 1/(su*su)
  ...
  ...
  for (j in 1:M) {
    for (d in 1:D) {
      v[j,d] ~ dnorm(0, tauv)
    }
  }
  for (i in 1:N) {
    for (d in 1:D) {
      u[i,d] ~ dnorm(0, tauu)
    }
  }
  for (j in 1:M) {
    for (i in 1:N) {
      mu[i,j] <- inprod(u[i,], v[j,])
      r3[i,j] <- 1/(1+exp(-mu[i,j]))
      r[i,j] ~ dnorm(r3[i,j], tau)
    }
  }
} #############END############
JAGS PMF-like example: Parameters preparation
n.chains = 1
n.iter = 5000
n.burnin = n.iter
n.thin = 1 #max(1, floor((n.iter - n.burnin)/1000))
D = 10
lu = 0.05
lv = 0.05
n.cluster=n.chains
model.file = "models/pmf_hypnorm3.bug"
N = dim(train)[1]
M = dim(train)[2]
start.s = sd(train[!is.na(train)])
start.su = sqrt(start.s^2/lu)
start.sv = sqrt(start.s^2/lv)
jags.data = list(N=N, M=M, D=D, r=train)
jags.params = c("u", "v", "s", "su", "sv")
jags.inits = list(s=start.s, su=start.su, sv=start.sv,
u=matrix( rnorm(N*D,mean=0,sd=start.su), N, D),
v=matrix( rnorm(M*D,mean=0,sd=start.sv), M, D))
JAGS PMF-like example: running (sampling)
library(rjags)
model = jags.model(model.file, jags.data, n.chains=n.chains, n.adapt=n.burnin)
#update(model)
samples = jags.samples(model, jags.params, n.iter=n.iter, thin=n.thin)
JAGS PMF-like example: retrieving samples
per.chain = dim(samples$u)[3]
iterations = per.chain * dim(samples$u)[4]
user_sample = function(i, k) {samples$u[i, , (k-1)%%per.chain+1, ceiling(k/per.chain)]}
item_sample = function(j, k) {samples$v[j, , (k-1)%%per.chain+1, ceiling(k/per.chain)]}
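A usage sketch (our own addition, not in the original code): a Monte Carlo posterior-mean prediction for rating r[i,j], averaging the model’s mean response over the retrieved samples, with the same sigmoid link as in the model file:

predict_rating <- function(i, j, K = iterations) {
  preds <- sapply(1:K, function(k) {
    mu <- sum(user_sample(i, k) * item_sample(j, k))  # inner product u_i . v_j
    1 / (1 + exp(-mu))                                # sigmoid link from the model
  })
  mean(preds)                                         # posterior predictive mean
}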
Why it’s good, why it’s bad?
● fast prototyping
● less control
Results on movielens 100k
RMSE = 0.943 (comparable to SGD)
More on https://github.com/tkusmierczyk/pmf-jags
Thank you!
