Probabilistic modeling in
Deep Learning
Dzianis Dus
Lead Data Scientist at InData Labs
How will we spend the next 60 minutes?
In thinking about the following topics:
1. What does “probabilistic modeling” mean?
2. Why is it cool (sometimes)?
3. How can we use it to build:
a. More robust and powerful models
b. Models with predefined properties
c. Models without overfitting (o_O)
d. Infinite ensembles of models (o_O)
4. Deep Learning
Problem statement: Empirical way
Suppose that we want to solve a classical regression problem:
Typical approach (sketched in code below):
1. Choose a functional family for F(...)
2. Choose an appropriate loss function
3. Choose an optimization algorithm
4. Minimize the loss on (X, Y)
5. ...
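A minimal sketch of this empirical recipe (not from the slides): a linear functional family, an MSE loss and plain gradient descent; all names and constants are illustrative assumptions.

    import numpy as np

    # Toy data: y = X @ w_true + noise
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

    w = np.zeros(3)                      # 1. functional family: F(x) = x @ w
    lr, n_steps = 0.1, 500               # 3. optimization algorithm: gradient descent
    for _ in range(n_steps):
        residual = X @ w - y             # 2. loss function: MSE = mean(residual ** 2)
        grad = 2.0 * X.T @ residual / len(y)
        w -= lr * grad                   # 4. minimize the loss on (X, Y)
    print(w)                             # close to [1.0, -2.0, 0.5]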
Problem statement: Probabilistic way
Define a “probability model” (it describes how your data was generated):
Having the model, you can calculate the “likelihood” of your data
(we are working with i.i.d. data sharing the same variance).
Problem statement: Probabilistic way
Data log-likelihood:
Maximum likelihood estimation = MSE loss minimization
(for i.i.d. data sharing the same variance!)
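The formulas themselves are shown as images on the slides; in the standard form of this setup (i.i.d. Gaussian noise with a shared variance) they read:

    y_i = F(x_i, \theta) + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2) \quad \text{(i.i.d.)}

    p(Y \mid X, \theta) = \prod_{i=1}^{N} \mathcal{N}\!\left(y_i \mid F(x_i, \theta), \sigma^2\right)

    \log p(Y \mid X, \theta) = -\frac{N}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}\bigl(y_i - F(x_i, \theta)\bigr)^2

    \hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta} \log p(Y \mid X, \theta) = \arg\min_{\theta} \sum_{i=1}^{N}\bigl(y_i - F(x_i, \theta)\bigr)^2

The last equality is exactly the claim on the slide: maximizing the Gaussian log-likelihood is the same as minimizing the MSE loss.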
Problem statement: Probabilistic way
Log-likelihood maximization = empirical loss minimization
1. MAE minimization = likelihood maximization of i.i.d. Laplace-distributed variables
2. For each empirically stated problem there exists an appropriate probability model
3. The empirical loss is often just a particular case of a wider probability model
4. A wider model = wider opportunities!
Probabilistic modeling: Wider opportunities for Flo
Suppose that we have:
1. N unique users in the training set
2. For each user we’ve collected a time series of user states (on a daily basis):
3. For each user we’ve collected a time series of cycle lengths:
4. We predict the time series of lengths Y based on the time series of states X
Probabilistic modeling: Wider opportunities for Flo
We want to maximize the data likelihood, where:
- the probability that user i will have a cycle of length y at day j (just another notation for the likelihood term)
- the cycle length of user i at day j has a Gaussian distribution
- the parameters of the distribution at day j depend on the model parameters and on all features up to day j
- this can be easily modeled with a deep RNN (a sketch follows below)!
Note that: we don’t need any labels to predict the variance!
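A minimal sketch of such a model (my own illustration, not the Flo production code): a GRU that emits a mean and a log-variance for every day and is trained by maximizing the Gaussian log-likelihood; layer sizes and names are assumptions.

    import torch
    import torch.nn as nn

    class GaussianRNN(nn.Module):
        def __init__(self, n_features, hidden=64):
            super().__init__()
            self.rnn = nn.GRU(n_features, hidden, batch_first=True)
            self.mu = nn.Linear(hidden, 1)        # predicted mean of the cycle length
            self.log_var = nn.Linear(hidden, 1)   # predicted log-variance (no labels needed!)

        def forward(self, x):                     # x: (batch, days, n_features)
            h, _ = self.rnn(x)
            return self.mu(h), self.log_var(h)

    def gaussian_nll(y, mu, log_var):
        # Negative log-likelihood of y under N(mu, exp(log_var)), up to a constant.
        return 0.5 * (log_var + (y - mu) ** 2 / log_var.exp()).mean()

    model = GaussianRNN(n_features=8)
    x = torch.randn(4, 30, 8)                     # 4 users, 30 days of state features
    y = torch.randn(4, 30, 1)                     # observed cycle lengths
    mu, log_var = model(x)
    loss = gaussian_nll(y, mu, log_var)           # maximize likelihood = minimize NLL
    loss.backward()

The variance head is trained purely by the likelihood term, which is the point of the slide: no extra labels are needed to learn the uncertainty.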
Probabilistic modeling: Wider opportunities for Flo
Real life example:
Parameter estimation theory
Estimation theory is a branch of statistics that deals with estimating the values of
parameters based on measured empirical data that has a random component.
© Wikipedia
Commonly used estimators:
● Maximum likelihood estimator (MLE) - the Ugly (we are here)
● Maximum a posteriori estimator (MAP) - the Bad
● Bayesian estimator - the Good (the way we go)
Maximum a posteriori estimator
Until now, we’ve been talking about the Maximum Likelihood Estimator:
Now assume that a prior distribution over the parameters exists:
Then we can apply Bayes’ rule (written out below), where:
- the left-hand side is the posterior distribution over the model parameters
- the data likelihood for specific parameters (could be modeled with a Deep Network!)
- the prior distribution over the parameters (describes our prior knowledge and/or our desires for the model)
- the denominator is the Bayesian evidence: a powerful method for model selection!
- as a rule, this integral is intractable :( (you can never integrate this)
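In its standard form (the slides show it as an image), Bayes’ rule for the parameters reads:

    p(\theta \mid D) \;=\; \frac{p(D \mid \theta)\, p(\theta)}{p(D)},
    \qquad
    p(D) \;=\; \int p(D \mid \theta)\, p(\theta)\, d\theta

The numerator contains the likelihood and the prior; the denominator is the evidence, the integral that is usually intractable.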
Maximum a posteriori estimator
The core idea of the Maximum a Posteriori Estimator (written out below):
- the evidence doesn’t depend on the model parameters, so it can be dropped
- the prior is the only (but powerful!) difference from MLE
1. MAP estimates the model parameters as the mode of the posterior distribution
2. MAP estimation with a non-informative prior = MLE
3. MAP restricts the search space of possible models
4. With MAP you can put restrictions not only on the model weights but also on many
interactions inside the network
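A standard way to write this down (the exact notation on the slides may differ):

    \hat{\theta}_{\mathrm{MAP}}
    = \arg\max_{\theta} p(\theta \mid D)
    = \arg\max_{\theta} \frac{p(D \mid \theta)\, p(\theta)}{p(D)}
    = \arg\max_{\theta} \bigl[\log p(D \mid \theta) + \log p(\theta)\bigr]

The evidence p(D) does not depend on theta, so only the extra log-prior term distinguishes MAP from MLE.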
Probabilistic modeling: Regularization
Regularization is a process of introducing additional information in order to
solve an ill-posed problem or prevent overfitting. © Wikipedia
Regularization is a process of introducing additional information in order to
restrict the model to have predefined properties.
It is closely connected to “prior distributions” on weights / activations / …
… and to MAP estimation!
Probabilistic modeling: Regularization
Weight decay (or L2 regularization):
The corresponding probability model:
Model log-likelihood, which combines:
- the data log-likelihood (we’ve already calculated this)
- a term that doesn’t depend on the model parameters
- the squared L2 norm of the parameters
- the regularization constant
So, it is clear that (see the derivation sketched below):
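A sketch of the derivation, assuming a zero-mean Gaussian prior on the weights (the probability model that corresponds to weight decay):

    p(\theta) = \mathcal{N}(\theta \mid 0, \tau^2 I)
    \;\;\Rightarrow\;\;
    \log p(\theta \mid D) = \log p(D \mid \theta) - \frac{1}{2\tau^2}\lVert\theta\rVert_2^2 + \mathrm{const}

    \hat{\theta}_{\mathrm{MAP}}
    = \arg\min_{\theta} \Bigl[\sum_{i=1}^{N}\bigl(y_i - F(x_i,\theta)\bigr)^2 + \lambda \lVert\theta\rVert_2^2\Bigr],
    \qquad \lambda = \frac{\sigma^2}{\tau^2}

So a Gaussian prior over the weights turns MAP estimation into MSE minimization with an L2 penalty, i.e. weight decay.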
Probabilistic modeling: Regularization
1. A Laplace distribution as a prior = L1 regularization
2. It can be shown that Dropout is also a form of a particular probability model …
3. … a Bayesian one :) …
4. … and therefore can be used not only as a regularization technique!
5. Do you want to pack your network weights into a few kilobytes?
6. Ok, all you need is MAP!
MAP is all you need!
Weights packing: Empirical way
Song Han and others - Deep Compression: Compressing Deep Neural Networks with Pruning,
Trained Quantization and Huffman Coding (2015)
Modern neural networks could be dramatically compressed:
Weights packing: Soft-Weight Sharing
1. Define the prior distribution of the weights as a Gaussian Mixture Model (a mixture of Gaussians)
2. For one of the Gaussian components force:
3. Maybe define a Gamma prior for the variances (for numerical stability)
4. Just find the MAP estimate for both the model parameters and the free mixture parameters
(a code sketch of the mixture prior follows below)!
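A minimal sketch of such a mixture prior as an extra loss term (my own illustration under simplifying assumptions: fixed mixture weights and no Gamma hyper-prior; the paper also learns the mixture parameters):

    import math
    import torch

    def gmm_neg_log_prior(weights, means, log_stds, pis):
        """Negative log-probability of all network weights under a 1-D Gaussian mixture."""
        w = weights.reshape(-1, 1)                                  # (n_weights, 1)
        stds = log_stds.exp()
        log_comp = (
            torch.log(pis)
            - 0.5 * torch.log(torch.tensor(2.0 * math.pi)) - torch.log(stds)
            - 0.5 * (w - means) ** 2 / stds ** 2
        )                                                           # (n_weights, n_components)
        return -torch.logsumexp(log_comp, dim=1).sum()

    # Usage inside a training step (illustrative names; means/log_stds/pis can also be
    # nn.Parameters so that the mixture itself is learned with the network):
    # flat = torch.cat([p.reshape(-1) for p in model.parameters()])
    # means = torch.tensor([0.0, -0.2, 0.2])   # one component kept at zero
    # loss = data_nll + tau * gmm_neg_log_prior(flat, means, log_stds, pis)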
Weights packing: Soft-Weight Sharing
Karen Ullrich - Soft Weight-Sharing For Neural Network Compression (2017)
Maximum a posteriori estimation
1. A pretty cool and powerful technique
2. You can build hierarchical models (put priors on priors of priors of …)
3. You can put priors on the activations of layers (sparse autoencoders)
4. Leads to “Empirical Bayes”
5. Thinking about how to restrict your model? Try to find an appropriate prior!
True Bayesian Modeling: Recap
1. The posterior can easily be found in the case of conjugate distributions
2. But for most real-life models the denominator is intractable
3. In MAP the denominator is totally ignored
4. Can we find a good approximation of the posterior?
True Bayesian Modeling: Approximation
Two main ideas:
1. MCMC (Markov Chain Monte Carlo) - a tricky one
2. Variational Inference - a “Black Magic” one
Other ideas exist:
1. Monte Carlo Dropout
2. Stochastic gradient Langevin dynamics
3. ...
True Bayesian Modeling: MCMC
1. The key idea is to construct a Markov Chain which has the posterior distribution as
its equilibrium distribution
2. Then you burn in the Markov Chain (convergence to equilibrium) and then
sample from the posterior distribution
3. Sounds tricky, but it is a well-defined procedure
4. PyMC3 = Bayesian Modeling and Probabilistic Machine Learning in Python (a small example follows below)
5. Unfortunately, it is not scalable
6. So, you can’t directly apply it to complex models (like Neural Networks)
7. But implicit scaling is possible: Bayesian Learning via Stochastic Gradient
Langevin Dynamics (2011)
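A minimal PyMC3 sketch (not from the talk), assuming a toy Bayesian linear regression; variable names and priors are illustrative:

    import numpy as np
    import pymc3 as pm

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    y = X @ np.array([1.5, -2.0, 0.5]) + 0.3 * rng.normal(size=100)

    with pm.Model():
        w = pm.Normal("w", mu=0.0, sd=1.0, shape=3)        # prior over the weights
        sigma = pm.HalfNormal("sigma", sd=1.0)             # prior over the noise scale
        pm.Normal("y_obs", mu=pm.math.dot(X, w), sd=sigma, observed=y)  # likelihood
        trace = pm.sample(1000, tune=1000)                 # tune = burn-in, then posterior draws

    print(trace["w"].mean(axis=0))                         # posterior mean of the weights

For a model this small, NUTS sampling is cheap; the scalability problem shows up when the model has millions of parameters, as in deep networks.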
True Bayesian Modeling: Variational Inference
True posterior:
- the likelihood is modeled with a Deep Neural Network
- the denominator is an intractable integral :(
Let’s find a good approximation:
- explicitly define a distribution family for the approximation (e.g. a multivariate Gaussian)
- with variational parameters (e.g. a mean vector and a covariance matrix)
Speaking mathematically (the objective is written out below):
- minimize the Kullback-Leibler divergence (a measure of distribution dissimilarity)
between the approximation and the true posterior
- but the true posterior is unknown :(
Achtung!
A lot of math is coming!
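In its standard form the objective reads (the slide shows it as an image):

    q^{*} = \arg\min_{\phi} \, \mathrm{KL}\bigl(q_{\phi}(\theta)\,\|\,p(\theta \mid D)\bigr),
    \qquad
    \mathrm{KL}(q\,\|\,p) = \int q_{\phi}(\theta)\, \log\frac{q_{\phi}(\theta)}{p(\theta \mid D)}\, d\theta

The math that follows rewrites this KL so that the unknown true posterior disappears from the optimization problem.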
True Bayesian Modeling: Variational Inference
The derivation, step by step:
1. Rewrite the true posterior inside the KL divergence using Bayes’ rule
2. The log-evidence doesn’t depend on theta (after integrating over the parameters),
so it is a constant and has no effect on the minimization problem
3. Group the remaining terms together and multiply by (-1)
4. One group is again a KL divergence, the other is an expectation over q(...)
5. The result: two equivalent problems!
True Bayesian Modeling: Variational Inference
Equivalent problems! In the resulting objective (written out below):
- the likelihood of your data (your Neural Network works here!)
- the prior on the network weights (you define this!)
- the approximate posterior (you define the form of this!)
We want to optimize this with respect to the approximate posterior parameters,
so we need to calculate the gradient of this objective.
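In its standard form the equivalent problem is the maximization of the evidence lower bound (ELBO):

    \mathrm{KL}\bigl(q_{\phi}(\theta)\,\|\,p(\theta \mid D)\bigr) \;\to\; \min_{\phi}
    \quad\Longleftrightarrow\quad
    \mathcal{L}(\phi) = \mathbb{E}_{q_{\phi}(\theta)}\bigl[\log p(D \mid \theta)\bigr]
    - \mathrm{KL}\bigl(q_{\phi}(\theta)\,\|\,p(\theta)\bigr) \;\to\; \max_{\phi}

The first term is the expected data likelihood (the network), the second keeps the approximate posterior close to the prior.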
True Bayesian Modeling: Variational Inference
Gradient calculation:
1. Rewrite the objective as an expectation over q(...) (for convenience)
2. Ooops... the term under the expectation is modeled with a Deep Network,
so this integral is intractable too :( (God damn!)
3. If it was just an expectation over q(...) then we could calculate an approximation
using the Monte Carlo method!
4. Multiply and divide by q(...) (this is just = 1!) and notice the gradient of log(q(...))
5. Luke, log derivative trick!
6. Now the gradient can be approximated with Monte Carlo (see the identity below)!
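The trick in its standard form (a.k.a. the score-function or REINFORCE estimator):

    \nabla_{\phi}\, \mathbb{E}_{q_{\phi}(\theta)}\bigl[f(\theta)\bigr]
    = \int f(\theta)\, \nabla_{\phi} q_{\phi}(\theta)\, d\theta
    = \mathbb{E}_{q_{\phi}(\theta)}\bigl[f(\theta)\, \nabla_{\phi} \log q_{\phi}(\theta)\bigr]
    \approx \frac{1}{S}\sum_{s=1}^{S} f(\theta_s)\, \nabla_{\phi} \log q_{\phi}(\theta_s),
    \qquad \theta_s \sim q_{\phi}(\theta)

since the gradient of q equals q times the gradient of log q; the final expression is again an expectation over q(...), so it can be approximated by sampling.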
Bayesian Networks: Step by step
1. Define a functional family for the approximate posterior (e.g. Gaussian):
2. Solve the optimization problem (with doubly stochastic gradient ascent):
3. Having the approximate posterior,
you can sample network weights (as much as you want)!
A minimal code sketch follows below.
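A minimal sketch of one such layer (my own illustration, not the speaker's code). It assumes a fully factorized Gaussian approximate posterior and a standard normal prior, and it uses the reparameterization trick ("Bayes by Backprop") instead of the log-derivative estimator derived above:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BayesianLinear(nn.Module):
        """Linear layer with a factorized Gaussian posterior q(w) = N(mu, sigma^2)."""
        def __init__(self, n_in, n_out):
            super().__init__()
            self.w_mu = nn.Parameter(torch.zeros(n_out, n_in))
            self.w_rho = nn.Parameter(torch.full((n_out, n_in), -3.0))
            self.b_mu = nn.Parameter(torch.zeros(n_out))
            self.b_rho = nn.Parameter(torch.full((n_out,), -3.0))

        @staticmethod
        def _kl(mu, sigma):
            # Analytic KL( N(mu, sigma^2) || N(0, 1) ), summed over all weights.
            return (0.5 * (sigma ** 2 + mu ** 2 - 1.0) - torch.log(sigma)).sum()

        def forward(self, x):
            w_sigma, b_sigma = F.softplus(self.w_rho), F.softplus(self.b_rho)
            # Reparameterization trick: sample weights from the approximate posterior.
            w = self.w_mu + w_sigma * torch.randn_like(w_sigma)
            b = self.b_mu + b_sigma * torch.randn_like(b_sigma)
            self.kl = self._kl(self.w_mu, w_sigma) + self._kl(self.b_mu, b_sigma)
            return F.linear(x, w, b)

    # One training step: negative ELBO = data NLL + KL (KL is usually rescaled per mini-batch).
    layer = BayesianLinear(10, 1)
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    loss = F.mse_loss(layer(x), y, reduction="sum") + layer.kl / 100.0
    loss.backward()

Every forward pass samples a fresh set of weights from the approximate posterior, which is what gives the "infinite ensemble" mentioned on the next slide.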
Bayesian Networks: Pros and Cons
As a result you have:
1. An infinite ensemble of Neural Networks!
2. No overfitting problem (in the classical sense)!
3. No adversarial examples problem!
4. A measure of prediction confidence!
5. ...
No free hunch:
1. A lot of work is still hidden in “scalability” and “convergence”!
2. Very (very!) expensive predictions!
Bayesian Networks Examples: BRNN
Meire Fortunato and others - Bayesian Recurrent Neural Networks (2017)
Bayesian Networks Examples: SegNet
Alex Kendall and others - Bayesian SegNet: Model Uncertainty in
Deep Convolutional Encoder-Decoder Architectures for Scene Understanding (2016)
Bayesian Networks in (near) Production: UBER
Lingxue Zhu - Deep and Confident Prediction for Time Series at Uber (2017)
How it works:
1. LSTM network
2. Monte Carlo Dropout (sketched below)
3. Daily completed trips prediction
4. Anomaly detection for various metrics
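A minimal sketch of the Monte Carlo Dropout idea (an illustration, not Uber's code): keep the dropout layers active at prediction time and turn the spread of several stochastic forward passes into an uncertainty estimate.

    import torch

    def mc_dropout_predict(model, x, n_samples=100):
        """Run several stochastic forward passes with dropout kept active."""
        model.train()   # keeps nn.Dropout active (assumes the model has no BatchNorm layers)
        with torch.no_grad():
            preds = torch.stack([model(x) for _ in range(n_samples)])
        return preds.mean(dim=0), preds.std(dim=0)   # predictive mean and uncertainty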
Bayesian Networks in (near) Production: Flo
Predicted distributions of cycle length for 40 independent users:
Switched to Empirical Bayes for now.
Speech Summary
1. Probabilistic modeling is a powerful tool with a strong math background
2. Many techniques are currently not widely used in Deep Learning
3. You can improve many aspects of your model using the same framework
4. Scalability, stability of convergence and inference cost are the main constraints
5. The future of Deep Learning looks Bayesian...
… (for the moment, for me)
Thank you for your (attention)!
I hope you have a lot of questions :)
Dzianis Dus
Lead Data Scientist at InData Labs
  • 17. Problem statement: Probabilistic way Define “probability model” (describes how your data was generated): Having model you can calculate “likelihood” of your data: We are working with i.i.d. data
  • 18. Problem statement: Probabilistic way Define “probability model” (describes how your data was generated): Having model you can calculate “likelihood” of your data: Sharing the same variance
  • 19. Problem statement: Probabilistic way Data log-likelihood: Maximum likelihood estimation:
  • 20. Problem statement: Probabilistic way Data log-likelihood: Maximum likelihood estimation: MSE Loss minimization
  • 21. Problem statement: Probabilistic way Data log-likelihood: Maximum likelihood estimation: MSE Loss minimization For i.i.d. data sharing the same variance!
  • 23. Problem statement: Probabilistic way Log-Likelihood maximization = Empirical loss minimization
  • 24. Problem statement: Probabilistic way 1. MAE minimization = likelihood maximization of i.i.d. Laplace-distributed variables Log-Likelihood maximization = Empirical loss minimization
  • 25. Problem statement: Probabilistic way 1. MAE minimization = likelihood maximization of i.i.d. Laplace-distributed variables 2. For each empirically stated problem there exists an appropriate probability model Log-Likelihood maximization = Empirical loss minimization
  • 26. Problem statement: Probabilistic way 1. MAE minimization = likelihood maximization of i.i.d. Laplace-distributed variables 2. For each empirically stated problem there exists an appropriate probability model 3. Empirical loss is often just a particular case of a wider probability model Log-Likelihood maximization = Empirical loss minimization
  • 27. Problem statement: Probabilistic way 1. MAE minimization = likelihood maximization of i.i.d. Laplace-distributed variables 2. For each empirically stated problem there exists an appropriate probability model 3. Empirical loss is often just a particular case of a wider probability model 4. Wider model = wider opportunities! Log-Likelihood maximization = Empirical loss minimization
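The equivalence claims above were backed by image-based equations on the slides; a minimal NumPy check (synthetic data, illustrative only, not from the talk) confirms that a fixed-variance Gaussian likelihood reproduces MSE and a fixed-scale Laplace likelihood reproduces MAE, up to additive constants:

```python
# Numeric sanity check: Gaussian NLL with a shared fixed variance is MSE (up to an
# affine transform), Laplace NLL with a fixed scale is MAE. Data is synthetic.
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.normal(size=1000)
y_pred = y_true + rng.normal(scale=0.3, size=1000)

sigma = 1.0  # shared, fixed variance: does not affect the argmin over predictions
gauss_nll = 0.5 * np.log(2 * np.pi * sigma**2) + (y_true - y_pred) ** 2 / (2 * sigma**2)
mse = (y_true - y_pred) ** 2

b = 1.0  # shared, fixed scale of the Laplace distribution
laplace_nll = np.log(2 * b) + np.abs(y_true - y_pred) / b
mae = np.abs(y_true - y_pred)

# Same losses up to a constant shift and positive scale, hence the same minimizers.
print(np.allclose(gauss_nll, 0.5 * mse + 0.5 * np.log(2 * np.pi)))  # True
print(np.allclose(laplace_nll, mae + np.log(2)))                    # True
```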
  • 28. Probabilistic modeling: Wider opportunities for Flo Suppose that we have: 1. N unique users in the training set 2. For each user we’ve collected time series of user states (on daily basis): 3. For each user we’ve collected time series of cycles lengths: 4. We predict time series of lengths Y based on time series of states X
  • 29. Probabilistic modeling: Wider opportunities for Flo We want to maximize data likelihood:
  • 30. Probabilistic modeling: Wider opportunities for Flo We want to maximize data likelihood: Probability that user i will have cycle with length y at day j
  • 31. Probabilistic modeling: Wider opportunities for Flo We want to maximize data likelihood: Just another notationProbability that user i will have cycle with length y at day j
  • 32. Probabilistic modeling: Wider opportunities for Flo We want to maximize data likelihood: Cycle length of user i at day j has Gaussian distribution
  • 33. Probabilistic modeling: Wider opportunities for Flo We want to maximize data likelihood: Parameters of distribution at day j depends on model parameters and all features up to day j
  • 34. Probabilistic modeling: Wider opportunities for Flo We want to maximize data likelihood: Can be easily modeled with deep RNN!
  • 35. Probabilistic modeling: Wider opportunities for Flo We want to maximize data likelihood: Can be easily modeled with deep RNN! Note that:
  • 36. Probabilistic modeling: Wider opportunities for Flo We want to maximize data likelihood: Can be easily modeled with deep RNN! Note that: We don’t need any labels to predict variance!
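A minimal PyTorch sketch of the idea, written with illustrative assumptions (GRU size, feature counts, toy data) rather than the Flo production model: the network emits a per-day mean and log-variance and is trained by maximizing the Gaussian log-likelihood, so the variance head indeed needs no extra labels:

```python
# Sketch of an RNN that outputs a Gaussian mean and log-variance per time step and is
# trained by maximum likelihood. The variance is learned from the same likelihood.
import torch
import torch.nn as nn

class GaussianRNN(nn.Module):
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)   # per-step mean and log-variance

    def forward(self, x):                  # x: (batch, time, n_features)
        h, _ = self.rnn(x)
        out = self.head(h)
        mu, log_var = out[..., 0], out[..., 1]
        return mu, log_var

def gaussian_nll(mu, log_var, y):
    # Negative log-likelihood of y under N(mu, exp(log_var)), averaged over steps.
    return 0.5 * (log_var + (y - mu) ** 2 / log_var.exp()).mean()

# Toy usage with made-up shapes: 32 users, 30 days, 8 state features.
model = GaussianRNN(n_features=8)
x = torch.randn(32, 30, 8)
y = torch.randn(32, 30)    # observed cycle lengths (toy values)
mu, log_var = model(x)
loss = gaussian_nll(mu, log_var, y)
loss.backward()
```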
  • 37. Probabilistic modeling: Wider opportunities for Flo Real life example:
  • 38. Parameter estimation theory Estimation theory is a branch of statistics that deals with estimating the values of parameters based on measured empirical data that has a random component. © Wikipedia
  • 39. Parameter estimation theory Estimation theory is a branch of statistics that deals with estimating the values of parameters based on measured empirical data that has a random component. © Wikipedia Commonly used estimators: ● Maximum likelihood estimator (MLE) - the Ugly ● Maximum a posteriori estimator (MAP) - the Bad ● Bayesian estimator - the Good
  • 40. Parameter estimation theory Estimation theory is a branch of statistics that deals with estimating the values of parameters based on measured empirical data that has a random component. © Wikipedia Commonly used estimators: ● Maximum likelihood estimator (MLE) - the Ugly ● Maximum a posteriori estimator (MAP) - the Bad ● Bayesian estimator - the Good We are here
  • 41. Parameter estimation theory Estimation theory is a branch of statistics that deals with estimating the values of parameters based on measured empirical data that has a random component. © Wikipedia Commonly used estimators: ● Maximum likelihood estimator (MLE) - the Ugly ● Maximum a posteriori estimator (MAP) - the Bad ● Bayesian estimator - the Good The way we go
  • 42. Maximum a posteriori estimator Until now, we’ve been talking about Maximum Likelihood Estimator:
  • 43. Maximum a posteriori estimator Until now, we’ve been talking about Maximum Likelihood Estimator: Now assume that prior distribution over parameters exists:
  • 44. Maximum a posteriori estimator Until now, we’ve been talking about Maximum Likelihood Estimator: Now assume that prior distribution over parameters exists: Then we can apply Bayes Rule:
  • 45. Maximum a posteriori estimator Until now, we’ve been talking about Maximum Likelihood Estimator: Now assume that prior distribution over parameters exists: Then we can apply Bayes Rule: Posterior distribution over model parameters
  • 46. Maximum a posteriori estimator Until now, we’ve been talking about Maximum Likelihood Estimator: Now assume that prior distribution over parameters exists: Then we can apply Bayes Rule: Data likelihood for specific parameters (could be modeled with Deep Network!)
  • 47. Maximum a posteriori estimator Until now, we’ve been talking about Maximum Likelihood Estimator: Now assume that prior distribution over parameters exists: Then we can apply Bayes Rule: Prior distribution over parameters (describes our prior knowledge or / and our desires for the model)
  • 48. Maximum a posteriori estimator Until now, we’ve been talking about Maximum Likelihood Estimator: Now assume that prior distribution over parameters exists: Then we can apply Bayes Rule: Bayesian evidence
  • 49. Maximum a posteriori estimator Until now, we’ve been talking about Maximum Likelihood Estimator: Now assume that prior distribution over parameters exists: Then we can apply Bayes Rule: Bayesian evidence A powerful method for model selection!
  • 50. Maximum a posteriori estimator Until now, we’ve been talking about Maximum Likelihood Estimator: Now assume that prior distribution over parameters exists: Then we can apply Bayes Rule: As a rule this integral is intractable :( (You can never integrate this)
  • 51. Maximum a posteriori estimator The core idea of Maximum a Posteriori Estimator:
  • 52. Maximum a posteriori estimator The core idea of Maximum a Posteriori Estimator: Doesn’t depend on model parameters
  • 53. Maximum a posteriori estimator The core idea of Maximum a Posteriori Estimator:
  • 54. Maximum a posteriori estimator The core idea of Maximum a Posteriori Estimator: The only (but powerful!) difference from MLE
  • 55. Maximum a posteriori estimator The core idea of Maximum a Posteriori Estimator: 1. MAP estimates model parameters as mode of posterior distribution
  • 56. Maximum a posteriori estimator The core idea of Maximum a Posteriori Estimator: 1. MAP estimates model parameters as mode of posterior distribution 2. MAP estimation with non-informative prior = MLE
  • 57. Maximum a posteriori estimator The core idea of Maximum a Posteriori Estimator: 1. MAP estimates model parameters as mode of posterior distribution 2. MAP estimation with non-informative prior = MLE 3. MAP restricts the search space of possible models
  • 58. Maximum a posteriori estimator The core idea of Maximum a Posteriori Estimator: 1. MAP estimates model parameters as mode of posterior distribution 2. MAP estimation with non-informative prior = MLE 3. MAP restricts the search space of possible models 4. With MAP you can put restrictions not only on model weights but also on many interactions inside the network
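The formulas on these slides were images; in standard notation, the MAP objective they describe is

```latex
\theta_{\mathrm{MAP}}
  = \arg\max_{\theta}\; p(\theta \mid X, Y)
  = \arg\max_{\theta}\; \frac{p(Y \mid X, \theta)\, p(\theta)}{p(Y \mid X)}
  = \arg\max_{\theta}\; \bigl[\, \log p(Y \mid X, \theta) + \log p(\theta) \,\bigr],
```

where the evidence p(Y | X) can be dropped from the argmax because it does not depend on θ; this is exactly why MAP sidesteps the intractable integral.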
  • 59. Probabilistic modeling: Regularization Regularization - is a process of introducing additional information in order to solve an ill-posed problem or prevent overfitting. © Wikipedia
  • 60. Probabilistic modeling: Regularization Regularization - is a process of introducing additional information in order to solve an ill-posed problem or prevent overfitting. © Wikipedia Regularization - is a process of introducing additional information in order to restrict model to have predefined properties.
  • 61. Probabilistic modeling: Regularization Regularization - is a process of introducing additional information in order to solve an ill-posed problem or prevent overfitting. © Wikipedia Regularization - is a process of introducing additional information in order to restrict model to have predefined properties. It is closely connected to “prior distributions” on weights / activations / …
  • 62. Probabilistic modeling: Regularization Regularization - is a process of introducing additional information in order to solve an ill-posed problem or prevent overfitting. © Wikipedia Regularization - is a process of introducing additional information in order to restrict model to have predefined properties. It is closely connected to “prior distributions” on weights / activations / … … and to MAP estimation!
  • 63. Probabilistic modeling: Regularization Weight decay (or L2 regularization):
  • 64. Probabilistic modeling: Regularization Weight decay (or L2 regularization): Appropriate probability model: Model log-likelihood:
  • 67. Probabilistic modeling: Regularization Data log-likelihood (we’ve already calculated this)
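The derivation slides here were also image-based; the standard result they build towards, reconstructed in the notation used so far, is

```latex
p(\theta) = \prod_j \mathcal{N}\!\left(\theta_j \mid 0,\; \sigma_p^2\right)
\quad\Rightarrow\quad
-\log p(\theta) = \frac{1}{2\sigma_p^{2}}\,\lVert \theta \rVert_2^{2} + \mathrm{const},
```

so MAP with a zero-mean Gaussian prior adds an L2 penalty with weight λ = 1/(2σ_p²) to the data loss; the Laplace prior mentioned on the next slide gives the L1 penalty in the same way.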
  • 72. Probabilistic modeling: Regularization 1. Laplace distribution as a prior = L1 regularization
  • 73. Probabilistic modeling: Regularization 1. Laplace distribution as a prior = L1 regularization 2. It can be shown that Dropout is also a form of particular probability model …
  • 74. Probabilistic modeling: Regularization 1. Laplace distribution as a prior = L1 regularization 2. It can be shown that Dropout is also a form of particular probability model … 3. … a Bayesian one :) …
  • 75. Probabilistic modeling: Regularization 1. Laplace distribution as a prior = L1 regularization 2. It can be shown that Dropout is also a form of particular probability model … 3. … a Bayesian one :) … 4. … and therefore can be used not only as a regularization technique!
  • 76. Probabilistic modeling: Regularization 1. Laplace distribution as a prior = L1 regularization 2. It can be shown that Dropout is also a form of particular probability model … 3. … a Bayesian one :) … 4. … and therefore can be used not only as a regularization technique! 5. Do you want to pack your network weights into few kilobytes?
  • 77. Probabilistic modeling: Regularization 1. Laplace distribution as a prior = L1 regularization 2. It can be shown that Dropout is also a form of particular probability model … 3. … a Bayesian one :) … 4. … and therefore can be used not only as a regularization technique! 5. Do you want to pack your network weights into few kilobytes? 6. Ok, all you need - is MAP!
  • 78. Probabilistic modeling: Regularization 1. Laplace distribution as a prior = L1 regularization 2. It can be shown that Dropout is also a form of particular probability model … 3. … a Bayesian one :) … 4. … and therefore can be used not only as a regularization technique! 5. Do you want to pack your network weights into few kilobytes? 6. Ok, all you need - is MAP! MAP - is all you need!
  • 79. Weights packing: Empirical way Song Han and others - Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding (2015) Modern neural networks could be dramatically compressed:
  • 80. Weights packing: Soft-Weight Sharing 1. Define prior distribution of weights as Gaussian Mixture Model
  • 81. 1. Define prior distribution of weights as Gaussian Mixture Model Mixture of Gaussians = Weights packing: Soft-Weight Sharing
  • 82. 1. Define prior distribution of weights as Gaussian Mixture Model 2. For one of the Gaussian components force: Weights packing: Soft-Weight Sharing
  • 83. 1. Define prior distribution of weights as Gaussian Mixture Model 2. For one of the Gaussian components force: 3. Maybe define Gamma prior for variances (for numerical stability) Weights packing: Soft-Weight Sharing
  • 84. 1. Define prior distribution of weights as Gaussian Mixture Model 2. For one of the Gaussian components force: 3. Maybe define Gamma prior for variances (for numerical stability) 4. Just find MAP estimation for both model parameters and free mixture parameters! Weights packing: Soft-Weight Sharing
  • 85. Karen Ullrich - Soft Weight-Sharing For Neural Network Compression (2017) Weights packing: Soft-Weight Sharing
  • 86. Karen Ullrich - Soft Weight-Sharing For Neural Network Compression (2017) Weights packing: Soft-Weight Sharing
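A hedged PyTorch sketch in the spirit of the recipe above (not the code of Ullrich et al.): the prior over all weights is a learnable Gaussian mixture with one component pinned at zero, and the MAP objective adds its negative log-density to the ordinary data loss. Component count, trade-off weight, and the stand-in network are illustrative assumptions:

```python
import math
import torch
import torch.nn as nn

class MixturePrior(nn.Module):
    """Learnable Gaussian mixture prior over all network weights (MAP, not full Bayes)."""
    def __init__(self, n_components=16):
        super().__init__()
        self.means = nn.Parameter(torch.linspace(-0.5, 0.5, n_components))
        self.log_stds = nn.Parameter(torch.full((n_components,), -2.0))
        self.logits = nn.Parameter(torch.zeros(n_components))   # mixing proportions

    def neg_log_prob(self, w):
        w = w.reshape(-1, 1)                  # (n_weights, 1)
        means = self.means.clone()
        means[0] = 0.0                        # pin one component at zero (prunable weights)
        log_pi = torch.log_softmax(self.logits, dim=0)
        stds = self.log_stds.exp()
        log_comp = (-0.5 * ((w - means) / stds) ** 2
                    - self.log_stds - 0.5 * math.log(2 * math.pi))
        return -torch.logsumexp(log_pi + log_comp, dim=1).sum()

# MAP objective = data loss + weighted negative log-prior, optimized jointly over the
# network weights and the free mixture parameters.
model = nn.Linear(100, 10)                    # stand-in for a real network
prior = MixturePrior()
x, y = torch.randn(64, 100), torch.randint(0, 10, (64,))
data_loss = nn.functional.cross_entropy(model(x), y)
prior_loss = sum(prior.neg_log_prob(p) for p in model.parameters())
loss = data_loss + 1e-4 * prior_loss          # the 1e-4 trade-off is an assumption
loss.backward()
```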
  • 87. Maximum a posteriori estimation 1. Pretty cool and powerful technique 2. You can build hierarchical models (put priors on priors of priors of…) 3. You can put priors on activations of layers (sparse autoencoders) 4. Leads to “Empirical Bayes” 5. Thinking how to restrict your model? Try to find appropriate prior!
  • 89. True Bayesian Modeling: Recap 1. Posterior could be easily found in case of conjugate distributions
  • 90. True Bayesian Modeling: Recap 1. Posterior could be easily found in case of conjugate distributions 2. But for most real life models denominator is intractable
  • 91. True Bayesian Modeling: Recap 1. Posterior could be easily found in case of conjugate distributions 2. But for most real life models denominator is intractable 3. In MAP denominator is totally ignored
  • 92. True Bayesian Modeling: Recap 1. Posterior could be easily found in case of conjugate distributions 2. But for most real life models denominator is intractable 3. In MAP denominator is totally ignored 4. Can we find a good approximation of the posterior?
  • 93. True Bayesian Modeling: Approximation Two main ideas:
  • 94. True Bayesian Modeling: Approximation Two main ideas: 1. MCMC (Markov Chain Monte Carlo)
  • 95. True Bayesian Modeling: Approximation Two main ideas: 1. MCMC (Markov Chain Monte Carlo) - a tricky one
  • 96. True Bayesian Modeling: Approximation Two main ideas: 1. MCMC (Markov Chain Monte Carlo) - a tricky one 2. Variational Inference
  • 97. True Bayesian Modeling: Approximation Two main ideas: 1. MCMC (Markov Chain Monte Carlo) - a tricky one 2. Variational Inference - a “Black Magic” one
  • 98. True Bayesian Modeling: Approximation Two main ideas: 1. MCMC (Markov Chain Monte Carlo) - a tricky one 2. Variational Inference - a “Black Magic” one Other ideas exist: 1. Monte Carlo Dropout 2. Stochastic gradient Langevin dynamics 3. ...
  • 99. True Bayesian Modeling: MCMC 1. Key idea is to construct Markov Chain which has posterior distribution as its equilibrium distribution
  • 100. True Bayesian Modeling: MCMC 1. Key idea is to construct Markov Chain which has posterior distribution as its equilibrium distribution 2. Then you can burn-in Markov Chain (convergence to equilibrium) and then sample from the posterior distribution
  • 101. True Bayesian Modeling: MCMC 1. Key idea is to construct Markov Chain which has posterior distribution as its equilibrium distribution 2. Then you can burn-in Markov Chain (convergence to equilibrium) and then sample from the posterior distribution 3. Sounds tricky, but it is well-defined procedure
  • 102. True Bayesian Modeling: MCMC 1. Key idea is to construct Markov Chain which has posterior distribution as its equilibrium distribution 2. Then you can burn-in Markov Chain (convergence to equilibrium) and then sample from the posterior distribution 3. Sounds tricky, but it is well-defined procedure 4. PyMC3 = Bayesian Modeling and Probabilistic Machine Learning in Python
  • 103. True Bayesian Modeling: MCMC 1. Key idea is to construct Markov Chain which has posterior distribution as its equilibrium distribution 2. Then you can burn-in Markov Chain (convergence to equilibrium) and then sample from the posterior distribution 3. Sounds tricky, but it is well-defined procedure 4. PyMC3 = Bayesian Modeling and Probabilistic Machine Learning in Python 5. Unfortunately, it is not scalable
  • 104. True Bayesian Modeling: MCMC 1. Key idea is to construct Markov Chain which has posterior distribution as its equilibrium distribution 2. Then you can burn-in Markov Chain (convergence to equilibrium) and then sample from the posterior distribution 3. Sounds tricky, but it is well-defined procedure 4. PyMC3 = Bayesian Modeling and Probabilistic Machine Learning in Python 5. Unfortunately, it is not scalable 6. So, you can’t explicitly apply it to complex models (like Neural Networks)
  • 105. True Bayesian Modeling: MCMC 1. Key idea is to construct Markov Chain which has posterior distribution as its equilibrium distribution 2. Then you can burn-in Markov Chain (convergence to equilibrium) and then sample from the posterior distribution 3. Sounds tricky, but it is well-defined procedure 4. PyMC3 = Bayesian Modeling and Probabilistic Machine Learning in Python 5. Unfortunately, it is not scalable 6. So, you can’t explicitly apply it to complex models (like Neural Networks) 7. But implicit scaling is possible: Bayesian Learning via Stochastic Gradient Langevin Dynamics (2011)
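A minimal NumPy sketch of one stochastic gradient Langevin dynamics update, under the assumption that grad_log_prior and grad_log_lik are user-supplied gradients of the log-prior and of the per-example log-likelihood (the names are illustrative, not from the paper):

```python
# One SGLD step (Welling & Teh, 2011): SGD on the minibatch estimate of the
# log-posterior gradient plus Gaussian noise whose variance equals the step size.
# After burn-in, the iterates are (approximate) samples from the posterior.
import numpy as np

def sgld_step(theta, minibatch, n_total, step_size, grad_log_prior, grad_log_lik, rng):
    # Unbiased minibatch estimate of the gradient of the log-posterior.
    grad = grad_log_prior(theta) + (n_total / len(minibatch)) * sum(
        grad_log_lik(theta, x) for x in minibatch
    )
    noise = rng.normal(scale=np.sqrt(step_size), size=theta.shape)
    return theta + 0.5 * step_size * grad + noise
```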
  • 106. True Bayesian Modeling: Variational Inference True posterior:
  • 107. True Bayesian Modeling: Variational Inference True posterior: Modeled with Deep Neural Network
  • 108. True Bayesian Modeling: Variational Inference True posterior: Intractable integral :(
  • 109. True Bayesian Modeling: Variational Inference True posterior: Let’s find good approximation:
  • 110. True Bayesian Modeling: Variational Inference True posterior: Let’s find good approximation:
  • 111. True Bayesian Modeling: Variational Inference True posterior: Let’s find good approximation: Explicitly define distribution family for approximation (e.g. multivariate gaussian)
  • 112. True Bayesian Modeling: Variational Inference True posterior: Let’s find good approximation: Variational parameters (e.g. mean vector, covariance matrix)
  • 113. True Bayesian Modeling: Variational Inference True posterior: Let’s find good approximation: Speaking mathematically:
  • 114. True Bayesian Modeling: Variational Inference True posterior: Let’s find good approximation: Speaking mathematically: Kullback-Leibler divergence (measure of distributions dissimilarity)
  • 115. True Bayesian Modeling: Variational Inference True posterior: Let’s find good approximation: Speaking mathematically: True posterior is unknown :(
  • 116. Achtung! A lot of math is coming!
  • 117. True Bayesian Modeling: Variational Inference
  • 118. True Bayesian Modeling: Variational Inference
  • 119. True Bayesian Modeling: Variational Inference
  • 120. True Bayesian Modeling: Variational Inference Rewrite this using Bayes rule:
  • 121. True Bayesian Modeling: Variational Inference
  • 122. True Bayesian Modeling: Variational Inference Doesn’t depend on theta! (After integration) Parameters of integration
  • 123. True Bayesian Modeling: Variational Inference So, it is a constant!
  • 124. True Bayesian Modeling: Variational Inference
  • 125. True Bayesian Modeling: Variational Inference Has no effect on minimization problem
  • 126. True Bayesian Modeling: Variational Inference
  • 127. True Bayesian Modeling: Variational Inference Group this together
  • 128. True Bayesian Modeling: Variational Inference
  • 129. True Bayesian Modeling: Variational Inference Multiply by (-1)
  • 130. True Bayesian Modeling: Variational Inference
  • 131. True Bayesian Modeling: Variational Inference KL divergence
  • 132. True Bayesian Modeling: Variational Inference
  • 133. True Bayesian Modeling: Variational Inference It is an expectation over q(...)
  • 134. True Bayesian Modeling: Variational Inference
  • 135. True Bayesian Modeling: Variational Inference Equivalent problems!
  • 136. True Bayesian Modeling: Variational Inference Equivalent problems! Likelihood of your data (your Neural Network works here!)
  • 137. True Bayesian Modeling: Variational Inference Equivalent problems! Prior on network weights (you define this!)
  • 138. True Bayesian Modeling: Variational Inference Equivalent problems! Approximate posterior (you define the form of this!)
  • 139. True Bayesian Modeling: Variational Inference Equivalent problems! We want to optimize this wrt the approximate posterior parameters!
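Reconstructing the image-based derivation above in standard notation (q_phi is the approximate posterior with variational parameters phi):

```latex
\operatorname{KL}\!\bigl(q_{\phi}(\theta)\,\|\,p(\theta \mid X)\bigr)
  = \log p(X)
  - \underbrace{\Bigl(\mathbb{E}_{q_{\phi}}\bigl[\log p(X \mid \theta)\bigr]
  - \operatorname{KL}\bigl(q_{\phi}(\theta)\,\|\,p(\theta)\bigr)\Bigr)}_{\mathrm{ELBO}(\phi)}
```

Since log p(X) is constant in phi, minimizing the KL divergence to the true posterior and maximizing the ELBO (expected data log-likelihood minus the KL to the prior) are the equivalent problems the slide refers to.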
  • 140. True Bayesian Modeling: Variational Inference Equivalent problems! We need to calculate the gradient of this
  • 141. True Bayesian Modeling: Variational Inference Gradient calculation:
  • 142. True Bayesian Modeling: Variational Inference Gradient calculation:
  • 143. True Bayesian Modeling: Variational Inference Gradient calculation: Rewrite this as expectation (for convenience)
  • 144. True Bayesian Modeling: Variational Inference Gradient calculation:
  • 145. True Bayesian Modeling: Variational Inference Gradient calculation: Ooops...
  • 146. True Bayesian Modeling: Variational Inference
  • 147. True Bayesian Modeling: Variational Inference Modeled with Deep Network!
  • 148. True Bayesian Modeling: Variational Inference This integral is intractable too :( (God damn!)
  • 149. True Bayesian Modeling: Variational Inference If it were just q(...), we could approximate it with the Monte Carlo method!
  • 150. True Bayesian Modeling: Variational Inference
  • 151. True Bayesian Modeling: Variational Inference This is just = 1!
  • 152. True Bayesian Modeling: Variational Inference This is gradient of log(q(...))!
  • 153. True Bayesian Modeling: Variational Inference
  • 154. True Bayesian Modeling: Variational Inference
  • 155. True Bayesian Modeling: Variational Inference Luke, log derivative trick!
  • 156. True Bayesian Modeling: Variational Inference Luke, log derivative trick!
  • 157. True Bayesian Modeling: Variational Inference Can be approximated with Monte Carlo! Luke, log derivative trick!
  • 158. True Bayesian Modeling: Variational Inference Luke, log derivative trick!
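A minimal NumPy sketch of the log-derivative (score function) trick used above, with a one-dimensional Gaussian q and a black-box f standing in for the data term; everything here is an illustrative assumption, not the talk's code:

```python
# Log-derivative trick: grad_phi E_{q_phi}[f(theta)] = E_{q_phi}[f(theta) * grad_phi log q_phi(theta)],
# so the gradient can be approximated by Monte Carlo with samples from q alone.
import numpy as np

def score_function_gradient(f, mean, log_std, n_samples=100_000, rng=None):
    rng = rng or np.random.default_rng(0)
    std = np.exp(log_std)
    theta = rng.normal(mean, std, size=n_samples)
    # Gradients of log q for a Gaussian: d/dmean = (theta-mu)/sigma^2,
    # d/dlog_std = (theta-mu)^2/sigma^2 - 1.
    d_mean = (theta - mean) / std**2
    d_log_std = ((theta - mean) ** 2) / std**2 - 1.0
    fx = f(theta)
    return np.mean(fx * d_mean), np.mean(fx * d_log_std)

# Sanity check on f(theta)=theta^2: E[theta^2] = mu^2 + sigma^2,
# so d/dmu = 2*mu and d/dlog_std = 2*sigma^2.
print(score_function_gradient(lambda t: t**2, mean=1.0, log_std=0.0))  # approx (2.0, 2.0)
```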
  • 159. Bayesian Networks: Step by step Define functional family for approximate posterior (e.g. Gaussian):
  • 160. Bayesian Networks: Step by step Define functional family for approximate posterior (e.g. Gaussian): Solve optimization problem (with doubly stochastic gradient ascent):
  • 161. Bayesian Networks: Step by step Define functional family for approximate posterior (e.g. Gaussian): Solve optimization problem (with doubly stochastic gradient ascent): Having approximate posterior you can sample network weights (as many as you want)!
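For concreteness, a hedged PyTorch sketch of this recipe for a single layer with a factorized Gaussian approximate posterior. Note it estimates the ELBO gradient with the reparameterization trick (Bayes-by-Backprop style) instead of the score-function estimator derived above; that substitution, the sizes, and the prior scale are assumptions for illustration:

```python
import torch
import torch.nn as nn

class BayesianLinear(nn.Module):
    """Linear layer with a factorized Gaussian posterior over its weights."""
    def __init__(self, n_in, n_out, prior_std=1.0):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(n_out, n_in))
        self.rho = nn.Parameter(torch.full((n_out, n_in), -5.0))  # std = softplus(rho)
        self.prior_std = prior_std

    def forward(self, x):
        std = nn.functional.softplus(self.rho)
        w = self.mu + std * torch.randn_like(std)   # sample fresh weights on each call
        # Closed-form KL between N(mu, std^2) and the N(0, prior_std^2) prior.
        self.kl = (torch.log(self.prior_std / std)
                   + (std**2 + self.mu**2) / (2 * self.prior_std**2) - 0.5).sum()
        return x @ w.t()

layer = BayesianLinear(20, 1)
x, y = torch.randn(128, 20), torch.randn(128, 1)
nll = nn.functional.mse_loss(layer(x), y, reduction="sum")  # -log-likelihood up to constants
elbo_loss = nll + layer.kl                                   # minimizing this maximizes the ELBO
elbo_loss.backward()

# At prediction time, sample the weights many times: an (approximately) infinite ensemble.
```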
  • 162. Bayesian Networks: Pros and Cons As a result you have: 1. Infinite ensemble of Neural Networks! 2. No overfit problem (in classical sense)! 3. No adversarial examples problem! 4. Measure of prediction confidence! 5. ...
  • 163. Bayesian Networks: Pros and Cons As a result you have: 1. Infinite ensemble of Neural Networks! 2. No overfit problem (in classical sense)! 3. No adversarial examples problem! 4. Measure of prediction confidence! 5. ... No free hunch: 1. A lot of work is still hidden in “scalability” and “convergence”! 2. Very (very!) expensive predictions!
  • 164. Bayesian Networks Examples: BRNN Meire Fortunato and others - Bayesian Recurrent Neural Networks (2017)
  • 165. Bayesian Networks Examples: SegNet Alex Kendall and others - Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding (2016)
  • 166. Bayesian Networks Examples: SegNet Alex Kendall and others - Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding (2016)
  • 167. Bayesian Networks Examples: SegNet Alex Kendall and others - Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding (2016)
  • 168. Bayesian Networks Examples: SegNet Alex Kendall and others - Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding (2016)
  • 169. Bayesian Networks Examples: SegNet Alex Kendall and others - Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding (2016)
  • 170. Bayesian Networks in (near) Production: UBER Lingxue Zhu - Deep and Confident Prediction for Time Series at Uber (2017) How it works: 1. LSTM network 2. Monte Carlo Dropout 3. Daily complete trips prediction 4. Anomaly detection for various metrics
  • 171. Bayesian Networks in (near) Production: UBER Lingxue Zhu - Deep and Confident Prediction for Time Series at Uber (2017) How it works: 1. LSTM network 2. Monte Carlo Dropout 3. Daily complete trips prediction 4. Anomaly detection for various metrics
  • 172. Bayesian Networks in (near) Production: UBER Lingxue Zhu - Deep and Confident Prediction for Time Series at Uber (2017) How it works: 1. LSTM network 2. Monte Carlo Dropout 3. Daily complete trips prediction 4. Anomaly detection for various metrics
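A minimal PyTorch sketch of Monte Carlo Dropout at prediction time, the technique the Uber paper builds on; the toy model and the number of passes are illustrative assumptions:

```python
# Keep dropout active at prediction time, run T stochastic forward passes, and use
# the spread of the predictions as an uncertainty estimate.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(64, 1))
model.train()                    # keep dropout stochastic even when predicting

x = torch.randn(1, 10)
with torch.no_grad():
    samples = torch.stack([model(x) for _ in range(100)])   # T = 100 forward passes
mean, std = samples.mean(0), samples.std(0)                 # prediction and its confidence
```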
  • 173. Bayesian Networks in (near) Production: Flo Predicted distributions of cycle length for 40 independent users: Switched to Empirical Bayes for now.
  • 174. Speech Summary 1. Probabilistic modeling is a powerful tool with strong math background
  • 175. Speech Summary 1. Probabilistic modeling is a powerful tool with strong math background 2. Many techniques are currently not widely used in Deep Learning
  • 176. Speech Summary 1. Probabilistic modeling is a powerful tool with strong math background 2. Many techniques are currently not widely used in Deep Learning 3. You can improve many aspects of your model using the same framework
  • 177. Speech Summary 1. Probabilistic modeling is a powerful tool with strong math background 2. Many techniques are currently not widely used in Deep Learning 3. You can improve many aspects of your model using the same framework 4. Scalability, stability of convergence and inference cost are main constraints
  • 178. Speech Summary 1. Probabilistic modeling is a powerful tool with strong math background 2. Many techniques are currently not widely used in Deep Learning 3. You can improve many aspects of your model using the same framework 4. Scalability, stability of convergence and inference cost are main constraints 5. The future of Deep Learning looks Bayesian...
  • 179. Speech Summary 1. Probabilistic modeling is a powerful tool with strong math background 2. Many techniques are currently not widely used in Deep Learning 3. You can improve many aspects of your model using the same framework 4. Scalability, stability of convergence and inference cost are main constraints 5. The future of Deep Learning looks Bayesian... … (for the moment, for me)
  • 180. Thank you for your (attention)! I hope you have a lot of questions :) Dzianis Dus Lead Data Scientist at InData Labs