Bayesian Neural Network
Natan Katz
Natan.katz@gmail.com
Agenda
• Short Introduction to Bayesian Inference
• Variational Inference
• Bayesian Neural Network
• Numerical Methods
• MNIST Example
Bayesian Inference
The inputs:
Evidence – A Sample of observations (numbers, categories, vectors, images)
Hypothesis - An assumption about the prob. structure that creates the sample
Objective :
We wish to learn the optimal parameters of this distribution.
• This probability P(H|E) is called the Posterior.
• We wish to find the optimal parameters for P(H|E)
• Remark: in many books this is called MAP (Maximum A Posteriori) estimation
Let’s Formulate
Z- R.V. that represents the hypothesis
X- R.V. that represents the evidence
Bayes formula:
P(Z|X) = P(Z, X) / P(X)
Let’s Formulate (Cont.)
P_r(Z) – Prior (the parameters’ distribution according to our belief)
P_l(X|Z) – Likelihood (how likely the sample is given the parameters)
P(Z|X) = P_r(Z) P_l(X|Z) / P(X)
Bayesian inference is therefore about working with the RHS terms.
In some cases the denominator is intractable or extremely difficult to calculate.
Example -GMM
We have K Gaussians with known variance σ
Draw μ_k ~ N(0, τ) from the prior (τ is positive)
For each sample j = 1…n:
  z_j ~ Cat(1/K, 1/K, …, 1/K)
  x_j ~ N(μ_{z_j}, σ)
The evidence then requires marginalizing over all latent variables:
p(x_{1…n}) = ∫ [∏_{l=1}^{K} P(μ_l)] [∏_{j=1}^{n} Σ_{z_j} p(z_j) P(x_j | μ_{z_j})] dμ_{1:K}  =>  Pretty nasty
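As a sanity check, here is a minimal sketch of this generative process in numpy. K, n, tau and sigma are hypothetical choices (not values from the slides), and τ is treated as a standard deviation here:

```python
import numpy as np

# Hypothetical sizes and hyperparameters, for illustration only.
K, n = 3, 100          # number of Gaussians, number of samples
tau, sigma = 5.0, 1.0  # prior std of the means, known observation std

rng = np.random.default_rng(0)
mu = rng.normal(0.0, tau, size=K)   # draw mu_k ~ N(0, tau) from the prior
z = rng.integers(0, K, size=n)      # z_j ~ Cat(1/K, ..., 1/K)
x = rng.normal(mu[z], sigma)        # x_j ~ N(mu_{z_j}, sigma)
```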
Some Good news
P(Z|X) = P_r(Z) P_l(X|Z) / P(X)
• We wish to learn Z
• There is no Z in the denominator
=> P(Z|X) ∝ P_l(X|Z) P_r(Z)
Solutions
Until 1999
Mostly numerical sampling:
• Metropolis Hastings
• RBM
Variational
Inference
“AN INTRODUCTION TO VARIATIONAL METHODS FOR GRAPHICAL MODELS”
VI – Algorithm Overview
• Rather than a numerical sampling method, we use an analytical one:
1. We define a distribution family Q(Z) (bias-variance trade off)
2. We minimize the KL divergence: min_Q KL(Q(Z) || P(Z|X))
log P(X) = E_Q[log P(X, Z)] − E_Q[log Q(Z)] + KL(Q(Z) || P(Z|X))
ELBO – Evidence Lower Bound
• Maximizing the ELBO => minimizing the KL
ELBO = E_Q[log P(X, Z)] − E_Q[log Q(Z)] = ∫ Q(Z) log( P(X, Z) / Q(Z) ) dZ = J(Q) – Euler–Lagrange
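A minimal Monte Carlo sketch of this quantity, assuming we are handed a log-joint log P(X, Z) and a family Q we can both sample from and score; all three callables are hypothetical placeholders:

```python
import numpy as np

def elbo_estimate(log_joint, q_sample, q_log_prob, num_samples=1000):
    """Monte Carlo estimate of ELBO = E_Q[log P(X, Z)] - E_Q[log Q(Z)]."""
    z = q_sample(num_samples)                      # draw Z ~ Q
    return np.mean(log_joint(z) - q_log_prob(z))   # average over the samples
```

Maximizing this estimate over the parameters of Q is, roughly, what VI libraries such as Pyro do under the hood.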
MFT (Mean Field Theory)
Scientific “approval”
What Deep Learning doesn’t do
A DL Scenario
• We train a CNN to identify images (men versus women)
• Our accuracy can be amazing (98-99%)
Pretty cool
Let’s get Cruel
• We offer the model an image of a basketball
• The model outputs “man” or “woman”
Why is that?
Mathematical Observation
We trained a function F such that
F : {space of images}->{“man”,”woman”}
Statistical Observation
A basketball image is outside our training data
Anecdotes
Image (Uri Itay)
• Researchers trained a network to classify tanks and trees.
Using 200 images (100 of each kind, 50 for train and 50 for test), the test accuracy
was 100%.
When they took it to the Pentagon it began to miss. The reason was that all the
tank images were taken on cloudy days whereas the tree images were taken on sunny days.
Text
• In text problems we see many cases where, rather than finding latent
phenomena, networks use specific words as their anchor.
A plausible corollary
When we train a DL model:
• We hardly ever know what the model learned
• Models cannot “report” about their uncertainties
Is it crucial ?
• Consider an AI engine that decides whether a tumor is
malignant or benign
• Drug treatment based on a medical record
• Actions that are taken by an autonomous vehicle
• High frequency trading
What can we do?
• DL models are trained to find the optimal weights
• What if, rather than training weights pointwise, we train
weights’ distributions?
The Inference
• For each data pair (x, y) we create a mean and a variance
• This variance will reflect the model’s uncertainty
• DL approach – keep Dropout active in inference (a sketch follows)
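A minimal PyTorch sketch of the dropout-at-inference idea, assuming a model that contains nn.Dropout layers:

```python
import torch

def mc_dropout_predict(model, x, n_samples=100):
    """Sample n_samples stochastic forward passes and summarize them."""
    model.train()  # train() keeps nn.Dropout layers stochastic in PyTorch
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.var(dim=0)  # predictive mean and variance
```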
Uncertainty Types
Epistemic Uncertainty :
Episteme= Knowledge
Uncertainty that theoretically we can know but we don’t:
• Model structure issues
• Absence of data
We can use the notion “reducible” too
Uncertainty Types
Aleatoric Uncertainty :
Aleator = Dice Player
Uncertainty that we cannot resolve:
• The stochasticity of a dice
• Noisy labels
We can use the notion “irreducible” too
Bayesian Neural Network
BNN-Training
• We have a neural network
• We place a prior distribution P over the weights W
• For data D={ (X,Y)}
For measuring uncertainties, we use the posterior distribution
DL Vs. BNN
DL
1. Training using a loss that is related to prediction probability P(Y|X,W)
2. The weights W are trained point-wise with MLE
Bayesian NN
1. Training using a loss that is related to the posterior probability P(W|X,Y)
2. We train weights’ distribution
BNN-Inference
Inference
We assume prior knowledge on the weights’ distribution π
As in any NN we get an input x’ and aim to predict y’ :
P(y’|x’) = ∫ P(y’|x’, w) P(w|D) dw
This can be rewritten as:
P(y’|x’) = E_{P(w|D)}[ P(y’|x’, w) ],  where D = {(X, Y)}
Measuring Uncertainty
• In the inference, given a data point x*:
• Sample weights w_1 … w_n
• Calculate the statistics:
E[f(x*, w)] ≈ (1/n) Σ_{i=1}^{n} f(x*, w_i)
V[f(x*, w)] = E[f(x*, w)²] − E[f(x*, w)]²
W – the r.v. of which the w_i are samples
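The same statistics as a numpy sketch; f, x_star and weight_samples are hypothetical placeholders for the network function, the query point and the sampled weights:

```python
import numpy as np

def predictive_stats(f, x_star, weight_samples):
    """Estimate E[f(x*, w)] and V[f(x*, w)] over sampled weights w_1..w_n."""
    vals = np.array([f(x_star, w) for w in weight_samples])
    mean = vals.mean(axis=0)                     # (1/n) * sum_i f(x*, w_i)
    var = (vals ** 2).mean(axis=0) - mean ** 2   # E[f^2] - E[f]^2
    return mean, var
```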
Common tools to obtain Posterior Dist.
1. Variational Inference
2. MCMC sampling (Metropolis–Hastings, Gibbs)
3. HMC
4. SGLD
Metropolis Hastings
• MCMC sampling algorithm
• The main idea is that we pick samples upon pdf comparisons:
at each step we propose a new sample conditioned on the previous one
and decide to accept or reject it based on a ratio of densities
• Unbiased, but huge variance and very slow (it iterates over the entire data)
• Great History
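A minimal random-walk Metropolis–Hastings sketch; the Gaussian proposal and the step size are illustrative choices, and log_p stands for the unnormalized log posterior (the proportionality above is exactly why an unnormalized density suffices):

```python
import numpy as np

def metropolis_hastings(log_p, x0, n_steps=10_000, step=0.5, seed=0):
    """Random-walk MH: propose around the previous sample, accept by a density ratio."""
    rng = np.random.default_rng(seed)
    x, samples = np.asarray(x0, dtype=float), []
    for _ in range(n_steps):
        proposal = x + step * rng.normal(size=x.shape)
        # Accept with probability min(1, p(proposal) / p(x)).
        if np.log(rng.uniform()) < log_p(proposal) - log_p(x):
            x = proposal
        samples.append(x)
    return np.array(samples)
```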
What is Hamiltonian?
• A physical operator that measures the energy of a dynamical system
Two sets of coordinates
q -State coordinates
p- Momentum
H(p, q) = U(q) + K(p)
U(q) = −log[π(q) L(q|D)],   K(p) = p² / (2m)
U – potential energy, K – kinetic energy
Hamilton’s equations:  dH/dp = q̇ ,  dH/dq = −ṗ
Hamiltonian Monte Carlo
• Hamiltonians offer a deterministic vector field (with trajectories….)
• If we set a Hamiltonian-dependent distribution, we can use this
property for sampling:
P(x, y) = e^{−H(x, y)}
Hybrid - MC
• We have the “state space” x
• We can add “momentum” and use the Hamiltonian mechanism
Leap Frog Algorithm
We set a time interval δ. For each step i:
1. P_i(t + δ/2) = P_i(t) − (δ/2) · dU/dq |_{q(t)}
2. Q_i(t + δ) = Q_i(t) + δ · dK/dp |_{p(t + δ/2)}
3. P_i(t + δ) = P_i(t + δ/2) − (δ/2) · dU/dq |_{q(t + δ)}
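A minimal sketch of the leapfrog integrator above, with the interior momentum half-steps merged into full steps; m = 1 is an assumed default and grad_U is a user-supplied gradient of the potential:

```python
import numpy as np

def leapfrog(q, p, grad_U, delta, L, m=1.0):
    """L leapfrog steps for H(p, q) = U(q) + p^2 / (2m)."""
    p = p - 0.5 * delta * grad_U(q)    # step 1: half step for the momentum
    for _ in range(L - 1):
        q = q + delta * p / m          # step 2: full step for the position (dK/dp = p/m)
        p = p - delta * grad_U(q)      # two consecutive half steps, merged
    q = q + delta * p / m
    p = p - 0.5 * delta * grad_U(q)    # step 3: final half step for the momentum
    return q, p
```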
HMC
Algorithm (Neal 1995, 2012; Duane 1987)
1. Draw x_0 from our prior; draw p_0 from a standard normal dist.
2. Perform L steps of leapfrog
3. Pick x_t following a Metropolis–Hastings accept/reject step
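Putting the three steps together, reusing the leapfrog sketch above; delta and L are illustrative defaults, and log_p / grad_log_p are hypothetical callables for the unnormalized log posterior and its gradient:

```python
import numpy as np

def hmc_step(q, log_p, grad_log_p, delta=0.1, L=20, rng=None):
    """One HMC transition: draw momentum, integrate, M.H. accept/reject."""
    rng = rng if rng is not None else np.random.default_rng()
    grad_U = lambda q_: -grad_log_p(q_)        # U(q) = -log[pi(q) L(q|D)]
    p0 = rng.normal(size=np.shape(q))          # p_0 ~ standard normal
    q_new, p_new = leapfrog(q, p0, grad_U, delta, L)
    # Accept with probability min(1, exp(H_old - H_new)).
    h_old = -log_p(q) + 0.5 * np.sum(p0 ** 2)
    h_new = -log_p(q_new) + 0.5 * np.sum(p_new ** 2)
    return q_new if np.log(rng.uniform()) < h_old - h_new else q
```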
HMC –Pros & Cons
Pros
• It takes points from a wider domain, thus we can describe the
distribution better
• It may take points with lower density
• Faster mixing than random-walk MCMC
Cons
• It may fail to cross energy barriers between modes
• No minibatches – not nice
• It has to calculate gradients over the entire data!!! Bad
What do we need then?
• A tool that allows sub-sampling
• Fewer Gradients
• Keen knowledge about extrema and escape routes
Stochastic Gradient Langevin Dynamics (SGLD)
Langevin Equation
The Langevin Equation describes the motion of a pollen grain in water:
F − γ v_t + ξ_t = 0,   ξ_t ~ N(0, t)
ξ_t is a Brownian force – the collisions with the water molecules
F – external forces
This equation has an equilibrium solution, which is our posterior
distribution
Langevin Equation
Let’s use the following:
F = ∇E,   v_t = dX/dt
The equation in its discrete form becomes:
x_{t+1} = x_t + (dt/γ) ∇E + (dt/γ) ξ_t
(looks familiar, doesn’t it?)
Langevin Equation
Some more rewriting:
x_{t+1} = x_t + ϵ_t ∇E + ε_t,   ε_t – stochastic term
Consider this term. Are we in a better situation?
Robbins & Monro (Stoch. Approx. 1951)
• Let F be a function and θ a number
• There exists a unique solution x* of:
F(x*) = θ
F is unknown
Y – a measurable r.v. with E[Y(x)] = F(x)
Robbins & Monro (cont.)
The following algorithm converges to x*:
X_{N+1} = X_N + α_N (Y_N − θ)
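A minimal sketch of the iteration, with α_N = 1/N as one standard choice satisfying the Robbins–Monro step-size conditions; the sign convention follows the slide’s form (it depends on the monotonicity of F), and Y is a hypothetical noisy-measurement callable:

```python
def robbins_monro(Y, theta, x0, n_steps=5000):
    """Iterate X_{N+1} = X_N + alpha_N * (Y(X_N) - theta)."""
    x = x0
    for N in range(1, n_steps + 1):
        alpha = 1.0 / N  # sum(alpha) = inf, sum(alpha^2) < inf
        x = x + alpha * (Y(x) - theta)
    return x
```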
Back to Langevin
x_{t+1} = x_t + ϵ_t ∇E + ε_t
∇E_mb = ∇E + ε_t
x_{t+1} = x_t + ϵ_t ∇E_mb
Δθ_t = ε_t ( ∇log p(θ_t) + (N/n) Σ_{i=1}^{n} ∇log p(x_i | θ_t) )
We are almost there
• This equation converges to an optimal solution (MAP).
• We need a solution of an SDE (a probability distribution)
• Let’s add a stochastic term:
Δθ_t = ε_t ( ∇log p(θ_t) + (N/n) Σ_{i=1}^{n} ∇log p(x_i | θ_t) ) + η_t,   η_t ~ N(0, σ)
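A single-update sketch in the Welling & Teh (2011) form cited below, where the injected noise variance matches the step size; grad_log_prior and grad_log_lik are hypothetical callables and minibatch is a subsample of the N data points:

```python
import numpy as np

def sgld_step(theta, grad_log_prior, grad_log_lik, minibatch, N, eps, rng):
    """One SGLD update: rescaled minibatch gradient plus Gaussian noise."""
    n = len(minibatch)
    grad = grad_log_prior(theta) + (N / n) * sum(
        grad_log_lik(x, theta) for x in minibatch)
    eta = rng.normal(0.0, np.sqrt(eps), size=np.shape(theta))  # eta_t ~ N(0, eps)
    return theta + 0.5 * eps * grad + eta
```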
Variance Analysis
ε_t – follows the R&M rules
How big is σ? As t → ∞ the equation must become Langevin,
so the variance of η must be bigger than ε_t · V(∇)
Finally, Example
https://towardsdatascience.com/making-your-neural-network-say-i-dont-know-bayesian-nns-using-pyro-and-pytorch-b1c24e6ab8cd
Problem’s Framework
• MNIST CNN model
• MNIST SOTA ~99.8%
The Experiment
• Training a BNN using VI (a small number of epochs)
• Set a regular decision rule – take the digit with the max score
=> Accuracy ~88%
Allowing the Network to refuse
• For each image:
• Sample 100 networks
• We obtain 100 outputs per image
• We have 10 digits each with 100 scores
• If the median of a digit’s 100 scores is > 0.2, we accept that digit
(indeed, we can accept more than one result; see the sketch below)
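A minimal sketch of this decision rule; scores is assumed to hold the 100 sampled score vectors for one image:

```python
import numpy as np

def classify_or_refuse(scores, threshold=0.2):
    """scores: shape (100, 10), one row of 10 digit scores per sampled network.
    Return every digit whose median score exceeds the threshold;
    an empty list means the network refuses to answer."""
    medians = np.median(scores, axis=0)  # per-digit median over the 100 samples
    return [d for d in range(10) if medians[d] > threshold]
```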
Random Image
Summary
• Accuracy: 96%
• Refusal rate: 12.5%
• Random images: 95% have been refused
Thanks!!
My process
• https://wjmaddox.github.io/assets/BNN_tutorial_CILVR.pdf
• https://arxiv.org/pdf/2007.06823.pdf
• https://towardsdatascience.com/what-uncertainties-tell-you-in-bayesian-neural-networks-6fbd5f85648e
• https://medium.com/@uriitai/augmentation-and-groups-theory-795c287fec3f
• https://github.com/paraschopra/bayesian-neural-network-mnist/blob/master/bnn.ipynb
• https://towardsdatascience.com/making-your-neural-network-say-i-dont-know-bayesian-nns-using-pyro-and-pytorch-b1c24e6ab8cd
• http://www.stats.ox.ac.uk/~teh/research/compstats/WelTeh2011a.pdf
• https://arxiv.org/pdf/1206.1901.pdf
• http://cgl.elte.hu/~racz/Stoch-diff-eq.pdf
• https://arxiv.org/ftp/arxiv/papers/1103/1103.1184.pdf
• https://henripal.github.io/blog/langevin
• https://www.cs.princeton.edu/courses/archive/fall11/cos597C/lectures/variational-inference-i.pdf