Neural ODE
Natan Katz
Natan.katz@gmail.com
Lecture Summary
• Why do we care about ODEs?
• What is an ODE?
• Neural ODE – history
• Neural ODE – the NeurIPS paper
Why do we care?
• NeurIPS 2018 research papers competition
• 4500 papers were submitted
• One of the 4 best papers:
Neural ODE (Chen, Rubanova, Bettencourt, Duvenaud)
A new use of a mathematical tool, and a new approach, in DL:
1. Viewing a network as a continuous entity
2. Viewing the hidden layers as a function of time rather than a set of
discrete entities
What are Differential Equations?
• Equations of the form
F(X, C) = 0
C is a vector of constants (e.g. weights).
F is a function, "generously differentiable"
(so far it is no more complicated than a quadratic equation...)
X is the variable of F, and it contains derivatives.
Derivatives of what?!
Classes of Differential Equations
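1. Autonomous ODE: $\dot{x} = f(x)$
2. Non-autonomous ODE: $\dot{x} = f(x, t)$
3. PDE: $\frac{\partial u}{\partial x} + \frac{\partial u}{\partial t} - \frac{\partial^2 u}{\partial x^2} - g(x) = 0$
4. SDE: $dx = f(x)\,dt + dW$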
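PDE – Real-Life Examples
Poisson equation: $\Delta u = f$
u is the potential of a vector field and f is the "source function"
(mass density or electric charge density)
Burgers' equation: $\frac{\partial u}{\partial t} + u \frac{\partial u}{\partial x} = \mu \frac{\partial^2 u}{\partial x^2}$
u is the fluid velocity and μ is the diffusion (viscosity) coefficient. For μ = 0 it is often used to model shock waves.
And the coolest girl in the hood, Navier-Stokes:
$\frac{\partial u}{\partial t} + u \cdot \nabla u = -\frac{\nabla p}{\rho} + \mu \Delta u + f(x, t)$
u is the fluid velocity, p the pressure and ρ the density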
Example: Black & Scholes
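Stock price:
$dS = \mu S\,dt + \sigma S\,dW$
Derivative price (using Itô's lemma):
$dV = \left(\mu S \frac{\partial V}{\partial S} + \frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2}\right)dt + \sigma S \frac{\partial V}{\partial S}\,dW$
We wish to have a portfolio with 1 derivative (option) and δ stocks
$P = V + \delta S$
$dP = \left(\mu S \frac{\partial V}{\partial S} + \frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + \delta \mu S\right)dt + \left(\sigma S \frac{\partial V}{\partial S} + \delta \sigma S\right)dW$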
Black & Scholes
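Let's get rid of the randomness by choosing
$\delta = -\frac{\partial V}{\partial S}$
We assume no arbitrage (namely, the now risk-free portfolio must earn what it would earn in the bank at the risk-free rate r):
$P = V - S \frac{\partial V}{\partial S} \;\Rightarrow\; dP = rP\,dt$
Which leads to the PDE
$\frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + rS \frac{\partial V}{\partial S} - rV = 0$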
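As a sanity check on the PDE, the sketch below plugs the standard closed-form price of a European call into it and evaluates the left-hand side with finite differences. The closed-form formula is not part of these slides, and the strike K, maturity T, rate r and volatility sigma are illustrative assumptions.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def call_price(S, t, K=100.0, T=1.0, r=0.05, sigma=0.2):
    """Closed-form European call price (assumed here, not derived in the slides)."""
    tau = T - t  # time to maturity
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

def pde_residual(S, t, r=0.05, sigma=0.2, hS=1e-2, ht=1e-4):
    """Left-hand side V_t + 0.5 sigma^2 S^2 V_SS + r S V_S - r V, by central differences."""
    V = call_price(S, t)
    V_t = (call_price(S, t + ht) - call_price(S, t - ht)) / (2 * ht)
    V_S = (call_price(S + hS, t) - call_price(S - hS, t)) / (2 * hS)
    V_SS = (call_price(S + hS, t) - 2 * V + call_price(S - hS, t)) / hS ** 2
    return V_t + 0.5 * sigma ** 2 * S ** 2 * V_SS + r * S * V_S - r * V

residual = pde_residual(S=100.0, t=0.5)
print(residual)  # should be close to zero, up to finite-difference error
```

The residual is not exactly zero because the derivatives are approximated numerically; shrinking the step sizes too far trades truncation error for round-off error.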
ODE –Basic Terminology
$\dot{x} = f(x)$ or $\dot{x} = f(x, t)$
Initial condition
To the equation $\dot{x} = f(x)$ we add the initial condition $x(0) = c$
Example:
$\dot{x} = x$. Separating variables and integrating both sides we get
$x(t) = a e^{t}$. We need the i.c. to determine a (here a = c).
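A minimal numerical sketch of the same example: integrate the equation x' = x from x(0) = c with the forward Euler method and compare with c·e^t. The helper name `euler_solve`, the value of c and the step count are illustrative assumptions.

```python
import math

def euler_solve(f, x0, t_end, steps):
    """Forward Euler for the autonomous ODE dx/dt = f(x), starting at x(0) = x0."""
    dt = t_end / steps
    x = x0
    for _ in range(steps):
        x += dt * f(x)
    return x

c = 2.0                                          # the initial condition x(0) = c
approx = euler_solve(lambda x: x, c, 1.0, 100_000)
exact = c * math.exp(1.0)                        # x(t) = a * e^t with a = c, at t = 1
print(approx, exact)
```

Changing c rescales the whole solution, which is the sense in which the initial condition fixes the constant a.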
ODE –Basic Terminology
• Distinct ODE solutions never intersect (when f is smooth enough for solutions to be unique)
• In most cases we cannot solve the equation analytically
We aim to study flow patterns in the state space
ω-limit – the set of points to which a flow may converge as time goes to
infinity
α-limit – the set of points to which a flow may converge as time goes to minus
infinity
• Elements that we may find: fixed points, closed curves,
strange attractors
ODE -Terminology
Attractors
A point or compact set that attracts every i.c. in its neighborhood
Fixed Point
f(x) = 0, namely a point where the flow "rests"
Stability
A f.p. is stable if a flow that starts close enough to it never leaves an ε-neighborhood of it. (homoclinic)
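Determining stability
Autonomous system
If all eigenvalues of the Jacobian at the fixed point have non-zero real parts, the linearization decides: the f.p. is stable when every real part is negative
• Lyapunov function
• Dulac theorem
Non-autonomous system
Lyapunov exponents
Bifurcations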
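The Jacobian test for an autonomous system can be sketched as follows for a two-dimensional example. The damped pendulum, the damping value and all function names are illustrative assumptions, not taken from the slides.

```python
import cmath
import math

def pendulum(state, damping=0.5):
    """Damped pendulum: x' = y, y' = -sin(x) - damping * y (an assumed example system)."""
    x, y = state
    return [y, -math.sin(x) - damping * y]

def jacobian_2d(f, point, h=1e-6):
    """Central-difference Jacobian J[i][j] = d f_i / d x_j of a 2-D vector field."""
    J = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        plus, minus = list(point), list(point)
        plus[j] += h
        minus[j] -= h
        f_plus, f_minus = f(plus), f(minus)
        for i in range(2):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J

def eigenvalues_2x2(J):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    tr = J[0][0] + J[1][1]
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

def classify(f, fixed_point):
    """Stable if all real parts are negative, unstable if any is positive."""
    real_parts = [ev.real for ev in eigenvalues_2x2(jacobian_2d(f, fixed_point))]
    if all(r < 0 for r in real_parts):
        return "stable"
    if any(r > 0 for r in real_parts):
        return "unstable"
    return "undetermined by linearization"

print(classify(pendulum, [0.0, 0.0]))      # pendulum hanging down
print(classify(pendulum, [math.pi, 0.0]))  # pendulum balanced upright
```

When some eigenvalue has zero real part the linearization is inconclusive, which is where tools such as Lyapunov functions come in.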
Further Reading
• Nonautonomous Dynamical Systems, Kloeden & Rasmussen
• Ordinary Differential Equations, Jack Hale
• Navier-Stokes: several books, papers of Edriss Titi
• Theory and Applications of Stochastic Differential Equations, Zeev Schuss
• Books on the heat equation
DE & DL
• Consider ResNet
Every layer t satisfies:
$h_{t+1} = h_t + \delta t \, f(h_t, \theta)$
Haber & Ruthotto (2017), Yiping Lu, Zhong
For an infinitesimal time step (nearly continuous) we obtain:
$\dot{h} = f(h, \theta)$
What does it mean?
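One way to read this: a stack of residual blocks is a forward Euler discretization of an ODE, and adding layers while shrinking the step approaches the continuous solution. The sketch below illustrates that with a scalar state; the residual branch `f` (a single tanh unit), the parameter value and the layer counts are hypothetical choices.

```python
import math

def f(h, theta):
    """Hypothetical residual branch: a single tanh unit with weight theta."""
    return math.tanh(theta * h)

def resnet_forward(h0, theta, n_layers, total_time=1.0):
    """Apply n_layers residual blocks h <- h + dt * f(h, theta), with dt = total_time / n_layers."""
    dt = total_time / n_layers
    h = h0
    for _ in range(n_layers):
        h = h + dt * f(h, theta)
    return h

coarse = resnet_forward(0.5, 1.5, n_layers=10)
fine = resnet_forward(0.5, 1.5, n_layers=1000)
reference = resnet_forward(0.5, 1.5, n_layers=100_000)  # stands in for the continuous limit
print(abs(coarse - reference), abs(fine - reference))
```

The deeper network lands closer to the reference, which is the discrete-to-continuous picture behind the equation for the hidden state.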
Neural ODE – Chen, Rubanova et al.
One of the best research papers in NeurIPS 2018
What does it contain?
• A description of evaluating a neural network with an ODE solver
• A backpropagation algorithm for the ODE solver
• A comparison of this method on supervised learning
• A generative process
• Continuous normalizing flows
A backpropagation algorithm for ODE solver
• There are several methods to solve ODEs, such as Euler and
Runge-Kutta; their main difficulty is the number of
gradients needed when backpropagating through the solver's steps
Adjoint Method
$\min_\theta F(z, \theta)$, where $F(z, \theta) = \int_0^T f(z, t, \theta)\,dt$
subject to
$g(z(0), \theta) = 0$
$h(z, \dot{z}, t, \theta) = 0$
Note: g and h together define an initial value problem
Adjoint Method (cont.)
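So what do they do in the paper?
$\dot{z} = f(z, t, \theta)$
We assume a loss L s.t.
$L(z(T)) = L\left(z(0) + \int_0^T f(z, t, \theta)\,dt\right)$ (ODE-solver friendly)
We define the adjoint
$a(t) = \frac{\partial L}{\partial z(t)}$, so that $a(T) = \frac{\partial L}{\partial z(T)}$
What is z(T), actually?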
Adjoint Method (cont.)
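We simply solve the three equations, backward in time from T to 0:
$\dot{a}(t) = -a(t)^{\top} f_z(z, t, \theta)$
$\frac{\partial L}{\partial \theta} = -\int_T^0 a(t)^{\top} f_\theta(z, t, \theta)\,dt$
$\dot{z} = f(z, t, \theta)$
With the i.c. at time T: a(T), z(T), and 0 for the θ-gradient accumulator
PyTorch implementation: github.com/rtqichen/torchdiffeq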
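To make the backward solve concrete, here is a minimal sketch on a scalar toy problem, following the adjoint equations of the paper (the adjoint a obeys a' = -a f_z, and the θ-gradient accumulates -a f_θ while integrating from T down to 0). The dynamics z' = θ z, the loss L = z(T)²/2, the hand-written RK4 integrator and all names are assumptions chosen so that the result can be checked against the analytic gradient T·z(T)²; this is not the torchdiffeq API.

```python
import math

def rk4(func, state, t0, t1, steps):
    """Integrate d(state)/dt = func(t, state) from t0 to t1 (t1 < t0 runs backward)."""
    dt = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = func(t, state)
        k2 = func(t + dt / 2, [s + dt / 2 * k for s, k in zip(state, k1)])
        k3 = func(t + dt / 2, [s + dt / 2 * k for s, k in zip(state, k2)])
        k4 = func(t + dt, [s + dt * k for s, k in zip(state, k3)])
        state = [s + dt / 6 * (p + 2 * q + 2 * r + w)
                 for s, p, q, r, w in zip(state, k1, k2, k3, k4)]
        t += dt
    return state

def adjoint_gradient(theta, z0, T, steps=200):
    """Return z(T) and dL/dtheta for dz/dt = theta * z with L = z(T)**2 / 2."""
    # Forward pass: solve the ODE once and keep only the final state.
    z_T = rk4(lambda t, s: [theta * s[0]], [z0], 0.0, T, steps)[0]
    a_T = z_T  # a(T) = dL/dz(T) for the toy loss above

    def augmented(t, s):
        z, a, _ = s
        # f = theta * z, so f_z = theta and f_theta = z.
        return [theta * z, -a * theta, -a * z]

    # Backward pass: integrate z, a and the gradient accumulator from T down to 0.
    _, _, grad = rk4(augmented, [z_T, a_T, 0.0], T, 0.0, steps)
    return z_T, grad

z_T, grad = adjoint_gradient(theta=0.5, z0=1.0, T=1.0)
analytic = 1.0 * z_T ** 2  # T * z(T)^2, from z(T) = z0 * exp(theta * T)
print(grad, analytic)
```

Note that the backward pass recomputes z(t) alongside a(t) instead of storing the forward trajectory, which is where the memory saving of the method comes from.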
Comparison of this method for
supervised learning
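They compared on MNIST:
1. ResNet
2. ODE-Net
3. RK-Net (the same architecture, backpropagating directly through a Runge-Kutta integrator)
The test error is nearly the same, while ResNet uses more parameters.
(ODE-Net has about the same number of parameters as a single hidden layer
with 300 units.)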
Continuous Normalizing Flows (CNF)
• A method that maps a simple base distribution (e.g. Gaussian, exponential)
into a more complicated distribution through a sequence of maps
$f_1, f_2, f_3, \ldots, f_k$
The main difficulty here is:
$z_1 = f(z_0) \;\Rightarrow\; \log p(z_1) = \log p(z_0) - \log \left|\det f_z(z_0)\right|$
Calculating determinants is "costly".
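A one-dimensional sketch of the change-of-variables rule, where the Jacobian determinant reduces to the slope of the map. The affine map, the standard normal base density and the numbers are illustrative assumptions; they are chosen because the transformed density is known in closed form.

```python
import math

def normal_logpdf(x, mean, std):
    """Log density of N(mean, std^2) at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)

def affine_flow(z0, scale, shift):
    """z1 = scale * z0 + shift, with log p(z1) from the change-of-variables rule."""
    z1 = scale * z0 + shift
    log_det = math.log(abs(scale))  # in 1-D the Jacobian determinant is just the slope
    log_p_z1 = normal_logpdf(z0, 0.0, 1.0) - log_det
    return z1, log_p_z1

z1, log_p = affine_flow(z0=0.3, scale=2.0, shift=1.0)
direct = normal_logpdf(z1, mean=1.0, std=2.0)  # z1 is N(shift, scale^2) when z0 is N(0, 1)
print(log_p, direct)
```

In d dimensions the slope becomes a d×d Jacobian, and a general determinant costs on the order of d³ operations, which is the difficulty the slide points at.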
CNF
ODE solution:
We assume a continuous sequence of maps:
$\frac{\partial \log p(z(t))}{\partial t} = -\mathrm{tr}\left(f_z(z(t))\right)$
Traces are easier to calculate than determinants, and they are linear, which allows us to
handle sums of functions as well
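The continuous rule can be checked on a scalar linear flow, where the trace is simply the derivative of f. In this sketch the dynamics z' = θ z, the standard normal base density and the Euler integrator are assumptions; with them z(T) is distributed as N(0, e^{2θT}), so the integrated log-density can be compared with the exact one.

```python
import math

def normal_logpdf(x, mean, std):
    """Log density of N(mean, std^2) at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)

def cnf_flow(z0, theta, T, steps=10_000):
    """Euler-integrate z and log p(z(t)) together for dz/dt = theta * z."""
    dt = T / steps
    z = z0
    logp = normal_logpdf(z0, 0.0, 1.0)  # base density: z(0) ~ N(0, 1)
    for _ in range(steps):
        trace = theta                   # tr(f_z) for f(z) = theta * z in 1-D
        logp -= dt * trace              # d log p / dt = -tr(f_z)
        z += dt * theta * z
    return z, logp

theta, T, z0 = 0.3, 2.0, 0.7
z_T, logp_T = cnf_flow(z0, theta, T)
exact_z = z0 * math.exp(theta * T)
exact_logp = normal_logpdf(exact_z, 0.0, math.exp(theta * T))
print(logp_T, exact_logp)
```

No determinant is computed anywhere; the density is tracked by adding up traces along the trajectory.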
CNF
Generative Tools
• The main motivation: data that is irregularly sampled (traffic, medical
records), i.e. data that is discretized although we expect a continuous
distribution to govern it.
• The ODE solution uses a VAE to generate data.
For observations $x_1, x_2, x_3, \ldots, x_m$ and latents $z_1, z_2, z_3, \ldots, z_m$:
$z_0 \sim p(z)$
$z_1, z_2, \ldots, z_m = \mathrm{ODESolver}(z_0, f, \theta, t_1, t_2, \ldots, t_m)$
$x_t \sim p(x \mid z_t, \theta_x)$
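The generative recipe above can be sketched in a few lines for a scalar latent state observed at irregular times. Everything specific here is a hypothetical stand-in: the decay dynamics, the Euler-based `ode_solve`, the Gaussian decoder with fixed noise, and the time stamps.

```python
import math
import random

def decay(z, theta):
    """Hypothetical latent dynamics dz/dt = -theta * z."""
    return -theta * z

def ode_solve(z0, f, theta, times, dt=1e-3):
    """Euler-integrate dz/dt = f(z, theta) from t = 0; return z at each (increasing) time."""
    z, t, out = z0, 0.0, []
    for target in times:
        while t < target - 1e-12:
            step = min(dt, target - t)
            z += step * f(z, theta)
            t += step
        out.append(z)
    return out

def sample_series(times, theta=0.8, noise_std=0.1, seed=0):
    """Draw z_0 from the prior, solve for the latents, then decode noisy observations."""
    rng = random.Random(seed)
    z0 = rng.gauss(0.0, 1.0)                     # z_0 ~ p(z) = N(0, 1)
    zs = ode_solve(z0, decay, theta, times)      # z_1..z_m at the requested times
    xs = [rng.gauss(z, noise_std) for z in zs]   # x_t ~ p(x | z_t), Gaussian decoder
    return z0, zs, xs

times = [0.1, 0.35, 1.2, 1.25]                   # irregular sampling times
z0, zs, xs = sample_series(times)
print(list(zip(times, xs)))
```

Because the solver can be asked for the state at any time, irregular spacing between observations needs no special handling.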
Generative (cont.)
In more detail:
1. Feed $x_1, x_2, x_3, \ldots, x_m$ to an RNN
2. Calculate distribution parameters λ from its hidden states (e.g. mean & std)
3. Sample $z_0$ from $q(z_0 \mid \lambda, x_1, \ldots, x_m)$
4. Run the ODE solver from $z_0$ and construct the trajectory up to $t_k$
5. Decode $x'$ from $p(x' \mid z_{t_k}, \theta_x)$
6. Calculate the ELBO (which contains the KL divergence between q and the prior):
$\log p(x' \mid z_{t_k}, \theta_x) + \log p(z_0) - \log q(z_0 \mid \lambda, x_1, \ldots, x_m)$
$p(z_0) = N(0, 1)$
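A rough sketch of steps 3 to 6 as a single-sample estimate of the objective. It assumes the RNN has already produced λ as a mean and std (`q_mean`, `q_std`), sums the decoder log-likelihood over all observation times, and uses the closed-form solution of the hypothetical dynamics z' = -θ z in place of an ODE solver; all names and values are illustrative.

```python
import math
import random

def normal_logpdf(x, mean, std):
    """Log density of N(mean, std^2) at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)

def elbo_estimate(xs, times, q_mean, q_std, theta=0.8, noise_std=0.1, seed=0):
    """Single-sample estimate: log p(x | z) + log p(z_0) - log q(z_0 | lambda)."""
    rng = random.Random(seed)
    z0 = q_mean + q_std * rng.gauss(0.0, 1.0)        # step 3: sample z_0 from q
    log_lik = 0.0
    for x, t in zip(xs, times):
        z_t = z0 * math.exp(-theta * t)              # step 4: stand-in for the ODE solver
        log_lik += normal_logpdf(x, z_t, noise_std)  # step 5: Gaussian decoder
    log_prior = normal_logpdf(z0, 0.0, 1.0)          # p(z_0) = N(0, 1)
    log_q = normal_logpdf(z0, q_mean, q_std)
    return log_lik + log_prior - log_q               # step 6

times = [0.1, 0.35, 1.2]
xs = [0.9, 0.75, 0.4]
print(elbo_estimate(xs, times, q_mean=1.0, q_std=0.2))
```

Averaged over samples of z_0, the last two terms give minus the KL divergence between q and the prior, so maximizing this quantity trades reconstruction quality against staying close to N(0, 1).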
Thanks!!!
