Score-Based Generative Modeling through
Stochastic Differential Equations
KAIST ALIN-LAB
Sangwoo Mo
2020.10.14.
Why is this paper interesting?
Methodological aspects:
• Unified framework for score matching and diffusion models
• The framework suggests a way to improve both models
• New applications for neural ODEs/SDEs
Experimental aspects:
• SOTA FID score on CIFAR-10 = 2.20 (StyleGAN-ADA = 3.26)
• Scales to the 1024×1024 CelebA-HQ dataset
• Conditional generation with a post-hoc classifier (relatively small cost)
Outline
Background
• Score Matching
• Noise Conditional Score Networks (NCSN) – NeurIPS’19 oral, NeurIPS’20
• Denoising Diffusion Probabilistic Models (DDPM) – NeurIPS’20
Generative Modeling via SDE – ICLR’21 under review
Score Matching
• Score matching matches the score s(x) ≔ ∇_x log p(x) of the data and the model
• Instead of computing the score of the data directly, we use an alternative loss
• Theorem 1. The score matching objective has an equivalent form:
(1/2) 𝔼_{p_data} ‖s_θ(x) − s_data(x)‖₂² = 𝔼_{p_data} [ tr(∇_x s_θ(x)) + (1/2) ‖s_θ(x)‖₂² ] + const.
• Choice of s_θ(x): One can define s_θ(x) as the gradient of an unnormalized log-density (i.e., an energy), or model it directly with a neural network
• Computation of the trace: One may use Hutchinson's estimator: tr(A) = 𝔼_v [vᵀ A v]
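A minimal PyTorch sketch of this trace-based objective (an illustration, not the paper's code); score_fn is a hypothetical user-supplied network mapping a batch of shape (B, D) to scores of the same shape:

import torch

def implicit_score_matching_loss(score_fn, x, n_probes=1):
    # Theorem 1's right-hand side: E[tr(grad_x s_theta) + 0.5 * ||s_theta||^2],
    # with the trace estimated by Hutchinson's estimator E_v[v^T (ds/dx) v].
    x = x.detach().requires_grad_(True)
    s = score_fn(x)                                   # s_theta(x), shape (B, D)
    norm_term = 0.5 * (s ** 2).sum(dim=1)
    trace_term = torch.zeros_like(norm_term)
    for _ in range(n_probes):
        v = torch.randn_like(x)                       # Gaussian probe vector
        (jv,) = torch.autograd.grad(s, x, grad_outputs=v, create_graph=True)
        trace_term = trace_term + (jv * v).sum(dim=1) # per-sample v^T J v
    return (trace_term / n_probes + norm_term).mean()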
Score Matching
• Proof of Theorem 1. It is sufficient to show that:
𝔼_{p_data} [−s_data(x)ᵀ s_θ(x)]
= Σ_i ∫ −p_data(x) (∂ log p_data(x) / ∂x_i) s_{θ,i}(x) dx
= Σ_i ∫ −(∂p_data(x) / ∂x_i) s_{θ,i}(x) dx
= Σ_i ∫ p_data(x) (∂s_{θ,i}(x) / ∂x_i) dx = 𝔼_{p_data} [tr(∇_x s_θ(x))]
• The last equality follows from integration by parts:
∫ p′(x) f(x) dx = [p(x) f(x)]_{−∞}^{+∞} − ∫ p(x) f′(x) dx
• and the assumption p_data(x) s_θ(x) → 0 at (both sides of) infinity
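A quick numerical sanity check of Theorem 1 in one dimension: for p_data = 𝒩(0, 1) and a hypothetical model score s_θ(x) = −x/v, the two sides should differ only by a constant independent of the model (here 𝔼[s_data(x)²]/2 = 0.5):

import torch

torch.manual_seed(0)
x = torch.randn(200_000)               # samples from p_data = N(0, 1)

def both_sides(v):
    s_theta = -x / v                   # hypothetical model score
    s_data = -x                        # exact score of N(0, 1)
    lhs = 0.5 * ((s_theta - s_data) ** 2).mean()
    rhs = (-1.0 / v + 0.5 * s_theta ** 2).mean()   # tr(grad s_theta) = -1/v
    return lhs.item(), rhs.item()

for v in (0.5, 1.0, 2.0):
    lhs, rhs = both_sides(v)
    print(f"v={v}: lhs - rhs = {lhs - rhs:.4f}")   # ~0.5 for every v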
Noise Conditional Score Networks (NCSN)
• Limitations of score matching:
1. Scores are not well-defined outside the data manifold
2. Score estimation is inaccurate at low-density regions
• Idea of NCSN: “Perturb the data with noise of various magnitudes”
• Large noise facilitates the learning of the scores at low-density regions
• At inference time, we need an annealed sampling of the noise levels
• Concretely, let σ₁ > ⋯ > σ_L ≈ 0 and q_σ(x̃) ≔ ∫ p_data(x) q_σ(x̃ ∣ x) dx
• Then, NCSN models the score function of all noise levels as s_θ(x, σ)
Noise Conditional Score Networks (NCSN)
• Training. (Denoising) score matching of s_θ(x, σ) to q_σ(x̃) is given by:
(1/2) 𝔼_{q_σ(x̃∣x) p_data(x)} ‖s_θ(x̃, σ) − ∇_{x̃} log q_σ(x̃ ∣ x)‖₂²
• where the score of the perturbation kernel is computed in closed form, e.g.,
∇_{x̃} log q_σ(x̃ ∣ x) = −(x̃ − x)/σ² for q_σ ≔ 𝒩(x̃ ∣ x, σ² I)
• Sampling. Run SGLD, starting from σ₁ and annealing to σ_L ≈ 0 (a sketch of both steps follows below)
• Remark that s_θ(x, σ₁) is now well-estimated, hence SGLD starts from a good initial point
Stochastic gradient Langevin dynamics (SGLD) = just gradient descent plus some noise
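A minimal PyTorch sketch of both steps (an illustration under assumed interfaces: a hypothetical network score_fn(x, sigma) and a descending tensor sigmas with σ₁ > ⋯ > σ_L); the loss uses the paper's λ(σ) = σ² weighting, under which (1/2)‖s_θ(x̃, σ) − (−z/σ)‖² becomes (1/2)‖σ·s_θ(x̃, σ) + z‖²:

import torch

def ncsn_dsm_loss(score_fn, x, sigmas):
    # Pick a noise level per example and perturb: x_tilde = x + sigma * z
    idx = torch.randint(len(sigmas), (x.shape[0],), device=x.device)
    sigma = sigmas[idx].view(-1, *([1] * (x.dim() - 1)))
    z = torch.randn_like(x)
    x_tilde = x + sigma * z
    score = score_fn(x_tilde, sigma)
    # Target score is -(x_tilde - x) / sigma^2; lambda(sigma) = sigma^2 weighting
    return 0.5 * ((sigma * score + z) ** 2).flatten(1).sum(dim=1).mean()

@torch.no_grad()
def annealed_sgld(score_fn, x, sigmas, eps=2e-5, n_steps=100):
    # Anneal from the largest noise level down to sigma_L ~ 0
    for sigma in sigmas:
        alpha = eps * (sigma / sigmas[-1]) ** 2       # step size alpha_i ∝ sigma_i^2
        for _ in range(n_steps):
            z = torch.randn_like(x)
            x = x + 0.5 * alpha * score_fn(x, sigma) + torch.sqrt(alpha) * z
    return x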
Noise Conditional Score Networks (NCSN)
• Choice of hyperparameters. The theory suggests:
1. Initial noise level. Set σ₁ large, yet σ₁ < max_{i,j} ‖x_i − x_j‖₂ over training pairs
2. Other noise levels. Set a constant ratio γ ≔ σ_i/σ_{i−1} that satisfies the condition given in the paper (≈ 0.5)
3. Noise conditioning. Parametrize the score function as s_θ(x, σ) = s_θ(x)/σ
4. Selecting T and ε. Choose T large and set ε so that the condition given in the paper ≈ 1
5. Step size. Set α_i ∝ σ_i²
• See the paper for details (an illustrative schedule follows below)
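An illustrative construction of such a schedule (a sketch; the constants are hypothetical, not the paper's tuned values): a geometric sequence has a constant ratio γ, and SGLD step sizes scale as σ_i²:

import math
import torch

sigma_1, sigma_L, n_levels = 50.0, 0.01, 10        # hypothetical values
sigmas = torch.exp(torch.linspace(math.log(sigma_1), math.log(sigma_L), n_levels))
gamma = (sigmas[1] / sigmas[0]).item()             # constant ratio sigma_i / sigma_{i-1}
eps = 2e-5                                         # base step size
alphas = eps * (sigmas / sigmas[-1]) ** 2          # alpha_i ∝ sigma_i^2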
Noise Conditional Score Networks (NCSN)
• Experiments. NCSN showed near-SOTA image generation at the time
Denoising Diffusion Probabilistic Models (DDPM)
• Diffusion probabilistic models (DPM)
• DPM is a parametrized Markov chain whose reverse and forward (diffusion) processes p_θ and q are defined as:
• Reverse process: p_θ(x_{t−1} ∣ x_t) ≔ 𝒩(x_{t−1}; μ_θ(x_t, t), Σ_θ(x_t, t))
• Forward (diffusion) process: q(x_t ∣ x_{t−1}) ≔ 𝒩(x_t; √(1 − β_t) x_{t−1}, β_t I)
Denoising Diffusion Probabilistic Models (DDPM)
• Diffusion probabilistic models (DPM)
• Here, the β_t are usually pre-defined hyperparameters (they can also be learned)
• Then, q can be represented in closed form: q(x_t ∣ x_0) = 𝒩(x_t; √ᾱ_t x_0, (1 − ᾱ_t) I),
where α_t ≔ 1 − β_t and ᾱ_t ≔ ∏_{s=1}^{t} α_s
• Training. Due to the property above, the ELBO objective can be computed in closed form
(KL divergences between Gaussians)
• Sampling. Apply the reverse process p_θ(x_T) ∏_{t=1}^{T} p_θ(x_{t−1} ∣ x_t)
Denoising Diffusion Probabilistic Models (DDPM)
• Idea of DDPM: Smart parametrization of μ_θ
• By rearranging the ELBO, L_{t−1} is given by 𝔼_q [ (1/(2σ_t²)) ‖μ̃_t(x_t, x_0) − μ_θ(x_t, t)‖² ] + C,
where μ̃_t is the posterior mean of the forward process
• Hence, we parametrize μ_θ as μ_θ(x_t, t) = (1/√α_t) (x_t − (β_t/√(1 − ᾱ_t)) ε_θ(x_t, t))
• where ε_θ estimates the noise ε (of the forward process) from x_t
• Training. Use the simplified objective 𝔼_{t, x_0, ε} ‖ε − ε_θ(x_t(x_0, ε), t)‖² (a sketch follows below)
• Sampling. Resembles SGLD
• ⇒ DDPM resembles denoising score matching!
x_t(x_0, ε) = √ᾱ_t x_0 + √(1 − ᾱ_t) ε for ε ∼ 𝒩(0, I)
Simply set Σ_θ = σ_t² I for a constant σ_t (use σ_t² = β_t or σ_t² = β̃_t)
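A minimal PyTorch sketch of the simplified objective (an illustration; eps_model(x_t, t) is a hypothetical noise-prediction network, and alphas_bar a precomputed tensor of the ᾱ_t):

import torch

def ddpm_simple_loss(eps_model, x0, alphas_bar):
    # Sample a timestep per example, apply the closed-form forward process,
    # and regress the added noise: L_simple = E ||eps - eps_theta(x_t, t)||^2
    t = torch.randint(len(alphas_bar), (x0.shape[0],), device=x0.device)
    a_bar = alphas_bar[t].view(-1, *([1] * (x0.dim() - 1)))
    eps = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps
    return ((eps - eps_model(x_t, t)) ** 2).mean()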
Denoising Diffusion Probabilistic Models (DDPM)
• Experiments
• DDPM achieved the then-SOTA FID score on CIFAR-10
• It also provides an (upper bound on the) negative log-likelihood (NLL)
Generative Modeling via SDE
• Motivation
• NCSN and DDPM are discretizations of corresponding SDEs
• NCSN: {σ_i} → the variance-exploding (VE) SDE dx = √(d[σ²(t)]/dt) dw
• DDPM: {β_t} → the variance-preserving (VP) SDE dx = −(1/2) β(t) x dt + √β(t) dw
• The key to their success is perturbing the data with multiple noise scales
• Generalize {σ_i} to an infinite number of noise scales σ(t)
• We consider the general (forward) form of the SDE: dx = f(x, t) dt + g(t) dw
• Then, the reverse process is also an SDE (run backward in time):
dx = [f(x, t) − g(t)² ∇_x log p_t(x)] dt + g(t) dw̄
• ⇒ We can generate samples via the reverse SDE, if the score ∇_x log p_t(x) is given
dw denotes Brownian motion, a stochastic-process generalization of the Gaussian distribution
Generative Modeling via SDE
• Training
• Extending NCSN, the (time-dependent) score is modeled as s_θ(x, t)
• Then, the denoising score matching loss is:
𝔼_t λ(t) 𝔼_{x(0)} 𝔼_{x(t)∣x(0)} ‖s_θ(x(t), t) − ∇_{x(t)} log p_{0t}(x(t) ∣ x(0))‖₂²
• For the (continuous versions of) NCSN and DDPM, the forward transition kernel
p_{0t}(x(t) ∣ x(0)) = 𝒩(x(t); m_x(t), Σ_x(t))
• is given in closed form (hence, no simulation is needed); a VE-SDE sketch of this loss follows below
• Sampling. Interestingly, the reverse SDE permits several sampling methods
• (1) General-purpose solver, (2) MCMC, (3) convert to deterministic ODE
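A minimal PyTorch sketch of the time-dependent loss above for the VE SDE, which (per the paper's parametrization) uses σ(t) = σ_min (σ_max/σ_min)^t so that p_{0t}(x(t) ∣ x(0)) = 𝒩(x(0), σ(t)² I), with weighting λ(t) = σ(t)²; score_fn(x, t) is a hypothetical time-conditioned network:

import torch

def ve_sde_dsm_loss(score_fn, x0, sigma_min=0.01, sigma_max=50.0):
    # Sample t ~ U(0, 1], perturb with the closed-form kernel, regress the score
    t = torch.rand(x0.shape[0], device=x0.device)
    sigma = (sigma_min * (sigma_max / sigma_min) ** t).view(-1, *([1] * (x0.dim() - 1)))
    z = torch.randn_like(x0)
    x_t = x0 + sigma * z
    score = score_fn(x_t, t)
    # lambda(t) = sigma(t)^2 weighting balances all noise levels
    return 0.5 * ((sigma * score + z) ** 2).flatten(1).sum(dim=1).mean()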
Generative Modeling via SDE
1. General-purpose SDE solver (a.k.a. predictor)
• Ancestral sampling (of DDPM) is one specific discretization of the reverse SDE
• Instead, one can use the same discretization as the forward process (reverse diffusion),
which slightly improves the performance of DDPM
2. Score-based MCMC (a.k.a. corrector)
• As in annealed SGLD (of NCSN), directly run MCMC using the score
• Combining both, the predictor-corrector (PC) sampler gets the best of both worlds
(a minimal sketch follows below)
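A minimal PyTorch sketch of PC sampling for the VE SDE (an illustration under the same hypothetical score_fn and a descending tensor sigmas): the predictor takes a reverse-diffusion step, and the corrector a Langevin step with the paper's signal-to-noise step-size heuristic, simplified here to global norms:

import torch

@torch.no_grad()
def pc_sampler(score_fn, shape, sigmas, snr=0.16, device="cpu"):
    x = sigmas[0] * torch.randn(shape, device=device)    # x(T) from the VE prior
    for i in range(len(sigmas) - 1):
        s_now, s_next = sigmas[i], sigmas[i + 1]
        # Predictor: reverse-diffusion step with g(t)^2 dt = sigma_i^2 - sigma_{i+1}^2
        g2 = s_now ** 2 - s_next ** 2
        x = x + g2 * score_fn(x, s_now) + torch.sqrt(g2) * torch.randn_like(x)
        # Corrector: one Langevin MCMC step at the new noise level
        score = score_fn(x, s_next)
        z = torch.randn_like(x)
        step = 2 * (snr * z.norm() / score.norm()) ** 2
        x = x + step * score + torch.sqrt(2 * step) * z
    return x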
Generative Modeling via SDE
3. Convert to deterministic ODE (a.k.a. probability flow)
• Every Itô process (the class of SDEs we consider) has a corresponding deterministic ODE
• whose trajectories induce the same evolution of densities
• Remark that neural ODEs can be used as continuous normalizing flows (CNF)
• Recall. A normalizing flow is an invertible generative model
• CNF computes the trace instead of the determinant!
• ⇒ Can (1) compute exact likelihoods and (2) manipulate latents via the encoder!
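For reference (both equations are from the paper): the probability flow ODE shares the marginals p_t(x) of the SDE, and the CNF connection gives the exact log-likelihood via the instantaneous change-of-variables formula, with f̃_θ denoting the ODE drift and the trace/divergence estimated à la FFJORD:

dx = [f(x, t) − (1/2) g(t)² ∇_x log p_t(x)] dt

log p_0(x(0)) = log p_T(x(T)) + ∫₀ᵀ ∇ · f̃_θ(x(t), t) dt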
Generative Modeling via SDE
• Conditional generation
• With a pre-trained score function, one can perform conditional generation
using a post-hoc classifier (at relatively small cost)
• Let y be a (time-invariant) condition of the data x; then, collect pairs {x(t), y}
• After training a (time-dependent) classifier p_t(y ∣ x(t)), solve the reverse SDE (see below)
• Applications: (1) class-conditional generation, (2) imputation, (3) colorization
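Concretely, the reverse SDE is solved with the conditional score, which Bayes' rule decomposes into the pre-trained unconditional score plus the classifier gradient:

∇_x log p_t(x ∣ y) = ∇_x log p_t(x) + ∇_x log p_t(y ∣ x)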
Generative Modeling via SDE
• Experiments. The practical advantages of the SDE-based generative model are:
1. High-quality image generation via predictor-corrector sampler
2. Invertible model via ODE → exact likelihood and controllable latent
Generative Modeling via SDE
• Experiments. The practical advantages of the SDE-based generative model are:
1. High-quality image generation via predictor-corrector sampler
2. Invertible model via ODE → exact likelihood and controllable latent
Scales to 1024×1024 CelebA-HQ
Generative Modeling via SDE
• Experiments. The practical advantages of the SDE-based generative model are:
3. Conditional generation with post-hoc classifier
• The score s_θ(x, t) is trained only once
Future Direction
• Towards faster generation
• Score-based models show promising generation results
• However, sampling often requires many (e.g., 1,000) iterations
• Denoising diffusion implicit models (DDIM) – ICLR 2021 under review –
reduce sampling to 10-20 iterations
• DDIM combines the ideas of score-based models and GANs
• Combining the ideas of SDEs and GANs would be an interesting direction! :)
References
• Hyvärinen. “Estimation of Non-Normalized Statistical Models by Score Matching”, JMLR 2005.
• Song & Ermon. “Generative Modeling by Estimating Gradients of the Data Distribution”, NeurIPS 2019.
• Song & Ermon. “Improved Techniques for Training Score-Based Generative Models”, NeurIPS 2020.
• Sohl-Dickstein et al. “Deep Unsupervised Learning using Nonequilibrium Thermodynamics”, ICML 2015.
• Ho et al. “Denoising Diffusion Probabilistic Models”, NeurIPS 2020.
• Anonymous. “Score-Based Generative Modeling through Stochastic Differential Equations”, ICLR 2021
under review.
• Chen et al. “Neural Ordinary Differential Equations”, NeurIPS 2018.
• Grathwohl et al. “FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models”,
ICLR 2019.
• Song et al. “Denoising Diffusion Implicit Models”, ICLR 2021 under review.
Thank you for listening!