Score-based Generative Modeling through Stochastic Differential Equations
Sungchul Kim
Contents
1. Introduction
2. Background
3. Score-based generative modeling with SDEs
4. Solving the Reverse SDE
5. Controllable Generation
6. Discussion
Introduction
• Probabilistic generative models
  • Built on slowly increasing noise and then removing it
• Score matching with Langevin dynamics (SMLD)
  • Estimates the score (i.e., the gradient of the log probability density, $\nabla_\mathbf{x} \log p_t(\mathbf{x})$) at each noise scale
  • Applies Langevin dynamics to samples while decreasing the noise scale
• Denoising diffusion probabilistic modeling (DDPM)
  • Trained by reversing the noise corruption step by step
  • Uses tractable reverse distributions
→ Both are score-based generative models
• The practical performance of these two model classes is often quite different, for reasons that are not fully understood.
Introduction
• Stochastic Differential Equations (SDEs)
  • Instead of perturbing data with a finite number of noise distributions, consider a continuum of distributions that evolve over time through diffusion
  • Diffusion from data → random noise: described by an SDE
  • Sample generation from random noise → data: a reverse-time SDE
Introduction
• Contributions
  • Flexible sampling
    • General-purpose SDE solvers for the reverse-time SDE
    • Predictor-Corrector (PC) samplers: combine numerical SDE solvers with score-based MCMC
    • Deterministic samplers: the probability flow ordinary differential equation (ODE)
  • Controllable generation
    • Modify the generative process by imposing conditions after training
    • The conditional reverse-time SDE is estimated efficiently from unconditional scores
    • Class-conditional generation, image inpainting, colorization
  • Unified picture
    • SMLD and DDPM can be unified as discretizations of two different SDEs
    • The SDE framework is beneficial on its own and can be combined with other design choices
Background
• Denoising Score Matching with Langevin Dynamics (SMLD)
  • Notation
    • $p_\sigma(\tilde{\mathbf{x}} \mid \mathbf{x}) \triangleq \mathcal{N}(\tilde{\mathbf{x}};\, \mathbf{x},\, \sigma^2 \mathbf{I})$ : a perturbation kernel
    • $p_\sigma(\tilde{\mathbf{x}}) \triangleq \int p_\mathrm{data}(\mathbf{x})\, p_\sigma(\tilde{\mathbf{x}} \mid \mathbf{x})\, \mathrm{d}\mathbf{x}$ ($p_\mathrm{data}$ : the data distribution)
    • $\sigma_\mathrm{min} = \sigma_1 < \sigma_2 < \cdots < \sigma_N = \sigma_\mathrm{max}$ : a sequence of positive noise scales
    • $p_{\sigma_\mathrm{min}}(\mathbf{x}) \approx p_\mathrm{data}(\mathbf{x})$ / $p_{\sigma_\mathrm{max}}(\mathbf{x}) \approx \mathcal{N}(\mathbf{x};\, \mathbf{0},\, \sigma_\mathrm{max}^2 \mathbf{I})$
  • Noise Conditional Score Network (NCSN) : $\mathbf{s}_\theta(\mathbf{x}, \sigma)$
    • Trained with a weighted sum of denoising score matching objectives (a code sketch follows this list):
$$\theta^* = \operatorname*{arg\,min}_\theta \sum_{i=1}^N \sigma_i^2\, \mathbb{E}_{p_\mathrm{data}(\mathbf{x})}\, \mathbb{E}_{p_{\sigma_i}(\tilde{\mathbf{x}} \mid \mathbf{x})} \Big[ \big\| \mathbf{s}_\theta(\tilde{\mathbf{x}}, \sigma_i) - \nabla_{\tilde{\mathbf{x}}} \log p_{\sigma_i}(\tilde{\mathbf{x}} \mid \mathbf{x}) \big\|_2^2 \Big]$$
    • Given sufficient data and model capacity, the optimal score-based model $\mathbf{s}_{\theta^*}(\mathbf{x}, \sigma)$ matches $\nabla_\mathbf{x} \log p_\sigma(\mathbf{x})$ for $\sigma \in \{\sigma_i\}_{i=1}^N$
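To make the objective concrete, here is a minimal PyTorch-style sketch of the weighted denoising score matching loss. The `score_model(x_tilde, sigma)` interface and the `sigmas` tensor are hypothetical placeholders, not the NCSN reference implementation; for the Gaussian kernel the target score is available in closed form as $-(\tilde{\mathbf{x}} - \mathbf{x})/\sigma^2$.

```python
import torch

def dsm_loss(score_model, x, sigmas):
    """Weighted denoising score matching loss (a sketch of the NCSN objective).

    score_model(x_tilde, sigma) is assumed to estimate grad log p_sigma(x_tilde);
    sigmas is a 1-D tensor of noise scales on the same device as x.
    """
    # Pick a random noise scale sigma_i per sample (Monte Carlo over the sum).
    idx = torch.randint(len(sigmas), (x.shape[0],), device=x.device)
    sigma = sigmas[idx].view(-1, *([1] * (x.dim() - 1)))

    # Perturb: x_tilde ~ N(x, sigma^2 I).
    z = torch.randn_like(x)
    x_tilde = x + sigma * z

    # Gaussian kernel: grad log p_sigma(x_tilde | x) = -(x_tilde - x)/sigma^2 = -z/sigma.
    target = -z / sigma

    score = score_model(x_tilde, sigma.flatten())
    per_sample = ((score - target) ** 2).reshape(x.shape[0], -1).sum(dim=1)
    # lambda(sigma_i) = sigma_i^2 keeps each term on a comparable scale.
    return (sigma.flatten() ** 2 * per_sample).mean()
```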
Background
• Denoising Score Matching with Langevin Dynamics (SMLD)
  • With the NCSN $\mathbf{s}_{\theta^*}(\mathbf{x}, \sigma)$ trained as above, samples are drawn from each $p_{\sigma_i}(\mathbf{x})$ by running $M$ steps of Langevin MCMC (see the sketch below):
$$\mathbf{x}_i^m = \mathbf{x}_i^{m-1} + \epsilon_i\, \mathbf{s}_{\theta^*}(\mathbf{x}_i^{m-1}, \sigma_i) + \sqrt{2\epsilon_i}\, \mathbf{z}_i^m, \quad m = 1, \ldots, M$$
  • $\epsilon_i > 0$ : the step size / $\mathbf{z}_i^m$ : standard normal
  • Repeated for $i = N, N-1, \ldots, 1$, with $\mathbf{x}_N^0 \sim \mathcal{N}(\mathbf{x};\, \mathbf{0},\, \sigma_\mathrm{max}^2 \mathbf{I})$ and $\mathbf{x}_i^0 = \mathbf{x}_{i+1}^M$ when $i < N$
  • As $M \to \infty$ and $\epsilon_i \to 0$ for all $i$, $\mathbf{x}_1^M$ becomes a sample from $p_{\sigma_\mathrm{min}}(\mathbf{x}) \approx p_\mathrm{data}(\mathbf{x})$
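A minimal sketch of this annealed Langevin procedure, reusing the hypothetical `score_model` interface from the loss sketch above; the step-size rule $\epsilon_i = \epsilon_0\, \sigma_i^2 / \sigma_\mathrm{min}^2$ is a common heuristic and an assumption here, not prescribed by the slides.

```python
import torch

@torch.no_grad()
def annealed_langevin_sampling(score_model, sigmas, shape, M=100, eps0=2e-5):
    """Annealed Langevin dynamics: M Langevin MCMC steps per noise scale,
    from sigma_max down to sigma_min (a sketch, not the reference code).

    sigmas is assumed sorted in DESCENDING order: sigmas[0] = sigma_max.
    """
    x = sigmas[0] * torch.randn(shape)  # x_N^0 ~ N(0, sigma_max^2 I)
    for sigma in sigmas:
        # Assumed step-size heuristic: eps_i = eps0 * sigma_i^2 / sigma_min^2.
        eps = eps0 * (sigma / sigmas[-1]) ** 2
        for _ in range(M):
            z = torch.randn_like(x)
            x = x + eps * score_model(x, sigma) + (2 * eps) ** 0.5 * z
    return x  # approximately a sample from p_{sigma_min} ≈ p_data
```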
Background
• Denoising Diffusion Probabilistic Models (DDPM)
  • A sequence of positive noise scales $0 < \beta_1, \beta_2, \ldots, \beta_N < 1$
  • A discrete Markov chain $\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_N$:
$$p(\mathbf{x}_i \mid \mathbf{x}_{i-1}) = \mathcal{N}\big(\mathbf{x}_i;\, \sqrt{1-\beta_i}\,\mathbf{x}_{i-1},\, \beta_i \mathbf{I}\big) \;\Rightarrow\; p_{\alpha_i}(\mathbf{x}_i \mid \mathbf{x}_0) = \mathcal{N}\big(\mathbf{x}_i;\, \sqrt{\alpha_i}\,\mathbf{x}_0,\, (1-\alpha_i)\,\mathbf{I}\big), \quad \alpha_i \triangleq \textstyle\prod_{j=1}^i (1-\beta_j)$$
  • The perturbed data distribution : $p_{\alpha_i}(\tilde{\mathbf{x}}) \triangleq \int p_\mathrm{data}(\mathbf{x})\, p_{\alpha_i}(\tilde{\mathbf{x}} \mid \mathbf{x})\, \mathrm{d}\mathbf{x}$
  • A variational Markov chain in the reverse direction : $p_\theta(\mathbf{x}_{i-1} \mid \mathbf{x}_i) = \mathcal{N}\big(\mathbf{x}_{i-1};\, \frac{1}{\sqrt{1-\beta_i}}\big(\mathbf{x}_i + \beta_i\, \mathbf{s}_\theta(\mathbf{x}_i, i)\big),\, \beta_i \mathbf{I}\big)$
  • Trained with a re-weighted variant of the evidence lower bound (ELBO):
$$\theta^* = \operatorname*{arg\,min}_\theta \sum_{i=1}^N (1-\alpha_i)\, \mathbb{E}_{p_\mathrm{data}(\mathbf{x})}\, \mathbb{E}_{p_{\alpha_i}(\tilde{\mathbf{x}} \mid \mathbf{x})} \Big[ \big\| \mathbf{s}_\theta(\tilde{\mathbf{x}}, i) - \nabla_{\tilde{\mathbf{x}}} \log p_{\alpha_i}(\tilde{\mathbf{x}} \mid \mathbf{x}) \big\|_2^2 \Big]$$
Background
• Denoising Diffusion Probabilistic Models (DDPM)
  • The estimated reverse Markov chain (ancestral sampling), shown in the sketch below:
$$\mathbf{x}_{i-1} = \frac{1}{\sqrt{1-\beta_i}}\big(\mathbf{x}_i + \beta_i\, \mathbf{s}_{\theta^*}(\mathbf{x}_i, i)\big) + \sqrt{\beta_i}\, \mathbf{z}_i, \quad i = N, N-1, \ldots, 1$$
  • Both training weightings are inversely proportional to the expected squared score norm:
    • SMLD: $\sigma_i^2 \propto 1 \,/\, \mathbb{E}\big[\|\nabla_{\tilde{\mathbf{x}}} \log p_{\sigma_i}(\tilde{\mathbf{x}} \mid \mathbf{x})\|_2^2\big]$
    • DDPM: $(1-\alpha_i) \propto 1 \,/\, \mathbb{E}\big[\|\nabla_{\tilde{\mathbf{x}}} \log p_{\alpha_i}(\tilde{\mathbf{x}} \mid \mathbf{x})\|_2^2\big]$
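A sketch of the ancestral sampling update above; `betas` is the DDPM noise schedule, and `score_model(x, i)`, taking the step index, is a hypothetical interface.

```python
import torch

@torch.no_grad()
def ddpm_ancestral_sampling(score_model, betas, shape):
    """DDPM ancestral sampling via the estimated reverse Markov chain
    (a sketch; score_model(x, i) estimates grad log p_{alpha_i}(x))."""
    N = len(betas)
    x = torch.randn(shape)  # x_N ~ N(0, I) for the variance-preserving chain
    for i in reversed(range(N)):
        beta = betas[i]
        # Mean of p_theta(x_{i-1} | x_i): (x_i + beta_i * s(x_i, i)) / sqrt(1 - beta_i).
        mean = (x + beta * score_model(x, i)) / (1.0 - beta) ** 0.5
        z = torch.randn_like(x) if i > 0 else torch.zeros_like(x)
        x = mean + beta ** 0.5 * z  # no noise on the final step
    return x
```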
Score-based Generative Modeling with SDEs
• Contribution
  → Generalize these ideas to an infinite number of noise scales
Score-based Generative Modeling with SDEs
• Perturbing Data with SDEs
  • Goal : a diffusion process $\{\mathbf{x}(t)\}_{t=0}^T$ indexed by a continuous time variable $t \in [0, T]$
    • $\mathbf{x}(0) \sim p_0$, for which we have a dataset of i.i.d. samples ($p_0$ : the data distribution)
    • $\mathbf{x}(T) \sim p_T$, for which we have a tractable form to generate samples efficiently ($p_T$ : the prior distribution)
$$\mathrm{d}\mathbf{x} = \mathbf{f}(\mathbf{x}, t)\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{w}$$
  • $\mathbf{w}$ : the standard Wiener process (a.k.a. Brownian motion)
  • $\mathbf{f}(\cdot, t) : \mathbb{R}^d \to \mathbb{R}^d$ : a vector-valued function called the drift coefficient of $\mathbf{x}(t)$
  • $g(\cdot) : \mathbb{R} \to \mathbb{R}$ : a scalar function known as the diffusion coefficient of $\mathbf{x}(t)$
    • Assumed scalar for ease of computation; in general it can be a $d \times d$ matrix
  • The SDE has a unique strong solution when its coefficients are globally Lipschitz in both state and time
  • $p_t(\mathbf{x})$ : the probability density of $\mathbf{x}(t)$
  • $p_{st}(\mathbf{x}(t) \mid \mathbf{x}(s))$ : the transition kernel from $\mathbf{x}(s)$ to $\mathbf{x}(t)$ (where $0 \le s < t \le T$)
Score-based Generative Modeling with SDEs
• Generating Samples by Reversing the SDE
  • Start from samples of $\mathbf{x}(T) \sim p_T$ and reverse the process to generate $\mathbf{x}(0) \sim p_0$:
$$\mathrm{d}\mathbf{x} = \big[\mathbf{f}(\mathbf{x}, t) - g(t)^2\, \nabla_\mathbf{x} \log p_t(\mathbf{x})\big]\,\mathrm{d}t + g(t)\,\mathrm{d}\bar{\mathbf{w}}$$
  • $\bar{\mathbf{w}}$ : a standard Wiener process when time flows backwards from $T$ to $0$
  • $\mathrm{d}t$ : an infinitesimal negative timestep
  • Given the score of each marginal distribution, $\nabla_\mathbf{x} \log p_t(\mathbf{x})$, for all $t$
    → the reverse diffusion process can be derived and simulated
Score-based Generative Modeling with SDEs
• Estimating Scores for the SDE
  • The score of a distribution can be estimated by training a score-based model on samples with score matching
  • Train a time-dependent score-based model $\mathbf{s}_{\boldsymbol{\theta}}(\mathbf{x}, t)$ via a continuous generalization of the discrete objectives (see the sketch below):
$$\boldsymbol{\theta}^* = \operatorname*{arg\,min}_{\boldsymbol{\theta}}\, \mathbb{E}_t\Big\{ \lambda(t)\, \mathbb{E}_{\mathbf{x}(0)}\, \mathbb{E}_{\mathbf{x}(t) \mid \mathbf{x}(0)} \Big[ \big\| \mathbf{s}_{\boldsymbol{\theta}}(\mathbf{x}(t), t) - \nabla_{\mathbf{x}(t)} \log p_{0t}(\mathbf{x}(t) \mid \mathbf{x}(0)) \big\|_2^2 \Big] \Big\}$$
  • $\lambda : [0, T] \to \mathbb{R}_{+}$ : a positive weighting function
  • With sufficient data and model capacity, $\mathbf{s}_{\boldsymbol{\theta}^*}(\mathbf{x}, t) = \nabla_\mathbf{x} \log p_t(\mathbf{x})$ for almost all $\mathbf{x}$ and $t$
  • Typically $\lambda(t) \propto 1 \,/\, \mathbb{E}\big[\|\nabla_{\mathbf{x}(t)} \log p_{0t}(\mathbf{x}(t) \mid \mathbf{x}(0))\|_2^2\big]$
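A sketch of this continuous objective, specialized to the VE SDE where the transition kernel is Gaussian, $p_{0t}(\mathbf{x}(t) \mid \mathbf{x}(0)) = \mathcal{N}(\mathbf{x}(0),\, \sigma^2(t)\,\mathbf{I})$; the geometric schedule $\sigma(t)$ and the choice $\lambda(t) = \sigma^2(t)$ are common conventions assumed here, not mandated by the slides.

```python
import torch

def continuous_dsm_loss(score_model, x0, sigma_min=0.01, sigma_max=50.0):
    """Continuous-time denoising score matching for the VE SDE (a sketch).

    Uses the closed-form Gaussian transition kernel
    p_0t(x(t) | x(0)) = N(x(0), sigma(t)^2 I) with a geometric sigma(t),
    and the weighting lambda(t) = sigma(t)^2."""
    t = torch.rand(x0.shape[0], device=x0.device)        # t ~ Uniform(0, 1)
    sigma = sigma_min * (sigma_max / sigma_min) ** t     # geometric schedule
    sigma = sigma.view(-1, *([1] * (x0.dim() - 1)))

    z = torch.randn_like(x0)
    xt = x0 + sigma * z                                  # x(t) ~ p_0t(. | x0)
    target = -z / sigma                                  # grad log p_0t(x(t) | x0)

    score = score_model(xt, t)
    per_sample = ((score - target) ** 2).reshape(x0.shape[0], -1).sum(dim=1)
    return (sigma.flatten() ** 2 * per_sample).mean()    # lambda(t) = sigma(t)^2
```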
Score-based Generative Modeling with SDEs
• Special Cases: VE and VP SDEs
  • The noise perturbations used in SMLD and DDPM can be regarded as discretizations of two different SDEs: the Variance Exploding (VE) and Variance Preserving (VP) SDEs
  • The perturbation kernels $p_{\sigma_i}(\mathbf{x} \mid \mathbf{x}_0)$ of SMLD correspond to the Markov chain
$$\mathbf{x}_i = \mathbf{x}_{i-1} + \sqrt{\sigma_i^2 - \sigma_{i-1}^2}\; \mathbf{z}_{i-1}, \quad i = 1, \ldots, N$$
    • $\mathbf{z}_{i-1} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$, $\sigma_0 = 0$
  • As $N \to \infty$, $\{\sigma_i\}_{i=1}^N$ becomes a function $\sigma(t)$ and $\mathbf{z}_i$ becomes $\mathbf{z}(t)$
    • The Markov chain $\{\mathbf{x}_i\}_{i=1}^N$ becomes a continuous stochastic process $\{\mathbf{x}(t)\}_{t=0}^1$ with a continuous time variable $t \in [0, 1]$
  • The process $\{\mathbf{x}(t)\}_{t=0}^1$ is given by the following SDE:
$$\mathrm{d}\mathbf{x} = \sqrt{\frac{\mathrm{d}\big[\sigma^2(t)\big]}{\mathrm{d}t}}\; \mathrm{d}\mathbf{w}$$
Score-based Generative Modeling with SDEs
• Special Cases: VE and VP SDEs
  • For the perturbation kernels $\{p_{\alpha_i}(\mathbf{x} \mid \mathbf{x}_0)\}_{i=1}^N$ of DDPM, the discrete Markov chain is
$$\mathbf{x}_i = \sqrt{1 - \beta_i}\; \mathbf{x}_{i-1} + \sqrt{\beta_i}\; \mathbf{z}_{i-1}, \quad i = 1, \ldots, N$$
  • As $N \to \infty$, this converges to the SDE
$$\mathrm{d}\mathbf{x} = -\tfrac{1}{2}\,\beta(t)\, \mathbf{x}\, \mathrm{d}t + \sqrt{\beta(t)}\; \mathrm{d}\mathbf{w}$$
  • SDE of SMLD : a process whose variance explodes as $t \to \infty$ : Variance Exploding (VE) SDE
  • SDE of DDPM : a process with bounded variance : Variance Preserving (VP) SDE
  → The VE and VP SDEs have affine drift coefficients, so their perturbation kernels $p_{0t}(\mathbf{x}(t) \mid \mathbf{x}(0))$ are Gaussian and available in closed form (see the sketch below)
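For instance, the closed-form VP kernel is $p_{0t}(\mathbf{x}(t) \mid \mathbf{x}(0)) = \mathcal{N}\big(\mathbf{x}(0)\, e^{-\frac{1}{2}\int_0^t \beta(s)\,\mathrm{d}s},\; \big(1 - e^{-\int_0^t \beta(s)\,\mathrm{d}s}\big)\,\mathbf{I}\big)$. The sketch below evaluates it for a linear schedule $\beta(t) = \beta_\mathrm{min} + t\,(\beta_\mathrm{max} - \beta_\mathrm{min})$, a common but here assumed choice.

```python
import math

def vp_kernel_params(t, beta_min=0.1, beta_max=20.0):
    """Closed-form VP SDE perturbation kernel p_0t(x(t) | x(0)) for a linear
    beta(t) = beta_min + t * (beta_max - beta_min) on t in [0, 1].

    Returns (mean_coeff, std): x(t) = mean_coeff * x(0) + std * z, z ~ N(0, I).
    """
    # Integral of beta(s) from 0 to t for the linear schedule.
    int_beta = beta_min * t + 0.5 * (beta_max - beta_min) * t ** 2
    mean_coeff = math.exp(-0.5 * int_beta)
    std = math.sqrt(1.0 - math.exp(-int_beta))
    return mean_coeff, std

# At t = 1 the kernel is close to N(0, I), matching the VP prior.
print(vp_kernel_params(1.0))  # mean_coeff ≈ 0.007, std ≈ 1.0
```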
Solving the Reverse SDE
• General-Purpose Numerical SDE Solvers
  • Numerical solvers provide approximate trajectories from SDEs
    • e.g., Euler-Maruyama and stochastic Runge-Kutta methods
  • Used as the predictor: the ancestral sampling steps of SMLD and DDPM correspond to particular discretizations of the reverse-time SDE (the reverse diffusion sampler); a minimal Euler-Maruyama sketch follows
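A minimal Euler-Maruyama sketch for the reverse-time VE SDE (where $\mathbf{f} = \mathbf{0}$ and $g(t)^2 = \mathrm{d}[\sigma^2(t)]/\mathrm{d}t$); the geometric $\sigma(t)$ schedule matches the training sketch above, and the `score_model` interface remains hypothetical.

```python
import math
import torch

@torch.no_grad()
def euler_maruyama_sampler(score_model, shape, N=1000, sigma_min=0.01, sigma_max=50.0):
    """Euler-Maruyama solver for the reverse-time VE SDE
    dx = -g(t)^2 * score(x, t) dt + g(t) dw_bar (a sketch)."""
    sigma = lambda t: sigma_min * (sigma_max / sigma_min) ** t
    # For the geometric schedule, g(t)^2 = d[sigma^2(t)]/dt = 2 sigma(t)^2 log(smax/smin).
    g2 = lambda t: 2.0 * sigma(t) ** 2 * math.log(sigma_max / sigma_min)
    dt = 1.0 / N
    x = sigma_max * torch.randn(shape)  # x(1) ~ prior N(0, sigma_max^2 I)
    for i in range(N, 0, -1):
        t = i / N
        step = g2(t) * dt
        t_batch = torch.full((shape[0],), t)
        # Stepping backwards in time: x <- x + g^2 * score * dt + g * sqrt(dt) * z.
        x = x + step * score_model(x, t_batch) + step ** 0.5 * torch.randn_like(x)
    return x
```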
Solving the Reverse SDE
• Predictor-Corrector Samplers
  • Score-based MCMC approaches (e.g., Langevin MCMC, HMC) can sample from $p_t$ directly, and thereby correct the solution of a numerical SDE solver
  • Alternate a numerical SDE solver step (the predictor) with score-based MCMC steps (the corrector), as in the sketch below
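A sketch of one Predictor-Corrector iteration for the VE SDE: an Euler-Maruyama predictor step followed by Langevin corrector steps. Setting the corrector step size from a target signal-to-noise ratio `snr` follows the paper's algorithm, but the specific constants and interfaces here are assumptions.

```python
import math
import torch

@torch.no_grad()
def pc_sampler(score_model, shape, N=1000, n_corrector=1, snr=0.16,
               sigma_min=0.01, sigma_max=50.0):
    """Predictor-Corrector sampling for the reverse-time VE SDE (a sketch)."""
    sigma = lambda t: sigma_min * (sigma_max / sigma_min) ** t
    g2 = lambda t: 2.0 * sigma(t) ** 2 * math.log(sigma_max / sigma_min)
    dt = 1.0 / N
    x = sigma_max * torch.randn(shape)
    for i in range(N, 0, -1):
        t = torch.full((shape[0],), i / N)
        # Predictor: one Euler-Maruyama step of the reverse SDE.
        step = g2(i / N) * dt
        x = x + step * score_model(x, t) + step ** 0.5 * torch.randn_like(x)
        # Corrector: Langevin MCMC targeting p_t.
        for _ in range(n_corrector):
            grad = score_model(x, t)
            z = torch.randn_like(x)
            # Step size chosen so the update has roughly the target SNR.
            eps = 2 * (snr * z.norm() / grad.norm()) ** 2
            x = x + eps * grad + (2 * eps) ** 0.5 * z
    return x
```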
Solving the Reverse SDE
• Probability Flow and Equivalence to Neural ODEs
  • Every SDE has a corresponding deterministic process whose trajectories share the same marginal probability densities $\{p_t(\mathbf{x})\}_{t=0}^T$:
$$\mathrm{d}\mathbf{x} = \Big[\mathbf{f}(\mathbf{x}, t) - \tfrac{1}{2}\, g(t)^2\, \nabla_\mathbf{x} \log p_t(\mathbf{x})\Big]\,\mathrm{d}t$$
  • This is the probability flow ODE (see the sketch below)
  • Efficient sampling
    • Sample $\mathbf{x}(0) \sim p_0$ from different final conditions $\mathbf{x}(T) \sim p_T$
    • Generates competitive samples with a fixed discretization strategy
    • With adaptive ODE solvers at a large error tolerance, the number of function evaluations can be reduced by over 90% without affecting the visual quality of samples
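A sketch of black-box ODE sampling for the VE probability flow ODE using SciPy's `solve_ivp`; integrating from $t = 1$ down to a small $t$ maps a prior sample to a data sample, and running the same solver forward in time gives the deterministic encoding discussed on the next slide. The `score_model` interface is again hypothetical.

```python
import math
import numpy as np
import torch
from scipy.integrate import solve_ivp

def probability_flow_sampler(score_model, shape, sigma_min=0.01, sigma_max=50.0,
                             rtol=1e-5, atol=1e-5):
    """Sample by integrating the VE probability flow ODE
    dx/dt = -1/2 * g(t)^2 * score(x, t) from t = 1 to t ≈ 0 (a sketch)."""
    sigma = lambda t: sigma_min * (sigma_max / sigma_min) ** t
    g2 = lambda t: 2.0 * sigma(t) ** 2 * math.log(sigma_max / sigma_min)

    def ode_fn(t, x_flat):
        x = torch.from_numpy(x_flat).float().reshape(shape)
        with torch.no_grad():
            score = score_model(x, torch.full((shape[0],), t))
        return (-0.5 * g2(t) * score).numpy().reshape(-1)

    x1 = sigma_max * np.random.randn(int(np.prod(shape)))  # prior sample at t = 1
    sol = solve_ivp(ode_fn, (1.0, 1e-3), x1, rtol=rtol, atol=atol)
    return sol.y[:, -1].reshape(shape)  # approximate sample from p_0
```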
Solving the Reverse SDE
• Probability Flow and Equivalence to Neural ODEs
  • Manipulating latent representations
    • Encode any datapoint $\mathbf{x}(0)$ into a latent space representation $\mathbf{x}(T)$
    • Decoding is achieved by integrating the corresponding ODE for the reverse-time SDE
    • Enables interpolation and temperature scaling
Controllable Generation
• The continuous structure of the framework
  • Allows producing data samples not only from $p_0$, but also from $p_0(\mathbf{x}(0) \mid \mathbf{y})$, provided $p_t(\mathbf{y} \mid \mathbf{x}(t))$ is known
  • Sample from $p_t(\mathbf{x}(t) \mid \mathbf{y})$ by starting from $p_T(\mathbf{x}(T) \mid \mathbf{y})$ and solving a conditional reverse-time SDE:
$$\mathrm{d}\mathbf{x} = \Big[\mathbf{f}(\mathbf{x}, t) - g(t)^2\, \big(\nabla_\mathbf{x} \log p_t(\mathbf{x}) + \nabla_\mathbf{x} \log p_t(\mathbf{y} \mid \mathbf{x})\big)\Big]\,\mathrm{d}t + g(t)\,\mathrm{d}\bar{\mathbf{w}}$$
  • When $\mathbf{y}$ is a class label → train a time-dependent classifier $p_t(\mathbf{y} \mid \mathbf{x}(t))$ for class-conditional sampling, as in the sketch below
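A sketch of the guidance term: the conditional score is the unconditional score plus the gradient of a time-dependent classifier's log-probability. The `classifier(x, t)` interface returning per-class logits is a hypothetical assumption.

```python
import torch
import torch.nn.functional as F

def conditional_score(score_model, classifier, x, t, y):
    """Score of p_t(x | y): unconditional score plus classifier guidance,
    i.e. grad log p_t(x) + grad log p_t(y | x) (a sketch)."""
    uncond = score_model(x, t)
    with torch.enable_grad():
        x_in = x.detach().requires_grad_(True)
        log_probs = F.log_softmax(classifier(x_in, t), dim=-1)
        selected = log_probs[torch.arange(x.shape[0]), y].sum()
        guidance = torch.autograd.grad(selected, x_in)[0]
    return uncond + guidance

# Any reverse-time SDE or ODE sampler above can be reused by passing
# lambda x, t: conditional_score(score_model, classifier, x, t, y).
```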
Discussion
• A framework for score-based generative modeling based on SDEs, enabling
  • A better understanding of existing approaches
  • New sampling algorithms
  • Exact likelihood computation
  • Uniquely identifiable encoding
  • Latent code manipulation
  • Conditional generation
• Slower at sampling than GANs on the same datasets
  → Combine the stable learning of score-based generative models with the fast sampling of implicit models like GANs
• A number of hyper-parameters to choose
  → Automatically select and tune these hyper-parameters
Thank you