SlideShare ist ein Scribd-Unternehmen logo
1 von 82
Downloaden Sie, um offline zu lesen
Santiago Pascual de la Puente
santi.pascual@upc.edu
PhD Candidate
Universitat Politecnica de Catalunya
Technical University of Catalonia
Deep Generative Models III:
PixelCNN/Wavenet
Normalizing Flows
#DLUPC
http://bit.ly/dlai2018
Outline
● What is in here?
● Introduction
● Taxonomy
● Variational Auto-Encoders (VAEs) (DGMs I)
● Generative Adversarial Networks (GANs) (DGMs II)
● PixelCNN/Wavenet (DGMs III)
● Normalizing Flows (and flow-based gen. models) (DGMs III)
○ Real NVP
○ GLOW
● Models comparison
● Conclusions
2
Previously in DGMs...
What is a generative model?
4
We have datapoints that emerge from some
generating process (landscapes in the nature,
speech from people, etc.)
X = {x1
, x2
, …, xN
}
Having our dataset X with example datapoints
(images, waveforms, written text, etc.) each point xn
lays in some M-dimensional space.
Each point xn
comes from an M-dimensional
probability distribution P(X) → Model it!
5
We can generate any type of data, like speech waveform samples or image pixels.
What is a generative model?
M samples = M dimensions
32
32
.
.
.
Scan and unroll
x3 channels
M = 32x32x3 = 3072 dimensions
Our learned model should be able to make up new samples from the
distribution, not just copy and paste existing samples!
6
What is a generative model?
Figure from NIPS 2016 Tutorial: Generative Adversarial Networks (I. Goodfellow)
7
We will see models that learn the probability density function:
● Explicitly (we impose a known loss function)
○ (1) With approximate density → Variational
Auto-Encoders (VAEs)
○ (2) With tractable density & likelihood-based:
■ PixelCNN, Wavenet.
■ Flow-based models: real-NVP, GLOW.
● Implicitly (we “do not know” the loss function)
○ (3) Generative Adversarial Networks (GANs).
Taxonomy
Variational Auto-Encoder
We finally reach the Variational AutoEncoder objective function.
log likelihood of our data
Not computable and non-negative (KL)
approximation error.
Reconstruction loss of our data
given latent space → NEURAL
DECODER reconstruction loss!
Regularization of our latent
representation → NEURAL
ENCODER projects over prior.
8Credit: Kristiadi
Variational Auto-Encoder
z
Reparameterization trick
Sample and operate with it, multiplying by and summing
9
Variational Auto-Encoder
z1
Generative behavior
Q: How can we now generate new samples once the underlying generating distribution is learned?
A: We can sample from our prior, for example, discarding the encoder path.
z2
z3
10
GAN schematic
11
First things first: what is an encoder neural network? Some deep network that extracts high level
knowledge from our (potentially raw) data, and we can use that knowledge to discriminate among
classes, to score probabilistic outcomes, to extract features, etc...
...
...
...
...
ẍz
...
...
...
...
cx
Adversarial Learning = Let’s use the encoder’s abstraction to
learn whether the generator made a good job generating
something realistic ẍ = G(z) or not.
Generator
Real world
samples
Database
Discriminator
Real
Loss
Latentrandomvariable
Sample
Sample
Fake
12
z
D is a binary classifier, whose objectives
are: database images are Real, whereas
generated ones are Fake .
Generative + Adversarial
Conditional GANs
GANs can be conditioned on other info extra to z:
text, labels, speech, etc..
z might capture random characteristics of the data
(variabilities of plausible futures), whilst c would
condition the deterministic parts !
For details on ways to condition GANs:
Ways of Conditioning Generative
Adversarial Networks (Wack et al.)
Evolving GANs
Different loss functions in the Discriminator can fit your purpose:
● Binary cross-entropy (BCE): Vanilla GAN
● Least-Squares: LSGAN
● Wasserstein GAN + Gradient Penalty: WGAN-GP
● BEGAN: Energy based (D as an autoencoder) with boundary equilibrium.
● Maximum margin: Improved binary classification
Likelihood-based Models
PixelRNN, PixelCNN
& Wavenet
Factorizing the joint distribution
● Model explicitly the joint probability distribution of data streams x as a product of
element-wise conditional distributions for each element xi
in the stream.
○ Example: An image x of size (n, n) is decomposed scanning pixels in raster mode
(Row by row and pixel by pixel within every row)
○ Apply probability chain rule: xi
is the i-th pixel in the image.
17
Factorizing the joint distribution
● To model highly nonlinear and long-range correlations between pixels and their conditional
distributions, we will need a powerful non-linear sequential model.
○ Q: How can we deal with this (which model)?
18
Factorizing the joint distribution
● To model highly nonlinear and long-range correlations between pixels and their conditional
distributions, we will need a powerful non-linear sequential model.
○ Q: How can we deal with this (which model)?
■ A: Recurrent Neural Network.
19
PixelRNN
An RNN predicts the probability of each sample xi
with a categorical output distribution: Softmax
RNN/LSTM
xi-1
xi
xi-2
xi-3
xi-4
(x, y)
At every pixel, in a greyscale image:
256 classes (8 bits/pixel).
20
Recall RNN formulation:
(van den Oord et al. 2016)
Factorizing the joint distribution
● To model highly nonlinear and long-range correlations between pixels and their conditional
distributions, we will need a powerful non-linear sequential model.
○ Q: How can we deal with this (which model)?
■ A: Recurrent Neural Network.
■ B: Convolutional Neural Network.
21(van den Oord et al. 2016)²
Factorizing the joint distribution
● To model highly nonlinear and long-range correlations between pixels and their conditional
distributions, we will need a powerful non-linear sequential model.
○ Q: How can we deal with this (which model)?
■ A: Recurrent Neural Network.
■ B: Convolutional Neural Network.
A CNN?
Whut??
22
My CNN is going causal...
hi there how are you doing ?
100 dim
Let’s say we have a sequence of 100 dimensional vectors describing words.
sequence length = 8
arrangein 2D
100
dim
sequence length = 8
,
Focus in 1D to
exemplify causalitaion
23
24
We can apply a 1D convolutional activation over the 2D matrix: for an arbitrary kernel of width=3
100
dim
sequence length = 8
K1
Each 1D convolutional kernel is a
2D matrix of size (3, 100)100
dim
My CNN is going causal...
25
Keep in mind we are working with 100 dimensions although here we depict just one for simplicity
1 2 3 4 5 6 7 8
w1 w2 w3
w1 w2 w3
w1 w2 w3
w1 w2 w3
w1 w2 w3
1 2 3 4 5 6
w1 w2 w3
The length result of the convolution is
well known to be:
seqlength - kwidth + 1 = 8 - 3 + 1 = 6
So the output matrix will be (6, 100)
because there was no padding
My CNN is going causal...
26
My CNN is going causal...
When we add zero padding, we normally do so on both sides of the sequence (as in image padding)
1 2 3 4 5 6 7 8
w1 w2 w3
w1 w2 w3
w1 w2 w3
w1 w2 w3
w1 w2 w3
1 2 3 4 5 6 7 8
w1 w2 w3
The length result of the convolution is
well known to be:
seqlength - kwidth + 1 = 10 - 3 + 1 = 8
So the output matrix will be (8, 100)
because we had padding
0 0
w1 w2 w3
w1 w2 w3
27
Add the zero padding just on the left side of the sequence, not symmetrically
1 2 3 4 5 6 7 8
w1 w2 w3
w1 w2 w3
w1 w2 w3
w1 w2 w3
w1 w2 w3
1 2 3 4 5 6 7 8
w1 w2 w3
The length result of the convolution is
well known to be:
seqlength - kwidth + 1 = 10 - 3 + 1 = 8
So the output matrix will be (8, 100)
because we had padding
HOWEVER: now every time-step t
depends on the two previous inputs as
well as the current time-step → every
output is causal
Roughly: We make a causal convolution
by padding left the sequence with
(kwidth - 1) zeros
0
w1 w2 w3
w1 w2 w3
0
My CNN went causal
PixelCNN/Wavenet
We can simply exemplify how causal convolutions help us with the SoTA model for audio
generation Wavenet. We have a one dimensional stream of samples of length T:
28
PixelCNN/Wavenet
We can simply exemplify how causal convolutions help us with the SoTA model for audio
generation Wavenet. We have a one dimensional stream of samples of length T (e.g. audio):
receptive field hast to be T 29(van den Oord et al. 2016)³
PixelCNN/Wavenet
We can simply exemplify how causal convolutions help us with the SoTA model for audio
generation Wavenet. We have a one dimensional stream of samples of length T:
receptive field hast to be T
Dilated convolutions help
us reach a receptive field
sufficiently large to
emulate an RNN!
30
Samples available online
Conditional PixelCNN/Wavenet
● Q: Can we make the generative model learn the natural distribution of X and perform a
specific task conditioned on it?
○ A: Yes! we can condition each sample to an embedding/feature vector h, which can
represent encoded text, or speaker identity, generation speed, etc.
31
Conditional PixelCNN/Wavenet
32
● PixelCNN produces very sharp and realistic samples, but...
● Q: Is this a cheap generative model computationally?
Conditional PixelCNN/Wavenet
● PixelCNN produces very sharp and realistic samples, but...
● Q: Is this a cheap generative model computationally?
○ A: Nope, take the example of wavenet: Forward 16.000 times in
auto-regressive way to predict 1 second of audio at standard 16 kHz
sampling.
33
Normalizing Flows
35
Normalizing Flows
36
Normalizing Flows
37
Normalizing Flows
38
Normalizing Flows
39
Normalizing Flows
40
Normalizing Flows
41
Normalizing Flows and the Effective Chained Composition
42
Normalizing Flows and the Effective Chained Composition
(Mohamed and Rezende 2017)
43
Normalizing Flows and the Effective Chained Composition
(Mohamed and Rezende 2017)
44
Normalizing Flows and the Effective Chained Composition
(Mohamed and Rezende 2017)
45
Planar Flow
46
Planar Flow
47
Planar Flow
We can apply this simple flow to an already seen model to enhance its representational
power though...
z
We still sample N(0, I) !!
48
Planar Flow
We can apply this simple flow to an already seen model to enhance its representational
power though...
z
We still sample N(0, I) !!
But if we had a bit more sophisticated flow formulation in some
form, we could directly build an end-to-end generative model!
49
Enhancing Normalizing Flows
Real-NVP
Real-Non Volume Preserving Flows
51
Real-Non Volume Preserving Flows
52
Real-Non Volume Preserving Flows
53
Real-Non Volume Preserving Flows
54
Copy
Affine coupling
Real-NVP Permutations
55
Copy
Affine coupling
Affine coupling
Copy
Copy
Affine coupling
GLOW
57
GLOW
58
GLOW
59
GLOW
Improvements over Real-NVP.
60
GLOW step of flow formulations
61
GLOW Multi-scale structure
62
GLOW Multi-scale structure
This is a very powerful structure: we can infer z features from an image and we can
sample an image from z features, EITHER WAY!
63
GLOW Multi-scale structure
Additionally, during training loss is computed in the simplistic Z space!
64
GLOW Results on CelebA
65
GLOW Results with temperature
66
GLOW Results interpolating Z
67
GLOW Semantic Manipulation
68
GLOW’s depth dependency
69
WaveGLOW (Prenger et al. 2018)
Modified structure to deal
with 1D audio data, giving
SoTA results without
auto-regressive restriction.
Audio samples available online
Models Comparison
71
Models comparison
72
Models comparison
Latent space → feature disentangle, we can
navigate and control better what we generate
73
Models comparison
No direct likelihood measurement, just an
approximation or qualitative insights.
74
Models comparison
Direct likelihood values, direct
objective “good fit” measurements.
75
Models comparison
Very deep models (many flow
steps), need much much memory
and compute power to get best
results.
76
Models comparison
Fast training.
77
Models comparison
Slow auto-regressive sampling.
78
Models comparison
One-shot predictions,
quickest to sample from.
Conclusions
80
Conclusions
● (Deep) Generative models are built to learn the underlying structures hidden in our (high-dimensional) data.
● There are currently four main successful deep generative models: VAE, GAN, pixelCNN and NF-based.
● PixelCNN factorizes an explicitly known discrete distribution with probability chain rule
○ They are the slowest generative models for their recursive nature.
● VAEs, GANs, and NFs are capable of factorizing our highly-complex PDFs into a simpler prior of our choice z,
which we can then explore and control to generate specific features.
● Once we trained VAEs with variational lower bound, we can generate new samples from our learned z manifold
assuming most significant data lays over the prior → hence we like powerful priors (NFs can help in this matter).
● GANs use an implicit learning method to mimic our data distribution Pdata.
● GANs and NFs are the sharpiest generators at the moment → very active research fields.
● GANs are the hardest generative models to train owing to intrinsic stabilities of the G-D synergies.
● NFs (Real-NVP based) require many steps to be effective in their task, consuming many resources at the
moment.
Thanks! Questions?
References
82
● Glow: generative flow with invertible 1x1 convolutions (Kingma and Dhariwal, 2018)
● Density Estimation Using Real-NVP (Dinh et al. 2016)
● Variational Inference With Normalizing Flows (Rezende and Mohamed, 2015)
● Paper Review on Glow (Pascual 2018)
● Normalizing Flows (Kosiorek 2016)
● Normalizing Flows Tutorial 1 (Jang 2016)
● Normalizing Flows Tutorial 2 (Jang 2016)

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
 
The Perceptron - Xavier Giro-i-Nieto - UPC Barcelona 2018
The Perceptron - Xavier Giro-i-Nieto - UPC Barcelona 2018The Perceptron - Xavier Giro-i-Nieto - UPC Barcelona 2018
The Perceptron - Xavier Giro-i-Nieto - UPC Barcelona 2018
 
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018
 
Reinforcement Learning (Reloaded) - Xavier Giró-i-Nieto - UPC Barcelona 2018
Reinforcement Learning (Reloaded) - Xavier Giró-i-Nieto - UPC Barcelona 2018Reinforcement Learning (Reloaded) - Xavier Giró-i-Nieto - UPC Barcelona 2018
Reinforcement Learning (Reloaded) - Xavier Giró-i-Nieto - UPC Barcelona 2018
 
Multilayer Perceptron - Elisa Sayrol - UPC Barcelona 2018
Multilayer Perceptron - Elisa Sayrol - UPC Barcelona 2018Multilayer Perceptron - Elisa Sayrol - UPC Barcelona 2018
Multilayer Perceptron - Elisa Sayrol - UPC Barcelona 2018
 
Intepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural NetworksIntepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural Networks
 
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
 
Convolutional Neural Networks (DLAI D5L1 2017 UPC Deep Learning for Artificia...
Convolutional Neural Networks (DLAI D5L1 2017 UPC Deep Learning for Artificia...Convolutional Neural Networks (DLAI D5L1 2017 UPC Deep Learning for Artificia...
Convolutional Neural Networks (DLAI D5L1 2017 UPC Deep Learning for Artificia...
 
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
 
Backpropagation for Deep Learning
Backpropagation for Deep LearningBackpropagation for Deep Learning
Backpropagation for Deep Learning
 
Backpropagation for Neural Networks
Backpropagation for Neural NetworksBackpropagation for Neural Networks
Backpropagation for Neural Networks
 
Recurrent Neural Networks (DLAI D7L1 2017 UPC Deep Learning for Artificial In...
Recurrent Neural Networks (DLAI D7L1 2017 UPC Deep Learning for Artificial In...Recurrent Neural Networks (DLAI D7L1 2017 UPC Deep Learning for Artificial In...
Recurrent Neural Networks (DLAI D7L1 2017 UPC Deep Learning for Artificial In...
 
Deep Generative Models - Kevin McGuinness - UPC Barcelona 2018
Deep Generative Models - Kevin McGuinness - UPC Barcelona 2018Deep Generative Models - Kevin McGuinness - UPC Barcelona 2018
Deep Generative Models - Kevin McGuinness - UPC Barcelona 2018
 
Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)
Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)
Unsupervised Deep Learning (D2L1 Insight@DCU Machine Learning Workshop 2017)
 
The Perceptron (D1L1 Insight@DCU Machine Learning Workshop 2017)
The Perceptron (D1L1 Insight@DCU Machine Learning Workshop 2017)The Perceptron (D1L1 Insight@DCU Machine Learning Workshop 2017)
The Perceptron (D1L1 Insight@DCU Machine Learning Workshop 2017)
 
Loss Functions for Deep Learning - Javier Ruiz Hidalgo - UPC Barcelona 2018
Loss Functions for Deep Learning - Javier Ruiz Hidalgo - UPC Barcelona 2018Loss Functions for Deep Learning - Javier Ruiz Hidalgo - UPC Barcelona 2018
Loss Functions for Deep Learning - Javier Ruiz Hidalgo - UPC Barcelona 2018
 
The Perceptron (D1L2 Deep Learning for Speech and Language)
The Perceptron (D1L2 Deep Learning for Speech and Language)The Perceptron (D1L2 Deep Learning for Speech and Language)
The Perceptron (D1L2 Deep Learning for Speech and Language)
 
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
 
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)
 
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
 

Ähnlich wie PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018

Oleksandr Obiednikov “Affine transforms and how CNN lives with them”
Oleksandr Obiednikov “Affine transforms and how CNN lives with them”Oleksandr Obiednikov “Affine transforms and how CNN lives with them”
Oleksandr Obiednikov “Affine transforms and how CNN lives with them”
Lviv Startup Club
 
Dl1 deep learning_algorithms
Dl1 deep learning_algorithmsDl1 deep learning_algorithms
Dl1 deep learning_algorithms
Armando Vieira
 

Ähnlich wie PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018 (20)

Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Int...
Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Int...Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Int...
Deep Generative Models I (DLAI D9L2 2017 UPC Deep Learning for Artificial Int...
 
AILABS - Lecture Series - Is AI the New Electricity? Topic:- Classification a...
AILABS - Lecture Series - Is AI the New Electricity? Topic:- Classification a...AILABS - Lecture Series - Is AI the New Electricity? Topic:- Classification a...
AILABS - Lecture Series - Is AI the New Electricity? Topic:- Classification a...
 
From RNN to neural networks for cyclic undirected graphs
From RNN to neural networks for cyclic undirected graphsFrom RNN to neural networks for cyclic undirected graphs
From RNN to neural networks for cyclic undirected graphs
 
Deep learning for molecules, introduction to chainer chemistry
Deep learning for molecules, introduction to chainer chemistryDeep learning for molecules, introduction to chainer chemistry
Deep learning for molecules, introduction to chainer chemistry
 
convolutional_neural_networks in deep learning
convolutional_neural_networks in deep learningconvolutional_neural_networks in deep learning
convolutional_neural_networks in deep learning
 
Alberto Massidda - Scenes from a memory - Codemotion Rome 2019
Alberto Massidda - Scenes from a memory - Codemotion Rome 2019Alberto Massidda - Scenes from a memory - Codemotion Rome 2019
Alberto Massidda - Scenes from a memory - Codemotion Rome 2019
 
The Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intelligence)
The Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intelligence)The Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intelligence)
The Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intelligence)
 
Neural network basic and introduction of Deep learning
Neural network basic and introduction of Deep learningNeural network basic and introduction of Deep learning
Neural network basic and introduction of Deep learning
 
Learning multifractal structure in large networks (KDD 2014)
Learning multifractal structure in large networks (KDD 2014)Learning multifractal structure in large networks (KDD 2014)
Learning multifractal structure in large networks (KDD 2014)
 
Illustration Clamor Echelon Evaluation via Prime Piece Psychotherapy
Illustration Clamor Echelon Evaluation via Prime Piece PsychotherapyIllustration Clamor Echelon Evaluation via Prime Piece Psychotherapy
Illustration Clamor Echelon Evaluation via Prime Piece Psychotherapy
 
Neural networks
Neural networksNeural networks
Neural networks
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
 
Oleksandr Obiednikov “Affine transforms and how CNN lives with them”
Oleksandr Obiednikov “Affine transforms and how CNN lives with them”Oleksandr Obiednikov “Affine transforms and how CNN lives with them”
Oleksandr Obiednikov “Affine transforms and how CNN lives with them”
 
Deep learning (2)
Deep learning (2)Deep learning (2)
Deep learning (2)
 
Dl1 deep learning_algorithms
Dl1 deep learning_algorithmsDl1 deep learning_algorithms
Dl1 deep learning_algorithms
 
tutorial.ppt
tutorial.ppttutorial.ppt
tutorial.ppt
 
Java and Deep Learning
Java and Deep LearningJava and Deep Learning
Java and Deep Learning
 
Skip RNN: Learning to Skip State Updates in RNNs (ICLR 2018)
Skip RNN: Learning to Skip State Updates in RNNs (ICLR 2018)Skip RNN: Learning to Skip State Updates in RNNs (ICLR 2018)
Skip RNN: Learning to Skip State Updates in RNNs (ICLR 2018)
 
WaveNet
WaveNetWaveNet
WaveNet
 
M7 - Neural Networks in machine learning.pdf
M7 - Neural Networks in machine learning.pdfM7 - Neural Networks in machine learning.pdf
M7 - Neural Networks in machine learning.pdf
 

Mehr von Universitat Politècnica de Catalunya

Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Universitat Politècnica de Catalunya
 

Mehr von Universitat Politècnica de Catalunya (20)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Deep Generative Learning for All
Deep Generative Learning for AllDeep Generative Learning for All
Deep Generative Learning for All
 
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
 
The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
 
Open challenges in sign language translation and production
Open challenges in sign language translation and productionOpen challenges in sign language translation and production
Open challenges in sign language translation and production
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
 
Discovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in MinecraftDiscovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in Minecraft
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
 
Curriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object SegmentationCurriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object Segmentation
 
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
 
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
 
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
 
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
 
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
 
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC BarcelonaSelf-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
 
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN BarcelonaDeep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
 

Kürzlich hochgeladen

In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
ahmedjiabur940
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
vexqp
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
vexqp
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
gajnagarg
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
Health
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
gajnagarg
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
vexqp
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
ptikerjasaptiker
 

Kürzlich hochgeladen (20)

In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 

PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018

  • 1. Santiago Pascual de la Puente santi.pascual@upc.edu PhD Candidate Universitat Politecnica de Catalunya Technical University of Catalonia Deep Generative Models III: PixelCNN/Wavenet Normalizing Flows #DLUPC http://bit.ly/dlai2018
  • 2. Outline ● What is in here? ● Introduction ● Taxonomy ● Variational Auto-Encoders (VAEs) (DGMs I) ● Generative Adversarial Networks (GANs) (DGMs II) ● PixelCNN/Wavenet (DGMs III) ● Normalizing Flows (and flow-based gen. models) (DGMs III) ○ Real NVP ○ GLOW ● Models comparison ● Conclusions 2
  • 4. What is a generative model? 4 We have datapoints that emerge from some generating process (landscapes in the nature, speech from people, etc.) X = {x1 , x2 , …, xN } Having our dataset X with example datapoints (images, waveforms, written text, etc.) each point xn lays in some M-dimensional space. Each point xn comes from an M-dimensional probability distribution P(X) → Model it!
  • 5. 5 We can generate any type of data, like speech waveform samples or image pixels. What is a generative model? M samples = M dimensions 32 32 . . . Scan and unroll x3 channels M = 32x32x3 = 3072 dimensions
  • 6. Our learned model should be able to make up new samples from the distribution, not just copy and paste existing samples! 6 What is a generative model? Figure from NIPS 2016 Tutorial: Generative Adversarial Networks (I. Goodfellow)
  • 7. 7 We will see models that learn the probability density function: ● Explicitly (we impose a known loss function) ○ (1) With approximate density → Variational Auto-Encoders (VAEs) ○ (2) With tractable density & likelihood-based: ■ PixelCNN, Wavenet. ■ Flow-based models: real-NVP, GLOW. ● Implicitly (we “do not know” the loss function) ○ (3) Generative Adversarial Networks (GANs). Taxonomy
  • 8. Variational Auto-Encoder We finally reach the Variational AutoEncoder objective function. log likelihood of our data Not computable and non-negative (KL) approximation error. Reconstruction loss of our data given latent space → NEURAL DECODER reconstruction loss! Regularization of our latent representation → NEURAL ENCODER projects over prior. 8Credit: Kristiadi
  • 9. Variational Auto-Encoder z Reparameterization trick Sample and operate with it, multiplying by and summing 9
  • 10. Variational Auto-Encoder z1 Generative behavior Q: How can we now generate new samples once the underlying generating distribution is learned? A: We can sample from our prior, for example, discarding the encoder path. z2 z3 10
  • 11. GAN schematic 11 First things first: what is an encoder neural network? Some deep network that extracts high level knowledge from our (potentially raw) data, and we can use that knowledge to discriminate among classes, to score probabilistic outcomes, to extract features, etc... ... ... ... ... ẍz ... ... ... ... cx Adversarial Learning = Let’s use the encoder’s abstraction to learn whether the generator made a good job generating something realistic ẍ = G(z) or not.
  • 12. Generator Real world samples Database Discriminator Real Loss Latentrandomvariable Sample Sample Fake 12 z D is a binary classifier, whose objectives are: database images are Real, whereas generated ones are Fake . Generative + Adversarial
  • 13. Conditional GANs GANs can be conditioned on other info extra to z: text, labels, speech, etc.. z might capture random characteristics of the data (variabilities of plausible futures), whilst c would condition the deterministic parts ! For details on ways to condition GANs: Ways of Conditioning Generative Adversarial Networks (Wack et al.)
  • 14. Evolving GANs Different loss functions in the Discriminator can fit your purpose: ● Binary cross-entropy (BCE): Vanilla GAN ● Least-Squares: LSGAN ● Wasserstein GAN + Gradient Penalty: WGAN-GP ● BEGAN: Energy based (D as an autoencoder) with boundary equilibrium. ● Maximum margin: Improved binary classification
  • 17. Factorizing the joint distribution ● Model explicitly the joint probability distribution of data streams x as a product of element-wise conditional distributions for each element xi in the stream. ○ Example: An image x of size (n, n) is decomposed scanning pixels in raster mode (Row by row and pixel by pixel within every row) ○ Apply probability chain rule: xi is the i-th pixel in the image. 17
  • 18. Factorizing the joint distribution ● To model highly nonlinear and long-range correlations between pixels and their conditional distributions, we will need a powerful non-linear sequential model. ○ Q: How can we deal with this (which model)? 18
  • 19. Factorizing the joint distribution ● To model highly nonlinear and long-range correlations between pixels and their conditional distributions, we will need a powerful non-linear sequential model. ○ Q: How can we deal with this (which model)? ■ A: Recurrent Neural Network. 19
  • 20. PixelRNN An RNN predicts the probability of each sample xi with a categorical output distribution: Softmax RNN/LSTM xi-1 xi xi-2 xi-3 xi-4 (x, y) At every pixel, in a greyscale image: 256 classes (8 bits/pixel). 20 Recall RNN formulation: (van den Oord et al. 2016)
  • 21. Factorizing the joint distribution ● To model highly nonlinear and long-range correlations between pixels and their conditional distributions, we will need a powerful non-linear sequential model. ○ Q: How can we deal with this (which model)? ■ A: Recurrent Neural Network. ■ B: Convolutional Neural Network. 21(van den Oord et al. 2016)²
  • 22. Factorizing the joint distribution ● To model highly nonlinear and long-range correlations between pixels and their conditional distributions, we will need a powerful non-linear sequential model. ○ Q: How can we deal with this (which model)? ■ A: Recurrent Neural Network. ■ B: Convolutional Neural Network. A CNN? Whut?? 22
  • 23. My CNN is going causal... hi there how are you doing ? 100 dim Let’s say we have a sequence of 100 dimensional vectors describing words. sequence length = 8 arrangein 2D 100 dim sequence length = 8 , Focus in 1D to exemplify causalitaion 23
  • 24. 24 We can apply a 1D convolutional activation over the 2D matrix: for an arbitrary kernel of width=3 100 dim sequence length = 8 K1 Each 1D convolutional kernel is a 2D matrix of size (3, 100)100 dim My CNN is going causal...
  • 25. 25 Keep in mind we are working with 100 dimensions although here we depict just one for simplicity 1 2 3 4 5 6 7 8 w1 w2 w3 w1 w2 w3 w1 w2 w3 w1 w2 w3 w1 w2 w3 1 2 3 4 5 6 w1 w2 w3 The length result of the convolution is well known to be: seqlength - kwidth + 1 = 8 - 3 + 1 = 6 So the output matrix will be (6, 100) because there was no padding My CNN is going causal...
  • 26. 26 My CNN is going causal... When we add zero padding, we normally do so on both sides of the sequence (as in image padding) 1 2 3 4 5 6 7 8 w1 w2 w3 w1 w2 w3 w1 w2 w3 w1 w2 w3 w1 w2 w3 1 2 3 4 5 6 7 8 w1 w2 w3 The length result of the convolution is well known to be: seqlength - kwidth + 1 = 10 - 3 + 1 = 8 So the output matrix will be (8, 100) because we had padding 0 0 w1 w2 w3 w1 w2 w3
  • 27. 27 Add the zero padding just on the left side of the sequence, not symmetrically 1 2 3 4 5 6 7 8 w1 w2 w3 w1 w2 w3 w1 w2 w3 w1 w2 w3 w1 w2 w3 1 2 3 4 5 6 7 8 w1 w2 w3 The length result of the convolution is well known to be: seqlength - kwidth + 1 = 10 - 3 + 1 = 8 So the output matrix will be (8, 100) because we had padding HOWEVER: now every time-step t depends on the two previous inputs as well as the current time-step → every output is causal Roughly: We make a causal convolution by padding left the sequence with (kwidth - 1) zeros 0 w1 w2 w3 w1 w2 w3 0 My CNN went causal
  • 28. PixelCNN/Wavenet We can simply exemplify how causal convolutions help us with the SoTA model for audio generation Wavenet. We have a one dimensional stream of samples of length T: 28
  • 29. PixelCNN/Wavenet We can simply exemplify how causal convolutions help us with the SoTA model for audio generation Wavenet. We have a one dimensional stream of samples of length T (e.g. audio): receptive field hast to be T 29(van den Oord et al. 2016)³
  • 30. PixelCNN/Wavenet We can simply exemplify how causal convolutions help us with the SoTA model for audio generation Wavenet. We have a one dimensional stream of samples of length T: receptive field hast to be T Dilated convolutions help us reach a receptive field sufficiently large to emulate an RNN! 30 Samples available online
  • 31. Conditional PixelCNN/Wavenet ● Q: Can we make the generative model learn the natural distribution of X and perform a specific task conditioned on it? ○ A: Yes! we can condition each sample to an embedding/feature vector h, which can represent encoded text, or speaker identity, generation speed, etc. 31
  • 32. Conditional PixelCNN/Wavenet 32 ● PixelCNN produces very sharp and realistic samples, but... ● Q: Is this a cheap generative model computationally?
  • 33. Conditional PixelCNN/Wavenet ● PixelCNN produces very sharp and realistic samples, but... ● Q: Is this a cheap generative model computationally? ○ A: Nope, take the example of wavenet: Forward 16.000 times in auto-regressive way to predict 1 second of audio at standard 16 kHz sampling. 33
  • 41. 41 Normalizing Flows and the Effective Chained Composition
  • 42. 42 Normalizing Flows and the Effective Chained Composition (Mohamed and Rezende 2017)
  • 43. 43 Normalizing Flows and the Effective Chained Composition (Mohamed and Rezende 2017)
  • 44. 44 Normalizing Flows and the Effective Chained Composition (Mohamed and Rezende 2017)
  • 47. 47 Planar Flow We can apply this simple flow to an already seen model to enhance its representational power though... z We still sample N(0, I) !!
  • 48. 48 Planar Flow We can apply this simple flow to an already seen model to enhance its representational power though... z We still sample N(0, I) !! But if we had a bit more sophisticated flow formulation in some form, we could directly build an end-to-end generative model!
  • 54. Real-Non Volume Preserving Flows 54 Copy Affine coupling
  • 55. Real-NVP Permutations 55 Copy Affine coupling Affine coupling Copy Copy Affine coupling
  • 56. GLOW
  • 60. 60 GLOW step of flow formulations
  • 62. 62 GLOW Multi-scale structure This is a very powerful structure: we can infer z features from an image and we can sample an image from z features, EITHER WAY!
  • 63. 63 GLOW Multi-scale structure Additionally, during training loss is computed in the simplistic Z space!
  • 65. 65 GLOW Results with temperature
  • 69. 69 WaveGLOW (Prenger et al. 2018) Modified structure to deal with 1D audio data, giving SoTA results without auto-regressive restriction. Audio samples available online
  • 72. 72 Models comparison Latent space → feature disentangle, we can navigate and control better what we generate
  • 73. 73 Models comparison No direct likelihood measurement, just an approximation or qualitative insights.
  • 74. 74 Models comparison Direct likelihood values, direct objective “good fit” measurements.
  • 75. 75 Models comparison Very deep models (many flow steps), need much much memory and compute power to get best results.
  • 80. 80 Conclusions ● (Deep) Generative models are built to learn the underlying structures hidden in our (high-dimensional) data. ● There are currently four main successful deep generative models: VAE, GAN, pixelCNN and NF-based. ● PixelCNN factorizes an explicitly known discrete distribution with probability chain rule ○ They are the slowest generative models for their recursive nature. ● VAEs, GANs, and NFs are capable of factorizing our highly-complex PDFs into a simpler prior of our choice z, which we can then explore and control to generate specific features. ● Once we trained VAEs with variational lower bound, we can generate new samples from our learned z manifold assuming most significant data lays over the prior → hence we like powerful priors (NFs can help in this matter). ● GANs use an implicit learning method to mimic our data distribution Pdata. ● GANs and NFs are the sharpiest generators at the moment → very active research fields. ● GANs are the hardest generative models to train owing to intrinsic stabilities of the G-D synergies. ● NFs (Real-NVP based) require many steps to be effective in their task, consuming many resources at the moment.
  • 82. References 82 ● Glow: generative flow with invertible 1x1 convolutions (Kingma and Dhariwal, 2018) ● Density Estimation Using Real-NVP (Dinh et al. 2016) ● Variational Inference With Normalizing Flows (Rezende and Mohamed, 2015) ● Paper Review on Glow (Pascual 2018) ● Normalizing Flows (Kosiorek 2016) ● Normalizing Flows Tutorial 1 (Jang 2016) ● Normalizing Flows Tutorial 2 (Jang 2016)