Theory and Application
of Generative
Adversarial Networks
Ming-Yu Liu, Julie Bernauer, Jan Kautz

NVIDIA
1
Tutorial schedule
• 1:30 - 2:50 Part 1

• 2:50 - 3:20 Demo

• 3:30 - 4:00 Coffee Break

• 4:00 - 5:30 Part 2
2
Before we start
3
• Due to the exponential
growth of the field, we will
inevitably miss several
important GAN works.

• The GAN Zoo

https://deephunt.in/the-gan-zoo-79597dc8c347
Image from Deep Hunt, blog by Avinash Hindupur
Outlines
1. Introduction

2. GAN objective

3. GAN training

4. Joint image distribution and video distribution

5. Computer vision applications
4
1. Introduction
5
1. Introduction
1. Generative modeling

2. Gaussian mixture models

3. Manifold assumption

4. Variational autoencoders

5. Autoregressive models

6. Generative adversarial networks

7. Taxonomy of generative models
6
Generative modeling
• Density estimation
7
• Sample generation
Images & slide credit Goodfellow 2016
Step 1: observe a set of samples
Example: Gaussian mixture models
8
p(x | \theta) = \sum_i \pi_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i)

Step 2: assume a GMM model

Step 3: perform maximum likelihood learning

\max_\theta \; \sum_{x^{(j)} \in \text{Dataset}} \log p(x^{(j)} \mid \theta)
Example: Gaussian mixture models
9
p(x | \theta) = \sum_i \pi_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i)
1. [Density Estimation]
2. [Sample Generation]
Good for low-dimensional data
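The three steps above can be checked numerically. A minimal pure-Python sketch (the toy data and the mixture parameters are made up for illustration): the log-likelihood that maximum likelihood learning maximizes is higher for a mixture whose components sit on the data's modes.

```python
import math

def gmm_logpdf(x, weights, means, stds):
    """Log-density of a 1-D Gaussian mixture: log sum_i pi_i N(x | mu_i, sigma_i^2)."""
    dens = sum(w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
               for w, m, s in zip(weights, means, stds))
    return math.log(dens)

def log_likelihood(data, weights, means, stds):
    """Maximum-likelihood learning maximizes this quantity over the parameters."""
    return sum(gmm_logpdf(x, weights, means, stds) for x in data)

# Toy dataset drawn near two modes; a well-placed two-component mixture scores higher
data = [-2.1, -1.9, -2.0, 2.0, 1.9, 2.1]
good = log_likelihood(data, [0.5, 0.5], [-2.0, 2.0], [0.5, 0.5])
bad = log_likelihood(data, [0.5, 0.5], [0.0, 0.0], [0.5, 0.5])
print(good > bad)  # the model whose components cover the data modes wins
```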
Manifold assumption
• Data lie approximately on a manifold of much lower
dimension than the input space.
10
Latent space Data space
Images credit Ward, A. D. et al 2007
Manifold examples
11
MNIST
Slide credit, Courville 2017
Generative modeling
via autoencoders and GMM
• Two-stage process

1. Apply an autoencoder to embed
images in a low-dimensional space.

2. Fit a GMM to the embeddings.

• Drawbacks

• Not end-to-end

• The latent space can still be high-dimensional.

• Tends to memorize samples.
12
Encoder
Network E
Decoder
Network: D
Input image
Latent code
Reconstructed image
Vincent et al. JMLR 2011
\min_{E, D} \; \sum_{x^{(j)}} \| x^{(j)} - D(E(x^{(j)})) \|^2

Latent code: E(x^{(j)})
Variational autoencoders
• Derive from a variational framework

• Constrain the encoder to output a conditional
Gaussian distribution 

• The decoder reconstructs inputs from samples
from the conditional distribution
13
• Kingma and Welling, “Auto-encoding variational
Bayes," ICLR 2014

• Rezende, Mohamed, and Wierstra, “Stochastic
back-propagation and variational inference in deep
latent Gaussian models,” ICML 2014
E(x^{(j)}) = q_\theta(z \mid x^{(j)}) = \mathcal{N}\big(z \mid \mu(x^{(j)}), \Sigma(x^{(j)})\big)

D(z^{(j)}) = p_\theta\big(x^{(j)} \mid z^{(j)} \sim q_\theta(z \mid x^{(j)})\big)
Encoder
Network E
Decoder
Network: D
Input image
Latent code
Reconstructed image
Maximizing a variational lower bound
14
Encoder
Network E
Decoder
Network: D
Input image
Latent code
Reconstructed image

Ideally, we would like to maximize

\max_\theta \; L(\theta \mid \text{Dataset}) \equiv \sum_{x^{(j)}} \log p(x^{(j)} \mid \theta)

But we maximize a lower bound for tractability:

\max_\theta \; L_V(\theta \mid \text{Dataset}) \equiv \sum_{x^{(j)}} -\mathrm{KL}\big(q_\theta(z \mid x^{(j)}) \,\big\|\, \mathcal{N}(z \mid 0, I)\big) + \mathbb{E}_{z \sim q_\theta(z \mid x^{(j)})}\big[\log p_\theta(x^{(j)} \mid z)\big] \;\le\; L(\theta \mid \text{Dataset})
Reparameterization trick
15
• Make sampling a differentiable operation

• The variational distribution

q_\theta(z \mid x) = \mathcal{N}\big(z \mid \mu(x), \Sigma(x)\big)

• Sampling

z = \mu(x) + \Sigma(x)^{1/2} \epsilon, \quad \text{where } \epsilon \sim \mathcal{N}(0, I)
Image credit, Courville 2017
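A tiny numeric sketch of the trick (pure Python, scalar case; the values of mu, sigma, and eps are made up): with eps held fixed, z = mu + sigma * eps is a deterministic, differentiable function of the parameters, so gradients can flow through the sampling step.

```python
import random

def reparameterized_sample(mu, sigma, eps=None, rng=random):
    # z = mu + sigma * eps, eps ~ N(0, 1): sampling as a differentiable operation
    if eps is None:
        eps = rng.gauss(0.0, 1.0)  # all randomness is isolated in eps
    return mu + sigma * eps

eps = 0.7  # freeze the noise
z1 = reparameterized_sample(mu=1.0, sigma=2.0, eps=eps)
z2 = reparameterized_sample(mu=1.0 + 1e-6, sigma=2.0, eps=eps)
# finite-difference derivative dz/dmu with eps held fixed is (numerically) 1
print((z2 - z1) / 1e-6)
```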
Samples from variational
autoencoders
16
LFW dataset ImageNet dataset
Slide credit, Courville 2017
Why are generated samples blurry?
• Pixel values are modeled as a conditional Gaussian, which
leads to Euclidean loss minimization.

• Regression-to-the-mean problem, rendering blurry images.

• Difficult to hand-craft a good perceptual loss function.
17
\log p_\theta(x^{(j)} \mid z) = -\| x^{(j)} - D(z) \|^2 + \text{const}
Pitfall of Euclidean distance
for image modeling
• The blue curve plots the Euclidean distance between a reference
image and its horizontal translations.

• The red curve plots the Euclidean distance between the two images
shown in the figure (omitted here).
18
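The regression-to-the-mean effect can be reproduced with toy "images" (pure Python; the 1-D signals are made up): the per-pixel average of two shifted signals achieves a lower expected squared error against them than either sharp hypothesis does, even though the average is blurrier.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# A sharp 1-D "edge" and its one-pixel translation
img = [0, 0, 1, 1, 0, 0]
shifted = [0, 0, 0, 1, 1, 0]
mean_img = [(a + b) / 2 for a, b in zip(img, shifted)]  # the blurry average

# Expected Euclidean loss against the two equally likely targets
err_sharp = (sq_dist(img, img) + sq_dist(img, shifted)) / 2   # guess the sharp "img"
err_mean = (sq_dist(mean_img, img) + sq_dist(mean_img, shifted)) / 2
print(err_mean < err_sharp)  # Euclidean loss prefers the blurry mean
```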
Autoregressive generative models
19
Slide credit, Courville 2017
PixelCNNs
• Oord, Aaron van den, et al. 2016

• Use masked convolution to enforce the autoregressive
relationship.

• Minimize a cross-entropy loss instead of a Euclidean loss.
20
Slide credit, Courville 2017
PixelCNNs
21
Slide credit, Courville 2017
Example results from PixelCNNs
22
Slide credit, Courville 2017
Parallel multiscale autoregressive
density estimation
23
• Reed et al. ICML 2017

• Can we speed up the generation time of PixelCNNs?

• Yes, via multiscale generation
Slide credit, Courville 2017
Parallel multiscale autoregressive
density estimation
24
Slide credit, Courville 2017
• Also seems to help provide better global structure.
Generative adversarial networks
25
Discriminator
Network: D
Input image
1
Generator
Network G
Discriminator
Network: D
Random vector
Generated image
0 • Goodfellow et al. NIPS
2014

• Forget about designing a
perceptual loss. Let’s train
a discriminator to
differentiate real and fake
images.
Deep convolutional generative
adversarial networks (DCGANs)
26
By Radford et al. ICLR 2016
Results from DCGANs
27
Z space interpolation
Vector space arithmetic
28
Slide credit, Goodfellow 2016
GANs vs VAEs
29
z
x
G
GANs
z
x
?
z
x
D
VAEs
z
x
E
Taxonomy of generative models
30
Maximum
likelihood
Explicit density
Tractable
density
Approximate
density
Implicit density
GANs
VAEs
PixelCNNs
Slide credit Goodfellow 2016
Quick recaps
• Generative modeling

• Manifold assumption

• VAEs, PixelCNNs, GANs

• Taxonomy
31
Maximum
likelihood
Explicit
density
Tractable
density
Approximate
density
Implicit
density
GANs
VAEs
PixelCNNs
Outlines
1. Introduction

2. GAN objective
3. GAN training

4. Joint image distribution and video distribution

5. Computer vision applications
32
2. GAN objective
33
2. GAN objective
1. GAN objective

2. Discriminator strategy

3. GAN theory

4. Effect of limited discriminator capacity
34
GAN objectives
35
\min_G \max_D \; \mathbb{E}_{x \sim p_X}[\log D(x)] + \mathbb{E}_{z \sim p_Z}[\log(1 - D(G(z)))]
Solving a minimax problem
pX : Data distribution,
usually represented by samples.
pG(Z) : Model distribution, where
Z is usually modeled as uniform or Gaussian.
Alternating gradient updates
• Step 1: Fix G and perform a gradient step to

\max_D \; \mathbb{E}_{x \sim p_X}[\log D(x)] + \mathbb{E}_{z \sim p_Z}[\log(1 - D(G(z)))]

• Step 2: Fix D and perform a gradient step to

(in theory)    \min_G \; \mathbb{E}_{z \sim p_Z}[\log(1 - D(G(z)))]

(in practice)  \max_G \; \mathbb{E}_{z \sim p_Z}[\log D(G(z))]

36

\log(1 - D(G(z))) is unbounded below, which is bad for minimization;
\log D(G(z)) is bounded above, which is better behaved for maximization.
Discriminator strategy
• Optimal (non-parametric) discriminator
D^*(x) = \frac{p_X(x)}{p_X(x) + p_{G(Z)}(x)}
37
Slide credit Goodfellow 2016
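The optimal discriminator can be evaluated directly when the densities are known. A sketch with two 1-D Gaussians standing in for p_X and p_{G(Z)} (their parameters are made up for illustration):

```python
import math

def gauss(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def d_star(x, p_data, p_model):
    # Optimal non-parametric discriminator: p_X(x) / (p_X(x) + p_G(x))
    return p_data(x) / (p_data(x) + p_model(x))

p_x = lambda x: gauss(x, 0.0, 1.0)   # data distribution
p_g = lambda x: gauss(x, 2.0, 1.0)   # model distribution

# At the midpoint the two densities are equal, so D* outputs exactly 0.5
print(round(d_star(1.0, p_x, p_g), 6))  # 0.5
```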
Learning a Gaussian mixture model
using GANs
• Red points: true samples;
samples from data
distribution

• Blue points: fake samples;
samples from model
distribution

• Z-space: a Gaussian
distribution of N(0,I)
• The model first learns 1
mode, then 2 modes, and
finally 3 modes.
38
2D GMM learning
GAN theory
• Optimal (non-parametric) discriminator

D^*(x) = \frac{p_X(x)}{p_X(x) + p_{G(Z)}(x)}

• Under an ideal discriminator, the generator minimizes the
Jensen-Shannon divergence between p_X and p_{G(Z)}. This also
requires that D and G have sufficient capacity and a sufficiently
large dataset.

39

\mathrm{JS}(p_X \,\|\, p_{G(Z)}) = \tfrac{1}{2}\,\mathrm{KL}\Big(p_X \,\Big\|\, \frac{p_X + p_{G(Z)}}{2}\Big) + \tfrac{1}{2}\,\mathrm{KL}\Big(p_{G(Z)} \,\Big\|\, \frac{p_X + p_{G(Z)}}{2}\Big)

\mathrm{KL}(p_X \,\|\, p_{G(Z)}) = \mathbb{E}_{p_X}\Big[\log \frac{p_X(x)}{p_{G(Z)}(x)}\Big]
For a discriminator with capacity n, the generator only needs
to remember C \frac{n}{\epsilon^2} \log n samples to be \epsilon
away from the optimal objective.

In practice

• Dataset sizes are limited.

• Network capacity is finite. (Capacity here means the number of trainable parameters.)

• (Sanjeev Arora et al. ICML 2017)

40

Consequences:

1. The generator does not need to generalize.

2. Larger training datasets have limited utility.
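The Jensen-Shannon divergence that the ideal generator minimizes can be computed exactly for discrete distributions. A minimal sketch (pure Python; the example distributions are made up):

```python
import math

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    # JS(p || q) = 0.5 KL(p || m) + 0.5 KL(q || m), with m = (p + q) / 2
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]
print(js(p, p))            # identical distributions: divergence 0.0
print(js(p, q) == js(q, p))  # symmetric, unlike KL
```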
Effect of limited discriminator capacity
41
CIFAR10
dataset
Image credit Arora and Zhang 2017
Quick Recaps
• Discriminator strategy

• GAN theory

• Effect of limited
discriminator capacity
42
D^*(x) = \frac{p_X(x)}{p_X(x) + p_{G(Z)}(x)}
Outlines
1. Introduction

2. GAN objective

3. GAN training
4. Joint image distribution and video distribution

5. Computer vision applications
43
3. GAN Training
44
3. GAN Training
1. Non-convergence in GANs

2. Mode collapse

3. Lack of global structure

4. Tricks

5. New objective functions

6. Surrogate objective functions

7. Network architectures
45
Non-convergence in GANs
• GAN training is theoretically guaranteed to converge if we
can modify the density functions directly, but 

• Instead, we modify G (sample generation function)

• We represent G and D as highly non-convex parametric
functions.

• “Oscillation”: can train for a very long time, generating
very many different categories of samples, without clearly
generating better samples.

• Mode collapse: most severe form of non-convergence
46
Slide credit Goodfellow 2016
Mode collapse
47
Slide credit Goodfellow 2016
Figure credit, Metz et al, 2016
Mode collapse
48
• Same data dimension.

• Performance is correlated
with the number of modes.

• Performance is correlated
with the manifold dimension.
Mode collapse
49
• Same number of
modes.

• Performance is correlated
with the manifold dimension.

• Performance is uncorrelated
with the data dimension.
Problem with global structure
50
Slide credit Goodfellow 2016
Problem with counting
51
Slide credit Goodfellow 2016
Improve GAN training
52
Tricks
• Label smoothing
• Historical batches
• …
Surrogate or auxiliary objective
• UnrolledGAN
• WGAN-GP
• DRAGAN
• …
New objectives
• EBGAN
• LSGAN
• WGAN
• BEGAN
• fGAN
• …
Network architecture
• LAPGAN
• …
Improve GAN training
53
Tricks - one sided label smoothing
• Does not reduce classification accuracy, only confidence.

• Prevents discriminator from giving very large gradient
signal to generator.

• Prevents extrapolating to encourage extreme samples.

• Two-sided label smoothing: https://github.com/soumith/ganhacks
54
\mathbb{E}_{x \sim p_X}[0.9 \log D(x)] + \mathbb{E}_{z \sim p_Z}[\log(1 - D(G(z)))]
T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen,
“Improved techniques for training GANs”, NIPS 2016
Tricks - historical generator batches
Helps stabilize the discriminator
training in its early stages.
55
Discriminator
Network: D
Input images
@ t iteration
1
Generator
Network G
Discriminator
Network: D
Random vector
Generated
images
@ t
0
Discriminator
Network: D
Generated
images
@ t-1
0
Discriminator
Network: D
Generated
images
@ t-K
0
A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, R. Webb, “Learning from
Simulated and Unsupervised Images through Adversarial Training”, CVPR 2017
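The historical-batches idea can be sketched as a small replay buffer (pure Python; the buffer capacity and random-replacement policy are illustrative assumptions in the spirit of Shrivastava et al., not the paper's exact procedure):

```python
import random

class ImageHistory:
    """Keep a buffer of past generated samples; feed the discriminator a mix of
    current and historical fakes so it does not forget earlier generator behavior."""
    def __init__(self, capacity, rng=None):
        self.capacity = capacity
        self.buffer = []
        self.rng = rng or random.Random(0)

    def mix(self, batch):
        half = len(batch) // 2
        old = [self.rng.choice(self.buffer) for _ in range(half)] if self.buffer else []
        for sample in batch:  # store the current fakes, evicting randomly when full
            if len(self.buffer) < self.capacity:
                self.buffer.append(sample)
            else:
                self.buffer[self.rng.randrange(self.capacity)] = sample
        return batch[: len(batch) - len(old)] + old

history = ImageHistory(capacity=4)
first = history.mix(["g0_a", "g0_b"])   # nothing buffered yet: all current fakes
second = history.mix(["g1_a", "g1_b"])  # half current, half from iteration 0
print(first, second)
```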
Other tricks
• https://github.com/soumith/ganhacks

• Use labels

• Normalize the inputs to [-1,1]

• Use TanH

• Use BatchNorm

• Avoid sparse gradients: ReLU, MaxPool

• … 

• Salimans et al. NIPS 2016

• Use Virtual BatchNorm

• MiniBatch Discrimination

• …
56
Improve GAN training
57
EBGAN
58
Junbo Zhao, Michael Mathieu, Yann LeCun, “Energy-based generative adversarial
networks,” ICLR 2017
Discriminator
Network: D
Input image
1
Generator
Network G
Discriminator
Network: D
Random vector
Generated image
0
Energy
Function
Input image
Low energy
value
Generator
Network G
Energy
Function
Random vector
Generated image
High energy
value
Energy function
can be modeled
by an auto-
encoder.
EBGAN
59
Junbo Zhao, Michael Mathieu, Yann LeCun, “Energy-based generative adversarial
networks,” ICLR 2017
\mathrm{En}(x) = \| \mathrm{Decoder}(\mathrm{Encoder}(x)) - x \|

Discriminator objective

\min_{\mathrm{En}} \; \mathbb{E}_{x \sim p_X}[\mathrm{En}(x)] + \mathbb{E}_{z \sim p_Z}\big[ [m - \mathrm{En}(G(z))]^+ \big]

Generator objective

\min_G \; \mathbb{E}_{z \sim p_Z}[\mathrm{En}(G(z))]
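The margin term [m - En(G(z))]^+ is a hinge: the discriminator only pushes up the energy of fake samples whose energy is below the margin m. A scalar sketch (pure Python; the energy values and margin are made up):

```python
def hinge(margin, energy):
    # [m - En(G(z))]^+ = max(0, m - energy)
    return max(0.0, margin - energy)

def ebgan_d_loss(real_energy, fake_energy, margin):
    # min_En  E[En(x)] + E[[m - En(G(z))]^+], scalar version
    return real_energy + hinge(margin, fake_energy)

print(ebgan_d_loss(real_energy=0.2, fake_energy=0.5, margin=1.0))  # fake below margin: penalized
print(ebgan_d_loss(real_energy=0.2, fake_energy=3.0, margin=1.0))  # fake above margin: no penalty
```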
EBGAN
60
Junbo Zhao, Michael Mathieu, Yann LeCun, “Energy-based generative adversarial
networks,” ICLR 2017
LSGAN
X. Mao, Q. Li, H. Xie, R. Lau, Z. Wang, “Least squares generative adversarial
networks” 2016
Still uses a classifier, but replaces the cross-entropy loss with a Euclidean loss.

Discriminator

GAN:    \min_D \; \mathbb{E}_{x \sim p_X}[-\log D(x)] + \mathbb{E}_{z \sim p_Z}[-\log(1 - D(G(z)))]

LSGAN:  \min_D \; \mathbb{E}_{x \sim p_X}[(D(x) - 1)^2] + \mathbb{E}_{z \sim p_Z}[D(G(z))^2]

Generator

GAN:    \min_G \; \mathbb{E}_{z \sim p_Z}[-\log D(G(z))]

LSGAN:  \min_G \; \mathbb{E}_{z \sim p_Z}[(D(G(z)) - 1)^2]

61
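The least-squares objectives can be checked pointwise (pure Python): the discriminator term is minimized at D(x) = 1 for real and D(G(z)) = 0 for fake, and, unlike the saturating log loss, a confidently rejected fake still gives the generator a strong quadratic penalty.

```python
def lsgan_d_loss(d_real, d_fake):
    # min_D  E[(D(x) - 1)^2] + E[D(G(z))^2], scalar version
    return (d_real - 1.0) ** 2 + d_fake ** 2

def lsgan_g_loss(d_fake):
    # min_G  E[(D(G(z)) - 1)^2], scalar version
    return (d_fake - 1.0) ** 2

print(lsgan_d_loss(1.0, 0.0))  # 0.0: the ideal discriminator attains zero loss
print(lsgan_g_loss(0.0))       # 1.0: a rejected fake still yields a large generator loss
```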
LSGAN
X. Mao, Q. Li, H. Xie, R. Lau, Z. Wang, “Least squares generative adversarial
networks” 2016
GAN: minimize the Jensen-Shannon divergence between p_X and p_{G(Z)}

\mathrm{JS}(p_X \,\|\, p_{G(Z)}) = \tfrac{1}{2}\,\mathrm{KL}\Big(p_X \,\Big\|\, \frac{p_X + p_{G(Z)}}{2}\Big) + \tfrac{1}{2}\,\mathrm{KL}\Big(p_{G(Z)} \,\Big\|\, \frac{p_X + p_{G(Z)}}{2}\Big)

LSGAN: minimize the Pearson \chi^2 divergence between p_X + p_{G(Z)} and 2 p_{G(Z)}

\chi^2(p_X + p_{G(Z)} \,\|\, 2 p_{G(Z)}) = \int_X \frac{\big(2 p_{G(Z)}(x) - (p_{G(Z)}(x) + p_X(x))\big)^2}{p_{G(Z)}(x) + p_X(x)} \, dx

62
63
LSGAN
X. Mao, Q. Li, H. Xie, R. Lau, Z. Wang, “Least squares generative adversarial
networks” 2016
Wasserstein GAN
Replace the classifier with a critic function.

64

M. Arjovsky, S. Chintala, L. Bottou, “Wasserstein GAN”, 2016

Discriminator

GAN:   \max_D \; \mathbb{E}_{x \sim p_X}[\log D(x)] + \mathbb{E}_{z \sim p_Z}[\log(1 - D(G(z)))]

WGAN:  \max_D \; \mathbb{E}_{x \sim p_X}[D(x)] - \mathbb{E}_{z \sim p_Z}[D(G(z))]

Generator

GAN:   \max_G \; \mathbb{E}_{z \sim p_Z}[\log D(G(z))]

WGAN:  \max_G \; \mathbb{E}_{z \sim p_Z}[D(G(z))]
Wasserstein GAN
M. Arjovsky, S. Chintala, L. Bottou “Wasserstein GAN” 2016
GAN: minimize the Jensen-Shannon divergence between p_X and p_{G(Z)}

\mathrm{JS}(p_X \,\|\, p_{G(Z)}) = \tfrac{1}{2}\,\mathrm{KL}\Big(p_X \,\Big\|\, \frac{p_X + p_{G(Z)}}{2}\Big) + \tfrac{1}{2}\,\mathrm{KL}\Big(p_{G(Z)} \,\Big\|\, \frac{p_X + p_{G(Z)}}{2}\Big)

WGAN: minimize the Earth Mover distance between p_X and p_{G(Z)}

\mathrm{EM}(p_X, p_{G(Z)}) = \inf_{\gamma \in \Pi(p_X, p_{G(Z)})} \mathbb{E}_{(x, y) \sim \gamma}[\| x - y \|]
65
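In the original WGAN, the critic is kept approximately Lipschitz by clipping its weights after every update. A minimal sketch of that projection step (pure Python; the clip range c = 0.01 is the paper's default, the weight values are made up):

```python
def clip_weights(weights, c=0.01):
    # Project every critic parameter back into [-c, c] after a gradient step
    return [max(-c, min(c, w)) for w in weights]

updated = [0.5, -0.002, -0.3, 0.009]   # made-up post-update critic weights
clipped = clip_weights(updated)
print(clipped)  # [0.01, -0.002, -0.01, 0.009]
```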
66
Wasserstein GAN
M. Arjovsky, S. Chintala, L. Bottou “Wasserstein GAN” 2016
GAN vs WGAN
67
Slide credit, Courville 2017
GAN vs WGAN
68
Slide credit, Courville 2017

Jensen-Shannon divergence in this example:

\mathrm{JS}(p_X \,\|\, p_{G(Z)}) = \tfrac{1}{2}\,\mathrm{KL}\Big(p_X \,\Big\|\, \frac{p_X + p_{G(Z)}}{2}\Big) + \tfrac{1}{2}\,\mathrm{KL}\Big(p_{G(Z)} \,\Big\|\, \frac{p_X + p_{G(Z)}}{2}\Big)

GAN vs WGAN
69
Slide credit, Courville 2017

Earth Mover distance in this example:

\mathrm{EM}(p_X, p_{G(Z)}) = \inf_{\gamma \in \Pi(p_X, p_{G(Z)})} \mathbb{E}_{(x, y) \sim \gamma}[\| x - y \|] = |\theta|
GAN vs WGAN
GAN WGAN
• If we could directly change the density's shape parameter,
the Earth Mover distance would give a smoother training signal.

• But we do not change the density's shape parameter
directly; we change the generation function.
Boundary Equilibrium GAN (BEGAN)
71
David Berthelot, Thomas Schumm, Luke Metz, “Boundary equilibrium generative
adversarial networks” 2017
Energy
Function
Input image
Low energy
value
Generator
Network G
Energy
Function
Random vector
Generated image
High energy
value
• Use EBGAN autoencoder-based
discriminator.

• Not based on minimizing distance
between pX and pG(Z)

• Based on minimizing distance
between pEn(X) and pEn(G(Z)) 

• The distance is a lower bound on
the Wasserstein 1 (Earth Mover)
distance.
72
David Berthelot, Thomas Schumm, Luke Metz, “Boundary equilibrium generative
adversarial networks” 2017
Boundary Equilibrium GAN (BEGAN)
WGAN:   \mathrm{EM}(p_X, p_{G(Z)}) = \inf_{\gamma \in \Pi(p_X, p_{G(Z)})} \mathbb{E}_{(x, y) \sim \gamma}[\| x - y \|]

BEGAN:  \mathrm{EM}(p_{\mathrm{En}(X)}, p_{\mathrm{En}(G(Z))}) = \inf_{\gamma \in \Pi(p_{\mathrm{En}(X)}, p_{\mathrm{En}(G(Z))})} \mathbb{E}_{(x_1, x_2) \sim \gamma}[| x_1 - x_2 |]

But we can lower-bound it:

\inf \mathbb{E}_{(x_1, x_2) \sim \gamma}[| x_1 - x_2 |] \;\ge\; \inf \big| \mathbb{E}_{(x_1, x_2) \sim \gamma}[x_1 - x_2] \big| = \big| \mathrm{mean}(x_1) - \mathrm{mean}(x_2) \big|

Final objective function

\max_G \min_{\mathrm{En}} \; \mathbb{E}_{x \sim p_X}[\mathrm{En}(x)] - \mathbb{E}_{z \sim p_Z}[\mathrm{En}(G(z))]

73
Boundary Equilibrium GAN (BEGAN)
David Berthelot, Thomas Schumm, Luke Metz, “Boundary equilibrium generative
adversarial networks” 2017
Discriminator objective:  \min_{\mathrm{En}} \; \mathrm{En}(x) - k_t \, \mathrm{En}(G(z))

Generator objective:      \min_G \; \mathrm{En}(G(z))

Equilibrium update:       k_{t+1} = k_t + \lambda_k \big( \gamma \, \mathrm{En}(x) - \mathrm{En}(G(z)) \big)

Hyperparameter:           \gamma = \frac{\mathbb{E}[\mathrm{En}(G(z))]}{\mathbb{E}[\mathrm{En}(x)]}, a value between 0 and 1

74
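The equilibrium update for k_t can be simulated directly (pure Python; the values of lambda_k, gamma, the clamp to [0, 1], and the energies are illustrative):

```python
def began_k_update(k, real_energy, fake_energy, gamma=0.5, lambda_k=0.001):
    # k_{t+1} = k_t + lambda_k * (gamma * En(x) - En(G(z))), clamped to [0, 1]
    k = k + lambda_k * (gamma * real_energy - fake_energy)
    return min(1.0, max(0.0, k))

k = 0.0
k = began_k_update(k, real_energy=1.0, fake_energy=0.1)  # gamma*En(x) > En(G(z)): k grows
print(k)
k = began_k_update(k, real_energy=1.0, fake_energy=2.0)  # gamma*En(x) < En(G(z)): k shrinks (clamped)
print(k)
```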
Boundary Equilibrium GAN (BEGAN)
David Berthelot, Thomas Schumm, Luke Metz, “Boundary equilibrium generative
adversarial networks” 2017
Improve GAN training
75
UnrolledGAN
• Consider several future updates of the discriminator when
updating the generator.

• Update the discriminator in the usual way.

• Discourages the generator from generating samples from only a few modes.
76
L. Metz, B. Poole, D. Pfau, J. Sohl-Dickstein, “Unrolled generative adversarial
networks” ICLR 2017
UnrolledGAN
77

GAN surrogate objective:          \max_G \; \mathbb{E}_{z \sim p_Z}[\log D(G(z))]

UnrolledGAN surrogate objective:  \max_G \; \mathbb{E}_{z \sim p_Z}[\log D_K(G(z))]

• D_K is a function of both D and G: the discriminator after K further gradient updates.

• Considers several future updates of the discriminator when
updating the generator.
WGAN-GP
• y: imaginary samples, interpolated between real and generated samples
78

I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, A. Courville, “Improved Training of
Wasserstein GANs”, 2017

y = u \, x + (1 - u) \, G(z)

\min_G \max_D \; \mathbb{E}_{x \sim p_X}[D(x)] - \mathbb{E}_{z \sim p_Z}[D(G(z))] - \lambda \, \mathbb{E}_{y \sim p_Y}\big[ (\| \nabla_y D(y) \|_2 - 1)^2 \big]

The optimal critic has unit gradient norm almost everywhere.
DRAGAN
79
N. Kodali, J. Abernethy, J. Hays, Z. Kira, “How to train your DRAGAN”, 2017

y = \alpha x + (1 - \alpha)(x + \delta) \qquad y: imaginary samples around true samples

\min_G \max_D \; \mathbb{E}_{x \sim p_X}[\log D(x)] + \mathbb{E}_{z \sim p_Z}[\log(1 - D(G(z)))] - \lambda \, \mathbb{E}_{y \sim p_Y}\big[ (\| \nabla_y D(y) \|_2 - 1)^2 \big]
Improve GAN training
80
LAPGAN
81
E. Denton, S. Chintala, A. Szlam, R. Fergus “Deep Generative Image Models
using a Laplacian Pyramid of Adversarial Networks” NIPS 2015
Testing
LAPGAN
82
E. Denton, S. Chintala, A. Szlam, R. Fergus “Deep Generative Image Models
using a Laplacian Pyramid of Adversarial Networks” NIPS 2015
Training of each resolution is performed independently of the others.
Evaluation
• Popular metrics for GAN evaluation:

• Inception score

• Train an accurate classifier.

• Train a conditional image generation model.

• Check how accurately the classifier recognizes
the generated images.

• Human evaluation

• Generative models are difficult to evaluate (Theis et al.
ICLR 2016).

• Each evaluation metric can be hacked.

• Still an open problem.

83
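The classifier-based idea behind the Inception score can be sketched for discrete label distributions (pure Python; the toy conditionals are made up): confident and diverse conditionals p(y|x) relative to the marginal p(y) yield a high score.

```python
import math

def inception_score(conditionals):
    # exp( E_x [ KL( p(y|x) || p(y) ) ] ), with p(y) the average of the conditionals
    n = len(conditionals)
    marginal = [sum(c[i] for c in conditionals) / n for i in range(len(conditionals[0]))]
    kls = [sum(p * math.log(p / q) for p, q in zip(c, marginal) if p > 0)
           for c in conditionals]
    return math.exp(sum(kls) / n)

confident_diverse = [[1.0, 0.0], [0.0, 1.0]]   # sharp predictions covering both classes
uncertain = [[0.5, 0.5], [0.5, 0.5]]           # classifier cannot tell what was generated
print(inception_score(confident_diverse))  # approaches the number of classes
print(inception_score(uncertain))          # 1.0, the lowest possible score
```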
Outlines
1. Introduction

2. GAN objective

3. GAN training

4. Joint image distribution and video distribution
5. Computer vision applications
84
4. Joint image distribution and video
distribution
85
4. Joint image distribution and video
distribution
1. Joint image distribution learning

2. Video distribution learning
86
Joint distribution of multi-domain
images
87

[Figure: a scene captured by a camera in multiple modalities: a color image, a depth image (near to far), and a thermal image (cool to hot).]

• p(x_1, x_2, \ldots, x_K): where the x_i are images of the scene in different modalities.

• Ex. p(x_{\text{color}}, x_{\text{depth}}, x_{\text{thermal}})

• In this presentation, “modality” = “domain”.
Joint distribution of multi-domain
images
88

• Define a domain by an attribute.

• Multi-domain images are views of an object with different attributes.

• p(x_1, x_2, \ldots, x_K): where the x_i are images of the object with different attributes.

Non-smiling / Smiling    Young / Senior    Non-beard / Beard
Summer / Winter    Images / Hand-drawings    Font #1 / Font #2
Extending GAN for joint image
distribution learning
89
Discriminator
Network: D
Input image
1
Generator
Network G
Discriminator
Network: D
Random vector
Generated image
0
What if we do not have paired images
from different domains for learning?
90
CoGAN: Coupled Generative
Adversarial Networks
91
Ming-Yu Liu, Oncel Tuzel “Coupled Generative Adversarial Networks” NIPS 2016
Learning a joint distribution of images using samples from marginal distributions
Discriminator
Network 2: D2
Input image
1
Generator
Network 1: G1
Discriminator
Network 1: D1
Generated image
0
Discriminator
Network 1: D1
1
Input image
Generator
Network 2: G2
Discriminator
Network 2: D2
Generated image
0
Applying
weight-
sharing to
high-level
layer
Z
92
Video GANs
101
C. Vondrick, H. Pirsiavash, A. Torralba “Generating videos with scene
dynamics” NIPS 2016
102
MoCoGANs
103
S. Tulyakov, M. Liu, X. Yang, J. Kautz “MoCoGAN: Decomposing Motion and
Content for Video Generation” 2017
104
Sampled content: z_C        Motion trajectory: z_M^{(1)}, z_M^{(2)}, z_M^{(3)}, z_M^{(4)}, z_M^{(5)}

Z_C = [z_C, z_C, \ldots, z_C]

Z_M = [z_M^{(1)}, z_M^{(2)}, \ldots, z_M^{(K)}]
MoCoGANs
S. Tulyakov, M. Liu, X. Yang, J. Kautz “MoCoGAN: Decomposing Motion and
Content for Video Generation” 2017
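The decomposition can be sketched directly (pure Python; the dimensions and the Gaussian random-walk stand-in for the learned recurrent network R_M are illustrative assumptions): each frame's latent code concatenates a fixed content part with a time-varying motion part.

```python
import random

def sample_video_latents(num_frames, content_dim, motion_dim, rng=None):
    """Return the per-frame latent codes [z_C, z_M^(t)]: content fixed, motion varies."""
    rng = rng or random.Random(0)
    z_c = [rng.gauss(0, 1) for _ in range(content_dim)]  # sampled once per video
    z_m = [0.0] * motion_dim
    frames = []
    for _ in range(num_frames):
        # stand-in for the recurrent motion network R_M: a Gaussian random walk
        z_m = [m + rng.gauss(0, 1) for m in z_m]
        frames.append(z_c + z_m)
    return frames

latents = sample_video_latents(num_frames=5, content_dim=3, motion_dim=2)
content_parts = [frame[:3] for frame in latents]
print(all(c == content_parts[0] for c in content_parts))  # content constant across frames
```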
MoCoGANs
105
[Architecture: a recurrent motion network R_M (initial state h^{(0)}) maps noise \epsilon^{(1)}, \ldots, \epsilon^{(K)} to motion codes z_M^{(1)}, \ldots, z_M^{(K)}; the image generator G_I maps each [z_C, z_M^{(k)}] to a frame \tilde{x}^{(k)}, forming the video \tilde{v}; an image discriminator D_I scores random single frames (sampler S_1) and a video discriminator D_V scores random clips (sampler S_T) against real videos v.]
S. Tulyakov, M. Liu, X. Yang, J. Kautz “MoCoGAN: Decomposing Motion and
Content for Video Generation” 2017
106
Training:

\min_{D_I} \; \mathbb{E}_{v \sim p_V}[-\log D_I(S_1(v))] + \mathbb{E}_{\tilde{v} \sim p_{\tilde{V}}}[-\log(1 - D_I(S_1(\tilde{v})))]

\min_{D_V} \; \mathbb{E}_{v \sim p_V}[-\log D_V(S_T(v))] + \mathbb{E}_{\tilde{v} \sim p_{\tilde{V}}}[-\log(1 - D_V(S_T(\tilde{v})))]

\max_{G_I, R_M} \; \mathbb{E}_{\tilde{v} \sim p_{\tilde{V}}}[-\log(1 - D_I(S_1(\tilde{v})))] + \mathbb{E}_{\tilde{v} \sim p_{\tilde{V}}}[-\log(1 - D_V(S_T(\tilde{v})))]
MoCoGANs
S. Tulyakov, M. Liu, X. Yang, J. Kautz “MoCoGAN: Decomposing Motion and
Content for Video Generation” 2017
107
MoCoGANs
S. Tulyakov, M. Liu, X. Yang, J. Kautz “MoCoGAN: Decomposing Motion and
Content for Video Generation” 2017
VGAN vs. MoCoGAN
108
                        VGAN                          MoCoGAN
Video generation        Directly generates a 3D       Video frames are generated
                        volume                        sequentially
Generator               Motion (dynamic foreground)   Image generator
                        and static background
Discriminator           One discriminator:            Two discriminators:
                        - 3D CNN for clips            - 3D CNN for clips
                                                      - 2D CNN for frames
Variable length         No                            Yes
Motion and content      No                            Yes
separation
User preference on      5%                            95%
generated face video
quality
Outlines
1. Introduction

2. GAN objective

3. GAN Training

4. Joint image distribution and video distribution

5. Computer vision applications
109
5. Computer vision applications
110
Why are GANs useful for computer
vision?
111
Hand-crafted features  →  Deep networks

Hand-crafted objective functions  →  Generative Adversarial Networks
Image-to-image Translation
112

• Let X_1 and X_2 be two different image domains (e.g., the day-
time image and night-time image domains).

• Let x_1 \in X_1.

• Image-to-image translation concerns the problem of
translating x_1 to a corresponding image x_2 \in X_2.

• Correspondence can mean different things in different
contexts.

[Figure: input image → image translator → output image]
Examples and use cases
Low-res to high-res Blurry to sharp
Noisy to cleanLDR to HDR
Thermal to color
Image to painting
Day to night Summer to winter
Synthetic to real
• Bad weather to good
weather 

• Greyscale to color

• …
113
Prior Works
• Image-to-image translation has been studied for decades
in computer vision. 

• Different approaches have been exploited including:

• Filtering-based approaches

• Optimization-based approaches

• Dictionary learning-based approaches

• Deep learning-based approaches

• GAN-based approaches.
114
Generative Adversarial Networks
(GANs)
• Training is done by applying alternating stochastic
gradient updates to the following two learning problems:

\max_D \; \mathbb{E}_{x \sim p_X}[\log D(x)] + \mathbb{E}_{z \sim p_N}[\log(1 - D(G(z)))]

\max_G \; \mathbb{E}_{z \sim p_N}[\log D(G(z))]

• Effect: minimizing the JSD between p_{G(z),\, z \sim N} and p_X

[Diagram: z \sim N \to G \to \hat{x} = G(z) \to D \leftarrow x \in X]

Goodfellow et al. NIPS’14
115
Conditional GANs
• Training is done by applying alternating stochastic
gradient updates to the following two learning problems:

\max_D \; \mathbb{E}_{x_2 \sim p_{X_2}}[\log D(x_2)] + \mathbb{E}_{x_1 \sim p_{X_1}}[\log(1 - D(F(x_1)))]

\max_F \; \mathbb{E}_{x_1 \sim p_{X_1}}[\log D(F(x_1))]

• Effect: minimizing the JSD between p_{F(x_1),\, x_1 \sim X_1} and p_{X_2}

[Diagram: x_1 \sim X_1 \to F \to \hat{x} = F(x_1) \to D \leftarrow x_2 \sim X_2]

116
Conditional GAN for Image-to-image
Translation
• Conditional GAN alone is insufficient for image
translation.

• No guarantee that the conditionally generated image is
related to the source image in a desired way.

• In the supervised setting,

\mathrm{Dataset} = \{ (x_1^{(1)}, x_2^{(1)}), (x_1^{(2)}, x_2^{(2)}), \ldots, (x_1^{(N)}, x_2^{(N)}) \}

• This can be easily fixed.

117
Supervised Image-to-image
Translation
• Supervisedly relating \hat{x} = F(x_1^{(i)}) to x_2^{(i)}

• Ledig et al, CVPR’17: adding a content loss

\| \hat{x} - x_2^{(i)} \|^2 + \| \mathrm{VGG}(\hat{x}) - \mathrm{VGG}(x_2^{(i)}) \|^2

118
SRGAN
119
C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken,
A. Tejani, J. Totz, Z. Wang, W. Shi, “Photo-realistic single image super-resolution using a
generative adversarial network”, CVPR 2017
Supervised Image-to-image
Translation
• Supervisedly relating \hat{x} = F(x_1^{(i)}) to x_2^{(i)}

• Isola et al, CVPR’17: learning a joint distribution

\max_F \; \mathbb{E}_{p_{X_1}}[\log D(x_1, F(x_1))]

120
Pix2Pix
121
P. Isola, J. Zhu, T. Zhou, A. Efros, “Image-to-image translation with conditional
adversarial networks”, CVPR 2017
Unsupervised Image-to-image
Translation
• Corresponding images could be expensive to get. 

• In the unsupervised setting,

\mathrm{Dataset}_1 = \{ x_1^{(n_1)}, x_1^{(n_2)}, \ldots, x_1^{(n_N)} \}

\mathrm{Dataset}_2 = \{ x_2^{(m_1)}, x_2^{(m_2)}, \ldots, x_2^{(m_M)} \}

• With no correspondences, we need additional
constraints/assumptions for learning image-to-image
translation.

122
122
SimGAN

• Shrivastava et al. CVPR’17: adding a cross-domain content
loss.

\max_F \; \mathbb{E}_{p_{X_1}}\big[ \log D(F(x_1)) - \| F(x_1) - x_1 \|_1 \big]

123
Cycle Constraint

• Learning a two-way translation.

• DiscoGAN by Kim et al. ICML’17 (arXiv 1703.05192)

• CycleGAN by Zhu et al. arXiv 1703.10593

\hat{x}_2 = F_{12}(x_1), \qquad x_1 \approx F_{21}(F_{12}(x_1))

\max_{F_{12}, F_{21}} \; \mathbb{E}_{p_{X_1}}\big[ \log D_2(F_{12}(x_1)) - \| F_{21}(F_{12}(x_1)) - x_1 \|_p^p \big] + \mathbb{E}_{p_{X_2}}\big[ \log D_1(F_{21}(x_2)) - \| F_{12}(F_{21}(x_2)) - x_2 \|_p^p \big]

124
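A scalar sketch of the cycle term (pure Python; the toy "translators" are linear maps chosen for illustration): when F21 exactly inverts F12, the cycle loss vanishes; otherwise it penalizes the mismatch.

```python
def cycle_loss(f12, f21, x1, p=1):
    # || F21(F12(x1)) - x1 ||_p^p, scalar version
    return abs(f21(f12(x1)) - x1) ** p

f12 = lambda x: 2 * x + 1          # toy domain-1 -> domain-2 translator
f21_good = lambda y: (y - 1) / 2   # exact inverse of f12
f21_bad = lambda y: y / 2          # not an inverse

print(cycle_loss(f12, f21_good, 3.0))  # 0.0: a consistent two-way translation
print(cycle_loss(f12, f21_bad, 3.0))   # positive: the cycle penalty kicks in
```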
CycleGAN unsupervised image to
image translation results
125
Shared Latent Space Constraint

• CoGAN by Liu et al. NIPS’16

• Assume corresponding images in different domains share
the same representation in a hidden space.

• \hat{x}_1 = G_1(z) and \hat{x}_2 = G_2(z), with z \sim N

• Weight-sharing: G_1 = G_{L_1} \circ G_H and G_2 = G_{L_2} \circ G_H

\max_{G_H, G_{L_1}, G_{L_2}} \; \mathbb{E}_{p_N}\big[ \log D_1(G_{L_1}(G_H(z))) + \log D_2(G_{L_2}(G_H(z))) \big]

126
Shared Latent Space Constraint
• Under the sufficient-capacity and descent-to-a-local-
minimum assumptions of Goodfellow et al., CoGAN
learns a joint distribution of multi-domain images.

• Image translation:

G_2\big( \arg\min_z \| G_1(z) - x_1 \|_p^p \big)

127
The CoVAE-GAN Framework
[Architecture: encoders E_1, E_2 map x_1 \sim X_1 and x_2 \sim X_2 to a shared latent space; generators G_1, G_2 decode it, giving reconstruction streams G_1(E_1(x_1)), G_2(E_2(x_2)) and translation streams G_2(E_1(x_1)), G_1(E_2(x_2)); discriminators D_1, D_2 judge each domain.]

• Augmenting CoGAN with VAEs for mapping images to the
shared latent space.

• Applying weight-sharing constraints to the encoders E_1 and E_2.

• Image reconstruction streams and image translation
streams.

128
Ming-Yu Liu, Thomas Breuel, Jan Kautz “Unsupervised Image-to-image
translation networks“, arXiv 2017
The CoVAE-GAN Framework
\mathbb{E}_{p_{X_1}}\big[ \log D_1(G_1(E_1(x_1))) + \log D_2(G_2(E_1(x_1))) - \lambda_1 \| G_1(E_1(x_1)) - x_1 \|_p^p - \lambda_2 \, \mathrm{KL}(E_1(x_1) \,\|\, p_N) \big] \; +

\mathbb{E}_{p_{X_2}}\big[ \log D_1(G_1(E_2(x_2))) + \log D_2(G_2(E_2(x_2))) - \lambda_1 \| G_2(E_2(x_2)) - x_2 \|_p^p - \lambda_2 \, \mathrm{KL}(E_2(x_2) \,\|\, p_N) \big]

129
Intuition
G1
G2
E1
E2
In the ideal case (supervised setting)

Embedding Space

We can enforce that corresponding images are
mapped to the same point in the embedding
space:

E_1(x_1^{(i)}) \equiv E_2(x_2^{(i)})

130
130
Intuition
G1
G2
E1
In the unsupervised setting
Embedding Space
Although we can use a GAN discriminator to
make sure the images coming out of G_2
resemble real images, there is no guarantee
that the output images from G_1 and G_2 form a
corresponding pair.
D2
131
Intuition
G1
G2
E1
In the unsupervised setting
Embedding Space
However, if we apply the shared-latent-space
assumption via weight-sharing, then the
generated images from G_1 and G_2 will resemble a
pair of corresponding images.
D2
132
G1
G2
E1
In the unsupervised setting
Embedding Space
Similarly, we can enforce weight-sharing to E1
and E2 to encourage a pair of corresponding
images to have the same embedding.
D2
E2
D1
133
Toy Examples
G1 D1
D2G2
E1
E2
x1 ⇠ X1
x2 ⇠ X2
G1(E1(x1))
G1(E2(x2))
G2(E1(x1))
G2(E2(x2))
Domain 1 Domain 2
Image Translation Errors
Results
134
Image Translation Results
135
Image Translation Results
Foggy image to clear sky image
Front View    Back View    Left View    Right View
136
Attribute-based Face Image
Translation
Input +Blond Hair +Eye Glasses +Goatee +Smiling Input +Blond Hair +Eye Glasses +Goatee +Smiling
137
Image translation results
Input +Blond hair    Input +Eyeglasses
Input -Goatee    Input -Smiling
138
Image translation results
Input +Blond hair    Input +Eyeglasses
Input +Goatee    Input +Smiling
139
Conclusions
• Family of generative models

• Theory of GANs

• Various training techniques

• Joint image distribution and video distribution

• Image translation applications
140

A Short Introduction to Generative Adversarial NetworksA Short Introduction to Generative Adversarial Networks
A Short Introduction to Generative Adversarial Networks
 
Generative Adversarial Networks and Their Applications in Medical Imaging
Generative Adversarial Networks  and Their Applications in Medical ImagingGenerative Adversarial Networks  and Their Applications in Medical Imaging
Generative Adversarial Networks and Their Applications in Medical Imaging
 
GAN Evaluation
GAN EvaluationGAN Evaluation
GAN Evaluation
 
A Style-Based Generator Architecture for Generative Adversarial Networks
A Style-Based Generator Architecture for Generative Adversarial NetworksA Style-Based Generator Architecture for Generative Adversarial Networks
A Style-Based Generator Architecture for Generative Adversarial Networks
 
Style gan2 review
Style gan2 reviewStyle gan2 review
Style gan2 review
 

Ähnlich wie Theory and Application of GANs

D1L5 Visualization (D1L2 Insight@DCU Machine Learning Workshop 2017)
D1L5 Visualization (D1L2 Insight@DCU Machine Learning Workshop 2017)D1L5 Visualization (D1L2 Insight@DCU Machine Learning Workshop 2017)
D1L5 Visualization (D1L2 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 
Vladislav Kolbasin “Introduction to Generative Adversarial Networks (GANs)”
Vladislav Kolbasin “Introduction to Generative Adversarial Networks (GANs)”Vladislav Kolbasin “Introduction to Generative Adversarial Networks (GANs)”
Vladislav Kolbasin “Introduction to Generative Adversarial Networks (GANs)”Lviv Startup Club
 
3D Volumetric Data Generation with Generative Adversarial Networks
3D Volumetric Data Generation with Generative Adversarial Networks3D Volumetric Data Generation with Generative Adversarial Networks
3D Volumetric Data Generation with Generative Adversarial NetworksPreferred Networks
 
Generative modeling with Convolutional Neural Networks
Generative modeling with Convolutional Neural NetworksGenerative modeling with Convolutional Neural Networks
Generative modeling with Convolutional Neural NetworksDenis Dus
 
MLIP - Chapter 6 - Generation, Super-Resolution, Style transfer
MLIP - Chapter 6 - Generation, Super-Resolution, Style transferMLIP - Chapter 6 - Generation, Super-Resolution, Style transfer
MLIP - Chapter 6 - Generation, Super-Resolution, Style transferCharles Deledalle
 
Generative Adversarial Networks (GAN)
Generative Adversarial Networks (GAN)Generative Adversarial Networks (GAN)
Generative Adversarial Networks (GAN)Manohar Mukku
 
Generative Models and Adversarial Training (D2L3 Insight@DCU Machine Learning...
Generative Models and Adversarial Training (D2L3 Insight@DCU Machine Learning...Generative Models and Adversarial Training (D2L3 Insight@DCU Machine Learning...
Generative Models and Adversarial Training (D2L3 Insight@DCU Machine Learning...Universitat Politècnica de Catalunya
 
Decomposing image generation into layout priction and conditional synthesis
Decomposing image generation into layout priction and conditional synthesisDecomposing image generation into layout priction and conditional synthesis
Decomposing image generation into layout priction and conditional synthesisNaeem Shehzad
 
Plug & Play Generative Networks: Conditional Iterative Generation of Images ...
 Plug & Play Generative Networks: Conditional Iterative Generation of Images ... Plug & Play Generative Networks: Conditional Iterative Generation of Images ...
Plug & Play Generative Networks: Conditional Iterative Generation of Images ...Safaa Alnabulsi
 
Applying your Convolutional Neural Networks
Applying your Convolutional Neural NetworksApplying your Convolutional Neural Networks
Applying your Convolutional Neural NetworksDatabricks
 
gan-190318135433 (1).pptx
gan-190318135433 (1).pptxgan-190318135433 (1).pptx
gan-190318135433 (1).pptxkiran814572
 
Semantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImagerySemantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImageryRAHUL BHOJWANI
 
DALL-E.pdf
DALL-E.pdfDALL-E.pdf
DALL-E.pdfdsfajkh
 
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...MLconf
 
Master defence 2020 - Vadym Korshunov - Region-Selected Image Generation with...
Master defence 2020 - Vadym Korshunov - Region-Selected Image Generation with...Master defence 2020 - Vadym Korshunov - Region-Selected Image Generation with...
Master defence 2020 - Vadym Korshunov - Region-Selected Image Generation with...Lviv Data Science Summer School
 

Ähnlich wie Theory and Application of GANs (20)

Deep Generative Models - Kevin McGuinness - UPC Barcelona 2018
Deep Generative Models - Kevin McGuinness - UPC Barcelona 2018Deep Generative Models - Kevin McGuinness - UPC Barcelona 2018
Deep Generative Models - Kevin McGuinness - UPC Barcelona 2018
 
06 mlp
06 mlp06 mlp
06 mlp
 
D1L5 Visualization (D1L2 Insight@DCU Machine Learning Workshop 2017)
D1L5 Visualization (D1L2 Insight@DCU Machine Learning Workshop 2017)D1L5 Visualization (D1L2 Insight@DCU Machine Learning Workshop 2017)
D1L5 Visualization (D1L2 Insight@DCU Machine Learning Workshop 2017)
 
DiscoGAN
DiscoGANDiscoGAN
DiscoGAN
 
Vladislav Kolbasin “Introduction to Generative Adversarial Networks (GANs)”
Vladislav Kolbasin “Introduction to Generative Adversarial Networks (GANs)”Vladislav Kolbasin “Introduction to Generative Adversarial Networks (GANs)”
Vladislav Kolbasin “Introduction to Generative Adversarial Networks (GANs)”
 
3D Volumetric Data Generation with Generative Adversarial Networks
3D Volumetric Data Generation with Generative Adversarial Networks3D Volumetric Data Generation with Generative Adversarial Networks
3D Volumetric Data Generation with Generative Adversarial Networks
 
Generative modeling with Convolutional Neural Networks
Generative modeling with Convolutional Neural NetworksGenerative modeling with Convolutional Neural Networks
Generative modeling with Convolutional Neural Networks
 
MLIP - Chapter 6 - Generation, Super-Resolution, Style transfer
MLIP - Chapter 6 - Generation, Super-Resolution, Style transferMLIP - Chapter 6 - Generation, Super-Resolution, Style transfer
MLIP - Chapter 6 - Generation, Super-Resolution, Style transfer
 
Generative Adversarial Networks (GAN)
Generative Adversarial Networks (GAN)Generative Adversarial Networks (GAN)
Generative Adversarial Networks (GAN)
 
Generative Models and Adversarial Training (D2L3 Insight@DCU Machine Learning...
Generative Models and Adversarial Training (D2L3 Insight@DCU Machine Learning...Generative Models and Adversarial Training (D2L3 Insight@DCU Machine Learning...
Generative Models and Adversarial Training (D2L3 Insight@DCU Machine Learning...
 
Decomposing image generation into layout priction and conditional synthesis
Decomposing image generation into layout priction and conditional synthesisDecomposing image generation into layout priction and conditional synthesis
Decomposing image generation into layout priction and conditional synthesis
 
Plug & Play Generative Networks: Conditional Iterative Generation of Images ...
 Plug & Play Generative Networks: Conditional Iterative Generation of Images ... Plug & Play Generative Networks: Conditional Iterative Generation of Images ...
Plug & Play Generative Networks: Conditional Iterative Generation of Images ...
 
04 numerical
04 numerical04 numerical
04 numerical
 
Applying your Convolutional Neural Networks
Applying your Convolutional Neural NetworksApplying your Convolutional Neural Networks
Applying your Convolutional Neural Networks
 
gan-190318135433 (1).pptx
gan-190318135433 (1).pptxgan-190318135433 (1).pptx
gan-190318135433 (1).pptx
 
Semantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImagerySemantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite Imagery
 
DALL-E.pdf
DALL-E.pdfDALL-E.pdf
DALL-E.pdf
 
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
 
gan.pdf
gan.pdfgan.pdf
gan.pdf
 
Master defence 2020 - Vadym Korshunov - Region-Selected Image Generation with...
Master defence 2020 - Vadym Korshunov - Region-Selected Image Generation with...Master defence 2020 - Vadym Korshunov - Region-Selected Image Generation with...
Master defence 2020 - Vadym Korshunov - Region-Selected Image Generation with...
 

Mehr von MLReview

Bayesian Non-parametric Models for Data Science using PyMC
 Bayesian Non-parametric Models for Data Science using PyMC Bayesian Non-parametric Models for Data Science using PyMC
Bayesian Non-parametric Models for Data Science using PyMCMLReview
 
Machine Learning and Counterfactual Reasoning for "Personalized" Decision- ...
  Machine Learning and Counterfactual Reasoning for "Personalized" Decision- ...  Machine Learning and Counterfactual Reasoning for "Personalized" Decision- ...
Machine Learning and Counterfactual Reasoning for "Personalized" Decision- ...MLReview
 
PixelGAN Autoencoders
  PixelGAN Autoencoders  PixelGAN Autoencoders
PixelGAN AutoencodersMLReview
 
Representing and comparing probabilities: Part 2
Representing and comparing probabilities: Part 2Representing and comparing probabilities: Part 2
Representing and comparing probabilities: Part 2MLReview
 
Representing and comparing probabilities
Representing and comparing probabilitiesRepresenting and comparing probabilities
Representing and comparing probabilitiesMLReview
 
OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING
 OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING
OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNINGMLReview
 
Theoretical Neuroscience and Deep Learning Theory
Theoretical Neuroscience and Deep Learning TheoryTheoretical Neuroscience and Deep Learning Theory
Theoretical Neuroscience and Deep Learning TheoryMLReview
 
2017 Tutorial - Deep Learning for Dialogue Systems
2017 Tutorial - Deep Learning for Dialogue Systems2017 Tutorial - Deep Learning for Dialogue Systems
2017 Tutorial - Deep Learning for Dialogue SystemsMLReview
 
Deep Learning for Semantic Composition
Deep Learning for Semantic CompositionDeep Learning for Semantic Composition
Deep Learning for Semantic CompositionMLReview
 
Near human performance in question answering?
Near human performance in question answering?Near human performance in question answering?
Near human performance in question answering?MLReview
 
Real-time Edge-aware Image Processing with the Bilateral Grid
Real-time Edge-aware Image Processing with the Bilateral GridReal-time Edge-aware Image Processing with the Bilateral Grid
Real-time Edge-aware Image Processing with the Bilateral GridMLReview
 
Yoav Goldberg: Word Embeddings What, How and Whither
Yoav Goldberg: Word Embeddings What, How and WhitherYoav Goldberg: Word Embeddings What, How and Whither
Yoav Goldberg: Word Embeddings What, How and WhitherMLReview
 

Mehr von MLReview (12)

Bayesian Non-parametric Models for Data Science using PyMC
 Bayesian Non-parametric Models for Data Science using PyMC Bayesian Non-parametric Models for Data Science using PyMC
Bayesian Non-parametric Models for Data Science using PyMC
 
Machine Learning and Counterfactual Reasoning for "Personalized" Decision- ...
  Machine Learning and Counterfactual Reasoning for "Personalized" Decision- ...  Machine Learning and Counterfactual Reasoning for "Personalized" Decision- ...
Machine Learning and Counterfactual Reasoning for "Personalized" Decision- ...
 
PixelGAN Autoencoders
  PixelGAN Autoencoders  PixelGAN Autoencoders
PixelGAN Autoencoders
 
Representing and comparing probabilities: Part 2
Representing and comparing probabilities: Part 2Representing and comparing probabilities: Part 2
Representing and comparing probabilities: Part 2
 
Representing and comparing probabilities
Representing and comparing probabilitiesRepresenting and comparing probabilities
Representing and comparing probabilities
 
OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING
 OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING
OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING
 
Theoretical Neuroscience and Deep Learning Theory
Theoretical Neuroscience and Deep Learning TheoryTheoretical Neuroscience and Deep Learning Theory
Theoretical Neuroscience and Deep Learning Theory
 
2017 Tutorial - Deep Learning for Dialogue Systems
2017 Tutorial - Deep Learning for Dialogue Systems2017 Tutorial - Deep Learning for Dialogue Systems
2017 Tutorial - Deep Learning for Dialogue Systems
 
Deep Learning for Semantic Composition
Deep Learning for Semantic CompositionDeep Learning for Semantic Composition
Deep Learning for Semantic Composition
 
Near human performance in question answering?
Near human performance in question answering?Near human performance in question answering?
Near human performance in question answering?
 
Real-time Edge-aware Image Processing with the Bilateral Grid
Real-time Edge-aware Image Processing with the Bilateral GridReal-time Edge-aware Image Processing with the Bilateral Grid
Real-time Edge-aware Image Processing with the Bilateral Grid
 
Yoav Goldberg: Word Embeddings What, How and Whither
Yoav Goldberg: Word Embeddings What, How and WhitherYoav Goldberg: Word Embeddings What, How and Whither
Yoav Goldberg: Word Embeddings What, How and Whither
 

Kürzlich hochgeladen

PROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalPROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalMAESTRELLAMesa2
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPirithiRaju
 
Introduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxIntroduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxMedical College
 
Observational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsObservational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsSérgio Sacani
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxEran Akiva Sinbar
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayupadhyaymani499
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringPrajakta Shinde
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologycaarthichand2003
 
well logging & petrophysical analysis.pptx
well logging & petrophysical analysis.pptxwell logging & petrophysical analysis.pptx
well logging & petrophysical analysis.pptxzaydmeerab121
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naJASISJULIANOELYNV
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptxpallavirawat456
 
bonjourmadame.tumblr.com bhaskar's girls
bonjourmadame.tumblr.com bhaskar's girlsbonjourmadame.tumblr.com bhaskar's girls
bonjourmadame.tumblr.com bhaskar's girlshansessene
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...Universidade Federal de Sergipe - UFS
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationColumbia Weather Systems
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubaikojalkojal131
 
Biological classification of plants with detail
Biological classification of plants with detailBiological classification of plants with detail
Biological classification of plants with detailhaiderbaloch3
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentationtahreemzahra82
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingNetHelix
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxMurugaveni B
 

Kürzlich hochgeladen (20)

PROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalPROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and Vertical
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 
Introduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxIntroduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptx
 
Observational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsObservational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive stars
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyay
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technology
 
well logging & petrophysical analysis.pptx
well logging & petrophysical analysis.pptxwell logging & petrophysical analysis.pptx
well logging & petrophysical analysis.pptx
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptx
 
bonjourmadame.tumblr.com bhaskar's girls
bonjourmadame.tumblr.com bhaskar's girlsbonjourmadame.tumblr.com bhaskar's girls
bonjourmadame.tumblr.com bhaskar's girls
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather Station
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
 
Biological classification of plants with detail
Biological classification of plants with detailBiological classification of plants with detail
Biological classification of plants with detail
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 

• Derive from a variational framework.
• Constrain the encoder to output a conditional Gaussian distribution:
  E(x^(j)) = q_θ(z|x^(j)) = N(z | μ(x^(j)), Σ(x^(j)))
• The decoder reconstructs the input from samples drawn from that conditional distribution:
  D(z^(j)) = p_θ(x^(j) | z^(j)), where z^(j) ∼ q_θ(z|x^(j))
Encoder network E → latent code → decoder network D → reconstructed image
• Kingma and Welling, "Auto-encoding variational Bayes," ICLR 2014
• Rezende, Mohamed, and Wierstra, "Stochastic back-propagation and variational inference in deep latent Gaussian models," ICML 2014
Maximizing a variational lower bound
Ideally, we would like to maximize the log-likelihood
  max_θ L(θ | Dataset) ≡ Σ_{x^(j)} log p_θ(x^(j))
but for tractability we maximize a lower bound:
  max_θ L_V(θ | Dataset) ≡ Σ_{x^(j)} [ −KL(q_θ(z|x^(j)) ‖ N(z|0, I)) + E_{z∼q_θ(z|x^(j))}[log p_θ(x^(j)|z)] ] ≤ L(θ | Dataset)
Reparameterization trick
• Make sampling a differentiable operation.
• The variational distribution: q_θ(z|x) = N(z | μ(x), Σ(x))
• Sampling: z = μ(x) + Σ(x)^{1/2} ε, where ε ∼ N(0, I)
Image credit, Courville 2017
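The reparameterization step, together with the closed-form KL term of the lower bound for a diagonal Gaussian, can be sketched in a few lines of NumPy. This is a minimal illustration, not from any particular VAE implementation, and the function names are ours:

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Draw z ~ N(mu, diag(exp(log_var))) as z = mu + sigma * eps.
    All randomness lives in eps, so z is a deterministic (hence
    differentiable) function of the encoder outputs mu and log_var."""
    eps = rng.standard_normal(mu.shape)  # eps ~ N(0, I)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), the closed-form
    regularization term of the variational lower bound."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

rng = np.random.default_rng(0)
mu, log_var = np.zeros(4), np.zeros(4)   # encoder output for one input
z = reparameterize(mu, log_var, rng)
print(kl_to_standard_normal(mu, log_var))  # 0.0 when q equals the prior
```

The KL term vanishes exactly when the encoder posterior matches the N(0, I) prior, and grows as the posterior drifts away from it.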
Samples from variational autoencoders
LFW dataset
ImageNet dataset
Slide credit, Courville 2017
Why are generated samples blurry?
• Pixel values are modeled by a conditional Gaussian, so maximizing the likelihood amounts to minimizing a Euclidean loss:
  log p_θ(x^(j)|z) = −‖x^(j) − D(z)‖² + const
• Regression to the mean renders blurry images.
• A good perceptual loss function is difficult to hand-craft.
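A tiny worked example of the regression-to-the-mean effect: when two sharp images are equally likely, the single prediction minimizing the expected Euclidean loss is their average, which looks like neither. The 4-pixel "images" below are hypothetical:

```python
import numpy as np

# Two equally likely "sharp" images: an edge one pixel to the left or right.
x_left  = np.array([0.0, 1.0, 0.0, 0.0])
x_right = np.array([0.0, 0.0, 1.0, 0.0])

# The single prediction minimizing expected squared error is the mean:
x_hat = 0.5 * (x_left + x_right)
print(x_hat)  # a blurred edge, unlike either sample

# Its expected squared error is lower than committing to either sharp image.
err_mean  = 0.5 * np.sum((x_hat - x_left)**2)  + 0.5 * np.sum((x_hat - x_right)**2)
err_sharp = 0.5 * np.sum((x_left - x_left)**2) + 0.5 * np.sum((x_left - x_right)**2)
print(err_mean, err_sharp)
```

Under the Euclidean loss the blurry average is strictly preferred to any sharp sample, which is exactly the failure mode described above.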
Pitfall of Euclidean distance for image modeling
• The blue curve plots the Euclidean distance between a reference image and its horizontal translations.
• The red curve plots the Euclidean distance between two perceptually similar images shown on the slide (omitted here).
PixelCNNs
• Oord, Aaron van den, et al. 2016
• Use masked convolutions to enforce the autoregressive dependency.
• Minimize a cross-entropy loss instead of a Euclidean loss.
Slide credit, Courville 2017
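As a sketch of the masked-convolution idea (assuming the usual type-A/type-B mask convention from the PixelCNN paper), the mask below zeroes the kernel entries for the current pixel and every pixel that comes after it in raster order, so each output depends only on already-generated pixels:

```python
import numpy as np

def causal_mask(k, mask_type="A"):
    """Mask for a k x k convolution kernel. Type 'A' (first layer) also
    hides the centre pixel; type 'B' (later layers) keeps it."""
    m = np.ones((k, k))
    c = k // 2
    # Zero the centre row from the centre pixel (type A) or just after it (type B).
    m[c, c + (1 if mask_type == "B" else 0):] = 0.0
    # Zero every row below the centre: those pixels come later in raster order.
    m[c + 1:, :] = 0.0
    return m

print(causal_mask(3, "A"))
# [[1. 1. 1.]
#  [1. 0. 0.]
#  [0. 0. 0.]]
```

Multiplying the kernel weights elementwise by this mask before each forward pass is what enforces the autoregressive factorization.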
Example results from PixelCNNs
Slide credit, Courville 2017
Parallel multiscale autoregressive density estimation
• Reed et al. ICML 2017
• Can we speed up the generation time of PixelCNNs?
• Yes, via multiscale generation.
Slide credit, Courville 2017
Parallel multiscale autoregressive density estimation
• Also appears to yield better global structure.
Slide credit, Courville 2017
Generative adversarial networks
• Goodfellow et al. NIPS 2014
• Forget about designing a perceptual loss: train a discriminator to differentiate real images from fake ones.
• The discriminator D is trained to output 1 on real input images and 0 on images produced by the generator G from a random vector z.
Deep convolutional generative adversarial networks (DCGANs)
By Radford et al. ICLR 2016
Results from DCGANs
Z-space interpolation
Vector space arithmetic
Slide credit, Goodfellow 2016
Taxonomy of generative models
Maximum likelihood
• Explicit density
  • Tractable density: PixelCNNs
  • Approximate density: VAEs
• Implicit density: GANs
Slide credit Goodfellow 2016
Quick recaps
• Generative modeling
• Manifold assumption
• VAEs, PixelCNNs, GANs
• Taxonomy: maximum likelihood splits into explicit density (tractable: PixelCNNs; approximate: VAEs) and implicit density (GANs)
Outlines
1. Introduction
2. GAN objective
3. GAN training
4. Joint image distribution and video distribution
5. Computer vision applications
2. GAN objective
1. GAN objective
2. Discriminator strategy
3. GAN theory
4. Effect of limited discriminator capacity
GAN objectives
Solving a minimax problem:
  min_G max_D E_{x∼p_X}[log D(x)] + E_{z∼p_Z}[log(1 − D(G(z)))]
• p_X: the data distribution, usually represented by samples.
• p_{G(Z)}: the model distribution, where Z is usually modeled as uniform or Gaussian.
Alternating gradient updates
• Step 1: fix G and perform a gradient step on
  max_D E_{x∼p_X}[log D(x)] + E_{z∼p_Z}[log(1 − D(G(z)))]
• Step 2: fix D and perform a gradient step on
  min_G E_{z∼p_Z}[log(1 − D(G(z)))]  (in theory)
  max_G E_{z∼p_Z}[log D(G(z))]  (in practice)
• log D(G(z)) is bounded above, whereas log(1 − D(G(z))) is unbounded below, which is bad for minimization.
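The two generator objectives can be compared numerically. This is a minimal sketch with scalar discriminator outputs, not a training loop:

```python
import numpy as np

def d_loss(d_real, d_fake):
    """Discriminator ascends log D(x) + log(1 - D(G(z))); as a loss, negate."""
    return -(np.log(d_real) + np.log(1.0 - d_fake))

def g_loss_theory(d_fake):
    """Minimax generator loss: minimize log(1 - D(G(z)))."""
    return np.log(1.0 - d_fake)

def g_loss_practice(d_fake):
    """Non-saturating generator loss: maximize log D(G(z)), i.e. minimize -log D(G(z))."""
    return -np.log(d_fake)

# Early in training D confidently rejects fakes (d_fake near 0): the minimax
# loss barely changes with d_fake, while the non-saturating loss still gives
# a strong gradient signal.
eps = 1e-6
d_fake = 1e-3
grad_theory   = (g_loss_theory(d_fake + eps)   - g_loss_theory(d_fake))   / eps
grad_practice = (g_loss_practice(d_fake + eps) - g_loss_practice(d_fake)) / eps
print(grad_theory, grad_practice)
```

Near d_fake = 0 the minimax gradient has magnitude about 1 while the non-saturating gradient has magnitude about 1/d_fake, which is why the practical objective is preferred.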
Discriminator strategy
• Optimal (non-parametric) discriminator:
  D*(x) = p_X(x) / (p_X(x) + p_{G(Z)}(x))
Slide credit Goodfellow 2016
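A quick numerical check of the optimal-discriminator formula, using two 1-D Gaussians as stand-ins for the data and model distributions:

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def d_star(x, p_data, p_model):
    """Optimal non-parametric discriminator D*(x) = p_X / (p_X + p_G)."""
    return p_data(x) / (p_data(x) + p_model(x))

p_data  = lambda x: gauss_pdf(x, mu=0.0, sigma=1.0)   # stand-in for p_X
p_model = lambda x: gauss_pdf(x, mu=2.0, sigma=1.0)   # stand-in for p_G(Z)

# D* = 0.5 exactly where the two densities cross (here x = 1); it approaches
# 1 deep inside the data distribution and 0 deep inside the model's.
print(d_star(1.0, p_data, p_model))   # 0.5
```

When the model distribution matches the data distribution everywhere, D* is identically 1/2, and the discriminator can no longer tell real from fake.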
Learning a Gaussian mixture model using GANs
• Red points: true samples, drawn from the data distribution.
• Blue points: fake samples, drawn from the model distribution.
• Z-space: a Gaussian distribution N(0, I).
• The model first learns 1 mode, then 2 modes, and finally 3 modes.
2D GMM learning
GAN theory
• Optimal (non-parametric) discriminator:
  D*(x) = p_X(x) / (p_X(x) + p_{G(Z)}(x))
• Under an ideal discriminator, the generator minimizes the Jensen-Shannon divergence between p_X and p_{G(Z)}:
  JS(p_X ‖ p_{G(Z)}) = ½ KL(p_X ‖ (p_X + p_{G(Z)})/2) + ½ KL(p_{G(Z)} ‖ (p_X + p_{G(Z)})/2)
  where KL(p_X ‖ p_{G(Z)}) = E_{p_X}[log (p_X(x) / p_{G(Z)}(x))]
• This also requires that D and G have sufficient capacity and that the dataset is sufficiently large.
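The Jensen-Shannon divergence is easy to verify numerically for discrete distributions; note that it is symmetric and bounded above by log 2, the bound being attained when the supports are disjoint:

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for discrete distributions, with the 0 log 0 := 0 convention."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def js(p, q):
    """JS(p || q) = (1/2) KL(p || m) + (1/2) KL(q || m), with m the mixture."""
    m = 0.5 * (np.asarray(p, float) + np.asarray(q, float))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]
print(js(p, q), js(q, p))     # symmetric
print(js([1, 0], [0, 1]))     # disjoint supports give the maximum, log 2
```

Unlike KL, JS stays finite even when the two distributions have disjoint supports, which is exactly the regime an untrained generator starts in.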
In practice
• Dataset sizes are limited.
• Network capacity is finite (capacity meaning the number of trainable parameters).
• For a discriminator with capacity n, the generator only needs to remember C · n log(n) / ε² samples to get within ε of the optimal objective (Sanjeev Arora et al., ICML 2017).
Consequences:
1. The generator does not need to generalize.
2. Larger training datasets have limited utility.
  • 41. Effect of limited discriminator capacity 41 CIFAR10 dataset Image credit Arora and Zhang 2017
  • 42. Quick Recaps • Discriminator strategy • GAN theory • Effect of limited discriminator capacity 42 D(x) = pX (x) pX(x) + pG(Z)(x)
  • 43. Outlines 1. Introduction 2. GAN objective 3. GAN training 4. Joint image distribution and video distribution 5. Computer vision applications 43
  • 45. 3. GAN Training 1. Non-convergence in GANs 2. Mode collapse 3. Lack of global structure 4. Tricks 5. New objective functions 6. Surrogate objective functions 7. Network architectures 45
  • 46. Non-convergence in GANs • GAN training is theoretically guaranteed to converge if we can modify the density functions directly, but • Instead, we modify G (sample generation function) • We represent G and D as highly non-convex parametric functions. • “Oscillation”: can train for a very long time, generating very many different categories of samples, without clearly generating better samples. • Mode collapse: most severe form of non-convergence 46 Slide credit Goodfellow 2016
  • 47. Mode collapse 47 Slide credit Goodfellow 2016 Figure credit, Metz et al, 2016
  • 48. Mode collapse 48 • Same data dimension. • Performance correlated with number of modes. • Performance correlated with manifold dimensions.
  • 49. Mode collapse 49 • Same number of modes. • Performance correlated with manifold dimensions. • Performance uncorrelated with data dimensions.
  • 50. Problem with global structure 50 Slide credit Goodfellow 2016
  • 51. Problem with counting 51 Slide credit Goodfellow 2016
  • 52. Improve GAN training 52 Tricks • Label smoothing • Historical batches • … Surrogate or auxiliary objective • UnrolledGAN • WGAN-GP • DRAGAN • … New objectives • EBGAN • LSGAN • WGAN • BEGAN • fGAN • … Network architecture • LAPGAN • …
  • 53. Improve GAN training 53 Tricks • Label smoothing • Historical batches • … Surrogate or auxiliary objective • UnrolledGAN • WGAN-GP • DRAGAN • … New objectives • EBGAN • LSGAN • WGAN • BEGAN • fGAN • … Network architecture • LAPGAN • …
  • 54. Tricks - one sided label smoothing • Replace the real-sample target 1 with 0.9: E_{x~p_X}[0.9 log D(x)] + E_{z~p_Z}[log(1 − D(G(z)))] • Does not reduce classification accuracy, only confidence. • Prevents the discriminator from giving a very large gradient signal to the generator. • Prevents extrapolation that would encourage extreme samples. • Two-sided label smoothing: https://github.com/soumith/ganhacks 54 T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, “Improved techniques for training GANs”, NIPS 2016
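In loss terms, the trick only rescales the real-sample term. A NumPy sketch of ours with toy discriminator scores:

```python
import numpy as np

def d_loss(d_real, d_fake, real_target=1.0):
    # Discriminator loss: -E[target * log D(x)] - E[log(1 - D(G(z)))]
    return float(-np.mean(real_target * np.log(d_real))
                 - np.mean(np.log(1.0 - d_fake)))

d_real = np.array([0.95, 0.90, 0.99])   # confident scores on real images
d_fake = np.array([0.05, 0.10, 0.02])

hard = d_loss(d_real, d_fake, real_target=1.0)
smooth = d_loss(d_real, d_fake, real_target=0.9)   # one-sided smoothing

# Smoothing damps the reward for pushing D(x) all the way to 1,
# keeping the discriminator less over-confident on real samples.
print(hard, smooth)
```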
  • 55. Tricks - historical generator batches Helps stabilize discriminator training in the early stage: besides the current generated batch at iteration t, the discriminator is also shown generated batches from earlier iterations (t−1, …, t−K), all labeled fake. 55 A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, R. Webb, “Learning from Simulated and Unsupervised Images through Adversarial Training”, CVPR 2017
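The idea can be sketched as a small replay pool. The class name and the 50/50 swap rule below are our own simplification of the paper's history buffer, not its exact implementation:

```python
import random

class HistoryBuffer:
    """Pool of past generated samples, mixed into current fake batches."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pool = []

    def query(self, fakes):
        out = []
        for f in fakes:
            if len(self.pool) < self.capacity:
                self.pool.append(f)       # fill the pool first
                out.append(f)
            elif random.random() < 0.5:
                i = random.randrange(len(self.pool))
                out.append(self.pool[i])  # show the discriminator an old fake
                self.pool[i] = f          # keep the fresh one for later
            else:
                out.append(f)             # show the fresh fake
        return out

buf = HistoryBuffer(capacity=2)
batch = buf.query(["f0", "f1", "f2", "f3"])
print(len(batch), len(buf.pool))
```

Each discriminator update then trains on `batch` instead of the raw current fakes, so older generator outputs keep getting penalized.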
  • 56. Other tricks • https://github.com/soumith/ganhacks • Use labels • Normalize the inputs to [-1,1] • Use TanH • Use BatchNorm • Avoid sparse gradients: ReLU, MaxPool • … • Salimans et al. NIPS 2016 • Use Virtual BatchNorm • MiniBatch Discrimination • … 56
  • 57. Improve GAN training 57 Tricks • Label smoothing • Historical batches • … Surrogate or auxiliary objective • UnrolledGAN • WGAN-GP • DRAGAN • … New objectives • EBGAN • LSGAN • WGAN • BEGAN • fGAN • … Network architecture • LAPGAN • …
  • 58. EBGAN 58 Junbo Zhao, Michael Mathieu, Yann LeCun, “Energy-based generative adversarial networks,” ICLR 2017 Discriminator Network: D Input image 1 Generator Network G Discriminator Network: D Random vector Generated image 0 Energy Function Input image Low energy value Generator Network G Energy Function Random vector Generated image High energy value Energy function can be modeled by an auto- encoder.
  • 59. EBGAN 59 Junbo Zhao, Michael Mathieu, Yann LeCun, “Energy-based generative adversarial networks,” ICLR 2017 En(x) = ||Decoder(Encoder(x)) − x|| Discriminator objective: min_{En} E_{x~p_X}[En(x)] + E_{z~p_Z}[[m − En(G(z))]⁺] Generator objective: min_G E_{z~p_Z}[En(G(z))]
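A NumPy sketch of these two objectives on toy energy values (our illustration; m is the hinge margin):

```python
import numpy as np

def ebgan_d_loss(en_real, en_fake, m=1.0):
    # min_En  E[En(x)] + E[[m - En(G(z))]^+]   (hinge on fake energies)
    return float(np.mean(en_real) + np.mean(np.maximum(0.0, m - en_fake)))

def ebgan_g_loss(en_fake):
    # min_G  E[En(G(z))]
    return float(np.mean(en_fake))

en_real = np.array([0.1, 0.2])   # discriminator wants real energy low ...
en_fake = np.array([1.5, 2.0])   # ... and fake energy above the margin

# Fakes already past the margin contribute nothing to the hinge term,
# so the discriminator stops pushing them further away.
print(ebgan_d_loss(en_real, en_fake, m=1.0), ebgan_g_loss(en_fake))
```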
  • 60. EBGAN 60 Junbo Zhao, Michael Mathieu, Yann LeCun, “Energy-based generative adversarial networks,” ICLR 2017
  • 61. LSGAN X. Mao, Q. Li, H. Xie, R. Lau, Z. Wang, “Least squares generative adversarial networks” 2016 Still use a classifier but replace the cross-entropy loss with a least-squares loss. Discriminator: GAN: min_D E_{x~p_X}[−log D(x)] + E_{z~p_Z}[−log(1 − D(G(z)))] LSGAN: min_D E_{x~p_X}[(D(x) − 1)²] + E_{z~p_Z}[D(G(z))²] Generator: GAN: min_G E_{z~p_Z}[−log D(G(z))] LSGAN: min_G E_{z~p_Z}[(D(G(z)) − 1)²] 61
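The substitution is mechanical; a NumPy sketch of ours makes the key property visible: the quadratic loss keeps penalizing fakes whose score sits far from the target, where the log loss would have saturated.

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake):
    # min_D  E[(D(x) - 1)^2] + E[D(G(z))^2]
    return float(np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2))

def lsgan_g_loss(d_fake):
    # min_G  E[(D(G(z)) - 1)^2]
    return float(np.mean((d_fake - 1.0) ** 2))

# A fake scored exactly at the target incurs no generator loss ...
print(lsgan_g_loss(np.array([1.0])))
# ... but one far past the target still does, unlike the log loss
print(lsgan_g_loss(np.array([3.0])))
```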
  • 62. LSGAN X. Mao, Q. Li, H. Xie, R. Lau, Z. Wang, “Least squares generative adversarial networks” 2016 GAN: minimize the Jensen-Shannon divergence between p_X and p_{G(Z)}: JS(p_X || p_{G(Z)}) = (1/2) KL(p_X || (p_X + p_{G(Z)})/2) + (1/2) KL(p_{G(Z)} || (p_X + p_{G(Z)})/2) LSGAN: minimize the Pearson chi-square divergence between p_X + p_{G(Z)} and 2p_{G(Z)}: χ²(p_X + p_{G(Z)} || 2p_{G(Z)}) = ∫ (2p_{G(Z)}(x) − (p_X(x) + p_{G(Z)}(x)))² / (p_X(x) + p_{G(Z)}(x)) dx 62
  • 63. 63 LSGAN X. Mao, Q. Li, H. Xie, R. Lau, Z. Wang, “Least squares generative adversarial networks” 2016
  • 64. Wasserstein GAN Replace the classifier with a critic function (constrained to be 1-Lipschitz). Discriminator: GAN: max_D E_{x~p_X}[log D(x)] + E_{z~p_Z}[log(1 − D(G(z)))] WGAN: max_D E_{x~p_X}[D(x)] − E_{z~p_Z}[D(G(z))] Generator: GAN: max_G E_{z~p_Z}[log D(G(z))] WGAN: max_G E_{z~p_Z}[D(G(z))] 64 M. Arjovsky, S. Chintala, L. Bottou “Wasserstein GAN” 2016
  • 65. Wasserstein GAN M. Arjovsky, S. Chintala, L. Bottou “Wasserstein GAN” 2016 GAN: minimize the Jensen-Shannon divergence between p_X and p_{G(Z)}: JS(p_X || p_{G(Z)}) = (1/2) KL(p_X || (p_X + p_{G(Z)})/2) + (1/2) KL(p_{G(Z)} || (p_X + p_{G(Z)})/2) WGAN: minimize the Earth Mover distance between p_X and p_{G(Z)}: EM(p_X, p_{G(Z)}) = inf_{γ ∈ Π(p_X, p_{G(Z)})} E_{(x,y)~γ}[||x − y||] 65
  • 66. 66 Wasserstein GAN M. Arjovsky, S. Chintala, L. Bottou “Wasserstein GAN” 2016
  • 67. GAN vs WGAN 67 Slide credit, Courville 2017
  • 68. GAN vs WGAN 68 Slide credit, Courville 2017 Jensen-Shannon divergence in this example: JS(p_X || p_{G(Z)}) = (1/2) KL(p_X || (p_X + p_{G(Z)})/2) + (1/2) KL(p_{G(Z)} || (p_X + p_{G(Z)})/2)
  • 69. GAN vs WGAN 69 Slide credit, Courville 2017 Earth Mover distance in this example: EM(p_X, p_{G(Z)}) = inf_{γ ∈ Π(p_X, p_{G(Z)})} E_{(x,y)~γ}[||x − y||] = |θ|
  • 70. GAN vs WGAN GAN WGAN • If we could directly change the density shape parameter θ, the Earth Mover distance would be the smoother objective. • But we do not directly change the density shape parameter; we change the generation function. 70
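The θ example can be reproduced on a 1-D grid. A NumPy sketch of ours, using the CDF formula for the 1-D Earth Mover distance and point masses at 0 and θ as the two distributions:

```python
import numpy as np

def js_discrete(p, q):
    # JS divergence for discrete distributions on a shared grid
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def em_1d(p, q, grid):
    # 1-D Earth Mover distance via the CDF formula: integral of |F_p - F_q|
    dx = grid[1] - grid[0]
    return float(np.sum(np.abs(np.cumsum(p) - np.cumsum(q))) * dx)

grid = np.linspace(0.0, 1.0, 101)   # step 0.01

def delta_at(theta):
    p = np.zeros_like(grid)
    p[np.argmin(np.abs(grid - theta))] = 1.0
    return p

p0 = delta_at(0.0)
for theta in (0.2, 0.4):
    em = em_1d(p0, delta_at(theta), grid)
    js = js_discrete(p0, delta_at(theta))
    # EM tracks theta smoothly; JS is stuck at log 2 whenever the
    # supports are disjoint, so it provides no useful gradient here.
    print(theta, em, js)
```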
  • 71. Boundary Equilibrium GAN (BEGAN) 71 David Berthelot, Thomas Schumm, Luke Metz, “Boundary equilibrium generative adversarial networks” 2017 Energy Function Input image Low energy value Generator Network G Energy Function Random vector Generated image High energy value • Use EBGAN autoencoder-based discriminator. • Not based on minimizing distance between pX and pG(Z) • Based on minimizing distance between pEn(X) and pEn(G(Z)) • The distance is a lower bound on the Wasserstein 1 (Earth Mover) distance.
  • 72. Boundary Equilibrium GAN (BEGAN) 72 David Berthelot, Thomas Schumm, Luke Metz, “Boundary equilibrium generative adversarial networks” 2017 WGAN: EM(p_X, p_{G(Z)}) = inf_{γ ∈ Π(p_X, p_{G(Z)})} E_{(x,y)~γ}[||x − y||] BEGAN: EM(p_{En(X)}, p_{En(G(Z))}) = inf_{γ ∈ Π(p_{En(X)}, p_{En(G(Z))})} E_{(x1,x2)~γ}[|x1 − x2|] Lower bound: inf_γ E_{(x1,x2)~γ}[|x1 − x2|] ≥ inf_γ |E_{(x1,x2)~γ}[x1 − x2]| = |mean(x1) − mean(x2)| Final objective function: max_G min_D E_{x~p_X}[En(x)] − E_{z~p_Z}[En(G(z))]
  • 73. Boundary Equilibrium GAN (BEGAN) 73 David Berthelot, Thomas Schumm, Luke Metz, “Boundary equilibrium generative adversarial networks” 2017 Discriminator objective: min_{En} En(x) − k_t En(G(z)) Generator objective: min_G En(G(z)) Equilibrium update: k_{t+1} = k_t + λ_k (γ En(x) − En(G(z))) Hyperparameter: γ = E[En(G(z))] / E[En(x)], a value between 0 and 1.
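The control variable k_t behaves like a proportional controller on the energy balance. A pure-Python sketch of ours with toy energy values (the clipping of k to [0, 1] follows the paper's description):

```python
def began_k_update(k, en_real, en_fake, gamma=0.5, lambda_k=0.001):
    # k_{t+1} = k_t + lambda_k * (gamma * En(x) - En(G(z))), clipped to [0, 1]
    k = k + lambda_k * (gamma * en_real - en_fake)
    return min(max(k, 0.0), 1.0)

# While the fake energy sits below gamma * real energy, k keeps rising,
# telling the discriminator to push harder on the fake-energy term.
k = 0.0
for _ in range(1000):
    k = began_k_update(k, en_real=1.0, en_fake=0.2, gamma=0.5)
print(k)
```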
  • 74. 74 Boundary Equilibrium GAN (BEGAN) David Berthelot, Thomas Schumm, Luke Metz, “Boundary equilibrium generative adversarial networks” 2017
  • 75. Improve GAN training 75 Tricks • Label smoothing • Historical batches • … Surrogate or auxiliary objective • UnrolledGAN • WGAN-GP • DRAGAN • … New objectives • EBGAN • LSGAN • WGAN • BEGAN • fGAN • … Network architecture • LAPGAN • …
  • 76. UnrolledGAN • When updating the generator, take several future updates of the discriminator into account. • Update the discriminator in the usual way. • Discourages the generator from producing samples from only a few modes. 76 L. Metz, B. Poole, D. Pfau, J. Sohl-Dickstein, “Unrolled generative adversarial networks” ICLR 2017
  • 77. UnrolledGAN 77 GAN generator objective: max_G E_{z~p_Z}[log D(G(z))] UnrolledGAN surrogate objective: max_G E_{z~p_Z}[log D_K(G(z))] • D_K is the discriminator after K (imagined) future updates, and is a function of both D and G. • The generator update thus accounts for how the discriminator would respond.
  • 78. WGAN-GP I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, A. Courville “Improved Training of Wasserstein GANs” 2017 min_G max_D E_{x~p_X}[D(x)] − E_{z~p_Z}[D(G(z))] − λ E_{y~p_Y}[(||∇_y D(y)||₂ − 1)²] • y: interpolated (“imaginary”) samples, y = u x + (1 − u) G(z) with u ~ U[0,1] • Motivation: the optimal critic has unit gradient norm almost everywhere. 78
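For a linear critic the penalty has a closed form, which makes the mechanics easy to see. A NumPy sketch of ours (in a real implementation the gradient norm comes from autograd, not a formula):

```python
import numpy as np

rng = np.random.default_rng(0)

def gradient_penalty(w, x_real, x_fake, lam=10.0):
    # Interpolate y = u*x + (1-u)*G(z), then penalize (||grad_y D(y)|| - 1)^2.
    # For a linear critic D(y) = w . y the gradient is w everywhere, so the
    # penalty is the same at every interpolation point.
    u = rng.uniform(size=(x_real.shape[0], 1))
    y = u * x_real + (1.0 - u) * x_fake   # where the penalty would be evaluated
    grad_norm = np.linalg.norm(w)         # ||grad_y D(y)||_2, constant in y here
    return lam * (grad_norm - 1.0) ** 2

x_real = rng.normal(size=(4, 3))
x_fake = rng.normal(size=(4, 3))

w_unit = np.array([1.0, 0.0, 0.0])   # unit gradient norm: no penalty
w_big = np.array([3.0, 0.0, 0.0])    # gradient norm 3: penalized
print(gradient_penalty(w_unit, x_real, x_fake))
print(gradient_penalty(w_big, x_real, x_fake))
```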
  • 79. DRAGAN 79 N. Kodali, J. Abernethy, J. Hays, Z. Kira “How to train your DRAGAN” 2017 min_G max_D E_{x~p_X}[log D(x)] + E_{z~p_Z}[log(1 − D(G(z)))] − λ E_{y~p_Y}[(||∇_y D(y)||₂ − 1)²] • y: imaginary samples around true samples, y = α x + (1 − α)(x + δ)
  • 80. Improve GAN training 80 Tricks • Label smoothing • Historical batches • … Surrogate or auxiliary objective • UnrolledGAN • WGAN-GP • DRAGAN • … New objectives • EBGAN • LSGAN • WGAN • BEGAN • fGAN • … Network architecture • LAPGAN • …
  • 81. LAPGAN 81 E. Denton, S. Chintala, A. Szlam, R. Fergus “Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks” NIPS 2015 Testing
  • 82. LAPGAN 82 E. Denton, S. Chintala, A. Szlam, R. Fergus “Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks” NIPS 2015 Training of each resolution is performed independent of the others.
  • 83. Evaluation • Popular metrics for GAN evaluation: • Inception score: train an accurate classifier, train a conditional image generation model, then check how accurately the classifier recognizes the generated images. • Human evaluation. • Generative models are difficult to evaluate (Theis et al. ICLR 2016). • Every evaluation metric can be gamed. • Still an open problem. 83
  • 84. Outlines 1. Introduction 2. GAN objective 3. GAN training 4. Joint image distribution and video distribution 5. Computer vision applications 84
  • 85. 4. Joint image distribution and video distribution 85
  • 86. 4. Joint image distribution and video distribution 1. Joint image distribution learning 2. Video distribution learning 86
  • 87. Joint distribution of multi-domain images 87 Scene, image plane, camera; color image; depth image (near to far); thermal image (cool to hot) • p(x₁, x₂, …, x_n): where the x_i are images of the scene in different modalities. • Ex. p(x_color, x_thermal, x_depth) • In this presentation, “modality” = “domain”
  • 88. Joint distribution of multi-domain images 88 • Define a domain by an attribute. • Multi-domain images are views of an object with different attributes. • p(x₁, x₂, …, x_n): where the x_i are images of the object with different attributes. Examples: non-smiling/smiling, young/senior, non-beard/beard, summer/winter images, hand-drawings, Font#1/Font#2
  • 89. Extending GAN for joint image distribution learning 89 Discriminator Network: D Input image 1 Generator Network G Discriminator Network: D Random vector Generated image 0
  • 90. What if we do not have paired images from different domains for learning? 90
  • 91. CoGAN: Coupled Generative Adversarial Networks 91 Ming-Yu Liu, Oncel Tuzel “Coupled Generative Adversarial Networks” NIPS 2016 Learning a joint distribution of images using only samples from the marginal distributions: two GANs (G1/D1 and G2/D2) driven by the same z, with weight-sharing applied to the high-level layers.
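The weight-sharing constraint is the core of the method. A toy NumPy sketch of ours: two two-layer generators whose first (high-level) layer is literally the same matrix, so the same z drives both domains through identical high-level features (all names and sizes here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared high-level layer: captures domain-invariant structure ...
W_shared = rng.normal(size=(8, 4))
# ... followed by domain-specific low-level layers that render each domain.
W_dom1 = rng.normal(size=(2, 8))
W_dom2 = rng.normal(size=(2, 8))

def g1(z):
    h = np.tanh(W_shared @ z)   # shared weights, shared features
    return W_dom1 @ h

def g2(z):
    h = np.tanh(W_shared @ z)   # identical first layer: tied by construction
    return W_dom2 @ h

z = rng.normal(size=4)
x1, x2 = g1(z), g2(z)
# Because g1 and g2 agree on the high-level features, the pair
# (x1, x2) is encouraged to be two renderings of the same content.
print(x1.shape, x2.shape)
```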
  • 101. Video GANs 101 C. Vondrick, H. Pirsiavash, A. Torralba “Generating videos with scene dynamics” NIPS 2016
  • 103. MoCoGANs 103 S. Tulyakov, M. Liu, X. Yang, J. Kautz “MoCoGAN: Decomposing Motion and Content for Video Generation” 2017
  • 104. MoCoGANs 104 Sampled content code (fixed over the clip): Z_C = [z_C, z_C, ..., z_C] Motion trajectory: Z_M = [z_M^(1), z_M^(2), ..., z_M^(K)] S. Tulyakov, M. Liu, X. Yang, J. Kautz “MoCoGAN: Decomposing Motion and Content for Video Generation” 2017
  • 105. MoCoGANs 105 [Architecture diagram: a content code z_C and motion codes z_M^(k) from the motion RNN R_M drive the image generator G_I frame by frame; the generated frames feed an image discriminator D_I and a video discriminator D_V.] S. Tulyakov, M. Liu, X. Yang, J. Kautz “MoCoGAN: Decomposing Motion and Content for Video Generation” 2017
  • 106. MoCoGANs 106 Training: min_{D_I} E_{v~p_V}[−log D_I(S₁(v))] + E_{ṽ~p_Ṽ}[−log(1 − D_I(S₁(ṽ)))] min_{D_V} E_{v~p_V}[−log D_V(S_T(v))] + E_{ṽ~p_Ṽ}[−log(1 − D_V(S_T(ṽ)))] max_{G_I, R_M} E_{ṽ~p_Ṽ}[−log(1 − D_I(S₁(ṽ)))] + E_{ṽ~p_Ṽ}[−log(1 − D_V(S_T(ṽ)))] where S₁ samples a single random frame and S_T samples a random clip. S. Tulyakov, M. Liu, X. Yang, J. Kautz “MoCoGAN: Decomposing Motion and Content for Video Generation” 2017
  • 107. 107 MoCoGANs S. Tulyakov, M. Liu, X. Yang, J. Kautz “MoCoGAN: Decomposing Motion and Content for Video Generation” 2017
  • 108. VGAN vs. MoCoGAN 108 • Video generation: VGAN directly generates a 3D volume; MoCoGAN generates video frames sequentially. • Generator: VGAN uses a dynamic-video plus static-background generator; MoCoGAN uses an image generator. • Discriminator: VGAN uses one discriminator (3D CNN for clips); MoCoGAN uses two (3D CNN for clips, 2D CNN for frames). • Variable length: VGAN no; MoCoGAN yes. • Motion and content separation: VGAN no; MoCoGAN yes. • User preference on generated face video quality: VGAN 5%, MoCoGAN 95%.
  • 109. Outlines 1. Introduction 2. GAN objective 3. GAN Training 4. Joint image distribution and video distribution 5. Computer vision applications 109
  • 110. 5. Computer vision applications 110
  • 111. Why are GANs useful for computer vision? 111 Hand-crafted features gave way to deep networks; hand-crafted objective functions give way to generative adversarial networks.
  • 112. Image-to-image Translation • Let X₁ and X₂ be two different image domains (e.g., the day-time and night-time image domains). • Let x₁ ∈ X₁. • Image-to-image translation concerns the problem of translating x₁ to a corresponding image x₂ ∈ X₂. • Correspondence can mean different things in different contexts. Input → Image Translator → Output 112
  • 113. Examples and use cases Low-res to high-res Blurry to sharp Noisy to cleanLDR to HDR Thermal to color Image to painting Day to night Summer to winter Synthetic to real • Bad weather to good weather • Greyscale to color • … 113
  • 114. Prior Works • Image-to-image translation has been studied for decades in computer vision. • Different approaches have been exploited including: • Filtering-based approaches • Optimization-based approaches • Dictionary learning-based approaches • Deep learning-based approaches • GAN-based approaches. 114
  • 115. Generative Adversarial Networks (GANs) Goodfellow et al. NIPS’14 • Training is done by applying alternating stochastic gradient updates to the following two learning problems: max_D E_{x~p_X}[log D(x)] + E_{z~p_N}[log(1 − D(G(z)))] max_G E_{z~p_N}[log D(G(z))] • Effect: minimizing the JSD between p_{G(z), z~N} and p_X. Here z ~ N, x ∈ X, and x̂ = G(z). 115
  • 116. Conditional GANs • Training is done by applying alternating stochastic gradient updates to the following two learning problems: max_D E_{x₂~p_X₂}[log D(x₂)] + E_{x₁~p_X₁}[log(1 − D(F(x₁)))] max_F E_{x₁~p_X₁}[log D(F(x₁))] • Effect: minimizing the JSD between p_{F(x₁), x₁~X₁} and p_X₂. Here x₁ ~ X₁, x₂ ~ X₂, and x̂₂ = F(x₁). 116
  • 117. Conditional GAN for Image-to-image Translation • A conditional GAN alone is insufficient for image translation. • There is no guarantee that the conditionally generated image is related to the source image in a desired way. • In the supervised setting, Dataset = {(x₁⁽¹⁾, x₂⁽¹⁾), (x₁⁽²⁾, x₂⁽²⁾), ..., (x₁⁽ᴺ⁾, x₂⁽ᴺ⁾)}, this can be easily fixed. 117
  • 118. Supervised Image-to-image Translation • Supervisedly relating x̂ = F(x₁⁽ⁱ⁾) to x₂⁽ⁱ⁾ • Ledig et al, CVPR’17: adding a content loss ||x̂ − x₂⁽ⁱ⁾||² + ||VGG(x̂) − VGG(x₂⁽ⁱ⁾)||² 118
  • 119. SRGAN 119 C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, W. Shi “Photo-realistic single image super-resolution using a generative adversarial network”, CVPR 2017
  • 120. Supervised Image-to-image Translation • Supervisedly relating x̂ = F(x₁⁽ⁱ⁾) to x₂⁽ⁱ⁾ • Isola et al, CVPR’17: learning a joint distribution. max_F E_{p_X₁}[log D(x₁, F(x₁))] 120
  • 121. Pix2Pix 121 P. Isola, J. Zhu, T. Zhou, A. Efros “Image-to-image translation with conditional generative networks“, CVPR 2017
  • 122. Unsupervised Image-to-image Translation • Corresponding images could be expensive to get. • In the unsupervised setting: Dataset₁ = {x₁^(n₁), x₁^(n₂), ..., x₁^(n_N)} Dataset₂ = {x₂^(m₁), x₂^(m₂), ..., x₂^(m_M)} • With no correspondences, we need additional constraints/assumptions for learning image-to-image translation. 122
  • 123. SimGAN • Shrivastava et al. CVPR’17: adding a cross-domain content loss. max_F E_{p_X₁}[log D(F(x₁)) − ||F(x₁) − x₁||₁] 123
  • 124. Cycle Constraint • Learning two-way translation. • DiscoGAN by Kim et al. ICML’17 (arXiv 1703.05192) • CycleGAN by Zhu et al. arXiv 1703.10593 max_{F12,F21} E_{p_X₁}[log D₂(F12(x₁)) − ||F21(F12(x₁)) − x₁||_p^p] + E_{p_X₂}[log D₁(F21(x₂)) − ||F12(F21(x₂)) − x₂||_p^p] 124
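The reconstruction terms are just a round-trip penalty. A toy NumPy sketch of ours, with linear "translators" standing in for the networks:

```python
import numpy as np

def cycle_loss(F12, F21, x1, x2, p=1):
    # || F21(F12(x1)) - x1 ||_p^p  +  || F12(F21(x2)) - x2 ||_p^p
    back1 = F21(F12(x1))   # domain 1 -> 2 -> 1
    back2 = F12(F21(x2))   # domain 2 -> 1 -> 2
    return float(np.sum(np.abs(back1 - x1) ** p)
                 + np.sum(np.abs(back2 - x2) ** p))

# Toy translators: F12 doubles, F21 halves, so the cycle is exact
F12 = lambda x: 2.0 * x
F21 = lambda x: 0.5 * x
x1 = np.array([1.0, -2.0])
x2 = np.array([3.0, 0.5])
print(cycle_loss(F12, F21, x1, x2))

# A non-inverse pair pays a cycle penalty even if each output
# individually fools its domain discriminator
F21_bad = lambda x: 0.25 * x
print(cycle_loss(F12, F21_bad, x1, x2))
```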
  • 125. CycleGAN unsupervised image to image translation results 125
  • 126. Shared Latent Space Constraint • CoGAN by Liu et al. NIPS’16 • Assume corresponding images in different domains share the same representation in a hidden space: G₁ = G_{L1} ∘ G_H and G₂ = G_{L2} ∘ G_H (weight-sharing) max_{G_H, G_{L1}, G_{L2}} E_{p_N}[log D₁(G_{L1}(G_H(z))) + log D₂(G_{L2}(G_H(z)))] with z ~ N, x̂₁ = G₁(z), x̂₂ = G₂(z). 126
  • 127. Shared Latent Space Constraint • Under the sufficient-capacity and descent-to-local-minimum assumptions of Goodfellow et al., CoGAN learns a joint distribution of multi-domain images. • Image translation: G₂(argmin_z ||G₁(z) − x₁||_p^p) 127
  • 128. The CoVAE-GAN Framework G1 D1 D2G2 E1 E2 x1 ⇠ X1 x2 ⇠ X2 G1(E1(x1)) G1(E2(x2)) G2(E1(x1)) G2(E2(x2)) • Augmenting CoGAN with VAEs for mapping images to the shared latent space. • Applying weight-sharing constraints to encoders E1 and E2 • Image reconstruction streams and image translation streams 128 Ming-Yu Liu, Thomas Breuel, Jan Kautz “Unsupervised Image-to-image translation networks“, arXiv 2017
  • 129. The CoVAE-GAN Framework E_{p_X₁}[log D₁(G₁(E₁(x₁))) + log D₂(G₂(E₁(x₁))) − λ₁||G₁(E₁(x₁)) − x₁||_p^p − λ₂ KL(E₁(x₁) || p_N)] + E_{p_X₂}[log D₁(G₁(E₂(x₂))) + log D₂(G₂(E₂(x₂))) − λ₁||G₂(E₂(x₂)) − x₂||_p^p − λ₂ KL(E₂(x₂) || p_N)] 129
  • 130. Intuition In the ideal case (supervised setting), we can enforce that corresponding images are mapped to the same point in the embedding space: E₁(x₁⁽ⁱ⁾) ≡ E₂(x₂⁽ⁱ⁾) 130
  • 131. Intuition In the unsupervised setting, although we can use a GAN discriminator to make sure the images coming out of G₂ resemble real images, there is no guarantee that the output images from G₁ and G₂ form a corresponding pair. 131
  • 132. Intuition G1 G2 E1 In the unsupervised setting Embedding Space However, if we apply shared latent space assumption via weight-sharing, then the generated images from G1 and G2 will resemble a pair of corresponding images. D2 132
  • 133. G1 G2 E1 In the unsupervised setting Embedding Space Similarly, we can enforce weight-sharing to E1 and E2 to encourage a pair of corresponding images to have the same embedding. D2 E2 D1 133
  • 134. Toy Examples G1 D1 D2G2 E1 E2 x1 ⇠ X1 x2 ⇠ X2 G1(E1(x1)) G1(E2(x2)) G2(E1(x1)) G2(E2(x2)) Domain 1 Domain 2 Image Translation Errors Results 134
  • 136. Image Translation Results Foggy image to clear-sky image: Front View, Back View, Left View, Right View 136
  • 137. Attribute-based Face Image Translation Input +Blond Hair +Eye Glasses +Goatee +Smiling Input +Blond Hair +Eye Glasses +Goatee +Smiling 137
  • 138. Image translation results Input +Blond hair; Input +Eyeglasses; Input −Goatee; Input −Smiling 138
  • 139. Image translation results Input +Blond hair; Input +Eyeglasses; Input +Goatee; Input +Smiling 139
  • 140. Conclusions • Family of generative models • Theory of GANs • Various training techniques • Joint image distribution and video distribution • Image translation applications 140