1. DiscoGAN is a method for learning to discover cross-domain relations without explicitly paired data using generative adversarial networks.
2. It uses two coupled GANs to map each domain into the other domain to allow for domain transfer while preserving key attributes.
3. Results show DiscoGAN performs better than other methods and is more robust to the mode collapse problem due to the symmetry granted by coupling the two GANs.
1. Learning to Discover Cross-Domain Relations with Generative Adversarial Networks
Published by T. Kim, M. Cha, H. Kim, J. K. Lee, and J. Kim
Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, PMLR 70, 2017
Seongcheol Baek
Reading Circle Presentation @ Hikihara Lab
Department of Electrical Engineering, Kyoto University
2019/07/19
2. Focus of this presentation
- Recently emerging issues around GAN
- Introduction of generative adversarial networks (GAN)
- What is DiscoGAN?
Problems of interest / model architecture / mode collapse problem /
experiments / summaries / comments
3. Recent generative technologies
[Figure: timeline of generative technologies, 2014 to 2019. Entries shown: Ian J. Goodfellow invented the “generative adversarial network”; Deep Convolutional GAN (DCGAN); Least Squares GAN (LSGAN); Semi-Supervised GAN; StackGAN; Auxiliary Classifier GAN (ACGAN); DiscoGAN; CycleGAN; the original AlphaGo beat a professional Go player; BW clips into color (Source: Nvidia); GauGAN (Source: Nvidia); Samsung deepfake AI fabricates a video from a single profile picture]
4. Recent issues around deepfakes – security, art, etc.
A viral video in which Obama appears to insult Donald Trump was fabricated with FakeApp (Photo: YouTube)
A deepfake clip of Mark Zuckerberg is being allowed to remain on Instagram (Photo: Bill Poster UK)
- US lawmakers say AI deepfakes ‘have the potential to disrupt every facet of our society’
- At the individual level, deepfakes can be used for cyberbullying, defamation, and blackmail
Edmond de Belamy: a portrait generated by a GAN in 2018, widely reported as the first AI-generated artwork sold at a major auction house
5. What is GAN?
- Two neural networks contest with each other in a game. Given a training set, a GAN learns to generate new data with the same statistics as the training set.
- Minimax two-player game (generative model vs. discriminative model)
6. Minimax Problem of GAN
$$\min_G \max_D V(G, D) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
$$V(G, D) = \int_x p_{data}(x) \log D(x) \, dx + \int_z p_z(z) \log(1 - D(G(z))) \, dz$$
Training of Generator: $\min_G [1 - D(G(z))] = 0$
Training of Discriminator: $\max_D D(x) = 1$ (discriminant for real data), $\max_D [1 - D(G(z))] = 1$ (discriminant for generated data)
- $V(G, D)$ has a saddle point at $D(G(z)) = \frac{1}{2}$
- $D \in [0, 1]$ indicates whether the data is fake (0) or real (1)
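The value function and its saddle point can be checked numerically. The following is a minimal sketch, not the presenter's code: the helper name `value_function` and the constant discriminator outputs are assumptions made for illustration. It estimates $V(G, D)$ from discriminator outputs by replacing the two expectations with sample means.

```python
import numpy as np

def value_function(d_real, d_fake):
    """Monte Carlo estimate of V(G, D) from discriminator outputs.

    d_real: D(x) for samples x drawn from the data distribution.
    d_fake: D(G(z)) for samples z drawn from the noise prior.
    (Hypothetical helper, introduced only for this illustration.)
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# At the saddle point the discriminator cannot tell real from fake: D = 1/2 everywhere.
v_saddle = value_function(np.full(1000, 0.5), np.full(1000, 0.5))

# A discriminator that is confident and correct pushes V up towards 0.
v_confident = value_function(np.full(1000, 0.99), np.full(1000, 0.01))

print(v_saddle, v_confident)
```

With $D = \frac{1}{2}$ on both real and generated samples, each term contributes $\log \frac{1}{2}$, so the estimate equals $-\log 4$.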
7. Discover Cross-Domain Relations with GAN
Training of 2 different data sets
without explicitly paired labelling
Results of domain transfer
- Earlier models could also transfer data from one domain to another while preserving key attributes
- Previous training methods (~2016) require paired data, which is costly and hard to collect
- DiscoGAN trains on 2 different data sets without any paired data, and its results show better performance and robustness to the mode collapse problem
(Domain A) (Domain B), with generators $G_{AB}$ (A to B) and $G_{BA}$ (B to A)
8. Network Models – DiscoGAN & Previous GANs
Standard GAN with GAN loss
GAN with a reconstruction loss & GAN loss
DiscoGAN
- Each generator consists of an encoder-decoder pair (input and output are images)
- The GAN loss (and the reconstruction loss, where used) is minimized during training
- In DiscoGAN, 2 coupled GANs map each domain to its counterpart domain (bijective)
9. Problem Formulation (1)
- Reconstruction loss measures how well the original input is reconstructed after a sequence of two generations: $L_{CONST_A} = d(x_{ABA}, x_A)$, where $x_{ABA} = G_{BA}(G_{AB}(x_A))$ and $d$ is a distance such as $L_1$, $L_2$, or Huber loss
- GAN loss measures how realistic the generated image is in domain B: $L_{GAN_B} = -\mathbb{E}_{x_A \sim P_A}[\log D_B(x_{AB})]$, where $x_{AB} = G_{AB}(x_A)$
- Relaxed constraints are considered to guarantee bijection and domain transition
- Bijection: ideally $G_{AB}^{-1} = G_{BA}$ → $\min_{G_{AB}} (L_{CONST_A})$, $\min_{G_{BA}} (L_{CONST_B})$
- Domain transition: ideally $x_{AB} \in \mathbb{R}_B$, $x_{BA} \in \mathbb{R}_A$ → $\min_{D_B} (L_{D_B})$, $\min_{D_A} (L_{D_A})$
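The three candidate distances for the reconstruction loss can be compared on a small example. This is a minimal sketch under stated assumptions: the function names, the mean reduction over elements, and the Huber threshold `delta=1.0` are choices made here for illustration and are not taken from the slides.

```python
import numpy as np

def l1_distance(x_rec, x):
    # Mean absolute error between the reconstruction and the original.
    return np.mean(np.abs(x_rec - x))

def l2_distance(x_rec, x):
    # Mean squared error between the reconstruction and the original.
    return np.mean((x_rec - x) ** 2)

def huber_distance(x_rec, x, delta=1.0):
    # Quadratic for small residuals, linear for residuals larger than delta.
    r = np.abs(x_rec - x)
    quadratic = 0.5 * r ** 2
    linear = delta * (r - 0.5 * delta)
    return np.mean(np.where(r <= delta, quadratic, linear))

# Toy values standing in for x_A and its reconstruction x_ABA (made-up numbers).
x_a = np.array([0.0, 1.0, 2.0, 3.0])
x_aba = np.array([0.0, 1.5, 2.0, 6.0])

print(l1_distance(x_aba, x_a))     # residuals are 0, 0.5, 0, 3
print(l2_distance(x_aba, x_a))
print(huber_distance(x_aba, x_a))
```

The Huber distance penalizes the single large residual less heavily than the squared error does, which is the usual reason for choosing it.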
10. Problem Formulation (2)
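Columns: model / training of generator (in the case of $G_{AB}$) / training of discriminator (in the case of $D_B$) / constraint level

(a) Standard GAN with GAN loss
- Generator: $L_{G_{AB}} = L_{GAN_B} = -\mathbb{E}_{x_A \sim P_A}[\log D_B(G_{AB}(x_A))]$
- Discriminator: $L_{D_B} = -\mathbb{E}_{x_B \sim P_B}[\log D_B(x_B)] - \mathbb{E}_{x_A \sim P_A}[\log(1 - D_B(G_{AB}(x_A)))]$
- Constraint level: –

(b) GAN with a reconstruction loss & GAN loss
- Generator: $L_{G_{AB}} = L_{GAN_B} + L_{CONST_A} = -\mathbb{E}_{x_A \sim P_A}[\log D_B(G_{AB}(x_A))] + d(x_{ABA}, x_A)$
- Discriminator: $L_{D_B} = -\mathbb{E}_{x_B \sim P_B}[\log D_B(x_B)] - \mathbb{E}_{x_A \sim P_A}[\log(1 - D_B(G_{AB}(x_A)))]$
- Constraint level: doubled DOF from (a), weaker than (a)

(c) DiscoGAN
- Generator: $L_G = L_{G_{AB}} + L_{G_{BA}} = L_{GAN_B} + L_{CONST_A} + L_{GAN_A} + L_{CONST_B} = -\mathbb{E}_{x_A \sim P_A}[\log D_B(G_{AB}(x_A))] + d(x_{ABA}, x_A) - \mathbb{E}_{x_B \sim P_B}[\log D_A(G_{BA}(x_B))] + d(x_{BAB}, x_B)$
- Discriminator: $L_D = L_{D_A} + L_{D_B} = -\mathbb{E}_{x_A \sim P_A}[\log D_A(x_A)] - \mathbb{E}_{x_B \sim P_B}[\log(1 - D_A(G_{BA}(x_B)))] - \mathbb{E}_{x_B \sim P_B}[\log D_B(x_B)] - \mathbb{E}_{x_A \sim P_A}[\log(1 - D_B(G_{AB}(x_A)))]$
- Constraint level: doubled DOF from (b), weaker than (b)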
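The DiscoGAN generator and discriminator losses can be written out directly. The sketch below uses NumPy with toy stand-ins rather than neural networks. Everything in it is an assumption for illustration, not the official implementation: the function `discogan_losses`, the shift-by-2 generators, the fixed sigmoid discriminators, and the choice of mean squared error for $d$.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def mse(a, b):
    return np.mean((a - b) ** 2)

def discogan_losses(x_a, x_b, g_ab, g_ba, d_a, d_b, d=mse):
    """Evaluate the coupled DiscoGAN losses for one batch (illustrative only)."""
    x_ab = g_ab(x_a)    # A -> B
    x_ba = g_ba(x_b)    # B -> A
    x_aba = g_ba(x_ab)  # A -> B -> A reconstruction
    x_bab = g_ab(x_ba)  # B -> A -> B reconstruction

    # Generator side: two GAN losses plus two reconstruction losses.
    loss_gan_b = -np.mean(np.log(d_b(x_ab)))
    loss_gan_a = -np.mean(np.log(d_a(x_ba)))
    loss_const_a = d(x_aba, x_a)
    loss_const_b = d(x_bab, x_b)
    loss_g = loss_gan_b + loss_const_a + loss_gan_a + loss_const_b

    # Discriminator side: real samples scored high, translated samples scored low.
    loss_d_a = -np.mean(np.log(d_a(x_a))) - np.mean(np.log(1.0 - d_a(x_ba)))
    loss_d_b = -np.mean(np.log(d_b(x_b))) - np.mean(np.log(1.0 - d_b(x_ab)))
    loss_d = loss_d_a + loss_d_b
    return loss_g, loss_d, loss_const_a, loss_const_b

# Toy setting (assumed): domain B is domain A shifted by +2, so the two
# generators below are exact inverses of each other.
rng = np.random.default_rng(0)
x_a = rng.normal(0.0, 1.0, size=(256, 1))
x_b = rng.normal(2.0, 1.0, size=(256, 1))
g_ab = lambda x: x + 2.0
g_ba = lambda x: x - 2.0
d_a = lambda x: sigmoid(1.0 - x)  # scores points near domain A as real
d_b = lambda x: sigmoid(x - 1.0)  # scores points near domain B as real

loss_g, loss_d, const_a, const_b = discogan_losses(x_a, x_b, g_ab, g_ba, d_a, d_b)
print(loss_g, loss_d, const_a, const_b)
```

Because the toy generators invert each other, both reconstruction terms vanish up to floating-point error, which is the bijection condition that the coupled losses aim for.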
11. Architecture of Generator
- Each generator takes an image and feeds it through an encoder-decoder pair
- Number of layers ranges from 4 to 5 depending on the domain
[Figure: Domain A (resp. B) → Encoder (convolution layers) → Decoder (deconvolution layers) → Domain B (resp. A)]
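How an image shrinks through the encoder and grows back through the decoder can be traced with the standard output-size formulas for convolution and transposed convolution. This is a minimal sketch: the 64-pixel input, the 4x4 kernel, stride 2, and padding 1 are assumptions chosen for illustration, since the slides do not state them.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    # Output size of a convolution along one spatial dimension.
    return (size + 2 * padding - kernel) // stride + 1

def deconv_out(size, kernel=4, stride=2, padding=1):
    # Output size of a transposed convolution (no dilation, no output padding).
    return (size - 1) * stride - 2 * padding + kernel

def encoder_decoder_sizes(size, num_layers):
    """Spatial size after each encoder layer, then after each decoder layer."""
    sizes = [size]
    for _ in range(num_layers):
        sizes.append(conv_out(sizes[-1]))
    for _ in range(num_layers):
        sizes.append(deconv_out(sizes[-1]))
    return sizes

sizes = encoder_decoder_sizes(64, num_layers=4)
print(sizes)
```

With these assumed settings each encoder layer halves the resolution and each decoder layer doubles it, so the output image has the same size as the input.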
12. Architecture of Discriminator
- Each discriminator feeds an image through convolution layers
- The discriminator outputs a scalar through a sigmoid, indicating how real the input image is
13. Toy Experiment – Domain Transition Test
- In DiscoGAN, discriminator B is perfectly fooled by translated samples from domain A
- DiscoGAN prevents mode-collapse by translating into distinct well-bounded regions that do
not overlap
[Figure panels: Initial state / Standard GAN / GAN with $L_{CONST}$ / DiscoGAN]
Colored points: samples in domain A
Black x’s: target modes in domain B
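One simple way to quantify what the toy experiment shows is to count how many target modes receive at least one translated sample. The sketch below is a hypothetical metric, not one used in the paper or the slides: the function `covered_modes`, the four mode locations, and the radius are all assumptions for illustration.

```python
import numpy as np

def covered_modes(points, modes, radius):
    """Return the indices of modes that have at least one point within `radius`."""
    points = np.asarray(points, dtype=float)
    modes = np.asarray(modes, dtype=float)
    # Pairwise Euclidean distances, shape (num_points, num_modes).
    dists = np.linalg.norm(points[:, None, :] - modes[None, :, :], axis=-1)
    nearest = np.argmin(dists, axis=1)
    close = dists[np.arange(len(points)), nearest] <= radius
    return set(nearest[close].tolist())

# Four made-up target modes in domain B.
modes_b = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, -1.0], [-1.0, 0.0]])

# Mode collapse: every translated sample lands near the same mode.
collapsed = np.tile(modes_b[0], (8, 1)) + 0.01

# Desired behaviour: translated samples spread over all modes.
spread = np.repeat(modes_b, 2, axis=0) + 0.01

print(covered_modes(collapsed, modes_b, radius=0.1))
print(covered_modes(spread, modes_b, radius=0.1))
```

A collapsed generator covers a single mode, while a mapping that translates into distinct regions covers all of them.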
14. Mode Collapse Problem
The gradients are biased towards the mode from which a higher number of samples is drawn to form the real training data
- The generator outputs unintended images from a different mode; this occurs prevalently in GANs
- GANs usually try to remedy this problem with additional loss terms, but it has not been resolved perfectly
- Other examples: communication system, cryptography, automaton, etc.
15. Why is DiscoGAN robust to mode collapse?
- In DiscoGAN, two coupled models are trained together simultaneously. The two copies of $G_{AB}$ and the two copies of $G_{BA}$ share parameters
- The constraints from the coupled reconstruction losses lead to a strict bijection
16. Real Domain Experiment – Car to Car, Face to Face
[Figure panels: Input data / Standard GAN / GAN with $L_{CONST}$ / DiscoGAN; rows: Car to car / Face to face]
- Reconstruction tests
- Results with DiscoGAN show higher correlations (robust to mode collapse)
17. Real Domain Experiment – Face Conversion
Translation of gender
Blond to black,
Black to blond hair
Glasses to non-glasses,
non-glasses to glasses
- DiscoGAN translates a specific feature while preserving the other facial features
18. Cross-Domain Experiment (1)
Chair to car / Car to face
- Note that training is performed without any paired data
- The main attribute (azimuth) is preserved
21. Summaries
- DiscoGAN is proposed as a learning method to discover cross-domain relations without any paired labels
- Results showed better performance and robustness to mode collapse. The symmetry granted by coupling 2 GANs is considered to be a key factor for the dynamical robustness
Comments
- The strategy of coupling two GAN models reminded me of the symmetry of dynamics. Some correlations could be drawn to handle the stability problem…?
- This paper is giving me many ideas. It is very pleasant.
22. Thank you!
- Source code for simulations
Official implementation of "Learning to Discover Cross-Domain Relations with Generative Adversarial Networks" (GitHub) ... https://github.com/SKTBrain/DiscoGAN
- This presentation is also available on:
https://www.slideshare.net/SeongcheolBaek/introduction-of-discogan
23. References
- Crux of Presentation
T. Kim, et al., Learning to Discover Cross-Domain Relations with Generative Adversarial Networks (arXiv) ... https://arxiv.org/abs/1703.05192
- Recent generative technologies
Apple announces Animoji (The Verge) ... https://www.theverge.com/2017/9/12/16290210/new-iphone-emoji-animated-animoji-apple-ios-11-update
AI Can Convert Black and White Clips into Color (Nvidia Developer) ... https://news.developer.nvidia.com/ai-can-convert-black-and-white-clips-into-color/
Nvidia’s latest AI software turns rough doodles into realistic landscapes (The Verge) ... https://www.theverge.com/2019/3/19/18272602/ai-art-generation-gan-nvidia-doodle-landscapes
Deepfakes are getting easier than ever to make (The Verge) ... https://www.theverge.com/2019/5/23/18637373/deepfakes-samsung-ai-research-results-single-photo-algorithm
- Recent issues around deepfakes – security, art, etc.
A viral video that appeared to show Obama calling Trump a 'dips---' shows a disturbing new trend called 'deepfakes' (Business Insider) ... https://www.businessinsider.com/obama-deepfake-video-insulting-trump-2018-4
New deepfake tech turns a single photo and audio file into a singing video portrait (The Verge) ... https://www.theverge.com/2019/6/20/18692671/deepfake-technology-singing-talking-video-portrait-from-a-single-image-imperial-college-samsung
US lawmakers say AI deepfakes ‘have the potential to disrupt every facet of our society’ (The Verge) ... https://www.theverge.com/2018/9/14/17859188/ai-deepfakes-national-security-threat-lawmakers-letter-intelligence-community
Deepfakes: A Threat to Individuals and National Security (Lionbridge) ... https://lionbridge.ai/articles/deepfakes-a-threat-to-individuals-and-national-security/
A deepfake clip of Mark Zuckerberg is being allowed to remain on Instagram (iNews) ... https://inews.co.uk/news/technology/a-deepfake-clip-of-mark-zuckerberg-is-being-allowed-to-remain-on-instagram/
Portrait of Edmond Belamy created by GAN (Wikipedia) ... https://en.wikipedia.org/wiki/Edmond_de_Belamy
- Generative Adversarial Network
I. J. Goodfellow, et al., Generative Adversarial Nets (arXiv) ... https://arxiv.org/abs/1406.2661
Tutorial on Generative Adversarial Networks ... https://www.slideshare.net/ckmarkohchang/generative-adversarial-networks
- Mode Collapse Problem
A. Ghosh, et al., Multi-Agent Diverse Generative Adversarial Networks (ResearchGate) ... https://www.researchgate.net/publication/315882247_Multi-Agent_Diverse_Generative_Adversarial_Networks