SlideShare ist ein Scribd-Unternehmen logo
1 von 20
InfoGAN: Interpretable Representation Learning by
Information Maximizing Generative Adversarial Nets
Xi Chen, Yan Duan, Rein Houthooft, John Schulman,
Ilya Sutskever, Pieter Abbeel (UC Berkeley, Open AI)
Presenter: Shuhei M. Yoshida (Dept. of Physics, UTokyo)
Unsupervised learning of disentangled representations
Goal
GANs + Maximizing Mutual Information
between generated images and input codes
Approach
Benefit
Interpretable representation obtained
without supervision and substantial additional costs
Reference
https://arxiv.org/abs/1606.03657 (with Appendix sections)
Implementations
https://github.com/openai/InfoGAN (by the authors, with TensorFlow)
https://github.com/yoshum/InfoGAN (by the presenter, with Chainer)
NIPS2016読み会
Motivation
How can we achieve
unsupervised learning of disentangled representation?
In general, learned representation is entangled,
i.e. encoded in a data space in a complicated manner
When a representation is disentangled, it would be
more interpretable and easier to apply to tasks
Related works
• Unsupervised learning of representation
(no mechanism to force disentanglement)
Stacked (often denoising) autoencoder, RBM
Many others, including semi-supervised approach
• Supervised learning of disentangled representation
Bilinear models, multi-view perceptron
VAEs, adversarial autoencoders
• Weakly supervised learning of disentangled representation
disBM, DC-IGN
• Unsupervised learning of disentangled representation
hossRBM, applicable only to discrete latent factors
which the presenter has almost no knowledge about.
This work:
Unsupervised learning of disentangled representation
applicable to both continuous and discrete latent factors
Generative Adversarial Nets(GANs)
Generative model trained by competition between
two neural nets:
Generator 𝑥 = 𝐺 𝑧 , 𝑧 ∼ 𝑝 𝑧 𝑍
𝑝 𝑧 𝑍 : an arbitrary noise distribution
Discriminator 𝐷 𝑥 ∈ 0,1 :
probability that 𝑥 is sampled from the data dist. 𝑝data(𝑋)
rather than generated by the generator 𝐺 𝑧
min
𝐺
max
𝐷
𝑉GAN 𝐺, 𝐷 , where
𝑉GAN 𝐺, 𝐷 ≡ 𝐸 𝑥∼𝑝data 𝑋 ln 𝐷 𝑥 + 𝐸 𝑧∼𝑝 𝑧 𝑍 ln 1 − 𝐷 𝐺 𝑧
Optimization problem to solve:
Problems with GANs
From the perspective of representation learning:
No restrictions on how 𝐺 𝑧 uses 𝑧
• 𝑧 can be used in a highly entangled way
• Each dimension of 𝑧 does not represent
any salient feature of the training data
𝑧1
𝑧2
Proposed Resolution: InfoGAN
-Maximizing Mutual Information -
Observation in conventional GANs:
a generated date 𝑥 does not have much information
on the noise 𝑧 from which 𝑥 is generated
because of heavily entangled use of 𝑧
Proposed resolution = InfoGAN:
the generator 𝐺 𝑧, 𝑐 trained so that
it maximize the mutual information 𝐼 𝐶 𝑋 between
the latent code 𝐶 and the generated data 𝑋
min
𝐺
max
𝐷
𝑉GAN 𝐺, 𝐷 − 𝜆𝐼 𝐶 𝑋 = 𝐺 𝑍, 𝐶
Mutual Information
𝐼 𝑋; 𝑌 = 𝐻 𝑋 − 𝐻 𝑋 𝑌 , where
• 𝐻 𝑋 = 𝐸 𝑥∼𝑝 𝑋 − ln 𝑝 𝑋 = 𝑥 :
Entropy of the prior distribution
• 𝐻 𝑋 𝑌 = 𝐸 𝑦∼𝑝 𝑌 ,𝑥∼𝑝 𝑋|𝑌=𝑦 − ln 𝑝 𝑋 = 𝑥 𝑌 = 𝑦 :
Entropy of the posterior distribution
𝑝 𝑋 = 𝑥
𝑥
𝑝 𝑋 = 𝑥|𝑌 = 𝑦
𝑥
𝑝 𝑋 = 𝑥|𝑌 = 𝑦
𝑥
𝐼 𝑋; 𝑌 = 0 𝐼 𝑋; 𝑌 > 0
Sampling 𝑦 ∼ 𝑝 𝑌
Avoiding increase of calculation costs
Major difficulty:
Evaluation of 𝐼 𝐶 𝑋 based on
evaluation and sampling from the posterior 𝑝 𝐶 𝑋
Two strategies:
Variational maximization of mutual information
Use an approximate function 𝑄 𝑐 𝑥 = 𝑝 𝐶 = 𝑐 𝑋 = 𝑥
Sharing the neural net
between 𝑄 𝑐 𝑥 and the discriminator 𝐷 𝑥
Variational Maximization of MI
For an arbitrary function 𝑄 𝑐, 𝑥 ,
𝐸 𝑥∼𝑝 𝐺 𝑋 ,𝑐∼𝑝 𝐶|𝑋=𝑥 ln 𝑝 𝐶 = 𝑐 𝑋 = 𝑥
(∵ positivity of KL divergence)
= 𝐸 𝑥∼𝑝 𝐺 𝑋 ,𝑐∼𝑝 𝐶|𝑋=𝑥 ln 𝑄(𝑐, 𝑥) + 𝐸 𝑥∼𝑝 𝐺 𝑋 ,𝑐∼𝑝 𝐶|𝑋=𝑥 ln
𝑝 𝐶 = 𝑐 𝑋 = 𝑥
𝑄 𝑐, 𝑥
= 𝐸 𝑥∼𝑝 𝐺 𝑋 ,𝑐∼𝑝 𝐶|𝑋=𝑥 ln 𝑄(𝑐, 𝑥) + 𝐸 𝑥∼𝑝 𝐺 𝑋 𝐷KL 𝑝 𝐶 𝑋 = 𝑥 ||𝑄 𝐶, 𝑥
≥ 𝐸 𝑥∼𝑝 𝐺 𝑋 ,𝑐∼𝑝 𝐶|𝑋=𝑥 ln 𝑄(𝑐, 𝑥)
Variational Maximization of MI
Maximizing 𝐿𝐼 𝐺, 𝑄 w.r.t. 𝐺 and 𝑄
 With 𝑄(𝑐, 𝑥) approximating 𝑝 𝐶 = 𝑐 𝑋 = 𝑥 , we obtain
an variational estimate of the mutual information:
𝐿𝐼 𝐺, 𝑄 ≡ 𝐸 𝑥∼𝑝 𝐺 𝑋 ,𝑐∼𝑝 𝐶|𝑋=𝑥 ln 𝑄 𝑐, 𝑥 + 𝐻 𝐶
≲ 𝐼 𝐶 𝑋 = 𝐺 𝑍, 𝐶
⇔
• Achieving the equality by setting 𝑄 𝑐, 𝑥 = 𝑝 𝐶 = 𝑐 𝑋 = 𝑥
• Maximizing the mutual information
min
𝐺,𝑄
max
𝐷
𝑉GAN 𝐺, 𝐷 − 𝜆𝐿𝐼 𝐺, 𝑄
Optimization problem to solve in InfoGAN:
Eliminate sampling from posterior
Lemma
𝐸 𝑥∼𝑝 𝑋 ,𝑦∼𝑝 𝑌 𝑋=𝑥) 𝑓 𝑥, 𝑦 = 𝐸 𝑥∼𝑝 𝑋 ,𝑦∼𝑝 𝑌 𝑋=𝑥),𝑥′∼𝑝 𝑋′ 𝑌=𝑦) 𝑓 𝑥′
, 𝑦 .
𝐸 𝑥∼𝑝 𝐺 𝑋 ,𝑐∼𝑝 𝐶|𝑋=𝑥 ln 𝑄 𝑐, 𝑥
= 𝐸 𝑐∼𝑝 𝐶 ,𝑧∼𝑝 𝑧 𝑍 ,𝑥=𝐺 𝑧,𝑐 ,𝑐∼𝑝 𝐶|𝑋=𝑥 ln 𝑄 𝑐, 𝑥 ,
By using this lemma and noting that
𝐸 𝑥∼𝑝 𝐺 𝑋 ,𝑐∼𝑝 𝐶|𝑋=𝑥 ln 𝑄 𝑐, 𝑥 = 𝑬 𝒄∼𝒑 𝑪 ,𝒛∼𝒑 𝒛 𝒁 ,𝒙=𝑮 𝒛,𝒄 𝐥𝐧 𝑸 𝒄, 𝒙
we can eliminate the sampling from 𝑝 𝐶|𝑋 = 𝑥 :
Easy to estimate!
Proof of lemma
Lemma
𝐸 𝑥∼𝑝 𝑋 ,𝑦∼𝑝 𝑌 𝑋=𝑥) 𝑓 𝑥, 𝑦 = 𝐸 𝑥∼𝑝 𝑋 ,𝑦∼𝑝 𝑌 𝑋=𝑥),𝑥′∼𝑝 𝑋′ 𝑌=𝑦) 𝑓 𝑥′
, 𝑦 .
∵ l. h. s. =
𝑥 𝑦
𝑝 𝑋 = 𝑥 𝑝 𝑌 = 𝑦 𝑋 = 𝑥 𝑓 𝑥, 𝑦
=
𝑥 𝑦
𝑝 𝑌 = 𝑦 𝑝 𝑋 = 𝑥 𝑌 = 𝑦 𝑓 𝑥, 𝑦
=
𝑥 𝑦 𝑥′
𝑝 𝑋 = 𝑥′, 𝑌 = 𝑦 𝑝 𝑋 = 𝑥 𝑌 = 𝑦 𝑓 𝑥, 𝑦
=
𝑥 𝑦 𝑥′
𝑝 𝑋 = 𝑥′ 𝑝 𝑌 = 𝑦 𝑋 = 𝑥′ 𝑝 𝑋 = 𝑥 𝑌 = 𝑦 𝑓 𝑥, 𝑦
= r. h. s.
∵ Bayes’ theorem
Sharing layers between 𝐷 and 𝑄
Model 𝑄 𝑐, 𝑥 using neural network
Reduce the calculation costs by
sharing all the convolution layers with 𝐷
Image from Odena, et al., arXiv:1610.09585.
Convolution layers of the discriminator
𝐷 𝑄
Given DCGANs,
InfoGAN comes for negligible additional costs!
Experiment – MI Maximization
• InfoGAN on MNIST dataset
• Latent code 𝑐
= 10-class categorical code
𝐿𝐼 quickly saturates to
𝐻 𝑐 = ln 10 ∼ 2.3 in InfoGAN
Figure 1 in the original paper
Experiment
– Disentangled Representation –
Figure 2 in the original paper
• InfoGAN on MNIST dataset
• Latent codes
 𝑐1: 10-class categorical code
 𝑐2, 𝑐3: continuous code
 𝑐1 can be used as a
classifier with 5% error
rate.
 𝑐2 and 𝑐3 captured the
rotation and width,
respectively
Experiment
– Disentangled Representation –
Dataset: P. Paysan, et al., AVSS, 2009, pp. 296–301.
Figure 3 in the original paper
Experiment
– Disentangled Representation –
Dataset: M. Aubry, et al., CVPR, 2014, pp. 3762–3769.
InfoGAN learned salient features without supervision
Figure 4 in the original paper
Experiment
– Disentangled Representation –
Dataset: Street View House Number
Figure 5 in the original paper
Experiment
– Disentangled Representation –
Dataset: CelebA
Figure 6 in the original paper
Future Prospect and Conclusion
Mutual information maximization can be applied to
other methods, e.g. VAE
Learning hierarchical latent representation
Improving semi-supervised learning
High-dimentional data discovery
Unsupervised learning of disentangled representations
Goal
GANs + Maximizing Mutual Information
between generated images and input codes
Approach
Benefit
Interpretable representation obtained
without supervision and substantial additional costs

Weitere ähnliche Inhalte

Was ist angesagt?

Random forest
Random forestRandom forest
Random forest
Ujjawal
 
Finding connections among images using CycleGAN
Finding connections among images using CycleGANFinding connections among images using CycleGAN
Finding connections among images using CycleGAN
NAVER Engineering
 

Was ist angesagt? (20)

Disentangled Representation Learning of Deep Generative Models
Disentangled Representation Learning of Deep Generative ModelsDisentangled Representation Learning of Deep Generative Models
Disentangled Representation Learning of Deep Generative Models
 
Anomaly Detection with GANs
Anomaly Detection with GANsAnomaly Detection with GANs
Anomaly Detection with GANs
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
 
Generative Adversarial Network (+Laplacian Pyramid GAN)
Generative Adversarial Network (+Laplacian Pyramid GAN)Generative Adversarial Network (+Laplacian Pyramid GAN)
Generative Adversarial Network (+Laplacian Pyramid GAN)
 
Optimization/Gradient Descent
Optimization/Gradient DescentOptimization/Gradient Descent
Optimization/Gradient Descent
 
Basic Generative Adversarial Networks
Basic Generative Adversarial NetworksBasic Generative Adversarial Networks
Basic Generative Adversarial Networks
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...
 
主成分分析
主成分分析主成分分析
主成分分析
 
Batch normalization presentation
Batch normalization presentationBatch normalization presentation
Batch normalization presentation
 
PAC-Bayesian Bound for Deep Learning
PAC-Bayesian Bound for Deep LearningPAC-Bayesian Bound for Deep Learning
PAC-Bayesian Bound for Deep Learning
 
NN_02_Threshold_Logic_Units.pdf
NN_02_Threshold_Logic_Units.pdfNN_02_Threshold_Logic_Units.pdf
NN_02_Threshold_Logic_Units.pdf
 
Recurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRU
 
Score based generative model
Score based generative modelScore based generative model
Score based generative model
 
CNN Tutorial
CNN TutorialCNN Tutorial
CNN Tutorial
 
Pr057 mask rcnn
Pr057 mask rcnnPr057 mask rcnn
Pr057 mask rcnn
 
Applying deep learning to medical data
Applying deep learning to medical dataApplying deep learning to medical data
Applying deep learning to medical data
 
Neural network
Neural networkNeural network
Neural network
 
Random forest
Random forestRandom forest
Random forest
 
Finding connections among images using CycleGAN
Finding connections among images using CycleGANFinding connections among images using CycleGAN
Finding connections among images using CycleGAN
 

Andere mochten auch

時系列データ3
時系列データ3時系列データ3
時系列データ3
graySpace999
 

Andere mochten auch (18)

Introduction of "TrailBlazer" algorithm
Introduction of "TrailBlazer" algorithmIntroduction of "TrailBlazer" algorithm
Introduction of "TrailBlazer" algorithm
 
時系列データ3
時系列データ3時系列データ3
時系列データ3
 
Value iteration networks
Value iteration networksValue iteration networks
Value iteration networks
 
Fast and Probvably Seedings for k-Means
Fast and Probvably Seedings for k-MeansFast and Probvably Seedings for k-Means
Fast and Probvably Seedings for k-Means
 
Dual Learning for Machine Translation (NIPS 2016)
Dual Learning for Machine Translation (NIPS 2016)Dual Learning for Machine Translation (NIPS 2016)
Dual Learning for Machine Translation (NIPS 2016)
 
Interaction Networks for Learning about Objects, Relations and Physics
Interaction Networks for Learning about Objects, Relations and PhysicsInteraction Networks for Learning about Objects, Relations and Physics
Interaction Networks for Learning about Objects, Relations and Physics
 
Safe and Efficient Off-Policy Reinforcement Learning
Safe and Efficient Off-Policy Reinforcement LearningSafe and Efficient Off-Policy Reinforcement Learning
Safe and Efficient Off-Policy Reinforcement Learning
 
Conditional Image Generation with PixelCNN Decoders
Conditional Image Generation with PixelCNN DecodersConditional Image Generation with PixelCNN Decoders
Conditional Image Generation with PixelCNN Decoders
 
Learning to learn by gradient descent by gradient descent
Learning to learn by gradient descent by gradient descentLearning to learn by gradient descent by gradient descent
Learning to learn by gradient descent by gradient descent
 
Introduction of “Fairness in Learning: Classic and Contextual Bandits”
Introduction of “Fairness in Learning: Classic and Contextual Bandits”Introduction of “Fairness in Learning: Classic and Contextual Bandits”
Introduction of “Fairness in Learning: Classic and Contextual Bandits”
 
Improving Variational Inference with Inverse Autoregressive Flow
Improving Variational Inference with Inverse Autoregressive FlowImproving Variational Inference with Inverse Autoregressive Flow
Improving Variational Inference with Inverse Autoregressive Flow
 
[DL輪読会]Convolutional Sequence to Sequence Learning
[DL輪読会]Convolutional Sequence to Sequence Learning[DL輪読会]Convolutional Sequence to Sequence Learning
[DL輪読会]Convolutional Sequence to Sequence Learning
 
NIPS 2016 Overview and Deep Learning Topics
NIPS 2016 Overview and Deep Learning Topics  NIPS 2016 Overview and Deep Learning Topics
NIPS 2016 Overview and Deep Learning Topics
 
論文紹介 Combining Model-Based and Model-Free Updates for Trajectory-Centric Rein...
論文紹介 Combining Model-Based and Model-Free Updates for Trajectory-Centric Rein...論文紹介 Combining Model-Based and Model-Free Updates for Trajectory-Centric Rein...
論文紹介 Combining Model-Based and Model-Free Updates for Trajectory-Centric Rein...
 
Differential privacy without sensitivity [NIPS2016読み会資料]
Differential privacy without sensitivity [NIPS2016読み会資料]Differential privacy without sensitivity [NIPS2016読み会資料]
Differential privacy without sensitivity [NIPS2016読み会資料]
 
Matching networks for one shot learning
Matching networks for one shot learningMatching networks for one shot learning
Matching networks for one shot learning
 
ICML2016読み会 概要紹介
ICML2016読み会 概要紹介ICML2016読み会 概要紹介
ICML2016読み会 概要紹介
 
論文紹介 Pixel Recurrent Neural Networks
論文紹介 Pixel Recurrent Neural Networks論文紹介 Pixel Recurrent Neural Networks
論文紹介 Pixel Recurrent Neural Networks
 

Ähnlich wie InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Histogram-Based Method for Effective Initialization of the K-Means Clustering...
Histogram-Based Method for Effective Initialization of the K-Means Clustering...Histogram-Based Method for Effective Initialization of the K-Means Clustering...
Histogram-Based Method for Effective Initialization of the K-Means Clustering...
Gingles Caroline
 

Ähnlich wie InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets (20)

20230213_ComputerVision_연구.pptx
20230213_ComputerVision_연구.pptx20230213_ComputerVision_연구.pptx
20230213_ComputerVision_연구.pptx
 
QTML2021 UAP Quantum Feature Map
QTML2021 UAP Quantum Feature MapQTML2021 UAP Quantum Feature Map
QTML2021 UAP Quantum Feature Map
 
Deep Residual Hashing Neural Network for Image Retrieval
Deep Residual Hashing Neural Network for Image RetrievalDeep Residual Hashing Neural Network for Image Retrieval
Deep Residual Hashing Neural Network for Image Retrieval
 
Non parametric bayesian learning in discrete data
Non parametric bayesian learning in discrete dataNon parametric bayesian learning in discrete data
Non parametric bayesian learning in discrete data
 
[GAN by Hung-yi Lee]Part 1: General introduction of GAN
[GAN by Hung-yi Lee]Part 1: General introduction of GAN[GAN by Hung-yi Lee]Part 1: General introduction of GAN
[GAN by Hung-yi Lee]Part 1: General introduction of GAN
 
03 Data Mining Techniques
03 Data Mining Techniques03 Data Mining Techniques
03 Data Mining Techniques
 
Patch-wise Deep Metric Learning for Unsupervised Low-Dose CT Denoising
Patch-wise Deep Metric Learning for Unsupervised Low-Dose CT DenoisingPatch-wise Deep Metric Learning for Unsupervised Low-Dose CT Denoising
Patch-wise Deep Metric Learning for Unsupervised Low-Dose CT Denoising
 
Learning a nonlinear embedding by preserving class neibourhood structure 최종
Learning a nonlinear embedding by preserving class neibourhood structure   최종Learning a nonlinear embedding by preserving class neibourhood structure   최종
Learning a nonlinear embedding by preserving class neibourhood structure 최종
 
Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)
 
Tutorial Equivariance in Imaging ICMS 23.pptx
Tutorial Equivariance in Imaging ICMS 23.pptxTutorial Equivariance in Imaging ICMS 23.pptx
Tutorial Equivariance in Imaging ICMS 23.pptx
 
GDC2019 - SEED - Towards Deep Generative Models in Game Development
GDC2019 - SEED - Towards Deep Generative Models in Game DevelopmentGDC2019 - SEED - Towards Deep Generative Models in Game Development
GDC2019 - SEED - Towards Deep Generative Models in Game Development
 
Compressed Sensing using Generative Model
Compressed Sensing using Generative ModelCompressed Sensing using Generative Model
Compressed Sensing using Generative Model
 
Koh_Liang_ICML2017
Koh_Liang_ICML2017Koh_Liang_ICML2017
Koh_Liang_ICML2017
 
Histogram-Based Method for Effective Initialization of the K-Means Clustering...
Histogram-Based Method for Effective Initialization of the K-Means Clustering...Histogram-Based Method for Effective Initialization of the K-Means Clustering...
Histogram-Based Method for Effective Initialization of the K-Means Clustering...
 
그림 그리는 AI
그림 그리는 AI그림 그리는 AI
그림 그리는 AI
 
Artificial Neural Networks-Supervised Learning Models
Artificial Neural Networks-Supervised Learning ModelsArtificial Neural Networks-Supervised Learning Models
Artificial Neural Networks-Supervised Learning Models
 
Artificial Neural Networks-Supervised Learning Models
Artificial Neural Networks-Supervised Learning ModelsArtificial Neural Networks-Supervised Learning Models
Artificial Neural Networks-Supervised Learning Models
 
Artificial Neural Networks-Supervised Learning Models
Artificial Neural Networks-Supervised Learning ModelsArtificial Neural Networks-Supervised Learning Models
Artificial Neural Networks-Supervised Learning Models
 
Mncs 16-10-1주-변승규-introduction to the machine learning #2
Mncs 16-10-1주-변승규-introduction to the machine learning #2Mncs 16-10-1주-변승규-introduction to the machine learning #2
Mncs 16-10-1주-변승규-introduction to the machine learning #2
 
Paper review: Measuring the Intrinsic Dimension of Objective Landscapes.
Paper review: Measuring the Intrinsic Dimension of Objective Landscapes.Paper review: Measuring the Intrinsic Dimension of Objective Landscapes.
Paper review: Measuring the Intrinsic Dimension of Objective Landscapes.
 

Kürzlich hochgeladen

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Kürzlich hochgeladen (20)

Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 

InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

  • 1. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel (UC Berkeley, Open AI) Presenter: Shuhei M. Yoshida (Dept. of Physics, UTokyo) Unsupervised learning of disentangled representations Goal GANs + Maximizing Mutual Information between generated images and input codes Approach Benefit Interpretable representation obtained without supervision and substantial additional costs Reference https://arxiv.org/abs/1606.03657 (with Appendix sections) Implementations https://github.com/openai/InfoGAN (by the authors, with TensorFlow) https://github.com/yoshum/InfoGAN (by the presenter, with Chainer) NIPS2016読み会
  • 2. Motivation How can we achieve unsupervised learning of disentangled representation? In general, learned representation is entangled, i.e. encoded in a data space in a complicated manner When a representation is disentangled, it would be more interpretable and easier to apply to tasks
  • 3. Related works • Unsupervised learning of representation (no mechanism to force disentanglement) Stacked (often denoising) autoencoder, RBM Many others, including semi-supervised approach • Supervised learning of disentangled representation Bilinear models, multi-view perceptron VAEs, adversarial autoencoders • Weakly supervised learning of disentangled representation disBM, DC-IGN • Unsupervised learning of disentangled representation hossRBM, applicable only to discrete latent factors which the presenter has almost no knowledge about. This work: Unsupervised learning of disentangled representation applicable to both continuous and discrete latent factors
  • 4. Generative Adversarial Nets(GANs) Generative model trained by competition between two neural nets: Generator 𝑥 = 𝐺 𝑧 , 𝑧 ∼ 𝑝 𝑧 𝑍 𝑝 𝑧 𝑍 : an arbitrary noise distribution Discriminator 𝐷 𝑥 ∈ 0,1 : probability that 𝑥 is sampled from the data dist. 𝑝data(𝑋) rather than generated by the generator 𝐺 𝑧 min 𝐺 max 𝐷 𝑉GAN 𝐺, 𝐷 , where 𝑉GAN 𝐺, 𝐷 ≡ 𝐸 𝑥∼𝑝data 𝑋 ln 𝐷 𝑥 + 𝐸 𝑧∼𝑝 𝑧 𝑍 ln 1 − 𝐷 𝐺 𝑧 Optimization problem to solve:
  • 5. Problems with GANs From the perspective of representation learning: No restrictions on how 𝐺 𝑧 uses 𝑧 • 𝑧 can be used in a highly entangled way • Each dimension of 𝑧 does not represent any salient feature of the training data 𝑧1 𝑧2
  • 6. Proposed Resolution: InfoGAN -Maximizing Mutual Information - Observation in conventional GANs: a generated date 𝑥 does not have much information on the noise 𝑧 from which 𝑥 is generated because of heavily entangled use of 𝑧 Proposed resolution = InfoGAN: the generator 𝐺 𝑧, 𝑐 trained so that it maximize the mutual information 𝐼 𝐶 𝑋 between the latent code 𝐶 and the generated data 𝑋 min 𝐺 max 𝐷 𝑉GAN 𝐺, 𝐷 − 𝜆𝐼 𝐶 𝑋 = 𝐺 𝑍, 𝐶
  • 7. Mutual Information 𝐼 𝑋; 𝑌 = 𝐻 𝑋 − 𝐻 𝑋 𝑌 , where • 𝐻 𝑋 = 𝐸 𝑥∼𝑝 𝑋 − ln 𝑝 𝑋 = 𝑥 : Entropy of the prior distribution • 𝐻 𝑋 𝑌 = 𝐸 𝑦∼𝑝 𝑌 ,𝑥∼𝑝 𝑋|𝑌=𝑦 − ln 𝑝 𝑋 = 𝑥 𝑌 = 𝑦 : Entropy of the posterior distribution 𝑝 𝑋 = 𝑥 𝑥 𝑝 𝑋 = 𝑥|𝑌 = 𝑦 𝑥 𝑝 𝑋 = 𝑥|𝑌 = 𝑦 𝑥 𝐼 𝑋; 𝑌 = 0 𝐼 𝑋; 𝑌 > 0 Sampling 𝑦 ∼ 𝑝 𝑌
  • 8. Avoiding increase of calculation costs Major difficulty: Evaluation of 𝐼 𝐶 𝑋 based on evaluation and sampling from the posterior 𝑝 𝐶 𝑋 Two strategies: Variational maximization of mutual information Use an approximate function 𝑄 𝑐 𝑥 = 𝑝 𝐶 = 𝑐 𝑋 = 𝑥 Sharing the neural net between 𝑄 𝑐 𝑥 and the discriminator 𝐷 𝑥
  • 9. Variational Maximization of MI For an arbitrary function 𝑄 𝑐, 𝑥 , 𝐸 𝑥∼𝑝 𝐺 𝑋 ,𝑐∼𝑝 𝐶|𝑋=𝑥 ln 𝑝 𝐶 = 𝑐 𝑋 = 𝑥 (∵ positivity of KL divergence) = 𝐸 𝑥∼𝑝 𝐺 𝑋 ,𝑐∼𝑝 𝐶|𝑋=𝑥 ln 𝑄(𝑐, 𝑥) + 𝐸 𝑥∼𝑝 𝐺 𝑋 ,𝑐∼𝑝 𝐶|𝑋=𝑥 ln 𝑝 𝐶 = 𝑐 𝑋 = 𝑥 𝑄 𝑐, 𝑥 = 𝐸 𝑥∼𝑝 𝐺 𝑋 ,𝑐∼𝑝 𝐶|𝑋=𝑥 ln 𝑄(𝑐, 𝑥) + 𝐸 𝑥∼𝑝 𝐺 𝑋 𝐷KL 𝑝 𝐶 𝑋 = 𝑥 ||𝑄 𝐶, 𝑥 ≥ 𝐸 𝑥∼𝑝 𝐺 𝑋 ,𝑐∼𝑝 𝐶|𝑋=𝑥 ln 𝑄(𝑐, 𝑥)
  • 10. Variational Maximization of MI Maximizing 𝐿𝐼 𝐺, 𝑄 w.r.t. 𝐺 and 𝑄  With 𝑄(𝑐, 𝑥) approximating 𝑝 𝐶 = 𝑐 𝑋 = 𝑥 , we obtain an variational estimate of the mutual information: 𝐿𝐼 𝐺, 𝑄 ≡ 𝐸 𝑥∼𝑝 𝐺 𝑋 ,𝑐∼𝑝 𝐶|𝑋=𝑥 ln 𝑄 𝑐, 𝑥 + 𝐻 𝐶 ≲ 𝐼 𝐶 𝑋 = 𝐺 𝑍, 𝐶 ⇔ • Achieving the equality by setting 𝑄 𝑐, 𝑥 = 𝑝 𝐶 = 𝑐 𝑋 = 𝑥 • Maximizing the mutual information min 𝐺,𝑄 max 𝐷 𝑉GAN 𝐺, 𝐷 − 𝜆𝐿𝐼 𝐺, 𝑄 Optimization problem to solve in InfoGAN:
  • 11. Eliminate sampling from posterior Lemma 𝐸 𝑥∼𝑝 𝑋 ,𝑦∼𝑝 𝑌 𝑋=𝑥) 𝑓 𝑥, 𝑦 = 𝐸 𝑥∼𝑝 𝑋 ,𝑦∼𝑝 𝑌 𝑋=𝑥),𝑥′∼𝑝 𝑋′ 𝑌=𝑦) 𝑓 𝑥′ , 𝑦 . 𝐸 𝑥∼𝑝 𝐺 𝑋 ,𝑐∼𝑝 𝐶|𝑋=𝑥 ln 𝑄 𝑐, 𝑥 = 𝐸 𝑐∼𝑝 𝐶 ,𝑧∼𝑝 𝑧 𝑍 ,𝑥=𝐺 𝑧,𝑐 ,𝑐∼𝑝 𝐶|𝑋=𝑥 ln 𝑄 𝑐, 𝑥 , By using this lemma and noting that 𝐸 𝑥∼𝑝 𝐺 𝑋 ,𝑐∼𝑝 𝐶|𝑋=𝑥 ln 𝑄 𝑐, 𝑥 = 𝑬 𝒄∼𝒑 𝑪 ,𝒛∼𝒑 𝒛 𝒁 ,𝒙=𝑮 𝒛,𝒄 𝐥𝐧 𝑸 𝒄, 𝒙 we can eliminate the sampling from 𝑝 𝐶|𝑋 = 𝑥 : Easy to estimate!
  • 12. Proof of lemma Lemma 𝐸 𝑥∼𝑝 𝑋 ,𝑦∼𝑝 𝑌 𝑋=𝑥) 𝑓 𝑥, 𝑦 = 𝐸 𝑥∼𝑝 𝑋 ,𝑦∼𝑝 𝑌 𝑋=𝑥),𝑥′∼𝑝 𝑋′ 𝑌=𝑦) 𝑓 𝑥′ , 𝑦 . ∵ l. h. s. = 𝑥 𝑦 𝑝 𝑋 = 𝑥 𝑝 𝑌 = 𝑦 𝑋 = 𝑥 𝑓 𝑥, 𝑦 = 𝑥 𝑦 𝑝 𝑌 = 𝑦 𝑝 𝑋 = 𝑥 𝑌 = 𝑦 𝑓 𝑥, 𝑦 = 𝑥 𝑦 𝑥′ 𝑝 𝑋 = 𝑥′, 𝑌 = 𝑦 𝑝 𝑋 = 𝑥 𝑌 = 𝑦 𝑓 𝑥, 𝑦 = 𝑥 𝑦 𝑥′ 𝑝 𝑋 = 𝑥′ 𝑝 𝑌 = 𝑦 𝑋 = 𝑥′ 𝑝 𝑋 = 𝑥 𝑌 = 𝑦 𝑓 𝑥, 𝑦 = r. h. s. ∵ Bayes’ theorem
  • 13. Sharing layers between 𝐷 and 𝑄 Model 𝑄 𝑐, 𝑥 using neural network Reduce the calculation costs by sharing all the convolution layers with 𝐷 Image from Odena, et al., arXiv:1610.09585. Convolution layers of the discriminator 𝐷 𝑄 Given DCGANs, InfoGAN comes for negligible additional costs!
  • 14. Experiment – MI Maximization • InfoGAN on MNIST dataset • Latent code 𝑐 = 10-class categorical code 𝐿𝐼 quickly saturates to 𝐻 𝑐 = ln 10 ∼ 2.3 in InfoGAN Figure 1 in the original paper
  • 15. Experiment – Disentangled Representation – Figure 2 in the original paper • InfoGAN on MNIST dataset • Latent codes  𝑐1: 10-class categorical code  𝑐2, 𝑐3: continuous code  𝑐1 can be used as a classifier with 5% error rate.  𝑐2 and 𝑐3 captured the rotation and width, respectively
  • 16. Experiment – Disentangled Representation – Dataset: P. Paysan, et al., AVSS, 2009, pp. 296–301. Figure 3 in the original paper
  • 17. Experiment – Disentangled Representation – Dataset: M. Aubry, et al., CVPR, 2014, pp. 3762–3769. InfoGAN learned salient features without supervision Figure 4 in the original paper
  • 18. Experiment – Disentangled Representation – Dataset: Street View House Number Figure 5 in the original paper
  • 19. Experiment – Disentangled Representation – Dataset: CelebA Figure 6 in the original paper
  • 20. Future Prospect and Conclusion Mutual information maximization can be applied to other methods, e.g. VAE Learning hierarchical latent representation Improving semi-supervised learning High-dimentional data discovery Unsupervised learning of disentangled representations Goal GANs + Maximizing Mutual Information between generated images and input codes Approach Benefit Interpretable representation obtained without supervision and substantial additional costs