SlideShare ist ein Scribd-Unternehmen logo
1 von 55
Info-Wasserstein-GAIL
Yunzhu Li, Jiaming Song, Stefano Ermon, “Inferring The Latent Structure
of Human Decision-Making from Raw Visual Inputs”, ArXiv, 2017
Sungjoon Choi
(sungjoon.choi@cpslab.snu.ac.kr)
Latent Structure of Human Demos
2
Pass / code: 0 Pass / code: 1
Turn/ code: 0 Turn/ code: 1
• Introduction
• Backgrounds
• Generative Adversarial Imitation Learning (GAIL)
• Policy gradient
• InfoGAN
• Wasserstein GAN
• InfoGAIL
• Experiments
Contents
3
• Goal of imitation learning is to match expert
behavior.
• However, demonstrations often show significant
variability due to latent factors.
• This paper presents an Info-GAIL algorithm that
can infer the latent structure of human decision
making.
• This method can not only imitate, but also learn
interpretable representations.
Imitation Learning
4
• The goal of this paper is to develop an imitation
learning framework that is able to autonomously
discover and disentangle the latent factors of
variation underlying human decision making.
• Basically, this paper combines generative
adversarial imitation learning (GAIL), Info GAN,
and Wasserstein GAN with some reward
heuristics
Introduction
5
• We will NOT go into details.
GAIL
6
• But, we will see some basics of policy gradient methods.
Policy Gradient
7
Now we Get rid of expectation
over a policy function!!
Policy Gradient
8
Step-based PG
9
Step-based PG
10
In other words, now we are considering a dynamic model!
Step-based PG
11
We do NOT have to care about
complex models in an MDP, anymore!
Step-based PG (REINFORCE)
12
Now, we have REINFORCE algorithm!
This method has been used in many deep learning methods
where the objective function is NOT differentiable.
Step-based PG (PG)
13
For all trajectories, and for all instances in a trajectory,
the PG is simply weighted MLE where the weight is defined by
the sum of future rewards, or Q value.
• Now, we know where (18) came from, right?
GAIL
14
• Interpretable Imitation Learning
• Utilized information theoretic regularization.
• Simply added InfoGAN to GAIL.
• Utilizing Raw Visual Inputs via Transfer Learning
• Used a Deep Residual Network.
Visual InfoGAIL
15
• Rather than using a single unstructured noise vector,
InfoGAN decomposes the input noise vector into two
parts: (1) z, incompressible noise and (2) c, the latent code
that targets the salient structured semantic features of the
data distribution.
• InfoGAN proposes an information-theoretic regularization:
there should be high mutual information between latent
codes c and generator distribution G(z, c). Thus I(c; G(z, c))
should be high.
InfoGAN
16
• Reward Augmentation
• A general framework to incorporate prior knowledge in imitation
learning by providing additional incentives to the agent without
interfering with the imitation learning process.
• Added a surrogate state-based reward that reflects our biases over
the desired behaviors.
• Can be seen as
• a hybrid between imitation and reinforcement learning
• side information provided to the generator
• Wasserstein GAN (WGAN)
• The discrimination network in WGAN solves a regression problem
instead of a classification problem.
• Suffers less from the vanishing gradient and mode collapse problem.
Improved Optimization
17
• Wasserstein Generative Adversarial Learning
WGAN?
18
Example 1 in WGAN
WGAN, practically
48
• Variance Reduction
• Reduce variance in policy gradient method.
• Replay buffer method with prioritized replay.
• Good for the cases where the rewards are rare.
• Baseline variance reduction methods.
Improved Optimization
49
Finally, InfoGAIL
50
Sample data similar to InfoGAN
Update D similar to WGAN.
Initialize policy from behavior cloning
Update Q similar to GAN or GAIL.
Update Policy with TRPO.
Network Architectures
51
Latent codes are
added to G
Latent codes are also
added to D
Actions are added to D
The posterior network Q adopts the same
architecture as D except that the output is
a softmax over the discrete latent variables,
or factored Gaussian over continuous
latent variables.
Input Image
Action
Disc. Latent Code Cont. Latent Code
G (policy)
Input Image
Action Disc. Latent Code
D (cost)
Score
Input Image
Action Disc. Latent Code
Q (regularizer)
Disc. Latent Code Cont. Latent Code
Train policy function G with TRPO, and iterate.
Experiments
53
Pass / code: 0 Pass / code: 1
Turn/ code: 0 Turn/ code: 1
Experiments
54
InfoGAIL

Weitere ähnliche Inhalte

Was ist angesagt?

GANs Presentation.pptx
GANs Presentation.pptxGANs Presentation.pptx
GANs Presentation.pptxMAHMOUD729246
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks남주 김
 
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networksDing Li
 
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기NAVER Engineering
 
Machine Learning With Logistic Regression
Machine Learning  With Logistic RegressionMachine Learning  With Logistic Regression
Machine Learning With Logistic RegressionKnoldus Inc.
 
A Short Introduction to Generative Adversarial Networks
A Short Introduction to Generative Adversarial NetworksA Short Introduction to Generative Adversarial Networks
A Short Introduction to Generative Adversarial NetworksJong Wook Kim
 
Basic Generative Adversarial Networks
Basic Generative Adversarial NetworksBasic Generative Adversarial Networks
Basic Generative Adversarial NetworksDong Heon Cho
 
GANs and Applications
GANs and ApplicationsGANs and Applications
GANs and ApplicationsHoang Nguyen
 
Generative Adversarial Networks (GAN)
Generative Adversarial Networks (GAN)Generative Adversarial Networks (GAN)
Generative Adversarial Networks (GAN)Manohar Mukku
 
Introduction to Principle Component Analysis
Introduction to Principle Component AnalysisIntroduction to Principle Component Analysis
Introduction to Principle Component AnalysisSunjeet Jena
 
Feature selection
Feature selectionFeature selection
Feature selectiondkpawar
 
Deep learning - A Visual Introduction
Deep learning - A Visual IntroductionDeep learning - A Visual Introduction
Deep learning - A Visual IntroductionLukas Masuch
 
Machine Learning project presentation
Machine Learning project presentationMachine Learning project presentation
Machine Learning project presentationRamandeep Kaur Bagri
 
Machine Learning Final presentation
Machine Learning Final presentation Machine Learning Final presentation
Machine Learning Final presentation AyanaRukasar
 
Probabilistic modeling in deep learning
Probabilistic modeling in deep learningProbabilistic modeling in deep learning
Probabilistic modeling in deep learningDenis Dus
 
Overfitting & Underfitting
Overfitting & UnderfittingOverfitting & Underfitting
Overfitting & UnderfittingSOUMIT KAR
 

Was ist angesagt? (20)

GANs Presentation.pptx
GANs Presentation.pptxGANs Presentation.pptx
GANs Presentation.pptx
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
 
Shap
ShapShap
Shap
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
 
Machine Learning With Logistic Regression
Machine Learning  With Logistic RegressionMachine Learning  With Logistic Regression
Machine Learning With Logistic Regression
 
A Short Introduction to Generative Adversarial Networks
A Short Introduction to Generative Adversarial NetworksA Short Introduction to Generative Adversarial Networks
A Short Introduction to Generative Adversarial Networks
 
Basic Generative Adversarial Networks
Basic Generative Adversarial NetworksBasic Generative Adversarial Networks
Basic Generative Adversarial Networks
 
GANs and Applications
GANs and ApplicationsGANs and Applications
GANs and Applications
 
Generative Adversarial Networks (GAN)
Generative Adversarial Networks (GAN)Generative Adversarial Networks (GAN)
Generative Adversarial Networks (GAN)
 
Introduction to Principle Component Analysis
Introduction to Principle Component AnalysisIntroduction to Principle Component Analysis
Introduction to Principle Component Analysis
 
Bayesian networks
Bayesian networksBayesian networks
Bayesian networks
 
Feature selection
Feature selectionFeature selection
Feature selection
 
Deep learning - A Visual Introduction
Deep learning - A Visual IntroductionDeep learning - A Visual Introduction
Deep learning - A Visual Introduction
 
Machine Learning project presentation
Machine Learning project presentationMachine Learning project presentation
Machine Learning project presentation
 
Perceptron
PerceptronPerceptron
Perceptron
 
Machine Learning Final presentation
Machine Learning Final presentation Machine Learning Final presentation
Machine Learning Final presentation
 
Probabilistic modeling in deep learning
Probabilistic modeling in deep learningProbabilistic modeling in deep learning
Probabilistic modeling in deep learning
 
Overfitting & Underfitting
Overfitting & UnderfittingOverfitting & Underfitting
Overfitting & Underfitting
 

Ähnlich wie InfoGAIL

InfoGAN and Generative Adversarial Networks
InfoGAN and Generative Adversarial NetworksInfoGAN and Generative Adversarial Networks
InfoGAN and Generative Adversarial NetworksZak Jost
 
Reading group gan - 20170417
Reading group   gan - 20170417Reading group   gan - 20170417
Reading group gan - 20170417Shuai Zhang
 
NS - CUK Seminar: S.T.Nguyen, Review on "DropAGG: Robust Graph Neural Network...
NS - CUK Seminar: S.T.Nguyen, Review on "DropAGG: Robust Graph Neural Network...NS - CUK Seminar: S.T.Nguyen, Review on "DropAGG: Robust Graph Neural Network...
NS - CUK Seminar: S.T.Nguyen, Review on "DropAGG: Robust Graph Neural Network...ssuser4b1f48
 
Preference learning for guiding the tree searches in continuous POMDPs (CoRL ...
Preference learning for guiding the tree searches in continuous POMDPs (CoRL ...Preference learning for guiding the tree searches in continuous POMDPs (CoRL ...
Preference learning for guiding the tree searches in continuous POMDPs (CoRL ...Jisu Han
 
Clustering using GA and Hill-climbing
Clustering using GA and Hill-climbingClustering using GA and Hill-climbing
Clustering using GA and Hill-climbingFatemeh Karimi
 
GNA 13552928 deep learning for GAN a.ppt
GNA 13552928 deep learning for GAN a.pptGNA 13552928 deep learning for GAN a.ppt
GNA 13552928 deep learning for GAN a.pptManiMaran230751
 
Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...Madhav Mishra
 
Gans - Generative Adversarial Nets
Gans - Generative Adversarial NetsGans - Generative Adversarial Nets
Gans - Generative Adversarial NetsSajalRastogi8
 
Generative Adversarial Networks 2
Generative Adversarial Networks 2Generative Adversarial Networks 2
Generative Adversarial Networks 2Alireza Shafaei
 
Deep learning architectures
Deep learning architecturesDeep learning architectures
Deep learning architecturesJoe li
 
consistency regularization for generative adversarial networks_review
consistency regularization for generative adversarial networks_reviewconsistency regularization for generative adversarial networks_review
consistency regularization for generative adversarial networks_reviewYoonho Na
 
Using GANs to improve generalization in a semi-supervised setting - trying it...
Using GANs to improve generalization in a semi-supervised setting - trying it...Using GANs to improve generalization in a semi-supervised setting - trying it...
Using GANs to improve generalization in a semi-supervised setting - trying it...PyData
 
Semi-supervised learning with GANs
Semi-supervised learning with GANsSemi-supervised learning with GANs
Semi-supervised learning with GANsterek47
 
Machine Learning in the Financial Industry
Machine Learning in the Financial IndustryMachine Learning in the Financial Industry
Machine Learning in the Financial IndustrySubrat Panda, PhD
 
Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN)Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN)Prakhar Rastogi
 
NS-CUK Seminar :J.H.Lee, "Review on "Similarity Preserving Adversarial Graph ...
NS-CUK Seminar :J.H.Lee, "Review on "Similarity Preserving Adversarial Graph ...NS-CUK Seminar :J.H.Lee, "Review on "Similarity Preserving Adversarial Graph ...
NS-CUK Seminar :J.H.Lee, "Review on "Similarity Preserving Adversarial Graph ...ssuser4b1f48
 
Tutorial on Theory and Application of Generative Adversarial Networks
Tutorial on Theory and Application of Generative Adversarial NetworksTutorial on Theory and Application of Generative Adversarial Networks
Tutorial on Theory and Application of Generative Adversarial NetworksMLReview
 
Entity embeddings for categorical data
Entity embeddings for categorical dataEntity embeddings for categorical data
Entity embeddings for categorical dataPaul Skeie
 

Ähnlich wie InfoGAIL (20)

InfoGAN and Generative Adversarial Networks
InfoGAN and Generative Adversarial NetworksInfoGAN and Generative Adversarial Networks
InfoGAN and Generative Adversarial Networks
 
Reading group gan - 20170417
Reading group   gan - 20170417Reading group   gan - 20170417
Reading group gan - 20170417
 
NS - CUK Seminar: S.T.Nguyen, Review on "DropAGG: Robust Graph Neural Network...
NS - CUK Seminar: S.T.Nguyen, Review on "DropAGG: Robust Graph Neural Network...NS - CUK Seminar: S.T.Nguyen, Review on "DropAGG: Robust Graph Neural Network...
NS - CUK Seminar: S.T.Nguyen, Review on "DropAGG: Robust Graph Neural Network...
 
Paper review
Paper reviewPaper review
Paper review
 
Preference learning for guiding the tree searches in continuous POMDPs (CoRL ...
Preference learning for guiding the tree searches in continuous POMDPs (CoRL ...Preference learning for guiding the tree searches in continuous POMDPs (CoRL ...
Preference learning for guiding the tree searches in continuous POMDPs (CoRL ...
 
Machine Learning Tools and Particle Swarm Optimization for Content-Based Sear...
Machine Learning Tools and Particle Swarm Optimization for Content-Based Sear...Machine Learning Tools and Particle Swarm Optimization for Content-Based Sear...
Machine Learning Tools and Particle Swarm Optimization for Content-Based Sear...
 
Clustering using GA and Hill-climbing
Clustering using GA and Hill-climbingClustering using GA and Hill-climbing
Clustering using GA and Hill-climbing
 
GNA 13552928 deep learning for GAN a.ppt
GNA 13552928 deep learning for GAN a.pptGNA 13552928 deep learning for GAN a.ppt
GNA 13552928 deep learning for GAN a.ppt
 
Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...
 
Gans - Generative Adversarial Nets
Gans - Generative Adversarial NetsGans - Generative Adversarial Nets
Gans - Generative Adversarial Nets
 
Generative Adversarial Networks 2
Generative Adversarial Networks 2Generative Adversarial Networks 2
Generative Adversarial Networks 2
 
Deep learning architectures
Deep learning architecturesDeep learning architectures
Deep learning architectures
 
consistency regularization for generative adversarial networks_review
consistency regularization for generative adversarial networks_reviewconsistency regularization for generative adversarial networks_review
consistency regularization for generative adversarial networks_review
 
Using GANs to improve generalization in a semi-supervised setting - trying it...
Using GANs to improve generalization in a semi-supervised setting - trying it...Using GANs to improve generalization in a semi-supervised setting - trying it...
Using GANs to improve generalization in a semi-supervised setting - trying it...
 
Semi-supervised learning with GANs
Semi-supervised learning with GANsSemi-supervised learning with GANs
Semi-supervised learning with GANs
 
Machine Learning in the Financial Industry
Machine Learning in the Financial IndustryMachine Learning in the Financial Industry
Machine Learning in the Financial Industry
 
Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN)Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN)
 
NS-CUK Seminar :J.H.Lee, "Review on "Similarity Preserving Adversarial Graph ...
NS-CUK Seminar :J.H.Lee, "Review on "Similarity Preserving Adversarial Graph ...NS-CUK Seminar :J.H.Lee, "Review on "Similarity Preserving Adversarial Graph ...
NS-CUK Seminar :J.H.Lee, "Review on "Similarity Preserving Adversarial Graph ...
 
Tutorial on Theory and Application of Generative Adversarial Networks
Tutorial on Theory and Application of Generative Adversarial NetworksTutorial on Theory and Application of Generative Adversarial Networks
Tutorial on Theory and Application of Generative Adversarial Networks
 
Entity embeddings for categorical data
Entity embeddings for categorical dataEntity embeddings for categorical data
Entity embeddings for categorical data
 

Mehr von Sungjoon Choi

RNN and its applications
RNN and its applicationsRNN and its applications
RNN and its applicationsSungjoon Choi
 
Hybrid computing using a neural network with dynamic external memory
Hybrid computing using a neural network with dynamic external memoryHybrid computing using a neural network with dynamic external memory
Hybrid computing using a neural network with dynamic external memorySungjoon Choi
 
Modeling uncertainty in deep learning
Modeling uncertainty in deep learning Modeling uncertainty in deep learning
Modeling uncertainty in deep learning Sungjoon Choi
 
Gaussian Process Latent Variable Model
Gaussian Process Latent Variable ModelGaussian Process Latent Variable Model
Gaussian Process Latent Variable ModelSungjoon Choi
 
Uncertainty Modeling in Deep Learning
Uncertainty Modeling in Deep LearningUncertainty Modeling in Deep Learning
Uncertainty Modeling in Deep LearningSungjoon Choi
 
Recent Trends in Deep Learning
Recent Trends in Deep LearningRecent Trends in Deep Learning
Recent Trends in Deep LearningSungjoon Choi
 
Leveraged Gaussian Process
Leveraged Gaussian ProcessLeveraged Gaussian Process
Leveraged Gaussian ProcessSungjoon Choi
 
Domain Adaptation Methods
Domain Adaptation MethodsDomain Adaptation Methods
Domain Adaptation MethodsSungjoon Choi
 
Connection between Bellman equation and Markov Decision Processes
Connection between Bellman equation and Markov Decision ProcessesConnection between Bellman equation and Markov Decision Processes
Connection between Bellman equation and Markov Decision ProcessesSungjoon Choi
 
Kernel, RKHS, and Gaussian Processes
Kernel, RKHS, and Gaussian ProcessesKernel, RKHS, and Gaussian Processes
Kernel, RKHS, and Gaussian ProcessesSungjoon Choi
 
Inverse Reinforcement Learning Algorithms
Inverse Reinforcement Learning AlgorithmsInverse Reinforcement Learning Algorithms
Inverse Reinforcement Learning AlgorithmsSungjoon Choi
 
Value iteration networks
Value iteration networksValue iteration networks
Value iteration networksSungjoon Choi
 
Deep Learning in Robotics
Deep Learning in RoboticsDeep Learning in Robotics
Deep Learning in RoboticsSungjoon Choi
 
Deep Learning in Computer Vision
Deep Learning in Computer VisionDeep Learning in Computer Vision
Deep Learning in Computer VisionSungjoon Choi
 
Semantic Segmentation Methods using Deep Learning
Semantic Segmentation Methods using Deep LearningSemantic Segmentation Methods using Deep Learning
Semantic Segmentation Methods using Deep LearningSungjoon Choi
 
Object Detection Methods using Deep Learning
Object Detection Methods using Deep LearningObject Detection Methods using Deep Learning
Object Detection Methods using Deep LearningSungjoon Choi
 
TensorFlow Tutorial Part2
TensorFlow Tutorial Part2TensorFlow Tutorial Part2
TensorFlow Tutorial Part2Sungjoon Choi
 

Mehr von Sungjoon Choi (20)

RNN and its applications
RNN and its applicationsRNN and its applications
RNN and its applications
 
Hybrid computing using a neural network with dynamic external memory
Hybrid computing using a neural network with dynamic external memoryHybrid computing using a neural network with dynamic external memory
Hybrid computing using a neural network with dynamic external memory
 
Modeling uncertainty in deep learning
Modeling uncertainty in deep learning Modeling uncertainty in deep learning
Modeling uncertainty in deep learning
 
Gaussian Process Latent Variable Model
Gaussian Process Latent Variable ModelGaussian Process Latent Variable Model
Gaussian Process Latent Variable Model
 
Uncertainty Modeling in Deep Learning
Uncertainty Modeling in Deep LearningUncertainty Modeling in Deep Learning
Uncertainty Modeling in Deep Learning
 
Recent Trends in Deep Learning
Recent Trends in Deep LearningRecent Trends in Deep Learning
Recent Trends in Deep Learning
 
Leveraged Gaussian Process
Leveraged Gaussian ProcessLeveraged Gaussian Process
Leveraged Gaussian Process
 
LevDNN
LevDNNLevDNN
LevDNN
 
IROS 2017 Slides
IROS 2017 SlidesIROS 2017 Slides
IROS 2017 Slides
 
Domain Adaptation Methods
Domain Adaptation MethodsDomain Adaptation Methods
Domain Adaptation Methods
 
Connection between Bellman equation and Markov Decision Processes
Connection between Bellman equation and Markov Decision ProcessesConnection between Bellman equation and Markov Decision Processes
Connection between Bellman equation and Markov Decision Processes
 
Kernel, RKHS, and Gaussian Processes
Kernel, RKHS, and Gaussian ProcessesKernel, RKHS, and Gaussian Processes
Kernel, RKHS, and Gaussian Processes
 
Inverse Reinforcement Learning Algorithms
Inverse Reinforcement Learning AlgorithmsInverse Reinforcement Learning Algorithms
Inverse Reinforcement Learning Algorithms
 
Value iteration networks
Value iteration networksValue iteration networks
Value iteration networks
 
Deep Learning in Robotics
Deep Learning in RoboticsDeep Learning in Robotics
Deep Learning in Robotics
 
Deep Learning in Computer Vision
Deep Learning in Computer VisionDeep Learning in Computer Vision
Deep Learning in Computer Vision
 
Semantic Segmentation Methods using Deep Learning
Semantic Segmentation Methods using Deep LearningSemantic Segmentation Methods using Deep Learning
Semantic Segmentation Methods using Deep Learning
 
Object Detection Methods using Deep Learning
Object Detection Methods using Deep LearningObject Detection Methods using Deep Learning
Object Detection Methods using Deep Learning
 
CNN Tutorial
CNN TutorialCNN Tutorial
CNN Tutorial
 
TensorFlow Tutorial Part2
TensorFlow Tutorial Part2TensorFlow Tutorial Part2
TensorFlow Tutorial Part2
 

Kürzlich hochgeladen

Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdfSuman Jyoti
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfRagavanV2
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringmulugeta48
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptMsecMca
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdfKamal Acharya
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxfenichawla
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfKamal Acharya
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapRishantSharmaFr
 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLManishPatel169454
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756dollysharma2066
 

Kürzlich hochgeladen (20)

NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 

InfoGAIL

  • 1. Info-Wasserstein-GAIL Yunzhu Li, Jiaming Song, Stefano Ermon, “Inferring The Latent Structure of Human Decision-Making from Raw Visual Inputs”, ArXiv, 2017 Sungjoon Choi (sungjoon.choi@cpslab.snu.ac.kr)
  • 2. Latent Structure of Human Demos 2 Pass / code: 0 Pass / code: 1 Turn/ code: 0 Turn/ code: 1
  • 3. • Introduction • Backgrounds • Generative Adversarial Imitation Learning (GAIL) • Policy gradient • InfoGAN • Wasserstein GAN • InfoGAIL • Experiments Contents 3
  • 4. • Goal of imitation learning is to match expert behavior. • However, demonstrations often show significant variability due to latent factors. • This paper presents an Info-GAIL algorithm that can infer the latent structure of human decision making. • This method can not only imitate, but also learn interpretable representations. Imitation Learning 4
  • 5. • The goal of this paper is to develop an imitation learning framework that is able to autonomously discover and disentangle the latent factors of variation underlying human decision making. • Basically, this paper combines generative adversarial imitation learning (GAIL), Info GAN, and Wasserstein GAN with some reward heuristics Introduction 5
  • 6. • We will NOT go into details. GAIL 6 • But, we will see some basics of policy gradient methods.
  • 7. Policy Gradient 7 Now we Get rid of expectation over a policy function!!
  • 10. Step-based PG 10 In other words, now we are considering a dynamic model!
  • 11. Step-based PG 11 We do NOT have to care about complex models in an MDP, anymore!
  • 12. Step-based PG (REINFORCE) 12 Now, we have REINFORCE algorithm! This method has been used in many deep learning methods where the objective function is NOT differentiable.
  • 13. Step-based PG (PG) 13 For all trajectories, and for all instances in a trajectory, the PG is simply weighted MLE where the weight is defined by the sum of future rewards, or Q value.
  • 14. • Now, we know where (18) came from, right? GAIL 14
  • 15. • Interpretable Imitation Learning • Utilized information theoretic regularization. • Simply added InfoGAN to GAIL. • Utilizing Raw Visual Inputs via Transfer Learning • Used a Deep Residual Network. Visual InfoGAIL 15
  • 16. • Rather than using a single unstructured noise vector, InfoGAN decomposes the input noise vector into two parts: (1) z, incompressible noise and (2) c, the latent code that targets the salient structured semantic features of the data distribution. • InfoGAN proposes an information-theoretic regularization: there should be high mutual information between latent codes c and generator distribution G(z, c). Thus I(c; G(z, c)) should be high. InfoGAN 16
  • 17. • Reward Augmentation • A general framework to incorporate prior knowledge in imitation learning by providing additional incentives to the agent without interfering with the imitation learning process. • Added a surrogate state-based reward that reflects our biases over the desired behaviors. • Can be seen as • a hybrid between imitation and reinforcement learning • side information provided to the generator • Wasserstein GAN (WGAN) • The discrimination network in WGAN solves a regression problem instead of a classification problem. • Suffers less from the vanishing gradient and mode collapse problem. Improved Optimization 17
  • 18. • Wasserstein Generative Adversarial Learning WGAN? 18
  • 19.
  • 20. Example 1 in WGAN
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40.
  • 41.
  • 42.
  • 43.
  • 44.
  • 45.
  • 46.
  • 47.
  • 49. • Variance Reduction • Reduce variance in policy gradient method. • Replay buffer method with prioritized replay. • Good for the cases where the rewards are rare. • Baseline variance reduction methods. Improved Optimization 49
  • 50. Finally, InfoGAIL 50 Sample data similar to InfoGAN Update D similar to WGAN. Initialize policy from behavior cloning Update Q similar to GAN or GAIL. Update Policy with TRPO.
  • 51. Network Architectures 51 Latent codes are added to G Latent codes are also added to D Actions are added to D The posterior network Q adopts the same architecture as D except that the output is a softmax over the discrete latent variables, or factored Gaussian over continuous latent variables.
  • 52. Input Image Action Disc. Latent Code Cont. Latent Code G (policy) Input Image Action Disc. Latent Code D (cost) Score Input Image Action Disc. Latent Code Q (regularizer) Disc. Latent Code Cont. Latent Code Train policy function G with TRPO, and iterate.
  • 53. Experiments 53 Pass / code: 0 Pass / code: 1 Turn/ code: 0 Turn/ code: 1