InfoGAIL

Slides introducing Yunzhu Li, Jiaming Song, Stefano Ermon, “Inferring The Latent Structure of Human Decision-Making from Raw Visual Inputs”, arXiv, 2017, plus background on policy gradients, InfoGAN, and WGAN.

1. Info-Wasserstein-GAIL. Yunzhu Li, Jiaming Song, Stefano Ermon, “Inferring The Latent Structure of Human Decision-Making from Raw Visual Inputs”, arXiv, 2017. Sungjoon Choi (sungjoon.choi@cpslab.snu.ac.kr)
2. Latent Structure of Human Demos. [Figure: demonstration trajectories grouped by latent code: Pass / code: 0, Pass / code: 1, Turn / code: 0, Turn / code: 1.]
3. Contents. • Introduction • Background • Generative Adversarial Imitation Learning (GAIL) • Policy gradient • InfoGAN • Wasserstein GAN • InfoGAIL • Experiments
4. Imitation Learning. • The goal of imitation learning is to match expert behavior. • However, demonstrations often show significant variability due to latent factors. • This paper presents the InfoGAIL algorithm, which can infer the latent structure of human decision making. • The method can not only imitate, but also learn interpretable representations.
5. Introduction. • The goal of this paper is to develop an imitation learning framework that can autonomously discover and disentangle the latent factors of variation underlying human decision making. • In essence, the paper combines generative adversarial imitation learning (GAIL), InfoGAN, and Wasserstein GAN, together with some reward heuristics.
6. GAIL. • We will NOT go into the details of GAIL. • But we will see some basics of policy gradient methods.
7. Policy Gradient. Now we get rid of the expectation over the policy function! (A derivation sketch follows the transcript.)
8. Policy Gradient.
9. Step-based PG.
10. Step-based PG. In other words, we are now considering a dynamics model!
11. Step-based PG. We do NOT have to care about the complex models in an MDP anymore!
12. Step-based PG (REINFORCE). Now we have the REINFORCE algorithm! This method has been used in many deep learning methods where the objective function is NOT differentiable. (A runnable REINFORCE sketch, with a variance-reducing baseline, follows the transcript.)
13. Step-based PG. Over all trajectories, and over all steps within a trajectory, the policy gradient is simply a weighted MLE update, where the weight is the sum of future rewards, i.e., the Q value.
14. GAIL. • Now we know where equation (18) came from, right?
15. Visual InfoGAIL. • Interpretable Imitation Learning: utilizes an information-theoretic regularization; simply adds InfoGAN to GAIL. • Utilizing Raw Visual Inputs via Transfer Learning: uses a deep residual network.
16. InfoGAN. • Rather than using a single unstructured noise vector, InfoGAN decomposes the input noise vector into two parts: (1) z, incompressible noise, and (2) c, the latent code that targets the salient structured semantic features of the data distribution. • InfoGAN proposes an information-theoretic regularization: there should be high mutual information between the latent code c and the generator distribution G(z, c), i.e., I(c; G(z, c)) should be high. (The objective is spelled out after the transcript.)
17. Improved Optimization. • Reward Augmentation: a general framework to incorporate prior knowledge into imitation learning by providing additional incentives to the agent without interfering with the imitation learning process. A surrogate state-based reward that reflects our biases over the desired behaviors is added. It can be seen as a hybrid between imitation and reinforcement learning, or as side information provided to the generator. • Wasserstein GAN (WGAN): the discrimination network in WGAN solves a regression problem instead of a classification problem, and it suffers less from the vanishing gradient and mode collapse problems.
18. WGAN? • Wasserstein Generative Adversarial Learning.
19. Example 1 in WGAN. (A recap of this example follows the transcript.)
20. WGAN, practically. (A minimal critic-update sketch follows the transcript.)
21. Improved Optimization. • Variance Reduction: reduce the variance of the policy gradient method. • A replay buffer with prioritized replay, which is good for cases where rewards are rare. • Baseline variance reduction methods (see the REINFORCE sketch after the transcript).
22. Finally, InfoGAIL. • Initialize the policy from behavior cloning. • Sample data similarly to InfoGAN. • Update D similarly to WGAN. • Update Q similarly to GAN or GAIL. • Update the policy with TRPO. (The combined objective is sketched after the transcript.)
23. Network Architectures. • Latent codes are added to G. • Latent codes are also added to D. • Actions are added to D. • The posterior network Q adopts the same architecture as D, except that the output is a softmax over the discrete latent variables, or a factored Gaussian over continuous latent variables. (A small sketch of Q follows the transcript.)
24. [Diagram: G (policy) takes the input image and the discrete and continuous latent codes and outputs an action; D (cost) takes the input image, the action, and the discrete latent code and outputs a score; Q (regularizer) takes the input image and the action and outputs the discrete and continuous latent codes.] Train the policy function G with TRPO, and iterate.
25. Experiments. [Figure: rollouts grouped by latent code: Pass / code: 0, Pass / code: 1, Turn / code: 0, Turn / code: 1.]
26. Experiments.
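
The notes below expand on slides whose equations and figures appeared only as images. For slides 7-13, here is a standard derivation of the likelihood-ratio (step-based) policy gradient, written in generic notation that may differ from the slides:

```latex
\begin{align*}
\nabla_\theta J(\theta)
  &= \nabla_\theta\, \mathbb{E}_{\tau \sim p_\theta}\!\left[R(\tau)\right]
   = \mathbb{E}_{\tau \sim p_\theta}\!\left[\nabla_\theta \log p_\theta(\tau)\, R(\tau)\right]
   && \text{(log-derivative trick; slide 7)} \\
p_\theta(\tau)
  &= p(s_0) \prod_{t} \pi_\theta(a_t \mid s_t)\, p(s_{t+1} \mid s_t, a_t)
   && \text{(trajectory density under the dynamics model; slide 10)} \\
\nabla_\theta \log p_\theta(\tau)
  &= \sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t)
   && \text{(dynamics terms do not depend on $\theta$ and drop out; slide 11)} \\
\nabla_\theta J(\theta)
  &= \mathbb{E}\!\left[\sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t)
       \Big(\textstyle\sum_{t' \ge t} r_{t'} - b(s_t)\Big)\right]
   && \text{(REINFORCE with a baseline $b$; slides 12, 13, 21)}
\end{align*}
```

The last line is the "weighted MLE" view of slide 13: each log-likelihood term is weighted by the return (Q value) from that step onward, and the baseline of slide 21 reduces variance without introducing bias.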
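
For slides 12 and 21, a minimal runnable REINFORCE sketch with a running-mean baseline on a toy two-armed bandit. The environment, the network-free softmax policy, and all constants are invented for illustration and are not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-armed bandit: arm 1 pays 1.0 on average, arm 0 pays 0.0 on average.
def reward(action):
    return rng.normal(loc=(0.0, 1.0)[action], scale=1.0)

theta = np.zeros(2)      # logits of a softmax policy over the two arms
alpha = 0.05             # learning rate
baseline = 0.0           # running mean of returns (slide 21: baseline variance reduction)

for step in range(2000):
    probs = np.exp(theta - theta.max())
    probs = probs / probs.sum()
    a = rng.choice(2, p=probs)
    r = reward(a)

    # REINFORCE / "weighted MLE" (slide 13): grad log pi(a) weighted by (return - baseline).
    grad_logp = -probs
    grad_logp[a] += 1.0
    theta += alpha * grad_logp * (r - baseline)

    # Update the running baseline.
    baseline += 0.01 * (r - baseline)

print("learned action probabilities:", probs)   # should strongly favor arm 1
```

Note that the objective (the bandit's reward) is never differentiated; only log pi is, which is why REINFORCE is usable when the objective itself is not differentiable (slide 12).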
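
For slide 16, the InfoGAN objective as introduced by Chen et al. (2016): the mutual information I(c; G(z, c)) is hard to maximize directly, so it is replaced by a variational lower bound L_I computed with the posterior network Q:

```latex
\begin{align*}
&\min_{G}\,\max_{D}\; V(D, G) - \lambda\, I\big(c;\, G(z, c)\big), \\
&I\big(c;\, G(z, c)\big) \;\ge\; \mathbb{E}_{c \sim p(c),\; x \sim G(z, c)}\big[\log Q(c \mid x)\big] + H(c) \;=:\; L_I(G, Q), \\
&\text{so in practice:}\quad \min_{G, Q}\,\max_{D}\; V(D, G) - \lambda\, L_I(G, Q).
\end{align*}
```

In InfoGAIL the generator is the policy, and x is replaced by sampled state-action pairs, so Q tries to recover the latent code from a rollout.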
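
Slide 19 refers to Example 1 of the WGAN paper (Arjovsky et al., 2017). Paraphrasing that example: let Z ~ U[0, 1], let P_0 be the distribution of (0, Z) and P_theta that of (theta, Z), i.e., two parallel vertical segments a horizontal distance theta apart. Then:

```latex
\begin{align*}
W(P_0, P_\theta) &= |\theta|, \\
JS(P_0, P_\theta) &= \begin{cases} \log 2 & \theta \ne 0 \\ 0 & \theta = 0 \end{cases}, \\
KL(P_\theta \,\|\, P_0) = KL(P_0 \,\|\, P_\theta) &= \begin{cases} +\infty & \theta \ne 0 \\ 0 & \theta = 0 \end{cases}.
\end{align*}
```

Only the Wasserstein distance varies smoothly with theta, so only it gives a useful training signal when the two supports do not overlap; this is the motivation for replacing the classifying discriminator with the WGAN critic.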
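
For slides 17 and 20, a minimal sketch of the WGAN critic update with weight clipping (Arjovsky et al., 2017). The data here are synthetic 1-D samples, and all sizes and constants are illustrative, not the values used in InfoGAIL:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Critic: small MLP with an unbounded real-valued output (a regression-style score, slide 17).
critic = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.RMSprop(critic.parameters(), lr=5e-5)
clip = 0.01   # weight-clipping range, a crude way to keep the critic roughly Lipschitz

def sample_real(n):   # stand-in for expert samples
    return torch.randn(n, 1) + 2.0

def sample_fake(n):   # stand-in for samples from the current policy / generator
    return torch.randn(n, 1) - 2.0

for it in range(500):
    real, fake = sample_real(64), sample_fake(64)
    # Critic objective: maximize E[D(real)] - E[D(fake)], i.e. minimize the negative.
    loss = critic(fake).mean() - critic(real).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Clip weights after every update, as in the original WGAN algorithm.
    with torch.no_grad():
        for p in critic.parameters():
            p.clamp_(-clip, clip)

gap = critic(sample_real(1024)).mean() - critic(sample_fake(1024)).mean()
print("critic gap E[D(real)] - E[D(fake)]:", gap.item())
```

Because the critic solves this regression-style problem rather than a saturating classification problem, its gradients degrade less when real and generated samples are easy to tell apart, which is the "less vanishing gradient / mode collapse" point of slide 17.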
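
For slide 22, the pieces fit together roughly as follows. This is a paraphrase of the InfoGAIL objective with weighting constants lambda_1 and lambda_2; the exact sign conventions and the reward-augmentation term should be checked against the paper:

```latex
\min_{\pi,\, Q}\; \max_{D}\;\;
  \mathbb{E}_{\pi}\big[\log D(s, a)\big]
  + \mathbb{E}_{\pi_E}\big[\log\big(1 - D(s, a)\big)\big]
  - \lambda_1\, L_I(\pi, Q)
  - \lambda_2\, H(\pi)
```

In the WGAN variant of slide 22 the log-loss terms are replaced by critic scores and D is updated as in the critic sketch above; Q is updated by maximizing L_I as in InfoGAN; and the policy is updated with TRPO, treating the output of D as a cost (slide 24 labels D the "cost" network), optionally plus the augmented reward of slide 17.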
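
For slides 23-24, a small sketch of the posterior network Q for the discrete latent codes: the same kind of trunk as D, but with a softmax head trained by cross-entropy to recover the code that generated each (state, action) pair. Everything here, including the sizes and the synthetic batch, is invented for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
obs_dim, act_dim, n_codes = 8, 2, 2

# Q takes a (state, action) pair and outputs logits over the discrete latent codes.
q_net = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, n_codes))
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def q_loss(states, actions, codes):
    """Cross-entropy between Q's prediction and the code used to generate the rollout.
    Minimizing this maximizes the variational bound L_I up to the constant H(c)."""
    logits = q_net(torch.cat([states, actions], dim=-1))
    return F.cross_entropy(logits, codes)

# Synthetic batch: actions depend on the latent code, so Q has something to recover.
codes = torch.randint(0, n_codes, (256,))
states = torch.randn(256, obs_dim)
actions = torch.randn(256, act_dim) + codes.float().unsqueeze(-1) * 2.0

for _ in range(200):
    loss = q_loss(states, actions, codes)
    opt.zero_grad()
    loss.backward()
    opt.step()

print("posterior cross-entropy:", loss.item())   # should drop well below log(n_codes)
```

For continuous latent codes, the softmax head would be replaced by the mean (and optionally variance) of a factored Gaussian, trained with the corresponding Gaussian negative log-likelihood, as stated on slide 23.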
