Trajectory-wise Multiple Choice Learning for Dynamics Generalization in Reinforcement Learning

Official slides for the NeurIPS 2020 paper "Trajectory-wise Multiple Choice Learning for Dynamics Generalization in Reinforcement Learning," by Younggyo Seo*, Kimin Lee*, Ignasi Clavera, Thanard Kurutach, Jinwoo Shin, Pieter Abbeel.



  1. Trajectory-wise Multiple Choice Learning for Dynamics Generalization in Reinforcement Learning
     Younggyo Seo¹*, Kimin Lee²*, Ignasi Clavera², Thanard Kurutach², Jinwoo Shin¹, and Pieter Abbeel²
     ¹KAIST, ²UC Berkeley (*Equal Contribution)
     https://sites.google.com/view/trajectory-mcl
  2. Problem: Dynamics Generalization
     ● Model-based RL suffers from the dynamics generalization problem
     [Figure: training, evaluation, and deployment environments]
  3. Problem: Dynamics Generalization
     ● Multi-modal distribution of transition dynamics
  4.–6. Main Components
     ● Main idea: explicitly approximate the multi-modal distribution
     ● Multi-headed dynamics model: approximates the multi-modal distribution by learning specialized prediction heads
     ● Multiple choice learning (MCL): updates the most accurate prediction head for specialization
     ● Adaptive planning: uses the most accurate prediction head over recent experience for planning
  7. Trajectory-wise Multiple Choice Learning
     ● For MCL, each prediction head should receive distinct training samples
     ○ Which prediction head is most accurate over these transitions?
     [Figure: individual transitions]
  8. Trajectory-wise Multiple Choice Learning
     ● For MCL, each prediction head should receive distinct training samples
     ● Trajectory-wise multiple choice learning: differences in dynamics are captured more distinctly by considering the prediction error over a trajectory segment
     [Figure: trajectory segment]
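The winner-take-all idea on this slide can be sketched in code. This is a toy illustration with hypothetical linear prediction heads (the paper uses neural networks and a likelihood-based loss): each head is scored on its error over the *entire* trajectory segment, and only the most accurate head receives a gradient step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy multi-headed dynamics model (hypothetical linear heads for
# illustration): head h predicts s' = s + a @ W_h.T
num_heads, state_dim, action_dim, T = 3, 4, 2, 8
heads = [rng.normal(size=(state_dim, action_dim)) for _ in range(num_heads)]

def segment_error(W, states, actions, next_states):
    """Squared prediction error accumulated over a whole trajectory segment."""
    residual = states + actions @ W.T - next_states
    return float((residual ** 2).sum(axis=1).mean())

def trajectory_wise_mcl_step(heads, states, actions, next_states, lr=0.05):
    """Winner-take-all update: only the head that is most accurate over the
    *entire segment* (not per individual transition) is updated."""
    errors = [segment_error(W, states, actions, next_states) for W in heads]
    best = int(np.argmin(errors))
    residual = states + actions @ heads[best].T - next_states
    grad = (2.0 / len(states)) * residual.T @ actions  # gradient of segment_error
    heads[best] -= lr * grad
    return best
```

Because the winner is chosen per segment rather than per transition, a head that models one dynamics mode well keeps winning segments from that mode, which is what drives specialization.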
  9. Context-conditional Multi-headed Dynamics Model
     ● We also introduce a context encoder for online adaptation to unseen environments
     ● The context encoder g captures contextual information from past experience
     ● See [Lee'20] for more information
     [Lee'20] Lee, Kimin, Younggyo Seo, Seunghyun Lee, Honglak Lee, and Jinwoo Shin. "Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning." In ICML, 2020.
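A minimal sketch of the context-conditioning idea, under simplifying assumptions: the encoder below is a hypothetical mean-pooled feature map (in the paper g is learned jointly with the dynamics model), and the head's additive form is purely illustrative. The point is that recent transitions are compressed into a fixed-size context vector that conditions every prediction head.

```python
import numpy as np

def encode_context(past_states, past_actions, past_next_states):
    """Toy context encoder g: map the K most recent transitions to a
    fixed-size context vector (hypothetical mean-pooled features)."""
    deltas = past_next_states - past_states          # observed dynamics effect
    feats = np.concatenate([past_states, past_actions, deltas], axis=-1)
    return feats.mean(axis=0)                        # size independent of K

def conditional_prediction(head, state, action, context):
    """A context-conditional head: the next-state prediction depends on
    (s, a) and the context c inferred from past experience."""
    return state + head["W"] @ action + head["V"] @ context
```

Because the context is recomputed online from the latest experience, the same trained model can adapt its predictions to an unseen environment without any gradient updates.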
  10. Analysis on Trajectory-wise MCL
     ● Specialization leads to superior generalization performance
     [Figure: transitions vs. trajectory segment, Hopper]
  11. Analysis on Adaptive Planning
     ● Qualitative analysis
     ○ Manually assign prediction heads specialized for [mass: 2.5] to the [mass: 1.0] environment
     ○ The agent acts as if it had a heavyweight body!
     [Figure: [mass: 1.0] vs. [mass: 2.5], both using prediction heads specialized for [mass: 2.5]]
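Adaptive planning itself can be sketched in two steps, again with toy linear heads and a hypothetical one-step random-shooting planner standing in for the paper's model-predictive control: first pick the head with the lowest error on the most recent experience, then plan with only that head.

```python
import numpy as np

def select_head(heads, recent_states, recent_actions, recent_next_states):
    """Step 1: score every prediction head on the most recent transitions
    and keep the most accurate one."""
    errors = [np.mean((s := recent_states + recent_actions @ W.T - recent_next_states) ** 2)
              for W in heads]
    return int(np.argmin(errors))

def plan_action(W, state, candidate_actions, reward_fn):
    """Step 2 (hypothetical one-step random-shooting planner): evaluate
    candidate actions under the selected head, return the best one."""
    preds = state + candidate_actions @ W.T      # predicted next states
    returns = [reward_fn(p) for p in preds]
    return candidate_actions[int(np.argmax(returns))]
```

The slide's manual-assignment experiment corresponds to bypassing `select_head` and forcing a mismatched head into `plan_action`, which is why the agent behaves as if its body were heavier than it is.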
  12. Comparative Evaluation
     ● Superior generalization performance on 6 unseen environments
  13. Conclusion
     ● For dynamics generalization:
     ○ Context-conditional multi-headed dynamics model
     ○ Trajectory-wise multiple choice learning
     ○ Adaptive planning
     Thank you!
