Context-aware Dynamics Model for Generalization
in Model-Based Reinforcement Learning
Kimin Lee*, Younggyo Seo*, Seunghyun Lee, Honglak Lee, Jinwoo Shin
*Equal Contribution
https://sites.google.com/view/cadm
Model-based Reinforcement Learning
● Model-based reinforcement learning (RL)
○ Learning a model of the environment, i.e., the transition dynamics (and reward)
● Advantages
○ Control via planning (see the sketch below)
○ Sample-efficient learning
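To make "control via planning" concrete, below is a minimal sketch (not from the slides) of model-predictive control by random shooting inside a learned model; dynamics_model and reward_fn are hypothetical placeholders standing in for a trained dynamics model and a task reward.

```python
import numpy as np

# Hypothetical stand-ins for a trained dynamics model and a task reward.
def dynamics_model(state, action):
    return state + 0.1 * action          # placeholder transition

def reward_fn(state, action):
    return -float(np.sum(state ** 2))    # placeholder reward: drive the state to zero

def random_shooting_mpc(state, horizon=10, n_candidates=500, action_dim=2):
    """Return the first action of the sampled sequence with the best predicted return."""
    candidates = np.random.uniform(-1.0, 1.0, size=(n_candidates, horizon, action_dim))
    returns = np.zeros(n_candidates)
    for i, actions in enumerate(candidates):
        s = np.array(state, dtype=float)
        for a in actions:
            returns[i] += reward_fn(s, a)
            s = dynamics_model(s, a)     # roll out inside the learned model, not the real env
    return candidates[np.argmax(returns), 0]

best_first_action = random_shooting_mpc(np.ones(2))
```

Because candidate action sequences are evaluated entirely inside the learned model, planning requires no extra environment interaction, which is where the sample efficiency comes from.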
Model-based RL works!
● Recent success of model-based reinforcement learning
MuZero [1] Dreamer [2]
[1] Schrittwieser, J., Antonoglou, I., Hubert, T., Simonyan, K., Sifre, L., Schmitt, S., ... & Lillicrap, T. Mastering Atari, Go, chess and shogi by planning with a learned model. arXiv. 2019
[2] Hafner, D., Lillicrap, T., Ba, J., & Norouzi, M. Dream to control: Learning behaviors by latent imagination. In ICLR. 2020
Generalization in Model-based RL
● However, model-based RL does not generalize well to unseen environments [3]
○ “No information about the length!”
● For generalization, we need context information from past observations
“Context-awareness!”
[3] Nagabandi, A., Clavera, I., Liu, S., Fearing, R. S., Abbeel, P., Levine, S., & Finn, C. Learning to adapt in dynamic, real-world environments through meta-reinforcement learning. In ICLR. 2019.
Context-aware Dynamics Model
● What is context & how can it help?
● How do we extract context information from past experiences?
Context-aware Dynamics Model
● Main idea: separate context learning and next-state inference
● Context learning
Introduce a context encoder that outputs a context latent vector
● Next-state inference
Condition the dynamics model on the context latent vector (see the sketch below)
● Challenge: how to encode more meaningful information about the dynamics?
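As an illustration of this separation, here is a minimal PyTorch sketch; the class names, hidden sizes, and latent dimension are assumptions for illustration, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Encodes the K most recent (state, action) pairs into a context latent vector."""
    def __init__(self, state_dim, action_dim, k, latent_dim=10, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(k * (state_dim + action_dim), hidden), nn.ReLU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, past_states, past_actions):
        # past_states: (B, K, state_dim), past_actions: (B, K, action_dim)
        x = torch.cat([past_states, past_actions], dim=-1).flatten(start_dim=1)
        return self.net(x)


class ConditionedDynamicsModel(nn.Module):
    """Predicts the next state from (state, action), conditioned on the context latent."""
    def __init__(self, state_dim, action_dim, latent_dim=10, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action, context):
        return self.net(torch.cat([state, action, context], dim=-1))


# Example shapes (illustrative): batch of 32, K = 10 past transitions.
encoder = ContextEncoder(state_dim=18, action_dim=6, k=10)
model = ConditionedDynamicsModel(state_dim=18, action_dim=6)
z = encoder(torch.randn(32, 10, 18), torch.randn(32, 10, 6))
next_state = model(torch.randn(32, 18), torch.randn(32, 6), z)
```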
Context-aware Dynamics Model
● Loss function for context learning
● Future-step prediction
Make predictions multiple timesteps into the future
● Backward prediction
Predict backward transitions
Context-aware Dynamics Model
● Final loss function
● Model-agnostic!
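The sketch below shows one way the future-step and backward prediction terms could be combined into a single context-learning loss, reusing the hypothetical encoder and conditioned models from the earlier sketch; the weighting beta and the exact prediction scheme are illustrative assumptions, not the paper's precise formulation.

```python
import torch
import torch.nn.functional as F

def context_learning_loss(encoder, forward_model, backward_model,
                          past_states, past_actions, states, actions, beta=1.0):
    """Combined future-step forward + backward prediction loss (illustrative form).

    past_states/past_actions: the K past transitions used to infer the context.
    states:  (B, M+1, state_dim) ground-truth segment s_t, ..., s_{t+M}.
    actions: (B, M, action_dim)  actions a_t, ..., a_{t+M-1}.
    backward_model: a separate head with the same signature as forward_model.
    """
    context = encoder(past_states, past_actions)
    forward_loss = 0.0
    backward_loss = 0.0
    for t in range(actions.shape[1]):
        # Future-step prediction: predict each future next state with the same context.
        next_pred = forward_model(states[:, t], actions[:, t], context)
        forward_loss = forward_loss + F.mse_loss(next_pred, states[:, t + 1])
        # Backward prediction: reconstruct the previous state from the next state and action.
        prev_pred = backward_model(states[:, t + 1], actions[:, t], context)
        backward_loss = backward_loss + F.mse_loss(prev_pred, states[:, t])
    return forward_loss + beta * backward_loss
```

Because the loss only asks the context encoder and a (conditioned) dynamics model to exist, it can be attached to different backbone models, which is what "model-agnostic" refers to above.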
Ablation Study
● Effects of the prediction loss
○ Vanilla dynamics model (DM): no context learning
○ Vanilla DM + context learning with one-step forward prediction
○ Vanilla DM + context learning with future-step forward prediction
○ Vanilla DM + context learning with future-step forward & backward prediction
CaDM is Model-agnostic
● Prediction error for HalfCheetah with varying body masses (panels: Vanilla DM, PE-TS [4])
[4] Chua, K., Calandra, R., McAllister, R., & Levine, S. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. In NeurIPS. 2018.
Embedding Analysis
● Contexts from similar environments are clustered together
Prediction Visualization
● 10 past transitions and 20 future predictions
Context helps Model-free RL too
● Context also improves the generalization of model-free RL methods
● Proximal policy optimization (PPO) [5]
● Model-free RL also suffers from poor generalization [6, 7]
● PPO + CaDM
○ Condition the policy and value networks on the learned context latent vector (see the sketch below)
[5] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. Proximal policy optimization algorithms. arXiv. 2017
[6] Packer, C., Gao, K., Kos, J., Krähenbühl, P., Koltun, V., & Song, D. Assessing generalization in deep reinforcement learning. arXiv. 2018.
[7] Cobbe, K., Klimov, O., Hesse, C., Kim, T., & Schulman, J. Quantifying generalization in reinforcement learning. In ICML. 2019.
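Here is a minimal PyTorch sketch of that conditioning; the network sizes are illustrative, and the context latent vector is assumed to come from the CaDM context encoder sketched earlier.

```python
import torch
import torch.nn as nn

class ContextConditionedActorCritic(nn.Module):
    """Policy and value networks that take the context latent as an extra input."""
    def __init__(self, state_dim, action_dim, latent_dim=10, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim + latent_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.policy_mean = nn.Linear(hidden, action_dim)  # mean of a Gaussian policy
        self.value = nn.Linear(hidden, 1)

    def forward(self, state, context):
        # `context` is produced from recent transitions by the context encoder.
        h = self.body(torch.cat([state, context], dim=-1))
        return self.policy_mean(h), self.value(h)
```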
Experimental Setup: Environments
● We evaluate generalization performance in two regimes
○ Moderate
○ Extreme
Model-based RL: HalfCheetah
[3] Nagabandi, A., Clavera, I., Liu, S., Fearing, R. S., Abbeel, P., Levine, S., & Finn, C. Learning to adapt in dynamic, real-world environments through meta-reinforcement learning. In ICLR. 2019.
[4] Chua, K., Calandra, R., McAllister, R., & Levine, S. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. In NeurIPS. 2018.
Model-free RL: HalfCheetah
[9] Rakelly, K., Zhou, A., Quillen, D., Finn, C., & Levine, S. Efficient off-policy meta-reinforcement learning via probabilistic context variables. In ICML. 2019.
[10] Zhou, W., Pinto, L., & Gupta, A. Environment probing interaction policies. In ICLR. 2019.
Conclusion
● For dynamics generalization, we propose
○ A context-aware dynamics model (CaDM)
○ A novel loss function for context learning
● Code is available at https://github.com/younggyoseo/CaDM
● Project page: https://sites.google.com/view/cadm
Thank you!
