This document summarizes a research paper on a context-aware dynamics model (CaDM) for improving generalization in model-based reinforcement learning. CaDM separates context learning from next-state inference: a context encoder learns a context latent vector from past observations, and the dynamics model is conditioned on that vector, which lets it generalize better to unseen environments. In simulation experiments, CaDM improves generalization over vanilla dynamics models, and it also helps model-free RL methods generalize better by conditioning their policies on the learned context vector.
Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning (ICML 2020)
1. Context-aware Dynamics Model for Generalization
in Model-Based Reinforcement Learning
Kimin Lee*, Younggyo Seo*, Seunghyun Lee, Honglak Lee, Jinwoo Shin
https://sites.google.com/view/cadm
*Equal Contribution
2. Model-based Reinforcement Learning
● Model-based reinforcement learning (RL)
○ Learning a model of environment, i.e., transition dynamics (and reward)
● Advantages
○ Control via planning (see the sketch below)
○ Sample-efficient learning
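The "control via planning" advantage can be made concrete with a planning loop over a learned model. Below is a minimal sketch of random-shooting model predictive control (MPC); the `dynamics_model` and `reward_fn` callables and all hyperparameters are illustrative placeholders, not the exact setup used in the paper.

```python
# Minimal sketch of random-shooting MPC with a learned dynamics model.
# `dynamics_model` and `reward_fn` are placeholder callables supplied by the user.
import numpy as np

def random_shooting_mpc(state, dynamics_model, reward_fn, action_dim,
                        horizon=20, num_candidates=1000):
    """Return the first action of the best-scoring random action sequence."""
    # Sample candidate action sequences uniformly in [-1, 1].
    actions = np.random.uniform(-1.0, 1.0,
                                size=(num_candidates, horizon, action_dim))
    returns = np.zeros(num_candidates)
    states = np.repeat(state[None], num_candidates, axis=0)
    for t in range(horizon):
        next_states = dynamics_model(states, actions[:, t])   # learned model rollout
        returns += reward_fn(states, actions[:, t], next_states)
        states = next_states
    best = np.argmax(returns)
    return actions[best, 0]  # execute only the first action (MPC style)
```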
3. Model-based RL works!
● Recent success of model-based reinforcement learning
○ MuZero [1]
○ Dreamer [2]
[1] Schrittwieser, J., Antonoglou, I., Hubert, T., Simonyan, K., Sifre, L., Schmitt, S., ... & Lillicrap, T. Mastering Atari, Go, chess and shogi by planning with a learned model. arXiv. 2019
[2] Hafner, D., Lillicrap, T., Ba, J., & Norouzi, M. Dream to control: Learning behaviors by latent imagination. In ICLR. 2020
4. Generalization in Model-based RL
● However, model-based RL does not generalize well to unseen environments [3]
○ e.g., the observation contains no information about a changed length!
● For generalization, we need context information from past observations
“Context-awareness!”
[3] Nagabandi, A., Clavera, I., Liu, S., Fearing, R. S., Abbeel, P., Levine, S., & Finn, C. Learning to adapt in dynamic, real-world environments through meta-reinforcement learning. In ICLR. 2019.
14. Context-aware Dynamics Model
● Main idea: separate context learning and next-state inference
● Context learning
○ Introduce a context encoder that learns a context latent vector from past observations
● Next-state inference
○ Condition a dynamics model on the context latent vector
● Challenge: how to encode more meaningful information about the dynamics?
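To make the two-module design concrete, here is a minimal PyTorch sketch of a context encoder and a context-conditioned dynamics model. The layer sizes, the MLP encoder, and the delta-state prediction are assumptions for illustration; see the official code (https://github.com/younggyoseo/CaDM) for the exact architecture.

```python
# Minimal sketch of the two-module CaDM design (sizes are illustrative).
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Encodes the past K transitions (s, a, s') into a context latent vector."""
    def __init__(self, state_dim, action_dim, k, context_dim=10, hidden=128):
        super().__init__()
        in_dim = k * (2 * state_dim + action_dim)
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, context_dim),
        )

    def forward(self, past_transitions):          # (B, K, 2*S + A)
        return self.net(past_transitions.flatten(1))

class ContextConditionedDynamics(nn.Module):
    """Predicts the next state conditioned on the context latent vector."""
    def __init__(self, state_dim, action_dim, context_dim=10, hidden=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + context_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action, context):
        delta = self.net(torch.cat([state, action, context], dim=-1))
        return state + delta                      # predict the state difference
```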
17. Context-aware Dynamics Model
● Loss function for context learning
● Future-step prediction
○ Make predictions multiple timesteps into the future, rather than only one step ahead
● Backward prediction
○ Predict backward transitions, i.e., the previous state from the next state and action
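A sketch of how the two prediction terms could be combined into a single training objective; the notation (forward model f, backward model b, context encoder g, horizon M, weight β) is assumed here rather than copied verbatim from the paper:

```latex
% Sketch of the combined objective (notation assumed):
%   z_t  : context latent from the past K transitions
%   f, b : forward and backward dynamics models
%   M    : number of future prediction steps, beta : backward-loss weight
\mathcal{L}^{\text{pred}}
= \mathbb{E}\left[\frac{1}{M}\sum_{i=t}^{t+M-1}
   \Big(-\log f\big(s_{i+1}\mid s_i, a_i, z_t\big)
        -\beta \log b\big(s_i\mid s_{i+1}, a_i, z_t\big)\Big)\right],
\qquad z_t = g\big(\tau_{t-K:t-1};\phi\big)
```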
20. Ablation Study
● Effects of the prediction loss
○ Vanilla dynamics model (DM): no context learning
○ Vanilla DM + context learning with one-step forward prediction
○ Vanilla DM + context learning with future-step forward prediction
○ Vanilla DM + context learning with future-step forward & backward prediction
21. CaDM is Model-agnostic
● Prediction error for HalfCheetah with varying body masses
○ Base models: Vanilla DM and PE-TS [4]
[4] Chua, K., Calandra, R., McAllister, R., & Levine, S. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. In NeurIPS. 2018.
27. Prediction Visualization
● 10 past transitions and 20 future predictions
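The visualization procedure can be sketched as follows: encode the 10 most recent transitions into a context vector, then roll the conditioned dynamics model forward for 20 steps. The `context_encoder` and `dynamics_model` arguments are placeholders for modules like the ones sketched earlier.

```python
# Minimal sketch of a multi-step prediction rollout for visualization.
import torch

@torch.no_grad()
def rollout(context_encoder, dynamics_model, past_transitions, state, actions):
    """past_transitions: (1, 10, 2*S+A); state: (1, S); actions: (20, 1, A)."""
    context = context_encoder(past_transitions)   # context from the 10 past steps
    predictions = []
    for action in actions:                        # 20 future steps
        state = dynamics_model(state, action, context)
        predictions.append(state)
    return torch.stack(predictions)               # (20, 1, S)
```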
31. Context helps Model-free RL too
● Context also improves the generalization of model-free RL methods
● Proximal policy optimization (PPO) [5]
● Model-free RL also suffers from poor generalization [6, 7]
● PPO + CaDM
○ Condition the policy and value networks on the learned context latent vector (see the sketch below)
[5] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. Proximal policy optimization algorithms. arXiv. 2017
[6] Packer, C., Gao, K., Kos, J., Krähenbühl, P., Koltun, V., & Song, D. Assessing generalization in deep reinforcement learning. arXiv. 2018.
[7] Cobbe, K., Klimov, O., Hesse, C., Kim, T., & Schulman, J. Quantifying generalization in reinforcement learning. In ICML. 2019.
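A minimal sketch of what conditioning the policy and value networks on the learned latent vector could look like: the context latent is concatenated to the observation before both network heads. Network sizes and the concatenation scheme are assumptions for illustration, not the paper's exact implementation.

```python
# Minimal sketch of a context-conditioned PPO actor-critic (sizes illustrative).
import torch
import torch.nn as nn

class ContextConditionedActorCritic(nn.Module):
    def __init__(self, obs_dim, action_dim, context_dim, hidden=64):
        super().__init__()
        in_dim = obs_dim + context_dim
        self.policy = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim),        # e.g., mean of a Gaussian policy
        )
        self.value = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, context):
        # The context latent comes from the CaDM context encoder and is
        # treated as an extra input; gradients need not flow into it here.
        x = torch.cat([obs, context.detach()], dim=-1)
        return self.policy(x), self.value(x)
```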
34. Experimental Setup: Environments
● We evaluate the generalization performance in two regimes
○ Moderate
○ Extreme
35. Model-based RL: HalfCheetah
[3] Nagabandi, A., Clavera, I., Liu, S., Fearing, R. S., Abbeel, P., Levine, S., & Finn, C. Learning to adapt in dynamic, real-world environments through meta-reinforcement learning. In ICLR. 2019.
[4] Chua, K., Calandra, R., McAllister, R., & Levine, S. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. In NeurIPS. 2018.
38. Model-free RL: HalfCheetah
[9] Rakelly, K., Zhou, A., Quillen, D., Finn, C., & Levine, S. Efficient off-policy meta-reinforcement learning via probabilistic context variables. In ICML. 2019.
[10] Zhou, W., Pinto, L., & Gupta, A. Environment probing interaction policies. In ICLR. 2019.
39. Conclusion
● For dynamics generalization, we propose
○ A context-aware dynamics model
○ A novel loss function for context learning
● Code: https://github.com/younggyoseo/CaDM
● Project page: https://sites.google.com/view/cadm
Thank you!