This document summarizes a research paper on a context-aware dynamics model (CaDM) for improving generalization in model-based reinforcement learning. CaDM separates context learning from next-state inference: a context encoder learns a context latent vector from past observations, and the dynamics model is conditioned on that vector, which lets it generalize better to unseen environments. In simulation experiments, CaDM improves generalization over vanilla dynamics models, and it also helps model-free RL methods generalize better by conditioning their policies on the learned context vector.
Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning (ICML 2020)
1. Context-aware Dynamics Model for Generalization
in Model-Based Reinforcement Learning
Kimin Lee*, Younggyo Seo*, Seunghyun Lee, Honglak Lee, Jinwoo Shin
https://sites.google.com/view/cadm
*Equal Contribution
2. Model-based Reinforcement Learning
● Model-based reinforcement learning (RL)
○ Learning a model of environment, i.e., transition dynamics (and reward)
● Advantages
○ Control via planning (see the sketch below)
○ Sample-efficient learning
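The "control via planning" advantage can be made concrete with a planning loop over a learned model. Below is a minimal sketch of random-shooting model predictive control (MPC); the `dynamics_model` and `reward_fn` callables and all hyperparameters are illustrative placeholders, not the exact setup used in the paper.

```python
# Minimal sketch of random-shooting MPC with a learned dynamics model.
# `dynamics_model` and `reward_fn` are placeholder callables supplied by the user.
import numpy as np

def random_shooting_mpc(state, dynamics_model, reward_fn, action_dim,
                        horizon=20, num_candidates=1000):
    """Return the first action of the best-scoring random action sequence."""
    # Sample candidate action sequences uniformly in [-1, 1].
    actions = np.random.uniform(-1.0, 1.0,
                                size=(num_candidates, horizon, action_dim))
    returns = np.zeros(num_candidates)
    states = np.repeat(state[None], num_candidates, axis=0)
    for t in range(horizon):
        next_states = dynamics_model(states, actions[:, t])   # learned model rollout
        returns += reward_fn(states, actions[:, t], next_states)
        states = next_states
    best = np.argmax(returns)
    return actions[best, 0]  # execute only the first action (MPC style)
```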
3. Model-based RL works!
● Recent success of model-based reinforcement learning
○ MuZero [1]
○ Dreamer [2]
[1] Schrittwieser, J., Antonoglou, I., Hubert, T., Simonyan, K., Sifre, L., Schmitt, S., ... & Lillicrap, T. Mastering Atari, Go, chess and shogi by planning with a learned model. arXiv. 2019
[2] Hafner, D., Lillicrap, T., Ba, J., & Norouzi, M. Dream to control: Learning behaviors by latent imagination. In ICLR. 2020
4. Generalization in Model-based RL
● However, model-based RL does not generalize well to unseen environments [3]
○ e.g., the observation contains no information about a changed length!
● For generalization, we need context information from past observations
“Context-awareness!”
[3] Nagabandi, A., Clavera, I., Liu, S., Fearing, R. S., Abbeel, P., Levine, S., & Finn, C. Learning to adapt in dynamic, real-world environments through meta-reinforcement learning. In ICLR. 2019.
14. Context-aware Dynamics Model
● Main idea: separate context learning and next-state inference
● Context learning
○ Introduce a context encoder that learns a context latent vector from past observations
● Next-state inference
○ Condition a dynamics model on the context latent vector
● Challenge: how to encode more meaningful information about the dynamics?
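To make the two-module design concrete, here is a minimal PyTorch sketch of a context encoder and a context-conditioned dynamics model. The layer sizes, the MLP encoder, and the delta-state prediction are assumptions for illustration; see the official code (https://github.com/younggyoseo/CaDM) for the exact architecture.

```python
# Minimal sketch of the two-module CaDM design (sizes are illustrative).
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Encodes the past K transitions (s, a, s') into a context latent vector."""
    def __init__(self, state_dim, action_dim, k, context_dim=10, hidden=128):
        super().__init__()
        in_dim = k * (2 * state_dim + action_dim)
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, context_dim),
        )

    def forward(self, past_transitions):          # (B, K, 2*S + A)
        return self.net(past_transitions.flatten(1))

class ContextConditionedDynamics(nn.Module):
    """Predicts the next state conditioned on the context latent vector."""
    def __init__(self, state_dim, action_dim, context_dim=10, hidden=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + context_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action, context):
        delta = self.net(torch.cat([state, action, context], dim=-1))
        return state + delta                      # predict the state difference
```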
17. Context-aware Dynamics Model
● Loss function for context learning
● Future-step prediction
○ Make predictions multiple timesteps into the future, rather than only one step ahead
● Backward prediction
○ Predict backward transitions, i.e., the previous state from the next state and action
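A sketch of how the two prediction terms could be combined into a single training objective; the notation (forward model f, backward model b, context encoder g, horizon M, weight β) is assumed here rather than copied verbatim from the paper:

```latex
% Sketch of the combined objective (notation assumed):
%   z_t  : context latent from the past K transitions
%   f, b : forward and backward dynamics models
%   M    : number of future prediction steps, beta : backward-loss weight
\mathcal{L}^{\text{pred}}
= \mathbb{E}\left[\frac{1}{M}\sum_{i=t}^{t+M-1}
   \Big(-\log f\big(s_{i+1}\mid s_i, a_i, z_t\big)
        -\beta \log b\big(s_i\mid s_{i+1}, a_i, z_t\big)\Big)\right],
\qquad z_t = g\big(\tau_{t-K:t-1};\phi\big)
```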
20. Ablation Study
● Effects of the prediction loss
○ Vanilla dynamics model (DM): no context learning
○ Vanilla DM + context learning with one-step forward prediction
○ Vanilla DM + context learning with future-step forward prediction
○ Vanilla DM + context learning with future-step forward & backward prediction
21. CaDM is Model-agnostic
● Prediction error for HalfCheetah with varying body masses
○ Base models: Vanilla DM and PE-TS [4]
[4] Chua, K., Calandra, R., McAllister, R., & Levine, S. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. In NeurIPS. 2018.
27. Prediction Visualization
● 10 past transitions and 20 future predictions
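The visualization procedure can be sketched as follows: encode the 10 most recent transitions into a context vector, then roll the conditioned dynamics model forward for 20 steps. The `context_encoder` and `dynamics_model` arguments are placeholders for modules like the ones sketched earlier.

```python
# Minimal sketch of a multi-step prediction rollout for visualization.
import torch

@torch.no_grad()
def rollout(context_encoder, dynamics_model, past_transitions, state, actions):
    """past_transitions: (1, 10, 2*S+A); state: (1, S); actions: (20, 1, A)."""
    context = context_encoder(past_transitions)   # context from the 10 past steps
    predictions = []
    for action in actions:                        # 20 future steps
        state = dynamics_model(state, action, context)
        predictions.append(state)
    return torch.stack(predictions)               # (20, 1, S)
```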
31. Context helps Model-free RL too
● Context also improves the generalization of model-free RL methods
● Proximal policy optimization (PPO) [5]
● Model-free RL also suffers from poor generalization [6, 7]
● PPO + CaDM
○ Condition the policy and value networks on the learned context latent vector (see the sketch below)
[5] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. Proximal policy optimization algorithms. arXiv. 2017
[6] Packer, C., Gao, K., Kos, J., Krähenbühl, P., Koltun, V., & Song, D. Assessing generalization in deep reinforcement learning. arXiv. 2018.
[7] Cobbe, K., Klimov, O., Hesse, C., Kim, T., & Schulman, J. Quantifying generalization in reinforcement learning. In ICML. 2019.
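A minimal sketch of what conditioning the policy and value networks on the learned latent vector could look like: the context latent is concatenated to the observation before both network heads. Network sizes and the concatenation scheme are assumptions for illustration, not the paper's exact implementation.

```python
# Minimal sketch of a context-conditioned PPO actor-critic (sizes illustrative).
import torch
import torch.nn as nn

class ContextConditionedActorCritic(nn.Module):
    def __init__(self, obs_dim, action_dim, context_dim, hidden=64):
        super().__init__()
        in_dim = obs_dim + context_dim
        self.policy = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim),        # e.g., mean of a Gaussian policy
        )
        self.value = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, context):
        # The context latent comes from the CaDM context encoder and is
        # treated as an extra input; gradients need not flow into it here.
        x = torch.cat([obs, context.detach()], dim=-1)
        return self.policy(x), self.value(x)
```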
34. Experimental Setup: Environments
● We evaluate the generalization performance in two regimes
○ Moderate
○ Extreme
35. Model-based RL: HalfCheetah
[3] Nagabandi, A., Clavera, I., Liu, S., Fearing, R. S., Abbeel, P., Levine, S., & Finn, C. Learning to adapt in dynamic, real-world environments through meta-reinforcement learning. In ICLR. 2019.
[4] Chua, K., Calandra, R., McAllister, R., & Levine, S. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. In NeurIPS. 2018.
38. Model-free RL: HalfCheetah
[9] Rakelly, K., Zhou, A., Quillen, D., Finn, C., & Levine, S. Efficient off-policy meta-reinforcement learning via probabilistic context variables. In ICML. 2019.
[10] Zhou, W., Pinto, L., & Gupta, A. Environment probing interaction policies. In ICLR. 2019.
39. Conclusion
● For dynamics generalization, we propose
○ A context-aware dynamics model
○ A novel loss function for context learning
● Code: https://github.com/younggyoseo/CaDM
● Project page: https://sites.google.com/view/cadm
Thank you!