Trajectory-wise MCL for Dynamics Generalization


This document proposes a method called Trajectory-wise Multiple Choice Learning (MCL) to improve generalization in model-based reinforcement learning. The method uses a multi-headed dynamics model to approximate the multi-modal distribution of transition dynamics. Trajectory-wise MCL updates the prediction head that is most accurate over an entire trajectory segment, allowing each head to specialize. An adaptive planning method then uses the most accurate head based on recent experience. Evaluation shows the approach achieves superior generalization to new environments compared to baseline methods.


Trajectory-wise Multiple Choice Learning for Dynamics Generalization in Reinforcement Learning
Younggyo Seo¹*, Kimin Lee²*, Ignasi Clavera², Thanard Kurutach², Jinwoo Shin¹ and Pieter Abbeel²
¹KAIST, ²UC Berkeley
*Equal contribution
https://sites.google.com/view/trajectory-mcl

Problem: Dynamics Generalization
● Model-based RL suffers from a dynamics generalization problem
[Figure labels: Training, Evaluation, Deployment]
● The transition dynamics follow a multi-modal distribution

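Main Components
● Main idea: explicitly approximate the multi-modal distribution of transition dynamics
● Multi-headed dynamics model: approximates the multi-modal distribution by learning specialized prediction heads
● Multiple choice learning (MCL): update the most accurate prediction head so that the heads specialize
● Adaptive planning: use the prediction head that is most accurate over recent experience for planning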
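As a rough illustration of the multi-headed dynamics model, the sketch below uses a shared backbone followed by several prediction heads, each producing its own next-state prediction. The class name `MultiHeadedDynamics`, the layer sizes, the ReLU backbone, and the choice to predict a state delta are all assumptions made for this example; the slides do not specify the architecture.

```python
import random


class MultiHeadedDynamics:
    """Toy multi-headed dynamics model (hypothetical, for illustration only).

    A shared linear+ReLU backbone encodes (state, action); each head maps the
    shared features to its own prediction of the next state.
    """

    def __init__(self, state_dim, action_dim, hidden_dim=8, num_heads=3, seed=0):
        rng = random.Random(seed)

        def random_matrix(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.backbone = random_matrix(hidden_dim, state_dim + action_dim)
        self.heads = [random_matrix(state_dim, hidden_dim) for _ in range(num_heads)]

    @staticmethod
    def _matvec(matrix, vector):
        return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

    def predict(self, state, action):
        """Return one next-state prediction per head."""
        features = [max(0.0, z) for z in self._matvec(self.backbone, state + action)]
        predictions = []
        for head in self.heads:
            delta = self._matvec(head, features)  # assumed: heads predict a state delta
            predictions.append([s + d for s, d in zip(state, delta)])
        return predictions


model = MultiHeadedDynamics(state_dim=2, action_dim=1)
predictions = model.predict([0.5, -0.2], [1.0])
print(len(predictions), "heads, each predicting a state of size", len(predictions[0]))
```

Sharing the backbone keeps the cost of adding heads low, while the separate output layers leave room for each head to specialize on one mode of the dynamics.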

Trajectory-wise Multiple Choice Learning
● For MCL, each prediction head should receive distinct training samples
● Transition-wise selection: which prediction head is most accurate over individual transitions?
● Trajectory-wise multiple choice learning: which prediction head is most accurate over a trajectory segment? Differences in dynamics are captured more distinctly by the prediction error over a whole trajectory segment
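The difference between transition-wise and trajectory-wise head selection can be sketched as follows. This is a minimal toy example, not the authors' implementation: the two 1-D heads `light` and `heavy`, the squared-error measure, and the helper names are hypothetical. In actual training, a gradient step would then be applied only to the selected head's loss.

```python
def squared_error(prediction, target):
    return sum((p - t) ** 2 for p, t in zip(prediction, target))


def head_errors(predict_fns, segment):
    """Sum each head's prediction error over (state, action, next_state) tuples."""
    return [
        sum(squared_error(f(s, a), s_next) for s, a, s_next in segment)
        for f in predict_fns
    ]


def select_head(predict_fns, segment):
    """Index of the head with the lowest summed error (first index wins ties)."""
    errors = head_errors(predict_fns, segment)
    return min(range(len(errors)), key=errors.__getitem__)


def trajectory_mcl_loss(predict_fns, segment):
    """Only the most accurate head over the segment contributes to the loss."""
    return min(head_errors(predict_fns, segment))


# Two hypothetical heads: the same action moves a heavy body half as far.
light = lambda s, a: [s[0] + 1.0 * a[0]]
heavy = lambda s, a: [s[0] + 0.5 * a[0]]
heads = [light, heavy]

# Segment collected in the "heavy" environment. The first transition uses a
# zero action, so it cannot tell the two heads apart.
segment = [([0.0], [0.0], [0.0]), ([0.0], [1.0], [0.5]), ([0.5], [1.0], [1.0])]

per_transition = [select_head(heads, [t]) for t in segment]
per_segment = select_head(heads, segment)
print(per_transition, per_segment)
```

In this toy data, selecting per transition hands the uninformative first transition to the wrong head, whereas selecting over the whole segment assigns every transition to the `heavy` head.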

Context-conditional Multi-headed Dynamics Model
● We also introduce a context encoder for online adaptation to unseen environments
● The context encoder g captures contextual information from past experience
● See [Lee’20] for more information
[Lee’20] Lee, Kimin, Younggyo Seo, Seunghyun Lee, Honglak Lee, and Jinwoo Shin. "Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning." In ICML, 2020.

Analysis on Trajectory-wise MCL
[Figure: Hopper; panels: Transitions, Trajectory segment]
● Specialization leads to superior generalization performance

Analysis on Adaptive Planning
● Qualitative analysis
○ Manually assign the prediction heads specialized for [Mass: 2.5] to the [Mass: 1.0] environment
[Figure panels: [Mass: 1.0] with prediction heads specialized for [Mass: 2.5]; [Mass: 2.5] with prediction heads specialized for [Mass: 2.5]]
● The agent acts as if it has a heavyweight body!
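Adaptive planning, as summarized in this deck, plans with the prediction head that is most accurate over recent experience. A minimal sketch of that selection step is given below; the class `AdaptiveHeadSelector`, the sliding window of size 2, and the toy `light` and `heavy` heads are assumptions for illustration, and the planner itself is omitted.

```python
from collections import deque


class AdaptiveHeadSelector:
    """Tracks recent transitions and picks the head that fits them best.

    Hypothetical helper: the window size and squared-error measure are
    illustrative choices, not taken from the slides.
    """

    def __init__(self, predict_fns, window=10):
        self.predict_fns = predict_fns
        self.recent = deque(maxlen=window)  # oldest transitions drop out automatically

    def observe(self, state, action, next_state):
        self.recent.append((state, action, next_state))

    def best_head(self):
        if not self.recent:
            return 0  # arbitrary default before any experience is available
        errors = [
            sum(
                sum((p - t) ** 2 for p, t in zip(f(s, a), s_next))
                for s, a, s_next in self.recent
            )
            for f in self.predict_fns
        ]
        return min(range(len(errors)), key=errors.__getitem__)


# Two hypothetical 1-D heads, specialized for a light and a heavy body.
light = lambda s, a: [s[0] + 1.0 * a[0]]
heavy = lambda s, a: [s[0] + 0.5 * a[0]]
selector = AdaptiveHeadSelector([light, heavy], window=2)

# Experience from a light environment: the light head should be chosen.
selector.observe([0.0], [1.0], [1.0])
selector.observe([1.0], [1.0], [2.0])
chosen_light = selector.best_head()

# The dynamics change to a heavy body; the window soon holds only new data.
selector.observe([2.0], [1.0], [2.5])
selector.observe([2.5], [1.0], [3.0])
chosen_heavy = selector.best_head()
print(chosen_light, chosen_heavy)
```

Forcing the planner to use a head other than the one `best_head` returns corresponds to the manual assignment in the qualitative analysis above.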

Comparative Evaluation
● Superior generalization performance on 6 unseen environments

Conclusion
● For dynamics generalization
○ Context-conditional multi-headed dynamics model
○ Trajectory-wise multiple choice learning
○ Adaptive planning
Thank you!
