[DL Reading Group] Transporters with Visual Foresight for Solving Unseen Rearrangement Tasks

Deep Learning JP

2022/9/9 Deep Learning JP http://deeplearning.jp/seminar-2/

DEEP LEARNING JP
[DL Papers]
http://deeplearning.jp/
Transporters with Visual Foresight for Solving Unseen
Rearrangement Tasks
Koki Yamane, University of Tsukuba

Bibliographic information
• Title
  • Transporters with Visual Foresight for Solving Unseen Rearrangement Tasks
• Authors
  • Hongtao Wu, Jikai Ye, Xin Meng, Chris Paxton, Gregory Chirikjian
  • The Johns Hopkins University
  • National University of Singapore
  • NVIDIA
• Venue: arXiv (May 2022)
• URL: https://arxiv.org/pdf/2202.10765.pdf

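Overview
• Zero-shot task generalization
  • Unseen tasks
  • Long-horizon tasks
• Future prediction via tree search
  • Image prediction model
  • Multiple-action proposal module
• Highly efficient learning of the image prediction model
  • Translation equivariance of FCNs
  • When the input is translated, the output shifts correspondingly
Goal-conditioned task planning enables a wide range of rearrangement tasks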

Prior work:
Transporter Networks (TN)
Exploits the translation equivariance of FCNs to learn pick-and-place tasks highly efficiently
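The translation-equivariance property can be checked numerically. The following is a minimal sketch, not the Transporter Networks implementation: `conv2d_circular` and `tiny_fcn` are hypothetical helpers, and wrap-around padding is assumed so that the property holds exactly (with zero padding it holds only away from the image border).

```python
import numpy as np

def conv2d_circular(image, kernel):
    """Cross-correlate `image` with `kernel` using wrap-around padding."""
    kh, kw = kernel.shape
    out = np.zeros_like(image, dtype=float)
    for i in range(kh):
        for j in range(kw):
            # Shift the image so that the (i, j) kernel tap lines up with each output pixel.
            offset = (-(i - kh // 2), -(j - kw // 2))
            out += kernel[i, j] * np.roll(image, shift=offset, axis=(0, 1))
    return out

def tiny_fcn(image, k1, k2):
    """Two convolution layers with a ReLU in between (no dense layers)."""
    hidden = np.maximum(conv2d_circular(image, k1), 0.0)
    return conv2d_circular(hidden, k2)

rng = np.random.default_rng(0)
image = rng.random((8, 8))
k1 = rng.standard_normal((3, 3))
k2 = rng.standard_normal((3, 3))
shift = (2, 3)  # translate by 2 rows and 3 columns

# Translating the input first, or translating the output afterwards, gives the same result.
shifted_then_net = tiny_fcn(np.roll(image, shift, axis=(0, 1)), k1, k2)
net_then_shifted = np.roll(tiny_fcn(image, k1, k2), shift, axis=(0, 1))
equivariant = np.allclose(shifted_then_net, net_then_shifted)
```

Because one training example effectively covers every translated copy of itself, a fully convolutional model needs fewer demonstrations than a model that must learn each position separately.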

Prior work:
Goal-Conditioned Transporter Networks (GCTN)
Adds the goal state as an input and handles non-rigid (deformable) objects

Proposed method:
Transporters with Visual Foresight (TVF)
Tree search using action proposals and image prediction
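The tree search can be sketched as a depth-limited recursion over proposed actions and predicted images. This is an illustrative sketch only: `plan_first_action`, `toy_propose`, and `toy_predict` are hypothetical names, the mean absolute pixel difference to the goal image is an assumed cost, and the 1-D toy world stands in for the learned proposal and prediction models.

```python
import numpy as np

def plan_first_action(obs, goal, propose, predict, depth):
    """Depth-limited tree search; returns (best_cost, first_action).

    propose(obs, goal) -> list of candidate actions    (hypothetical interface)
    predict(obs, action) -> predicted next observation (hypothetical interface)
    """
    def cost(img):
        # Assumed goal distance: mean absolute pixel difference.
        return float(np.abs(img - goal).mean())

    best = (cost(obs), None)  # taking no action is the baseline
    if depth == 0:
        return best
    for action in propose(obs, goal):
        predicted = predict(obs, action)
        sub_cost, _ = plan_first_action(predicted, goal, propose, predict, depth - 1)
        if sub_cost < best[0]:
            best = (sub_cost, action)
    return best

# Toy stand-ins: a 1-D "image" whose non-zero cells are blocks.
def toy_propose(obs, goal):
    picks = np.flatnonzero(obs != 0)
    places = np.flatnonzero(obs == 0)
    return [(int(i), int(j)) for i in picks for j in places]

def toy_predict(obs, action):
    pick, place = action
    nxt = obs.copy()
    nxt[place], nxt[pick] = nxt[pick], 0.0  # move the block from pick to place
    return nxt

obs = np.array([1.0, 2.0, 0.0, 0.0, 0.0])
goal = np.array([0.0, 0.0, 0.0, 1.0, 2.0])
cost, first_action = plan_first_action(obs, goal, toy_propose, toy_predict, depth=2)
```

In this toy world the goal needs two moves, so a depth-2 search finds a zero-cost plan while a depth-1 search cannot. Only the first action of the best branch is returned, so the search can be rerun after each executed step.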

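Proposed method:
Image prediction model
• Input
  • Top-down RGB-D image
  • Action (pick pose, place pose)
• Output
  • Image at the next step
• Architecture
  • 36-layer FCN (Fully Convolutional Network)
The FCN predicts the image at the next step
Translation equivariance makes learning highly efficient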

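Proposed method:
Multiple-action proposal module
Proposes multiple actions for the tree search
1. Obtain an action-value map from GCTN
2. Threshold the action-value map
3. Apply K-Means clustering
The action-value map is narrowed down to a few candidates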
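Steps 2 and 3 can be sketched on a single 2-D value map. This is a minimal sketch under stated assumptions, not the authors' code: `propose_actions` is a hypothetical helper, the value map is synthetic rather than produced by GCTN, K-Means is a plain NumPy Lloyd's iteration, and taking the highest-valued pixel of each cluster as its representative is an assumption.

```python
import numpy as np

def propose_actions(value_map, threshold, k, iters=20, seed=0):
    """Reduce a 2-D action-value map to at most k candidate pixels (row, col)."""
    # Step 2: keep only pixels whose value reaches the threshold.
    rows, cols = np.nonzero(value_map >= threshold)
    if rows.size == 0:
        return []
    points = np.stack([rows, cols], axis=1).astype(float)
    k = min(k, len(points))

    # Step 3: plain Lloyd's K-Means on the surviving pixel coordinates.
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)

    # Assumption: represent each cluster by its highest-valued pixel.
    values = value_map[rows, cols]
    candidates = []
    for c in range(k):
        members = np.flatnonzero(labels == c)
        if members.size:
            best = members[values[members].argmax()]
            candidates.append((int(rows[best]), int(cols[best])))
    return sorted(candidates)

value_map = np.zeros((10, 10))
value_map[2, 2], value_map[2, 3] = 1.0, 0.8  # first high-value region
value_map[7, 7], value_map[7, 6] = 0.9, 0.7  # second high-value region
candidates = propose_actions(value_map, threshold=0.5, k=2)
```

On this synthetic map the two high-value regions each yield one candidate, which keeps the branching factor of the tree search small.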

Experiments (simulation)
14 block-stacking tasks (including unseen tasks)
• Simulator: Ravens (a PyBullet-based manipulation simulator)
• Robot: UR5 (suction gripper)
• Data size: 1000

Experimental results (simulation)
High success rates (> 90%) are achieved even on unseen tasks

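Experiments (real world)
6 block-stacking tasks (including unseen tasks)
• Methods
  • GCTN: Goal-Conditioned Transporter Networks (baseline)
  • TVF: Transporters with Visual Foresight (proposed method)
• 3 training tasks and 3 unseen tasks
• 30 demonstrations in total for the 3 training tasks (10 per task)
• Each task is evaluated over 10 trials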

Experimental results (real world)
High success rates are also achieved in the real world
Example results on unseen tasks

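Summary
• Actions and their outcomes are explored by tree search, repeatedly alternating multiple-action proposal and image prediction
• An FCN is used as the image prediction model, making learning highly efficient
• High success rates are achieved on unseen tasks
• The method is validated on a real-world robot and shows high success rates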
