Robot Learning with Structured Knowledge
And Richer Sensing
Akihiko Yamaguchi
Robotics Institute, Carnegie Mellon University
src: www.wolframalpha.com
Manipulations in Everyday Activities
Folding clothes
Cleaning
Cooking
Bathing
Dressing
…
Japanese way of folding T-shirts: https://youtu.be/b5AWQ5aBjgE
Chinese cooking skills: https://youtu.be/PFGGTPPNdRQ
Dynamic Programming & Reinforcement Learning
• Tasty / Awful
• “I am satisfied”
• …
Dynamic Programming when {Fk} are given
Reinforcement Learning when {Fk} are unknown
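The "dynamic programming when {Fk} are given" case can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the grid world, the stage dynamics `F`, the reward, and the terminal value are hypothetical and do not come from the talk.

```python
# Hypothetical toy problem (not from the talk): integer state on a small grid.
STATES = range(0, 6)
ACTIONS = (-1, 0, 1)
N_STAGES = 4
TARGET = 3


def F(k, x, u):
    """Known stage dynamics F_k (assumed): move by u, clipped to the grid."""
    return min(max(x + u, 0), 5)


def reward(k, x, u):
    """Assumed stage reward: a small cost for every non-zero action."""
    return -0.1 * abs(u)


def terminal_value(x):
    """Assumed terminal value: negative distance to the target state."""
    return -abs(x - TARGET)


def backward_dp():
    """Finite-horizon dynamic programming, sweeping from the last stage back."""
    V = {x: terminal_value(x) for x in STATES}
    policy = []
    for k in reversed(range(N_STAGES)):
        V_k, pi_k = {}, {}
        for x in STATES:
            # Pick the action with the best reward plus value of the next state.
            best_u = max(ACTIONS, key=lambda u: reward(k, x, u) + V[F(k, x, u)])
            pi_k[x] = best_u
            V_k[x] = reward(k, x, best_u) + V[F(k, x, best_u)]
        V = V_k
        policy.insert(0, pi_k)
    return V, policy


V0, policy = backward_dp()
```

When the stage dynamics are unknown, `F` cannot be called during planning like this, which is what turns the problem into reinforcement learning.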
Yamaguchi et al., "DCOB: Action space for reinforcement learning of high DoF robots", Autonomous Robots, 2013
https://www.youtube.com/playlist?list=PL41MvLpqzOg8FF0xekWT9NXCdjzN_8PUS
Deep Reinforcement Learning
Deep learning: Given big data, a neural network (NN) can learn any I/O mapping to any precision. We do not have to worry about how large the state space is, and it can take images directly as input without hand-designed features.
Deep RL: Uses deep NNs to represent policies, dynamical models, value functions, etc. Deep RL can handle large state spaces given big data.
E.g. Atari games; Google (S. Levine)'s learning of visual servoing.
Deep Reinforcement Learning
(T-L) Learning to play Atari games by Google DeepMind, Mnih et al. 2015: https://youtu.be/cjpEIotvwFY
(T-R) DeepMPC Robotic Experiments - PR2 cuts food, Lenz et al. 2015: https://youtu.be/BwA90MmkvPU
(B-L) Learning to grasp from 50K Tries, Pinto et al. 2016: https://youtu.be/oSqHc0nLkm8
(B-R) Learning hand-eye coordination for robotic grasping, Levine et al. 2017: https://youtu.be/l8zKZLqkfII
Deep Reinforcement Learning
Can Deep RL solve RL problems in general?
Maybe YES
Is that the intelligence we expect of robots?
Maybe NO
Learning grasping: 50,000 samples (Pinto et al. 2016), 800,000 samples (Levine et al. 2017)
How many samples are necessary to learn to cook sushi?
A strategy for designing the problem is unclear
How to learn with fewer samples is unclear
Intelligent Robot
English proverb says:
"A word to the wise is enough."
"Many words to a fool, half a word to the wise."
In Japanese: 一を知って十を知る ("know one, understand ten")
Robot version:
Many practices to a foolish robot, half a practice to the intelligent robot.
How do we measure intelligence of robots?
Adaptation ability
Generalization ability
Scalability
From a talk by Leslie Kaelbling at ICRA’16
Key components to create intelligent robots
???
???
???
???
Key components to create intelligent robots
Library of skills
Structured knowledge
Learning and reasoning methods
Richer sensing and general hardware
My Work (Introduced today)
Deformable object manipulation (liquids, powders, vegetables and fruits, etc.)
Representing behaviors with a skill library; verified in PR2 and Baxter pouring
Model-based RL with structured knowledge; verified in simulated pouring
Richer sensing helps learning: liquid flow perception, FingerVision
A library of skills is essential
http://reflectionsintheword.files.wordpress.com/2012/08/pouring-water-into-glass.jpg
http://schools.graniteschools.org/edtech-canderson/files/2013/01/heinz-ketchup-old-bottle.jpg
http://old.post-gazette.com/images2/20021213hosqueeze_230.jpg
http://img.diytrade.com/cdimg/1352823/17809917/0/1292834033/shampoo_bottle_bodywash_bottle.jpg
http://www.nescafe.com/upload/golden_roast_f_711.png
Pouring Behavior with Skill Library
Skill library
flow ctrl (tip, shake, …), grasp, move arm, …
→ State machines (structure, feedback control)
Planning methods
grasp, re-grasp, pouring locations, feasible trajectories, …
→ Optimization-based approach
Learning methods
Skill selection → Table, Softmax
Parameter adjustment (e.g. shake axis) → Optimization (CMA-ES)
Improve plan quality → Improve value functions
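The "skill selection with a table and softmax" item might look roughly like the sketch below. It assumes a table holding one value estimate per skill; the skill names, the temperature, and the running-average update are illustrative choices, not the talk's implementation.

```python
import math
import random


def softmax_probs(values, temperature=1.0):
    """Turn a table {skill: value estimate} into selection probabilities."""
    m = max(values.values())  # subtract the max for numerical stability
    exps = {s: math.exp((v - m) / temperature) for s, v in values.items()}
    z = sum(exps.values())
    return {s: e / z for s, e in exps.items()}


def select_skill(values, temperature=1.0, rng=random):
    """Sample one skill; skills with higher value are chosen more often."""
    probs = softmax_probs(values, temperature)
    skills = list(probs)
    return rng.choices(skills, weights=[probs[s] for s in skills], k=1)[0]


def update_value(values, skill, reward, step_size=0.2):
    """Move the table entry toward the observed reward (assumed update rule)."""
    values[skill] += step_size * (reward - values[skill])


# Hypothetical table for two flow-control skills.
skill_values = {"tip": 0.0, "shake": 0.0}
chosen = select_skill(skill_values)
update_value(skill_values, chosen, reward=1.0)
```

A lower temperature makes the selection greedier; a higher one keeps exploring skills that currently look worse.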
Sharing Knowledge Among Robots
The same implementation
worked on PR2 and Baxter
PR2 and Baxter:
Diff: Kinematics, grippers
Same: Arm DoF, sensors
Sharable knowledge:
Skills
Behavior structure
Not sharable:
Policy parameters
Achieved and NOT Achieved
Achieved:
Generalization of grasping, moving container, and pouring skills
  over container shapes
  over initial container poses
  over different target amounts
Adaptation of pouring skills
  to new material types & container shapes
NOT achieved:
Generalization of pouring skills
  over material types & container shapes
Reinforcement learning for generalization
Reinforcement Learning in Pouring
Components of pouring behavior:
Skill library: can be general
Behavior structure: can be general
Selection of skill and skill parameters: situation specific
→ Planning (dynamic programming) is necessary
Dynamics are partially unknown
→ Reinforcement Learning Problem
Reinforcement Learning
[Figure: taxonomy of reinforcement learning. Columns: Direct Policy Search, Value Function-based (both model-free), Model-based. What is learned: Policy, Value Functions, Forward Models. Learning complexity: RL, RL, SL. Planning depth: 0, 1, N. Planning labels: Optimization, Dynamic Programming.]
Model-free tends to obtain better performance
[Kober,Peters,2011] [Kormushev,2010]
Model-free is robust in POMDPs
Yamaguchi et al., "DCOB: Action space for reinforcement learning of high DoF robots", Autonomous Robots, 2013
https://www.youtube.com/playlist?list=PL41MvLpqzOg8FF0xekWT9NXCdjzN_8PUS
POMDP: Partially Observable Markov Decision Process
Model-based suffers from simulation bias
Simulation bias: When forward models are inaccurate (as is usual when the models are learned), integrating the forward models causes future state estimation errors to grow rapidly
cf. [Atkeson,Schaal,1997b][Kober,Peters,2013]
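The simulation-bias effect can be reproduced with a toy example. The sketch below assumes a scalar linear system and a "learned" model whose single coefficient is off by 5%; both are hypothetical stand-ins, chosen only to show how a small one-step error compounds over a rollout.

```python
def rollout(step, x0, controls):
    """Integrate a one-step model forward over a sequence of controls."""
    xs = [x0]
    for u in controls:
        xs.append(step(xs[-1], u))
    return xs


def true_step(x, u):
    """Assumed true dynamics: x' = 1.0 * x + 0.1 * u."""
    return 1.0 * x + 0.1 * u


def learned_step(x, u):
    """Assumed learned model with a slightly wrong coefficient (1.05)."""
    return 1.05 * x + 0.1 * u


controls = [0.0] * 20
true_traj = rollout(true_step, 1.0, controls)
pred_traj = rollout(learned_step, 1.0, controls)

# Prediction error at every step of the rollout.
errors = [abs(p - t) for p, t in zip(pred_traj, true_traj)]
print(f"error after 1 step:   {errors[1]:.3f}")
print(f"error after 20 steps: {errors[-1]:.3f}")
```

In this toy setting the one-step error is about 0.05, while after 20 chained predictions it exceeds 1.5, which is the kind of growth that makes long rollouts of small-time-step models unreliable.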
Model-based is good at generalization
[Figure: ANN with input, hidden, and output layers; labels: u, update, FK ANN]
Learning inverse kinematics of android face
[Magtanong, Yamaguchi, et al. 2012]
Model-based is good at sharing / reusing learned components
Forward models are sharable / reusable
Analytical models can be combined
Model-based is flexible to reward changes
Our Approach
Model-based reinforcement learning
How to deal with simulation biases?
Do not try to learn dx/dt = F(x,u) (dt: small like xx ms)
Learn (sub)task-level dynamics
  Parameters → F_grasp → Grasp result
  Parameters → F_flow_ctrl → Flow ctrl result
Use stochastic models
  Gaussian → F → Gaussian
Use stochastic dynamic programming
  E.g. Stochastic (Differential) Dynamic Programming
How to work with a skill library?
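One common way to realize "Gaussian → F → Gaussian" is first-order (linearized) propagation: the output mean is F evaluated at the input mean, and the output covariance is J Σ Jᵀ with J the Jacobian of F. The sketch below uses that generic approximation with a finite-difference Jacobian; it is not necessarily how the stochastic neural networks in the talk propagate uncertainty, and `f_flow_ctrl` is a made-up stand-in for a learned task-level model.

```python
def jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian of f at x (f maps a list to a list)."""
    y_dim = len(f(x))
    J = [[0.0] * len(x) for _ in range(y_dim)]
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        f_hi, f_lo = f(hi), f(lo)
        for r in range(y_dim):
            J[r][i] = (f_hi[r] - f_lo[r]) / (2 * eps)
    return J


def propagate_gaussian(f, mean, cov):
    """First-order propagation: mean -> f(mean), cov -> J cov J^T."""
    J = jacobian(f, mean)
    n, m = len(mean), len(J)
    out_mean = list(f(mean))
    out_cov = [
        [sum(J[r][k] * cov[k][l] * J[c][l] for k in range(n) for l in range(n))
         for c in range(m)]
        for r in range(m)
    ]
    return out_mean, out_cov


def f_flow_ctrl(params):
    """Hypothetical task-level model: (tilt angle, duration) -> flow result."""
    angle, duration = params
    return [angle * duration]


mean, cov = propagate_gaussian(f_flow_ctrl, [0.5, 2.0], [[0.01, 0.0], [0.0, 0.04]])
```

Because each task-level model is applied once per subtask rather than thousands of times per second, the linearization error is not compounded over a long rollout.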
Model-based RL for Graph-Structured Dynamics
Learning Unknown Dynamical Systems with Stochastic Neural Networks
Planning Actions with Stochastic Graph-DDP
Forward model can be:
• Dynamical system with/without action parameters
• Kinematics
• Feature detection, policy parameterization
• Reward
• …
Bifurcation model can be:
• Possible different results of an action
• Skill selection
• Spatial decomposition of dynamics
• Spatial conversion, including kinematics, feature detection, policy parameterization, and rewards
• …
Graph-DDP
Bifurcation primitive
[Yamaguchi and Atkeson, Humanoids 2015, 2016]
Skill selection
Possible different results of an action
Reward
Spatial decomposition of dynamics
Bifurcation primitive → Tree DDP with multi-point search
Graph structure analysis
[Yamaguchi and Atkeson, Humanoids 2015, 2016]
[Yamaguchi and Atkeson, ICRA 2016]
Stochastic Neural Networks
Works on real robots
Pouring Simulation with OpenDE
Achieved GENERALIZATION over material variation and container shapes
Decomposition of dynamics and richer sensing are useful in learning
Example-1: Flow in Pouring
Do robots need to perceive FLOW in pouring?
Skill parameters → Flow → Poured amount
The robot can learn skill parameters that maximize the reward (poured amount == target amount)
Considering decomposed dynamics (flow as an intermediate state) makes learning easier
[Figure panels: Decomposed; Not decomposed]
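The decomposed chain (skill parameters → flow → poured amount) can be sketched as two small models composed at planning time. The two functions below are hand-written stand-ins for what would be learned models in the talk, and the grid search is a deliberately naive planner used only for illustration.

```python
def f_flow(tilt_angle):
    """Hypothetical model 1: skill parameter (tilt angle) -> flow rate."""
    return max(0.0, 10.0 * (tilt_angle - 0.5))


def f_amount(flow, duration):
    """Hypothetical model 2: flow rate and pouring duration -> poured amount."""
    return flow * duration


def reward(amount, target):
    """Highest (zero) when the poured amount equals the target amount."""
    return -(amount - target) ** 2


def plan_tilt(target, duration=2.0, candidates=None):
    """Choose the tilt angle whose predicted poured amount best hits the target."""
    if candidates is None:
        candidates = [i / 100 for i in range(0, 151)]

    def score(angle):
        return reward(f_amount(f_flow(angle), duration), target)

    return max(candidates, key=score)


best_angle = plan_tilt(target=5.0)
```

With flow as an explicit intermediate state, each model has a simpler mapping to learn, and a perceived flow measurement can supervise `f_flow` directly instead of only through the final poured amount.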
How to Perceive Flow in Reality?
Example-2: Tactile Sensing in Manipulation
Is tactile sensing necessary in manipulation?
E.g. Google’s grasp learning: no tactile sensing; it learns visual servoing
What if the robot grasps a container whose content is unknown?
What if an external force is applied?
FingerVision: Vision-based Tactile Sensing
Multimodal tactile sensing
Force distribution
Proximity Vision
Slip / Deformation
Object pose, texture, shape
Low-cost and easy to manufacture
Physically robust
Summary
A library of skills is essential
Skills and high-level behavior representations can be shared among robots
Considering the pros and cons of reinforcement learning approaches is important
Model-free tends to obtain better performance
Model-free is robust in POMDPs
Model-based suffers from simulation bias
Model-based is good at generalization
Model-based is good at sharing / reusing learned components
Model-based is flexible to reward changes
A model-based reinforcement learning method for graph-structured dynamical systems is proposed
Learning forward models with stochastic neural networks
Planning with stochastic Graph-DDP (differential dynamic programming)
Generalization of pouring behavior over material types is achieved
Decomposition of dynamics and richer sensing are useful in learning
More work: http://akihikoy.net

Robot Learning with Structured Knowledge And Richer Sensing

  • 1. Robot Learning with Structured Knowledge And Richer Sensing Akihiko Yamaguchi Robotics Institute, Carnegie Mellon University
  • 2.
  • 3.
  • 4.
  • 6.
  • 7. Manipulations in Everyday Activities Folding clothes Cleaning Cooking Bathing Dressing … 7 Japanese way of folding T-shirts https://youtu.be/b5A WQ5aBjgE Chinese cooking skills https://youtu.be /PFGGTPPNdRQ
  • 8. Dynamic Programming& ReinforcementLearning 8 • Tasty / Awful • “I am satisfied ” • …. Dynamic Programming when {Fk} are given Reinforcement Learningwhen {Fk} are unknown
  • 10. Yamaguchiet al. "DCOB: Action space for reinforcementlearning of high DoF robots", AutonomousRobots, 2013 https://www.youtube.com/playlist?list=PL41MvLpqzOg8FF0xekWT9NXCdjzN_8PUS
  • 11. Deep Reinforcement Learning Deep learning: With big data, NN can learn any I/O mapping with any precision. We don't have to care about how large the state space is. It can directly handle image as an input without designing features. Deep RL: Using deep NN to represent policy, dynamical models, value functions, etc. Deep RL can handle large state space with big data. E.g. Atali, Google (S Levine)'s learning visual servoing. 11
  • 12. Deep Reinforcement Learning 12 (T-L) Learning to play Atari games by Google DeepMind, Mnih et al. 2015 https://youtu.be/cjpEIotvwFY (T-R) DeepMPC Robotic Experiments - PR2 cuts food, Lenz et al. 2015 https://youtu.be/BwA90MmkvPU (B-L) Learning to grasp from 50K Tries, Pinto et al. 2016 https://youtu.be/oSqHc0nLkm8 (B-R) Learning hand-eye coordination for robotic grasping, Levine et al. 2017 https://youtu.be/l8zKZLqkfII
  • 13. Deep Reinforcement Learning Can Deep RL solve RL problems in general? Maybe YES Is that intelligentthat we expect to robots? Maybe NO Learning grasping: 50,000 samples (Pintoet al. 2016), 800,000samples (Levine et al. 2017) How many samples are necessary to learn cooking sushi? Strategy to designing a problem is unclear Learning with less samples is unclear 13
  • 14. Intelligent Robot English proverb says: "A word to the wise is enough." "Many words to a fool, half a word to the wise." In Japanese:一を知って十を知る Robot version: Many practices to a fool robot, half a practice to the intelligent robot. 14
  • 15. How do we measure intelligence of robots? Adaptation ability Generalization ability Scalability 15 From talk by Leslie Kaelbling inICRA’16
  • 16. Key components to create intelligent robots ??? ??? ??? ??? 16
  • 17.
  • 18. Key components to create intelligent robots Library of skills Structured knowledge Learning and reasoning methods Richer sensing and general hardware 18
  • 19. My Work (Introduced today) Deformable object manipulation (liquids, powders, vegetables and fruits, etc.) Representing behaviors with a skill library; Verification in PR2 and Baxter pouring Model-based RL with structured knowledge; Verified in simulation pouring Richer sensing helps learning: Liquid flow perception, FingerVision 19
  • 20. Library of skills is essential 20
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27. Pouring Behavior with Skill Library Skill library flow ctrl (tip, shake, …), grasp, move arm, …  State machines (structure, feedback control) Planning methods grasp, re-grasp, pouring locations, feasible trajectories, …  Optimization-based approach Learning methods Skill selection  Table, Softmax Parameter adjustment (e.g. shake axis)  Optimization (CMA-ES) Improve plan quality  Improve value functions27
  • 28. Sharing Knowledge Among Robots 28 The same implementation worked on PR2 and Baxter PR2 and Baxter: Diff: Kinematics, grippers Same: Arm DoF, sensors Sharable knowledge: Skills Behavior structure Not sharable: Policy parameters
  • 29. Achieved and NOT Achieved Achieved: Generalization of grasping, moving container, and pouring skills  over container shapes  over initial container poses  over different target amounts Adaptation of pouring skills  to new material types & container shapes NOT achieved: Generalization of pouring skills  over material types & container shapes 29
  • 30. Reinforcement learning for generalization 30
  • 31. Reinforcement Learning in Pouring Components of pouring behavior: Skill library: can be general Behavior structure: can be general Selection of skill and skill parameters: situation specific  Planning (dynamic programming) is necessary Dynamics are partially unknown  Reinforcement Learning Problem 31
  • 32. Reinforcement Learning 32 [ReinforcementLearning] [DirectPolicySearch] [ValueFunction-based] [Model-based] [Model-free] RL RL SL [DynamicProgramming][Optimization] Planning depth Learning complexity [Policy] [ValueFunctions] [ForwardModels]What is learned 0 1 N
  • 34. Model-free tends to obtain better performance [Kober, Peters, 2011] [Kormushev, 2010]
  • 35. Model-free is robust in POMDP (Partially Observable Markov Decision Process). Yamaguchi et al. "DCOB: Action space for reinforcement learning of high DoF robots", Autonomous Robots, 2013 https://www.youtube.com/playlist?list=PL41MvLpqzOg8FF0xekWT9NXCdjzN_8PUS
  • 36. Model-based suffers from simulation biases. Simulation bias: when forward models are inaccurate (usual when learning models), integrating the forward models causes a rapid increase of future state estimation errors. cf. [Atkeson, Schaal, 1997b] [Kober, Peters, 2013]
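A toy calculation can make the growth of simulation bias concrete. The sketch below is an illustration only, assuming a hypothetical scalar linear system x[k+1] = a * x[k] and a learned model whose coefficient is off by 1%; neither the system nor the numbers come from the talk.

```python
def rollout_error(a_true, a_model, x0, steps):
    """Absolute state-prediction error after integrating a one-step model."""
    x_true, x_pred = x0, x0
    for _ in range(steps):
        x_true = a_true * x_true   # real (hypothetical) dynamics
        x_pred = a_model * x_pred  # slightly inaccurate learned model
    return abs(x_pred - x_true)

# The same 1% model error, integrated over longer and longer horizons.
errors = {n: rollout_error(1.0, 1.01, 1.0, n) for n in (1, 10, 100)}
```

With these illustrative numbers the one-step error is about 0.01, but after 100 integrated steps it is about 1.7, larger than the state itself. Fewer integration steps, for example one prediction per (sub)task, leave less room for this compounding.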
  • 37. Model-based is good at generalization. [Figure labels: input, hidden, output, u, update, FK, ANN.] Learning inverse kinematics of android face [Magtanong, Yamaguchi, et al. 2012]
  • 38. Model-based is good at sharing / reusing learned components. Forward models are sharable / reusable. Analytical models can be combined.
  • 39. Model-based is flexible to reward changes
  • 40. Our Approach: Model-based reinforcement learning. How to deal with simulation biases? Do not try to learn dx/dt = F(x,u) (dt: small, like xx ms). Learn (sub)task-level dynamics: Parameters → F_grasp → Grasp result; Parameters → F_flow_ctrl → Flow ctrl result. Use stochastic models: Gaussian → F → Gaussian. Use stochastic dynamic programming, e.g. Stochastic (Differential) Dynamic Programming. How to work with a skill library?
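The "Gaussian → F → Gaussian" idea can be sketched for scalar models. The slide does not say how the output Gaussian is computed; the sketch assumes first-order linearization around the input mean, which is one common choice. The two model functions are hypothetical linear stand-ins for learned F_grasp and F_flow_ctrl, and chaining them is an assumption made for illustration.

```python
def propagate_gaussian(f, mean, var, noise_var=0.0, eps=1e-5):
    """Propagate N(mean, var) through y = f(x) + noise by linearizing f at the mean."""
    slope = (f(mean + eps) - f(mean - eps)) / (2.0 * eps)  # numerical df/dx
    return f(mean), slope * slope * var + noise_var

# Hypothetical stand-ins for learned (sub)task-level models.
def f_grasp(param):
    return 0.8 * param + 0.1

def f_flow_ctrl(grasp_result):
    return 2.0 * grasp_result

# Uncertain input parameter: mean 0.5, variance 0.01 (illustrative numbers).
m1, v1 = propagate_gaussian(f_grasp, 0.5, 0.01, noise_var=0.001)
m2, v2 = propagate_gaussian(f_flow_ctrl, m1, v1, noise_var=0.002)
```

Because each stage returns a mean and a variance, a planner can evaluate the expected reward of a parameter choice with only a few model evaluations instead of integrating a small-dt model many times.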
  • 41. Model-based RL for Graph-Structured Dynamics: Learning Unknown Dynamical Systems with Stochastic Neural Networks; Planning Actions with Stochastic Graph-DDP
  • 42. Graph-DDP, bifurcation primitive [Yamaguchi and Atkeson, Humanoids 2015, 2016]. Forward model can be: dynamical system with/without action parameters; kinematics; feature detection, policy parameterization; reward; … Bifurcation model can be: possible different results of an action; skill selection; spatial decomposition of dynamics; spatial conversion, including kinematics, feature detection, policy parameterization, and rewards; …
  • 43. Graph-DDP, bifurcation primitive [Yamaguchi and Atkeson, Humanoids 2015, 2016]: skill selection; possible different results of an action; reward; spatial decomposition of dynamics
  • 45. Graph-DDP [Yamaguchi and Atkeson, Humanoids 2015, 2016]: graph structure analysis → Tree DDP with multi-point search
  • 46. Stochastic Neural Networks [Yamaguchi and Atkeson, ICRA 2016]
  • 49. Achieved GENERALIZATION over material variation and container shapes
  • 50. Decomposition of dynamics and richer sensing are useful in learning
  • 51. Example-1: Flow in Pouring. Do robots need to perceive FLOW in pouring? Skill parameters → Flow → Poured amount. Robot can learn skill parameters to maximize rewards (poured amount == target amount). Considering decomposed dynamics (flow as intermediate state) makes learning easier.
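The decomposition "skill parameters → flow → poured amount" can be sketched as two small models composed inside a planner. Everything in the sketch is a toy assumption: the functional forms of both models, the tilt-angle parameter, the fixed pouring duration, and the grid search stand in for learned models and the planning method described in the talk.

```python
def flow_model(tilt_angle):
    """Hypothetical model: skill parameter (tilt angle) -> flow rate."""
    return max(0.0, 0.5 * (tilt_angle - 0.2))

def amount_model(flow, duration):
    """Hypothetical model: flow rate and pouring time -> poured amount."""
    return flow * duration

def reward(amount, target):
    """Highest (zero) when the poured amount equals the target amount."""
    return -(amount - target) ** 2

def plan_tilt(target, duration, candidates):
    """Pick the candidate tilt whose predicted poured amount is closest to the target."""
    return max(
        candidates,
        key=lambda a: reward(amount_model(flow_model(a), duration), target),
    )

candidates = [i / 100.0 for i in range(0, 151)]
best_tilt = plan_tilt(target=0.3, duration=2.0, candidates=candidates)
```

Each model here has a single input-output relation that could be trained separately from observed flow, which is the sense in which an intermediate flow state can simplify learning.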
  • 53. How to Perceive Flow in Reality?
  • 54. Example-2: Tactile Sensing in Manipulation. Is tactile sensing necessary in manipulation? E.g. Google’s grasp learning: no tactile sensing; learning visual servoing. What if grasping a container whose content is unknown? What if external force is applied?
  • 55. FingerVision: Vision-based Tactile Sensing. Multimodal tactile sensing: force distribution; proximity vision; slip / deformation; object pose, texture, shape. Low-cost and easy to manufacture. Physically robust.
  • 60. Summary: Library of skills is essential. Skills and high-level behavior representations can be shared among robots. Considering pros & cons of reinforcement learning approaches is important: model-free tends to obtain better performance; model-free is robust in POMDP; model-based suffers from simulation biases; model-based is good at generalization; model-based is good at sharing / reusing learned components; model-based is flexible to reward changes. A model-based reinforcement learning method for graph-structured dynamical systems is proposed: learning forward models with stochastic neural networks; planning with stochastic Graph-DDP (differential dynamic programming). Generalization of pouring behavior over material types is achieved. Decomposition of dynamics and richer sensing are useful in learning. More work: http://akihikoy.net