7. Manipulations in Everyday Activities
Folding clothes
Cleaning
Cooking
Bathing
Dressing
…
Japanese way of folding T-shirts
https://youtu.be/b5AWQ5aBjgE
Chinese cooking skills
https://youtu.be/PFGGTPPNdRQ
10. Yamaguchi et al. "DCOB: Action space for reinforcement learning of high DoF robots", Autonomous Robots, 2013
https://www.youtube.com/playlist?list=PL41MvLpqzOg8FF0xekWT9NXCdjzN_8PUS
11. Deep Reinforcement Learning
Deep learning: With big data, a neural network can learn any I/O
mapping to any precision. We do not have to care about how large
the state space is, and images can be handled directly as input
without designing features.
Deep RL: Using deep neural networks to represent policies, dynamical
models, value functions, etc. Deep RL can handle large state spaces
with big data.
E.g. Atari games (DeepMind), Google (S. Levine)'s learning of visual servoing.
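The value-function learning that deep RL scales up can be sketched with a minimal tabular Q-learning loop; the chain MDP, reward placement, and all hyperparameters below are invented for illustration. Deep RL replaces the Q-table with a neural network, so the same TD update applies to image inputs and large state spaces.

```python
import random

# Minimal tabular Q-learning on a 5-state chain MDP (illustrative).
# Deep RL replaces this table with a neural network so the same
# temporal-difference update scales to huge state spaces.
N, ACTIONS = 5, (-1, +1)            # states 0..4; reward for reaching state 4
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1
Q = {(s, a): 1.0 for s in range(N) for a in ACTIONS}   # optimistic init aids exploration

def step(s, a):
    s2 = min(max(s + a, 0), N - 1)
    return s2, (1.0 if s2 == N - 1 else 0.0)

random.seed(0)
for _ in range(300):
    s = 0
    for _ in range(20):
        a = random.choice(ACTIONS) if random.random() < EPS \
            else max(ACTIONS, key=lambda b: Q[(s, b)])
        s2, r = step(s, a)
        # TD update: Q(s,a) += alpha * (r + gamma * max_b Q(s',b) - Q(s,a))
        Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2
        if r > 0:
            break

# A greedy rollout from the start state should head straight to the reward.
s, path = 0, [0]
for _ in range(6):
    s, _ = step(s, max(ACTIONS, key=lambda b: Q[(s, b)]))
    path.append(s)
print(path)
```

The optimistic initialization is one simple way to make the greedy policy try untested actions; any exploration scheme would do here.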
12. Deep Reinforcement Learning
(T-L) Learning to play Atari games
by Google DeepMind, Mnih et al.
2015
https://youtu.be/cjpEIotvwFY
(T-R) DeepMPC Robotic
Experiments - PR2 cuts food, Lenz
et al. 2015
https://youtu.be/BwA90MmkvPU
(B-L) Learning to grasp from 50K
Tries, Pinto et al. 2016
https://youtu.be/oSqHc0nLkm8
(B-R) Learning hand-eye
coordination for robotic grasping,
Levine et al. 2017
https://youtu.be/l8zKZLqkfII
13. Deep Reinforcement Learning
Can deep RL solve RL problems in general?
Maybe YES
Is that the intelligence we expect of robots?
Maybe NO
Learning grasping: 50,000 samples (Pinto et al. 2016),
800,000 samples (Levine et al. 2017)
How many samples are necessary to learn cooking sushi?
The strategy for designing a problem is unclear
Learning with fewer samples is unclear
14. Intelligent Robot
An English proverb says:
"A word to the wise is enough."
"Many words to a fool, half a word to the wise."
In Japanese: 一を知って十を知る (hear one, understand ten)
Robot version:
Many practices for a foolish robot, half a practice for
an intelligent robot.
15. How do we measure intelligence of robots?
Adaptation ability
Generalization ability
Scalability
From a talk by Leslie Kaelbling at ICRA'16
18. Key components to create intelligent robots
Library of skills
Structured knowledge
Learning and reasoning methods
Richer sensing and general hardware
19. My Work (Introduced today)
Deformable object manipulation (liquids, powders,
vegetables and fruits, etc.)
Representing behaviors with a skill library;
verified in pouring with PR2 and Baxter
Model-based RL with structured knowledge;
verified in simulated pouring
Richer sensing helps learning: liquid flow
perception, FingerVision
28. Sharing Knowledge Among Robots
The same implementation
worked on PR2 and Baxter
PR2 and Baxter:
Different: kinematics, grippers
Same: arm DoF, sensors
Sharable knowledge:
Skills
Behavior structure
Not sharable:
Policy parameters
29. Achieved and NOT Achieved
Achieved:
Generalization of grasping, moving container, and
pouring skills
over container shapes
over initial container poses
over different target amounts
Adaptation of pouring skills
to new material types & container shapes
NOT achieved:
Generalization of pouring skills
over material types & container shapes
31. Reinforcement Learning in Pouring
Components of pouring behavior:
Skill library: can be general
Behavior structure: can be general
Selection of skill and skill parameters: situation specific
Planning (dynamic programming) is necessary
Dynamics are partially unknown
→ a reinforcement learning problem
34. Model-free tends to obtain better performance
[Kober, Peters, 2011] [Kormushev, 2010]
35. Model-free is robust in POMDPs
POMDP: Partially Observable Markov Decision Process
Yamaguchi et al. "DCOB: Action space for reinforcement learning of high DoF robots", Autonomous Robots, 2013
https://www.youtube.com/playlist?list=PL41MvLpqzOg8FF0xekWT9NXCdjzN_8PUS
36. Model-based suffers from simulation biases
Simulation bias: when forward models are inaccurate (usual when
models are learned), integrating the forward models causes a rapid
growth of future state-estimation errors
cf. [Atkeson, Schaal, 1997b] [Kober, Peters, 2013]
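The error growth behind simulation bias can be shown with a toy linear system (the gains below are made up): a 2% error in a learned one-step model compounds as the model is integrated over its own predictions.

```python
# Simulation bias in miniature: the learned model's gain is off by 2%,
# and multi-step prediction feeds the model its own output, so the
# prediction error grows rapidly with the horizon. Numbers are toy values.
a_true, a_model = 0.99, 1.01        # true vs. (hypothetically) learned gain
x_true = x_model = 1.0
errors = []
for t in range(100):
    x_true = a_true * x_true        # real rollout
    x_model = a_model * x_model     # model integrated over its own predictions
    errors.append(abs(x_model - x_true))
print(errors[0], errors[-1])        # one-step error is tiny; 100-step error is not
```

This is why the slides advocate task-level models: fewer integration steps mean fewer chances for the error to compound.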
37. Model-based is good at generalization
[Figure: ANN forward-kinematics (FK) model with input, hidden,
and output layers, and an update of u]
Learning inverse kinematics of an android face
[Magtanong, Yamaguchi, et al. 2012]
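The idea of inverting a learned forward model can be sketched as follows; here a hand-coded 2-link planar arm FK stands in for the learned ANN, and a numeric-gradient update on the joint angles plays the role of the update loop. Link lengths, step sizes, and the target are all illustrative.

```python
import math

# Sketch: solving inverse kinematics by gradient descent through a
# forward-kinematics model. A 2-link planar arm stands in for a
# learned FK network; all constants are illustrative.
L1, L2 = 1.0, 1.0

def fk(q):
    """Forward model: joint angles -> end-effector position."""
    return (L1 * math.cos(q[0]) + L2 * math.cos(q[0] + q[1]),
            L1 * math.sin(q[0]) + L2 * math.sin(q[0] + q[1]))

def err(q, target):
    return sum((a - b) ** 2 for a, b in zip(fk(q), target))

def ik(target, q=(0.3, 0.3), lr=0.1, iters=2000, h=1e-5):
    q = list(q)
    for _ in range(iters):
        e0 = err(q, target)
        # numeric gradient of the task-space error through the model
        grad = []
        for i in range(len(q)):
            qp = list(q)
            qp[i] += h
            grad.append((err(qp, target) - e0) / h)
        q = [qi - lr * g for qi, g in zip(q, grad)]
    return q

q = ik((1.2, 0.8))
print(fk(q))   # should land close to the target (1.2, 0.8)
```

Because the inversion is done through the forward model, the same learned model generalizes to any reachable target without retraining.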
38. Model-based is good at sharing / reusing learned components
Forward models are sharable / reusable
Analytical models can be combined
40. Our Approach
Model-based reinforcement learning
How to deal with simulation biases?
Do not try to learn dx/dt = F(x,u) (dt: small, like xx ms)
Learn (sub)task-level dynamics:
Parameters → F_grasp → Grasp result
Parameters → F_flow_ctrl → Flow ctrl result
Use stochastic models:
Gaussian → F → Gaussian
Use stochastic dynamic programming,
e.g. Stochastic (Differential) Dynamic Programming
How to work with a skill library?
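The "Gaussian → F → Gaussian" step can be sketched by first-order (linearized) propagation of a Gaussian belief through a nonlinear task-level model; the model f and all numbers below are hypothetical stand-ins, not the method's actual dynamics.

```python
import math

# Sketch of "Gaussian in, Gaussian out": propagate a Gaussian belief
# (mean, variance) through a nonlinear task-level model f by local
# linearization. f is a hypothetical stand-in for a learned model.
def f(x):
    return math.tanh(x)

def propagate(mean, var, h=1e-5):
    grad = (f(mean + h) - f(mean - h)) / (2 * h)  # local slope of f
    return f(mean), grad * grad * var             # first-order moment matching

m2, v2 = propagate(0.5, 0.04)
print(m2, v2)   # here the variance shrinks, since |f'(0.5)| < 1
```

Keeping the uncertainty explicit like this is what lets stochastic dynamic programming weigh risky outcomes instead of trusting a point prediction.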
41. Model-based RL for Graph-Structured Dynamics
Learning unknown dynamical systems with stochastic neural networks
Planning actions with stochastic Graph-DDP
42.
A forward model can be:
• A dynamical system with/without action parameters
• Kinematics
• Feature detection, policy parameterization
• A reward
• …
A bifurcation model can be:
• Possible different results of an action
• Skill selection
• Spatial decomposition of dynamics
• Spatial conversion, including kinematics, feature
detection, policy parameterization, and rewards
• …
Graph-DDP, bifurcation primitive
[Yamaguchi and Atkeson, Humanoids 2015, 2016]
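A sketch of how forward-model and bifurcation primitives might compose in a graph; all node names, probabilities, and toy dynamics below are invented for illustration and are not the paper's implementation.

```python
# Toy graph-structured dynamics: forward-model edges chained with a
# bifurcation node that splits the outcome probabilistically (e.g. an
# action that may succeed or spill). All names and numbers are invented.
def grasp(x):        return x + 0.1        # forward model: grasp result
def pour_success(x): return x + 1.0
def pour_spill(x):   return x - 0.5

graph = {
    "grasp":     ("model", grasp, "bifurcate"),
    "bifurcate": ("bifurcation", [(0.8, "ok"), (0.2, "spill")]),
    "ok":        ("model", pour_success, None),
    "spill":     ("model", pour_spill, None),
}

def expected_outcome(node, x):
    """Propagate a state through the graph, averaging over bifurcations."""
    kind = graph[node][0]
    if kind == "model":
        _, f, nxt = graph[node]
        y = f(x)
        return y if nxt is None else expected_outcome(nxt, y)
    _, branches = graph[node]
    return sum(p * expected_outcome(n, x) for p, n in branches)

print(expected_outcome("grasp", 0.0))   # ≈ 0.8*(0.1+1.0) + 0.2*(0.1-0.5)
```

A planner like Graph-DDP would optimize action parameters against exactly this kind of expectation over the graph's branches.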
51. Example 1: Flow in Pouring
Do robots need to perceive FLOW in pouring?
Skill parameters → Flow → Poured amount
The robot can learn skill parameters to maximize
rewards (poured amount == target amount)
Considering decomposed dynamics (flow as an
intermediate state) makes learning easier
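The benefit of the decomposition can be sketched with synthetic data: learn two simple stage models (parameters → flow, flow → poured amount) and compose them. The linear ground truth and noise levels below are invented for illustration.

```python
import random

# Sketch: decomposed dynamics in pouring. Instead of one black-box map
# from skill parameters to poured amount, fit two simpler stage models
# and compose them. The data-generating process here is synthetic.
random.seed(0)

def fit_line(xs, ys):
    """1-D least squares: returns (a, b) for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

params  = [random.uniform(0, 1) for _ in range(50)]
flows   = [2.0 * p + random.gauss(0, 0.01) for p in params]   # stage 1 (true: flow = 2p)
amounts = [0.5 * f + random.gauss(0, 0.01) for f in flows]    # stage 2 (true: amount = f/2)

a1, b1 = fit_line(params, flows)    # parameters -> flow
a2, b2 = fit_line(flows, amounts)   # flow -> poured amount

def predict(p):                     # composed model: parameters -> amount
    return a2 * (a1 * p + b1) + b2

print(predict(0.7))   # close to 0.7, since amount ≈ 0.5 * 2p = p
```

Each stage is easier to fit than the end-to-end map, and the intermediate flow state can be observed and supervised directly, which is the point of the slide.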
54. Example 2: Tactile Sensing in Manipulation
Is tactile sensing necessary in manipulation?
e.g. Google's grasp learning: no tactile sensing; learning
visual servoing
What if the robot grasps a container whose content is unknown?
What if an external force is applied?
55. FingerVision: Vision-based Tactile Sensing
Multimodal tactile sensing:
Force distribution
Proximity vision
Slip / deformation
Object pose, texture, shape
Low-cost and easy to manufacture
Physically robust
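One simple way such force-distribution estimation might work (a sketch, not the actual FingerVision pipeline): markers on the elastic skin are tracked by the camera, and each marker's displacement is mapped to a local force with a linear spring model. The stiffness value and marker positions below are hypothetical.

```python
# Sketch of FingerVision-style processing: marker displacements on the
# elastic skin, tracked by a camera, are mapped to a force distribution.
# A per-marker linear spring model is an illustrative simplification.
K = 0.8   # hypothetical skin stiffness (N per pixel of displacement)

def forces_from_markers(rest, observed):
    """Map each marker's displacement to a local force vector."""
    return [((ox - rx) * K, (oy - ry) * K)
            for (rx, ry), (ox, oy) in zip(rest, observed)]

rest     = [(10, 10), (20, 10), (30, 10)]   # marker positions at rest (pixels)
observed = [(11, 10), (22, 11), (30, 10)]   # middle marker pushed most
forces = forces_from_markers(rest, observed)
print(forces)   # the middle marker carries the largest force
```

Slip detection would additionally look at how these displacement vectors change over time; object pose and texture come from the same camera's proximity view.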
60. Summary
A library of skills is essential
Skills and high-level behavior representations can be shared among robots
Considering the pros & cons of reinforcement learning approaches is important:
Model-free tends to obtain better performance
Model-free is robust in POMDPs
Model-based suffers from simulation biases
Model-based is good at generalization
Model-based is good at sharing / reusing learned components
Model-based is flexible to reward changes
A model-based reinforcement learning method for graph-structured dynamical
systems is proposed:
Learning forward models with stochastic neural networks
Planning with stochastic Graph-DDP (differential dynamic programming)
Generalization of pouring behavior over material types is achieved
Decomposition of dynamics and richer sensing are useful in learning
More work: http://akihikoy.net