2. What is deep robotics?
Source: RI Seminar, Sergey Levine
https://www.youtube.com/watch?v=eKaYnXQUb2g&t=346s
3. What is deep robotics?
• Think about computer vision
4. What is deep robotics?
• Think about computer vision
[Diagram] Traditional computer vision: image → features (e.g. HOG) → mid-level features (e.g. DPM) → classifier (e.g. SVM) → semantic label, with each of the three stages trained separately.
5. What is deep robotics?
• Think about computer vision
[Diagram] Traditional computer vision: image → features (e.g. HOG) → mid-level features (e.g. DPM) → classifier (e.g. SVM) → semantic label, with each of the three stages trained separately.
[Diagram] Deep learning: image → artificial neural network → semantic label, with end-to-end training.
6. What is deep robotics?
• Deep robotics is analogous to computer vision
7. What is deep robotics?
• Deep robotics is analogous to computer vision
[Diagram] Traditional robotics: observation (e.g. image) → state estimation → modeling & prediction → planning → low-level control → controls, with each of the four stages trained separately.
8. What is deep robotics?
• Deep robotics is analogous to computer vision
[Diagram] Traditional robotics: observation (e.g. image) → state estimation → modeling & prediction → planning → low-level control → controls, with each of the four stages trained separately.
[Diagram] Deep robotics: observation → artificial neural network → controls, with end-to-end training.
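To make "end-to-end training" concrete, here is a minimal sketch in plain Python. It is not from the seminar: the two-stage controller, the supervised (observation, control) pairs, and the names `controller` and `train_end_to_end` are all assumptions chosen for illustration. The point it shows is that the error measured on the final control updates every stage at once, with no separate training signal for the intermediate "state".

```python
def controller(obs, a, c):
    """Two chained stages: an internal 'state' a*obs, then a control c*state."""
    state = a * obs          # stage 1 (hypothetical state estimate)
    return c * state         # stage 2 (hypothetical control law)


def train_end_to_end(samples, a=0.5, c=0.5, lr=0.1, epochs=200):
    """Gradient descent on the squared control error, updating both stages jointly."""
    n = len(samples)
    for _ in range(epochs):
        grad_a = 0.0
        grad_c = 0.0
        for obs, target in samples:
            err = controller(obs, a, c) - target
            # Chain rule: the error on the final control reaches both stages.
            grad_a += 2.0 * err * c * obs / n
            grad_c += 2.0 * err * a * obs / n
        a -= lr * grad_a
        c -= lr * grad_c
    return a, c


# Assumed toy data: the desired control is twice the observation.
samples = [(x, 2.0 * x) for x in (-1.0, -0.5, 0.5, 1.0)]
a, c = train_end_to_end(samples)
```

Only the product `a * c` is constrained by the data, so the intermediate "state" is whatever representation the training happens to find. In the traditional pipeline, by contrast, each stage would need its own training target.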
9. What is deep robotics?
• Connectionist model in robotics
• Benefit?
- General-purpose algorithm
- Combine perception and control
- Acquire complex skills with general-purpose representations
20. In the real world...
Several robots
Several acceleration algorithms
A lot of patience...
Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates (S. Gu et al., 2016)
(no pixel input)
43. Guided policy search
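Solve optimal control (C-step)
Training policy network (S-step)
Learning dynamics
[Diagram] Cycle linking the C-step, the S-step, and the "Learning dynamics" block; arrows are labelled $p_i$, $\theta$, and roll-out.
$p_i = \arg\min_{p_i} \sum_t L'(s_t, a_t)$
where $L'(s_t, a_t) = L(s_t, a_t) + \mathrm{KL}(p_i \,\|\, \pi_\theta)$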
This KL constraint is very important for convergence: it keeps the optimal-control solution from moving far from the policy.
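The alternation between the C-step and the S-step can be sketched on a toy problem. This is a heavily simplified illustration, not the algorithm as presented in the seminar. It assumes a stateless one-step task with cost L(a) = (a - target)^2, a single Gaussian controller p = N(mu, sigma^2), a Gaussian policy pi_theta = N(theta, sigma^2) with the same fixed variance, and no dynamics learning. All function names are hypothetical.

```python
import math


def kl_gauss(mu_p, sigma_p, mu_q, sigma_q):
    """KL(N(mu_p, sigma_p^2) || N(mu_q, sigma_q^2)) for 1-D Gaussians."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)


def augmented_cost(mu, theta, target, sigma=1.0):
    """Expected L' = E_p[(a - target)^2] + KL(p || pi_theta), with p = N(mu, sigma^2)."""
    expected_task_cost = (mu - target) ** 2 + sigma ** 2
    return expected_task_cost + kl_gauss(mu, sigma, theta, sigma)


def c_step(theta, target, sigma=1.0):
    """C-step: closed-form minimiser of augmented_cost over the controller mean mu."""
    w = 1.0 / (2.0 * sigma ** 2)   # weight of the KL term relative to the task cost
    return (target + w * theta) / (1.0 + w)


def s_step(mu):
    """S-step: fit the policy to the controller; with one controller the best fit is theta = mu."""
    return mu


def guided_policy_search_toy(target=2.0, theta=0.0, iterations=20):
    """Alternate C-step and S-step; returns the final policy mean theta."""
    for _ in range(iterations):
        mu = c_step(theta, target)   # optimal control, pulled toward the current policy
        theta = s_step(mu)           # supervised update of the policy
    return theta


theta_final = guided_policy_search_toy()
first_mu = c_step(0.0, 2.0)
```

In this toy setting the first C-step returns a controller mean of 4/3 rather than the unconstrained optimum 2: the KL term holds the controller near the current policy, and the policy then catches up over the following iterations.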