“Deep Learning in Robotics”
Student: Gabriele Sisinna (516706)
Course: Intelligent Systems
Professor: Beatrice Lazzerini
Authors
Harry A. Pierson
Michael S. Gashler
Introduction
• This review discusses the applications, benefits, and
limitations of deep learning for robotic systems, using
contemporary research as examples.
• Applying deep learning to
robotics is an active
research area, with at
least thirty papers
published on the subject
from 2014 through the
time of this writing (2017).
Deep learning
• Deep learning is the science of training large artificial
neural networks. Deep neural networks (DNNs) can have
hundreds of millions of parameters, allowing them to model
complex functions such as nonlinear dynamics.
History
• Several important advances have slowly transformed regression
into what we now call deep learning. First, the addition of an
activation function enabled regression methods to fit
nonlinear functions, and it introduced a biological similarity with
brain cells.
• Next, nonlinear models were stacked in “layers” to create
powerful models, called multi-layer perceptrons (MLP).
History
• Multi-layer perceptrons are universal function approximators,
meaning they can fit any data, no matter how complex, with
arbitrary precision, using a finite number of regression units.
• Backpropagation marked the beginning of the deep learning
revolution; however, researchers still mostly limited their neural
networks to a few layers because of the problem of vanishing
gradients.
Application in Robotics
• Neural networks were successfully applied to robot
control as early as the 1980s. It was quickly recognized that
nonlinear regression provided the functionality that was
needed for operating dynamical systems in continuous
spaces.
Biorobotics and Neural networks
• In 2008, neuroscientists made advances in recognizing how
animals achieve locomotion, and were able to extend this
knowledge to neural networks for experimental control of
biomimetic robots.
Infinite Degree of Freedom discretization
• In the soft robotics field,
new techniques are
needed for the control of
continuous systems
with a high number of
DOFs.
Structure A: MLP as function approximator
• DNNs are well suited for use with robots because they are
flexible and can be used in structures that other machine
learning models cannot support.
• MLPs are trained by presenting a large collection of example
training pairs, each an input with its target output.
• An optimization method is applied to minimize the
prediction loss.
Supervised
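The training procedure described above can be sketched in a few lines. This is only a minimal illustration, not code from the review: the network size, the tanh activation, the learning rate, the XOR data and all function names are assumptions chosen to keep the example small.

```python
import math
import random


def init_mlp(n_in, n_hidden, seed=0):
    """Create small random weights for a one-hidden-layer MLP with a scalar output."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(params, x):
    """Return (hidden activations, prediction) for one input vector x."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(params["W1"], params["b1"])
    ]
    y = sum(w * h for w, h in zip(params["W2"], hidden)) + params["b2"]
    return hidden, y


def mean_loss(params, pairs):
    """Mean squared prediction error over all training pairs."""
    return sum((forward(params, x)[1] - t) ** 2 for x, t in pairs) / len(pairs)


def sgd_step(params, x, t, lr):
    """One stochastic gradient descent update on the loss 0.5 * (y - t)^2."""
    hidden, y = forward(params, x)
    dy = y - t  # derivative of the loss with respect to the prediction
    for h, a in enumerate(hidden):
        # Backpropagate through the output weight and the tanh activation
        # (computed before W2[h] is changed).
        dz = dy * params["W2"][h] * (1.0 - a * a)
        params["W2"][h] -= lr * dy * a
        for i, xi in enumerate(x):
            params["W1"][h][i] -= lr * dz * xi
        params["b1"][h] -= lr * dz
    params["b2"] -= lr * dy


def train(pairs, n_hidden=4, epochs=2000, lr=0.1, seed=0):
    """Train on (input, target) pairs; return params and the loss before and after."""
    params = init_mlp(len(pairs[0][0]), n_hidden, seed)
    initial = mean_loss(params, pairs)
    for _ in range(epochs):
        for x, t in pairs:
            sgd_step(params, x, t, lr)
    return params, initial, mean_loss(params, pairs)


# XOR is a classic target that no purely linear model can fit.
XOR_PAIRS = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
```

Calling `train(XOR_PAIRS)` returns the trained parameters together with the mean loss before and after training, so the effect of the optimization can be inspected directly.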
Classification
• These structures also excel at classification tasks, such as
determining what type of object lies before the robot, which
grasping approach or general planning strategy is best
suited for current conditions, or the state of a
complex object with which the robot is interacting.
Parallel Computing: training DNNs
• To make effective use of deep learning models, it is
important to train on one or more general-purpose
graphics processing units (GPGPUs). Many other ways of
parallelizing deep neural networks have been attempted, but
none of them yet yields the performance gains of GPGPUs.
Structure B: Autoencoders
• Autoencoders are used primarily in cases where
high-dimensional observations are available, but the user wants
a low-dimensional representation of state.
• The autoencoder is one common model for facilitating “unsupervised
learning.” It requires two DNNs, called an “encoder” and a
“decoder.”
Unsupervised
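The encoder/decoder idea can be illustrated with a deliberately tiny sketch. It is an assumption-laden toy, not the architecture from the review: both "networks" are reduced to single linear maps, the data are 2-D points that lie on a 1-D line, and all names and hyperparameters are hypothetical. Note that no labels are used; the training target is the input itself.

```python
def make_line_data(n=9):
    """2-D observations that actually lie on a 1-D line (x1 = 2 * x0)."""
    xs = [-1.0 + 2.0 * k / (n - 1) for k in range(n)]
    return [(x, 2.0 * x) for x in xs]


def encode(enc, x):
    """Encoder: compress a 2-D observation to a single code value."""
    return enc[0] * x[0] + enc[1] * x[1]


def decode(dec, code):
    """Decoder: expand the code back to a 2-D reconstruction."""
    return (dec[0] * code, dec[1] * code)


def reconstruction_error(enc, dec, data):
    """Mean squared distance between each observation and its reconstruction."""
    total = 0.0
    for x in data:
        x_hat = decode(dec, encode(enc, x))
        total += (x_hat[0] - x[0]) ** 2 + (x_hat[1] - x[1]) ** 2
    return total / len(data)


def train_autoencoder(data, epochs=500, lr=0.01):
    """Gradient descent on the reconstruction error; returns weights and errors."""
    enc = [0.1, 0.1]
    dec = [0.1, 0.1]
    initial = reconstruction_error(enc, dec, data)
    for _ in range(epochs):
        for x in data:
            code = encode(enc, x)
            x_hat = decode(dec, code)
            err = (x_hat[0] - x[0], x_hat[1] - x[1])
            # Gradient of 0.5 * |x_hat - x|^2 with respect to the code.
            grad_code = err[0] * dec[0] + err[1] * dec[1]
            for j in range(2):
                dec[j] -= lr * err[j] * code
                enc[j] -= lr * grad_code * x[j]
    return enc, dec, initial, reconstruction_error(enc, dec, data)
```

After training, `encode` gives the low-dimensional representation of an observation, and the drop in reconstruction error shows how much of the observation that single number retains.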
Structure C: Recurrent Neural Networks
• Recurrent neural networks can keep
track of the past thanks to feedback
loops (they are discrete-time,
non-autonomous dynamical systems).
• Structure C is a type of “recurrent
neural network,” which is designed to
model dynamical systems, including
robots. It is often trained with an
approach called “backpropagation
through time.”
Supervised
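The feedback loop can be made concrete with a single scalar recurrent unit. This is a sketch under stated assumptions: the weights `w_h` and `w_x` are fixed, hypothetical values rather than trained ones, and backpropagation through time is not implemented here.

```python
import math


def rnn_step(h_prev, x_t, w_h=0.5, w_x=1.0, b=0.0):
    """One step of a scalar recurrent unit: the feedback term w_h * h_prev
    carries information about past inputs into the new state."""
    return math.tanh(w_h * h_prev + w_x * x_t + b)


def run_rnn(inputs, h0=0.0):
    """Unroll the recurrence over an input sequence and return every state."""
    states = []
    h = h0
    for x_t in inputs:
        h = rnn_step(h, x_t)
        states.append(h)
    return states


# An impulse at t = 0 followed by zero input: the state decays gradually
# instead of dropping straight to zero, which is the "memory" of the loop.
impulse_response = run_rnn([1.0, 0.0, 0.0, 0.0])
```

Because the new state depends on both the previous state and the external input, the unit is a discrete-time system driven by its input sequence, which is the property the slide refers to.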
Structure D: Deep Reinforcement Learning
• Deep reinforcement learning (DRL) combines deep learning and
reinforcement learning principles to create efficient algorithms
for areas such as robotics, video games and healthcare.
• Combining deep learning architectures (deep neural networks)
with reinforcement learning algorithms (Q-learning, actor-critic,
etc.) can scale to previously unsolvable problems.
Exploration and exploitation
• Instead of minimizing
prediction error against a
training set of samples, deep
Q-networks seek to maximize
long-term reward.
• This is done through seeking
a balance between
exploration and exploitation
that ultimately leads to an
effective policy model.
Biological analogy
• Doya identified that supervised learning methods (Structures
A and C) mirror the function of the cerebellum.
• Unsupervised methods (Structure B) learn in a manner
comparable to that of the cerebral cortex, and reinforcement
learning (Structure D) is analogous to the basal ganglia.
What’s the point?
• Every part of a complex system can be made to “learn”.
• The real power of deep learning does not come from using
just one of the structures described in the previous slides as
a component in a robotics system, but in connecting parts of
all these structures together to form a full system that learns
throughout.
• This is where the “deep” in deep learning begins to make its
impact – when each part of a system is capable of learning,
the system can adapt in sophisticated ways.
Limits
• Some remaining barriers to the adoption of deep learning in
robotics include the necessity for large training data and
long training times. One promising trend is crowdsourcing
training data via cloud robotics.
• Distributed computing offers the potential to direct more
computing resources to a given problem but can be limited
by communication speeds.
• DNNs excel at 2D image recognition, but they are known to
be highly susceptible to adversarial samples, and they still
struggle to model 3D spatial layouts.
Open challenges for the next years
1. Learning complex, high-dimensional, and novel dynamics
2. Learning control policies in dynamic environments
3. Advanced manipulation
4. Advanced object recognition
5. Interpreting and anticipating human actions (next slides)
6. Sensor fusion & dimensionality reduction
7. High-level task planning
Robot gains Social Intelligence
through Multimodal Deep
Reinforcement Learning
Authors
Ahmed Hussain Qureshi, Yutaka Nakamura, Yuichiro Yoshikawa and Hiroshi Ishiguro
Pepper Robot
• Designed to be used in professional environments, Pepper
is a humanoid robot that can interact with people, ‘read’
emotions, learn, move and adapt to its environment, and
even recharge on its own. Pepper can perform facial
recognition and develop individualized relationships when it
interacts with people.
• The authors propose a Multimodal Deep Q-Network
(MDQN) to enable a robot to learn human-like interaction
skills through a trial and error method.
Reinforcement Learning background
• An agent interacts
sequentially with an
environment E with the aim of
maximizing cumulative
reward.
• At each time-step, the agent
observes a state s_t, takes an
action a_t from the set of legal
actions A = {1, ..., K} and
receives a scalar reward r_t
from the environment.
• An agent’s behavior is
formalized by a policy π,
which maps states to actions.
• The goal of an RL agent is to
learn a policy π that
maximizes the expected total
return (reward).
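The interaction loop described above can be sketched as follows. The environment here is a hypothetical toy (`CoinGuessEnv`, with K = 2 actions numbered 0 and 1), invented only to show how state, action, reward and policy fit together; it has nothing to do with the robot experiments.

```python
import random


class CoinGuessEnv:
    """Hypothetical toy environment: the state is a bit shown to the agent,
    and the reward is 1 when the action matches it, otherwise 0."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = self.rng.randint(0, 1)

    def step(self, action):
        reward = 1.0 if action == self.state else 0.0
        self.state = self.rng.randint(0, 1)  # environment moves to the next state
        return self.state, reward


def run_episode(env, policy, steps=20):
    """Let the agent interact for a fixed number of time-steps and
    return the cumulative (undiscounted) reward."""
    total = 0.0
    state = env.state
    for _ in range(steps):
        action = policy(state)            # the policy maps a state to an action
        state, reward = env.step(action)  # the environment returns next state and reward
        total += reward
    return total


def copy_policy(state):
    """A policy that always picks the action equal to the observed state."""
    return state


def constant_policy(state):
    """A policy that ignores the state and always picks action 0."""
    return 0
```

Comparing `run_episode` for the two policies shows what "learning a policy that maximizes return" is aiming at: `copy_policy` collects the full reward at every step, while `constant_policy` only does so when the state happens to be 0.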
Deep Q-network
• Further advancements in
machine learning have merged
deep learning with reinforcement
learning (RL), which has led to
the development of the
deep Q-network (DQN).
• DQN uses an automatic
feature extractor, a deep
convolutional neural network
(ConvNet), to approximate the
action-value function of the
Q-learning method.
CNN for action-value function approximation
• The two streams have an identical structure, and each stream comprises
eight layers (excluding the input layer).
• Each stream takes eight frames as input, so the last eight
frames from the corresponding camera are pre-processed and stacked
together to form the input for that stream of the network.
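The frame stacking described above can be sketched with a fixed-length queue. The class and function names are hypothetical, and the pre-processing shown (scaling 8-bit pixels to [0, 1]) is only a placeholder assumption, since the slides do not say what the pre-processing consists of.

```python
from collections import deque

STACK_SIZE = 8  # each stream takes eight frames, as stated above


class FrameStack:
    """Keeps the most recent frames from one camera and exposes them as one input."""

    def __init__(self, size=STACK_SIZE):
        self.frames = deque(maxlen=size)

    def push(self, frame):
        # Once the deque is full, appending silently drops the oldest frame.
        self.frames.append(frame)

    def ready(self):
        """True once enough frames have arrived to form a full input."""
        return len(self.frames) == self.frames.maxlen

    def stacked(self):
        """Return the frames oldest-first, ready to feed to one stream."""
        return list(self.frames)


def preprocess(frame):
    """Placeholder pre-processing (hypothetical): scale 8-bit pixels to [0, 1]."""
    return [pixel / 255.0 for pixel in frame]
```

One `FrameStack` per camera (grayscale and depth) would be kept, so that each stream always sees the latest eight frames of its own modality.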
Multimodal Deep Q-Network (MDQN)
• The dual-stream ConvNets process the depth and grayscale
images independently.
• The robot learns to greet people using a set of four legal actions:
waiting, looking towards the human, waving its hand and
handshaking.
• The objective of the robot is to learn which action to perform in
each situation.
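The slides do not say how the outputs of the two streams are combined into one decision. Purely as an illustration, the sketch below assumes that each stream outputs one Q-value per action, that the values are min-max normalized per stream, and that the normalized values are averaged; this fusion rule and all names are assumptions, not the authors' stated method.

```python
ACTIONS = ["wait", "look towards human", "wave hand", "handshake"]


def normalize(q_values):
    """Min-max scale a list of Q-values to [0, 1] so the two streams are comparable."""
    low, high = min(q_values), max(q_values)
    if high == low:
        return [0.0 for _ in q_values]
    return [(q - low) / (high - low) for q in q_values]


def fuse(q_gray, q_depth):
    """Average the normalized Q-values of the two streams (an assumed fusion rule)."""
    return [(g + d) / 2.0 for g, d in zip(normalize(q_gray), normalize(q_depth))]


def select_action(q_gray, q_depth):
    """Pick the action whose fused Q-value is highest."""
    fused = fuse(q_gray, q_depth)
    best = max(range(len(fused)), key=fused.__getitem__)
    return ACTIONS[best]
```

Normalizing before averaging keeps one stream from dominating simply because its Q-values happen to be on a larger scale.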
Reward and action-value function
• The expected total return is the sum of rewards discounted by a
factor γ ∈ [0, 1] at each time-step (γ = 0.99 in the proposed work).
• Given that the optimal Q-function Q*(s′, a′) of the sequence s′ at
the next time-step is known for all possible actions a′, the
optimal policy is to select the action a′ that maximizes the expected
value of r + γ Q*(s′, a′).
• In DQN, the parameters of the Q-network are adjusted iteratively
towards the Bellman target by minimizing a loss
function.
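The formulas for this slide are not preserved in the transcript. For orientation, the standard textbook forms of the discounted return, the Bellman optimality equation and the DQN loss are written out below. Here Q* denotes the optimal action-value function, T is the final time-step, and θ⁻ denotes the parameters of a separate target network; the target network and the exact notation are assumptions taken from the usual DQN formulation, not from the slides.

```latex
R_t = \sum_{t'=t}^{T} \gamma^{\,t'-t}\, r_{t'}, \qquad \gamma \in [0, 1]

Q^{*}(s, a) = \mathbb{E}_{s'}\!\left[\, r + \gamma \max_{a'} Q^{*}(s', a') \,\middle|\, s, a \right]

L_i(\theta_i) = \mathbb{E}_{(s, a, r, s')}\!\left[ \Big( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta_i) \Big)^{2} \right]
```

The term r + γ max Q(s′, a′; θ⁻) is the Bellman target; the loss is the squared difference between that target and the network's current estimate Q(s, a; θ_i).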
Parameters and agent behavior
• The current parameters are updated by stochastic gradient
descent, stepping in the direction of the negative gradient of the
loss function with respect to the parameters.
• The agent’s behavior at each time-step is selected by an ε-greedy
policy, where the greedy strategy is adopted with probability
(1−ε) and the random strategy with probability ε.
• The robot gets a reward of 1 for a successful handshake, -0.1
for an unsuccessful handshake and 0 for the other three
actions.
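The ε-greedy rule and the reward scheme above can be sketched directly. The function names and the action labels are illustrative choices; the reward values are the ones stated on the slide.

```python
import random

ACTIONS = ["wait", "look towards human", "wave hand", "handshake"]


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action index (exploration),
    otherwise pick the index with the highest Q-value (exploitation)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def reward(action, handshake_succeeded):
    """Reward scheme from the slide: +1 for a successful handshake,
    -0.1 for an unsuccessful one, and 0 for the other three actions."""
    if action != "handshake":
        return 0.0
    return 1.0 if handshake_succeeded else -0.1
```

With `epsilon=0.0` the agent is purely greedy, and with `epsilon=1.0` it acts uniformly at random; values in between trade exploration against exploitation.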
Proposed algorithm
• Data generation phase: the system interacts with the environment
using the Q-network Q(s, a; θ). The system observes the current
scene, which comprises grayscale and depth frames, and takes
an action using the ε-greedy strategy. The environment in return
provides a scalar reward. The interaction experience
e_i = (s_i, a_i, r_i, s_{i+1}) is stored in the replay memory M.
• Training phase: the system uses the collected data, stored in
replay memory M, to train the networks. The hyperparameter n
denotes the number of experience replays. For each experience
replay, a mini buffer B of 2000 interaction experiences is
randomly sampled from the finite-sized replay memory M. The
model is trained on mini-batches sampled from buffer B, and the
network parameters are updated iteratively.
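The bookkeeping of the two phases can be sketched as below. This is a structural illustration only: the memory capacity, the mini-batch size and the `update_fn` callback (standing in for one gradient update of the Q-network) are hypothetical, while the mini buffer size of 2000 is taken from the slide.

```python
import random
from collections import deque

MINI_BUFFER_SIZE = 2000  # size of mini buffer B, as stated above


class ReplayMemory:
    """Finite-sized replay memory M holding (s, a, r, s_next) tuples."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)  # oldest experiences are overwritten

    def store(self, s, a, r, s_next):
        self.data.append((s, a, r, s_next))

    def sample_mini_buffer(self, size=MINI_BUFFER_SIZE, rng=random):
        """Randomly sample a mini buffer B (capped at the current memory size)."""
        return rng.sample(list(self.data), min(size, len(self.data)))


def mini_batches(buffer, batch_size):
    """Split a mini buffer into consecutive mini-batches for training."""
    for start in range(0, len(buffer), batch_size):
        yield buffer[start:start + batch_size]


def training_phase(memory, n_replays, batch_size, update_fn):
    """Run n experience replays; update_fn (hypothetical) performs one
    gradient update of the Q-network on a mini-batch."""
    updates = 0
    for _ in range(n_replays):
        buffer = memory.sample_mini_buffer()
        for batch in mini_batches(buffer, batch_size):
            update_fn(batch)
            updates += 1
    return updates
```

During the data generation phase the robot would call `store` after every interaction; `training_phase` is then run offline on whatever the memory holds at that point.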
Evaluation
• To test the model's performance, a separate test dataset was
collected, comprising 4480 grayscale and depth frames not seen by the
system during learning.
• If the majority of evaluators considered the agent’s decision wrong,
the evaluators were asked to agree on the most appropriate
action for that scenario.
Results
• The authors evaluated the trained y-channel (grayscale) Q-network,
depth-channel Q-network and the MDQN on the test
dataset; Table 1 summarizes the performance measures of
these trained Q-networks. In Table 1, accuracy corresponds
to how often the predictions by the Q-networks were correct.
• The multimodal deep Q-network achieved the highest
accuracy, 95.3%, whereas the y-channel and
depth-channel Q-networks achieved 85.9% and 82.6% accuracy,
respectively. The results in Table 1 indicate that fusing the
two streams improves the social cognitive ability of the
agent.
Performance
• This figure shows the performance of the MDQN on the test dataset
over the series of episodes. Episode 0 on the plot
corresponds to the Q-network with randomly initialized parameters.
The plot indicates that the performance of the MDQN agent on the test
dataset improves continuously as the agent gets more and
more interaction experience with humans.
Conclusions
• In social physical human-robot interaction, it is very difficult to
envisage all the possible interaction scenarios that the robot can
face in the real world; hence programming a social robot is
notoriously hard.
• The MDQN agent has learned to give importance to walking
trajectories, head orientation, body language and the activity in
progress in order to decide its best action.
• Aims for future work: i) increase the action space instead of limiting
it to just four actions; ii) use a recurrent attention model so that the
robot can indicate its attention; iii) evaluate the influence of the three
actions other than the handshake on human behavior.
Thanks!
References
• Deep Learning in Robotics: A Review of Recent Research
(Harry A. Pierson, Michael S. Gashler)
• Robot gains Social Intelligence through Multimodal Deep
Reinforcement Learning (Ahmed Hussain Qureshi, Yutaka
Nakamura, Yuichiro Yoshikawa, Hiroshi Ishiguro)

 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 

Deep Learning in Robotics: Robot gains Social Intelligence through Multimodal Deep Reinforcement Learning

  • 1. “Deep Learning in Robotics” Student: Gabriele Sisinna (516706) Course: Intelligent Systems Professor: Beatrice Lazzerini Authors Harry A. Pierson Michael S. Gashler
  • 2. Introduction • This review discusses the applications, benefits, and limitations of deep learning for robotic systems, using contemporary research as examples. • Applying deep learning to robotics is an active research area, with at least thirty papers published on the subject from 2014 through the time of writing (2017).
  • 3. Deep learning • Deep learning is the science of training large artificial neural networks. Deep neural networks (DNNs) can have hundreds of millions of parameters, allowing them to model complex functions such as nonlinear dynamics.
  • 4. History • Several important advances have slowly transformed regression into what we now call deep learning. First, the addition of an activation function enabled regression methods to fit nonlinear functions, and it introduced a biological similarity with brain cells. • Next, nonlinear models were stacked in “layers” to create powerful models, called multi-layer perceptrons (MLPs).
  • 5. History • Multi-layer perceptrons are universal function approximators, meaning they can fit any data, no matter how complex, with arbitrary precision, using a finite number of regression units. • Backpropagation marked the beginning of the deep learning revolution; however, researchers still mostly limited their neural networks to a few layers because of the problem of vanishing gradients.
  • 6. Application in Robotics • Neural networks were successfully applied to robot control as early as the 1980s. It was quickly recognized that nonlinear regression provided the functionality needed for operating dynamical systems in continuous spaces.
  • 7. Biorobotics and Neural networks • In 2008, neuroscientists made advances in recognizing how animals achieve locomotion, and were able to extend this knowledge to neural networks for experimental control of biomimetic robots. • Infinite degree-of-freedom discretization: in the soft robotics field, new techniques are needed for the control of continuous systems with a high number of DOFs.
  • 8. Structure A: MLP as function approximator • DNNs are well suited for use with robots because they are flexible and can be used in structures that other machine learning models cannot support. • MLPs are trained by presenting a large collection of example training pairs (an input and its desired output). • An optimization method is applied to minimize the prediction loss (supervised learning).
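The training procedure on slide 8 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the setup used in the reviewed paper: the one-hidden-layer tanh network, the sine-curve training pairs, the function name `train_mlp`, and all hyperparameters are hypothetical choices made for the example.

```python
import math
import random


def train_mlp(pairs, hidden=8, lr=0.02, epochs=5000, seed=0):
    """Fit a one-hidden-layer tanh MLP to scalar (x, y) pairs with plain SGD."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]  # input -> hidden weights
    b1 = [0.0] * hidden                               # hidden biases
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden -> output weights
    b2 = 0.0                                          # output bias

    def forward(x):
        h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
        return sum(w2[j] * h[j] for j in range(hidden)) + b2, h

    def mean_loss():
        return sum((forward(x)[0] - y) ** 2 for x, y in pairs) / len(pairs)

    initial = mean_loss()
    for _ in range(epochs):
        x, y = rng.choice(pairs)       # present one training pair
        y_hat, h = forward(x)
        err = y_hat - y                # gradient of 0.5 * (y_hat - y)^2 w.r.t. y_hat
        for j in range(hidden):
            grad_pre = err * w2[j] * (1.0 - h[j] ** 2)  # backpropagate through tanh
            w2[j] -= lr * err * h[j]
            w1[j] -= lr * grad_pre * x
            b1[j] -= lr * grad_pre
        b2 -= lr * err
    return initial, mean_loss(), lambda x: forward(x)[0]


# Hypothetical training set: samples of sin(x) on [-3, 3].
pairs = [(i / 10, math.sin(i / 10)) for i in range(-30, 31)]
initial_loss, final_loss, model = train_mlp(pairs)
print(f"mean squared error: {initial_loss:.3f} -> {final_loss:.3f}")
```

The optimizer here is stochastic gradient descent on a squared-error loss; any other differentiable loss or optimizer would fit the same loop.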
  • 9. Classification • These structures also excel at classification tasks, such as determining what type of object lies before the robot, which grasping approach or general planning strategy is best suited to current conditions, or what state a complex object is in while the robot interacts with it.
  • 10. Parallel Computing: training DNNs • To make effective use of deep learning models, it is important to train on one or more general-purpose graphics processing units (GPGPUs). Many other ways of parallelizing deep neural networks have been attempted, but none of them yet yields the performance gains of GPGPUs.
  • 11. Structure B: Autoencoders • Autoencoders are used primarily in cases where high-dimensional observations are available, but the user wants a low-dimensional representation of state. • The autoencoder is one common model for facilitating “unsupervised learning.” It requires two DNNs, called an “encoder” and a “decoder.”
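The encoder/decoder pairing on slide 11 can be illustrated with a deliberately tiny model. The sketch below assumes a linear encoder and decoder (real autoencoders use deep nonlinear networks), 2-D observations that lie on a line, and a 1-D code; `train_linear_autoencoder` and every hyperparameter are hypothetical.

```python
import random


def train_linear_autoencoder(points, lr=0.01, epochs=3000, seed=0):
    """Compress 2-D points to a 1-D code and reconstruct them, trained with SGD."""
    rng = random.Random(seed)
    enc = [rng.uniform(-0.5, 0.5) for _ in range(2)]  # encoder weights: 2-D -> 1-D
    dec = [rng.uniform(-0.5, 0.5) for _ in range(2)]  # decoder weights: 1-D -> 2-D

    def encode(x):
        return enc[0] * x[0] + enc[1] * x[1]

    def decode(z):
        return [dec[0] * z, dec[1] * z]

    def mean_error():
        total = 0.0
        for x in points:
            x_hat = decode(encode(x))
            total += (x_hat[0] - x[0]) ** 2 + (x_hat[1] - x[1]) ** 2
        return total / len(points)

    initial = mean_error()
    for _ in range(epochs):
        x = rng.choice(points)
        z = encode(x)
        x_hat = decode(z)
        diff = [x_hat[k] - x[k] for k in range(2)]
        # Gradient of the squared reconstruction error w.r.t. the code z,
        # computed before the decoder weights are changed.
        grad_z = sum(2.0 * diff[k] * dec[k] for k in range(2))
        for k in range(2):
            dec[k] -= lr * 2.0 * diff[k] * z
            enc[k] -= lr * grad_z * x[k]
    return initial, mean_error(), encode, decode


# Hypothetical observations: 2-D points that really have one degree of freedom.
points = [(i / 10, 2 * i / 10) for i in range(-10, 11)]
initial_err, final_err, encode, decode = train_linear_autoencoder(points)
print(f"reconstruction error: {initial_err:.3f} -> {final_err:.5f}")
```

No labels are involved: the training target is the input itself, which is why the slide files this structure under unsupervised learning.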
  • 12. Structure C: Recurrent Neural Networks • Recurrent neural networks can keep track of the past thanks to feedback loops (they are discrete-time, non-autonomous dynamical systems). • Structure C is a type of “recurrent neural network,” which is designed to model dynamical systems, including robots. It is often trained with an approach called “backpropagation through time” (supervised learning).
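The feedback loop on slide 12 amounts to carrying a hidden state from one time step to the next. A minimal sketch, assuming a single scalar Elman-style unit with hand-picked weights (`run_rnn`, `w_x`, `w_h`, and `b` are illustrative names, not from the source); training by backpropagation through time is not shown.

```python
import math


def run_rnn(inputs, w_x=1.0, w_h=0.5, b=0.0, h0=0.0):
    """Unroll one recurrent unit: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)."""
    h, states = h0, []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)  # new state depends on input and old state
        states.append(h)
    return states


# An input pulse at t = 0 still influences the state two steps later.
print(run_rnn([1.0, 0.0, 0.0]))
# With no input at all, the state stays at rest.
print(run_rnn([0.0, 0.0, 0.0]))
```

Because the state is driven by an external input sequence, the unit behaves as a discrete-time, non-autonomous dynamical system.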
  • 13. Structure D: Deep Reinforcement Learning • Deep reinforcement learning (DRL) combines deep learning and reinforcement learning principles to create efficient algorithms for areas such as robotics, video games, and healthcare. • Combining deep learning architectures (deep neural networks) with reinforcement learning algorithms (Q-learning, actor-critic, etc.) makes it possible to scale to previously unsolvable problems.
  • 14. Exploration and exploitation • Instead of minimizing prediction error against a training set of samples, deep Q-networks seek to maximize long-term reward. • This is done by balancing exploration and exploitation, which ultimately leads to an effective policy model.
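The trade-off on slide 14 is easiest to see in tabular Q-learning, used here as a small stand-in for a deep Q-network. Everything in this sketch is an assumption made for illustration: the five-state chain environment, the reward of 1 at the right-hand end, and the values of `alpha`, `gamma`, and `epsilon`.

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.2, seed=0):
    """Learn to walk right along a chain; only the last state gives a reward."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]; 0 = left, 1 = right
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        for _ in range(100):                   # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(2)           # explore: random action
            else:
                best = max(q[s])               # exploit: greedy action, ties broken at random
                a = rng.choice([i for i in range(2) if q[s][i] == best])
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # Bootstrapped target: reward plus discounted best future value.
            target = r if s_next == goal else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if s == goal:
                break
    return q


q_table = q_learning_chain()
for state, (left, right) in enumerate(q_table[:-1]):
    print(state, f"left={left:.3f}", f"right={right:.3f}")
```

With `epsilon = 0` the agent would never try a new action once it had an estimate; with `epsilon = 1` it would never use what it learned. A deep Q-network replaces the table `q` with a neural network.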
  • 15. Biological analogy • Doya identified that supervised learning methods (Structures A and C) mirror the function of the cerebellum. • Unsupervised methods (Structure B) learn in a manner comparable to that of the cerebral cortex, and reinforcement learning (Structure D) is analogous to the basal ganglia.
  • 16. What’s the point? • Every part of a complex system can be made to “learn”. • The real power of deep learning does not come from using just one of the structures described in the previous slides as a component in a robotics system, but from connecting parts of all these structures together to form a full system that learns throughout. • This is where the “deep” in deep learning begins to make its impact: when each part of a system is capable of learning, the system can adapt in sophisticated ways.
  • 17. Limits • Some remaining barriers to the adoption of deep learning in robotics include the necessity for large training data and long training times. One promising trend is crowdsourcing training data via cloud robotics. • Distributed computing offers the potential to direct more computing resources to a given problem but can be limited by communication speeds. • DNNs excel at 2D image recognition, but they are known to be highly susceptible to adversarial samples, and they still struggle to model 3D spatial layouts.
  • 18. Open challenges for the next years 1. Learning complex, high-dimensional, and novel dynamics 2. Learning control policies in dynamic environments 3. Advanced manipulation 4. Advanced object recognition 5. Interpreting and anticipating human actions (next slides) 6. Sensor fusion & dimensionality reduction 7. High-level task planning
  • 19. Robot gains Social Intelligence through Multimodal Deep Reinforcement Learning Authors Ahmed Hussain Qureshi, Yutaka Nakamura, Yuichiro Yoshikawa and Hiroshi Ishiguro
  • 20. Pepper Robot • Designed to be used in professional environments, Pepper is a humanoid robot that can interact with people, ‘read’ emotions, learn, move and adapt to its environment, and even recharge on its own. Pepper can perform facial recognition and develop individualized relationships when it interacts with people. • The authors propose a Multimodal Deep Q-Network (MDQN) to enable a robot to learn human-like interaction skills through a trial and error method.
  • 21. Reinforcement Learning background • An agent interacts sequentially with an environment E with the aim of maximizing cumulative reward. • At each time-step t, the agent observes a state S_t, takes an action a_t from the set of legal actions A = {1, ..., K} and receives a scalar reward R_t from the environment. • An agent’s behavior is formalized by a policy π, which maps states to actions. • The goal of an RL agent is to learn a policy π that maximizes the expected total return (reward).
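The interaction loop on slide 21 can be written down directly. This is a sketch with a made-up environment: `toy_step`, its parity-matching reward, and the two example policies are hypothetical and only serve to show a policy mapping states to actions while the return accumulates.

```python
def rollout(policy, step, initial_state, horizon):
    """Run a policy for `horizon` time-steps and return the total reward."""
    state, total = initial_state, 0.0
    for _ in range(horizon):
        action = policy(state)               # the policy maps a state to an action
        state, reward = step(state, action)  # the environment answers with next state and reward
        total += reward
    return total


def toy_step(state, action):
    """Hypothetical environment: reward 1 when the action matches the state's parity."""
    reward = 1.0 if action == state % 2 else 0.0
    return state + 1, reward


def matching_policy(state):
    return state % 2


def constant_policy(state):
    return 0


print(rollout(matching_policy, toy_step, 0, 10))  # collects every reward
print(rollout(constant_policy, toy_step, 0, 10))  # collects only half of them
```

Learning, in this picture, means searching for the `policy` function that makes the value returned by `rollout` as large as possible in expectation.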
  • 22. Deep Q-network • Further advancements in machine learning have merged deep learning with reinforcement learning (RL), which has led to the development of the deep Q-network (DQN). • DQN uses an automatic feature extractor, a deep convolutional neural network (ConvNet), to approximate the action-value function of the Q-learning method.
  • 23. CNN for action-value function approximation • The structure of the two streams is identical, and each stream comprises eight layers (excluding the input layer). • Each stream takes eight frames as input, so the last eight frames from the corresponding camera are pre-processed and stacked together to form the input for that stream of the network.
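The frame stacking on slide 23 can be sketched with a fixed-length queue per camera stream. The `FrameStack` class is hypothetical, and repeating the first frame to fill the stack at start-up is an assumption of this sketch, not something the slides specify; pre-processing is omitted.

```python
from collections import deque


class FrameStack:
    """Keep the most recent `size` frames of one camera stream."""

    def __init__(self, size=8):
        self.frames = deque(maxlen=size)  # the oldest frame drops out automatically

    def push(self, frame):
        if not self.frames:
            # Assumption: at start-up, fill the stack with copies of the first frame.
            self.frames.extend([frame] * self.frames.maxlen)
        else:
            self.frames.append(frame)
        return list(self.frames)


# One stack per stream; plain integers stand in for pre-processed images.
gray_stack, depth_stack = FrameStack(), FrameStack()
for t in range(10):
    gray_input = gray_stack.push(t)
    depth_input = depth_stack.push(t)
print(gray_input)  # the last eight grayscale frames, oldest first
```

Stacking several consecutive frames gives each stream short-term motion information that a single image cannot carry.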
  • 24. Multimodal Deep Q-Network (MDQN) • The dual-stream ConvNets process the depth and grayscale images independently. • The robot learns to greet people using a set of four legal actions, i.e., waiting, looking towards the human, waving its hand and handshaking. • The objective of the robot is to learn which action to perform in each situation.
  • 25. Reward and action-value function • The expected total return is the sum of rewards discounted by a factor γ ∈ [0, 1] at each time-step (γ = 0.99 in the proposed work). • Given that the optimal Q-function Q′(s′, a′) of the sequence s′ at the next time-step is deterministic for all possible actions a′, the optimal policy is to select the action a′ that maximizes the expected value of r + γ Q′(s′, a′). • In DQN, the parameters of the Q-network are adjusted iteratively towards the Bellman target by minimizing the squared difference between that target and the network’s current estimate Q(s, a; θ).
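The return, Bellman relation, and loss described on slide 25 can be written out as follows. This is the standard DQN formulation and is assumed, not confirmed by the transcript, to match what the slide displayed. Q' is kept as the slide's symbol for the optimal Q-function, and the separate target-network parameters θ⁻ are part of that assumption.

```latex
\begin{align}
R_t &= \sum_{t'=t}^{T} \gamma^{\,t'-t}\, r_{t'} \\
Q'(s,a) &= \mathbb{E}\left[\, r + \gamma \max_{a'} Q'(s',a') \;\middle|\; s, a \right] \\
L(\theta) &= \mathbb{E}\left[ \left( r + \gamma \max_{a'} Q(s',a';\theta^{-}) - Q(s,a;\theta) \right)^{2} \right]
\end{align}
```

With γ = 0.99, a reward received 100 steps in the future is still weighted by 0.99^100 ≈ 0.37, so the agent is far from myopic.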
  • 26. Parameters and agent behavior • The current parameters are updated by stochastic gradient descent in the direction of the gradient of the loss function with respect to the parameters. • The agent’s behavior at each time-step is selected by an ε-greedy policy, where the greedy strategy is adopted with probability (1−ε) and the random strategy with probability ε. • The robot gets a reward of 1 for a successful handshake, -0.1 for an unsuccessful handshake and 0 for the other three actions.
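The ε-greedy rule and reward scheme on slide 26 are small enough to state as code. The action names follow the four actions listed on slide 24; the function names, the action ordering, and the example Q-values are hypothetical.

```python
import random

# Ordering of the four legal actions is an assumption of this sketch.
ACTIONS = ["wait", "look_towards_human", "wave_hand", "handshake"]


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the highest-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                       # random strategy
    return max(range(len(q_values)), key=lambda a: q_values[a])   # greedy strategy


def reward(action, handshake_succeeded):
    """Reward scheme from the slide: 1, -0.1, or 0."""
    if ACTIONS[action] != "handshake":
        return 0.0
    return 1.0 if handshake_succeeded else -0.1


q_values = [0.1, 0.5, 0.2, 0.0]  # hypothetical network outputs for one state
action = epsilon_greedy(q_values, epsilon=0.1)
print(ACTIONS[action], reward(action, handshake_succeeded=False))
```

Only the handshake produces a non-zero reward, so the value of the other three actions has to be learned indirectly through the discounted future reward they lead to.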
  • 27. Proposed algorithm • Data generation phase: the system interacts with the environment using the Q-network Q(s, a; θ). The system observes the current scene, which comprises grayscale and depth frames, and takes an action using the ε-greedy strategy. The environment in return provides the scalar reward. The interaction experience e = (s_i, a_i, r_i, s_{i+1}) is stored in the replay memory M. • Training phase: the system uses the collected data, stored in replay memory M, to train the networks. The hyperparameter n denotes the number of experience replays. For each experience replay, a mini buffer B of 2000 interaction experiences is randomly sampled from the finite-sized replay memory M. The model is trained on mini-batches sampled from buffer B, and the network parameters are updated iteratively.
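The two phases on slide 27 revolve around a replay memory. The sketch below shows only the data handling; the network update itself is left out. The buffer size of 2000 comes from the slide, while `ReplayMemory`, the memory capacity of 5000, and the mini-batch size of 25 are hypothetical values chosen for the example.

```python
import random
from collections import deque


class ReplayMemory:
    """Finite-sized store of interaction experiences (s, a, r, s_next)."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest experiences are overwritten

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample_buffer(self, size, rng=random):
        """Randomly draw the mini buffer B used for one experience replay."""
        return rng.sample(list(self.buffer), min(size, len(self.buffer)))


def minibatches(buffer, batch_size):
    """Split the mini buffer into consecutive mini-batches for training."""
    for start in range(0, len(buffer), batch_size):
        yield buffer[start:start + batch_size]


# Data generation phase (placeholder experiences instead of real robot interaction).
memory = ReplayMemory(capacity=5000)
for i in range(3000):
    memory.store(s=i, a=0, r=0.0, s_next=i + 1)

# Training phase: n experience replays, each on a freshly sampled buffer B.
n = 3
for _ in range(n):
    B = memory.sample_buffer(2000)
    for batch in minibatches(B, batch_size=25):
        pass  # a gradient step on the Q-network would go here
```

Sampling at random from the memory breaks the correlation between consecutive experiences, which is the usual motivation for experience replay in DQN-style training.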
  • 28. Evaluation • To test model performance, a separate test dataset was collected, comprising 4480 grayscale and depth frames not seen by the system during learning. • If the agent’s decision was considered wrong by the majority, the evaluators were asked to agree on the most appropriate action for that scenario.
  • 29. Results • The authors evaluated the trained y-channel Q-network, depth-channel Q-network and the MDQN on the test dataset; Table 1 summarizes the performance measures of these trained Q-networks. In Table 1, accuracy corresponds to how often the predictions by the Q-networks were correct. • The multimodal deep Q-network achieved a maximum accuracy of 95.3%, whereas the y-channel and depth-channel Q-networks achieved 85.9% and 82.6% accuracy, respectively. The results in Table 1 validate that fusing the two streams improves the social cognitive ability of the agent.
  • 30. Performance • This figure shows the performance of MDQN on the test dataset over the series of episodes. Episode 0 on the plot corresponds to the Q-network with randomly initialized parameters. The plot indicates that the performance of the MDQN agent on the test dataset improves continuously as the agent gains more interaction experience with humans.
  • 31. Conclusions • In social physical human-robot interaction, it is very difficult to envisage all the possible interaction scenarios the robot can face in the real world, hence programming a social robot is notoriously hard. • The MDQN agent has learned to give importance to walking trajectories, head orientation, body language and the activity in progress in order to decide its best action. • Aims: i) increase the action space instead of limiting it to just four actions; ii) use a recurrent attention model so that the robot can indicate its attention; iii) evaluate the influence of the three actions other than the handshake on human behavior.
  • 33. References • Deep Learning in Robotics: A Review of Recent Research (Harry A. Pierson, Michael S. Gashler) • Robot gains Social Intelligence through Multimodal Deep Reinforcement Learning (Ahmed Hussain Qureshi, Yutaka Nakamura, Yuichiro Yoshikawa, Hiroshi Ishiguro)