Department of Computer
Science and Engineering
IIT Kharagpur
Risk-sensitive Imitation Learning
Learning to Act like Humans, from Humans – Safely
21 Jan 2018
Anirban Santara
santara.github.io
About me
Anirban Santara
• Intel Student Ambassador for AI (2018-present)
• Google India Ph.D. Fellow at IIT Kharagpur (2015-present)
• B.Tech. in Electronics and Electrical Communication Engineering from IIT Kharagpur (2015)
Credits
• The work presented in this talk was done as part of a year-long internship at the Parallel Computing Lab, Intel Labs India
• The work was presented as a paper [1] at the Deep Reinforcement Learning Symposium at NIPS 2017
• I thank my collaborators:
  • Abhishek Naik and Prof. Balaraman Ravindran from IIT Madras
  • Dipankar Das, Dheevatsa Mudigere, Sasikanth Avancha and Bharat Kaul from Intel Labs India
[1] Santara, A., Naik, A., Ravindran, B., Das, D., Mudigere, D., Avancha, S., Kaul, B. (2017). “RAIL: Risk-Averse Imitation Learning”. In: Deep Reinforcement Learning Symposium at NIPS 2017.
Description of the Imitation Learning Problem
Imitation Learning
Imitation Learning techniques aim to mimic human behavior at a given task [1].
[1] Hussein, Ahmed, et al. "Imitation Learning: A Survey of Learning Methods." ACM Computing Surveys (CSUR) 50.2 (2017): 21.
Image Source: GRASP lab - University of Pennsylvania
Why should you care?
• Imitation learning methods are rooted in neuroscience and form an important part of how humans learn
• They make it possible to teach robots complex tasks with minimal expert knowledge of those tasks
• No need for explicit programming or task-specific reward function design
• It's high time!
  • Modern sensors can collect and transmit large volumes of data at high speed
  • High-performance computing is cheaper, more capable and more ubiquitous than ever
  • Virtual Reality systems, widely regarded as a powerful interface for human-machine interaction, are now widely available
Example Application Areas
Autonomous Driving
No more accidents due to human error. No more traffic jams.
Robotic Surgery
Complex Actions in Critical Situations – Accurate. Every time.
Industrial Automation
Efficiency. Precise Quality Control. Safety.
Assistive Robotics
Elderly Care. Rehabilitation. Special Needs.
Conversational Agents
Assistance. Recommendation. Therapy.
Problem Setting
Our agent has to achieve its goal by taking a sequence of actions in an environment whose states change in response to the agent's actions.
[Figure: the Agent sends an Action to the Environment, which responds with a New State]
Some Definitions
• Policy $\pi: S \to A$: a function that predicts an action for a given state
• Trajectory $\tau$: a sequence of $(s_t, a_t)$ tuples that describes an episode of an agent's experience as it executes a policy:
$$\tau = (s_0, a_0, s_1, a_1, \ldots, s_t, a_t, \ldots, s_T)$$
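To make the definitions of policy and trajectory concrete, here is a minimal Python sketch that executes a policy in an environment and records the resulting trajectory. The one-dimensional environment `toy_step` and the policy `toy_policy` are hypothetical illustrations, not part of the talk; any state and action types and any transition function could be substituted.

```python
from typing import Callable, List, Tuple

# Hypothetical toy types: a 1-D state and a 1-D action, both plain floats.
State = float
Action = float


def rollout(
    policy: Callable[[State], Action],
    step: Callable[[State, Action], State],
    s0: State,
    T: int,
) -> Tuple[List[State], List[Action]]:
    """Execute `policy` for T steps and return the trajectory
    tau = (s_0, a_0, s_1, a_1, ..., s_{T-1}, a_{T-1}, s_T)
    as a list of T + 1 states and a list of T actions."""
    states: List[State] = [s0]
    actions: List[Action] = []
    s = s0
    for _ in range(T):
        a = policy(s)   # the policy maps the current state to an action
        s = step(s, a)  # the environment responds with a new state
        actions.append(a)
        states.append(s)
    return states, actions


# Hypothetical environment: a point on a line that moves by the chosen action.
def toy_step(s: State, a: Action) -> State:
    return s + a


# Hypothetical policy: move halfway towards the origin.
def toy_policy(s: State) -> Action:
    return -0.5 * s


if __name__ == "__main__":
    states, actions = rollout(toy_policy, toy_step, s0=4.0, T=3)
    print(states)   # [4.0, 2.0, 1.0, 0.5]
    print(actions)  # [-2.0, -1.0, -0.5]
```

Keeping states and actions in separate lists mirrors the definition: a trajectory of T steps has one more state than it has actions.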
Problem Definition
• Given: a dataset of $N$ trajectories demonstrated by an expert, $\{\tau_i\}_{i=1}^{N}$, where each trajectory is a sequence of states and actions: $\tau = (s_0, a_0, s_1, a_1, \ldots, s_t, a_t, \ldots, s_T)$
• Goal: find a policy $\pi^*$ that achieves “expert-like performance”
Our Special Requirements
• Scalable to large environments for complex continuous-control tasks
• Worst-case performance acceptable for risk-sensitive applications
Baseline System
Generative Adversarial Imitation Learning (GAIL)
Ho and Ermon 2016
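Generative Adversarial Imitation Learning (GAIL) enables an agent to learn a policy directly from expert trajectories. The learned policy is the one that would be obtained by running Reinforcement Learning (RL) on a cost function recovered by Inverse Reinforcement Learning (IRL), but without carrying out the IRL step explicitly.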
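As a rough sketch of the saddle-point objective from Ho and Ermon (2016), the snippet below estimates E_π[log D(s,a)] + E_πE[log(1 - D(s,a))] from discriminator outputs on agent and expert state-action pairs, and returns the per-step surrogate cost log D(s,a) that the policy is trained to minimize. It assumes the sign convention of that paper (the discriminator D is pushed towards 1 on agent samples and towards 0 on expert samples), omits the entropy regularizer, and takes the discriminator outputs as given instead of training a network. The function names `gail_objective` and `surrogate_cost` are hypothetical.

```python
import numpy as np

EPS = 1e-8  # keeps the logarithms finite when D is exactly 0 or 1


def gail_objective(d_agent: np.ndarray, d_expert: np.ndarray) -> float:
    """Monte Carlo estimate of E_pi[log D(s,a)] + E_piE[log(1 - D(s,a))].

    d_agent:  discriminator outputs D(s,a) in (0, 1) on the agent's (s, a) pairs
    d_expert: discriminator outputs D(s,a) in (0, 1) on the expert's (s, a) pairs
    The discriminator ascends this value; the policy descends it.
    (The entropy regularizer -lambda * H(pi) is omitted in this sketch.)
    """
    d_agent = np.clip(d_agent, EPS, 1.0 - EPS)
    d_expert = np.clip(d_expert, EPS, 1.0 - EPS)
    return float(np.mean(np.log(d_agent)) + np.mean(np.log(1.0 - d_expert)))


def surrogate_cost(d_agent: np.ndarray) -> np.ndarray:
    """Per-step cost log D(s,a) handed to the RL step of the policy update."""
    return np.log(np.clip(d_agent, EPS, 1.0 - EPS))


if __name__ == "__main__":
    # A discriminator that cannot tell agent from expert (D = 0.5 everywhere)
    print(gail_objective(np.array([0.5, 0.5]), np.array([0.5, 0.5])))
    # A discriminator that separates them well scores higher
    print(gail_objective(np.array([0.9, 0.9]), np.array([0.1, 0.1])))
```

In Ho and Ermon (2016) the policy step on this cost is a TRPO update; here the cost is simply returned so any RL update could consume it.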
The Challenge
Heavy-Tail Problem of GAIL
We evaluated GAIL agents and the expert using the expert's cost function and found that the distribution of trajectory costs is more heavy-tailed for GAIL than for the expert.
Implications
• GAIL agents encounter high-cost trajectories more often than the experts do.
• Since high trajectory costs may correspond to catastrophic failures, GAIL agents are not reliable in risk-sensitive applications like robotic surgery and autonomous driving.
Our Solution
Santara, A., Naik, A., Ravindran, B., Das, D., Mudigere, D., Avancha, S., Kaul, B. (2017). “RAIL: Risk-Averse Imitation Learning”. In: Deep Reinforcement Learning Symposium at NIPS 2017.
Conditional Value at Risk (CVaR)
Rockafellar, R. Tyrrell, and Stanislav Uryasev. "Optimization of conditional value-at-risk." Journal of Risk 2 (2000): 21-42.
[Figure: CDF and PDF of a random variable Z, with the level α marked on the CDF axis, which runs up to 1.0]
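For reference, the standard definitions (Rockafellar and Uryasev 2000) behind the figure can be written as follows, for a cost random variable Z with CDF F_Z and a level α ∈ (0, 1). The conditional-expectation form holds for continuous distributions; the minimization form holds in general.

```latex
\mathrm{VaR}_\alpha(Z) = \nu_\alpha(Z) = \min\{\, z \mid F_Z(z) \ge \alpha \,\}

\mathrm{CVaR}_\alpha(Z) = \mathbb{E}\big[\, Z \mid Z \ge \nu_\alpha(Z) \,\big]
  \quad \text{(continuous } Z\text{)}

\mathrm{CVaR}_\alpha(Z) = \min_{\nu \in \mathbb{R}}
  \Big\{ \nu + \frac{1}{1-\alpha}\, \mathbb{E}\big[ (Z - \nu)^{+} \big] \Big\},
  \qquad (x)^{+} = \max(x, 0)
```

In words: VaR is the α-quantile of the cost, and CVaR is the expected cost in the worst (1 - α) fraction of outcomes.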
Conditional Value at Risk (CVaR)
(Rockafellar and Uryasev 2000)
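A minimal NumPy sketch of estimating VaR and CVaR from a sample of costs, such as trajectory costs collected from rollouts. It uses the Rockafellar and Uryasev form CVaR_α(Z) = min_ν { ν + E[(Z - ν)^+] / (1 - α) }, evaluated at the empirical α-quantile, which minimizes that expression for the empirical distribution. The function names are illustrative and are not taken from the RAIL code.

```python
import numpy as np


def empirical_var(costs, alpha: float) -> float:
    """Empirical Value at Risk: the smallest sample cost z with F(z) >= alpha,
    where F is the empirical CDF of `costs`."""
    z = np.sort(np.asarray(costs, dtype=float))
    n = len(z)
    # F(z_(i)) = i / n for the i-th smallest sample (1-indexed), so we need the
    # smallest i with i >= alpha * n; the tiny tolerance guards against
    # floating-point error in alpha * n.
    i = max(int(np.ceil(alpha * n - 1e-9)), 1)
    return float(z[i - 1])


def ru_objective(costs, alpha: float, nu: float) -> float:
    """Rockafellar-Uryasev function: nu + E[(Z - nu)^+] / (1 - alpha)."""
    z = np.asarray(costs, dtype=float)
    return float(nu + np.mean(np.maximum(z - nu, 0.0)) / (1.0 - alpha))


def empirical_cvar(costs, alpha: float) -> float:
    """Empirical CVaR: the Rockafellar-Uryasev function at the empirical VaR,
    which minimizes it over nu for the empirical distribution."""
    return ru_objective(costs, alpha, empirical_var(costs, alpha))


if __name__ == "__main__":
    costs = np.arange(1, 11)           # ten sample costs: 1, 2, ..., 10
    print(empirical_var(costs, 0.8))   # 8.0
    print(empirical_cvar(costs, 0.8))  # about 9.5, the mean of the worst 20% (9 and 10)
```

Because the Rockafellar and Uryasev function is convex in ν, evaluating it at any other ν can only give a value at least as large, which is what makes it convenient inside a gradient-based training objective.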
Risk of a Trajectory
Discounted sum of costs along a trajectory
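Writing the trajectory risk as R(τ | c) = Σ_{t=0}^{T-1} γ^t c(s_t, a_t), with cost function c and discount factor γ ∈ [0, 1], it can be computed as below. This notation and the constant toy cost in the example are assumptions for illustration; the slide itself only states that the risk is the discounted sum of costs.

```python
from typing import Callable, Sequence


def trajectory_cost(
    states: Sequence,
    actions: Sequence,
    cost: Callable[[object, object], float],
    gamma: float,
) -> float:
    """Discounted cost R(tau | c) = sum_{t=0}^{T-1} gamma^t * c(s_t, a_t).

    `states` holds s_0..s_T and `actions` holds a_0..a_{T-1}; zip() pairs
    each action with the state it was taken in and ignores the final state s_T.
    """
    total, discount = 0.0, 1.0
    for s, a in zip(states, actions):
        total += discount * cost(s, a)
        discount *= gamma
    return total


if __name__ == "__main__":
    # Hypothetical example: a constant cost of 1 per step over T = 3 steps.
    print(trajectory_cost([0, 0, 0, 0], [0, 0, 0], lambda s, a: 1.0, gamma=0.5))  # 1.75
```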
CVaR objective
Minimize, over policies, the maximum value of CVaR over all possible choices of the cost function
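One way to write this objective, in notation assumed here (R^π(τ | c) is the discounted cost of a trajectory τ generated by policy π under cost function c, and 𝒞 is the set of admissible cost functions), is:

```latex
\min_{\pi} \; \max_{c \in \mathcal{C}} \; \mathrm{CVaR}_\alpha\big[ R^{\pi}(\tau \mid c) \big]
```

Taking the maximum over cost functions guards against the true, unknown cost being the one under which the policy's tail risk is worst.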
RAIL: Risk-Averse Imitation Learning
Integrating the CVaR objective in the GAIL framework
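A sketch of how the integration could look, assuming that the Rockafellar and Uryasev form of CVaR is added to the GAIL saddle-point objective of Ho and Ermon (2016) with a weight λ_CVaR, and that the trajectory cost c_D is derived from the discriminator D. The symbols ν, λ, λ_CVaR and c_D are our notation; the RAIL paper should be consulted for the exact formulation.

```latex
\min_{\pi,\,\nu}\;\max_{D}\;
  \mathbb{E}_{\pi}\big[\log D(s,a)\big]
  + \mathbb{E}_{\pi_E}\big[\log\big(1 - D(s,a)\big)\big]
  - \lambda H(\pi)
  + \lambda_{\mathrm{CVaR}} \Big( \nu + \frac{1}{1-\alpha}\,
      \mathbb{E}_{\tau \sim \pi}\big[ \big( R^{\pi}(\tau \mid c_D) - \nu \big)^{+} \big] \Big)
```

Under this reading, the maximization over D plays the role of the maximization over cost functions, and ν is optimized jointly with the policy.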
Results
• RAIL is a better choice than GAIL for risk-sensitive applications
• RAIL's mean performance converges almost as fast as GAIL's
• RAIL preserves the scalability of GAIL
Powered by Intel
Why Intel
• Python's multiprocessing library, together with the Intel Math Kernel Library (MKL), allows multiple instances of an agent interacting with the environment to be simulated in parallel on multi-core Intel CPUs.
• Parallel simulation and learning are crucial for success in Reinforcement Learning based settings like RAIL, because the agent learns by trial and error.
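The parallel-simulation idea can be sketched with Python's standard `multiprocessing` module: each worker process runs an independent episode with its own random seed, and the results come back in seed order. `simulate_episode` is a hypothetical random-walk stand-in for an agent interacting with its environment, not the RAIL simulator. The `fork` start method assumes a POSIX system; NumPy code inside each episode could additionally benefit from an MKL-backed BLAS where one is installed.

```python
import multiprocessing as mp
import random
from typing import List


def simulate_episode(seed: int, T: int = 100) -> float:
    """Hypothetical stand-in for one agent-environment episode: a random walk
    whose accumulated distance from the origin is returned as its 'cost'."""
    rng = random.Random(seed)  # independent, reproducible RNG per episode
    s, cost = 0.0, 0.0
    for _ in range(T):
        a = rng.uniform(-1.0, 1.0)  # random "action"
        s += a                      # environment transition
        cost += abs(s)              # per-step cost
    return cost


def parallel_rollouts(seeds: List[int], processes: int = 4) -> List[float]:
    """Run one episode per seed across a pool of worker processes."""
    # The "fork" start method avoids re-importing this module in workers (POSIX only).
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=processes) as pool:
        return pool.map(simulate_episode, seeds)  # map preserves input order


if __name__ == "__main__":
    costs = parallel_rollouts(list(range(8)))
    print(len(costs))  # 8
```

Passing an explicit seed per episode keeps the parallel run reproducible: the same seeds give the same costs as a sequential loop.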
Thank you!

RAIL: Risk-Averse Imitation Learning. Invited talk at the Intel AI Workshop at Kshitij 2018, IIT Kharagpur.
