Anirban Santara presented a technique called Risk-Averse Imitation Learning (RAIL) to address the problem of Generative Adversarial Imitation Learning (GAIL) producing policies with riskier, heavier-tailed cost distributions compared to the expert. RAIL integrates the Conditional Value at Risk (CVaR) objective into the GAIL framework to directly optimize for worst-case performance. Experimental results showed that RAIL converges nearly as fast as GAIL in the average case, while producing policies that more closely match the expert's risk profile, making it preferable for risk-sensitive applications like robotics.
RAIL: Risk-Averse Imitation Learning | Invited talk at Intel AI Workshop at Kshitij 2018 | IIT Kharagpur
1. Department of Computer Science and Engineering, IIT Kharagpur
Risk-sensitive Imitation Learning
Learning to Act like Humans, from Humans – Safely
21 Jan 2018
Anirban Santara
santara.github.io
2. About me
Anirban Santara
• Intel Student Ambassador for AI (2018-Present)
• Google India Ph.D. Fellow at IIT Kharagpur (2015-Present)
• B.Tech. in Electronics and Electrical Communication Engineering from IIT Kharagpur in 2015
3. Credits
• The work presented in this talk was done as a part of a year-long internship at the Parallel Computing Lab, Intel Labs India
• The work was presented as a paper1 in the Deep Reinforcement Learning Symposium at NIPS-2017
• I thank my collaborators:
• Abhishek Naik and Prof. Balaraman Ravindran from IIT Madras
• Dipankar Das, Dheevatsa Mudigere, Sasikanth Avancha and Bharat Kaul from Intel Labs, India
1 Santara, A., Naik, A., Ravindran, B., Das, D., Mudigere, D., Avancha, S., Kaul, B. (2017). “RAIL: Risk-Averse Imitation Learning”. In: Deep Reinforcement Learning Symposium at NIPS-2017.
5. Imitation Learning
Imitation Learning techniques aim to mimic human behavior at a given task1
1 Hussein, Ahmed, et al. “Imitation Learning: A Survey of Learning Methods.” ACM Computing Surveys (CSUR) 50.2 (2017): 21.
Image Source: GRASP Lab, University of Pennsylvania
6. Why should you care?
• Imitation learning methods are rooted in neuroscience and form an important part of learning in humans
• They make it possible to teach robots complex tasks with minimal expert knowledge of the tasks
• No need for explicit programming or task-specific reward-function design
• It's high time!
• Modern sensors can collect and transmit high volumes of data at high speed
• High-performance computing is cheaper, more capable and more ubiquitous than ever
• Virtual Reality systems – considered the best portal of human-machine interaction – are widely available
13. Problem Setting
Our agent has to achieve its goal by taking a sequence of actions in an environment whose states change in response to the agent's actions.
[Diagram: the agent sends an action to the environment; the environment returns a new state to the agent]
14. Some Definitions
• Policy 𝜋: 𝑆 → 𝐴: a function that predicts an action for a given state
• Trajectory 𝜏: a sequence of (𝑠𝑡, 𝑎𝑡) tuples that describes an episode of experiences of an agent as it executes a policy:
𝜏 = (𝑠0, 𝑎0, 𝑠1, 𝑎1, … , 𝑠𝑡, 𝑎𝑡, … , 𝑠𝑇)
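These two definitions can be sketched in a few lines of code; the `policy`, `env_step`, and toy integer dynamics below are illustrative assumptions, not part of the talk:

```python
def run_episode(policy, env_step, s0, horizon=5):
    """Roll out a policy pi: S -> A for `horizon` steps, recording the
    trajectory tau = [(s0, a0), (s1, a1), ...] as defined above."""
    tau, s = [], s0
    for _ in range(horizon):
        a = policy(s)          # the policy picks an action for the current state
        tau.append((s, a))     # record the (state, action) tuple
        s = env_step(s, a)     # the environment responds with the next state
    return tau

# Toy example: integer states, a policy that always outputs action +1,
# and dynamics that add the action to the state.
policy = lambda s: 1
env_step = lambda s, a: s + a
tau = run_episode(policy, env_step, s0=0)
# tau == [(0, 1), (1, 1), (2, 1), (3, 1), (4, 1)]
```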
15. Problem Definition
• Given: a dataset {𝜏(𝑖)}, 𝑖 = 1, … , 𝑁, of trajectories demonstrated by an expert, where each trajectory is a sequence of states and actions:
𝜏 = (𝑠0, 𝑎0, 𝑠1, 𝑎1, … , 𝑠𝑡, 𝑎𝑡, … , 𝑠𝑇)
• Goal: find a policy 𝜋∗ that achieves “expert-like performance”
16. Our Special Requirements
• Scalable to large environments and complex continuous-control tasks
• Worst-case performance acceptable for risk-sensitive applications
18. Generative Adversarial Imitation Learning (GAIL)
Ho and Ermon 2016
Generative Adversarial Imitation Learning (GAIL) enables an agent to learn a policy directly from expert trajectories, as if the policy had been obtained by Reinforcement Learning (RL) on a cost function recovered by Inverse Reinforcement Learning (IRL), without explicitly performing the intermediate IRL step
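A toy sketch of the adversarial idea: a discriminator is trained to tell expert state-action pairs from the policy's, and its output is turned into a surrogate cost for the policy. The simple logistic discriminator and Gaussian toy data below are illustrative assumptions standing in for the neural-network discriminator and trust-region policy updates of the actual GAIL algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator_update(w, expert_sa, policy_sa, lr=0.1):
    """One gradient-ascent step of a logistic discriminator
    D(s,a) = sigmoid(w . [s,a]), trained to output ~1 on expert
    state-action pairs and ~0 on policy state-action pairs."""
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))
    grad = np.zeros_like(w)
    for x, label in [(expert_sa, 1.0), (policy_sa, 0.0)]:
        p = sigmoid(x @ w)
        grad += x.T @ (label - p) / len(x)  # log-likelihood gradient
    return w + lr * grad

def surrogate_cost(w, sa):
    """Learned cost signal for the policy: c(s,a) = -log D(s,a).
    The policy would be updated (e.g. by TRPO) to reduce this cost."""
    d = 1.0 / (1.0 + np.exp(-(sa @ w)))
    return -np.log(d + 1e-8)

# Toy data: expert and (initially random) policy state-action features.
expert_sa = rng.normal(loc=1.0, size=(64, 4))
policy_sa = rng.normal(loc=0.0, size=(64, 4))
w = np.zeros(4)
for _ in range(200):
    w = discriminator_update(w, expert_sa, policy_sa)
# Expert-like pairs now incur a lower surrogate cost than policy pairs,
# which is the signal that drives the policy toward expert behavior.
```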
20. Heavy Tail Problem of GAIL
We evaluated the learned policies in terms of the expert's cost function and found that the distributions of trajectory cost are more heavy-tailed for GAIL than for the expert.
21. Implications
• GAIL agents encounter high-cost trajectories more often than the experts do.
• Since high trajectory costs may correspond to events of catastrophic failure, GAIL agents are not reliable in risk-sensitive applications like robotic surgery and autonomous driving.
22. Our Solution
Santara, A., Naik, A., Ravindran, B., Das, D., Mudigere, D., Avancha, S., Kaul, B. (2017). “RAIL: Risk-Averse Imitation Learning”. In: Deep Reinforcement Learning Symposium at NIPS-2017.
23. Conditional Value at Risk (CVaR)
Rockafellar, R. Tyrrell, and Stanislav Uryasev. “Optimization of Conditional Value-at-Risk.” Journal of Risk 2 (2000): 21-42.
[Figure: PDF and CDF of a cost variable 𝑍, marking the 𝛼-quantile of the CDF (the Value at Risk) and the expected cost in the tail beyond it (the CVaR)]
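The empirical CVaR of a batch of trajectory costs is just a quantile plus a tail mean; a minimal sketch:

```python
import numpy as np

def cvar(costs, alpha=0.9):
    """Conditional Value at Risk: the expected cost in the worst
    (1 - alpha) fraction of the cost distribution."""
    var = np.quantile(costs, alpha)   # Value at Risk: the alpha-quantile
    tail = costs[costs >= var]        # samples at or beyond the VaR
    return tail.mean()                # mean cost over the tail
```

With alpha = 0, CVaR reduces to the ordinary mean cost; as alpha approaches 1 it approaches the worst-case cost, which is what makes it a useful risk-sensitive objective.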
25. Risk of a Trajectory
Discounted sum of costs along a trajectory: 𝑅(𝜏) = ∑ 𝑡=0..𝑇 𝛾^𝑡 𝑐(𝑠𝑡, 𝑎𝑡)
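A minimal sketch of this quantity, assuming the per-step costs of a trajectory are given as a list:

```python
def trajectory_risk(costs, gamma=0.99):
    """Discounted sum of per-step costs along one trajectory:
    R(tau) = sum_t gamma^t * c(s_t, a_t)."""
    return sum((gamma ** t) * c for t, c in enumerate(costs))

# Example: three unit costs with gamma = 0.5 give 1 + 0.5 + 0.25.
trajectory_risk([1.0, 1.0, 1.0], gamma=0.5)  # → 1.75
```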
26. CVaR objective
Minimize the maximum possible value of the CVaR of the trajectory cost, where the maximum is taken over all choices of the cost function
27. RAIL: Risk-Averse Imitation Learning
Integrating the CVaR objective in the GAIL framework
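The paper optimizes the CVaR term jointly with the adversarial objective. As a rough illustrative stand-in (not the paper's exact method, which involves an auxiliary VaR variable), a CVaR-style emphasis can be sketched by up-weighting the tail trajectories in a weighted policy update; the weighting scheme below is an assumption for illustration only:

```python
import numpy as np

def rail_trajectory_weights(traj_costs, alpha=0.9, lam=0.5):
    """Illustrative CVaR-style reweighting: trajectories in the worst
    (1 - alpha) tail of the cost distribution get extra weight lam, so
    a weighted policy-gradient update focuses on avoiding the
    catastrophic episodes that make GAIL's cost distribution heavy-tailed."""
    var = np.quantile(traj_costs, alpha)       # empirical VaR_alpha of the batch
    tail = (traj_costs >= var).astype(float)   # indicator of tail membership
    weights = 1.0 + lam * tail                 # boost the high-cost tail
    return weights / weights.sum()             # normalize for a weighted update
```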
28. Results
• RAIL is a superior choice to GAIL in risk-sensitive applications
• RAIL converges almost as fast as GAIL in the mean
• RAIL preserves the scalability of GAIL
30. Why Intel
• Python's multiprocessing library, together with the Intel Math Kernel Library (MKL), allows multiple instances of an agent interacting with the environment to be simulated in parallel on multi-core Intel CPUs.
• Parallel simulation and learning is crucial for success in Reinforcement Learning based settings like RAIL, because the agent learns by trial and error.
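A minimal sketch of parallel rollout simulation with Python's multiprocessing; the `rollout` body is a mock stand-in for actual environment interaction:

```python
import multiprocessing as mp
import random

def rollout(seed):
    # Mock environment interaction: a real implementation would step a
    # simulator with the current policy for one episode and return the
    # trajectory cost. Here we return a seeded pseudo-random stand-in.
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(100))

if __name__ == "__main__":
    # Simulate 16 independent episodes in parallel on 4 worker processes;
    # each worker runs on its own CPU core.
    with mp.Pool(processes=4) as pool:
        costs = pool.map(rollout, range(16))
    print(len(costs))  # 16 trajectory costs, one per simulated episode
```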