RAIL: Risk-Averse Imitation Learning | Invited talk at Intel AI Workshop at Kshitij 2018 | IIT Kharagpur

Department of Computer Science and Engineering, IIT Kharagpur
Risk-sensitive Imitation Learning
Learning to Act like Humans, from Humans – Safely
21 Jan 2018
Anirban Santara
santara.github.io

About me
Anirban Santara
• Intel Student Ambassador for AI (2018-Present)
• Google India Ph.D. Fellow at IIT Kharagpur (2015-Present)
• B.Tech. in Electronics and Electrical Communication Engineering from IIT Kharagpur (2015)

Credits
• The work presented in this talk was done as part of a year-long internship at the Parallel Computing Lab, Intel Labs India
• The work was presented as a paper¹ in the Deep Reinforcement Learning Symposium at NIPS 2017
• I thank my collaborators:
  • Abhishek Naik and Prof. Balaraman Ravindran from IIT Madras
  • Dipankar Das, Dheevatsa Mudigere, Sasikanth Avancha and Bharat Kaul from Intel Labs India
¹ Santara, A., Naik, A., Ravindran, B., Das, D., Mudigere, D., Avancha, S., Kaul, B. (2017). "RAIL: Risk-Averse Imitation Learning". In: Deep Reinforcement Learning Symposium at NIPS 2017.

Description of the Imitation Learning Problem

Imitation Learning
Imitation learning techniques aim to mimic human behavior at a given task.¹
¹ Hussein, Ahmed, et al. "Imitation Learning: A Survey of Learning Methods." ACM Computing Surveys (CSUR) 50.2 (2017): 21.
Image source: GRASP Lab, University of Pennsylvania

Why should you care?
• Imitation learning methods are rooted in neuroscience and form an important part of learning in humans
• They make it possible to teach robots complex tasks with minimal expert knowledge of the tasks
  • No need for explicit programming or task-specific reward function design
• It's high time!
  • Modern sensors can collect and transmit high volumes of data at high speed
  • High-performance computing is cheaper, more capable and more ubiquitous than ever
  • Virtual reality systems, which are considered the best portal for human-machine interaction, are widely available

Example Application Areas

Autonomous Driving
No more accidents due to human error. No more traffic jams.

Robotic Surgery
Complex Actions in Critical Situations – Accurate. Every time.

Industrial Automation
Efficiency. Precise Quality Control. Safety.

Assistive Robotics
Elderly Care. Rehabilitation. Special Needs.

Conversational Agents
Assistance. Recommendation. Therapy.

Problem Setting
Our agent has to achieve its goal by taking a sequence of actions in an environment whose states change in response to the agent's actions.
[Figure: agent-environment loop. The Agent sends an Action to the Environment, and the Environment returns a New State.]

Some Definitions
• Policy $\pi: S \to A$: A function that predicts an action for a given state
• Trajectory $\tau$: A sequence of $(s_t, a_t)$ tuples that describes an episode of experience of an agent as it executes a policy:
$\tau = (s_0, a_0, s_1, a_1, \ldots, s_t, a_t, \ldots, s_T)$
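To make the two definitions concrete, here is a minimal Python sketch of a policy and of the trajectory it generates. The one-dimensional state, the `toy_step` dynamics and the `rollout` helper are all hypothetical and chosen only for illustration; they are not the environments used in the talk.

```python
from typing import Callable, List, Tuple

State = float
Action = float
Policy = Callable[[State], Action]  # a policy maps a state to an action


def toy_step(state: State, action: Action) -> State:
    """Hypothetical environment dynamics: the action shifts the state."""
    return state + action


def rollout(policy: Policy, initial_state: State,
            horizon: int) -> Tuple[List[Tuple[State, Action]], State]:
    """Execute the policy for `horizon` steps.

    Returns the (s_t, a_t) pairs for t = 0..T-1 and the final state s_T.
    """
    pairs = []
    state = initial_state
    for _ in range(horizon):
        action = policy(state)            # agent picks an action
        pairs.append((state, action))     # record (s_t, a_t)
        state = toy_step(state, action)   # environment moves to the new state
    return pairs, state
```

For example, `rollout(lambda s: -0.5 * s, 4.0, 3)` records three state-action pairs and returns the final state reached after the third action.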

Problem Definition
• Given: a dataset of trajectories demonstrated by an expert, $\{\tau_i\}_{i=1}^{N}$, where each trajectory is a sequence of states and actions: $\tau = (s_0, a_0, s_1, a_1, \ldots, s_t, a_t, \ldots, s_T)$
• Goal: Find a policy $\pi^*$ that achieves "expert-like performance"

Our Special Requirements
• Scalable in large environments for complex continuous-control tasks
• Worst-case performance acceptable for risk-sensitive applications

Baseline System

Generative Adversarial Imitation Learning (GAIL)
(Ho and Ermon, 2016)
GAIL enables an agent to learn a policy directly from expert trajectories, as if the policy had been obtained by Reinforcement Learning (RL) on a cost function recovered by Inverse Reinforcement Learning (IRL).
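For reference, the saddle-point objective that GAIL optimizes, as stated by Ho and Ermon (2016), can be written as below. The symbols are not defined on the slide and are introduced here as assumptions: $D$ is a discriminator over state-action pairs, $\pi_E$ is the expert policy, $H(\pi)$ is the causal entropy of the policy and $\lambda \ge 0$ is its weight.

```latex
\min_{\pi} \max_{D} \;
  \mathbb{E}_{\pi}\big[\log D(s, a)\big]
  + \mathbb{E}_{\pi_E}\big[\log\big(1 - D(s, a)\big)\big]
  - \lambda H(\pi)
```

The discriminator tries to tell the agent's state-action pairs from the expert's, and the policy is trained to make them indistinguishable.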

The Challenge

Heavy Tail Problem of GAIL
We evaluated GAIL agents under the expert's cost function and found that their trajectory-cost distributions are heavier-tailed than the expert's.

Implications
• GAIL agents encounter high-cost trajectories more often than the experts do.
• Since high trajectory costs may correspond to catastrophic failures, GAIL agents are not reliable in risk-sensitive applications like robotic surgery and autonomous driving.
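The point that average performance can hide tail behavior can be illustrated with a small sketch. The cost lists below are made-up numbers chosen so that both means are equal; they are not measurements from the talk or the paper, and `tail_frequency` is a hypothetical helper.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def tail_frequency(costs, threshold):
    """Fraction of trajectories whose cost exceeds `threshold`."""
    return sum(1 for c in costs if c > threshold) / len(costs)


# Hypothetical per-trajectory costs (illustration only).
expert_costs = [10, 11, 9, 10, 12, 10, 9, 11, 10, 8]
agent_costs = [8, 9, 8, 9, 8, 9, 8, 9, 8, 24]

# Same mean cost, but only the agent ever exceeds the threshold of 15.
print(mean(expert_costs), mean(agent_costs))
print(tail_frequency(expert_costs, 15), tail_frequency(agent_costs, 15))
```

Two agents that look identical on mean cost can differ in how often they produce a very costly trajectory, which is the quantity that matters in risk-sensitive settings.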

Our Solution
Santara, A., Naik, A., Ravindran, B., Das, D., Mudigere, D., Avancha, S., Kaul, B. (2017). "RAIL: Risk-Averse Imitation Learning". In: Deep Reinforcement Learning Symposium at NIPS 2017.

Conditional Value at Risk (CVaR)
Rockafellar, R. Tyrrell, and Stanislav Uryasev. "Optimization of conditional value-at-risk." Journal of Risk 2 (2000): 21-42.
[Figure: PDF and CDF of a random variable Z, with the level α marked on the CDF axis (maximum 1.0).]
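The equations on this slide did not survive extraction. As a reminder, the standard definitions for a random cost $Z$ and a confidence level $\alpha \in (0, 1)$ are given below; the conditional-expectation form of CVaR assumes $Z$ has a continuous distribution.

```latex
\mathrm{VaR}_{\alpha}(Z) = \min\{\, z : P(Z \le z) \ge \alpha \,\}
\qquad
\mathrm{CVaR}_{\alpha}(Z) = \mathbb{E}\big[\, Z \mid Z \ge \mathrm{VaR}_{\alpha}(Z) \,\big]
```

In words, VaR is the $\alpha$-quantile of the cost, and CVaR is the expected cost in the tail beyond that quantile.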

Conditional Value at Risk (CVaR)
(Rockafellar and Uryasev, 2000)
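A minimal sketch of how VaR and CVaR might be estimated from a finite sample of costs. This is one simple empirical estimator, written for illustration and not taken from the talk: VaR is the smallest sample value with at least a fraction α of samples at or below it, and CVaR is the average of the samples ranked above it.

```python
import math


def value_at_risk(costs, alpha):
    """Smallest sample z such that at least a fraction alpha of samples are <= z."""
    ordered = sorted(costs)
    k = max(math.ceil(alpha * len(ordered)), 1)  # samples at or below the VaR
    return ordered[k - 1]


def conditional_value_at_risk(costs, alpha):
    """Average of the samples ranked above the empirical VaR (the worst tail)."""
    ordered = sorted(costs)
    k = max(math.ceil(alpha * len(ordered)), 1)
    tail = ordered[k:] or [ordered[-1]]  # fall back to the worst sample if empty
    return sum(tail) / len(tail)


# Hypothetical costs: alpha = 0.75 leaves the two worst of eight samples in the tail.
sample_costs = [1, 2, 3, 4, 5, 6, 7, 20]
print(value_at_risk(sample_costs, 0.75))
print(conditional_value_at_risk(sample_costs, 0.75))
```

On this sample the mean cost is 6, while the CVaR is dominated by the single outlier, which is why CVaR is a useful summary of tail risk.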

Risk of a Trajectory
Discounted sum of costs along a trajectory
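As a sketch, the discounted sum of costs along a trajectory can be computed as below. The discount factor `gamma` and the per-step cost list are assumptions introduced for the example; the slide does not give their values.

```python
def discounted_cost(step_costs, gamma):
    """Return sum over t of gamma**t * step_costs[t].

    Evaluated backwards (Horner's scheme) so no explicit powers are needed.
    """
    total = 0.0
    for cost in reversed(step_costs):
        total = cost + gamma * total
    return total


# Three steps of unit cost with gamma = 0.5: 1 + 0.5 + 0.25
print(discounted_cost([1.0, 1.0, 1.0], 0.5))
```

With `gamma = 1.0` this reduces to the plain sum of costs along the trajectory.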

CVaR objective
Minimize the maximum possible value (over all choices of the cost
function) of CVaR
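Written out, the statement above corresponds to a min-max problem of the following form. The notation is an assumption made for this sketch: $c$ is a cost function, $\gamma$ a discount factor, and $R(\tau \mid c)$ the discounted sum of costs along a trajectory $\tau$ generated by the policy $\pi$.

```latex
\min_{\pi} \max_{c} \; \mathrm{CVaR}_{\alpha}\big( R(\tau \mid c) \big),
\qquad
R(\tau \mid c) = \sum_{t=0}^{T-1} \gamma^{t} \, c(s_t, a_t)
```

The inner maximization picks the cost function under which the policy's tail risk looks worst, and the outer minimization picks the policy that keeps that worst-case tail risk as low as possible.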

RAIL: Risk-Averse Imitation Learning
Integrating the CVaR objective in the GAIL framework
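One reason CVaR can be added to a training objective is the auxiliary-variable form from the Rockafellar and Uryasev paper cited earlier: CVaR at level α equals the minimum over ν of ν + E[(Z − ν)⁺] / (1 − α). The sketch below checks this on a sample. It only illustrates that identity; the function names are hypothetical, and how the term is weighted and optimized inside RAIL is not shown on the slide.

```python
def cvar_surrogate(costs, nu, alpha):
    """Sample estimate of nu + E[max(Z - nu, 0)] / (1 - alpha)."""
    excess = [max(c - nu, 0.0) for c in costs]
    return nu + sum(excess) / ((1.0 - alpha) * len(costs))


def cvar_by_minimization(costs, alpha):
    """Minimize the surrogate over nu.

    The surrogate is convex and piecewise linear in nu with kinks at the
    sample values, so checking the samples themselves is sufficient.
    """
    return min(cvar_surrogate(costs, nu, alpha) for nu in costs)


# Hypothetical costs; with alpha = 0.75 the tail holds the two worst samples.
sample = [1, 2, 3, 4, 5, 6, 7, 20]
print(cvar_by_minimization(sample, 0.75))
```

On this sample the minimum equals the average of the two worst costs, 7 and 20. Because the surrogate is differentiable almost everywhere in both ν and the costs, it can be minimized jointly with the policy by gradient-based methods.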

Results
• RAIL is a better choice than GAIL in risk-sensitive applications
• In terms of mean performance, RAIL converges almost as fast as GAIL
• RAIL preserves the scalability of GAIL

Powered by Intel

Why Intel
• Python's multiprocessing library, together with the Intel Math Kernel Library (MKL), allows multiple instances of an agent interacting with the environment to be simulated in parallel on multi-core Intel CPUs.
• Parallel simulation and learning are crucial for success in reinforcement-learning-based settings like RAIL, because the agent learns by trial and error.
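A minimal sketch of parallel episode simulation with Python's standard `multiprocessing.Pool`. The `simulate_episode` function is a hypothetical stand-in for a real environment rollout (a seeded random walk whose cost is the accumulated distance from the origin), and nothing here is specific to MKL or to the code used in the talk.

```python
import random
from multiprocessing import Pool


def simulate_episode(seed):
    """Hypothetical rollout: a seeded random walk, returning its total cost."""
    rng = random.Random(seed)  # per-episode generator keeps runs reproducible
    state, total_cost = 0.0, 0.0
    for _ in range(100):
        action = rng.uniform(-1.0, 1.0)
        state += action
        total_cost += abs(state)
    return total_cost


def run_parallel(num_episodes, num_workers):
    """Simulate episodes 0..num_episodes-1 across `num_workers` processes."""
    with Pool(processes=num_workers) as pool:
        return pool.map(simulate_episode, range(num_episodes))
```

A call such as `run_parallel(num_episodes=8, num_workers=4)` should be made under an `if __name__ == "__main__":` guard, which `multiprocessing` requires on platforms that start workers by spawning a fresh interpreter.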

Thank you!
