Autonomous Driving and Reinforcement Learning – an Introduction
Michael Bosello
michael.bosello@studio.unibo.it
Università di Bologna – Department of Computer Science and Engineering, Cesena, Italy
Smart Vehicular Systems A.Y. 2019/2020
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 1 / 30
Outline
1 The Driving Problem
Single task approach
End-to-end approach
Pitfalls of Simulations
2 Reinforcement Learning Basic Concepts
Problem Definition
Learning a Policy
3 Example of a Driving Agent: DQN
Neural Network and Training Cycle
Issues and Solutions
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 2 / 30
The Driving Problem
Next in Line...
1 The Driving Problem
Single task approach
End-to-end approach
Pitfalls of Simulations
2 Reinforcement Learning Basic Concepts
Problem Definition
Learning a Policy
3 Example of a Driving Agent: DQN
Neural Network and Training Cycle
Issues and Solutions
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 3 / 30
The Driving Problem
Self-Driving I
Goals
Reach the destination
Avoid dangerous states
Provide comfort to passengers (e.g. avoid oscillations, sudden moves)
Settings
Multi-agent problem
Each agent has to consider the others
Interaction is both
Cooperative, all the agents desire safety
Competitive, each agent has its own goal
Environment
Non-deterministic
Partially observable
Dynamic
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 4 / 30
The Driving Problem
Self-Driving II
Three sub-problems
Recognition Identify the environment’s components
Prediction Predict the evolution of the surrounding
Decision making Take decisions and act to pursue the goal
Two approaches
Single task handling
Use human ingenuity to inject knowledge about the domain
End-to-end
Let the algorithm optimize towards the final goal without constraints
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 5 / 30
The Driving Problem
Self-Driving III
Figure: Modules of autonomous driving systems (source [Talpaert et al., 2019] - modified)
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 6 / 30
The Driving Problem
Sensing & Control
Sensing
Which sensors?
Single/multiple
Sensor expense
Operating conditions
Control
Continuous – steering, acceleration, braking
Discrete – forward, right, left, brake
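A discrete action set like this one can be mapped to low-level continuous commands. A minimal sketch, where the action names come from this slide but the (steering, throttle, brake) values are purely illustrative assumptions:

    from enum import Enum

    class Action(Enum):
        FORWARD = 0
        RIGHT = 1
        LEFT = 2
        BRAKE = 3

    # Hypothetical mapping from discrete actions to (steering, throttle, brake);
    # the real values depend on the vehicle or simulator interface.
    COMMANDS = {
        Action.FORWARD: (0.0, 0.5, 0.0),
        Action.RIGHT:   (0.4, 0.3, 0.0),
        Action.LEFT:    (-0.4, 0.3, 0.0),
        Action.BRAKE:   (0.0, 0.0, 1.0),
    }

    def apply_action(action):
        steering, throttle, brake = COMMANDS[action]
        # here: send the command to the actuators / simulator
        return steering, throttle, brake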
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 7 / 30
The Driving Problem Single task approach
Recognition
Convolutional Neural Networks (CNNs)
YOLO [Redmon et al., 2015]
Computer Vision techniques for object detection
Feature based
Local descriptor based
Mixed
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 8 / 30
The Driving Problem Single task approach
Prediction
Model-based
Physics-based
Maneuver-based
Interaction-aware
Data-driven
Recurrent NNs (like Long Short-Term Memory, LSTM)
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 9 / 30
The Driving Problem Single task approach
Decision Making
Planning/Tree Search/Optimization
Agent-oriented programming (like the BDI model)
Deep Learning (DL)
Reinforcement Learning (RL)
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 10 / 30
The Driving Problem End-to-end approach
End-to-end
Supervised Learning
Based on imitation
The current approach of major car manufacturers
Robotic drivers are expected to be perfect
Drawbacks
Training data Huge amounts of labeled data or human effort
Covering all possible driving scenarios is very hard
Unsupervised Learning
Learns behaviors by trial-and-error
like a young human driving student
Does not require explicit supervision from humans.
Mainly used in simulated environments
to avoid real-life consequences
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 11 / 30
The Driving Problem Pitfalls of Simulations
Simulations
Keep in mind
An agent trained in a virtual environment will not perform well in the real setting
For cameras: different visual appearance, especially for textures
You need some kind of transfer learning or conversion of images/data
Always enable sensor/actuator noise
Some “ground truth” sensors present in some simulators aren’t available in real life
For example, in TORCS (a circuit simulator popular in the ML field) you can get the exact
distances from other cars and from the track borders. Too easy...
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 12 / 30
Reinforcement Learning Basic Concepts
Next in Line...
1 The Driving Problem
Single task approach
End-to-end approach
Pitfalls of Simulations
2 Reinforcement Learning Basic Concepts
Problem Definition
Learning a Policy
3 Example of a Driving Agent: DQN
Neural Network and Training Cycle
Issues and Solutions
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 13 / 30
Reinforcement Learning Basic Concepts Problem Definition
Agent-environment interaction
In RL, an agent learns how to fulfill a task by interacting with its environment
The interaction between the agent and the environment can be reduced to:
State All the information about the environment that is useful to predict the future
Action What the agent does
Reward A real number that indicates how well the agent is doing
Figure: At each time step t the agent performs an action At and receives the pair St+1, Rt+1
(source [Sutton and Barto, 2018])
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 14 / 30
Reinforcement Learning Basic Concepts Problem Definition
Terminology
Episode
A complete run from a starting state to a final state
Episodic task A task that has an end
Continuing task A task that goes on without limit
We can split it into episodes, e.g. by setting a maximum number of steps
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 15 / 30
Reinforcement Learning Basic Concepts Problem Definition
Markov Decision Process (MDP)
An RL problem can be formalized as an MDP. The 4-tuple:
S set of states
As set of actions available in state s
P(s′|s, a) state-transition probabilities (probability of reaching s′ given s and a)
R(s, s′) reward distribution associated with state transitions
Figure: Example of a simple MDP (source Wikipedia)
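To make the 4-tuple concrete, a minimal Python sketch of a toy MDP; the states, actions, probabilities and rewards below are invented for illustration only:

    import random

    states = ["cruising", "crashed"]
    actions = {"cruising": ["keep_lane", "swerve"], "crashed": []}

    # P[s][a] -> list of (next state, probability)
    P = {
        "cruising": {
            "keep_lane": [("cruising", 0.95), ("crashed", 0.05)],
            "swerve":    [("cruising", 0.70), ("crashed", 0.30)],
        }
    }

    # R[(s, s')] -> reward associated with the state transition
    R = {("cruising", "cruising"): 0.1, ("cruising", "crashed"): -1.0}

    def step(s, a):
        """Sample one transition of the MDP."""
        next_states, probs = zip(*P[s][a])
        s_next = random.choices(next_states, weights=probs)[0]
        return s_next, R[(s, s_next)]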
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 16 / 30
Reinforcement Learning Basic Concepts Problem Definition
Environment features
MDPs can deal with non-deterministic environments
(because they have probability distributions)
But we need to cope with observability
Markov Property – Full observability
Classical RL algorithms rely on the Markov property
A state satisfies the property if it includes information about all aspects of the (past)
agent-environment interaction that make a difference for the future
From states to observations – Partial observability
Sometimes, an agent has only a partial view of the world state
The agent gets information (observations) about some aspects of the environment
Abusing the terminology, observations/histories are called “state” and denoted St
How to deal with partial observability?
Approximation methods (e.g. NNs) don’t rely on the Markov property
Sometimes we can reconstruct a Markov state using a history
Stacking the last n states
Using a RNN (e.g. LSTM)
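A minimal sketch of the first option, stacking the last n observations in a fixed-length buffer (the observation shape and n = 4 are assumptions):

    from collections import deque
    import numpy as np

    n = 4                      # how many past observations form the "state"
    history = deque(maxlen=n)  # the oldest observation is dropped automatically

    def reset_history(first_obs):
        # fill the buffer so the stack is always complete
        for _ in range(n):
            history.append(first_obs)

    def stacked_state(new_obs):
        history.append(new_obs)
        # e.g. for 84x84 grayscale frames this returns an array of shape (n, 84, 84)
        return np.stack(history, axis=0)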
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 17 / 30
Reinforcement Learning Basic Concepts Learning a Policy
Learning a Policy I
We want the agent to learn a strategy i.e. a policy
Policy π
A map from states to actions used by the agent to choose the next action
In particular, we are searching for the optimal policy
→ The policy that maximizes the cumulative reward
Cumulative reward
The discounted sum of rewards over time
R = ∑_{t=0}^{n} γ^t r_{t+1} | 0 ≤ γ ≤ 1
For larger values of γ, the agent focuses more on long-term rewards
For smaller values of γ, the agent focuses more on immediate rewards
What does this mean?
The actions of the agent affect its possibilities later in time
It can be better to choose an action that leads to a better state than one that gives a
greater immediate reward but leads to a worse state
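A minimal sketch of computing this discounted cumulative reward for one episode (the reward values in the example are arbitrary):

    def discounted_return(rewards, gamma=0.99):
        """Compute R = sum_t gamma^t * r_{t+1} over one episode."""
        total = 0.0
        for t, r in enumerate(rewards):
            total += (gamma ** t) * r
        return total

    # e.g. a few small step rewards followed by a crash penalty
    print(discounted_return([0.1, 0.1, 0.1, -1.0], gamma=0.9))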
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 18 / 30
Reinforcement Learning Basic Concepts Learning a Policy
Learning a Policy II
Problem
We don’t know the MDP model (state-transition probabilities and reward distribution)
Solution
Monte Carlo approach: let’s make estimations based on experience
A way to produce a policy is to estimate the value (or action-value) function
Given the function, the agent only needs to perform the action with the greatest value
Expected return E[Gt]
The expectation of the cumulative reward, i.e. our current estimate
Value function Vπ(s) = E[Gt|St = s]
(state) → expected return when starting in s and following π thereafter
Action-value function Qπ(s, a) = E[Gt|St = s, At = a]
(state, action) → expected return performing a and following π thereafter
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 19 / 30
Reinforcement Learning Basic Concepts Learning a Policy
Learning a Policy III
SARSA and Q-learning
They are algorithms that approximate the action-value function
At every iteration the estimation is refined thanks to the new experience.
Every time the evaluation becomes more precise, the policy gets closer to the
optimal one.
Generalized Policy Iteration (GPI)
Almost all RL methods could be described as GPI
They all have identifiable policies and value functions
Learning consists of two interacting processes
Policy evaluation Compute the value function
Policy improvement Make the policy greedy
Figure: [Sutton and Barto, 2018]
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 20 / 30
Reinforcement Learning Basic Concepts Learning a Policy
Learning a policy IV
The exploration/exploitation dilemma
The agent seeks to learn action values of optimal behavior, but it needs to behave
non-optimally in order to explore all actions (to find the optimal actions)
ε-greedy policy: the agent behaves greedily, but there is a (small) probability ε of
selecting a random action.
On-policy learning
The agent learns action values of the policy that produces its own behavior (qπ)
It learns about a near-optimal policy (ε-greedy)
Target policy and behavior policy correspond
Off-policy learning
The agent learns action values of the optimal policy (q∗)
There are two policies: target and behavior
Every n steps the target policy is copied into the behavior one
If n = 1 there is only one action-value function
But it’s not the same as on-policy (the target and behavior policies still do not correspond)
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 21 / 30
Reinforcement Learning Basic Concepts Learning a Policy
How to estimate the value function?
A table that associates states (or state-action pairs) with values
Randomly initialized
Temporal Difference (TD) update
V (St) ← V (St) + α[Rt+1 + γV (St+1) − V (St)]
TD error
δt = Rt+1 + γV (St+1) − V (St)
Rt+1 + γV (St+1) is a better estimation of V (St)
The actual reward obtained plus the expected rewards for the next state
By subtracting the old estimate we get the error made at time t
Sarsa (on-policy)
Q(St, At) ← Q(St, At) + α[Rt+1 + γQ(St+1, At+1) − Q(St, At)]
Q-learning (off-policy)
Q(St, At) ← Q(St, At) + α[Rt+1 + γ max_a Q(St+1, a) − Q(St, At)]
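A minimal sketch of tabular Q-learning with an ε-greedy behavior policy. The env object is an assumed simplified interface where reset() returns a state and step(a) returns (next_state, reward, done); it is not defined in these slides:

    import random
    from collections import defaultdict

    def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
        Q = defaultdict(float)   # (state, action) -> value, implicitly 0 at start
        for _ in range(episodes):
            s = env.reset()
            done = False
            while not done:
                # ε-greedy: mostly greedy, random action with probability ε
                if random.random() < eps:
                    a = random.choice(actions)
                else:
                    a = max(actions, key=lambda a_: Q[(s, a_)])
                s_next, r, done = env.step(a)
                # Q-learning (off-policy) TD update; for Sarsa, use the action
                # actually selected in s_next instead of the max
                best_next = 0.0 if done else max(Q[(s_next, a_)] for a_ in actions)
                Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
                s = s_next
        return Q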
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 22 / 30
Example of a Driving Agent: DQN
Next in Line...
1 The Driving Problem
Single task approach
End-to-end approach
Pitfalls of Simulations
2 Reinforcement Learning Basic Concepts
Problem Definition
Learning a Policy
3 Example of a Driving Agent: DQN
Neural Network and Training Cycle
Issues and Solutions
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 23 / 30
Example of a Driving Agent: DQN Neural Network and Training Cycle
Deep Q-Network
DQN
When the number of states increases, it becomes impossible to use a table to
store the Q-function (alias for action-value function)
A neural network can approximate the Q-function
We also get better generalization and the ability to deal with non-Markovian environments
How to represent the action-value function in a NN?
1 Input: the state (as a vector) → Output: the values of each action
2 Input: the state and an action → Output: the value for that action
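A minimal PyTorch sketch of the first option (state in, one value per action out), assuming 4 stacked 84x84 grayscale frames; the layer sizes are illustrative and not the exact architecture of [Mnih et al., 2015]:

    import torch
    import torch.nn as nn

    class QNetwork(nn.Module):
        def __init__(self, n_actions, in_frames=4):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(in_frames, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.Flatten(),
                nn.Linear(64 * 9 * 9, 256), nn.ReLU(),
                nn.Linear(256, n_actions),   # one Q-value per discrete action
            )

        def forward(self, x):
            return self.net(x)

    # q = QNetwork(n_actions=4)
    # q(torch.zeros(1, 4, 84, 84)).shape   # -> torch.Size([1, 4])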
Design the driver agent
Environment: states and actions, reward function
Neural network
Training cycle
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 24 / 30
Example of a Driving Agent: DQN Neural Network and Training Cycle
Neural Network
Pre-processing
It’s important how we present the data to the network
Let’s take images as an example
Resize to small images (e.g. 84x84)
If colors are not discriminative, use grayscale
How many frames to stack?
Remember: the larger the input vector, the larger the search space
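A minimal pre-processing sketch along these lines, assuming OpenCV is available and that frames arrive as RGB arrays (the library choice and the target size are assumptions):

    import cv2
    import numpy as np

    def preprocess(frame_rgb):
        """RGB frame -> small grayscale image scaled to [0, 1]."""
        gray = cv2.cvtColor(frame_rgb, cv2.COLOR_RGB2GRAY)
        small = cv2.resize(gray, (84, 84), interpolation=cv2.INTER_AREA)
        return small.astype(np.float32) / 255.0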
Convolutional Neural Networks
A good idea for structured/spatially local data
e.g. images, lidar data ...
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 25 / 30
Example of a Driving Agent: DQN Neural Network and Training Cycle
Training Cycle
Transition samples are used to compute the target update
Training uses a gradient form of Q-learning
([Mnih et al., 2013] and [Sutton and Barto, 2018] for details)
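A minimal sketch of one training step in this style, assuming PyTorch, a batch of transition tensors sampled from a replay buffer, and a periodically-synced copy of the network used to compute the bootstrapped targets; names like q_net and target_net are illustrative:

    import torch
    import torch.nn.functional as F

    def train_step(q_net, target_net, optimizer, batch, gamma=0.99):
        states, actions, rewards, next_states, dones = batch   # tensors

        # Q(s, a) for the actions that were actually taken
        q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

        # bootstrapped target: r + gamma * max_a Q_target(s', a), 0 if terminal
        with torch.no_grad():
            next_q = target_net(next_states).max(dim=1).values
            targets = rewards + gamma * next_q * (1.0 - dones)

        loss = F.mse_loss(q_values, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    # every N steps: target_net.load_state_dict(q_net.state_dict())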
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 26 / 30
Example of a Driving Agent: DQN Issues and Solutions
DQN Issues and Solutions I
Each of these features significantly improves performance
Experience Replay
We put samples in a buffer and we take random batches from it for training
When full, oldest samples are discarded
Why? Experience replay is needed to break temporal correlations
In supervised learning, inputs should be independent and identically distributed (i.i.d.)
i.e. the generative process has no memory of past generated samples
Other advantages:
More efficient use of experience (samples are used several times)
Avoid catastrophic forgetting (by presenting old samples again)
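A minimal replay buffer sketch: a fixed-size deque that drops the oldest samples when full, with uniform random sampling of batches:

    import random
    from collections import deque

    class ReplayBuffer:
        def __init__(self, capacity=100_000):
            self.buffer = deque(maxlen=capacity)   # oldest samples discarded when full

        def add(self, state, action, reward, next_state, done):
            self.buffer.append((state, action, reward, next_state, done))

        def sample(self, batch_size=32):
            batch = random.sample(self.buffer, batch_size)
            # unzip into (states, actions, rewards, next_states, dones)
            return tuple(zip(*batch))

        def __len__(self):
            return len(self.buffer)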
Double DQN
We have two networks, one behavior NN and one target NN
We train the target NN
Every N steps we copy the target NN into the behavior one
Why? To reduce target instability
Supervised learning, to perform well, requires that for the same input the label does not
change over time
In RL, the target changes constantly (as we refine the estimates)
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 27 / 30
Example of a Driving Agent: DQN Issues and Solutions
DQN Issues and Solutions II
Rewards scaling
Scaling rewards in the range [−1; 1] dramatically improves efficiency [Mnih et al., 2013]
Frame skipping
Consecutive frames are very similar → we can skip frames without losing much
information and speed up training by N times
The selected action is repeated for N frames
Initial sampling
It could be useful to save K samples in the replay buffer before starting training
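A minimal sketch of frame skipping (action repeat) and of the initial warm-up phase, again assuming the simplified env interface and the kind of replay buffer sketched above; the values of N (skip) and K are placeholders:

    import random

    def step_with_skip(env, action, skip=4):
        """Repeat the selected action for `skip` frames and accumulate the reward."""
        total_reward, done, obs = 0.0, False, None
        for _ in range(skip):
            obs, reward, done = env.step(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done

    def warm_up(env, buffer, actions, k=1000):
        """Fill the replay buffer with K random samples before training starts."""
        s = env.reset()
        for _ in range(k):
            a = random.choice(actions)
            s_next, r, done = step_with_skip(env, a)
            buffer.add(s, a, r, s_next, done)
            s = env.reset() if done else s_next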
So many parameters...
[Mnih et al., 2015] is a good place to start
The table of hyperparameters used in their work is in the next slide
More recent works and experiments could provide better suggestions
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 28 / 30
Example of a Driving Agent: DQN Issues and Solutions
Hyperparameters used in [Mnih et al., 2015]
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 29 / 30
Example of a Driving Agent: DQN Issues and Solutions
Last things
Reward function
Designing a reward function is not an easy task
It could severely affect learning
In the case of a driving agent (like in RL for robotics), the rewards are internal and
don’t come from the environment
You should also give rewards for intermediate steps to guide the agent (i.e. not
only +1 when it succeeds and −1 when it fails)
Some naive hints
Reach target +1
Crash −1
Car goes forward +0.4 (if too large, it will drown out the crash penalty)
Car turns +0.1 (if too large, the car will keep circling around)
−0.2 instead of the positive reward if the agent turns in one direction and immediately after turns
in the opposite one (i.e. right-left or left-right), to reduce car oscillations
Exploration/exploitation
Start with a large ε
Gradually reduce it with an ε-decay (e.g. 0.999)
εt+1 = εt ∗ decay
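A minimal sketch putting together the naive reward hints and the ε-decay above; all the numbers are the illustrative values from this slide, and the minimum ε floor is an extra assumption:

    def shaped_reward(reached_target, crashed, moved_forward, turned, oscillated):
        if reached_target:
            return 1.0
        if crashed:
            return -1.0
        if oscillated:        # e.g. right-left or left-right in consecutive steps
            return -0.2
        reward = 0.0
        if moved_forward:
            reward += 0.4
        if turned:
            reward += 0.1
        return reward

    # start with a large ε and shrink it a little at every step/episode
    epsilon, decay, eps_min = 1.0, 0.999, 0.05   # eps_min is an assumed floor
    for _ in range(10_000):
        # ... act ε-greedily with the current epsilon, train ...
        epsilon = max(eps_min, epsilon * decay)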
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 30 / 30
Appendix: Reading Suggestions
Appendix: Reading Suggestions
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 31 / 30
Appendix: Reading Suggestions
Reading Suggestions: Access to Resources
Books and papers are available through the university proxy
Search engine: https://sba-unibo-it.ezproxy.unibo.it/AlmaStart
If you want to access an article by link
convert the dots in the domain into dashes and add .ezproxy.unibo.it at the end of the URL, e.g.:
https://www.nature.com/articles/nature14236
↓
https://www-nature-com.ezproxy.unibo.it/articles/nature14236
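A tiny sketch of that conversion (the example URL is the one from this slide):

    from urllib.parse import urlsplit, urlunsplit

    def proxify(url, proxy_suffix="ezproxy.unibo.it"):
        parts = urlsplit(url)
        host = parts.netloc.replace(".", "-") + "." + proxy_suffix
        return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))

    print(proxify("https://www.nature.com/articles/nature14236"))
    # -> https://www-nature-com.ezproxy.unibo.it/articles/nature14236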
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 32 / 30
Appendix: Reading Suggestions
Reading Suggestions: Driving Problem
Extended review of autonomous driving approaches [Leon and Gavrilescu, 2019]
Overview of RL for autonomous driving and its challenges [Talpaert et al., 2019]
Tutorial of RL with CARLA simulator
https://pythonprogramming.net/introduction-self-driving-autonomous-cars-carla-python
End-to-end approach examples
CNN for steering a real car [Bojarski et al., 2016]
DQN car control [Yu et al., 2016]
Single task approach example
NN for recognition + RL for control [Li et al., 2018]
Converting virtual images into real ones
[Pan et al., 2017]
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 33 / 30
Appendix: Reading Suggestions
Reading Suggestions: RL & DL
In depth (books)
[Sutton and Barto, 2018] is the bible of RL – the second edition (2018) also covers
recent breakthroughs
[Goodfellow et al., 2016] for Deep Learning
Compact introductions
Brief explanation of RL basics and DQN:
https://rubenfiszel.github.io/posts/rl4j/2016-08-24-Reinforcement-Learning-and-DQN
CNN: https://cs231n.github.io/convolutional-networks/
Even more compact: https://pathmind.com/wiki/convolutional-network
DQN
[Mnih et al., 2015] Revised version (better)
[Mnih et al., 2013] Original version
[van Hasselt et al., 2015] Double Q-learning
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 34 / 30
References
References I
Bojarski, M., Testa, D. D., Dworakowski, D., Firner, B., Flepp, B., Goyal, P., Jackel,
L. D., Monfort, M., Muller, U., Zhang, J., Zhang, X., Zhao, J., and Zieba, K. (2016).
End to end learning for self-driving cars.
Goodfellow, I., Bengio, Y., and Courville, A. (2016).
Deep Learning.
MIT Press.
http://www.deeplearningbook.org.
Leon, F. and Gavrilescu, M. (2019).
A review of tracking, prediction and decision making methods for autonomous
driving.
Li, D., Zhao, D., Zhang, Q., and Chen, Y. (2018).
Reinforcement learning and deep learning based lateral control for autonomous
driving.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and
Riedmiller, M. (2013).
Playing atari with deep reinforcement learning.
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 35 / 30
References
References II
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G.,
Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C.,
Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., and
Hassabis, D. (2015).
Human-level control through deep reinforcement learning.
Nature, 518(7540):529–533.
Pan, X., You, Y., Wang, Z., and Lu, C. (2017).
Virtual to real reinforcement learning for autonomous driving.
Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2015).
You only look once: Unified, real-time object detection.
Sallab, A. E., Abdou, M., Perot, E., and Yogamani, S. (2017).
Deep reinforcement learning framework for autonomous driving.
Electronic Imaging, 2017(19):70–76.
Sutton, R. S. and Barto, A. G. (2018).
Reinforcement learning : an introduction.
The MIT Press.
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 36 / 30
References
References III
Talpaert, V., Sobh, I., Kiran, B. R., Mannion, P., Yogamani, S., El-Sallab, A., and
Perez, P. (2019).
Exploring applications of deep reinforcement learning for real-world autonomous
driving systems.
van Hasselt, H., Guez, A., and Silver, D. (2015).
Deep reinforcement learning with double q-learning.
Yu, A., Palefsky-Smith, R., and Bedi, R. (2016).
Deep reinforcement learning for simulated autonomous vehicle control.
Course Project Reports: Winter, pages 1–7.
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 37 / 30
Autonomous Driving and Reinforcement Learning – an Introduction
Michael Bosello
michael.bosello@studio.unibo.it
Università di Bologna – Department of Computer Science and Engineering, Cesena, Italy
Smart Vehicular Systems A.Y. 2019/2020
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 38 / 30

Weitere ähnliche Inhalte

Was ist angesagt?

Machine learning for social media analytics
Machine learning for  social media analyticsMachine learning for  social media analytics
Machine learning for social media analyticsJenya Terpil
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learningDing Li
 
Lecture 18: Gaussian Mixture Models and Expectation Maximization
Lecture 18: Gaussian Mixture Models and Expectation MaximizationLecture 18: Gaussian Mixture Models and Expectation Maximization
Lecture 18: Gaussian Mixture Models and Expectation Maximizationbutest
 
Introduction to continual learning
Introduction to continual learningIntroduction to continual learning
Introduction to continual learningNguyen Giang
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement LearningSalem-Kabbani
 
Reinforcement Learning Q-Learning
Reinforcement Learning   Q-Learning Reinforcement Learning   Q-Learning
Reinforcement Learning Q-Learning Melaku Eneayehu
 
Intro to Reinforcement learning - part III
Intro to Reinforcement learning - part IIIIntro to Reinforcement learning - part III
Intro to Reinforcement learning - part IIIMikko Mäkipää
 
Reinforcement learning, Q-Learning
Reinforcement learning, Q-LearningReinforcement learning, Q-Learning
Reinforcement learning, Q-LearningKuppusamy P
 
Autonomous or self driving cars
Autonomous or self driving carsAutonomous or self driving cars
Autonomous or self driving carsSandeep Nayak
 
The Race to 2021: The State of Autonomous Vehicles and a "Who's Who" of Indus...
The Race to 2021: The State of Autonomous Vehicles and a "Who's Who" of Indus...The Race to 2021: The State of Autonomous Vehicles and a "Who's Who" of Indus...
The Race to 2021: The State of Autonomous Vehicles and a "Who's Who" of Indus...Altimeter, a Prophet Company
 
What is Deep Learning | Deep Learning Simplified | Deep Learning Tutorial | E...
What is Deep Learning | Deep Learning Simplified | Deep Learning Tutorial | E...What is Deep Learning | Deep Learning Simplified | Deep Learning Tutorial | E...
What is Deep Learning | Deep Learning Simplified | Deep Learning Tutorial | E...Edureka!
 
Intelligent Vehicles
Intelligent VehiclesIntelligent Vehicles
Intelligent Vehiclesanakarenbm
 
Machine learning seminar presentation
Machine learning seminar presentationMachine learning seminar presentation
Machine learning seminar presentationsweety seth
 
Single Image Super Resolution Overview
Single Image Super Resolution OverviewSingle Image Super Resolution Overview
Single Image Super Resolution OverviewLEE HOSEONG
 
Self Driving Cars V11
Self Driving Cars V11Self Driving Cars V11
Self Driving Cars V11Kevin Root
 
An introduction to deep reinforcement learning
An introduction to deep reinforcement learningAn introduction to deep reinforcement learning
An introduction to deep reinforcement learningBig Data Colombia
 

Was ist angesagt? (20)

Machine learning for social media analytics
Machine learning for  social media analyticsMachine learning for  social media analytics
Machine learning for social media analytics
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
 
Autonomous cars
Autonomous carsAutonomous cars
Autonomous cars
 
Lecture 18: Gaussian Mixture Models and Expectation Maximization
Lecture 18: Gaussian Mixture Models and Expectation MaximizationLecture 18: Gaussian Mixture Models and Expectation Maximization
Lecture 18: Gaussian Mixture Models and Expectation Maximization
 
Introduction to continual learning
Introduction to continual learningIntroduction to continual learning
Introduction to continual learning
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Reinforcement Learning Q-Learning
Reinforcement Learning   Q-Learning Reinforcement Learning   Q-Learning
Reinforcement Learning Q-Learning
 
Intro to Reinforcement learning - part III
Intro to Reinforcement learning - part IIIIntro to Reinforcement learning - part III
Intro to Reinforcement learning - part III
 
Reinforcement learning, Q-Learning
Reinforcement learning, Q-LearningReinforcement learning, Q-Learning
Reinforcement learning, Q-Learning
 
Autonomous or self driving cars
Autonomous or self driving carsAutonomous or self driving cars
Autonomous or self driving cars
 
Autonomous Vehicles
Autonomous VehiclesAutonomous Vehicles
Autonomous Vehicles
 
The Race to 2021: The State of Autonomous Vehicles and a "Who's Who" of Indus...
The Race to 2021: The State of Autonomous Vehicles and a "Who's Who" of Indus...The Race to 2021: The State of Autonomous Vehicles and a "Who's Who" of Indus...
The Race to 2021: The State of Autonomous Vehicles and a "Who's Who" of Indus...
 
Autonomous car
Autonomous car Autonomous car
Autonomous car
 
What is Deep Learning | Deep Learning Simplified | Deep Learning Tutorial | E...
What is Deep Learning | Deep Learning Simplified | Deep Learning Tutorial | E...What is Deep Learning | Deep Learning Simplified | Deep Learning Tutorial | E...
What is Deep Learning | Deep Learning Simplified | Deep Learning Tutorial | E...
 
Intelligent Vehicles
Intelligent VehiclesIntelligent Vehicles
Intelligent Vehicles
 
Machine learning seminar presentation
Machine learning seminar presentationMachine learning seminar presentation
Machine learning seminar presentation
 
Support Vector Machine
Support Vector MachineSupport Vector Machine
Support Vector Machine
 
Single Image Super Resolution Overview
Single Image Super Resolution OverviewSingle Image Super Resolution Overview
Single Image Super Resolution Overview
 
Self Driving Cars V11
Self Driving Cars V11Self Driving Cars V11
Self Driving Cars V11
 
An introduction to deep reinforcement learning
An introduction to deep reinforcement learningAn introduction to deep reinforcement learning
An introduction to deep reinforcement learning
 

Ähnlich wie Autonomous Driving and Reinforcement Learning - an Introduction

MSN 2019 - Robot Drivers: Learning to Drive by Trial & Error
MSN 2019 - Robot Drivers: Learning to Drive by Trial & ErrorMSN 2019 - Robot Drivers: Learning to Drive by Trial & Error
MSN 2019 - Robot Drivers: Learning to Drive by Trial & ErrorMichael Bosello
 
Concepts of predictive control
Concepts of predictive controlConcepts of predictive control
Concepts of predictive controlJARossiter
 
IRJET- A Review on Deep Reinforcement Learning Induced Autonomous Driving Fra...
IRJET- A Review on Deep Reinforcement Learning Induced Autonomous Driving Fra...IRJET- A Review on Deep Reinforcement Learning Induced Autonomous Driving Fra...
IRJET- A Review on Deep Reinforcement Learning Induced Autonomous Driving Fra...IRJET Journal
 
Planning in Markov Stochastic Task Domains
Planning in Markov Stochastic Task DomainsPlanning in Markov Stochastic Task Domains
Planning in Markov Stochastic Task DomainsWaqas Tariq
 
Towards Reinforcement Learning-based Aggregate Computing
Towards Reinforcement Learning-based Aggregate ComputingTowards Reinforcement Learning-based Aggregate Computing
Towards Reinforcement Learning-based Aggregate ComputingGianluca Aguzzi
 
ARTIFICIAL INTELLIGENCE ppt.pptx
ARTIFICIAL INTELLIGENCE ppt.pptxARTIFICIAL INTELLIGENCE ppt.pptx
ARTIFICIAL INTELLIGENCE ppt.pptxVinodhKumar658343
 
[DOLAP2023] The Whys and Wherefores of Cubes
[DOLAP2023] The Whys and Wherefores of Cubes[DOLAP2023] The Whys and Wherefores of Cubes
[DOLAP2023] The Whys and Wherefores of CubesUniversity of Bologna
 
Quantitative management
Quantitative managementQuantitative management
Quantitative managementsmumbahelp
 
课堂讲义(最后更新:2009-9-25)
课堂讲义(最后更新:2009-9-25)课堂讲义(最后更新:2009-9-25)
课堂讲义(最后更新:2009-9-25)butest
 
Aj Copulas V4
Aj Copulas V4Aj Copulas V4
Aj Copulas V4jainan33
 
1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx
1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx
1PPA 670 Public Policy AnalysisBasic Policy Terms an.docxfelicidaddinwoodie
 
Codecamp Iasi 7 mai 2011 Monte Carlo Simulation
Codecamp Iasi 7 mai 2011 Monte Carlo SimulationCodecamp Iasi 7 mai 2011 Monte Carlo Simulation
Codecamp Iasi 7 mai 2011 Monte Carlo SimulationCodecamp Romania
 
reinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdfreinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdfVaishnavGhadge1
 
Demolition Derby 2011 at GECCO
Demolition Derby 2011 at GECCO Demolition Derby 2011 at GECCO
Demolition Derby 2011 at GECCO Martin Butz
 
Machine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paperMachine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paperJames by CrowdProcess
 
On Collective Reinforcement Learning
On Collective Reinforcement LearningOn Collective Reinforcement Learning
On Collective Reinforcement LearningGianluca Aguzzi
 
Sgd teaching-consulting-10-jan-2009 (1)
Sgd teaching-consulting-10-jan-2009 (1)Sgd teaching-consulting-10-jan-2009 (1)
Sgd teaching-consulting-10-jan-2009 (1)Sanjeev Deshmukh
 

Ähnlich wie Autonomous Driving and Reinforcement Learning - an Introduction (20)

MSN 2019 - Robot Drivers: Learning to Drive by Trial & Error
MSN 2019 - Robot Drivers: Learning to Drive by Trial & ErrorMSN 2019 - Robot Drivers: Learning to Drive by Trial & Error
MSN 2019 - Robot Drivers: Learning to Drive by Trial & Error
 
Concepts of predictive control
Concepts of predictive controlConcepts of predictive control
Concepts of predictive control
 
IRJET- A Review on Deep Reinforcement Learning Induced Autonomous Driving Fra...
IRJET- A Review on Deep Reinforcement Learning Induced Autonomous Driving Fra...IRJET- A Review on Deep Reinforcement Learning Induced Autonomous Driving Fra...
IRJET- A Review on Deep Reinforcement Learning Induced Autonomous Driving Fra...
 
Planning in Markov Stochastic Task Domains
Planning in Markov Stochastic Task DomainsPlanning in Markov Stochastic Task Domains
Planning in Markov Stochastic Task Domains
 
Towards Reinforcement Learning-based Aggregate Computing
Towards Reinforcement Learning-based Aggregate ComputingTowards Reinforcement Learning-based Aggregate Computing
Towards Reinforcement Learning-based Aggregate Computing
 
FurkatKochkarov-propoal
FurkatKochkarov-propoalFurkatKochkarov-propoal
FurkatKochkarov-propoal
 
ARTIFICIAL INTELLIGENCE ppt.pptx
ARTIFICIAL INTELLIGENCE ppt.pptxARTIFICIAL INTELLIGENCE ppt.pptx
ARTIFICIAL INTELLIGENCE ppt.pptx
 
[DOLAP2023] The Whys and Wherefores of Cubes
[DOLAP2023] The Whys and Wherefores of Cubes[DOLAP2023] The Whys and Wherefores of Cubes
[DOLAP2023] The Whys and Wherefores of Cubes
 
Towards Systematically Engineering Ensembles - Martin Wirsing
Towards Systematically Engineering Ensembles - Martin WirsingTowards Systematically Engineering Ensembles - Martin Wirsing
Towards Systematically Engineering Ensembles - Martin Wirsing
 
Mobile data offloading
Mobile data offloadingMobile data offloading
Mobile data offloading
 
Quantitative management
Quantitative managementQuantitative management
Quantitative management
 
课堂讲义(最后更新:2009-9-25)
课堂讲义(最后更新:2009-9-25)课堂讲义(最后更新:2009-9-25)
课堂讲义(最后更新:2009-9-25)
 
Aj Copulas V4
Aj Copulas V4Aj Copulas V4
Aj Copulas V4
 
1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx
1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx
1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx
 
Codecamp Iasi 7 mai 2011 Monte Carlo Simulation
Codecamp Iasi 7 mai 2011 Monte Carlo SimulationCodecamp Iasi 7 mai 2011 Monte Carlo Simulation
Codecamp Iasi 7 mai 2011 Monte Carlo Simulation
 
reinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdfreinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdf
 
Demolition Derby 2011 at GECCO
Demolition Derby 2011 at GECCO Demolition Derby 2011 at GECCO
Demolition Derby 2011 at GECCO
 
Machine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paperMachine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paper
 
On Collective Reinforcement Learning
On Collective Reinforcement LearningOn Collective Reinforcement Learning
On Collective Reinforcement Learning
 
Sgd teaching-consulting-10-jan-2009 (1)
Sgd teaching-consulting-10-jan-2009 (1)Sgd teaching-consulting-10-jan-2009 (1)
Sgd teaching-consulting-10-jan-2009 (1)
 

Kürzlich hochgeladen

Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptxRajatChauhan518211
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINsankalpkumarsahoo174
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 

Kürzlich hochgeladen (20)

Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 

Autonomous Driving and Reinforcement Learning - an Introduction

  • 1. Autonomous Driving and Reinforcement Learning – an Introduction Michael Bosello michael.bosello@studio.unibo.it Universit`a di Bologna – Department of Computer Science and Engineering, Cesena, Italy Smart Vehicular Systems A.Y. 2019/2020 Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 1 / 30
  • 2. Outline 1 The Driving Problem Single task approach End-to-end approach Pitfalls of Simulations 2 Reinforcement Learning Basic Concepts Problem Definition Learning a Policy 3 Example of a Driving Agent: DQN Neural Network and Training Cycle Issues and Solutions Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 2 / 30
  • 3. The Driving Problem Next in Line... 1 The Driving Problem Single task approach End-to-end approach Pitfalls of Simulations 2 Reinforcement Learning Basic Concepts Problem Definition Learning a Policy 3 Example of a Driving Agent: DQN Neural Network and Training Cycle Issues and Solutions Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 3 / 30
  • 4. The Driving Problem Self-Driving I Goals Reach the destination Avoid dangerous states Provide comfort to passengers (e.g. avoid oscillations, sudden moves) Settings Multi-agent problem One have to consider others Interaction is both Cooperative, all the agents desire safety Competitive, each agent has its own goal Environment Non-deterministic Partially observable Dynamic Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 4 / 30
  • 5. The Driving Problem Self-Driving II Three sub-problems Recognition Identify environment’s components Prediction Predict the evolution of the surrounding Decision making Take decisions and act to pursue the goal Two approaches Single task handling Use human ingenuity to inject knowledge about the domain End-to-end Let the algorithm optimize towards the final goal without constrains Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 5 / 30
  • 6. The Driving Problem Self-Driving III Figure: Modules of autonomous driving systems (source [Talpaert et al., 2019] - modified) Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 6 / 30
  • 7. The Driving Problem Sensing & Control Sensing Which sensors? Single/multiple Sensor expense Operating conditions Control Continuous – steering, acceleration, braking Discrete – forward, right, left, brake Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 7 / 30
  • 8. The Driving Problem Single task approach Recognition Convolutional Neural Networks (CNNs) YOLO [Redmon et al., 2015] Computer Vision techniques for object detection Feature based Local descriptor based Mixed Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 8 / 30
  • 9. The Driving Problem Single task approach Prediction Model-based Physics-based Maneuver-based Interaction-aware Data-driven Recurrent NN (like Long-Short Term Memory LSTM) Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 9 / 30
  • 10. The Driving Problem Single task approach Decision Making Planning/Tree Search/Optimization Agent-oriented programming (like the BDI model) Deep Learning (DL) Reinforcement Learning (RL) Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 10 / 30
  • 11. The Driving Problem End-to-end approach End-to-end Supervised Learning Based on imitation Current approach by major car manufacturer Robotic drivers are expected to be perfect Drawbacks Training data Huge amounts of labeled data or human effort Covering all possible driving scenarios is very hard Unsupervised Learning Learns behaviors by trial-and-error like a young human driving student Does not require explicit supervision from humans. Mainly used on simulated environment to avoid consequences in real-life Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 11 / 30
  • 12. The Driving Problem Pitfalls of Simulations Simulations Keep in mind An agent trained in a virtual environment will not perform well in the real setting For cameras: different visual appearance, especially for textures You need some kind of transfer learning or conversion of images/data Always enable noises of sensors/actuators Some “grand truth” sensors present in some simulators aren’t available in real life For example, in TORCS (a circuit simulator popular in the ML field) you can have exact distances from other cars and from the borders. Too easy... Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 12 / 30
  • 13. Reinforcement Learning Basic Concepts Next in Line... 1 The Driving Problem Single task approach End-to-end approach Pitfalls of Simulations 2 Reinforcement Learning Basic Concepts Problem Definition Learning a Policy 3 Example of a Driving Agent: DQN Neural Network and Training Cycle Issues and Solutions Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 13 / 30
  • 14. Reinforcement Learning Basic Concepts Problem Definition Agent-environment interaction In RL, an agent learns how to fulfill a task by interacting with its environment The interaction between the agent and the environment can be reduced to: State Every information about the environment useful to predict the future Action What the agent do Reward A real number that Indicates how well the agent is doing Figure: At each time step t the agent perform an action At and receives the couple St+1, Rt+1 (source [Sutton and Barto, 2018]) Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 14 / 30
  • 15. Reinforcement Learning Basic Concepts Problem Definition Terminology Episode A complete run from a starting state to a final state Episodic task A task that has an end Continuing task A task that goes on without limit We can split it e.g. by setting a maximum number of steps Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 15 / 30
  • 16. Reinforcement Learning Basic Concepts Problem Definition Markov Decision Process (MDP) A RL problem can be formalized as a MDP. The 4-tuple: S set of states As set of actions available in state s P(s |s, a) state-transition probabilities (probability to reach s given s and a) R(s, s ) reward distribution associate to state transitions Figure: Example of a simple MDP (source Wikipedia) Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 16 / 30
  • 17. Reinforcement Learning Basic Concepts Problem Definition Environment features MDPs can deal with non-deterministic environments (because they have probability distributions) But we need to cope with observability Markov Property – Full observability Classical RL algorithms rely on the Markov property A state satisfy the property if it includes information about all aspects of the (past) agent-environment interaction that make a difference for the future From states to observations – Partial observability Sometimes, an agent has only a partial view of the world state The agent gets information (observations) about some aspects of the environment Abusing the language, observations/history are called “state” and symbolized St How to deal with partial observability? Approximation methods (e.g. NN) doesn’t rely on the Markov property Sometimes we can reconstruct a Markov state using a history Stacking the last n states Using a RNN (e.g. LSTM) Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 17 / 30
  • 18. Reinforcement Learning Basic Concepts Learning a Policy Learning a Policy I We want the agent to learn a strategy i.e. a policy Policy π A map from states to actions used by the agent to choose the next action In particular, we are searching for the optimal policy → The policy that maximize the cumulative reward Cumulative reward The discounted sum of rewards over time R = n t=0 γt rt+1 | 0 ≤ γ ≤ 1 For larger values of γ, the agent focuses more about the long term rewards For smaller values of γ, the agent focuses more about the immediate rewards What does this mean? The actions of the agent affect its possibilities later in time It could be better to choose an action that led to a better state than an action that gives you a greater reward but led to a worse state Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 18 / 30
  • 19. Reinforcement Learning Basic Concepts Learning a Policy Learning a Policy II Problem we don’t know the MDP model (state transition probabilities and reward distribution) Solution Monte Carlo approach: let’s make estimations based on experience A way to produce a policy is to estimate the value (or action-value) function given the function, the agent needs only to perform the action with the greatest value Expected return E[Gt] The expectation of cumulative reward i.e. our current estimation Value function Vπ(s) = E[Gt|St = s] (state) → expected return when starting in s and following π thereafter Action-value function Qπ(s, a) = E[Gt|St = s, At = a] (state, action) → expected return performing a and following π thereafter Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 19 / 30
  • 20. Reinforcement Learning Basic Concepts Learning a Policy Learning a Policy III SARSA and Q-learning They are algorithms that approximate the action-value function At every iteration the estimation is refined thanks to the new experience. Every time the evaluation becomes more precise, the policy gets closer to the optimal one. Generalized Policy Iteration (GPI) Almost all RL methods could be described as GPI They all have identifiable policies and value functions Learning consist of two interacting processes Policy evaluation Compute the value function Policy improvement Make the policy greedy Figure: [Sutton and Barto, 2018] Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 20 / 30
  • 21. Reinforcement Learning Basic Concepts Learning a Policy Learning a policy IV The exploration/exploitation dilemma The agent seeks to learn action values of optimal behavior, but it needs to behave non-optimally in order to explore all actions (to find the optimal actions) ε-greedy policy: the agent behave greedily but there is a (small) ε probability to select a random action. On-policy learning The agent learns action values of the policy that produces its own behavior (qπ) It learns about a near-optimal policy (ε-greedy) Target policy and behavior policy correspond Off-policy learning The agent learns action values of the optimal policy (q∗) There are two policies: target and behavior Every n steps the target policy is copied into the behavior one If n = 1 there is only one action-value function But it’s not the same of on-policy (the target and behavior policy still not correspond) Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 21 / 30
  • 22. Reinforcement Learning Basic Concepts Learning a Policy How to estimate the value function? A table that associate state (or state-action pairs) to a value Randomly initialized Temporal Difference (TD) update V (St) ← V (St) + α[Rt+1 + γV (St+1) − V (St)] TD error δt = Rt+1 + γV (St+1) − V (St) Rt+1 + γV (St+1) is a better estimation of V (St) The actual reward obtained plus the expected rewards for the next state By subtracting the old estimate we get the error made at time t Sarsa (on-policy) Q(St, At) ← Q(St, At) + α[Rt+1 + γQ(St+1, At+1) − Q(St, At)] Q-learning (off-policy) Q(St, At) ← Q(St, At) + α[Rt+1 + γ maxa Q(St+1, a) − Q(St, At)] Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 22 / 30
  • 23. Example of a Driving Agent: DQN Next in Line... 1 The Driving Problem Single task approach End-to-end approach Pitfalls of Simulations 2 Reinforcement Learning Basic Concepts Problem Definition Learning a Policy 3 Example of a Driving Agent: DQN Neural Network and Training Cycle Issues and Solutions Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 23 / 30
  • 24. Example of a Driving Agent: DQN Neural Network and Training Cycle Deep Q-Network DQN When the number of states increases, it becomes impossible to use a table to store the Q-function (alias for action-value function) A neural network can approximate the Q-function We also get better generalization and the ability to deal with non-Markovian envs How to represent the action-value function in a NN? 1 Input: the state (as a vector) → Output: the values of each action 2 Input: the state and an action → Output: the value for that action Design the driver agent Environment: states and actions, reward function Neural network Training cycle Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 24 / 30
  • 25. Example of a Driving Agent: DQN Neural Network and Training Cycle Neural Network Pre-processing It’s important how we present the data to the network Let’s take images as an example Resize to small images (e.g. 84x84) If colors are not discriminatory, use grayscale How many frames to stack? Remember: the larger the input vector, the larger the searching space Convolutional Neural Networks A good idea for structured/spatially local data e.g. images, lidar data ... Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 25 / 30
  • 26. Example of a Driving Agent: DQN Neural Network and Training Cycle Training Cycle Transaction samples are used to compute the target update Training use a gradient form of Q-learning ([Mnih et al., 2013] and [Sutton and Barto, 2018] for details) Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 26 / 30
  • 27. Example of a Driving Agent: DQN Issues and Solutions DQN Issues and Solutions I Each of these features significantly improve performance Experience Replay We put samples in a buffer and we take random batches from it for training When full, oldest samples are discarded Why? Experience replay is needed to break temporal relation In supervised learning, inputs should be independent and identically distributed (i.i.d.) i.e. the generative process have no memory of past generated samples Other advantages: More efficient use of experience (samples are used several times) Avoid catastrophic forgetting (by presenting again old samples) Double DQN We have two networks, one behavior NN and one target NN We train the target NN Every N steps we copy the target NN into the behavior one Why? To reduce target instability Supervised learning to perform well requires that for the same input a label does not change over time In RL, the target change constantly (as we refine the estimations) Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 27 / 30
• 28. Example of a Driving Agent: DQN – Issues and Solutions
DQN Issues and Solutions II
Reward scaling
Scaling rewards into the range [−1, 1] dramatically improves efficiency [Mnih et al., 2013]
Frame skipping
Consecutive frames are very similar → we can skip frames without losing much information and speed up training by N times
The selected action is repeated for N frames
Initial sampling
It can be useful to collect K samples in the replay buffer before training starts
So many parameters...
[Mnih et al., 2015] is a good place to start
The table of hyperparameters used in their work is in the next slide
More recent works and experiments could provide better suggestions
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 28 / 30
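A hedged sketch of reward scaling and frame skipping, assuming a Gym-like env.step interface; SKIP, MIN_SAMPLES, and the use of clipping (one simple way to keep rewards in [−1, 1]) are illustrative assumptions.

```python
import numpy as np

SKIP = 4              # repeat each selected action for SKIP frames (illustrative)
MIN_SAMPLES = 5_000   # warm-up: fill the replay buffer this much before any gradient step

def clip_reward(r: float) -> float:
    """One simple way to keep rewards in [-1, 1], as suggested in [Mnih et al., 2013]."""
    return float(np.clip(r, -1.0, 1.0))

def skipped_step(env, action):
    """Repeat `action` for SKIP frames and accumulate the (clipped) reward."""
    total, done = 0.0, False
    for _ in range(SKIP):
        obs, reward, done, info = env.step(action)   # assumes a Gym-like interface
        total += clip_reward(reward)
        if done:
            break
    return obs, total, done, info
```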
  • 29. Example of a Driving Agent: DQN Issues and Solutions Hyperparameters used in [Mnih et al., 2015] Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 29 / 30
• 30. Example of a Driving Agent: DQN – Issues and Solutions
Last things
Reward function
Designing a reward function is not an easy task, and it can severely affect learning
In the case of a driving agent (as in RL for robotics in general), the rewards are internal and do not come from the environment
You should also give rewards for intermediate steps to guide the agent (i.e. not only +1 on success and −1 on failure)
Some naive hints:
Reach target: +1
Crash: −1
Car goes forward: +0.4 (if too big, it will outweigh the crash penalty)
Car turns: +0.1 (if too big, it will keep circling around)
−0.2 instead of the positive reward if the agent turns in one direction and immediately after turns in the opposite one (i.e. right-left or left-right), to reduce oscillations
Exploration/exploitation
Start with a large ε
Gradually reduce it with an ε-decay (e.g. 0.999): εt+1 = εt · decay
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 30 / 30
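A hedged sketch of a shaped reward along the naive hints above and of the ε-decay schedule. The boolean flags (reached_target, crashed, ...) are hypothetical signals from the simulator or robot, and the eps_min floor is an added, common practice not mentioned in the slides.

```python
def driving_reward(reached_target, crashed, moved_forward, turned, oscillated):
    """Illustrative shaping following the hints above; flag names are hypothetical."""
    if reached_target:
        return 1.0
    if crashed:
        return -1.0
    if oscillated:              # right-left or left-right in consecutive steps
        return -0.2
    reward = 0.0
    if moved_forward:
        reward += 0.4           # small enough not to outweigh the crash penalty
    if turned:
        reward += 0.1           # small enough not to encourage circling
    return reward

# Epsilon-greedy schedule: start large, decay multiplicatively
epsilon, decay, eps_min = 1.0, 0.999, 0.05   # eps_min is an added floor (assumption)

def next_epsilon(eps):
    return max(eps_min, eps * decay)
```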
  • 31. Appendix: Reading Suggestions Appendix: Reading Suggestions Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 31 / 30
• 32. Appendix: Reading Suggestions
Reading Suggestions: Access to Resources
Books and papers are available through the university proxy
Search engine: https://sba-unibo-it.ezproxy.unibo.it/AlmaStart
If you want to access an article by link, convert the dots in the domain name to hyphens and append .ezproxy.unibo.it to the host
e.g.: https://www.nature.com/articles/nature14236
↓
https://www-nature-com.ezproxy.unibo.it/articles/nature14236
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 32 / 30
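A tiny Python sketch of the URL rewrite described above; the function name to_proxy_url is illustrative.

```python
from urllib.parse import urlparse

def to_proxy_url(url: str) -> str:
    """Rewrite a publisher URL for proxy access: host dots become hyphens, proxy suffix appended."""
    parts = urlparse(url)
    proxied_host = parts.netloc.replace(".", "-") + ".ezproxy.unibo.it"
    return parts._replace(netloc=proxied_host).geturl()

print(to_proxy_url("https://www.nature.com/articles/nature14236"))
# -> https://www-nature-com.ezproxy.unibo.it/articles/nature14236
```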
• 33. Appendix: Reading Suggestions
Reading Suggestions: Driving Problem
Extended review of autonomous driving approaches [Leon and Gavrilescu, 2019]
Overview of RL for autonomous driving and its challenges [Talpaert et al., 2019]
Tutorial on RL with the CARLA simulator: https://pythonprogramming.net/introduction-self-driving-autonomous-cars-carla-python
End-to-end approach examples
CNN for steering on a real car [Bojarski et al., 2016]
DQN car control [Yu et al., 2016]
Single task approach examples
NN for recognition + RL for control [Li et al., 2018]
Converting virtual images to real ones [Pan et al., 2017]
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 33 / 30
• 34. Appendix: Reading Suggestions
Reading Suggestions: RL & DL
In depth (books)
[Sutton and Barto, 2018] is the bible of RL – the second edition (2018) also covers recent breakthroughs
[Goodfellow et al., 2016] for Deep Learning
Compact introductions
Brief explanation of RL basics and DQN: https://rubenfiszel.github.io/posts/rl4j/2016-08-24-Reinforcement-Learning-and-DQN
CNN: https://cs231n.github.io/convolutional-networks/
Even more compact: https://pathmind.com/wiki/convolutional-network
DQN
[Mnih et al., 2015] – revised version (better)
[Mnih et al., 2013] – original version
[van Hasselt et al., 2015] – Double Q-learning
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 34 / 30
• 35. References – References I
Bojarski, M., Testa, D. D., Dworakowski, D., Firner, B., Flepp, B., Goyal, P., Jackel, L. D., Monfort, M., Muller, U., Zhang, J., Zhang, X., Zhao, J., and Zieba, K. (2016). End to end learning for self-driving cars.
Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press. http://www.deeplearningbook.org.
Leon, F. and Gavrilescu, M. (2019). A review of tracking, prediction and decision making methods for autonomous driving.
Li, D., Zhao, D., Zhang, Q., and Chen, Y. (2018). Reinforcement learning and deep learning based lateral control for autonomous driving.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing atari with deep reinforcement learning.
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 35 / 30
• 36. References – References II
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., and Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540):529–533.
Pan, X., You, Y., Wang, Z., and Lu, C. (2017). Virtual to real reinforcement learning for autonomous driving.
Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2015). You only look once: Unified, real-time object detection.
Sallab, A. E., Abdou, M., Perot, E., and Yogamani, S. (2017). Deep reinforcement learning framework for autonomous driving. Electronic Imaging, 2017(19):70–76.
Sutton, R. S. and Barto, A. G. (2018). Reinforcement Learning: An Introduction. The MIT Press.
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 36 / 30
• 37. References – References III
Talpaert, V., Sobh, I., Kiran, B. R., Mannion, P., Yogamani, S., El-Sallab, A., and Perez, P. (2019). Exploring applications of deep reinforcement learning for real-world autonomous driving systems.
van Hasselt, H., Guez, A., and Silver, D. (2015). Deep reinforcement learning with double q-learning.
Yu, A., Palefsky-Smith, R., and Bedi, R. (2016). Deep reinforcement learning for simulated autonomous vehicle control. Course Project Reports: Winter, pages 1–7.
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 37 / 30
• 38. Autonomous Driving and Reinforcement Learning – an Introduction
Michael Bosello
michael.bosello@studio.unibo.it
Università di Bologna – Department of Computer Science and Engineering, Cesena, Italy
Smart Vehicular Systems A.Y. 2019/2020
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 38 / 30