RL for Self-Driving Cars
Under the guidance of Prof. Srikanth Krishnamurthy
Sneha Ravikumar
Dhanshri More
Shweta Srinivasan
Components of Self-Driving Cars
● Obstacle detection using sensors to detect the presence of obstacles and update the game state
● Implemented in a Pygame environment
● The second component is an implementation of the Deep Deterministic Policy Gradient (DDPG) algorithm to control
acceleration and braking in self-driving cars.
● Implemented using a custom-built Gym-TORCS environment
Obstacle detection
● The goal here is to identify the presence of obstacles in the path of the car using reinforcement learning trained on
a neural network.
● This is enabled with the help of sensors that track the obstacles and feed a reward-based system in which the car
learns to maneuver around such obstacles and change course when obstacles are encountered.
● The obstacles and the car are represented in a Pygame environment.
● Pymunk is the physics engine used by the simulation. Pymunk and Pygame together render the environment in which the
game runs, with the car detecting the obstacles.
Installation Dependencies
● Python3
● Keras with Theano backend
● Numpy, h5py
● Pygame
● Pymunk 4
Sensors
● The sensors used in this game are a set of sonar readings, each of which returns a distance for the sonar being simulated.
● There are 3 sonars in this environment: one at the center and one on either side of the center at an angle of 45 degrees.
● Instead of a grid of boolean sensors, the sonar reading returns "N" distance readings, one for each sonar we are
simulating.
● The distance is effectively the index of the first non-zero reading along a sonar arm, i.e. how far the arm reaches before it hits an object.
● In simple words, the sensor input is a reading of three distances from the car to any object it detects.
● At any given point in time, a distance of 1 indicates an obstacle in the immediate vicinity of the car.
● These sensors are updated and the distances recomputed for every frame from the current state of the car.
Key components
● Game environment that renders the screen, the car and the obstacles. It also manages the speed, direction and
control of the car.
● Learning component of the game, where the heart of the Q-learning process resides.
● Neural network model that is trained on the sensor input and outputs the action for the current game state.
Game Environment
● The car automatically moves itself forward, faster as the game progresses. If it runs into a wall or an obstacle, the
game ends.
● There are three available actions at each frame: turn left (0), turn right (1), do nothing (2).
● At every frame, the game returns both a state and a reward.
● The state is a one-dimensional array of the sensor values described above.
● The reward is -500 if the car runs into something, and the average of the sensor readings if it does not. The lower the
sum of the sensor readings, the farther the car is from running into an obstacle.
Neural Network
● Input data for the neural network is the readings from the three sensors.
● An input layer of 3 units (because there are 3 different sensors).
● 2 hidden layers of 164 and 150 units; each hidden layer is followed by a dropout layer with dropout rate 0.2 to avoid
overfitting.
● An output layer of 3 units, one for each of our possible actions (left, right, do nothing), in that order.
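A minimal Keras sketch of this network (the layer sizes and dropout rate come from the slide; the activations, loss and optimizer shown here are assumptions):

from keras.models import Sequential
from keras.layers import Dense, Dropout

def build_q_network():
    # 3 sensor inputs -> 164 -> 150 -> 3 Q values (left, right, do nothing)
    model = Sequential()
    model.add(Dense(164, input_dim=3, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(150, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(3, activation='linear'))   # one Q value per action
    model.compile(loss='mse', optimizer='rmsprop')
    return model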
Reasons for choosing this architecture
Google DeepMind's work on Atari explains in detail the advantages of this architecture over the traditional architecture for
Q-learning problems.
Traditional approach:
The input is the state and an action, and the output is the value of that single state-action pair.
DeepMind approach:
The input is the state, and the output layer gives a separate Q value for each possible action.
In Q-learning we need maxQ(S', A'), the maximum of the Q values over every action in the new state. Instead of running the
network forward once per action, we only run it forward once.
Implementation
● Move the car forward one frame once the game starts.
● Get the sensor readings.
● Based on the readings, predict Q values that express the car's confidence in taking each of the three actions.
● Use an epsilon-greedy strategy, exploring with a random action about 10% of the time.
● Execute the action and get a new sensor reading and a reward.
● Store the original reading, the action we took, the reward and the new reading in a buffer.
● Randomly sample this buffer to generate the training data fed to the neural network.
● Set the y values for the iteration to the prediction based on the original reading.
● Make a new prediction based on the new reading.
● Observe the reward: -500 indicates a crash, so set the y value for the taken action to -500. Otherwise, set it to the
reward plus gamma times the maximum predicted Q value of the new state. A condensed sketch of this loop follows below.
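A condensed sketch of the loop described above, assuming the Q-network sketched earlier and a plain Python list as the replay buffer; the discount factor and batch size are assumed values:

import random
import numpy as np

GAMMA = 0.9       # discount factor (assumed value)
EPSILON = 0.1     # explore with a random action ~10% of the time

def choose_action(model, state):
    if random.random() < EPSILON:
        return random.randint(0, 2)                                # explore
    return int(np.argmax(model.predict(state.reshape(1, 3))[0]))   # exploit

def train_on_replay(model, replay_buffer, batch_size=64):
    # Randomly sample (state, action, reward, new_state) tuples from the buffer
    minibatch = random.sample(replay_buffer, batch_size)
    X, y = [], []
    for state, action, reward, new_state in minibatch:
        target = model.predict(state.reshape(1, 3))[0]      # prediction on the original reading
        new_q = model.predict(new_state.reshape(1, 3))[0]   # prediction on the new reading
        if reward == -500:                                  # crash: terminal target
            target[action] = -500
        else:                                               # otherwise bootstrap on max Q(s', a')
            target[action] = reward + GAMMA * np.max(new_q)
        X.append(state)
        y.append(target)
    model.train_on_batch(np.array(X), np.array(y))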
Dive in deeper: sensor arms, rewards, results (figure slides)
Building Gym Environment for TORCS
Software, Framework, Environment
1. Python 2.7
2. Keras 1.1.0
3. Tensorflow r0.10
4. xautomation (http://linux.die.net/man/7/xautomation)
5. OpenAI-Gym (https://github.com/openai/gym)
6. numpy
7. vtorcs-RL-color
8. plib 1.8.5, FreeGLUT
TORCS
TORCS, The Open Racing Car Simulator, is a highly portable, multi-platform car racing simulation.
It is used as an ordinary car racing game, as an AI racing game and as a research platform. It runs on
Linux (x86, AMD64 and PPC), FreeBSD, OpenSolaris, MacOSX and Windows.
Why TORCS?
You can visualize how the neural network learns over time and inspect its learning process,
rather than just looking at the final result.
TORCS helps us simulate and understand machine learning techniques in automated driving,
which is important for self-driving car technologies.
Gym Torcs
OpenAI Gym is a toolkit for building reinforcement learning (RL) algorithms.
Gym doesn't ship an environment for TORCS, so the process starts with building the environment, defining the rewards,
and then training the agent through reinforcement learning.
There are three pieces needed to have this agent running:
A server for TORCS
A client for TORCS
An environment, built like other Gym environments, that returns observations and rewards based on the agent's state.
Server and Client
v-Torcs
This is an all-in-one package of TORCS.
The link below gives a complete overview of how it can be installed and set up on a Linux machine.
https://github.com/giuse/vtorcs
It captures various sensor information that can be used to train the agent once we build the environment.
SnakeOil
SnakeOil is a Python library for interfacing with the TORCS race car simulator.
Using it is as simple as creating the client and implementing a custom drive function (see the sketch below).
The drive function involves only the mechanics of driving the car, not the client-server implementation.
More about SnakeOil
These objects contain a member dictionary "d" (for data dictionary) which contains key-value pairs based on the
server's syntax.
We can read the following:
angle, curLapTime, damage, distFromStart, distRaced, focus,
fuel, gear, lastLapTime, opponents, racePos, rpm,
speedX, speedY, speedZ, track, trackPos, wheelSpinVel
We can set the following:
accel, brake, clutch, gear, steer, focus, meta
https://www.youtube.com/watch?v=Bg4t16TVXew#action=share
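Roughly, a SnakeOil client looks like the sketch below, based on the library's example usage; the steering and throttle rules inside drive() are purely illustrative:

import snakeoil

def drive(c):
    S, R = c.S.d, c.R.d                       # sensor and control dictionaries
    R['steer'] = S['angle'] * 10 / 3.1416     # point back towards the track axis (illustrative)
    R['accel'] = 0.2 if S['speedX'] < 100 else 0.0
    R['gear'] = 1

if __name__ == '__main__':
    C = snakeoil.Client()
    for step in range(C.maxSteps, 0, -1):
        C.get_servers_input()    # receive the latest sensor packet from the TORCS server
        drive(C)                 # fill in the control dictionary
        C.respond_to_server()    # send the controls back
    C.shutdown()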
Defining Step() function for Gym Environment
The environment's step function returns exactly what we need. It returns four
values:
Observation
Reward
Done
Info
We have written functions to map the dictionary of values we get from the
client to the Gym environment, as sketched below.
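A simplified sketch of such a step() function; make_observation and compute_reward stand in for the mapping functions we wrote, and the termination condition shown is only an example:

def step(self, action):
    # Forward the agent's action to TORCS through the SnakeOil client
    self.client.R.d['steer'] = float(action[0])
    self.client.R.d['accel'] = float(action[1])
    self.client.R.d['brake'] = float(action[2])
    self.client.respond_to_server()

    # Read the next sensor packet and map it into Gym-style return values
    self.client.get_servers_input()
    sensors = self.client.S.d
    obs = self.make_observation(sensors)      # hypothetical mapping helper
    reward = self.compute_reward(sensors)     # hypothetical reward helper
    done = sensors['damage'] > 0              # e.g. end the episode on damage (assumption)
    info = {}
    return obs, reward, done, info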
Design of the rewards
If the reward is simply the raw speed, the AI will try to push the gas pedal very hard (to get maximum reward), hit the
edge of the track, and the episode terminates very quickly. The neural network then gets stuck in a very poor local minimum.
Instead, we want to maximize the longitudinal velocity, minimize the transverse velocity, and also penalize the AI if it
constantly drives far off the center of the track.
We found that the new reward function greatly improves the stability and the learning time in TORCS.
Acceleration and Brake Control with RL
Objective
Implement the Deep Deterministic Policy Gradient (DDPG) algorithm to control acceleration and braking in self-driving cars.
Choice of algorithm:
DQN solves problems with high-dimensional observation spaces, but it can only
handle discrete and low-dimensional action spaces.
DQN cannot be straightforwardly applied to continuous domains.
An obvious way to adapt DQN to continuous domains is to simply discretize the
action space, but this results in an explosion of the number of actions (see the discretization example in the editor's notes).
Solution: Google DeepMind developed the DDPG algorithm to tackle the continuous
action space problem.
DDPG algorithm:
Google DeepMind developed a new algorithm to tackle the continuous action space problem by
combining three techniques:
1. Deterministic Policy-Gradient Algorithms
2. Actor-Critic Methods
3. Deep Q-Network
DDPG is a policy gradient algorithm that uses a stochastic behavior policy for good exploration but
estimates a deterministic target policy, which is much easier to learn.
Policy gradient algorithms utilize a form of policy iteration: they evaluate the policy, and then follow the
policy gradient to maximize performance.
Self driving reinforcement learning in Torcs
The code receives the sensor input as an array from the gym_torcs environment.
Input: the network takes the game state, i.e. speedX, speedY, the angle between the car and
the track, the position of the car on the track, and so on, as explained earlier.
The sensor input is fed into our neural network, and the network outputs 3
real numbers (the values of the steering, acceleration and brake).
The network is trained many times, via the Deep Deterministic Policy Gradient,
to maximize the expected future reward.
Output: the action, such as steer left or right, hit the gas pedal or hit the brake.
Self driving in Torcs environment
Policy objective function:
A reinforcement learning technique can be used to find the policy πθ(s, a) that maximizes the
total discounted future reward.
An intuitive policy objective function is the expectation of the total discounted
reward,
where the expectation of the total reward R is calculated under some probability
distribution p(x∣θ) parameterized by θ.
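In standard notation, this objective is usually written as follows (a reconstruction of the usual form; the exact notation on the original slide may differ):

L(\theta) = \mathbb{E}\left[ r_1 + \gamma r_2 + \gamma^2 r_3 + \cdots \mid \pi_\theta(s, a) \right]
          = \mathbb{E}_{x \sim p(x \mid \theta)}\left[ R \right]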
Actor-Critic Algorithm
The Actor-Critic algorithm is essentially
a hybrid method that combines the policy
gradient method and the value function
method.
Actor: policy function
Critic: value function
Essentially, the actor produces the action
given the current state of the
environment s, while the critic produces a
signal that criticizes the actions made by the
actor.
Actor Network
We used 2 hidden layers with 300 and 600
hidden units respectively.
The output consists of 3 continuous actions:
1. Steering: a single unit with a tanh
activation function (where -1 means maximum
right turn and +1 means maximum left turn)
2. Acceleration: a single unit with a
sigmoid activation function (where 0
means no gas and 1 means full gas)
3. Brake: another single unit with a sigmoid
activation function (where 0 means no
brake and 1 means full brake)
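A Keras 1.x sketch of this actor; the 300/600 layer sizes and the output activations come from the slide, while the hidden-layer activations are assumptions:

from keras.models import Model
from keras.layers import Input, Dense, merge

def build_actor(state_dim):
    S = Input(shape=(state_dim,))
    h0 = Dense(300, activation='relu')(S)
    h1 = Dense(600, activation='relu')(h0)
    steering = Dense(1, activation='tanh')(h1)         # -1 (max right) .. +1 (max left)
    acceleration = Dense(1, activation='sigmoid')(h1)  # 0 (no gas) .. 1 (full gas)
    brake = Dense(1, activation='sigmoid')(h1)         # 0 (no brake) .. 1 (full brake)
    V = merge([steering, acceleration, brake], mode='concat')
    return Model(input=S, output=V)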
Critic Network
The critic network takes both the state and
the action as inputs.
Following the DDPG paper, the actions
are not included until the 2nd hidden
layer of the Q-network.
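A matching Keras 1.x sketch of the critic; bringing the action in only at the second hidden layer follows the slide, while the exact layer sizes here are assumptions:

from keras.models import Model
from keras.layers import Input, Dense, merge

def build_critic(state_dim, action_dim):
    S = Input(shape=(state_dim,))
    A = Input(shape=(action_dim,))
    s1 = Dense(300, activation='relu')(S)      # states only in the first hidden layer
    s2 = Dense(600, activation='linear')(s1)
    a1 = Dense(600, activation='linear')(A)    # actions enter at the second hidden layer
    h = merge([s2, a1], mode='sum')
    h = Dense(600, activation='relu')(h)
    Q = Dense(1, activation='linear')(h)       # scalar Q(s, a)
    return Model(input=[S, A], output=Q)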
Target Network
1. Directly implementing Q-learning with neural networks is unstable in environments like
TORCS.
2. The DeepMind team's solution to this problem is to use target networks: we create a copy of
the actor and critic networks that is used for calculating the target values. The weights of
these target networks are then updated by having them slowly track the learned networks:
θ' ← τθ + (1 − τ)θ', where τ ≪ 1 (here 0.0001).
3. This means that the target values are constrained to change slowly, greatly improving the
stability of learning.
Policy
Now we can feed the inputs above into the neural network:

for j in range(max_steps):
    # Predict the continuous action (steer, accel, brake) for the current state s_t
    a_t = actor.model.predict(s_t.reshape(1, s_t.shape[0]))
    # Apply it in the TORCS environment and observe the new observation and reward
    ob, r_t, done, info = env.step(a_t[0])
Design of the rewards
If the reward is simply the raw speed, the AI will try to push the gas pedal very hard (to get maximum reward), hit the
edge of the track, and the episode terminates very quickly. The neural network then gets stuck in a very poor local minimum.
Instead, we want to maximize the longitudinal velocity, minimize the transverse velocity, and also penalize the AI if it
constantly drives far off the center of the track.
We found that the new reward function greatly improves the stability and the learning time in TORCS.
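A sketch of such a reward, written with the SnakeOil sensor names listed earlier; the equal weighting of the three terms is an assumption:

import numpy as np

def compute_reward(sensors):
    # Reward progress along the track axis, penalize sideways speed and
    # distance from the track center line.
    vx = sensors['speedX']
    angle = sensors['angle']
    track_pos = sensors['trackPos']
    return vx * np.cos(angle) - np.abs(vx * np.sin(angle)) - vx * np.abs(track_pos)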
Design of the exploration algorithm
We have used an ϵ-greedy policy in RL problems like Pac-Man or Atari Breakout, where
the agent tries a random action some percentage of the time.
That approach does not work well in TORCS because we have 3 actions
(steering, acceleration, brake). If we simply pick the action values from a uniform
random distribution, we generate combinations such as the brake value being
greater than the acceleration value, and the car simply does not move.
Therefore, we add noise generated by an Ornstein-Uhlenbeck process to do the exploration (sketched below).
The Ornstein-Uhlenbeck process is a stochastic process which has mean-reverting
properties.
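A minimal Ornstein-Uhlenbeck noise sketch; the mean, attraction and volatility values suggested in the comments are illustrative only:

import numpy as np

def ou_noise(x, mu, theta, sigma):
    # Drift back towards the mean mu (mean reversion) plus Gaussian noise
    return theta * (mu - x) + sigma * np.random.randn()

# Example: push steering towards 0 and throttle towards ~0.5 during exploration
# noise_steer = ou_noise(a_t[0][0], mu=0.0, theta=0.6, sigma=0.3)
# noise_accel = ou_noise(a_t[0][1], mu=0.5, theta=1.0, sigma=0.1)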
Braking Mechanism
Training the AI to brake is much harder than steering or acceleration, because:
1. Braking decreases the reward.
2. The exploration phase can apply the brake and the accelerator at the same time.
3. The chances of getting stuck in a local minimum are higher.
Stochastic brake: this allows the AI agent to accelerate very fast on a straight line and brake
properly before a turn. We like this driving behavior because it is much closer to a human's.
Training
We first update the critic by minimizing the loss.
Then the actor policy is updated using the sampled policy gradient.
Then we update the target networks.
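For reference, the standard DDPG forms of this loss and gradient are (reconstructed from the DDPG paper, not copied from the slide):

y_i = r_i + \gamma \, Q'\!\left(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\right)

L = \frac{1}{N} \sum_i \left( y_i - Q(s_i, a_i \mid \theta^{Q}) \right)^2

\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_i \nabla_a Q(s, a \mid \theta^{Q})\big|_{s=s_i,\, a=\mu(s_i)} \; \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s_i}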
Conclusion
We implemented obstacle detection using reinforcement learning in a Pygame
environment.
We implemented the Deep Deterministic Policy Gradient algorithm to train a car to
drive in the TORCS environment.
We used an Ornstein-Uhlenbeck process to perform the exploration. This helped
stabilize the policy in a continuous domain like driving a vehicle.
Future Work:
Build all the modules needed to deploy the RL model.
We have successfully built the obstacle detection module. We aim to build a module
to detect the exact position of the car on the road, and road edge detection to get the angle
between the car and the road.
Questions?

Editor's Notes

  1. The readings return the current state of the car.
  2. For example, if the steering wheel angle runs from -90 to +90 degrees discretized in 5-degree steps, and the speed from 0 km/h to 200 km/h in 5 km/h steps, the output has 36 steering states times 40 velocity states, i.e. 1440 possible combinations.
  3. DQN is able to learn value functions using such function approximators in a stable and robust way due to two innovations: (1) the network is trained off-policy with samples from a replay buffer to minimize correlations between samples; (2) the network is trained with a target Q network to give consistent targets during temporal-difference backups.