Deep parking:
an implementation of automatic parking
with deep reinforcement learning
Shintaro Shiba, Feb. 2016 – Dec. 2016
Engineer Internship at Preferred Networks
Mentors: Abe-san, Fujita-san
1
About me
Shintaro Shiba
• Graduate student at the University of
Tokyo
– Major in neuroscience and animal behavior
• Part-time engineer (internship) at
Preferred Networks, Inc.
– Blog post URL: https://research.preferred.jp/2017/03/deep-parking/
2
Contents
• Original Idea
• Background: DQN and Double-DQN
• Task definition
– Environment: car simulator
– Agents
1. Coordinate
2. Bird's-eye view
3. Subjective view
• Discussion
• Summary
3
Achievement
(Figure: trajectory of the car agent, and the subjective views at 0 deg, -120 deg, and +120 deg used as input for DQN)
4
Original Idea: DQN for parking
https://research.preferred.jp/2016/01/ces2016/
https://research.preferred.jp/2015/06/distributed-deep-reinforcement-learning/
Succeeded in driving smoothly with DQN.
Input: 32 virtual sensors, 3 previous actions + current speed and steering
Output: 9 actions
Is it possible for a car agent to learn to park itself, with camera images as input?
5
Reinforcement learning
(Diagram: the agent sends an action to the environment; the environment returns a state and a reward, which the learning algorithm uses to update the agent)
6
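In code, the interaction loop in this diagram looks roughly as follows; `env` and `agent` are hypothetical stand-ins for the car simulator and the DQN learner, not the project's actual interfaces.

```python
# Minimal sketch of the agent-environment loop, assuming hypothetical
# `env` (car simulator) and `agent` (DQN learner) objects.
def run_episode(env, agent, max_steps=500):
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)                     # agent chooses an action
        next_state, reward, done = env.step(action)   # environment reacts
        agent.observe(state, action, reward, next_state, done)  # learning update
        state = next_state
        total_reward += reward
        if done:  # goal reached or field out
            break
    return total_reward
```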
DQN: Deep Q-Network
Volodymyr Mnih et al. 2015
(Algorithm figure, annotated: for each episode >> for each action >> update the Q function)
7
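As a reminder of how the update works, here is a sketch of the standard DQN regression target (following Mnih et al. 2015, not this project's exact code); `q_target` is assumed to return the 9 Q-values of the frozen target network.

```python
import numpy as np

# DQN target: y = r + gamma * max_a' Q_target(s', a'), or y = r at episode end.
def dqn_target(reward, next_state, done, q_target, gamma=0.97):
    if done:
        return reward
    return reward + gamma * np.max(q_target(next_state))

# The network is then trained to minimize (y - Q(s, a))**2 on minibatches
# sampled from the replay memory.
```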
Double DQN
Preventing overestimation of Q values
Hado van Hasselt et al. 2015
8
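The only change relative to DQN is how the bootstrap value is computed: the online network selects the action and the target network evaluates it (van Hasselt et al. 2015). A sketch under the same assumptions as above:

```python
import numpy as np

def double_dqn_target(reward, next_state, done, q_online, q_target, gamma=0.97):
    if done:
        return reward
    a_star = int(np.argmax(q_online(next_state)))          # select with online net
    return reward + gamma * q_target(next_state)[a_star]   # evaluate with target net
```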
Reinforcement learning in this project
(Diagram: environment = car simulator; agent = a different sensor + a different neural network for each variant; the agent sends actions and receives state = sensor input, plus reward)
9
Environment:
Car simulator
Forces of …
• Traction
• Air resistance
• Rolling resistance
• Centrifugal force
• Brake
• Cornering force
F = F_traction + F_aero + F_rr + F_c + F_brake + F_cf
10
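A hedged sketch of the longitudinal part of this force model; the coefficients are illustrative placeholders, not the simulator's actual values, and the lateral terms (centrifugal and cornering force) are omitted.

```python
import numpy as np

def longitudinal_force(v, throttle, brake,
                       f_engine=3000.0, f_brake_max=5000.0,
                       c_drag=0.4257, c_rr=12.8):
    f_traction = throttle * f_engine          # driving force along heading
    f_aero = -c_drag * v * np.abs(v)          # air resistance, opposes motion
    f_rr = -c_rr * v                          # rolling resistance
    f_brake = -brake * f_brake_max * np.sign(v)
    return f_traction + f_aero + f_rr + f_brake

# Each simulator tick: a = F / mass, then integrate speed and position.
```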
Common specifications:
state, action, reward
Input (States)
– Features specific to each agent + car speed, car steering
Output (Actions)
– 9: accelerate, decelerate, steer right, steer left, throw (do
nothing), accelerate + steer right, accelerate + steer left,
decelerate + steer right, decelerate + steer left
Reward
– +1 when the car is in the goal
– -1 when the car is out of the field
– 0.01 - 0.01 * distance_to_goal otherwise (changed afterward)
Goal
– Car inside the goal region; no other conditions such as car direction
Terminate
– Time up: after 500 actions (changed to 450 afterward)
– Field out: Out of the field
11
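The original reward on the slide above, written out as a sketch; `distance_to_goal` is assumed to be scaled so the shaping term stays well below the terminal rewards.

```python
def reward_v1(in_goal, out_of_field, distance_to_goal):
    if in_goal:
        return 1.0
    if out_of_field:
        return -1.0
    # small shaping term that grows as the car approaches the goal
    return 0.01 - 0.01 * distance_to_goal
```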
Common specifications:
hyperparameters
Maximum episodes: 50,000
Gamma: 0.97
Optimizer: RMSpropGraves
– lr=0.00015, alpha=0.95, momentum=0.95,
eps=0.01
– changed afterward: lr=0.00015, alpha=0.95,
momentum=0, eps=0.01
Batch size: 50 or 64
Epsilon: decreased linearly from 1.0 at the start to a final 0.1
12
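The epsilon schedule implied by the last two lines, as a sketch; the number of decay steps is an assumption, since the slides do not state it.

```python
def epsilon(step, start=1.0, end=0.1, decay_steps=1_000_000):
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)   # 1.0 -> 0.1, then held at 0.1
```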
Agents
1. Coordinate
2. Bird’s-eye view
3. Subjective view
– Three cameras
– Four cameras
13
Coordinate agent
Input features
– Relative coordinates from the car to the goal
(Figure: car and goal on the field; e.g. goal at (80, 300))
input shape: (2, ), normalized
14
Coordinate agent
Neural Network
– fully connected layers only (3)
(Network diagram: coordinates (2) and car parameters (2) as inputs, hidden layers of 64 and 64 units, output = n of actions (9))
15
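A NumPy sketch of this network's forward pass under the shapes on the slide (2 coordinates + 2 car parameters in, 64-64 hidden, 9 Q-values out); the ReLU activations and the random placeholder weights are assumptions, since the original was a trained Chainer model.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = 0.1 * rng.normal(size=(4, 64)), np.zeros(64)
W2, b2 = 0.1 * rng.normal(size=(64, 64)), np.zeros(64)
W3, b3 = 0.1 * rng.normal(size=(64, 9)), np.zeros(9)

def q_values(coords, car_params):
    x = np.concatenate([coords, car_params])  # shape (4,)
    h = np.maximum(x @ W1 + b1, 0.0)          # FC 64 + ReLU
    h = np.maximum(h @ W2 + b2, 0.0)          # FC 64 + ReLU
    return h @ W3 + b3                        # 9 Q-values
```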
Coordinate agent
Result
16
Bird’s-eye view agent
Input features
– Bird’s-eye image of the whole field
input size: 80 x 80
normalized
17
Bird’s-eye view agent
Neural Network
(Network diagram: 80 x 80 input image → Conv feature maps (128, 192) → fully connected layers (400, 64), with the car parameters (2) joined in → n of actions)
18
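A sketch of a model in this spirit, written with Chainer (which the project used); the kernel sizes, strides, activations, and input channel count are assumptions, since only the layer widths are recoverable from the slide.

```python
import chainer
import chainer.functions as F
import chainer.links as L

class BirdsEyeQNet(chainer.Chain):
    def __init__(self, n_actions=9):
        super().__init__()
        with self.init_scope():
            # lazy input sizes (None): inferred on the first forward pass
            self.conv1 = L.Convolution2D(None, 128, ksize=8, stride=4)
            self.conv2 = L.Convolution2D(None, 192, ksize=4, stride=2)
            self.fc1 = L.Linear(None, 400)
            self.fc2 = L.Linear(None, 64)
            self.out = L.Linear(64, n_actions)

    def __call__(self, img, car_params):   # img: (B, C, 80, 80)
        h = F.relu(self.conv1(img))
        h = F.relu(self.conv2(h))
        h = F.reshape(h, (h.shape[0], -1))
        h = F.concat((h, car_params))       # join the 2 car parameters
        h = F.relu(self.fc1(h))
        h = F.relu(self.fc2(h))
        return self.out(h)                  # one Q-value per action
```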
Bird’s-eye view agent
Result: 18k episodes
20
Bird’s-eye view agent
Result: after 18k episodes?
But we had already spent about six months on this agent, so we moved on to the next one…
21
Subjective view agent
Input features
– N camera images of the subjective view from the car
– Number of cameras: three or four
– FoV = 120 deg
(Figure: camera placement, and example input images for the four-camera agent: front +0, right +90, back +180, left +270 deg)
22
Subjective view agent
Neural Network
(Network diagram: 80 x 80 camera images → Conv feature maps (200 x 3) → fully connected layers (400, 256, 64), with the car parameters (2) joined in → n of actions)
23
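How the camera images are assembled into a single network input is not fully recoverable from the slide (the "200 x 3" feature maps suggest a per-camera tower); one plausible reading, stacking the frames along the channel axis, is sketched below.

```python
import numpy as np

def stack_cameras(frames):
    """frames: list of (80, 80, 3) uint8 camera images (3 or 4 of them)."""
    x = np.concatenate(frames, axis=2)                    # (80, 80, 3*N)
    x = x.transpose(2, 0, 1).astype(np.float32) / 255.0   # (3*N, 80, 80), normalized
    return x[None]                                        # add batch dimension
```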
Subjective view agent
Problem
– Calculation time (GeForce GTX TITAN X)
• At first: 3 [min/ep] x 50k [ep] = 100 days
• After review by Abe-san: 1.6 [min/ep] x 50k [ep] = 55 days
– The bottleneck was copying and synchronization between GPU and CPU
– Learning was interrupted whenever the DNN output diverged
– (Fortunately) the agent "learned" to reach the goal within ~10k episodes in some trials
– Memory usage
• In DQN, we need to store the 1M most recent inputs
– 1M x (80 x 80 x 3 ch x 4 cameras)
• Solution: save images to disk and read them back each time
25
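A sketch of the disk-backed replay memory described in the last bullet; the file layout and naming are assumptions, and a real implementation would also keep enough indexing to reconstruct (s, a, r, s') pairs.

```python
import os
import random
import numpy as np

class DiskReplayBuffer:
    """Keeps the replay memory on disk; images are loaded only when sampled."""
    def __init__(self, root, capacity=1_000_000):
        self.root, self.capacity = root, capacity
        self.size, self.head = 0, 0
        os.makedirs(root, exist_ok=True)

    def add(self, obs, action, reward, done):
        path = os.path.join(self.root, f"{self.head}.npz")
        np.savez(path, obs=obs, action=action, reward=reward, done=done)
        self.head = (self.head + 1) % self.capacity   # overwrite oldest entries
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size=64):
        idx = random.sample(range(self.size), batch_size)
        return [np.load(os.path.join(self.root, f"{i}.npz")) for i in idx]
```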
Subjective view agent
Result: three cameras, 6k episodes
(Figure: trajectory of the car agent, and the subjective views at 0 deg, -120 deg, +120 deg used as input for DQN)
26
Subjective view agent
Result: three cameras, 50k episodes
The policy seems to be "just keep moving"? >> revisit the reward setting
Does not seem to reach the goal every time; only "easy" goals are achieved >> vary the task difficulty (curriculum)
(Figure annotation: frequent goals here)
27
Subjective view agent
Four cameras at 30k episodes
28
Modify reward
Previous
– +1 when the car is in the goal
– -1 when the car is out of the field
– 0.01 - 0.01 * distance_to_goal otherwise
New
– +1 - speed when the car is in the goal
• in order to make the car stop
– -1 when the car is out of the field
– -0.005 otherwise
29
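The modified reward, written out as a sketch; `speed` is assumed to be normalized so the goal reward stays positive when the car is nearly stopped.

```python
def reward_v2(in_goal, out_of_field, speed):
    if in_goal:
        return 1.0 - speed   # full reward only when stopped inside the goal
    if out_of_field:
        return -1.0
    return -0.005            # constant per-step cost replaces distance shaping
```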
Modify difficulty
Difficulty: initial car direction & position
– Constraint
• Car always starts near the middle of the field
• Car always starts facing the center: direction within ±π/4
– Curriculum
• Car direction: within ±(π/12)·n, where n = curriculum level
• Criterion to advance: mean reward over 100 episodes ≥ 0.6
(Figure: goal region and start directions for curriculum levels n = 1, n = 2)
30
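A sketch of this curriculum; the widening range ±(π/12)·n and the advancement rule follow the slide, while the uniform sampling of the start direction is an assumption.

```python
import math
import random

def initial_direction(n):
    """Start heading offset (toward the center) for curriculum level n."""
    half_width = math.pi / 12 * n
    return random.uniform(-half_width, half_width)

def maybe_advance(level, recent_rewards):
    """Advance when the mean reward over the last 100 episodes reaches 0.6."""
    if len(recent_rewards) >= 100 and sum(recent_rewards[-100:]) / 100 >= 0.6:
        return level + 1
    return level
```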
Subjective view agent:
modifications
N cameras | Reward | Difficulty | Learning result
3 | default | default | ~6k: o; 50k: x
3 | modified | default | ~16k: o
3 | modified | constraint | ? (still learning)
3 | modified | curriculum | o (though still at curriculum level 1)
4 | default | default | x
4 | modified | curriculum | △ (not bad, but not yet successful at 6k)
31
Subjective view agent:
modifications
Curriculum + three cameras
At curriculum level 1: the advancement criterion needs to be modified
(Plots: mean reward (0.0–1.0) and reward sum (0–500) vs. episode number, 0–20k)
32
Discussion
1. The initial settings included situations where the car cannot reach the goal
– e.g. starting toward the edge of the field
– This made learning unstable
2. Why was the coordinate agent nevertheless successful,
– even though such situations could occur there as well?
33
Discussion
3. Comparison of three and four cameras
– Considering success rate and execution time, three cameras are better
– Why was the four-camera agent not successful?
– Would it need several trials?
4. DQN often diverged
– roughly one run in three, as a personal impression
• slightly more often with four cameras
– Shows the importance of the dataset for learning
• memory size, batch size
34
Discussion
5. Curriculum
– Ideally it would be better to quantify the "difficulty of the task"
• In this case, it may be roughly represented by the "bias of the distribution" of the selected actions:
accelerate
decelerate
throw (do nothing)
steer right
steer left
accelerate + steer right
accelerate + steer left
decelerate + steer right
decelerate + steer left
equal counts for each action >> go straight
biased distribution of selected actions >> go right/left
35
Summary
• The car agent can park itself using subjective camera views, though learning is not always stable
• Trade-off between reward design and learning difficulty
– Simple reward: difficult to learn
• Try other algorithms such as A3C
– Complex reward: difficult to design
• e.g. other settings for distance_to_goal
36
Editor's Notes
1. Should the learning rate be made even smaller? A3C
2. Plot the average rather than raw lines, or plot points? TRPO