DeepVO - Towards Visual Odometry with Deep Learning


DeepVO
Towards End-to-End Visual Odometry with Deep Recurrent Convolutional Neural Networks
National Chung Cheng University, Taiwan
Robot Vision Laboratory
2017/11/08
Jacky Liu

About this work
DeepVO : Towards Visual Odometry with Deep Learning
Sen Wang1,2, Ronald Clark2, Hongkai Wen2 and Niki Trigoni2
1. Edinburgh Centre for Robotics, Heriot-Watt University, UK
2. University of Oxford, UK
Download this paper: http://senwang.gitlab.io/DeepVO/#paper
Watch video: http://senwang.gitlab.io/DeepVO/#video

Contributions
1. Shows that monocular VO can be built through end-to-end training
2. The RCNN architecture generalizes to unseen environments
3. Complex motion can be modeled by the RCNN

Related works
Visual odometry
• Geometric
    • Sparse
    • Direct
• Learning

Related works
Sparse
• PTAM
• ORB-SLAM
Direct
• DTAM
Network
• CNN
• RNN
• LSTM

Network design
1. Traditional computer vision tasks learn from appearance and image context.
2. Visual odometry should instead learn from geometry. This is what the RCNN tries to address.

Network design

Preprocessing
• Normalize the inputs (speeds up training) by subtracting the mean RGB values of the training set
• Resize images to a multiple of 64
• Stack two consecutive images to form a single tensor
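The three preprocessing steps can be sketched in NumPy as follows. The function names, the nearest-neighbour resize, and rounding each side down to a multiple of 64 are assumptions for illustration; the slide does not say which interpolation is used or how the target size is chosen.

```python
import numpy as np

def dataset_mean_rgb(frames):
    """Per-channel mean over a list of H x W x 3 training frames."""
    pixels = np.concatenate([f.reshape(-1, 3) for f in frames]).astype(np.float64)
    return pixels.mean(axis=0)

def nearest_resize(img, out_h, out_w):
    """Nearest-neighbour resize by integer index mapping (NumPy only)."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]

def preprocess_pair(frame_a, frame_b, mean_rgb, multiple=64):
    """Mean-subtract two consecutive frames, resize each side to a multiple of
    `multiple`, and stack them along the channel axis into one H x W x 6 tensor."""
    h, w = frame_a.shape[:2]
    out_h = max(multiple, (h // multiple) * multiple)  # round down, keep at least one block
    out_w = max(multiple, (w // multiple) * multiple)
    a = nearest_resize(frame_a, out_h, out_w).astype(np.float32) - mean_rgb
    b = nearest_resize(frame_b, out_h, out_w).astype(np.float32) - mean_rgb
    return np.concatenate([a, b], axis=-1).astype(np.float32)

# Usage with synthetic frames standing in for KITTI images
rng = np.random.default_rng(0)
train = [rng.integers(0, 256, size=(100, 130, 3), dtype=np.uint8) for _ in range(4)]
mean = dataset_mean_rgb(train)
x = preprocess_pair(train[0], train[1], mean)
print(x.shape)  # (64, 128, 6)
```

The first three channels hold the earlier frame and the last three the later one, so a single tensor carries both views into the CNN.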

CNN
• What does this work mean by learning “geometric” features?
=> Two consecutive RGB images are stacked and fed into the CNN, so the network is expected to extract features from the concatenation of two consecutive monocular frames rather than from the appearance of a single image.
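Feeding the stacked pair to a CNN can be illustrated with a single strided convolution in NumPy. This is not DeepVO's FlowNet-derived network: kernel size, stride, padding, and channel counts below are illustrative assumptions. The hand-set 1x1 filter in the usage example shows why stacking matters: every filter spans all six channels, so it can respond to differences between the two frames.

```python
import numpy as np

def conv2d(x, kernels, bias, stride=1, pad=0):
    """Cross-correlation as used in CNN layers.
    x: (H, W, C_in); kernels: (kh, kw, C_in, C_out); bias: (C_out,)."""
    kh, kw, c_in, c_out = kernels.shape
    if x.shape[2] != c_in:
        raise ValueError("input channels do not match kernel channels")
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))  # zero padding on H and W only
    out_h = (xp.shape[0] - kh) // stride + 1
    out_w = (xp.shape[1] - kw) // stride + 1
    out = np.empty((out_h, out_w, c_out))
    for r in range(out_h):
        for c in range(out_w):
            patch = xp[r * stride:r * stride + kh, c * stride:c * stride + kw, :]
            # Each output channel sums over kh x kw x C_in, i.e. over BOTH frames
            out[r, c] = np.tensordot(patch, kernels, axes=([0, 1, 2], [0, 1, 2])) + bias
    return out

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)
stacked = rng.standard_normal((64, 128, 6))       # two stacked RGB frames
kernels = rng.standard_normal((3, 3, 6, 8)) * 0.1  # hypothetical first-layer filters
features = relu(conv2d(stacked, kernels, np.zeros(8), stride=2, pad=1))
print(features.shape)  # (32, 64, 8)

# A hand-set 1x1 filter: red channel of the second frame minus red of the first
diff_kernel = np.zeros((1, 1, 6, 1))
diff_kernel[0, 0, 0, 0] = -1.0  # red channel, first frame
diff_kernel[0, 0, 3, 0] = 1.0   # red channel, second frame
motion = conv2d(stacked, diff_kernel, np.zeros(1))
```

A filter applied to a single image could never compute `motion`; a learned network can discover filters of this kind (and far richer ones) from the stacked input.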

RNN
• An RNN is not suited to learning sequential representations directly from high-dimensional raw data such as images.
• Hidden state: $h_k = \mathcal{H}(W_{xh} x_k + W_{hh} h_{k-1} + b_h)$
• Output: $y_k = W_{hy} h_k + b_y$
where $W$: weight matrix, $b$: bias vector, $k$: time index, $\mathcal{H}$: activation function
• Problem: vanishing gradients
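The hidden-state and output equations translate directly into a forward loop. This sketch assumes $\mathcal{H} = \tanh$ and uses made-up weights. To illustrate the vanishing-gradient problem it also tracks the norm of $\partial h_k / \partial h_0$, which for a tanh RNN is the product of per-step Jacobians $\mathrm{diag}(1 - h_j^2)\, W_{hh}$.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y, h0=None, act=np.tanh):
    """h_k = H(W_xh x_k + W_hh h_{k-1} + b_h); y_k = W_hy h_k + b_y."""
    h = np.zeros(W_hh.shape[0]) if h0 is None else h0
    hs, ys = [], []
    for x in xs:
        h = act(W_xh @ x + W_hh @ h + b_h)
        hs.append(h)
        ys.append(W_hy @ h + b_y)
    return np.array(hs), np.array(ys)

def jacobian_norms(hs, W_hh):
    """Spectral norm of d h_k / d h_0 for a tanh RNN, for k = 1..T."""
    J = np.eye(W_hh.shape[0])
    norms = []
    for h in hs:
        J = (np.diag(1.0 - h ** 2) @ W_hh) @ J  # chain rule, one step further back
        norms.append(np.linalg.norm(J, 2))
    return norms

rng = np.random.default_rng(0)
d, n, m, T = 6, 4, 6, 20
W_xh = rng.standard_normal((n, d)) * 0.5
W_hh = 0.5 * np.eye(n)  # recurrent weights with spectral norm 0.5
W_hy = rng.standard_normal((m, n))
hs, ys = rnn_forward(rng.standard_normal((T, d)), W_xh, W_hh, W_hy, np.zeros(n), np.zeros(m))
norms = jacobian_norms(hs, W_hh)
print(ys.shape)  # (20, 6)
print(norms[0], norms[-1])  # the gradient signal shrinks geometrically with k
```

Because each factor has norm at most 0.5 here, the influence of $h_0$ on $h_{20}$ is bounded by $0.5^{20}$; this is the motivation for the LSTM on the next slide.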

LSTM (Long short-term memory)
• Depth (stacking LSTM layers) is needed to learn high-level representations
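A standard LSTM cell and a stack of LSTM layers can be sketched in NumPy. Packing the input, forget, output, and candidate gates into one weight matrix is a common convention chosen here for brevity; the layer sizes in the usage example are arbitrary and are not DeepVO's. Stacking means each layer reads the hidden-state sequence produced by the layer below.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4n, d), U: (4n, n), b: (4n,); gate order i, f, o, g."""
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:n])          # input gate
    f = sigmoid(z[n:2 * n])      # forget gate
    o = sigmoid(z[2 * n:3 * n])  # output gate
    g = np.tanh(z[3 * n:4 * n])  # candidate cell update
    c = f * c_prev + i * g       # additive cell update eases gradient flow
    h = o * np.tanh(c)
    return h, c

def stacked_lstm(xs, layers):
    """layers: list of (W, U, b); each layer consumes the previous layer's outputs."""
    seq = list(xs)
    for W, U, b in layers:
        n = U.shape[1]
        h, c = np.zeros(n), np.zeros(n)
        out = []
        for x in seq:
            h, c = lstm_step(x, h, c, W, U, b)
            out.append(h)
        seq = out
    return np.array(seq)

rng = np.random.default_rng(0)

def init_layer(d_in, n):
    """Hypothetical small random initialization for one LSTM layer."""
    return (rng.standard_normal((4 * n, d_in)) * 0.1,
            rng.standard_normal((4 * n, n)) * 0.1,
            np.zeros(4 * n))

features = rng.standard_normal((5, 6))  # 5 time steps of (hypothetical) CNN features
out = stacked_lstm(features, [init_layer(6, 4), init_layer(4, 3)])
print(out.shape)  # (5, 3)
```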


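Cost function
Conditional probability of the poses $Y_t$ given the image sequence $X_t$:
$$p(Y_t \mid X_t) = p(y_1, \dots, y_t \mid x_1, \dots, x_t)$$
Optimal parameters:
$$\theta^* = \arg\max_{\theta} \, p(Y_t \mid X_t; \theta)$$
In practice, with ground-truth pose $(p_k, \varphi_k)$ = (position, orientation), estimates $(\hat{p}_k, \hat{\varphi}_k)$, and a scale factor $\kappa$ balancing the two terms, the parameters are learned by minimizing the mean squared error over $N$ sequences:
$$\theta^* = \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{t} \lVert \hat{p}_k - p_k \rVert_2^2 + \kappa \lVert \hat{\varphi}_k - \varphi_k \rVert_2^2$$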
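The MSE cost takes a few lines of NumPy. Assumptions: positions and orientations are each 3-vectors (for example orientation as Euler angles), batches are arrays of shape (N, t, 3), and κ = 100 in the usage example is an illustrative value, not one given on these slides.

```python
import numpy as np

def pose_loss(pos_pred, pos_true, ori_pred, ori_true, kappa):
    """(1/N) * sum_i sum_k ||p_hat_k - p_k||^2 + kappa * ||phi_hat_k - phi_k||^2.
    All inputs have shape (N, t, 3): N sequences of t time steps."""
    pos_err = np.sum((pos_pred - pos_true) ** 2, axis=-1)  # (N, t) squared L2 norms
    ori_err = np.sum((ori_pred - ori_true) ** 2, axis=-1)  # (N, t)
    n_seq = pos_pred.shape[0]
    return float(np.sum(pos_err + kappa * ori_err) / n_seq)

p_hat = np.array([[[1.0, 0.0, 0.0]]])  # N = 1 sequence, t = 1 step
p = np.zeros((1, 1, 3))
phi_hat = np.array([[[0.1, 0.0, 0.0]]])
phi = np.zeros((1, 1, 3))
loss = pose_loss(p_hat, p, phi_hat, phi, kappa=100.0)
print(loss)  # approximately 2.0 (1.0 + 100 * 0.01)
```

Without κ, an orientation error of 0.1 (radians) would contribute only 0.01 next to a 1.0 position error, so the network would largely ignore rotation; κ rescales the orientation term.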

Experimental results
[Figure: DeepVO vs. VISO2]

Training & testing
1. Dataset: KITTI VO/SLAM benchmark (22 image sequences, 10 fps, contains dynamic objects)
2. 7410 training samples (image/trajectory pairs)
3. Implemented in Theano
4. Hardware: NVIDIA Tesla K40 GPU
5. Trained for 200 epochs
6. Learning rate: 0.001
7. Regularization: dropout and early stopping
8. CNN: transfer learning from FlowNet
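Of the regularizers listed, early stopping is the easiest to show in isolation. The sketch below is a generic patience-based rule, not the authors' Theano code; the patience value and the simulated validation losses are hypothetical. In real training each loss would come from evaluating the network on held-out sequences after an epoch.

```python
class EarlyStopping:
    """Stop once the validation loss has not improved for `patience` epochs in a row."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best, self.best_epoch, self.bad_epochs = val_loss, epoch, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

def run_with_early_stopping(val_losses, max_epochs=200, patience=10):
    """Replay a list of per-epoch validation losses; return (stop_epoch, best_epoch)."""
    stopper = EarlyStopping(patience=patience)
    epochs = val_losses[:max_epochs]  # cap at the 200-epoch budget
    for epoch, loss in enumerate(epochs):
        if stopper.step(epoch, loss):
            return epoch, stopper.best_epoch
    return len(epochs) - 1, stopper.best_epoch

simulated_val_losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.6]
stopped_at, best = run_with_early_stopping(simulated_val_losses, patience=3)
print(stopped_at, best)  # 5 2
```

The weights saved at `best_epoch` would be the ones kept, which is what limits the overfitting discussed on the next slide.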

Overfitting
• Orientation is more prone to overfitting than position

Comparison with traditional VO
• Baseline: the open-source VO library LIBVISO2
• Both monocular and stereo versions

Trajectory (1/2)

Trajectory (2/2)
• No ground truth: sequences 11-19


Dynamic objects
• This work does not address how to handle dynamic objects
• Traditional VO: RANSAC (removes outliers)
• Possible remedy: get more training data
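To illustrate how RANSAC removes outliers such as feature matches on moving objects, here is a toy version that estimates a pure 2D image translation from point matches. Real VO pipelines typically fit an essential matrix or a full camera pose inside RANSAC; the translation-only model, threshold, and iteration count are simplifying assumptions that keep the example short.

```python
import numpy as np

def ransac_translation(src, dst, iters=200, thresh=1.0, seed=0):
    """Estimate t with dst ~= src + t from matches containing outliers.
    Minimal sample = 1 match; the final t is refit on all inliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        j = rng.integers(len(src))
        t = dst[j] - src[j]  # hypothesis from one randomly chosen match
        residuals = np.linalg.norm(dst - (src + t), axis=1)
        inliers = residuals < thresh
        if inliers.sum() > best_inliers.sum():  # keep the largest consensus set
            best_inliers = inliers
    t_refit = (dst[best_inliers] - src[best_inliers]).mean(axis=0)
    return t_refit, best_inliers

rng = np.random.default_rng(1)
src = rng.uniform(0, 100, size=(50, 2))
dst = src + np.array([3.0, -2.0])              # motion of the static scene
dst[:15] += rng.uniform(20, 50, size=(15, 2))  # 15 matches on "moving objects"
t, inliers = ransac_translation(src, dst)
print(t, inliers.sum())  # expected: t close to (3, -2) and 35 inliers
```

An end-to-end network has no such explicit consensus step, which is why dynamic objects remain an open issue for this approach.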

Conclusion
• End-to-end monocular VO based on deep learning
• Deep RCNN
• No need to carefully tune the parameters of the VO system
• Not intended as a replacement for classic geometry-based approaches
