Fast Multi-frame Stereo Scene Flow
with Motion Segmentation
Tatsunori Taniai*
RIKEN AIP
Sudipta N. Sinha
Microsoft Research
Yoichi Sato
The University of Tokyo
CVPR 2017 Paper
* Work done during internship at Microsoft Research and partly at the University of Tokyo.
Contributions
• New unified framework
– Stereo (depth / disparity)
– Optical flow (2D motion field)
– Motion segmentation (binary mask of moving objects)
– Visual odometry (6 DoF camera ego-motion)
In our framework
• Result of each task benefits others, leading to higher accuracy and efficiency
• Joint task is decomposed into simple optimization problems (in contrast to
existing joint methods)
Results
• Accurate: ranked 3rd on the KITTI benchmark
• Fast: 10-1000x faster than state-of-the-art methods
Scene Flow: Problem Definition

[Figure: a 3D point X_t = (x_t, y_t, z_t) projects to corresponding pixels p and p′ in the stereo pair I_t^0 (left) and I_t^1 (right).]

Stereo disparity: a 1D horizontal translation between p and p′, determined by the object depth z.
Scene Flow: Problem Definition

[Figure: between frames t and t+1, the 3D point X_t moves to X_{t+1}; its projection p in I_t^0 moves to p′ in I_{t+1}^0 (images I_t^0, I_t^1, I_{t+1}^0, I_{t+1}^1).]

Optical flow: a 2D translation caused by both camera motion and object motion.
Scene Flow: Problem Definition

[Figure: stereo disparity (1D horizontal translation by object depth z) links the two views of X_t and X_{t+1}; optical flow (2D translation by camera and object motions) links I_t^0 to I_{t+1}^0.]

Together, disparity and optical flow implicitly represent the 3D motions of points.
Applications
Autonomous driving
[Menze+ CVPR 15]
Action recognition
[Wang+ CVPR 11]
Depth and flow map sequences are useful in many applications
But optical flow estimation is VERY SLOW.
Overview
• Introduction
• Motivation
• Proposed method
• Experiments
Optical Flow vs Stereo

                 Optical flow                      Stereo matching
Search space     2D translation                    1D translation
Motion factor    Object motion, ego-motion, etc.   Object depth

Optical flow is much more difficult and expensive than stereo.
Dominant Rigid Scene Assumption
Most of the points are static.
Their flows are due to camera motions.
Flow Estimation by Depth and Camera Motion

[Figure: surfaces D_t and D_{t+1} for images I_t and I_{t+1}; rigid flow map, ground-truth flow map, and error map of the rigid flow.]

Given the rigid flow map, we only need to recompute flow for the moving objects.
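Under the dominant-rigid assumption, most of the flow field can be derived from depth and camera ego-motion alone. A minimal sketch of that computation (a hypothetical helper, assuming a pinhole camera with intrinsics K, not the paper's implementation):

```python
import numpy as np

def rigid_flow(depth, K, R, t):
    """2D flow induced by camera motion (R, t) for a static (rigid) scene.
    depth: HxW depth map of frame t; K: 3x3 camera intrinsics."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W, dtype=float), np.arange(H, dtype=float))
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1)   # homogeneous pixels, 3xN
    X = np.linalg.inv(K) @ pix * depth.reshape(1, -1)        # back-project to 3D at time t
    X2 = R @ X + t.reshape(3, 1)                             # apply rigid camera motion
    p2 = K @ X2
    p2 = p2[:2] / p2[2:]                                     # project into frame t+1
    return (p2 - pix[:2]).T.reshape(H, W, 2)                 # per-pixel 2D displacement
```

With identity motion the flow is zero everywhere; a small camera translation produces a depth-dependent flow, which is exactly what the rigid flow map encodes.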
Proposed Approach

The pipeline decomposes scene flow into six steps; each step's output feeds the later ones:
1. Binocular stereo: inputs I_t^0, I_t^1 → initial disparity D
2. Visual odometry: inputs I_t^0, I_{t+1}^0 (+ D) → camera ego-motion P
3. Epipolar stereo: inputs I_{t±1}^{0,1}, I_t^0, I_t^1 (+ D, P) → refined disparity D
4. Initial motion segmentation: inputs I_t^0, I_{t+1}^0 (+ D, P) → rigid flow F_rig, initial segmentation S
5. Optical flow: inputs I_{t±1}^{0,1}, I_t^0, I_t^1 (+ S) → non-rigid flow F_non
6. Flow fusion: inputs I_t^0, I_{t+1}^0 (+ F_rig, F_non) → final flow F, motion segmentation S
Optimization Strategy

E(Θ) = Σ_p || I_t^0(p) − I_{t+1}^0( w(p; Θ) ) ||

Minimize image residuals between the reference image and the next image warped by the model w(p; Θ).
Optimization Strategy

E(D, P, S, F_non) = Σ_p || I_t^0(p) − I_{t+1}^0( w(p; D, P, S, F_non) ) ||

Minimize image residuals by gradually increasing the complexity of the warping model: the early steps (visual odometry, epipolar stereo) use the rigid warp w(p; D, P); the later steps (motion segmentation, optical flow, flow fusion) use the partially non-rigid warp w(p; D, P, S, F_non).
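Each stage evaluates the same photometric objective under an increasingly complex warp. A simplified data-term sketch (nearest-neighbor sampling; the paper's sampling, robust penalty, and occlusion handling may differ):

```python
import numpy as np

def photometric_energy(I0, I1, flow):
    """E = Σ_p | I_t(p) − I_{t+1}(p + flow(p)) | with nearest-neighbor sampling.
    flow is an HxWx2 displacement field produced by some warping model."""
    H, W = I0.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    wu = np.clip(np.rint(u + flow[..., 0]).astype(int), 0, W - 1)  # warped x, clamped
    wv = np.clip(np.rint(v + flow[..., 1]).astype(int), 0, H - 1)  # warped y, clamped
    return float(np.abs(I0 - I1[wv, wu]).sum())
```

A better warping model (lower residual) wins at each stage, which is what lets the framework decompose the joint problem into simple sub-problems.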
Intermediate Step: Binocular Stereo
(pipeline output: initial disparity D)
[Figure: initial disparity map, left-right occlusion map, uncertainty map]
• SGM stereo with NCC-based matching costs
• Left-right consistency check produces the occlusion map
• Uncertainty map computed using [Drory+ 2014] (no computational overhead)
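SGM aggregates matching costs along scanline directions, penalizing disparity changes. A single left-to-right pass over an HxWxD cost volume might look like this (P1/P2 penalty values are illustrative assumptions; full SGM sums 8 or 16 directions):

```python
import numpy as np

def sgm_pass_lr(cost, P1=1.0, P2=8.0):
    """One SGM aggregation pass, left to right, over an HxWxD cost volume.
    P1 penalizes ±1 disparity changes, P2 penalizes larger jumps."""
    H, W, D = cost.shape
    L = cost.astype(float).copy()
    inf = np.full((H, 1), np.inf)
    for x in range(1, W):
        prev = L[:, x - 1, :]
        m = prev.min(axis=1, keepdims=True)                      # best previous cost per row
        up = np.concatenate([prev[:, 1:], inf], axis=1) + P1     # transition from d+1
        down = np.concatenate([inf, prev[:, :-1]], axis=1) + P1  # transition from d-1
        best = np.minimum(np.minimum(prev, up), np.minimum(down, m + P2))
        L[:, x, :] = cost[:, x, :] + best - m                    # subtract m to stay bounded
    return L
```

The winning disparity per pixel is the argmin over D of the summed directional costs.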
Intermediate Step: Visual Odometry
(pipeline output: camera ego-motion P)
min_P E(P | D) = Σ_p w(p) ρ( I_t(p) − I_{t+1}( w(p; D, P) ) )    (ρ: robust penalty function; w(p; D, P) is the rigid warp)

• Follows [Alismail+ CMU-TR14]: estimate the 6-DoF camera motion by directly minimizing image residuals
• Solved with iteratively reweighted least squares (Lucas-Kanade, inverse compositional formulation)
• Moving-object regions predicted from the previous flow F_{t−1} and mask S_{t−1} are down-weighted
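The robust penalty ρ is handled by iteratively reweighted least squares: each residual r gets weight ρ′(r)/r. With a Huber penalty (one common choice; the paper's exact ρ may differ):

```python
import numpy as np

def huber_irls_weights(residuals, delta=1.0):
    """IRLS weights w_i = ρ'(r_i)/r_i for the Huber penalty:
    quadratic below delta (weight 1), linear above (outliers down-weighted)."""
    r = np.abs(residuals)
    w = np.ones_like(r, dtype=float)
    big = r > delta
    w[big] = delta / r[big]
    return w
```

In the odometry step these weights would additionally be multiplied by the per-pixel weight w(p) that down-weights the predicted moving-object regions, so moving objects do not corrupt the ego-motion estimate.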
Intermediate Step: Epipolar Stereo
(pipeline output: refined disparity D)
Left-right matching between I_t^0 and I_t^1 is unreliable at occlusions.
• Blend the matching costs with those of the four adjacent frames I_{t−1}^0, I_{t−1}^1, I_{t+1}^0, I_{t+1}^1 (using the estimated poses P_t, P_{t−1})
• High uncertainty → higher weights on adjacent-frame matching
[Figure: occlusion map, uncertainty map]
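The uncertainty-driven blending of binocular and adjacent-frame matching costs can be sketched as a per-pixel convex combination (the exact weighting function here is an assumption):

```python
import numpy as np

def blend_matching_costs(c_binocular, c_adjacent, uncertainty):
    """Blend left-right matching costs with adjacent-frame epipolar costs.
    uncertainty in [0, 1]: high uncertainty shifts weight to adjacent frames."""
    a = np.clip(uncertainty, 0.0, 1.0)
    return (1.0 - a) * c_binocular + a * c_adjacent
```

Where the binocular match is certain the blended cost stays close to it; where it is uncertain (e.g. occluded pixels) the temporal matches dominate.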
[Figure: initial disparity map vs. final disparity map, with ground truth]
• Run SGM stereo again using the blended matching costs
• Disparities improve in occluded regions
Intermediate Step: Initial Motion Segmentation
(pipeline outputs: rigid flow F_rig and initial segmentation S)
• Predict moving-object regions where the rigid flow proposal is inaccurate
• Use image residuals as soft seeds in a GrabCut-based segmentation
[Figure: rigid flow proposal, image residual, initial segmentation, ground truth]
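Image residuals of the rigid warp make natural seeds: pixels the rigid model explains poorly are likely moving. A toy mapping from residuals to foreground probabilities (the lo/hi thresholds are hypothetical; the actual method feeds such soft seeds into a GrabCut-style segmentation energy):

```python
import numpy as np

def residuals_to_soft_seeds(residual, lo=0.05, hi=0.20):
    """Linearly map per-pixel rigid-warp residuals to moving-object
    probabilities in [0, 1]."""
    return np.clip((residual - lo) / (hi - lo), 0.0, 1.0)
```

Residuals below lo are treated as confidently static, above hi as confidently moving, with a linear ramp in between.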
Intermediate Step: Optical Flow
(pipeline output: non-rigid flow F_non)
[Figure: initial segmentation, non-rigid flow proposal]
• Estimate non-rigid flow only within the predicted moving-object regions
• Extend the SGM algorithm from stereo to 2D optical flow
Intermediate Step: Flow Fusion
(pipeline outputs: final flow F and motion segmentation S)
• Fuse the rigid and non-rigid flow proposals by binary labeling (white: non-rigid, black: rigid)
[Figure: rigid flow proposal + non-rigid flow proposal → final flow map and final motion segmentation]
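The fusion step selects, per pixel, the rigid or the non-rigid proposal. Ignoring the smoothness term (the binary labeling would normally be solved with a regularized optimization such as graph cuts), a data-term-only sketch of the selection principle:

```python
import numpy as np

def fuse_flow_proposals(res_rigid, res_nonrigid, f_rigid, f_nonrigid):
    """Per-pixel winner-take-all fusion of two flow proposals by warping residual.
    Returns the fused flow and the motion segmentation (True = non-rigid)."""
    nonrigid = res_nonrigid < res_rigid              # label: which proposal fits better
    flow = np.where(nonrigid[..., None], f_nonrigid, f_rigid)
    return flow, nonrigid
```

The same binary labels that pick the flow also constitute the final motion segmentation, which is why the two outputs come for free from one fusion step.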
KITTI 2015 Scene Flow Benchmark
• 200 road scenes with multiple moving objects
• Our method is ranked 3rd (as of November 2016)
Summary of This Research
• New unified framework
– Stereo (depth / disparity)
– Optical flow (2D motion field)
– Motion segmentation (binary mask of moving objects)
– Visual odometry (6 DoF camera ego-motion)
• Accurate: ranked 3rd on the KITTI benchmark
• Fast: 10-1000x faster than state-of-the-art methods
Take-home message
RIKEN AIP is a wonderful place for young researchers and students.
Contact me about internship opportunities.
(Nihonbashi office; 24 DGX-1 machines)