STEREO MATCHING BY DEEP
LEARNING
Yu Huang
Yu.huang07@gmail.com
Sunnyvale, California
Outline
◦ Self-Supervised Learning for Stereo Matching with Self-Improving Ability
◦ Unsupervised Learning of Stereo Matching
◦ Pyramid Stereo Matching Network
◦ Learning for Disparity Estimation through Feature Constancy
◦ Deep Material-aware Cross-spectral Stereo Matching
◦ SegStereo: Exploiting Semantic Information for Disparity Estimation
◦ DispSegNet: Leveraging Semantics for End-to-End Learning of Disparity
Estimation from Stereo Imagery
◦ Group-wise Correlation Stereo Network
Self-Supervised Learning for Stereo
Matching with Self-Improving Ability
◦ A simple CNN architecture that is able to learn to compute dense disparity
maps directly from the stereo inputs.
◦ Training is performed end-to-end without the need for ground-truth
disparity maps.
◦ The idea is to use image warping error (instead of disparity-map residuals) as
the loss function to drive the learning process, aiming to find a depth-map that
minimizes the warping error.
◦ The network adapts itself to unseen imagery as well as to different
camera settings.
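The warping-error idea in the bullets above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' loss: it uses a plain mean absolute photometric error on grayscale images, linear interpolation along each row, and border clamping. The function names `warp_right_to_left` and `warping_loss` are hypothetical.

```python
import numpy as np

def warp_right_to_left(right, disparity):
    """Reconstruct the left view by sampling the right image at x - d.

    right:     (H, W) grayscale image
    disparity: (H, W) left-view disparity in pixels
    Sampling is linear along each row; coordinates outside the image
    are clamped to the border (an assumed convention).
    """
    h, w = right.shape
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity  # source x-coordinates
    xs = np.clip(xs, 0.0, w - 1.0)
    x0 = np.floor(xs).astype(int)        # left neighbour
    x1 = np.minimum(x0 + 1, w - 1)       # right neighbour
    frac = xs - x0                       # interpolation weight
    rows = np.arange(h)[:, None]
    return (1.0 - frac) * right[rows, x0] + frac * right[rows, x1]

def warping_loss(left, right, disparity):
    """Mean absolute error between the left image and the warped right image."""
    return float(np.mean(np.abs(left - warp_right_to_left(right, disparity))))
```

A disparity map that explains the image pair well gives a small loss, so the loss can be minimized without any ground-truth disparity. In a trainable network the same warp would be written with differentiable operations.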
The self-supervised deep stereo matching network architecture. The network consists of five modules:
feature extraction, cross feature volume, 3D feature matching, soft-argmin, and warping loss evaluation.
Feature Volume Construction. The cross feature volume is
constructed by concatenating the learned features extracted
from the left and right images correspondingly. The blue
rectangle represents a feature map from the left image, the
stacked orange rectangle set represents traversed right
feature maps from 0 toward a preset disparity range D.
Different intensities correspond to different levels of disparity.
Note that the left feature map is copied D + 1 times to match
the traversed right feature maps.
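The construction described above can be sketched in NumPy. This is an illustrative sketch, assuming channel-first feature maps of shape (C, H, W) and zero-filling of columns that have no valid match at a given disparity; the function name `concat_feature_volume` is hypothetical.

```python
import numpy as np

def concat_feature_volume(left_feat, right_feat, max_disp):
    """Build a concatenation feature volume of shape (2C, D+1, H, W).

    left_feat, right_feat: (C, H, W) feature maps
    max_disp:              preset disparity range D (must be smaller than W)
    """
    c, h, w = left_feat.shape
    volume = np.zeros((2 * c, max_disp + 1, h, w), dtype=left_feat.dtype)
    for d in range(max_disp + 1):
        # The left features are copied into every one of the D+1 levels.
        volume[:c, d, :, d:] = left_feat[:, :, d:]
        # The right features are shifted by d pixels before being stacked.
        volume[c:, d, :, d:] = right_feat[:, :, :w - d]
    return volume
```

At level d, position x of the volume pairs the left feature at x with the right feature at x - d, which is the candidate correspondence for disparity d.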
Diagram of our res-TDM module for 3D feature matching with learned regularization. It takes
cross feature volume as an input, and is followed by a series of 3D convolution and deconvolution.
The output of this module is a 3D disparity volume of dimension H × W × (D + 1).
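A volume of dimension H × W × (D + 1) is commonly reduced to a disparity map with soft-argmin, the fourth module named in the architecture overview. The sketch below assumes the volume holds matching costs (lower is better) and uses the usual softmax-weighted average over disparity levels; the function name is hypothetical and the exact formulation in the paper may differ.

```python
import numpy as np

def soft_argmin(cost_volume):
    """Differentiable disparity regression from a cost volume.

    cost_volume: (H, W, D+1) matching costs, lower means a better match
    returns:     (H, W) sub-pixel disparity map
    """
    scores = -np.asarray(cost_volume, dtype=np.float64)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    prob = np.exp(scores)
    prob = prob / prob.sum(axis=-1, keepdims=True)          # softmax over disparity
    levels = np.arange(prob.shape[-1], dtype=np.float64)    # 0, 1, ..., D
    return (prob * levels).sum(axis=-1)                     # expected disparity
```

Unlike a hard argmin, this weighted average is differentiable and can return fractional disparities.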
[Figure: results on KITTI 2012]
[Figure: results on KITTI 2015]
Unsupervised Learning of Stereo
Matching
◦ A framework for learning stereo matching costs
without human supervision.
◦ This method updates network parameters in an
iterative manner.
◦ It starts with a randomly initialized network.
◦ Left-right check is adopted to guide the training.
◦ Suitable matches are picked and used as training
data in the following iterations.
◦ The system finally converges to a stable state.
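The left-right check mentioned in the list above can be sketched as follows. This is a generic consistency check under assumed conventions (nearest-pixel lookup, a 1-pixel threshold), not the paper's exact confidence estimation; `left_right_check` is a hypothetical name.

```python
import numpy as np

def left_right_check(disp_left, disp_right, threshold=1.0):
    """Return a boolean (H, W) mask of left-view pixels that pass the check.

    A left pixel (y, x) with disparity d corresponds to right pixel (y, x - d).
    The match is kept when the right-view disparity there agrees within
    `threshold` pixels and the correspondence falls inside the image.
    """
    h, w = disp_left.shape
    xs = np.arange(w)[None, :]
    x_right = np.rint(xs - disp_left).astype(int)     # matched column in the right view
    in_bounds = (x_right >= 0) & (x_right < w)
    x_safe = np.clip(x_right, 0, w - 1)               # avoid out-of-range indexing
    rows = np.arange(h)[:, None]
    diff = np.abs(disp_left - disp_right[rows, x_safe])
    return in_bounds & (diff <= threshold)
```

Pixels where the mask is true are the kind of matches that could be selected as training data for the next iteration.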
The learning network takes stereo images as input and generates a disparity map. The architecture has
two branches: the first computes the cost volume and the other jointly filters the volume.
Configuration of each component, cost-volume branch
(CVB), image feature branch (IFB) and joint filtering
branch (JF), of our network. Torch notations (channels,
kernel, stride) are used to define the convolutional layers.
The iterative unsupervised training framework consists of four parts: disparity
prediction, confidence map estimation, training data selection and network training.
[Figure: results on KITTI 2015]
Pyramid Stereo Matching Network
◦ Current architectures rely on patch-based Siamese networks, lacking the
means to exploit context information for finding correspondence in ill-posed regions.
◦ To tackle this problem, PSMNet, a pyramid stereo matching network, is
proposed, consisting of two main modules: spatial pyramid pooling and a 3D CNN.
◦ The spatial pyramid pooling module exploits global context information by
aggregating context at different scales and locations to form a cost volume.
◦ The 3D CNN learns to regularize the cost volume using multiple stacked hourglass
networks in conjunction with intermediate supervision.
◦ Code for PSMNet: https://github.com/JiaRenChang/PSMNet.
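The spatial pyramid pooling step in the list above can be illustrated with a simplified NumPy sketch. It assumes average pooling over non-overlapping windows, nearest-neighbour upsampling, and no learned convolutions; the pool sizes and the function name are hypothetical and do not reproduce the PSMNet configuration.

```python
import numpy as np

def spatial_pyramid_pooling(feat, pool_sizes=(2, 4)):
    """Concatenate a feature map with pooled-and-upsampled copies of itself.

    feat:       (C, H, W) feature map; H and W must be divisible by each pool size
    pool_sizes: window sizes for average pooling (illustrative values)
    returns:    (C * (1 + len(pool_sizes)), H, W)
    """
    c, h, w = feat.shape
    branches = [feat]
    for p in pool_sizes:
        # Average over non-overlapping p x p windows.
        pooled = feat.reshape(c, h // p, p, w // p, p).mean(axis=(2, 4))
        # Nearest-neighbour upsampling back to the input resolution.
        upsampled = pooled.repeat(p, axis=1).repeat(p, axis=2)
        branches.append(upsampled)
    return np.concatenate(branches, axis=0)
```

Each pixel's output vector then carries its own feature plus averages over progressively larger neighbourhoods, which is how context from different scales enters the matching features.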
Architecture overview of the proposed PSMNet. The left and right input stereo images are fed to two
weight-sharing pipelines consisting of a CNN for feature map calculation, an SPP module for feature
harvesting by concatenating representations from sub-regions with different sizes, and a convolution
layer for feature fusion. The left and right image features are then used to form a 4D cost volume,
which is fed into a 3D CNN for cost volume regularization and disparity regression.
Table 1. Parameters of the proposed PSMNet architecture. The construction of residual blocks is designated in brackets with the
number of stacked blocks. Downsampling is performed by conv0_1 and conv2_1 with a stride of 2. The usage of batch
normalization and ReLU follows ResNet, with the exception that PSMNet does not apply ReLU after summation.
[Figure: results on KITTI 2015]
[Figure: results on KITTI 2012]
Learning for Disparity Estimation
through Feature Constancy
◦ A network architecture to incorporate all steps: matching cost calculation,
matching cost aggregation, disparity calculation, and disparity refinement.
◦ The network consists of three parts.
◦ 1) calculates the multi-scale shared features.
◦ 2) performs matching cost calculation, matching cost aggregation and disparity
calculation to estimate the initial disparity using shared features.
◦ Note: The initial disparity and the shared features are used to calculate the feature
constancy that measures correctness of the correspondence between two input images.
◦ 3) The initial disparity and the feature constancy are then fed into a sub-network to refine
the initial disparity.
◦ Source code: http://github.com/leonzfa/iResNet.
The architecture. It incorporates all four steps of stereo matching into a single network. Note that the
skip connections between encoder and decoder at different scales are omitted here for better visualization.
Comparison with other state-of-the-art methods on the KITTI 2015 dataset.
SegStereo: Exploiting Semantic
Information for Disparity Estimation
◦ Appropriate incorporation of semantic cues can greatly rectify prediction in
commonly-used disparity estimation frameworks.
◦ This method conducts semantic feature embedding and regularizes semantic
cues as the loss term to improve disparity learning.
◦ The unified model SegStereo employs semantic features from segmentation
and introduces semantic softmax loss, which helps improve the prediction
accuracy of disparity maps.
◦ The semantic cues work well in both unsupervised and supervised manners.
Extract intermediate features from the stereo input. Calculate the cost volume via the correlation
operator. The left segmentation feature map is aggregated into the disparity branch as semantic feature
embedding. The right segmentation feature map is warped to the left view for per-pixel semantic
prediction with softmax loss regularization. Both steps incorporate semantic information to improve
disparity estimation. The SegStereo framework enables both unsupervised and supervised learning,
using photometric loss or disparity regression loss.
[Figure: unsupervised SegStereo models]
[Figure: supervised learning]
Deep Material-aware Cross-spectral
Stereo Matching
◦ Cross-spectral imaging provides benefits for recognition and detection tasks.
◦ Stereo matching also provides an opportunity to obtain depth without an
active projector source.
◦ Matching images from different spectral bands is challenging because of
large appearance variations.
◦ A deep learning framework to simultaneously transform images across spectral
bands and estimate disparity.
◦ A material-aware loss function is incorporated within the disparity prediction
network to handle regions with unreliable matching such as light sources, glass
windshields and glossy surfaces.
◦ No depth supervision is required.
The disparity prediction network (DPN) predicts left-right disparity for an RGB-NIR stereo input. The spectral
translation network (STN) converts the left RGB image into a pseudo-NIR image. The two networks are
trained simultaneously with reprojection error. The symmetric CNN in (b) prevents the STN from learning disparity.
Intermediate results. (a) Left image. (b) Material recognition from DeepLab. (c) RGB-to-NIR filters
corrected by exposure and white balancing. The R, G, B values represent the weights of the R, G, B channels.
DispSegNet: Leveraging Semantics for End-to-End
Learning of Disparity Estimation from Stereo Imagery
◦ A CNN architecture improves the quality and accuracy of disparity estimation
with the help of semantic segmentation.
◦ A network structure in which these two tasks are highly coupled.
◦ The two-stage refinement process.
◦ Initial disparity estimates are refined with an embedding learned from the
semantic segmentation branch of the network.
◦ The model is trained using an unsupervised approach, in which images from one
of the stereo pair are warped and compared against images from the other.
◦ A single network is capable of outputting disparity estimates and semantic labels.
◦ Leveraging embedding learned from semantic segmentation improves the
performance of disparity estimation.
Architecture. The pipeline consists of: (a) rectified input stereo images; (b) useful features are extracted from the input stereo
images; (c) a cost volume is formed by concatenating corresponding features from both sides; (d) initial disparity is estimated
from the cost volume using 3D convolution; (e) initial disparity is further improved by fusing segment embedding, where PSP
(pyramid scene parsing) incorporates more context information for the semantic segmentation task; (f) estimated disparity and
semantic segmentation from both left and right views are generated from the model.
[Figure: disparity prediction]
[Figure: 3D semantic results]
Group-wise Correlation Stereo Network
◦ This method tries to construct the cost volume by group-wise correlation.
◦ The left features and the right features are divided into groups along the
channel dimension, and correlation maps are computed among each group to
obtain multiple matching cost proposals, then packed into a cost volume.
◦ Group-wise correlation provides efficient representations for measuring feature
similarities and does not lose as much information as full correlation.
◦ It also preserves better performance when reducing parameters.
◦ The code is available at https://github.com/xy-guo/GwcNet.
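The group-wise correlation described in the list above can be sketched in NumPy. This is a minimal illustration, assuming channel-first features of shape (C, H, W), D + 1 disparity levels, the mean of element-wise products within each group as the correlation, and zero-filling where no match exists. The function name is hypothetical; the linked repository is the authoritative implementation.

```python
import numpy as np

def groupwise_correlation_volume(left_feat, right_feat, max_disp, num_groups):
    """Build a group-wise correlation volume of shape (G, D+1, H, W).

    left_feat, right_feat: (C, H, W) feature maps, C divisible by num_groups
    max_disp:              largest disparity D considered (smaller than W)
    num_groups:            number of channel groups G
    """
    c, h, w = left_feat.shape
    if c % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    per_group = c // num_groups
    # Split the channel dimension into G groups.
    lg = left_feat.reshape(num_groups, per_group, h, w)
    rg = right_feat.reshape(num_groups, per_group, h, w)
    volume = np.zeros((num_groups, max_disp + 1, h, w), dtype=np.float64)
    for d in range(max_disp + 1):
        # One correlation map per group: mean of products over the group's channels.
        volume[:, d, :, d:] = (lg[:, :, :, d:] * rg[:, :, :, :w - d]).mean(axis=1)
    return volume
```

With `num_groups=1` this reduces to a single full-correlation map per disparity, while larger values keep several matching cost proposals per disparity level.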
The pipeline of the proposed group-wise correlation network. The whole network consists of four parts: unary
feature extraction, cost volume construction, 3D convolution aggregation, and disparity prediction. The cost
volume is divided into two parts: the concatenation volume (Cat) and the group-wise correlation volume (Gwc).
The concatenation volume is built by concatenating the compressed left and right features.
The structure of the 3D aggregation network. The network consists of a pre-hourglass module (four
convolutions at the beginning) and three stacked 3D hourglass networks. Compared with PSMNet,
the shortcut connections between different hourglass modules and output modules are removed, so output
modules 0, 1, 2 can be removed during inference to save time. 1×1×1 3D convolutions are added to the
shortcut connections within hourglass modules.
Table: Structure details of the modules. H and W represent the height and the width of the input image.
S1/2 denotes the convolution stride. If not specified, each 3D convolution is followed by batch
normalization and ReLU.
* denotes that the ReLU is not included.
** denotes convolution only.
Thanks

More Related Content

What's hot

Depth Fusion from RGB and Depth Sensors II
Depth Fusion from RGB and Depth Sensors IIDepth Fusion from RGB and Depth Sensors II
Depth Fusion from RGB and Depth Sensors IIYu Huang
 
Understanding Black Box Models with Shapley Values
Understanding Black Box Models with Shapley ValuesUnderstanding Black Box Models with Shapley Values
Understanding Black Box Models with Shapley ValuesJonathan Bechtel
 
Pose estimation from RGB images by deep learning
Pose estimation from RGB images by deep learningPose estimation from RGB images by deep learning
Pose estimation from RGB images by deep learningYu Huang
 
Lec8: Medical Image Segmentation (II) (Region Growing/Merging)
Lec8: Medical Image Segmentation (II) (Region Growing/Merging)Lec8: Medical Image Segmentation (II) (Region Growing/Merging)
Lec8: Medical Image Segmentation (II) (Region Growing/Merging)Ulaş Bağcı
 
Deep neural networks
Deep neural networksDeep neural networks
Deep neural networksSi Haem
 
Chapter10 image segmentation
Chapter10 image segmentationChapter10 image segmentation
Chapter10 image segmentationasodariyabhavesh
 
PR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersPR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersJinwon Lee
 
Image Interpolation Techniques with Optical and Digital Zoom Concepts
Image Interpolation Techniques with Optical and Digital Zoom ConceptsImage Interpolation Techniques with Optical and Digital Zoom Concepts
Image Interpolation Techniques with Optical and Digital Zoom Conceptsmmjalbiaty
 
Transfer Learning: An overview
Transfer Learning: An overviewTransfer Learning: An overview
Transfer Learning: An overviewjins0618
 
Transfer Learning and Fine-tuning Deep Neural Networks
 Transfer Learning and Fine-tuning Deep Neural Networks Transfer Learning and Fine-tuning Deep Neural Networks
Transfer Learning and Fine-tuning Deep Neural NetworksPyData
 
PR-386: Light Field Networks: Neural Scene Representations with Single-Evalua...
PR-386: Light Field Networks: Neural Scene Representations with Single-Evalua...PR-386: Light Field Networks: Neural Scene Representations with Single-Evalua...
PR-386: Light Field Networks: Neural Scene Representations with Single-Evalua...Hyeongmin Lee
 
Chapter 3 image enhancement (spatial domain)
Chapter 3 image enhancement (spatial domain)Chapter 3 image enhancement (spatial domain)
Chapter 3 image enhancement (spatial domain)asodariyabhavesh
 
U-Netpresentation.pptx
U-Netpresentation.pptxU-Netpresentation.pptx
U-Netpresentation.pptxNoorUlHaq47
 
Image Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyImage Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyNUPUR YADAV
 
Connected component labeling algorithm
Connected component labeling algorithmConnected component labeling algorithm
Connected component labeling algorithmManas Mantri
 

What's hot (20)

Depth Fusion from RGB and Depth Sensors II
Depth Fusion from RGB and Depth Sensors IIDepth Fusion from RGB and Depth Sensors II
Depth Fusion from RGB and Depth Sensors II
 
Understanding Black Box Models with Shapley Values
Understanding Black Box Models with Shapley ValuesUnderstanding Black Box Models with Shapley Values
Understanding Black Box Models with Shapley Values
 
Image captioning
Image captioningImage captioning
Image captioning
 
Pose estimation from RGB images by deep learning
Pose estimation from RGB images by deep learningPose estimation from RGB images by deep learning
Pose estimation from RGB images by deep learning
 
Lec8: Medical Image Segmentation (II) (Region Growing/Merging)
Lec8: Medical Image Segmentation (II) (Region Growing/Merging)Lec8: Medical Image Segmentation (II) (Region Growing/Merging)
Lec8: Medical Image Segmentation (II) (Region Growing/Merging)
 
Depth estimation using deep learning
Depth estimation using deep learningDepth estimation using deep learning
Depth estimation using deep learning
 
Computer Vision Crash Course
Computer Vision Crash CourseComputer Vision Crash Course
Computer Vision Crash Course
 
Deep neural networks
Deep neural networksDeep neural networks
Deep neural networks
 
Chapter10 image segmentation
Chapter10 image segmentationChapter10 image segmentation
Chapter10 image segmentation
 
PR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersPR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision Learners
 
Image Interpolation Techniques with Optical and Digital Zoom Concepts
Image Interpolation Techniques with Optical and Digital Zoom ConceptsImage Interpolation Techniques with Optical and Digital Zoom Concepts
Image Interpolation Techniques with Optical and Digital Zoom Concepts
 
Transfer Learning: An overview
Transfer Learning: An overviewTransfer Learning: An overview
Transfer Learning: An overview
 
Transfer Learning and Fine-tuning Deep Neural Networks
 Transfer Learning and Fine-tuning Deep Neural Networks Transfer Learning and Fine-tuning Deep Neural Networks
Transfer Learning and Fine-tuning Deep Neural Networks
 
PR-386: Light Field Networks: Neural Scene Representations with Single-Evalua...
PR-386: Light Field Networks: Neural Scene Representations with Single-Evalua...PR-386: Light Field Networks: Neural Scene Representations with Single-Evalua...
PR-386: Light Field Networks: Neural Scene Representations with Single-Evalua...
 
Chapter 3 image enhancement (spatial domain)
Chapter 3 image enhancement (spatial domain)Chapter 3 image enhancement (spatial domain)
Chapter 3 image enhancement (spatial domain)
 
U-Netpresentation.pptx
U-Netpresentation.pptxU-Netpresentation.pptx
U-Netpresentation.pptx
 
Video Processing Applications
Video Processing ApplicationsVideo Processing Applications
Video Processing Applications
 
Image Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyImage Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A survey
 
Connected component labeling algorithm
Connected component labeling algorithmConnected component labeling algorithm
Connected component labeling algorithm
 
Digital Image Fundamentals - II
Digital Image Fundamentals - IIDigital Image Fundamentals - II
Digital Image Fundamentals - II
 

Similar to Stereo Matching by Deep Learning

3-d interpretation from stereo images for autonomous driving
3-d interpretation from stereo images for autonomous driving3-d interpretation from stereo images for autonomous driving
3-d interpretation from stereo images for autonomous drivingYu Huang
 
A deep learning based stereo matching model for autonomous vehicle
A deep learning based stereo matching model for autonomous vehicleA deep learning based stereo matching model for autonomous vehicle
A deep learning based stereo matching model for autonomous vehicleIAESIJAI
 
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdfmokamojah
 
Paper 58 disparity-of_stereo_images_by_self_adaptive_algorithm
Paper 58 disparity-of_stereo_images_by_self_adaptive_algorithmPaper 58 disparity-of_stereo_images_by_self_adaptive_algorithm
Paper 58 disparity-of_stereo_images_by_self_adaptive_algorithmMDABDULMANNANMONDAL
 
Unsupervised/Self-supervvised visual object tracking
Unsupervised/Self-supervvised visual object trackingUnsupervised/Self-supervvised visual object tracking
Unsupervised/Self-supervvised visual object trackingYu Huang
 
A Review On Single Image Depth Prediction with Wavelet Decomposition
A Review On Single Image Depth Prediction with Wavelet DecompositionA Review On Single Image Depth Prediction with Wavelet Decomposition
A Review On Single Image Depth Prediction with Wavelet DecompositionIRJET Journal
 
IRJET- Automatic Data Collection from Forms using Optical Character Recognition
IRJET- Automatic Data Collection from Forms using Optical Character RecognitionIRJET- Automatic Data Collection from Forms using Optical Character Recognition
IRJET- Automatic Data Collection from Forms using Optical Character RecognitionIRJET Journal
 
Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling Yu Huang
 
Video Annotation for Visual Tracking via Selection and Refinement_tran.pptx
Video Annotation for Visual Tracking via Selection and Refinement_tran.pptxVideo Annotation for Visual Tracking via Selection and Refinement_tran.pptx
Video Annotation for Visual Tracking via Selection and Refinement_tran.pptxAlyaaMachi
 
mvitelli_ee367_final_report
mvitelli_ee367_final_reportmvitelli_ee367_final_report
mvitelli_ee367_final_reportMatt Vitelli
 
Locate, Size and Count: Accurately Resolving People in Dense Crowds via Detec...
Locate, Size and Count: Accurately Resolving People in Dense Crowds via Detec...Locate, Size and Count: Accurately Resolving People in Dense Crowds via Detec...
Locate, Size and Count: Accurately Resolving People in Dense Crowds via Detec...IRJET Journal
 
Decomposing image generation into layout priction and conditional synthesis
Decomposing image generation into layout priction and conditional synthesisDecomposing image generation into layout priction and conditional synthesis
Decomposing image generation into layout priction and conditional synthesisNaeem Shehzad
 
Automatic Detection of Window Regions in Indoor Point Clouds Using R-CNN
Automatic Detection of Window Regions in Indoor Point Clouds Using R-CNNAutomatic Detection of Window Regions in Indoor Point Clouds Using R-CNN
Automatic Detection of Window Regions in Indoor Point Clouds Using R-CNNZihao(Gerald) Zhang
 
Video Stitching using Improved RANSAC and SIFT
Video Stitching using Improved RANSAC and SIFTVideo Stitching using Improved RANSAC and SIFT
Video Stitching using Improved RANSAC and SIFTIRJET Journal
 
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...IRJET Journal
 
Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...
Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...
Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...CSCJournals
 
IMPROVEMENT IN IMAGE DENOISING OF HANDWRITTEN DIGITS USING AUTOENCODERS IN DE...
IMPROVEMENT IN IMAGE DENOISING OF HANDWRITTEN DIGITS USING AUTOENCODERS IN DE...IMPROVEMENT IN IMAGE DENOISING OF HANDWRITTEN DIGITS USING AUTOENCODERS IN DE...
IMPROVEMENT IN IMAGE DENOISING OF HANDWRITTEN DIGITS USING AUTOENCODERS IN DE...IRJET Journal
 

Similar to Stereo Matching by Deep Learning (20)

3-d interpretation from stereo images for autonomous driving
3-d interpretation from stereo images for autonomous driving3-d interpretation from stereo images for autonomous driving
3-d interpretation from stereo images for autonomous driving
 
A deep learning based stereo matching model for autonomous vehicle
A deep learning based stereo matching model for autonomous vehicleA deep learning based stereo matching model for autonomous vehicle
A deep learning based stereo matching model for autonomous vehicle
 
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
 
32
3232
32
 
Paper 58 disparity-of_stereo_images_by_self_adaptive_algorithm
Paper 58 disparity-of_stereo_images_by_self_adaptive_algorithmPaper 58 disparity-of_stereo_images_by_self_adaptive_algorithm
Paper 58 disparity-of_stereo_images_by_self_adaptive_algorithm
 
Unsupervised/Self-supervvised visual object tracking
Unsupervised/Self-supervvised visual object trackingUnsupervised/Self-supervvised visual object tracking
Unsupervised/Self-supervvised visual object tracking
 
A Review On Single Image Depth Prediction with Wavelet Decomposition
A Review On Single Image Depth Prediction with Wavelet DecompositionA Review On Single Image Depth Prediction with Wavelet Decomposition
A Review On Single Image Depth Prediction with Wavelet Decomposition
 
IRJET- Automatic Data Collection from Forms using Optical Character Recognition
IRJET- Automatic Data Collection from Forms using Optical Character RecognitionIRJET- Automatic Data Collection from Forms using Optical Character Recognition
IRJET- Automatic Data Collection from Forms using Optical Character Recognition
 
Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling
 
Video Annotation for Visual Tracking via Selection and Refinement_tran.pptx
Video Annotation for Visual Tracking via Selection and Refinement_tran.pptxVideo Annotation for Visual Tracking via Selection and Refinement_tran.pptx
Video Annotation for Visual Tracking via Selection and Refinement_tran.pptx
 
mvitelli_ee367_final_report
mvitelli_ee367_final_reportmvitelli_ee367_final_report
mvitelli_ee367_final_report
 
Ay33292297
Ay33292297Ay33292297
Ay33292297
 
Ay33292297
Ay33292297Ay33292297
Ay33292297
 
Locate, Size and Count: Accurately Resolving People in Dense Crowds via Detec...
Locate, Size and Count: Accurately Resolving People in Dense Crowds via Detec...Locate, Size and Count: Accurately Resolving People in Dense Crowds via Detec...
Locate, Size and Count: Accurately Resolving People in Dense Crowds via Detec...
 
Decomposing image generation into layout priction and conditional synthesis
Decomposing image generation into layout priction and conditional synthesisDecomposing image generation into layout priction and conditional synthesis
Decomposing image generation into layout priction and conditional synthesis
 
Automatic Detection of Window Regions in Indoor Point Clouds Using R-CNN
Automatic Detection of Window Regions in Indoor Point Clouds Using R-CNNAutomatic Detection of Window Regions in Indoor Point Clouds Using R-CNN
Automatic Detection of Window Regions in Indoor Point Clouds Using R-CNN
 
Video Stitching using Improved RANSAC and SIFT
Video Stitching using Improved RANSAC and SIFTVideo Stitching using Improved RANSAC and SIFT
Video Stitching using Improved RANSAC and SIFT
 
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
 
Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...
Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...
Stereo Correspondence Algorithms for Robotic Applications Under Ideal And Non...
 
IMPROVEMENT IN IMAGE DENOISING OF HANDWRITTEN DIGITS USING AUTOENCODERS IN DE...
IMPROVEMENT IN IMAGE DENOISING OF HANDWRITTEN DIGITS USING AUTOENCODERS IN DE...IMPROVEMENT IN IMAGE DENOISING OF HANDWRITTEN DIGITS USING AUTOENCODERS IN DE...
IMPROVEMENT IN IMAGE DENOISING OF HANDWRITTEN DIGITS USING AUTOENCODERS IN DE...
 

More from Yu Huang

Application of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous DrivingApplication of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous DrivingYu Huang
 
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...Yu Huang
 
Data Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous DrivingData Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous DrivingYu Huang
 
Techniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingTechniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingYu Huang
 
BEV Joint Detection and Segmentation
BEV Joint Detection and SegmentationBEV Joint Detection and Segmentation
Stereo Matching by Deep Learning

  • 1. STEREO MATCHING BY DEEP LEARNING Yu Huang Yu.huang07@gmail.com Sunnyvale, California
  • 2. Outline ◦ Self-Supervised Learning for Stereo Matching with Self-Improving Ability ◦ Unsupervised Learning of Stereo Matching ◦ Pyramid Stereo Matching Network ◦ Learning for Disparity Estimation through Feature Constancy ◦ Deep Material-aware Cross-spectral Stereo Matching ◦ SegStereo: Exploiting Semantic Information for Disparity Estimation ◦ DispSegNet: Leveraging Semantics for End-to-End Learning of Disparity Estimation from Stereo Imagery ◦ Group-wise Correlation Stereo Network
  • 3. Self-Supervised Learning for Stereo Matching with Self-Improving Ability ◦ A simple CNN architecture that is able to learn to compute dense disparity maps directly from the stereo inputs. ◦ Training is performed in an end-to-end fashion without the need for ground-truth disparity maps. ◦ The idea is to use image warping error (instead of disparity-map residuals) as the loss function to drive the learning process, aiming to find a depth map that minimizes the warping error. ◦ The network is self-adaptive to unseen imagery as well as to different camera settings.
  • 4. Self-Supervised Learning for Stereo Matching with Self-Improving Ability The self-supervised deep stereo matching network architecture. The network consists of five modules, feature extraction, cross feature volume, 3D feature matching, soft-argmin, and warping loss evaluation.
  • 5. Self-Supervised Learning for Stereo Matching with Self-Improving Ability Feature Volume Construction. The cross feature volume is constructed by concatenating the learned features extracted from the left and right images correspondingly. The blue rectangle represents a feature map from the left image; the stacked orange rectangle set represents right feature maps traversed from 0 toward a preset disparity range D. Different intensities correspond to different levels of disparity. Note that the left feature map is copied D + 1 times to match the traversed right feature maps.
  • 6. Self-Supervised Learning for Stereo Matching with Self-Improving Ability Diagram of our res-TDM module for 3D feature matching with learned regularization. It takes cross feature volume as an input, and is followed by a series of 3D convolution and deconvolution. The output of this module is a 3D disparity volume of dimension H × W × (D + 1).
  • 7. Self-Supervised Learning for Stereo Matching with Self-Improving Ability KITTI-2012
  • 8. Self-Supervised Learning for Stereo Matching with Self-Improving Ability KITTI-2015
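The image warping error that drives this self-supervised training can be illustrated on a single rectified scanline. The following is a minimal sketch, not the authors' implementation: the function names, the plain L1 error, the border clamping and the linear interpolation are all assumptions made for illustration. It samples the right scanline at x - d to reconstruct the left one and measures how far the reconstruction is from the real left scanline.

```python
def warp_right_to_left(right_row, disparity_row):
    """Reconstruct a left scanline by sampling the right scanline at x - d.

    Hypothetical helper: uses linear interpolation for fractional
    disparities and clamps sample positions to the image border.
    """
    width = len(right_row)
    warped = []
    for x, d in enumerate(disparity_row):
        src = min(max(x - d, 0.0), width - 1.0)  # clamp to the valid range
        x0 = int(src)                             # left neighbour
        x1 = min(x0 + 1, width - 1)               # right neighbour
        w = src - x0                              # interpolation weight
        warped.append((1.0 - w) * right_row[x0] + w * right_row[x1])
    return warped


def warping_loss(left_row, right_row, disparity_row):
    """Mean absolute difference between the left scanline and the warped right one."""
    warped = warp_right_to_left(right_row, disparity_row)
    return sum(abs(l - r) for l, r in zip(left_row, warped)) / len(left_row)


# Toy data: the left scanline is the right one shifted by one pixel.
right = [10, 20, 30, 40, 50]
left = [10, 10, 20, 30, 40]
loss_correct = warping_loss(left, right, [1] * 5)  # disparity matches the shift
loss_wrong = warping_loss(left, right, [0] * 5)    # disparity ignores the shift
```

In this toy case the correct disparity gives a lower loss than the wrong one, which is the signal the network can learn from without ground-truth disparity.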
  • 9. Unsupervised Learning of Stereo Matching ◦ A framework for learning stereo matching costs without human supervision. ◦ This method updates network parameters in an iterative manner. ◦ It starts with a randomly initialized network. ◦ Left-right check is adopted to guide the training. ◦ Suitable matching is picked and used as training data in following iterations. ◦ The system finally converges to a stable state.
  • 10. Unsupervised Learning of Stereo Matching The learning network takes stereo images as input and generates a disparity map. The architecture has two branches: the first computes the cost volume and the other jointly filters the volume.
  • 11. Unsupervised Learning of Stereo Matching Configuration of each component, cost-volume branch (CVB), image feature branch (IFB) and joint filtering branch (JF), of our network. Torch notations (channels, kernel, stride) are used to define the convolutional layers.
  • 12. Unsupervised Learning of Stereo Matching The iterative unsupervised training framework consists of four parts: disparity prediction, confidence map estimation, training data selection and network training.
  • 13. Unsupervised Learning of Stereo Matching KITTI 2015
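The left-right check that guides this iterative training can be sketched as follows. This is an illustrative version under stated assumptions, not the paper's code: the function name, the one-pixel threshold and the single-scanline layout are hypothetical. A left-view pixel is kept as a training sample only if the right-view disparity at its matching column agrees with it.

```python
def left_right_check(disp_left, disp_right, threshold=1.0):
    """Mark left-view pixels whose disparity is confirmed by the right view.

    disp_left, disp_right: disparities along one rectified scanline.
    threshold: maximum allowed disagreement in pixels (assumed value).
    """
    width = len(disp_left)
    mask = []
    for x, d in enumerate(disp_left):
        xr = int(round(x - d))  # matching column in the right view
        if 0 <= xr < width:
            mask.append(abs(d - disp_right[xr]) <= threshold)
        else:
            mask.append(False)  # match falls outside the image
    return mask


# Pixels 1 and 2 are consistent; pixel 0 leaves the image, pixel 3 disagrees.
selected = left_right_check([2, 1, 1, 3], [1, 1, 5, 1])
```

Pixels where the mask is true would play the role of the "suitable matching" picked as training data for the next iteration.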
  • 14. Pyramid Stereo Matching Network ◦ Current architectures rely on patch-based Siamese networks, lacking the means to exploit context information for finding correspondence in ill-posed regions. ◦ To tackle this problem, PSMNet, a pyramid stereo matching network, is proposed; it consists of two main modules: spatial pyramid pooling and a 3D CNN. ◦ The spatial pyramid pooling module takes advantage of the capacity of global context information by aggregating context in different scales and locations to form a cost volume. ◦ The 3D CNN learns to regularize the cost volume using multiple stacked hourglass networks in conjunction with intermediate supervision. ◦ Code for PSMNet: https://github.com/JiaRenChang/PSMNet.
  • 15. Pyramid Stereo Matching Network Architecture overview of proposed PSMNet. The left and right input stereo images are fed to two weight-sharing pipelines consisting of a CNN for feature maps calculation, an SPP module for feature harvesting by concatenating representations from sub-regions with different sizes, and a convolution layer for feature fusion. The left and right image features are then used to form a 4D cost volume, which is fed into a 3D CNN for cost volume regularization and disparity regression.
  • 16. Pyramid Stereo Matching Network Table 1. Parameters of the proposed PSMNet architecture. Construction of residual blocks is designated in brackets with the number of stacked blocks. Downsampling is performed by conv0_1 and conv2_1 with a stride of 2. The usage of batch normalization and ReLU follows ResNet, with the exception that PSMNet does not apply ReLU after summation.
  • 17. Pyramid Stereo Matching Network KITTI 2015
  • 18. Pyramid Stereo Matching Network KITTI 2012
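The disparity regression step at the end of such a pipeline is commonly implemented as a soft-argmin over the regularized cost volume, the same operation named among the modules of the self-supervised network earlier in the deck. The sketch below shows it for a single pixel; the function name and the convention that lower cost means a better match are assumptions for illustration.

```python
import math


def soft_argmin(costs):
    """Expected disparity under softmax(-cost) over candidates 0..D.

    costs: matching cost for each candidate disparity at one pixel
    (lower is assumed to be better).
    """
    m = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - m)) for c in costs]
    total = sum(weights)
    return sum(d * w for d, w in enumerate(weights)) / total


flat = soft_argmin([5.0, 5.0, 5.0])            # uniform costs: mean of 0, 1, 2
peaked = soft_argmin([100.0, 0.0, 100.0])      # sharp minimum at disparity 1
near_zero = soft_argmin([0.0, 10.0, 10.0, 10.0])
```

Unlike a hard argmin, this weighted average is differentiable and can return sub-pixel disparities, which is why it suits end-to-end training.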
  • 19. Learning for Disparity Estimation through Feature Constancy ◦ A network architecture to incorporate all steps: matching cost calculation, matching cost aggregation, disparity calculation, and disparity refinement. ◦ The network consists of three parts. ◦ 1) calculates the multi-scale shared features. ◦ 2) performs matching cost calculation, matching cost aggregation and disparity calculation to estimate the initial disparity using shared features. ◦ Note: The initial disparity and the shared features are used to calculate the feature constancy that measures correctness of the correspondence between two input images. ◦ 3) The initial disparity and the feature constancy are then fed into a sub-network to refine the initial disparity. ◦ Source code: http://github.com/leonzfa/iResNet.
  • 20. Learning for Disparity Estimation through Feature Constancy The architecture. It incorporates all of the four steps for stereo matching into a single network. Note that, the skip connections between encoder and decoder at different scales are omitted here for better visualization.
  • 21. Learning for Disparity Estimation through Feature Constancy
  • 22. Learning for Disparity Estimation through Feature Constancy Comparison with other state-of-the-art methods on the KITTI 2015 dataset.
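One way to picture the feature constancy term is as a per-pixel distance between a left feature vector and the right feature vector that the initial disparity points to. The sketch below is a simplified illustration under assumptions, not the iResNet code: it uses integer disparities, an L1 distance, a single scanline, and hypothetical names. A large value at a pixel suggests that the initial disparity there is wrong and should be refined.

```python
def feature_constancy_error(feat_left, feat_right, disparity):
    """Per-pixel L1 distance between left features and right features at x - d.

    feat_left, feat_right: one feature vector per column of a scanline.
    disparity: integer initial disparity per left column (assumed integer
    here to avoid interpolation).
    """
    width = len(feat_left)
    errors = []
    for x, d in enumerate(disparity):
        xr = min(max(x - d, 0), width - 1)  # clamp to the image border
        errors.append(sum(abs(a - b) for a, b in zip(feat_left[x], feat_right[xr])))
    return errors


# Toy features: the left scanline equals the right one shifted by one pixel.
f_right = [[1, 0], [2, 0], [3, 0], [4, 0]]
f_left = [[1, 0], [1, 0], [2, 0], [3, 0]]
err_good = feature_constancy_error(f_left, f_right, [1, 1, 1, 1])
err_bad = feature_constancy_error(f_left, f_right, [0, 0, 0, 0])
```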
  • 23. SegStereo: Exploiting Semantic Information for Disparity ◦ Appropriate incorporation of semantic cues can greatly rectify prediction in commonly-used disparity estimation frameworks. ◦ This method conducts semantic feature embedding and regularizes semantic cues as the loss term to improve learning disparity. ◦ The unified model SegStereo employs semantic features from segmentation and introduces semantic softmax loss, which helps improve the prediction accuracy of disparity maps. ◦ The semantic cues work well in both unsupervised and supervised manners.
  • 24. SegStereo: Exploiting Semantic Information for Disparity Extract intermediate features from stereo input. Calculate the cost volume via the correlation operator. The left segmentation feature map is aggregated into the disparity branch as semantic feature embedding. The right segmentation feature map is warped to the left view for per-pixel semantic prediction with softmax loss regularization. Both steps incorporate semantic information to improve disparity estimation. The SegStereo framework enables both unsupervised and supervised learning, using photometric loss or disparity regression loss.
  • 25. SegStereo: Exploiting Semantic Information for Disparity unsupervised SegStereo models
  • 26. SegStereo: Exploiting Semantic Information for Disparity Supervised-learning
  • 27. Deep Material-aware Cross-spectral Stereo Matching ◦ Cross-spectral imaging provides benefits for recognition and detection tasks. ◦ Stereo matching also provides an opportunity to obtain depth without an active projector source. ◦ Matching images from different spectral bands is challenging because of large appearance variations. ◦ A deep learning framework to simultaneously transform images across spectral bands and estimate disparity. ◦ A material-aware loss function is incorporated within the disparity prediction network to handle regions with unreliable matching such as light sources, glass windshields and glossy surfaces. ◦ No depth supervision is required.
  • 28. Deep Material-aware Cross-spectral Stereo Matching The disparity prediction network (DPN) predicts left-right disparity for an RGB-NIR stereo input. The spectral translation network (STN) converts the left RGB image into a pseudo-NIR image. The two networks are trained simultaneously with reprojection error. The symmetric CNN in (b) prevents the STN from learning disparity.
  • 29. Deep Material-aware Cross-spectral Stereo Matching Intermediate results. (a) Left image. (b) material recognition from DeepLab. (c) RGB-to-NIR filters corrected by exposure and white balancing. The R,G,B values represent the weights of R,G,B channels.
  • 31. DispSegNet: Leveraging Semantics for End-to-End Learning of Disparity Estimation from Stereo Imagery ◦ A CNN architecture improves the quality and accuracy of disparity estimation with the help of semantic segmentation. ◦ A network structure in which these two tasks are highly coupled. ◦ A two-stage refinement process. ◦ Initial disparity estimates are refined with an embedding learned from the semantic segmentation branch of the network. ◦ The model is trained using an unsupervised approach, in which images from one view of the stereo pair are warped and compared against images from the other. ◦ A single network is capable of outputting disparity estimates and semantic labels. ◦ Leveraging the embedding learned from semantic segmentation improves the performance of disparity estimation.
  • 32. DispSegNet: Leveraging Semantics for End-to-End Learning of Disparity Estimation from Stereo Imagery Architecture. The pipeline consists of: (a) rectified input stereo images. (b) useful features are extracted from input stereo images. (c) cost volume is formed by concatenating corresponding features from both sides. (d) initial disparity is estimated from cost volume using 3D convolution. (e) initial disparity is further improved by fusing segment embedding. The PSP (Pyramid scene parsing) incorporates more context info. for the semantic segmentation task. (f) estimated disparity and semantic segmentation from both left and right views are generated from the model.
  • 33. DispSegNet: Leveraging Semantics for End-to-End Learning of Disparity Estimation from Stereo Imagery disparity prediction
  • 34. DispSegNet: Leveraging Semantics for End-to-End Learning of Disparity Estimation from Stereo Imagery 3D semantic results
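Forming a cost volume by concatenating corresponding features from both sides, as in step (c) of the pipeline and in the cross feature volume of the first network, can be sketched for one scanline as below. The function name, the channel-major list layout and the zero padding where the shifted right feature leaves the image are assumptions for illustration.

```python
def concat_cost_volume(feat_left, feat_right, max_disp):
    """Build a concatenation volume indexed as volume[d][channel][x].

    feat_left, feat_right: C channels, each a list of W values (one scanline).
    For each disparity d in 0..max_disp, the left features are copied and
    stacked with the right features shifted by d, giving 2*C channels.
    """
    width = len(feat_left[0])
    volume = []
    for d in range(max_disp + 1):
        shifted = [
            [ch[x - d] if x - d >= 0 else 0.0 for x in range(width)]
            for ch in feat_right
        ]
        volume.append([list(ch) for ch in feat_left] + shifted)
    return volume


vol = concat_cost_volume([[1, 2, 3]], [[4, 5, 6]], max_disp=2)
```

With D = max_disp, the left features appear D + 1 times in the result, once per disparity level, and the 3D convolutions that follow are left to learn the actual matching cost from the stacked features.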
  • 35. Group-wise Correlation Stereo Network ◦ This method constructs the cost volume by group-wise correlation. ◦ The left features and the right features are divided into groups along the channel dimension, and correlation maps are computed within each group to obtain multiple matching cost proposals, which are then packed into a cost volume. ◦ Group-wise correlation provides efficient representations for measuring feature similarities and, unlike full correlation, does not lose too much information. ◦ It also preserves better performance when reducing parameters. ◦ The code is available at https://github.com/xy-guo/GwcNet.
  • 36. Group-wise Correlation Stereo Network The pipeline of the proposed group-wise correlation network. The whole network consists of four parts, unary feature extraction, cost volume construction, 3D convolution aggregation, and disparity prediction. The cost volume is divided into two parts, concatenation volume (Cat) and group-wise correlation volume (Gwc). Concatenation volume is built by concatenating the compressed left and right features.
  • 37. Group-wise Correlation Stereo Network The structure of the 3D aggregation network. The network consists of a pre-hourglass module (four convolutions at the beginning) and three stacked 3D hourglass networks. Compared with PSMNet, the shortcut connections between different hourglass modules and output modules are removed, so output modules 0, 1 and 2 can be dropped during inference to save time. 1×1×1 3D convolutions are added to the shortcut connections within hourglass modules.
  • 39. Group-wise Correlation Stereo Network Table: Structure details of the modules. H and W represent the height and the width of the input image. S1/2 denotes the convolution stride. If not specified, each 3D convolution is followed by batch normalization and ReLU. * denotes that the ReLU is not included. ** denotes convolution only.