Depth Fusion from RGB and
Depth Sensors III
Yu Huang
Yu.huang07@gmail.com
Sunnyvale, California
Outline
• Propagating Confidences through CNNs for Sparse Data Regression
• Sparse and Dense Data with CNNs: Depth Completion and Semantic Segmentation
• High-precision Depth Estimation with the 3D LiDAR and Stereo Fusion
• Learning Morphological Operators for Depth Completion
• DeepLiDAR: Deep Surface Normal Guided Depth Prediction from LiDAR and Color Image
• Dense Depth Posterior (DDP) from Single Image and Sparse Range
• DFuseNet: Fusion of RGB and Sparse Depth for Image Guided Dense Depth Completion
• 3D LiDAR and Stereo Fusion using Stereo Matching Network with Conditional Cost Volume
Normalization
• Sparse and noisy LiDAR completion with RGB guidance and uncertainty
Propagating Confidences through CNNs for
Sparse Data Regression
• An algebraically-constrained convolution layer for CNNs with sparse input;
• Strategies for determining the confidence from the convolution operation and propagating
it to consecutive layers.
• An objective function that simultaneously minimizes the data error while maximizing the
output confidence.
• This approach produces a continuous pixel-wise confidence map enabling information
fusion, state inference, and decision support.
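A minimal PyTorch sketch of the normalized-convolution idea in these bullets: the data is weighted by its confidence, normalized by the confidence mass in each receptive field, and an updated confidence is returned so it can be propagated to the next layer. This is an illustrative layer, not the paper's exact formulation; the softplus parameterization of the confidence weights is an assumption.

```python
import torch
import torch.nn.functional as F
from torch import nn

class NormalizedConv2d(nn.Module):
    """Convolution over sparse input guided by a confidence map.
    Returns both the feature response and an updated confidence,
    so confidence can be propagated through consecutive layers."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding, bias=False)
        # One positive kernel is enough to propagate a single-channel confidence map.
        self.conf_weight = nn.Parameter(torch.ones(1, 1, kernel_size, kernel_size))
        self.padding = padding
        self.eps = 1e-8

    def forward(self, x, conf):
        # Weight the data by its confidence, then normalize by the confidence
        # mass that fell inside each receptive field.
        num = self.conv(x * conf)
        w = F.softplus(self.conf_weight)             # keep propagation weights positive
        denom = F.conv2d(conf, w, padding=self.padding)
        out = num / (denom + self.eps)
        new_conf = denom / (w.sum() + self.eps)      # propagated confidence, stays in [0, 1]
        return out, new_conf

# Usage on a sparse depth map with a binary input confidence:
layer = NormalizedConv2d(1, 16)
depth = torch.zeros(1, 1, 64, 64)                    # zeros where no LiDAR return
conf0 = (depth > 0).float()
feat, conf1 = layer(depth, conf0)
```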
Propagating Confidences through CNNs for
Sparse Data Regression
The multi-scale architecture for the task of scene depth completion which utilizes normalized convolution layers
Propagating Confidences through CNNs for
Sparse Data Regression
Top-left: input RGB image; top-right: projected LiDAR point cloud;
bottom-left: output from the method; bottom-right: error map in logarithmic scale.
Sparse and Dense Data with CNNs: Depth
Completion and Semantic Segmentation
• CNNs are designed for dense data, but vision data is often sparse (stereo depth, point
clouds, pen stroke, etc.).
• A method to handle sparse depth data with optional dense RGB, accomplishing depth
completion and semantic segmentation by changing only the last layer.
• It uses a sparse training strategy and a late fusion scheme for dense RGB + sparse depth.
• Following a study of sparse data and validity masks, no additional mask is used, showing
that the network learns sparsity-invariant features by itself.
• The network is made robust to varying input sparsities.
• It works even with densities as low as 0.8% (8-layer LiDAR), and outperforms all published SoA
on the KITTI depth completion benchmark.
• Changing only the last layer, semantic segmentation is also performed on synthetic and
real datasets.
Sparse and Dense Data with CNNs: Depth
Completion and Semantic Segmentation
A network architecture adapted from NASNet. No validity mask is used.
Sparse and Dense Data with CNNs: Depth
Completion and Semantic Segmentation
A validity mask is a binary matrix of the same size as the input data, with ones indicating available
input data and zeros elsewhere.
However, the validity information is quickly lost in the later layers.
This is a consequence of the normalization phase on the number of valid pixels, which processes a
mask with only one valid pixel in the same way as a fully valid mask.
Another consequence is that the network tends to produce blurry outputs.
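An illustrative PyTorch sketch of the validity-mask convolution this slide critiques (not this paper's architecture, which drops the mask): the response is normalized by the count of valid pixels in the window, so a window with a single valid pixel is rescaled exactly like a fully valid one, and the mask saturates to all-ones after a few layers.

```python
import torch
import torch.nn.functional as F
from torch import nn

class MaskedConv2d(nn.Module):
    """Validity-mask convolution: normalize by the number of valid pixels per window."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
        self.register_buffer("ones", torch.ones(1, 1, k, k))
        self.k = k

    def forward(self, x, mask):
        # mask: (B, 1, H, W), 1 where input data is available, 0 elsewhere.
        valid = F.conv2d(mask, self.ones, padding=self.k // 2)  # count of valid pixels per window
        out = self.conv(x * mask) / valid.clamp(min=1.0)        # one valid pixel scaled like a full window
        new_mask = (valid > 0).float()                          # mask dilates toward all-ones layer by layer
        return out, new_mask
```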
Sparse and Dense Data with CNNs: Depth
Completion and Semantic Segmentation
A naive strategy consists of averaging separate predictions from each modality. An alternative is
early fusion: modalities are simply concatenated channel-wise and fed to the network.
It appears preferable to transform different representations (RGB intensities, distance values) to a
similar feature space before fusing them (known as late fusion).
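A schematic PyTorch contrast of the two strategies (module names and encoders are placeholders, not the paper's layers):

```python
import torch
from torch import nn

class EarlyFusion(nn.Module):
    """Early fusion: concatenate RGB and sparse depth channel-wise, one shared encoder."""
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder                      # expects 4 input channels (RGB + depth)

    def forward(self, rgb, sparse_depth):
        return self.encoder(torch.cat([rgb, sparse_depth], dim=1))

class LateFusion(nn.Module):
    """Late fusion: map each modality to a similar feature space first, then fuse."""
    def __init__(self, rgb_encoder, depth_encoder, head):
        super().__init__()
        self.rgb_encoder = rgb_encoder
        self.depth_encoder = depth_encoder
        self.head = head

    def forward(self, rgb, sparse_depth):
        f_rgb = self.rgb_encoder(rgb)
        f_d = self.depth_encoder(sparse_depth)
        return self.head(torch.cat([f_rgb, f_d], dim=1))

# Toy instantiation to show shapes:
rgb_enc = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
d_enc = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())
head = nn.Conv2d(64, 1, 3, padding=1)
model = LateFusion(rgb_enc, d_enc, head)
out = model(torch.rand(1, 3, 64, 64), torch.rand(1, 1, 64, 64))
```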
Sparse and Dense Data with CNNs: Depth
Completion and Semantic Segmentation
sD = sparse depth
Sparse and Dense Data with CNNs: Depth
Completion and Semantic Segmentation
High-precision Depth Estimation with the
3D LiDAR and Stereo Fusion
• A deep CNN architecture for high-precision depth estimation by jointly utilizing sparse 3D
LiDAR and dense stereo depth information.
• In this network, the complementary characteristics of sparse 3D LiDAR and dense stereo
depth are simultaneously encoded in a boosting manner.
• Tailored to the LiDAR and stereo fusion problem, this network differs from previous CNNs in
the incorporation of a compact convolution module, which can be deployed within the
constraints of mobile devices.
• As training data for the LiDAR and stereo fusion is rather limited, a simple yet effective
approach for reproducing the raw KITTI dataset is used.
• The raw LiDAR scans are augmented by adapting an off-the-shelf stereo algorithm and a
confidence measure.
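A hedged sketch of how an off-the-shelf stereo matcher plus a confidence measure can densify training targets, here OpenCV's SGM with a left-right consistency check as the confidence; the paper's exact augmentation procedure and parameters may differ.

```python
import cv2
import numpy as np

def confident_sgm_disparity(left_gray, right_gray, max_disp=128, lr_thresh=1.0):
    """Pseudo-dense disparity from SGM, kept only where a left-right
    consistency check marks it as confident. Inputs are uint8 grayscale images."""
    sgm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=max_disp, blockSize=5)
    disp_l = sgm.compute(left_gray, right_gray).astype(np.float32) / 16.0

    # Disparity of the right image via the usual flip trick.
    flip = lambda im: np.ascontiguousarray(im[:, ::-1])
    disp_r = sgm.compute(flip(right_gray), flip(left_gray)).astype(np.float32) / 16.0
    disp_r = disp_r[:, ::-1]

    h, w = disp_l.shape
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    x_right = np.clip(np.round(cols - disp_l).astype(np.int64), 0, w - 1)
    lr_err = np.abs(disp_l - disp_r[rows, x_right])          # left-right consistency error

    confident = (disp_l > 0) & (lr_err < lr_thresh)
    return np.where(confident, disp_l, 0.0), confident
```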
High-precision Depth Estimation with the
3D LiDAR and Stereo Fusion
High-precision Depth Estimation with the
3D LiDAR and Stereo Fusion
LiDAR and stereo fusion: (from top to bottom) input color image, LiDAR disparity, the result of SGM, and the fusion result.
High-precision Depth Estimation with the
3D LiDAR and Stereo Fusion
High-precision Depth Estimation with the
3D LiDAR and Stereo Fusion
Learning Morphological Operators for
Depth Completion
• A method for completing sparse depth images in a semantically accurate manner by training
a novel morphological NN.
• It approximates morphological operations by Contra-harmonic Mean Filter layers which are
trained in a contemporary NN framework.
• An early fusion U-Net architecture then combines dilated depth channels and RGB.
• Using a large scale RGB-D dataset to learn the optimal morphological and convolutional
filter shapes that produce a fully sampled depth image at the output.
• The resulting depth images are used to augment the perception systems of intelligent vehicles.
Learning Morphological Operators for
Depth Completion
Learning Morphological Operators for
Depth Completion
• Morphological operators are the foundation of many image segmentation algorithms.
• Using so-called “structuring elements”, they represent non-linear operations which
compute the minimum, the maximum, or a combination of both within the element.
• In the context of depth completion, it is of interest to learn the shape and the operation type
that best fits the data.
• The approximation of morphological operators by the contra-harmonic mean (CHM) filter is
the best-founded technique and can easily be integrated into a deep learning framework.
The contra-harmonic mean filter function ψk(x) is modeled as the power-weighted 2D
convolution of the image f(x) with a filter w representing the structuring element.
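A reconstruction of that filter from the description above (notation is ours): k is the power parameter; large positive k approximates dilation, large negative k approximates erosion, and k = 0 reduces to a plain weighted mean.

\[
\psi_k(x) \;=\; \frac{\bigl(f^{\,k+1} * w\bigr)(x)}{\bigl(f^{\,k} * w\bigr)(x)}
\;=\; \frac{\sum_{y} f(x-y)^{\,k+1}\, w(y)}{\sum_{y} f(x-y)^{\,k}\, w(y)}
\]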
Learning Morphological Operators for
Depth Completion
DeepLiDAR: Deep Surface Normal Guided Depth Prediction for
Outdoor Scene from Sparse LiDAR Data and Single Color Image
• A deep learning architecture that produces accurate dense depth for the
outdoor scene from a single color image and a sparse depth.
• This network estimates surface normals as the intermediate
representation to produce dense depth, and can be trained end-to-end.
• With a modified encoder-decoder structure, this network effectively fuses
the dense color image and the sparse LiDAR depth.
• To address outdoor-specific challenges, it predicts a confidence mask to
handle mixed LiDAR signals near foreground boundaries due to occlusion, and
combines estimates from the color image and surface normals with
learned attention maps to improve the depth accuracy especially for
distant areas.
• Comprehensive analysis shows that this model generalizes well to the
input with higher sparsity or from indoor scenes.
DeepLiDAR: Deep Surface Normal Guided Depth Prediction for
Outdoor Scene from Sparse LiDAR Data and Single Color Image
It takes as input a color image
and a sparse depth image
from the LiDAR (Row 1), and
output a dense depth map
(Row 2). It estimates surface
normals (Row 3) as the
intermediate representation.
DeepLiDAR: Deep Surface Normal Guided Depth Prediction for
Outdoor Scene from Sparse LiDAR Data and Single Color Image
The pipeline consists of two pathways. Both start from an RGB image, a sparse depth, and a binary mask as inputs:
the surface normal pathway produces a pixel-wise surface normal, which is further combined with the sparse depth and
a confidence mask from the color pathway to produce a dense depth. The color pathway produces a dense depth as well.
The final dense depth output is the weighted sum of the depths from the two pathways using the estimated attention map.
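One way to write that weighted sum (notation is ours, not the paper's): d_N is the depth from the surface normal pathway, d_C the depth from the color pathway, and a the estimated per-pixel attention map.

\[
d(x) \;=\; a(x)\, d_N(x) \;+\; \bigl(1 - a(x)\bigr)\, d_C(x)
\]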
DeepLiDAR: Deep Surface Normal Guided Depth Prediction for
Outdoor Scene from Sparse LiDAR Data and Single Color Image
Detailed architecture of deep completion unit. Occlusion and learned confidence.
DeepLiDAR: Deep Surface Normal Guided Depth Prediction for
Outdoor Scene from Sparse LiDAR Data and Single Color Image
DeepLiDAR: Deep Surface Normal Guided Depth Prediction for
Outdoor Scene from Sparse LiDAR Data and Single Color Image
Dense Depth Posterior (DDP) from Single
Image and Sparse Range
• A deep learning system to infer the posterior distribution of a dense depth
map associated with an image, by exploiting sparse range measurements, for
instance from a lidar.
• While the lidar may provide a depth value for a small percentage of the
pixels, exploit regularities reflected in the training set to complete the map so
as to have a probability over depth for each pixel in the image.
• A Conditional Prior Network is exploited, which allows associating a probability
with each depth value given an image, and it is combined with a likelihood term
that uses the sparse measurements.
• The method optionally exploits the availability of stereo during training, but in any case
requires only a single image and a sparse point cloud at run-time.
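Schematically, with d the dense depth, I the image, and z the sparse range measurements, the estimate combines the two terms in the bullets above (this is a paraphrase of the setup, not the paper's exact objective):

\[
\hat{d} \;=\; \arg\max_{d}\;\; \underbrace{\log p(z \mid d)}_{\text{sparse-range likelihood}}
\;+\; \underbrace{\log p(d \mid I)}_{\text{Conditional Prior Network}}
\]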
Dense Depth Posterior (DDP) from Single
Image and Sparse Range
(A): Architecture of a Conditional Prior Network (CPN) to learn the conditional of the
dense depth given a single image. (B): Depth Completion Network (DCN) for learning
the mapping from sparse depth map and image to dense depth map.
Dense Depth Posterior (DDP) from Single
Image and Sparse Range
An image (top) is insufficient to determine the geometry of the scene.
A point cloud alone (middle) is similarly ambiguous.
Combining a single image, the LiDAR point cloud, and previously seen scenes allows inferring
a dense depth map (bottom) with high confidence.
Color bar from left to right: zero to infinity.
DFuseNet: Deep Fusion of RGB and Sparse Depth
Information for Image Guided Dense Depth Completion
• A CNN designed to upsample a series of sparse range measurements based on the
contextual cues gleaned from a high-resolution (HR) intensity image.
• It draws inspiration from related work on super-resolution (SR) and inpainting.
• An architecture that seeks to pull contextual cues separately from the intensity image and the
depth features and then fuse them later in the network.
• It effectively exploits the relationship between the two modalities and produces accurate
results while respecting salient image structures.
Figure panels: input color image, LiDAR scan mask, DFuseNet output.
DFuseNet: Deep Fusion of RGB and Sparse Depth
Information for Image Guided Dense Depth Completion
The network architecture uses two input branches for RGB and depth input, respectively. Spatial
Pyramid Pooling (SPP) blocks are used in the encoder, and a hierarchical representation of
decoder features is used to predict dense depth images.
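A minimal PyTorch sketch of an SPP block of the kind used in the encoder (pool sizes and channel counts are illustrative, not the paper's exact configuration): features are average-pooled at several scales, projected, upsampled back, and concatenated so each pixel sees context at multiple receptive fields.

```python
import torch
import torch.nn.functional as F
from torch import nn

class SPPBlock(nn.Module):
    """Spatial Pyramid Pooling: multi-scale pooled context concatenated to the input."""
    def __init__(self, in_ch, branch_ch=32, bins=(1, 2, 4, 8)):
        super().__init__()
        self.bins = bins
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, branch_ch, kernel_size=1) for _ in bins
        )

    def forward(self, x):
        h, w = x.shape[-2:]
        outs = [x]
        for bin_size, proj in zip(self.bins, self.branches):
            pooled = F.adaptive_avg_pool2d(x, bin_size)          # (B, C, bin, bin) context
            outs.append(F.interpolate(proj(pooled), size=(h, w),
                                      mode="bilinear", align_corners=False))
        return torch.cat(outs, dim=1)   # in_ch + len(bins) * branch_ch channels
```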
DFuseNet: Deep Fusion of RGB and Sparse Depth
Information for Image Guided Dense Depth Completion
Spatial Pyramid Pooling blocks
used in the encoder architecture
Panels: input image; predicted depth without the stereo term; prediction with the stereo term.
Learning to extrapolate better using available information: adding a stereo-depth-based loss
term enables better extrapolations in regions where no ground truth or LiDAR exists.
3D LiDAR and Stereo Fusion using Stereo Matching
Network with Conditional Cost Volume Normalization
• The complementary characteristics of active and passive depth sensing techniques motivate
the fusion of the LiDAR sensor and stereo camera for improved depth perception.
• Recent SoA on deep models of stereo matching are composed of two main components:
matching cost computation and cost volume regularization.
• Instead of directly fusing estimated depths across LiDAR and stereo modalities, the stereo
matching network is improved with two techniques: Input Fusion, which incorporates the
geometric info from sparse LiDAR depth with the RGB images for learning joint feature
representations, and Conditional Cost Volume Normalization (CCVNorm), which adaptively
regularizes cost volume optimization conditioned on LiDAR measurements.
• The framework is generic and closely integrated with the cost volume component that is
commonly utilized in stereo matching neural networks.
• With a hierarchical extension of CCVNorm, the method brings only slight overhead to the
stereo matching network in terms of computation time and model size.
3D LiDAR and Stereo Fusion using Stereo Matching
Network with Conditional Cost Volume Normalization
Overview of 3D LiDAR and stereo fusion framework: (1) Input Fusion that incorporates the geometric information
from sparse LiDAR depth with the RGB images as the input for the Cost Computation phase to learn joint feature
representations, and (2) CCVNorm that replaces the batch normalization (BN) layer and modulates the cost volume
features F conditioned on LiDAR data in the Cost Regularization phase of the stereo matching network.
3D LiDAR and Stereo Fusion using Stereo Matching
Network with Conditional Cost Volume Normalization
Conditional Cost Volume Normalization.
At each pixel (red dashed bounding box),
based on the discretized disparity value of
corresponding LiDAR data, categorical
CCVNorm selects the modulation
parameters γ from a D̂-entry lookup table,
while LiDAR points with invalid values
are handled separately with an additional
set of parameters (shown in gray).
HierCCVNorm, in contrast, produces γ by
a two-step hierarchical modulation.
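A simplified PyTorch sketch of the categorical variant described above (shapes and the embedding-table parameterization are assumptions for illustration, not the authors' code): the cost volume is batch-normalized, then modulated by γ/β looked up per pixel from a D̂-entry table indexed by the discretized LiDAR disparity, with one extra entry for pixels without a LiDAR return.

```python
import torch
from torch import nn

class CategoricalCCVNorm(nn.Module):
    """Conditional normalization of a cost volume F of shape (B, C, D, H, W),
    with per-pixel gamma/beta chosen by the discretized LiDAR disparity."""
    def __init__(self, channels, num_disp_bins):
        super().__init__()
        self.bn = nn.BatchNorm3d(channels, affine=False)
        # num_disp_bins valid entries + 1 entry for pixels with no LiDAR point.
        self.gamma = nn.Embedding(num_disp_bins + 1, channels * num_disp_bins)
        self.beta = nn.Embedding(num_disp_bins + 1, channels * num_disp_bins)
        nn.init.ones_(self.gamma.weight)     # start as an identity modulation
        nn.init.zeros_(self.beta.weight)
        self.num_disp_bins = num_disp_bins

    def forward(self, cost_volume, lidar_disp_bins):
        # cost_volume: (B, C, D, H, W) with D == num_disp_bins.
        # lidar_disp_bins: (B, H, W) long tensor in [0, num_disp_bins - 1],
        # with the value num_disp_bins marking "no LiDAR measurement".
        b, c, d, h, w = cost_volume.shape
        normed = self.bn(cost_volume)
        gamma = self.gamma(lidar_disp_bins).view(b, h, w, c, d).permute(0, 3, 4, 1, 2)
        beta = self.beta(lidar_disp_bins).view(b, h, w, c, d).permute(0, 3, 4, 1, 2)
        return gamma * normed + beta
```

The categorical table grows with D̂ × C × D̂ parameters, which is the overhead the hierarchical HierCCVNorm variant is designed to reduce.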
3D LiDAR and Stereo Fusion using Stereo Matching
Network with Conditional Cost Volume Normalization
Compared to other baselines and variants, this method captures details in areas of complex structure (white
dashed bounding box) by leveraging the complementary characteristics of the LiDAR and stereo modalities.
Sparse and noisy LiDAR completion with
RGB guidance and uncertainty
• It proposes a method to accurately complete sparse LiDAR maps guided by RGB images.
• Monocular depth prediction methods fail to generate absolute and precise depth maps.
• Stereoscopic approaches are still significantly outperformed by LiDAR-based approaches.
• The goal of the depth completion task is to generate dense depth predictions from sparse
and irregular point clouds which are mapped to a 2D plane.
• A framework extracts both global and local information in order to produce proper depth
maps.
• Simple depth completion does not require a deep network; in addition, a fusion method
with RGB guidance from a monocular camera is used to leverage object information and to
correct mistakes in the sparse input.
• Confidence masks are exploited in order to take into account the uncertainty in the depth
predictions from each modality.
• Code with visualizations is available at https://github.com/wvangansbeke/Sparse-Depth-Completion.
Sparse and noisy LiDAR completion with
RGB guidance and uncertainty
The framework consists of two parts: the global branch on top and the local branch below.
The global path outputs three maps: a guidance map, a global depth map, and a confidence map.
The local branch predicts a confidence map and a local depth map, also taking into account
the guidance map of the global network. The framework fuses global and local information
based on the confidence maps in a late fusion approach.
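A minimal sketch of that confidence-based late fusion (tensor shapes and the softmax weighting are assumptions for illustration): the predicted confidences weight the global and local depth maps per pixel.

```python
import torch

def fuse_global_local(depth_global, conf_global, depth_local, conf_local):
    """Late fusion of the two branches; all tensors are (B, 1, H, W)."""
    weights = torch.softmax(torch.cat([conf_global, conf_local], dim=1), dim=1)  # (B, 2, H, W)
    depths = torch.cat([depth_global, depth_local], dim=1)                       # (B, 2, H, W)
    return (weights * depths).sum(dim=1, keepdim=True)                           # (B, 1, H, W)
```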
Sparse and noisy LiDAR completion with
RGB guidance and uncertainty
The green box shows that the framework successfully
corrects the mistakes in the sparse LiDAR input frame.
Thanks