DFineNet: Ego-Motion Estimation and Depth Refinement from Sparse, Noisy Depth Input with RGB Guidance
Yilun Zhang, Ty Nguyen, Ian D. Miller, Shreyas Shivakumar, Steven Chen, Camillo J. Taylor, Vijay Kumar
University of Pennsylvania

1. Experiments and Evaluation

Note: Depth Input is given as [train]/[test]. E.g., O/Noisy means sparse (~7% available) depth without noise as training input and sparse & noisy depth as test input.
Units: depth errors are reported in mm for both the KITTI and TUM SLAM datasets.
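For concreteness, the sparse & noisy depth input described in the note above can be simulated from a dense ground-truth depth map. The sketch below only illustrates that setting; uniform pixel sampling and uniform multiplicative noise are assumptions, not the exact corruption protocol used in the experiments.

```python
import numpy as np

def simulate_sparse_noisy_depth(dense_mm, keep_ratio=0.07, max_rel_err=0.5, seed=0):
    """Illustrative corruption: keep ~7% of valid pixels and perturb each
    retained depth by up to +/-50% relative error (assumed noise model)."""
    rng = np.random.RandomState(seed)
    valid = dense_mm > 0                                  # 0 marks missing depth
    keep = rng.rand(*dense_mm.shape) < keep_ratio         # ~7% of pixels survive
    noise = 1.0 + rng.uniform(-max_rel_err, max_rel_err, size=dense_mm.shape)
    return np.where(valid & keep, dense_mm * noise, 0.0)
```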
Results with sparse (noise-free) depth input — depth refinement:

Method           Depth Input   RMSE      MAE       iRMSE     iMAE
Sfmlearner       O/O           3436.53   2839.16   2659.01   2577.77
Ma et al.        O/O           119.53    60.01     15.23     8.97
DFineNet (rgbd)  O/O           118.34    59.82     14.77     8.88
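RMSE and MAE above are errors on depth (mm); iRMSE and iMAE are the same errors computed on inverse depth, following the standard KITTI depth-completion convention. A minimal sketch (the 1/km unit for the inverse metrics follows the KITTI convention and is an assumption here):

```python
import numpy as np

def depth_metrics(pred_mm, gt_mm):
    """RMSE/MAE on depth [mm], iRMSE/iMAE on inverse depth [1/km],
    evaluated only where ground truth is available (> 0)."""
    m = gt_mm > 0
    err = pred_mm[m] - gt_mm[m]
    ierr = 1e6 / pred_mm[m] - 1e6 / gt_mm[m]  # 1/mm -> 1/km via factor 1e6
    return (np.sqrt(np.mean(err ** 2)), np.mean(np.abs(err)),
            np.sqrt(np.mean(ierr ** 2)), np.mean(np.abs(ierr)))
```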
Results with sparse (noise-free) depth input — ego-motion:

Method           ATE [m]          RE               Photometric Loss
Sfmlearner       0.0179±0.0110    0.0018±0.0009    0.1843
Ma et al.        0.0105±0.0082    0.0011±0.0006    0.1052
Ground truth     0±0              0±0              0.1921
DFineNet (rgbd)  0.0170±0.0094    0.0046±0.0031    0.0726
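ATE and RE compare the estimated ego-motion against ground truth; the ± entries suggest mean and standard deviation over frame pairs. A minimal sketch for a single relative pose, assuming 4x4 homogeneous matrices and radians for RE (both assumptions; the table does not state its conventions):

```python
import numpy as np

def ate_re(pred_T, gt_T):
    """ATE: translation distance [m]; RE: angle of the residual rotation."""
    ate = np.linalg.norm(pred_T[:3, 3] - gt_T[:3, 3])
    R_res = gt_T[:3, :3].T @ pred_T[:3, :3]
    re = np.arccos(np.clip((np.trace(R_res) - 1.0) / 2.0, -1.0, 1.0))
    return ate, re
```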
Results with sparse & noisy depth input — depth refinement:

Method           Depth Input    RMSE      MAE       iRMSE     iMAE
Sfmlearner       Noisy/Noisy    3574.98   2943.25   2712.34   2645.31
Ma et al.        Noisy/Noisy    209.81    122.24    60.75     27.05
DFineNet (rgbd)  O/Noisy        779.58    579.18    116.95    83.81
DFineNet (rgbd)  Noisy/Noisy    180.63    100.20    45.54     21.08
Results with sparse & noisy depth input — ego-motion:

Method           ATE [m]          RE               Photometric Loss
Sfmlearner       0.0152±0.0092    0.0033±0.0021    0.1043
Ma et al.        0.0116±0.0067    0.0025±0.0022    0.0820
Ground truth     0±0              0±0              0.0533
DFineNet (rgbd)  0.0101±0.0051    0.0021±0.0015    0.0369
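The photometric loss scores how well the predicted depth and ego-motion jointly explain the RGB frames: the source frame is warped into the target view and compared pixel-wise. Below is a minimal PyTorch sketch of such a view-synthesis loss, assuming known intrinsics K and a target-to-source pose T; layout, masking, and weighting details are assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def photometric_loss(tgt_img, src_img, depth, T, K):
    """L1 photometric error after warping src_img into the target view.

    tgt_img, src_img: Bx3xHxW images; depth: Bx1xHxW predicted target depth;
    T: Bx4x4 relative pose (target -> source); K: Bx3x3 camera intrinsics.
    """
    b, _, h, w = tgt_img.shape
    # Pixel grid in homogeneous coordinates: 3 x (H*W).
    ys, xs = torch.meshgrid(torch.arange(h, dtype=tgt_img.dtype),
                            torch.arange(w, dtype=tgt_img.dtype))
    pix = torch.stack([xs.reshape(-1), ys.reshape(-1),
                       torch.ones(h * w, dtype=tgt_img.dtype)])
    pix = pix.to(tgt_img.device).unsqueeze(0).expand(b, -1, -1)   # B x 3 x HW
    # Back-project to 3D camera points, then transform into the source frame.
    cam = torch.inverse(K) @ pix * depth.view(b, 1, -1)           # B x 3 x HW
    cam_h = torch.cat([cam, torch.ones(b, 1, h * w, device=cam.device,
                                       dtype=cam.dtype)], dim=1)  # B x 4 x HW
    src_cam = (T @ cam_h)[:, :3]
    # Project into the source image and normalize to [-1, 1] for grid_sample.
    src_pix = K @ src_cam
    src_pix = src_pix[:, :2] / src_pix[:, 2:].clamp(min=1e-6)
    gx = 2.0 * src_pix[:, 0] / (w - 1) - 1.0
    gy = 2.0 * src_pix[:, 1] / (h - 1) - 1.0
    grid = torch.stack([gx, gy], dim=-1).view(b, h, w, 2)
    warped = F.grid_sample(src_img, grid)
    return (warped - tgt_img).abs().mean()
```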
Summary
Experiment setting:
System input:
Sparse (~7%) & noisy (error up to 50%) depth measurements
RGB
System output (see the interface sketch below):
Refined dense depth
Relative camera ego-motion
Training and inference on an Nvidia Tesla V100 (DGX-1):
Training takes ~2.5 h on a single V100 GPU
Inference runs at ~20 fps on a single V100 GPU
Implemented with PyTorch 1.0.0
Conclusions:
An end-to-end network capable of refining sparse & noisy depth measurements as well as producing camera ego-motion
Demonstrated to work with multiple types of depth sensors (e.g., LiDAR, stereo camera)
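Given the input/output listing above, the system's interface boils down to the following shape. This is a hypothetical sketch of the I/O contract only; the function name, tensor shapes, and 6-DoF pose parameterization are assumptions, not the released code.

```python
import torch

def run_dfinenet(model, rgb, sparse_depth):
    """Hypothetical I/O contract (shapes assumed):
    rgb:          B x 3 x H x W color frames
    sparse_depth: B x 1 x H x W sparse (~7%) & noisy depth, 0 where missing
    Returns dense_depth (B x 1 x H x W) and relative pose (B x 6)."""
    model.eval()
    with torch.no_grad():
        return model(rgb, sparse_depth)
```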
[Figure: qualitative comparison of DFineNet against GuideNet (ranked 1st) and Ma et al. (ranked 19th)]
Motivation & Overview
Dense depth estimation is essential for self-driving cars, autonomous robots, and augmented reality. Sensors that provide dense depth directly are often expensive and bulky. We provide a fusion mechanism that takes sparse, noisy depth from LiDAR or stereo cameras and produces dense depth under RGB guidance. Our end-to-end network model can consume depth measurements that are noisy and as sparse as ~7%, and produce a refined dense depth map. It also estimates the ego-motion of the camera simultaneously.
[Figure: estimation with sparse depth measurements on an NVIDIA Tesla V100 — panels: RGB (input), sparse depth (input), DFineNet (output), ground truth]
[Figure: estimation with sparse & noisy depth measurements on an NVIDIA Tesla V100 — panels: RGB + sparse & noisy depth (input), DFineNet (output), ground truth]