3D Interpretation from Single 2D Image
for Autonomous Driving V
Yu Huang
Yu.huang07@gmail.com
Sunnyvale, California
Outline
• MonoRUn: Monocular 3D Object Detection by Reconstruction and
Uncertainty Propagation
• M3DSSD: Monocular 3D Single Stage Object Detector
• Delving into Localization Errors for Monocular 3D Object Detection
• GrooMeD-NMS: Grouped Mathematically Differentiable NMS for
Monocular 3D Object Detection
• Objects are Different: Flexible Monocular 3D Object Detection
MonoRUn: Monocular 3D Object Detection by Reconstruction
and Uncertainty Propagation
• MonoRUn: self-supervised, learns dense
correspondences and geometry;
• Robust KL loss: minimizes the uncertainty-
weighted projection error;
• Uncertainty-aware region reconstruction
network for 3D object coordinate
regression;
• Uncertainty-driven PnP for object pose
and covariance matrix estimation;
• Code: https://github.com/tjiiv-cprg/MonoRUn.
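A minimal sketch of an uncertainty-weighted projection error in the spirit of the robust KL loss above, assuming a Laplace negative log-likelihood with a per-point predicted log standard deviation; the paper's exact loss and its uncertainty propagation through PnP differ.

import torch

def uncertainty_weighted_projection_loss(proj_pred, proj_gt, log_sigma):
    # proj_pred, proj_gt: (N, 2) projected vs. ground-truth pixel coordinates.
    # log_sigma: (N, 2) predicted log standard deviation per coordinate.
    residual = (proj_pred - proj_gt).abs()          # L1 projection error
    sigma = log_sigma.exp()
    # Laplace negative log-likelihood: |e| / sigma + log sigma,
    # so large predicted uncertainty down-weights the error but is penalized.
    return (residual / sigma + log_sigma).mean()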
MonoRUn: Monocular 3D Object Detection by Reconstruction
and Uncertainty Propagation
MonoRUn: Monocular 3D Object Detection by Reconstruction
and Uncertainty Propagation
MonoRUn: Monocular 3D Object Detection by Reconstruction
and Uncertainty Propagation
M3DSSD: Monocular 3D Single Stage
Object Detector
• Monocular 3D Single Stage Object Detector (M3DSSD) with feature alignment and
asymmetric non-local attention.
• Current anchor-based monocular 3D object detection methods suffer from feature
mismatching.
• To overcome this, a two-step feature alignment approach is proposed.
• In the first step, shape alignment is performed to make the receptive field of
the feature map focus on the pre-defined anchors with high confidence scores.
• In the second step, center alignment is used to align the features at 2D/3D
centers.
• Further, it is often difficult to learn global information and capture long-range
relationships, which are important for the depth prediction of objects.
• An asymmetric non-local attention block with multiscale sampling extracts depth-
wise features.
• The code is released at https://github.com/mumianyuxin/M3DSSD.
M3DSSD: Monocular 3D Single Stage
Object Detector
The architecture of M3DSSD. (a) The backbone of the framework, which is modified from DLA-102. (b)
The two-step feature alignment, classification head, 2D/3D center regression heads, and ANAB
especially designed for predicting the depth z3d. (c) Other regression heads.
M3DSSD: Monocular 3D Single Stage
Object Detector
The architecture of shape alignment
and the outcome of shape alignment
on objects. The yellow squares indicate
the sampling location of the AlignConv,
and the anchors are in red.
M3DSSD: Monocular 3D Single Stage
Object Detector
The architecture of center alignment and the
outcome of center alignment. When applying
center alignment to objects, the sampling
locations on the foreground regions (in white)
all concentrate on the centers of objects
(in yellow) after center alignment, which are
close to the true centers of objects (in red).
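A hedged sketch of the center-alignment idea, not the released M3DSSD code: every spatial location re-samples the backbone feature at its predicted object center, so the features feeding the 3D/depth heads are taken at the 2D/3D center. The offset tensor and bilinear re-sampling here are illustrative assumptions.

import torch
import torch.nn.functional as F

def center_align(feat, center_offset):
    # feat: (B, C, H, W) backbone feature map.
    # center_offset: (B, 2, H, W) predicted (dx, dy) from each location to its
    # object center, in feature-map pixels.
    B, C, H, W = feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=feat.dtype, device=feat.device),
        torch.arange(W, dtype=feat.dtype, device=feat.device),
        indexing="ij")
    sample_x = xs.unsqueeze(0) + center_offset[:, 0]   # x + dx
    sample_y = ys.unsqueeze(0) + center_offset[:, 1]   # y + dy
    # normalize to [-1, 1]; grid_sample expects (x, y) order in the last dim
    grid = torch.stack([2.0 * sample_x / (W - 1) - 1.0,
                        2.0 * sample_y / (H - 1) - 1.0], dim=-1)  # (B, H, W, 2)
    return F.grid_sample(feat, grid, align_corners=True)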
M3DSSD: Monocular 3D Single Stage
Object Detector
Asymmetric Non-local Attention Block. The
key and query branches share the same
attention maps, which forces the key and
value to focus on the same place. Bottom:
Pyramid Average Pooling with Attention
(PA²) that generates descriptors of
different levels at various resolutions.
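A loose, illustrative sketch of non-local attention whose keys and values come from pyramid average pooling, in the spirit of the ANAB above; the shared key/query attention maps and the attention-weighted pooling of the actual PA² block are omitted. The class name and pooling sizes are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooledAttention(nn.Module):
    def __init__(self, channels, pool_sizes=(1, 3, 6, 8)):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, 1)
        self.key = nn.Conv2d(channels, channels, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.pool_sizes = pool_sizes

    def forward(self, x):
        B, C, H, W = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)              # (B, HW, C)
        # multi-scale descriptors: pooled maps concatenated along the token axis
        pooled = torch.cat([F.adaptive_avg_pool2d(x, s).flatten(2)
                            for s in self.pool_sizes], dim=2)      # (B, C, S)
        k = self.key(pooled.unsqueeze(-1)).squeeze(-1)             # (B, C, S)
        v = self.value(pooled.unsqueeze(-1)).squeeze(-1)           # (B, C, S)
        attn = torch.softmax(q @ k / C ** 0.5, dim=-1)             # (B, HW, S)
        out = (attn @ v.transpose(1, 2)).transpose(1, 2).view(B, C, H, W)
        return x + out                                             # residual add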
M3DSSD: Monocular 3D Single Stage
Object Detector
M3DSSD: Monocular 3D Single Stage
Object Detector
M3DSSD: Monocular 3D Single Stage
Object Detector
Delving into Localization Errors for
Monocular 3D Object Detection
• Quantify the impact introduced by each sub-task and find that 'localization
error' is the vital factor restricting monocular 3D detection.
• Besides, investigate the underlying reasons behind localization errors,
analyze the issues they might bring, and propose three strategies.
• First, the misalignment between the center of the 2D bounding box and the
projected center of the 3D object is a vital factor leading to low
localization accuracy.
• Second, accurately localizing distant objects with existing technologies is
almost impossible, while such samples mislead the learned network; they are
removed from the training set to improve the overall performance of the detector.
• Lastly, a 3D IoU-oriented loss for the size estimation of the object, which is
not affected by the 'localization error'.
• Code: https://github.com/xinzhuma/monodle.
Delving into Localization Errors for
Monocular 3D Object Detection
Delving into Localization Errors for
Monocular 3D Object Detection
• Coupled with the errors accumulated by other tasks such as depth
estimation, it becomes almost impossible to accurately estimate the
3D bounding box of distant objects from a single monocular image, unless
the depth estimation is accurate enough (not achieved to date).
• For estimating the coarse center c: 1) use the projected 3D center cw as the
ground-truth for the branch estimating the coarse center c, and 2) force the
model to learn features from 2D detection simultaneously.
• Adopting the projected 3D center cw as the ground-truth for the coarse
center c makes the branch estimating the coarse center aware of 3D geometry
and more related to the task of estimating the 3D object center,
which is the key to the localization problem (see the heatmap sketch below).
• 2D detection serves as an auxiliary task to learn better 3D-aware features.
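A sketch of the first strategy under the usual CenterNet-style training: a Gaussian heatmap target whose peak sits at the projected 3D center cw rather than the 2D box center. The radius heuristic is assumed for illustration and is not monodle's exact setting.

import numpy as np

def gaussian_heatmap(shape, center, radius):
    # shape: (H, W) of the output heatmap.
    # center: (u, v) projected 3D center in heatmap coordinates.
    # radius: Gaussian radius in pixels (assumed heuristic).
    H, W = shape
    u, v = float(center[0]), float(center[1])
    heatmap = np.zeros((H, W), dtype=np.float32)
    ys, xs = np.ogrid[:H, :W]
    sigma = radius / 3.0
    g = np.exp(-((xs - u) ** 2 + (ys - v) ** 2) / (2 * sigma ** 2))
    # keep the element-wise maximum so overlapping objects do not erase each other
    np.maximum(heatmap, g, out=heatmap)
    return heatmap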
Delving into Localization Errors for
Monocular 3D Object Detection
• Two schemes are proposed to generate the object-level training weight for each
sample (see the sketches below):
• Hard coding: discard all samples beyond a certain distance
• Soft coding: generate the weight using a reverse sigmoid-like function
• An IoU-oriented optimization for 3D size estimation: specifically, suppose all
prediction items except the 3D size s = [h, w, l]3D are completely correct; then
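Two hedged sketches of the strategies above, not the monodle implementation: a reverse-sigmoid soft weight that decays with object depth (cutoff and temperature are assumed values), and the IoU-oriented size loss under the stated assumption that everything except the 3D size is correct, in which case the predicted and ground-truth boxes share center and yaw and the 3D IoU has a closed form.

import torch

def soft_distance_weight(depth, cutoff=60.0, temperature=5.0):
    # Weight ~1 for nearby objects, smoothly -> 0 beyond `cutoff` meters.
    # `cutoff` and `temperature` are assumed hyper-parameters.
    return torch.sigmoid((cutoff - depth) / temperature)

def iou3d_size_loss(size_pred, size_gt):
    # size_pred, size_gt: (N, 3) boxes as [h, w, l]. With shared center and
    # orientation, the per-axis overlap is min(pred, gt), so IoU3D is closed form.
    inter = torch.minimum(size_pred, size_gt).prod(dim=1)
    union = size_pred.prod(dim=1) + size_gt.prod(dim=1) - inter
    return (1.0 - inter / union).mean()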
Delving into Localization Errors for
Monocular 3D Object Detection
Delving into Localization Errors for
Monocular 3D Object Detection
Delving into Localization Errors for
Monocular 3D Object Detection
GrooMeD-NMS: Grouped Mathematically Differentiable
NMS for Monocular 3D Object Detection
• While there have been attempts to include NMS in the training pipeline for tasks
such as 2D object detection, they have been less widely adopted due to the lack of
a mathematical (closed-form) expression of the NMS.
• GrooMeD-NMS – a Grouped Mathematically Differentiable NMS – is integrated for
monocular 3D object detection, such that the network is trained end-to-end
with a loss on the boxes after NMS.
• First formulate NMS as a matrix operation and then group and mask the
boxes in an unsupervised manner to obtain a simple closed-form expression
of the NMS.
• GrooMeD-NMS addresses the mismatch between training and inference
pipelines and, therefore, forces the network to select the best 3D box in a
differentiable manner.
GrooMeD-NMS: Grouped Mathematically Differentiable
NMS for Monocular 3D Object Detection
(a) Conventional object detection has a mismatch between training and inference as it uses NMS
only in inference. (b) To address this, propose a novel GrooMeD-NMS layer, such that the network is
trained end-to-end with NMS applied. s and r denote the scores of boxes B before and after NMS,
respectively. O denotes the matrix containing IoU2D overlaps of B. Lbefore denotes the losses before the
NMS, while Lafter denotes the loss after the NMS. (c) GrooMeD-NMS layer calculates r in a differentiable
manner giving gradients from Lafter when the best-localized box corresponding to an object is not
selected after NMS.
GrooMeD-NMS: Grouped Mathematically Differentiable
NMS for Monocular 3D Object Detection
GrooMeD-NMS: Grouped Mathematically Differentiable
NMS for Monocular 3D Object Detection
Write the rescores r in a matrix formulation, which can be expressed compactly,
where P, called the Prune Matrix, is obtained when the
pruning function p operates element-wise on O.
To avoid recursion, a closed-form solution is used instead.
Boxes in an image are clustered in an unsupervised manner
based on IoU2D overlaps to obtain the groups G. Grouping
thus mimics the grouping of the classical NMS, but does
not rescore the boxes; the expression is then rewritten group-wise.
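An illustrative sketch, not the released GrooMeD-NMS layer: rescores are computed in closed form from the score vector s and overlap matrix O through a differentiable pruning function, with the recursion replaced by a single pass over boxes sorted by score. The sigmoid pruning function and its threshold/temperature are assumptions.

import torch

def sigmoid_prune(o, threshold=0.4, temperature=0.05):
    # Differentiable pruning function p(o): ~0 below the IoU threshold, ~1 above.
    return torch.sigmoid((o - threshold) / temperature)

def differentiable_rescore(scores, iou2d):
    # scores: (N,) box scores s. iou2d: (N, N) overlap matrix O.
    # Returns rescores r, differentiable w.r.t. scores.
    order = scores.argsort(descending=True)
    s = scores[order]
    o = iou2d[order][:, order]
    p = sigmoid_prune(o)
    # keep only suppression coming from strictly higher-scored boxes
    p = torch.tril(p, diagonal=-1)
    r = (s - p @ s).clamp(min=0.0, max=1.0)   # single-pass, recursion-free rescoring
    # undo the sort so rescores align with the input box order
    out = torch.empty_like(r)
    out[order] = r
    return out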
GrooMeD-NMS: Grouped Mathematically Differentiable
NMS for Monocular 3D Object Detection
Classical NMS considers the IoU2D of the top-scored box
with other boxes. This is equivalent to keeping only
the column of O corresponding to the top box
while setting the rest of the columns to zero.
Implement this through masking of PGk. Let MGk denote
the binary mask corresponding to group Gk: entries of
MGk in the column corresponding to the top-scored box
are 1 and the rest are 0. Applying the mask (via the
element-wise Frobenius product), the expression
simplifies further.
GrooMeD-NMS: Grouped Mathematically Differentiable
NMS for Monocular 3D Object Detection
pruning function
GrooMeD-NMS: Grouped Mathematically Differentiable
NMS for Monocular 3D Object Detection
The method builds on M3D-RPN and uses binning and
self-balancing confidence. The boxes' self-balancing
confidences are used as scores s, which pass through the
GrooMeD-NMS layer to obtain the rescores r. The
rescores signal the network if the best box has not been
selected for a particular object; the target assignment
is based on gIoU3D.
For calculating gIoU3D, first calculate the volume V
and hull volume Vhull of the 3D boxes. Vhull is the
product of the 2D hull in Bird's Eye View (BEV),
computed without rotations, and the hull of the Y dimension.
If the best boxes are correctly ranked in one image
and are not in the second, then the gradients only
affect the boxes of the second image. This
modification is the Image-wise AP-Loss.
The modified AP-Loss is used as the loss after NMS,
since AP-Loss does not suffer from class imbalance.
GrooMeD-NMS: Grouped Mathematically Differentiable
NMS for Monocular 3D Object Detection
GrooMeD-NMS: Grouped Mathematically Differentiable
NMS for Monocular 3D Object Detection
Objects are Different: Flexible Monocular 3D
Object Detection
• Most existing methods adopt the same approach for all objects regardless of
diverse distributions, leading to limited performance for truncated objects.
• A flexible framework for monocular 3D object detection which explicitly decouples
the truncated objects and adaptively combines multiple approaches for object
depth estimation.
• Specifically, decouple the edge of the feature map for predicting long-tail truncated
objects so that the optimization of normal objects is not influenced.
• Furthermore, formulate the object depth estimation as an uncertainty-guided
ensemble of directly regressed object depth and solved depths from different
groups of keypoints.
• Code to be released at https://github.com/zhangyp15/MonoFlex.
Objects are Different: Flexible Monocular 3D
Object Detection
Objects are Different: Flexible Monocular 3D
Object Detection
• The framework is extended from CenterNet, where objects are identified by their
representative points and predicted by peaks of the heatmap.
• First, the CNN backbone extracts feature maps from the monocular image as the
input for multiple prediction heads.
• Multiple prediction branches are deployed on the shared backbone to regress
objects’ properties, including the 2D bounding box, dimension, orientation,
keypoints, and depth.
• The image-level localization involves the heatmap and offsets, where the edge
fusion modules are used to decouple the feature learning and prediction of
truncated objects.
• The final depth estimation is an uncertainty-guided combination of the regressed
depth and the computed depths from estimated keypoints and dimensions.
• The adaptive depth ensemble adopts four methods for depth estimation and
simultaneously predicts their uncertainties, which are utilized to form an
uncertainty-weighted prediction (see the sketch below).
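A sketch of the uncertainty-guided depth ensemble: the four depth estimates (direct regression plus three keypoint groups) are fused by inverse-uncertainty weighting. Whether the weights use 1/σ or 1/σ² is left as an assumption here.

import torch

def ensemble_depth(depths, log_sigmas):
    # depths: (N, 4) candidate depths per object.
    # log_sigmas: (N, 4) predicted log uncertainties for each candidate.
    weights = torch.exp(-log_sigmas)                     # 1 / sigma
    weights = weights / weights.sum(dim=1, keepdim=True) # normalize per object
    return (weights * depths).sum(dim=1)                 # (N,) fused depth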
Objects are Different: Flexible Monocular 3D
Object Detection
The dimension and orientation can be directly inferred
from appearance-based cues, while the 3D location is
converted to the projected 3D center xc = (uc, vc) and the
object depth z (see the back-projection sketch after this slide).
Existing methods utilize a unified representation xr, the
center of 2D bounding box xb, for every object. In such
cases, the offset 𝛿c = xc - xb is regressed to derive the
projected 3D center xc.
(a) The 3D location is converted to the projected
center and the object depth. (b) The distribution of
the offsets 𝛿c from 2D centers to projected 3D
centers. Inside and outside objects exhibit entirely
different distributions.
Divide objects into two groups depending on whether
their projected 3D centers are inside or outside the
image.
The joint learning of 𝛿c can suffer from long-tail
offsets, and therefore the representations and the
offset learning of inside and outside objects are
decoupled.
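For reference, the conversion the slide refers to follows the pinhole model z·[uc, vc, 1]ᵀ = K·[x, y, z]ᵀ, so at inference the 3D center is recovered from the predicted projected center and depth. A minimal sketch, assuming known intrinsics K and no lens distortion:

import numpy as np

def backproject_center(uc, vc, z, K):
    # uc, vc: projected 3D center in pixels; z: predicted depth; K: (3, 3) intrinsics.
    ray = np.linalg.inv(K) @ np.array([uc, vc, 1.0])
    return z * ray          # (x, y, z) object center in the camera frame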
Objects are Different: Flexible Monocular 3D
Object Detection
Inside Objects’ discretization offset error
Outside Objects’ discretization offset error
(a) The intersection xI between the image edge and
the line from xb to xc is used to represent the
truncated object. (b) The edge heatmap is generated
with 1D Gaussian distribution. (c) The edge
intersection xI (cyan) is a better representation than
2D center xb (green) for heavily truncated objects.
Since 2D bounding boxes only capture the inside-
image part of objects, the visual locations of xb can
be confusing and may even fall on other objects. By
contrast, the intersection xI disentangles the edge area
of the heatmap to focus on outside objects and offers a
strong boundary prior that simplifies the localization.
Objects are Different: Flexible Monocular 3D
Object Detection
• Edge fusion module to further decouple the feature learning and prediction of
outside objects (see the sketch below);
• The module first extracts the four boundaries of the feature map and concatenates
them into an edge feature vector in clockwise order, which is then processed by
two 1D convolutional layers to learn unique features for truncated objects.
• Finally, the processed vector is remapped to the four boundaries and added to
the input feature map.
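A hedged sketch of the edge fusion module as described above, not the released MonoFlex code: the four boundaries are concatenated clockwise into a 1D edge vector, passed through two 1D convolutions, and written back onto the boundaries; the kernel sizes and residual add are assumptions.

import torch
import torch.nn as nn

class EdgeFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv1d(channels, channels, 3, padding=1))

    def forward(self, x):
        B, C, H, W = x.shape
        top = x[:, :, 0, :]                      # (B, C, W), left -> right
        right = x[:, :, :, -1]                   # (B, C, H), top -> bottom
        bottom = x[:, :, -1, :].flip(-1)         # right -> left
        left = x[:, :, :, 0].flip(-1)            # bottom -> top
        edge = torch.cat([top, right, bottom, left], dim=-1)  # clockwise 1D vector
        edge = self.conv(edge)                   # learn edge-specific features
        t, r, b, l = edge.split([W, H, W, H], dim=-1)
        out = x.clone()                          # remap processed vector to boundaries
        out[:, :, 0, :] = out[:, :, 0, :] + t
        out[:, :, :, -1] = out[:, :, :, -1] + r
        out[:, :, -1, :] = out[:, :, -1, :] + b.flip(-1)
        out[:, :, :, 0] = out[:, :, :, 0] + l.flip(-1)
        return out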
Objects are Different: Flexible Monocular 3D
Object Detection
Relation of global orientation, local
orientation, and the viewing angle.
Keypoints include the projections of the eight vertices and
the top and bottom centers of the 3D bounding box.
The depth of a supporting line of the 3D bounding
box can be computed from the object height and the line's
pixel height. The ten keypoints are split into three groups,
each of which can produce the center depth independently.
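The geometric relation behind these keypoint-group depths: a vertical edge of the 3D box with metric height H projects to h pixels, so under the pinhole model its depth is z = f_y · H / h. A one-line sketch (how the estimates within a group are combined is an assumption):

def depth_from_pixel_height(height_3d, pixel_height, focal_y):
    # height_3d: object height in meters; pixel_height: projected height in pixels;
    # focal_y: vertical focal length in pixels.
    return focal_y * height_3d / pixel_height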
Objects are Different: Flexible Monocular 3D
Object Detection
Objects are Different: Flexible Monocular 3D
Object Detection
Objects are Different: Flexible Monocular 3D
Object Detection
Objects are Different: Flexible Monocular 3D
Object Detection