LiDAR-based Autonomous
Driving III (by Deep Learning)
Yu Huang
Yu.huang07@gmail.com
Sunnyvale, California
Outline
• CalibNet
• PointPillars
• Complex-YOLO
• Robust Deep Multi-modal Learning Based on GIF Network
• LATTE: Accelerate Lidar Point Cloud Annotation
• FVNet: 3D Front-View Proposal Generation for Object Detection from Point Cloud
• RGB and LiDAR fusion based 3D Semantic Segmentation
• Voxel-FPN: multi-scale voxel feature aggregation in 3D object detection from point clouds
• STD: Sparse-to-Dense 3D Object Detector for Point Cloud
• End-to-end sensor modeling for LiDAR Point Cloud
• Part-A2 Net
• StarNet: Targeted Computation for Object Detection in Point Clouds
• Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection
• Deep Hough Voting for 3D Object Detection in Point Clouds
• MLOD: A multi-view 3D object detection based on robust feature fusion method
CalibNet: Self-Supervised Extrinsic Calibration
using 3D Spatial Transformer Networks
• CalibNet: a self-supervised deep network capable of automatically estimating the 6-DoF
rigid body transformation between a 3D LiDAR and a 2D camera in real-time.
• CalibNet alleviates the need for calibration targets, thereby resulting in significant savings in
calibration efforts.
• During training, the network only takes as input a LiDAR point cloud, the corresponding
monocular image, and the camera calibration matrix K.
• At train time, no direct supervision is imposed (i.e., the network does not directly regress to the
calibration parameters).
• Instead, train the network to predict calibration parameters that maximize the geometric and
photometric consistency of the input images and point clouds.
• CalibNet learns to iteratively solve the underlying geometric problem and accurately predicts
extrinsic calibration parameters for a wide range of mis-calibrations, without requiring
retraining or domain adaptation.
• Code: https://github.com/epiception/CalibNet.
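A minimal sketch of the self-supervised objective described above: project the LiDAR points with the predicted extrinsics and intrinsics, rasterize a sparse depth map, and penalize disagreement with a target depth map. Function and variable names are illustrative and not taken from the CalibNet repository.

```python
import numpy as np

def project_to_depth_image(points, R, t, K, h, w):
    """Project LiDAR points into the camera and rasterize a sparse depth map.
    points: (N, 3) LiDAR points; R, t: predicted extrinsics; K: 3x3 intrinsics."""
    cam = points @ R.T + t                      # LiDAR frame -> camera frame
    cam = cam[cam[:, 2] > 0.1]                  # keep points in front of the camera
    uvz = cam @ K.T                             # pinhole projection
    u = (uvz[:, 0] / uvz[:, 2]).astype(int)
    v = (uvz[:, 1] / uvz[:, 2]).astype(int)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    depth = np.zeros((h, w), dtype=np.float32)
    depth[v[valid], u[valid]] = cam[valid, 2]   # last write wins; real code keeps the nearest point
    return depth

def consistency_loss(pred_depth, target_depth):
    """L1 consistency on pixels where both depth maps are populated (geometric term)."""
    mask = (pred_depth > 0) & (target_depth > 0)
    return np.abs(pred_depth[mask] - target_depth[mask]).mean() if mask.any() else 0.0
```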
CalibNet: Self-Supervised Extrinsic Calibration
using 3D Spatial Transformer Networks
The network takes an RGB image (a) and a raw LiDAR point cloud (b) as input, and outputs a transformation T that best aligns the two
inputs. (c) shows the colorized point cloud for a mis-calibrated setup, and (d) the output after calibration.
CalibNet: Self-Supervised Extrinsic Calibration
using 3D Spatial Transformer Networks
Network architecture
CalibNet: Self-Supervised Extrinsic Calibration
using 3D Spatial Transformer Networks
PointPillars: Fast Encoders for Object
Detection from Point Clouds
• It addresses encoding a point cloud into a format appropriate for a detection pipeline.
• Two types of encoders: fixed encoders tend to be fast but sacrifice accuracy, while
encoders that are learned from data are more accurate, but slower.
• PointPillars is an encoder which utilizes PointNets to learn a representation of point
clouds organized in vertical columns (pillars).
• While the encoded features can be used with any standard 2D convolutional detection
architecture, a lean downstream network is used.
• Despite only using lidar, a full detection pipeline significantly outperforms the SoA, even
among fusion methods, w.r.t. both the 3D and bird’s eye view KITTI benchmarks.
• This detection performance is achieved while running at 62 Hz.
• A faster version matches the state of the art at 105 Hz.
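A sketch of the two steps that make the encoder fast: a simplified per-pillar PointNet (shared linear layer, batch norm, ReLU, max over the points of each pillar) and the scatter step that places each pillar feature back into a dense pseudo-image for the 2D backbone. Layer sizes and tensor layouts are assumptions, not the exact PointPillars implementation.

```python
import torch
import torch.nn as nn

class TinyPillarEncoder(nn.Module):
    """Simplified Pillar Feature Network: a shared linear layer + BN + ReLU applied to the
    points of each pillar, followed by a max over the points (one feature vector per pillar)."""
    def __init__(self, in_dim=9, out_dim=64):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)
        self.bn = nn.BatchNorm1d(out_dim)

    def forward(self, pillars):               # pillars: (P, N_points, in_dim)
        x = self.linear(pillars)              # (P, N_points, out_dim)
        x = self.bn(x.transpose(1, 2)).transpose(1, 2)
        x = torch.relu(x)
        return x.max(dim=1).values            # (P, out_dim)

def scatter_to_pseudo_image(feats, x_idx, y_idx, nx, ny):
    """Scatter pillar features back to a dense (C, ny, nx) canvas for the 2D CNN backbone.
    x_idx, y_idx are LongTensors giving each pillar's grid cell."""
    canvas = torch.zeros(feats.shape[1], ny, nx)
    canvas[:, y_idx, x_idx] = feats.t()
    return canvas
```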
PointPillars: Fast Encoders for Object
Detection from Point Clouds
Network overview. The components of the network are a Pillar Feature Network, Backbone(2D CNN),
and SSD Detection Head. The raw point cloud is converted to a stacked pillar tensor and pillar index tensor.
The encoder uses the stacked pillars to learn a set of features that can be scattered back to a 2D pseudo-
image for a CNN. The features from the backbone are used by the detection head to predict 3D bounding
boxes for objects.
PointPillars: Fast Encoders for Object
Detection from Point Clouds
Qualitative analysis of KITTI results
Failure cases on KITTI
Complex-YOLO: An Euler-Region-Proposal for
Real-time 3D Object Detection on Point Clouds
• Complex-YOLO, a real-time 3D object detection network on point clouds only.
• A network that expands YOLOv2, a fast 2D standard object detector for RGB images, by a
specific complex regression strategy to estimate multi-class 3D boxes in Cartesian space.
• A specific Euler-Region-Proposal Network (E-RPN) to estimate the pose of the object by
adding an imaginary and a real fraction to the regression network.
• This network ends up in a closed complex space and avoids the singularities that occur with
single-angle estimation. The E-RPN helps the network generalize well during training.
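A tiny sketch of the angle handling described above: the network regresses a real and an imaginary component, and the yaw is recovered with arctan2, so there is no wrap-around discontinuity at ±π. The names t_im and t_re are illustrative.

```python
import numpy as np

def decode_yaw(t_im, t_re):
    """Recover the box orientation from the complex-valued regression used by the E-RPN:
    the angle is arg(t_re + i * t_im), which avoids the single-angle wrap-around singularity."""
    return np.arctan2(t_im, t_re)

# Training targets for a ground-truth yaw angle phi would be
# t_im = sin(phi), t_re = cos(phi); the network regresses both components.
```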
Complex-YOLO: An Euler-Region-Proposal for
Real-time 3D Object Detection on Point Clouds
Complex-YOLO is a very efficient model that operates directly on LiDAR-only
bird's-eye-view RGB-maps to estimate and localize accurate 3D
multiclass bounding boxes. The figure shows a bird's-eye view based on a
Velodyne HDL64 point cloud as well as the predicted objects.
Complex-YOLO: An Euler-Region-Proposal for
Real-time 3D Object Detection on Point Clouds
Complex-YOLO Pipeline. A pipeline for fast and accurate 3D box estimations on
point clouds. The RGB-map is fed into the CNN. The E-RPN grid runs simultaneously
on the last feature map and predicts five boxes per grid cell. Each box prediction is
composed of the regression parameters t and object scores p, with a general
probability p0 and n class scores p1...pn.
Complex-YOLO: An Euler-Region-Proposal for
Real-time 3D Object Detection on Point Clouds
Complex-YOLO: An Euler-Region-Proposal for
Real-time 3D Object Detection on Point Clouds
Visualization of Complex-YOLO results.
Robust Deep Multi-modal Learning Based
on GIF Network
• Designing a robust deep multimodal learning architecture in the presence of modalities
degraded in quality.
• A deep fusion architecture for object detection which processes each modality using a
separate convolutional neural network (CNN) and constructs joint feature maps by
combining the intermediate features obtained by the CNNs.
• To facilitate robustness to degraded modalities, the gated information fusion (GIF)
network weights the contribution from each modality according to the input feature
maps to be fused.
• The combining weights are determined by applying the convolutional layers followed by the
sigmoid function to the concatenated intermediate feature maps.
• The network including the CNN backbone and GIF is trained in an end-to-end fashion.
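A minimal sketch of the gating idea described above, assuming two modality feature maps of the same spatial size: convolutional layers followed by a sigmoid produce per-modality gates from the concatenated features, the gates weight each modality, and a final convolution combines them. Channel counts and kernel sizes are assumptions.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Sketch of a gated information fusion block: gates computed from the concatenated
    feature maps weight each modality before the weighted features are combined."""
    def __init__(self, channels):
        super().__init__()
        self.gate_rgb = nn.Sequential(nn.Conv2d(2 * channels, 1, kernel_size=3, padding=1),
                                      nn.Sigmoid())
        self.gate_lidar = nn.Sequential(nn.Conv2d(2 * channels, 1, kernel_size=3, padding=1),
                                        nn.Sigmoid())
        self.combine = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_rgb, feat_lidar):
        cat = torch.cat([feat_rgb, feat_lidar], dim=1)
        fused = torch.cat([self.gate_rgb(cat) * feat_rgb,
                           self.gate_lidar(cat) * feat_lidar], dim=1)
        return self.combine(fused)
```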
Robust Deep Multi-modal Learning Based
on GIF Network
LATTE: Accelerating LiDAR Point Cloud Annotation via
Sensor Fusion, One-Click Annotation, and Tracking
• Annotating LiDAR point cloud data is challenging due to the following issues: 1) A LiDAR point cloud is
usually sparse and has low resolution, making it difficult for human annotators to recognize objects. 2)
Compared to annotation on 2D images, the operation of drawing 3D bounding boxes or even point-wise
labels on LiDAR point clouds is more complex and time-consuming. 3) LiDAR data are usually collected in
sequences, so consecutive frames are highly correlated, leading to repeated annotations.
• To tackle these challenges, LATTE is proposed: an open-source annotation tool for LiDAR point clouds.
• LATTE features the following innovations: 1) Sensor fusion: utilize image-based detection algorithms to
automatically pre-label a calibrated image, and transfer the labels to the point cloud. 2) One-click
annotation: Instead of drawing 3D bounding boxes or point-wise labels, simplify the annotation to just one
click on the target object, and automatically generate the bounding box for the target. 3) Tracking: integrate
tracking into sequence annotation so that labels are transferred from one frame to subsequent ones,
significantly reducing repeated labeling.
• Experiments show the features accelerate the annotation speed by 6.2x and significantly improve label
quality with 23.6% and 2.2% higher instance-level precision and recall, and 2.0% higher bounding box IoU.
• LATTE is open-sourced at https://github.com/bernwang/latte.
LATTE: Accelerating LiDAR Point Cloud Annotation via
Sensor Fusion, One-Click Annotation, and Tracking
A screenshot of LATTE
LATTE: Accelerating LiDAR Point Cloud Annotation via
Sensor Fusion, One-Click Annotation, and Tracking
Challenges of annotating LiDAR point clouds. (a) LiDAR point clouds have low resolution and therefore objects
are difficult for humans to recognize. The upper two figures are point clouds of a traffic pole and a cyclist, but
both are difficult to recognize. The lower two are the corresponding images. (b) Annotating 2D bounding boxes
on an image vs. 3D bounding boxes on a point cloud. Annotating 3D bounding boxes is more complicated due
to more degrees of freedom of 3D scaling and rotation. (c) Point clouds of two consecutive frames are shown
here. Even though the two frames are highly similar, target objects are moving and have different speeds.
LATTE: Accelerating LiDAR Point Cloud Annotation via
Sensor Fusion, One-Click Annotation, and Tracking
The sensor-fusion pipeline of LATTE. A Lidar point cloud is projected onto
its corresponding image. Next, use Mask-RCNN to predict semantic
labels on the image. The labels are then transferred back to the LiDAR
point cloud.
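A sketch of the label-transfer step in this pipeline: project each LiDAR point into the camera with the calibration, and assign it the class of any instance mask it falls inside. The function name, the 4x4 extrinsic matrix T_cam_lidar, and the mask format are assumptions for illustration, not LATTE's actual API.

```python
import numpy as np

def transfer_image_labels(points, T_cam_lidar, K, masks, class_ids):
    """Transfer image instance masks (e.g. from Mask R-CNN) to LiDAR points.
    points: (N, 3); T_cam_lidar: 4x4 extrinsics; K: 3x3 intrinsics;
    masks: list of (H, W) boolean masks with matching class_ids."""
    homo = np.hstack([points, np.ones((len(points), 1))])
    cam = (T_cam_lidar @ homo.T).T[:, :3]
    labels = np.full(len(points), -1, dtype=int)            # -1 = unlabeled
    in_front = cam[:, 2] > 0
    uvz = cam[in_front] @ K.T
    u = (uvz[:, 0] / uvz[:, 2]).astype(int)
    v = (uvz[:, 1] / uvz[:, 2]).astype(int)
    h, w = masks[0].shape
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    idx = np.where(in_front)[0][ok]
    for mask, cls in zip(masks, class_ids):
        hit = mask[v[ok], u[ok]]
        labels[idx[hit]] = cls
    return labels
```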
LATTE: Accelerating LiDAR Point Cloud Annotation via
Sensor Fusion, One-Click Annotation, and Tracking
Sensor fusion is also used to help annotators confirm the category of a selected object. Once a 3D
bounding box is chosen, project all the points within the bounding box to the image and show the
corresponding crop of the image to human annotators for visual confirmation.
LATTE: Accelerating LiDAR Point Cloud Annotation via
Sensor Fusion, One-Click Annotation, and Tracking
The one click annotation pipeline of LATTE. For a given Lidar point cloud, first remove the ground.
After an annotator clicks on one point on a target object, use clustering algorithms to expand from
the clicked point to the entire object. Finally, estimate a top-view 2D bounding box for the object.
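A minimal sketch of the one-click step above: remove near-ground points (a flat-ground assumption instead of LATTE's piecewise-plane ground model), cluster the rest, and return the cluster that contains the clicked location. The clustering algorithm (DBSCAN) and all thresholds here are illustrative defaults, not the tool's actual parameters.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def one_click_cluster(points, click_xy, ground_z=0.2, eps=0.7, min_samples=5):
    """Expand a single click into the full object by clustering non-ground points (top view)."""
    above = points[points[:, 2] > ground_z]                 # crude ground removal
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(above[:, :2])
    # pick the cluster whose member is closest to the clicked location
    d = np.linalg.norm(above[:, :2] - np.asarray(click_xy), axis=1)
    target = labels[np.argmin(d)]
    return above[labels == target]                          # points of the selected object
```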
LATTE: Accelerating LiDAR Point Cloud Annotation via
Sensor Fusion, One-Click Annotation, and Tracking
The ground is modeled as a segment of planes.
After finding the cluster, use search-based
rectangle fitting to estimate bounding boxes.
Other methods, such as PCA-based ones,
can also be plugged into LATTE. To obtain
the optimal rectangle fit for a cluster, the
appropriate heading of the rectangle must
be known.
LATTE: Accelerating LiDAR Point Cloud Annotation via
Sensor Fusion, One-Click Annotation, and Tracking
Tracking pipeline of LATTE. Annotators label a
bounding box in the initial frame. Next, use
Kalman filtering to predict the center position of
the bounding box at the next frame. Human
annotators then adjust the bounding box, and
use the new center position as a new
observation to update the Kalman filter.
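A sketch of the tracking step described in this caption: a constant-velocity Kalman filter over the box center, where the predict step proposes the center in the next frame and the annotator-adjusted center is fed back as the observation. State layout and noise values are assumptions.

```python
import numpy as np

class CenterKalman:
    """Constant-velocity Kalman filter over a box center (x, y) for sequence annotation."""
    def __init__(self, x0, y0, dt=0.1):
        self.x = np.array([x0, y0, 0.0, 0.0])          # state: position + velocity
        self.P = np.eye(4)
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.eye(2, 4)                          # we observe position only
        self.Q = np.eye(4) * 0.01                      # process noise (assumed)
        self.R = np.eye(2) * 0.1                       # measurement noise (assumed)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                              # proposed center for the next frame

    def update(self, z):
        y = np.asarray(z) - self.H @ self.x            # annotator-adjusted center as observation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```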
FVNet: 3D Front-View Proposal Generation for
Real-Time Object Detection from Point Cloud
• A framework called FVNet for 3D front-view proposal generation and object
detection from point clouds.
• It consists of two stages: generation of front-view proposals and estimation of 3D
bounding box parameters.
• Instead of generating proposals from camera images or bird’s-eye-view maps, first
project point clouds onto a cylindrical surface to generate front-view feature maps
which retain rich information.
• Then introduce a proposal generation network to predict 3D region proposals from
the generated maps and further extrude objects of interest from the whole point
cloud.
• Another network extracts the point-wise features from the extruded object points
and regresses the final 3D bounding box parameters in canonical coordinates.
• The framework achieves real-time performance with 12ms per point cloud sample.
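A sketch of the cylindrical front-view projection described above: each point is mapped to a column by its azimuth and to a row by its height above the cylinder axis, producing a dense map a 2D network can consume. The resolutions, field of view, and channel choices are assumptions, not FVNet's exact settings.

```python
import numpy as np

def cylindrical_front_view(points, h_fov=(-np.pi / 2, np.pi / 2),
                           v_res=64, h_res=512, z_range=(-3.0, 1.0)):
    """Project a point cloud onto a cylindrical surface to build a front-view map.
    Returns a (v_res, h_res, 3) map of (radial depth, height, occupancy)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    azimuth = np.arctan2(y, x)
    keep = (azimuth > h_fov[0]) & (azimuth < h_fov[1]) & (x > 0)
    r = np.sqrt(x[keep] ** 2 + y[keep] ** 2)
    col = ((azimuth[keep] - h_fov[0]) / (h_fov[1] - h_fov[0]) * (h_res - 1)).astype(int)
    row = np.clip(((z[keep] - z_range[0]) / (z_range[1] - z_range[0]) * (v_res - 1)).astype(int),
                  0, v_res - 1)
    fv = np.zeros((v_res, h_res, 3), dtype=np.float32)
    fv[row, col, 0] = r                  # radial depth
    fv[row, col, 1] = z[keep]            # height
    fv[row, col, 2] = 1.0                # occupancy (stand-in for reflectance)
    return fv
```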
FVNet: 3D Front-View Proposal Generation for
Real-Time Object Detection from Point Cloud
The overview of (a) FVNet. It consists of two sub-networks: (b) Proposal Generation Network (PG-Net) for
generation of 3D region proposals and (c) Parameter Estimation Network (PE-Net) for estimation of 3D
bounding box parameters.
FVNet: 3D Front-View Proposal Generation for
Real-Time Object Detection from Point Cloud
The architecture of PG-Net. The bottom shows
the details of the residual block, the
convolutional block and the up-sampling block,
respectively.
FVNet: 3D Front-View Proposal Generation for
Real-Time Object Detection from Point Cloud
A 3D bounding box and its
corresponding cylinder
fragment. Left: the 3D bounding
box with dimension prior (Pw,
Ph), location prediction (bx, by)
and truncated distances
prediction (r1, r2). Right: the
corresponding cylinder
fragment in 3D space, which is
generated by truncating the
frustum with two radial
distances r1 and r2.
The projection functions
RGB and LiDAR fusion based 3D Semantic Segmentation for
Autonomous Driving
• LiDAR perception is gradually becoming mature for algorithms including object
detection and SLAM.
• However, semantic segmentation algorithms remain relatively less explored.
• Motivated by the fact that semantic segmentation is a mature algorithm on image
data, explore sensor fusion based 3D segmentation.
• The RGB image is converted to the polar-grid mapping representation used for LiDAR,
and early and mid-level fusion architectures are designed.
• Additionally, design a hybrid fusion architecture that combines both fusion
algorithms.
• The algorithm is evaluated on the KITTI dataset, which provides segmentation annotation
for cars, pedestrians and cyclists.
• Two state-of-the-art architectures, namely SqueezeSeg and PointSeg, are evaluated, and the
mIoU score improves by 10% in both cases relative to the LiDAR-only baseline.
RGB and LiDAR fusion based 3D Semantic Segmentation for
Autonomous Driving
Illustration of LiDAR Polar Grid Map representation.
RGB and LiDAR fusion based 3D Semantic Segmentation for
Autonomous Driving
Input frame and ground-truth
tensor. Top to bottom: X, Y, Z, D, I,
RGB and Ground Truth.
RGB and LiDAR fusion based 3D Semantic Segmentation for
Autonomous Driving
(a) LiDAR baseline architecture based on SqueezeSeg
RGB and LiDAR fusion based 3D Semantic Segmentation for
Autonomous Driving
(b) Proposed RGB+LiDAR mid-fusion architecture
Semantic Segmentation network architectures. (a) shows the SqueezeSeg-based unimodal
baseline architecture. The architecture remains the same for early fusion except for the change in
the number of input planes. (b) shows the proposed mid-fusion architecture.
Part-A2 Net: 3D Part-Aware and Aggregation Neural
Network for Object Detection from Point Cloud
• The part-aware and aggregation neural network (Part-A2 Net) for 3D object detection from
point cloud.
• The whole framework consists of the part-aware stage and the part-aggregation stage.
• Firstly, the part-aware stage learns to simultaneously predict coarse 3D proposals and
accurate intra-object part locations with the free-of-charge supervision derived from 3D
ground-truth boxes.
• The predicted intra-object part locations within the same proposals are grouped by the
newly designed RoI-aware point cloud pooling module, which results in an effective
representation to encode the features of 3D proposals.
• Then the part-aggregation stage learns to re-score the box and refine the box location
based on the pooled part locations.
• Extensive experiments on the KITTI 3D object detection dataset demonstrate that
both the predicted intra-object part locations and the proposed RoI-aware point cloud
pooling scheme benefit 3D object detection, and Part-A2 Net outperforms state-of-the-art
methods by utilizing only point cloud data.
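A sketch of how the free-of-charge part-location targets mentioned above can be derived: for each foreground point, its position is expressed in the frame of its ground-truth box and normalized to [0, 1]^3. The axis conventions and the function name are assumptions for illustration.

```python
import numpy as np

def intra_object_part_targets(points, box_center, box_size, box_yaw):
    """Intra-object part location targets: each point's position relative to its
    ground-truth box, normalized to [0, 1] along each box axis."""
    c, s = np.cos(-box_yaw), np.sin(-box_yaw)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    local = (points - box_center) @ R.T              # rotate into the box frame
    part = local / np.asarray(box_size) + 0.5        # map [-size/2, size/2] -> [0, 1]
    return np.clip(part, 0.0, 1.0)
```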
Part-A2 Net: 3D Part-Aware and Aggregation Neural
Network for Object Detection from Point Cloud
Intra-object part locations and
segmentation masks can be robustly
predicted by the proposed part-aware
and aggregation network even when
objects are partially occluded. Such part
locations can assist accurate 3D object
detection.
Part-A2 Net: 3D Part-Aware and Aggregation Neural
Network for Object Detection from Point Cloud
The overall framework of part-aware and aggregation NN for 3D object detection. It consists of two stages: (a) The
first part-aware stage estimates intra-object part locations accurately and generates 3D proposals by feeding the
raw point cloud to a newly designed backbone network. (b) The second part-aggregation stage conducts the
proposed RoI-aware point cloud pooling operation to group the part information from each 3D proposal, then the
part-aggregation network is utilized to score boxes and refine locations based on the part features and information.
Part-A2 Net: 3D Part-Aware and Aggregation Neural
Network for Object Detection from Point Cloud
Sparse up-sampling and feature refinement
block. This module is adopted in the decoder of
sparse convolution based UNet backbone. The
lateral features and bottom features are first
fused and transformed by sparse convolution.
The fused feature is then up-sampled by the
sparse inverse convolution.
Illustration of RoI-aware point cloud feature pooling. Due to the
ambiguity shown in the above BEV figure, the previous point cloud
pooling method cannot recover the original box shape.
The RoI-aware point cloud pooling method encodes the box
shape by keeping the empty voxels, which can be efficiently
processed by the following sparse convolution.
Part-A2 Net: 3D Part-Aware and Aggregation Neural
Network for Object Detection from Point Cloud
Qualitative results of Part-A2 Net on the KITTI test split. The predicted 3D boxes are drawn with green
3D bounding boxes, and the estimated intra-object part locations are visualized with different colors.
Voxel-FPN: multi-scale voxel feature aggregation
in 3D object detection from point clouds
• Object detection in point cloud data is one of the key components
in computer vision systems, especially for autonomous driving
applications.
• To present Voxel-FPN, a novel one-stage 3D object detector that
utilizes raw data from LIDAR sensors only.
• The core framework consists of an encoder network and a
corresponding decoder followed by a region proposal network.
• Encoder extracts multi-scale voxel information in a bottom-up
manner while decoder fuses multiple feature maps from various
scales in a top-down way.
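A minimal sketch of one top-down fusion step of the kind described above: the coarser feature map is upsampled and merged with a lateral projection of the finer-scale map, FPN-style. The channel sizes, upsampling choice, and class name are assumptions, not the exact Voxel-FPN layers.

```python
import torch
import torch.nn as nn

class TopDownFusion(nn.Module):
    """One FPN-style decoder step: upsample the coarse map and fuse it with the lateral map."""
    def __init__(self, c_coarse, c_fine, c_out=128):
        super().__init__()
        self.up = nn.ConvTranspose2d(c_coarse, c_out, kernel_size=2, stride=2)
        self.lateral = nn.Conv2d(c_fine, c_out, kernel_size=1)
        self.smooth = nn.Conv2d(c_out, c_out, kernel_size=3, padding=1)

    def forward(self, coarse, fine):
        # assumes `fine` has exactly twice the spatial resolution of `coarse`
        return self.smooth(self.up(coarse) + self.lateral(fine))
```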
Voxel-FPN: multi-scale voxel feature aggregation
in 3D object detection from point clouds
Voxel-FPN framework
Voxel-FPN: multi-scale voxel feature aggregation
in 3D object detection from point clouds
Structure of voxel feature extraction network
Voxel-FPN: multi-scale voxel feature aggregation
in 3D object detection from point clouds
The detailed structure for RPN-FPN
Voxel-FPN: multi-scale voxel feature aggregation
in 3D object detection from point clouds
Visualized car detection results from the method: cubes in green color
denote ground truth 3D boxes and those in red indicate detection results.
STD: Sparse-to-Dense 3D Object Detector
for Point Cloud
• A two-stage 3D object detection framework, named sparse-to-dense 3D Object
Detector (STD).
• The first stage is a bottom-up proposal generation network that uses raw point
cloud as input to generate accurate proposals by seeding each point with a new
spherical anchor.
• It achieves a high recall with less computation compared with prior works.
• Then, PointsPool is applied for generating proposal features by transforming their
interior point features from sparse expression to compact representation, which
saves even more computation time.
• In box prediction, which is the second stage, a parallel intersection-over-union (IoU)
branch is implemented to increase awareness of localization accuracy, resulting in
further improved performance.
• Experiments are conducted on the KITTI dataset, evaluating both 3D object and Bird's Eye
View (BEV) detection.
• It outperforms other state-of-the-art methods by a large margin, especially on the hard set,
with an inference speed of more than 10 FPS.
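A very small sketch of the anchor seeding idea mentioned above: one spherical anchor per (foreground) point, with a class-dependent radius. The radii and the dictionary layout are illustrative only, not the values or data structures used in the paper.

```python
import numpy as np

def spherical_anchors(seed_points, radius_per_class=None):
    """Seed one spherical anchor per point: a sphere centered at the point with a
    class-dependent radius (radii below are placeholders)."""
    if radius_per_class is None:
        radius_per_class = {"car": 2.3, "pedestrian": 0.4, "cyclist": 0.9}
    anchors = []
    for cls, r in radius_per_class.items():
        for p in np.asarray(seed_points):
            anchors.append({"center": p, "radius": r, "class": cls})
    return anchors
```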
STD: Sparse-to-Dense 3D Object Detector
for Point Cloud
Illustration of framework consisting of three different parts. The first is a proposal generation module (PGM) to
generate accurate proposals from man-made point-based spherical anchors. The second part is a PointsPool
layer to convert proposal features from sparse expression to compact representation. The final one is a box
prediction network. It classifies and regresses proposals, and picks high-quality predictions.
STD: Sparse-to-Dense 3D Object Detector
for Point Cloud
Illustration of networks in the proposal generation module. (a) 3D segmentation network (PointNet++). It takes a raw
point cloud (x, y, z, r) as input, and generates semantic segmentation scores as well as global context features for
each point by stacking SA layers and FP modules. (b) Proposal generation Network (PointNet). It treats normalized
coordinates and semantic features of points within anchors as input, and produces classification and regression
predictions.
STD: Sparse-to-Dense 3D Object Detector
for Point Cloud
Visualization of results on KITTI test set. Cars, pedestrians and cyclists are
highlighted in yellow, red and green respectively. The upper row in each image is
the 3D object detection result projected onto the RGB image. The other is the result
in the LiDAR point cloud.
End-to-end sensor modeling for LiDAR
Point Cloud
• Laser scanner sensors (LiDAR, Light Detection And Ranging) became a fundamental choice
due to their long range and robustness to low-light driving conditions.
• The problem of designing a control software for self-driving cars is a complex task to
explicitly formulate in rule-based systems, thus recent approaches rely on machine learning
that can learn those rules from data.
• The major problem with such approaches is that the amount of training data required for
generalizing a machine learning model is large, while LiDAR data annotation
is very costly compared to other car sensors.
• An accurate LiDAR sensor model can cope with this problem.
• Moreover, its value goes beyond this because existing LiDAR development, validation, and
evaluation platforms and processes are very costly, and virtual testing and development
environments are still immature in terms of physical properties representation.
• This is a Deep Learning-based LiDAR sensor model.
• It models the sensor echoes, using a Deep Neural Network to model echo pulse widths
learned from real data using Polar Grid Maps (PGM).
• Performance is benchmarked against comprehensive real sensor data.
End-to-end sensor modeling for LiDAR
Point Cloud
End-to-end sensor modeling for LiDAR
Point Cloud
The LiDAR multiple-echo phenomenon
End-to-end sensor modeling for LiDAR
Point Cloud
A comparison between real LiDAR data (left) and
synthetic data generated from the sensor model (right).
Each scan point's color represents its Echo Pulse
Width (EPW) value. Both examples show that 1) the
approach clearly mimics the EPW values of the real
data; 2) the approach mimics the noise model in the
synthetically generated data at far perception range;
and 3) the model learns to represent lanes as
learned from real traces.
End-to-end sensor modeling for LiDAR
Point Cloud
Multidimensional lookup
table that the DNN needs to
learn.
End-to-end sensor modeling for LiDAR
Point Cloud
DNN pipeline that encapsulates the sensor model's N-dimensional
lookup table.
End-to-end sensor modeling for LiDAR
Point Cloud
Annotated Polar Grid Map point cloud: the upper PGM is the
depth representation, the lower PGM is the point-level annotation.
The Polar Grid Map (PGM) is a representation for a LiDAR full scan in a 3D
tensor.
End-to-end sensor modeling for LiDAR
Point Cloud
Dense Point level Annotated Point Cloud.
End-to-end sensor modeling for LiDAR
Point Cloud
Unet architecture. Each white box
corresponds to a multi-channel feature map.
The number of channels is denoted on top of
the box. The x-y-size is provided at the
middle of the box.
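A minimal U-Net-style sketch of the mapping described above, from a Polar Grid Map tensor to a per-cell echo pulse width (EPW) prediction. The channel counts, depth, and class name are assumptions, not the exact layer configuration shown in the figure.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Minimal U-Net-style network: PGM tensor in, one EPW value per PGM cell out."""
    def __init__(self, in_ch=3, base=16):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, base, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.MaxPool2d(2),
                                  nn.Conv2d(base, base * 2, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(base * 2, base, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(base, 1, 1)            # one EPW value per PGM cell

    def forward(self, pgm):                          # pgm: (B, in_ch, H, W), H and W even
        e1 = self.enc1(pgm)
        e2 = self.enc2(e1)
        d = self.dec(torch.cat([self.up(e2), e1], dim=1))   # skip connection
        return self.head(d)
```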
End-to-end sensor modeling for LiDAR
Point Cloud
Point cloud with its inferred EPWs.
End-to-end sensor modeling for LiDAR
Point Cloud
Histogram Bayes Classifier output, one out of many selection
block.
End-to-end sensor modeling for LiDAR
Point Cloud
Summary: learn from real traces (left image) to transform synthetic
data (middle image) into more realistic data (right image).
Fast Point RCNN
• A unified, efficient and effective framework for point-cloud based 3D
object detection.
• The two-stage approach utilizes both voxel representation and raw
point cloud data to exploit respective advantages.
• The first stage network, with voxel representation as input, only
consists of light convolutional operations, producing a small number of
high-quality initial predictions.
• Coordinate and indexed convolutional feature of each point in initial
prediction are effectively fused with the attention mechanism,
preserving both accurate localization and context information.
• The second stage works on interior points with their fused feature for
further refining the prediction.
Fast Point RCNN
Overview of the two-stage framework. In the first stage, the point cloud is voxelized and fed to VoxelRPN to
produce a small number of initial predictions. Then generate the box feature for each prediction by fusing
interior points’ coordinates and context feature from VoxelRPN. Box features are fed to RefinerNet for further
refinement.
Fast Point RCNN
Network structure of VoxelRPN. The format of
layers used in the figure follows (kernel
size)(channels)/(stride), i.e. (kx, ky, kz)(chn)/(sx, sy,
sz). The default stride is 1 unless otherwise
specified.
Suppose the region of interest for the point
cloud is a cuboid of size (L, W, H) and each
voxel is of size (vl, vw, vh); then the 3D space can be
divided into a 3D voxel grid of size (L/vl, W/vw,
H/vh).
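A small sketch of the voxelization implied by the grid definition above: compute each point's voxel index by integer division against the voxel size and drop points outside the region of interest. Argument names are illustrative.

```python
import numpy as np

def voxelize(points, roi_min, voxel_size, grid_size):
    """Compute the voxel index of each point for a cuboid region of interest.
    grid_size = (L/vl, W/vw, H/vh); points outside the ROI are dropped."""
    idx = np.floor((points[:, :3] - np.asarray(roi_min)) / np.asarray(voxel_size)).astype(int)
    keep = np.all((idx >= 0) & (idx < np.asarray(grid_size)), axis=1)
    return points[keep], idx[keep]
```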
Fast Point RCNN
Network Structure of RefinerNet.
Canonization of a box. The number denotes
the order of corner prediction in RefinerNet.
Fast Point RCNN
Visualization of results.
StarNet: Targeted Computation for Object
Detection in Point Clouds
• Previous work on object detection from LiDAR has emphasized re-purposing
convolutional approaches from traditional camera imagery.
• An object detection system designed specifically for point cloud data blending
aspects of one-stage and two-stage systems.
• Objects in point clouds are quite distinct from traditional camera images: objects are
sparse and vary widely in location, but do not exhibit scale distortions observed in
single camera perspective.
• It suggests that simple and cheap data-driven object proposals to maximize spatial
coverage or match the observed densities of point cloud data may suffice.
• This recognition paired with a local, non-convolutional, point-based network
permits building an object detector for point clouds that may be trained only once,
but adapted to different computational settings – targeted to different predictive
priorities or spatial regions.
• This flexibility and the targeted detection strategies are demonstrated on both the
KITTI detection dataset and the large-scale Waymo Open Dataset.
StarNet: Targeted Computation for Object
Detection in Point Clouds
StarNet overview
StarNet: Targeted Computation for Object
Detection in Point Clouds
StarNet point featurizer. (a)
StarNet Blocks take as input a set
of points, where each point has an
associated feature vector. Each
block first computes aggregate
statistics (max) across the point
cloud. Next, the global statistics
are concatenated back to each
point’s feature. Finally, two fully-
connected layers are applied, each
composed of BN, linear projection,
and ReLU activation. (b) The
StarNet point featurizer stacks
multiple StarNet Blocks and
performs a readout of each block’s
output using mean aggregation.
The readouts are concatenated
together to form the featurization.
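A sketch of one StarNet block as described in the caption: compute a max statistic over the point set, concatenate it back to every point feature, then apply two fully-connected layers (each with batch norm, a linear projection, and ReLU). Channel sizes and the exact ordering within each layer are assumptions.

```python
import torch
import torch.nn as nn

class StarNetBlock(nn.Module):
    """Local, non-convolutional point featurizer block: global max statistic concatenated
    back to each point, followed by two fully-connected layers (BN, linear, ReLU)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        def fc(i, o):
            return nn.Sequential(nn.BatchNorm1d(i), nn.Linear(i, o), nn.ReLU())
        self.fc1 = fc(2 * in_dim, out_dim)
        self.fc2 = fc(out_dim, out_dim)

    def forward(self, x):                      # x: (num_points, in_dim) for one proposal
        global_max = x.max(dim=0, keepdim=True).values.expand_as(x)
        x = torch.cat([x, global_max], dim=1)  # concatenate global statistics to each point
        return self.fc2(self.fc1(x))
```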
StarNet: Targeted Computation for Object
Detection in Point Clouds
Class-balanced Grouping and Sampling for
Point Cloud 3D Object Detection
• This report presents a method which wins the nuScenes 3D Detection
Challenge held in Workshop on Autonomous Driving(WAD, CVPR 2019).
• Generally, utilize sparse 3D convolution to extract rich semantic features,
which are then fed into a class-balanced multi-head network to perform 3D
object detection.
• To handle the severe class imbalance problem inherent in the autonomous
driving scenarios, design a class-balanced sampling and augmentation
strategy to generate a more balanced data distribution.
• A balanced grouping head to boost the performance for the categories with
similar shapes.
• Based on the Challenge results, it outperforms the PointPillars baseline by a
large margin across all metrics, achieving state-of-the-art (SOTA) detection
performance on the nuScenes dataset.
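A sketch of the class-balanced sampling idea described above: rare classes receive proportionally larger weights, and a frame's sampling weight is the sum of the weights of the classes it contains, so frames with rare categories are drawn more often. This is an illustration of the idea, not the exact scheme used for the challenge entry.

```python
import numpy as np
from collections import Counter

def class_balanced_sample_weights(frame_class_lists, power=1.0):
    """frame_class_lists: for each frame, the list of object classes it contains.
    Returns normalized per-frame sampling weights favoring frames with rare classes."""
    counts = Counter(c for frame in frame_class_lists for c in frame)
    total = sum(counts.values())
    class_w = {c: (total / n) ** power for c, n in counts.items()}
    frame_w = np.array([sum(class_w[c] for c in frame) if frame else 0.0
                        for frame in frame_class_lists])
    return frame_w / frame_w.sum()

# The resulting weights can be fed to a weighted sampler during training.
```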
Class-balanced Grouping and Sampling for
Point Cloud 3D Object Detection
Network Architecture. The 3D Feature Extractor is composed of submanifold and regular 3D sparse
convolutions. Outputs of the 3D Feature Extractor are at a 16× downscale ratio; they are flattened along
the output axis and fed into the following Region Proposal Network to generate 8× feature maps, followed by
the multi-group head network to generate the final predictions. The number of groups in the head is set according
to the grouping specification.
Class-balanced Grouping and Sampling for
Point Cloud 3D Object Detection
Examples of detection results in validation split. Ground truth annotations are in green and detection results
are in blue. The token on top of each point cloud bird view image is its corresponding sample data token.
Deep Hough Voting for 3D Object
Detection in Point Clouds
• Code is open-sourced at https://github.com/facebookresearch/votenet
• Current 3D object detection methods are heavily influenced by 2D detectors.
• In order to leverage architectures in 2D detectors, they often convert 3D point clouds to regular grids
(i.e., to voxel grids or to bird’s eye view images), or rely on detection in 2D images to propose 3D
boxes.
• Few works have attempted to directly detect objects in point clouds.
• The first principle is to construct a 3D detection pipeline for point cloud data that is as generic as
possible.
• However, due to the sparse nature of the data – samples from 2D manifolds in 3D space – there is a major
challenge when directly predicting bounding box parameters from scene points: a 3D object centroid
can be far from any surface point and is thus hard to regress accurately in one step.
• To address this challenge, VoteNet is proposed: an end-to-end 3D object detection network based on a synergy of
deep point set networks and Hough voting.
• This model achieves state-of-the-art 3D detection on two large datasets of real 3D scans, ScanNet and
SUN RGB-D with a simple design, compact model size and high efficiency.
• Remarkably, VoteNet outperforms previous methods by using purely geometric information without
relying on color images.
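A sketch of the voting step at the heart of this approach: each seed point regresses an offset to the object center it belongs to (plus a feature residual), turning surface samples into votes that cluster near object centroids. Layer sizes and the module name are assumptions, not the exact VoteNet implementation.

```python
import torch
import torch.nn as nn

class VotingModule(nn.Module):
    """Each seed predicts an offset toward its object center and a feature residual."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                                 nn.Linear(feat_dim, 3 + feat_dim))

    def forward(self, seed_xyz, seed_feat):      # (B, M, 3), (B, M, feat_dim)
        out = self.mlp(seed_feat)
        vote_xyz = seed_xyz + out[..., :3]       # predicted offset to the object center
        vote_feat = seed_feat + out[..., 3:]     # feature residual carried with the vote
        return vote_xyz, vote_feat
```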
Deep Hough Voting for 3D Object
Detection in Point Clouds
3D object detection in point clouds with a deep Hough voting model. Given a
point cloud of a 3D scene, VoteNet votes to object centers and then groups
and aggregates the votes to predict 3D bounding boxes and semantic classes
of objects.
Deep Hough Voting for 3D Object
Detection in Point Clouds
Illustration of the VoteNet architecture for 3D object detection in point clouds. Given an input point cloud of N points with XYZ
coordinates, a backbone network (PointNet++ layers) subsamples and learns deep features on the points and outputs a subset
of M points extended by C-dim features. This subset of points is considered as seed points. Each seed independently
generates a vote through a voting module. Then the votes are grouped into clusters and processed by the proposal module to
generate the final proposals. The classified proposals, after NMS, become the final 3D bounding box output.
Deep Hough Voting for 3D Object
Detection in Point Clouds
Voting helps increase detection contexts. Seed
points that generate good boxes (BoxNet), or good
votes (VoteNet) which in turn generate good boxes,
are overlaid (in blue) on top of a representative
ScanNet scene. As the voting step effectively
increases context, VoteNet demonstrates a much
denser cover of the scene, therefore increasing the
likelihood of accurate detection.
MLOD: A multi-view 3D object detection
based on robust feature fusion method
• Multi-view Labelling Object Detector (MLOD).
• The detector takes an RGB image and a LIDAR point cloud as input and follows the two-stage object
detection framework.
• A Region Proposal Network (RPN) generates 3D proposals in a Bird’s Eye View (BEV) projection of the
point cloud.
• The second stage projects the 3D proposal bounding boxes to the image and BEV feature maps and
sends the corresponding map crops to a detection header for classification and bounding-box
regression.
• Unlike other multi-view based methods, the cropped image features are not directly fed to the
detection header, but masked by the depth information to filter out parts outside 3D bounding boxes.
• The fusion of image and BEV features is challenging, as they are derived from different perspectives.
• A detection header provides detection results not just from the fusion layer, but also from each
sensor channel. Hence the object detector can be trained on data labelled in different views to avoid
the degeneration of feature extractors.
• MLOD achieves state-of-the-art performance on the KITTI 3D object detection benchmark.
• Most importantly, the evaluation shows that the header architecture is effective in preventing image
feature extractor degeneration.
MLOD: A multi-view 3D object detection
based on robust feature fusion method
The multi-view header architecture diagram
Architectural diagram of the proposed method
MLOD: A multi-view 3D object detection
based on robust feature fusion method
The procedure of the foreground masking layer.
(a) Illustration of foreground masking layer procedure:
Step 1: calculating the median of nonzero values in
each grid; Step 2: obtaining a mask by Equation 1
(dmin = 6.8, dmax = 9.7 in this example); Step 3: applying
the mask to the feature maps. (b) A qualitative
example of a foreground mask and its application to
the original image. The bottom left background and
the top left and right background are masked.
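A minimal sketch of the masking idea from this caption: keep image features whose depth falls inside the proposal's depth range [dmin, dmax] and zero out the rest. The per-grid median pooling of Step 1 is omitted for brevity, and treating pixels with no depth measurement as foreground is an assumption of this sketch.

```python
import numpy as np

def foreground_mask(depth_map, feature_map, dmin, dmax):
    """depth_map: (H, W) depths for the image crop; feature_map: (C, H, W) aligned features.
    Features outside the proposal's depth range are zeroed out."""
    inside = (depth_map >= dmin) & (depth_map <= dmax)
    inside |= depth_map == 0                 # keep pixels with no depth reading (assumption)
    return feature_map * inside[None, :, :]  # broadcast the mask over the channel dimension
```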
MLOD: A multi-view 3D object detection
based on robust feature fusion method
Qualitative results of MLOD. In each image, detected cars are in green, pedestrians are in blue, and cyclists are in yellow.
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 

Kürzlich hochgeladen (20)

Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 

LiDAR-based Autonomous Driving III (by Deep Learning)

  • 1. LiDAR-based Autonomous Driving III (by Deep Learning) Yu Huang Yu.huang07@gmail.com Sunnyvale, California
  • 2. Outline • CalibNet • PointPillars • Complex-YOLO • Robust Deep Multi-modal Learning Based on GIF Network • LATTE: Accelerate Lidar Point Cloud Annotation • FVNet: 3D Front-View Proposal Generation for Object Detection from Point Cloud • RGB and LiDAR fusion based 3D Semantic Segmentation • Voxel-FPN: multi-scale voxel feature aggregation in 3D object detection from point clouds • STD: Sparse-to-Dense 3D Object Detector for Point Cloud • End-to-end sensor modeling for LiDAR Point Cloud • Part-A2 Net • StarNet: Targeted Computation for Object Detection in Point Clouds • Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection • Deep Hough Voting for 3D Object Detection in Point Clouds • MLOD: A multi-view 3D object detection based on robust feature fusion method
  • 3. CalibNet: Self-Supervised Extrinsic Calibration using 3D Spatial Transformer Networks • CalibNet: a self-supervised deep network capable of automatically estimating the 6-DoF rigid body transformation between a 3D LiDAR and a 2D camera in real-time. • CalibNet alleviates the need for calibration targets, thereby resulting in significant savings in calibration efforts. • During training, the network only takes as input a LiDAR point cloud, the corresponding monocular image, and the camera calibration matrix K. • At train time, no impose direct supervision (i.e., no directly regress to the calibration parameters, for example). • Instead, train the network to predict calibration parameters that maximize the geometric and photometric consistency of the input images and point clouds. • CalibNet learns to iteratively solve the underlying geometric problem and accurately predicts extrinsic calibration parameters for a wide range of mis-calibrations, without requiring retraining or domain adaptation. • Code: https://github.com/epiception/CalibNet.
  • 4. CalibNet: Self-Supervised Extrinsic Calibration using 3D Spatial Transformer Networks Input RGB image (a), a raw LiDAR point cloud (b), and outputs a transformation T that best aligns the two inputs. (c) the colorized point cloud output for a mis-calibrated setup, and (d) the output after calibration
  • 5. CalibNet: Self-Supervised Extrinsic Calibration using 3D Spatial Transformer Networks Network architecture
  • 6. CalibNet: Self-Supervised Extrinsic Calibration using 3D Spatial Transformer Networks
  • 7. PointPillars: Fast Encoders for Object Detection from Point Clouds • It addresses encoding a point cloud into a format appropriate for a detection pipeline. • Two types of encoders: fixed encoders tend to be fast but sacrifice accuracy, while encoders that are learned from data are more accurate, but slower. • PointPillars is an encoder which utilizes PointNets to learn a representation of point clouds organized in vertical columns (pillars). • While the encoded features can be used with any standard 2D convolutional detection architecture, run a lean downstream network. • Despite only using lidar, a full detection pipeline significantly outperforms the SoA, even among fusion methods, w.r.t. both the 3D and bird’s eye view KITTI benchmarks. • This detection performance is achieved while running at 62 Hz. • A faster version matches the state of the art at 105 Hz.
  • 8. PointPillars: Fast Encoders for Object Detection from Point Clouds Network overview. The components of the network are a Pillar Feature Network, Backbone (2D CNN), and SSD Detection Head. The raw point cloud is converted to a stacked pillar tensor and pillar index tensor. The encoder uses the stacked pillars to learn a set of features that can be scattered back to a 2D pseudo-image for a CNN. The features from the backbone are used by the detection head to predict 3D bounding boxes for objects.
  • 9. PointPillars: Fast Encoders for Object Detection from Point Clouds Qualitative analysis of KITTI results. Failure cases on KITTI.
  • 10. Complex-YOLO: An Euler-Region-Proposal for Real-time 3D Object Detection on Point Clouds • Complex-YOLO, a real-time 3D object detection network operating on point clouds only. • The network expands YOLOv2, a fast standard 2D object detector for RGB images, with a specific complex regression strategy to estimate multi-class 3D boxes in Cartesian space. • A specific Euler-Region-Proposal Network (E-RPN) estimates the pose of the object by adding an imaginary and a real fraction to the regression network. • This formulation ends up in a closed complex space and avoids the singularities that occur with single-angle estimation; the E-RPN also helps the network generalize well during training.
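The core of the E-RPN angle regression can be illustrated in a few lines: instead of regressing the yaw angle directly, the network regresses a real and an imaginary component and the angle is recovered with atan2, which avoids the wrap-around discontinuity at +/-pi. This is a minimal sketch of that encoding only; the function names are illustrative and not taken from the paper's code.

```python
import numpy as np

def encode_yaw(yaw):
    """Encode a yaw angle as a (real, imaginary) pair, as in complex-angle regression."""
    return np.cos(yaw), np.sin(yaw)

def decode_yaw(t_re, t_im):
    """Recover the yaw from the regressed complex components; atan2 is defined for
    any non-zero (t_re, t_im), so no angle wrapping is needed."""
    return np.arctan2(t_im, t_re)

# Round-trip check, including angles near the +/- pi discontinuity.
for yaw in (-np.pi + 1e-3, -0.5, 0.0, 2.0, np.pi - 1e-3):
    re, im = encode_yaw(yaw)
    assert abs(decode_yaw(re, im) - yaw) < 1e-6
```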
  • 11. Complex-YOLO: An Euler-Region-Proposal for Real-time 3D Object Detection on Point Clouds Complex-YOLO is a very efficient model that directly operates on LiDAR-only bird's-eye-view RGB-maps to estimate and localize accurate 3D multi-class bounding boxes. The figure shows a bird's-eye view based on a Velodyne HDL64 point cloud together with the predicted objects.
  • 12. Complex-YOLO: An Euler-Region-Proposal for Real-time 3D Object Detection on Point Clouds Complex-YOLO Pipeline. A pipeline for fast and accurate 3D box estimation on point clouds. The RGB-map is fed into the CNN. The E-RPN grid runs simultaneously on the last feature map and predicts five boxes per grid cell. Each box prediction is composed of the regression parameters t and object scores p, with a general objectness probability p0 and n class scores p1...pn.
  • 13. Complex-YOLO: An Euler-Region-Proposal for Real-time 3D Object Detection on Point Clouds
  • 14. Complex-YOLO: An Euler-Region-Proposal for Real-time 3D Object Detection on Point Clouds Visualization of Complex-YOLO results.
  • 15. Robust Deep Multi-modal Learning Based on GIF Network • Designing a robust deep multi-modal learning architecture in the presence of modalities degraded in quality. • A deep fusion architecture for object detection which processes each modality with a separate convolutional neural network (CNN) and constructs joint feature maps by combining the intermediate features obtained by the CNNs. • To provide robustness to degraded modalities, the gated information fusion (GIF) network weights the contribution from each modality according to the input feature maps to be fused. • The combining weights are determined by applying convolutional layers followed by a sigmoid function to the concatenated intermediate feature maps (see the sketch below). • The network, including the CNN backbone and GIF, is trained in an end-to-end fashion.
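A minimal sketch of the gated fusion step described above, assuming both modalities have already been brought to the same spatial resolution and channel count; the module name, layer sizes and the 1x1-convolution choice are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Weight each modality's feature map with a gate predicted from the
    concatenated features (sigmoid-activated 1x1 conv), then fuse them."""
    def __init__(self, channels):
        super().__init__()
        # One scalar gate per modality and spatial location.
        self.gate = nn.Conv2d(2 * channels, 2, kernel_size=1)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_rgb, feat_lidar):
        g = torch.sigmoid(self.gate(torch.cat([feat_rgb, feat_lidar], dim=1)))
        gated = torch.cat([feat_rgb * g[:, :1], feat_lidar * g[:, 1:]], dim=1)
        return self.fuse(gated)

# Example: fuse two 64-channel feature maps of size 48x156 (illustrative sizes).
fusion = GatedFusion(64)
out = fusion(torch.randn(1, 64, 48, 156), torch.randn(1, 64, 48, 156))
print(out.shape)  # torch.Size([1, 64, 48, 156])
```

If one modality is degraded (e.g. a blurred camera frame), its gate values can shrink toward zero, letting the other modality dominate the joint feature maps.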
  • 16. Robust Deep Multi-modal Learning Based on GIF Network
  • 17. LATTE: Accelerating LiDAR Point Cloud Annotation via Sensor Fusion, One-Click Annotation, and Tracking • Annotating LiDAR point cloud data is challenging due to the following issues: 1) A LiDAR point cloud is usually sparse and has low resolution, making it difficult for human annotators to recognize objects. 2) Compared to annotation on 2D images, the operation of drawing 3D bounding boxes or even point-wise labels on LiDAR point clouds is more complex and time-consuming. 3) LiDAR data are usually collected in sequences, so consecutive frames are highly correlated, leading to repeated annotations. • To tackle these challenges, LATTE is proposed, an open-sourced annotation tool for LiDAR point clouds. • LATTE features the following innovations: 1) Sensor fusion: utilize image-based detection algorithms to automatically pre-label a calibrated image, and transfer the labels to the point cloud. 2) One-click annotation: instead of drawing 3D bounding boxes or point-wise labels, simplify the annotation to just one click on the target object, and automatically generate the bounding box for the target. 3) Tracking: integrate tracking into sequence annotation so that labels are transferred from one frame to subsequent ones, which significantly reduces repeated labeling. • Experiments show these features accelerate the annotation speed by 6.2x and significantly improve label quality with 23.6% and 2.2% higher instance-level precision and recall, and 2.0% higher bounding box IoU. • LATTE is open-sourced at https://github.com/bernwang/latte.
  • 18. LATTE: Accelerating LiDAR Point Cloud Annotation via Sensor Fusion, One-Click Annotation, and Tracking A screenshot of LATTE
  • 19. LATTE: Accelerating LiDAR Point Cloud Annotation via Sensor Fusion, One-Click Annotation, and Tracking Challenges of annotating LiDAR point clouds. (a) LiDAR point clouds have low resolution and therefore objects are difficult for humans to recognize. The upper two figures are point clouds of a traffic pole and a cyclist, but both are difficult to recognize. The lower two are the corresponding images. (b) Annotating 2D bounding boxes on an image vs. 3D bounding boxes on a point cloud. Annotating 3D bounding boxes is more complicated due to more degrees of freedom of 3D scaling and rotation. (c) Point clouds of two consecutive frames are shown here. Even though the two frames are highly similar, target objects are moving and have different speeds, so labels cannot simply be copied from one frame to the next.
  • 20. LATTE: Accelerating LiDAR Point Cloud Annotation via Sensor Fusion, One-Click Annotation, and Tracking The sensor-fusion pipeline of LATTE. A LiDAR point cloud is projected onto its corresponding image. Next, Mask-RCNN is used to predict semantic labels on the image. The labels are then transferred back to the LiDAR point cloud.
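For reference, the projection step amounts to applying the camera extrinsics and intrinsics to each LiDAR point. The sketch below assumes a rectified pinhole camera and a known 4x4 LiDAR-to-camera transform; the matrices in the usage lines are illustrative placeholders, not calibration values from LATTE.

```python
import numpy as np

def project_lidar_to_image(points, T_cam_from_lidar, K):
    """Project Nx3 LiDAR points into pixel coordinates.

    T_cam_from_lidar: 4x4 extrinsic transform (LiDAR frame -> camera frame).
    K: 3x3 camera intrinsic matrix.
    Returns pixel coordinates (Nx2) and a mask of points in front of the camera.
    """
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])   # Nx4 homogeneous
    pts_cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]              # Nx3 in camera frame
    in_front = pts_cam[:, 2] > 0.1                               # keep points with positive depth
    uvw = (K @ pts_cam.T).T
    uv = uvw[:, :2] / uvw[:, 2:3]
    return uv, in_front

# Toy usage with an identity extrinsic and a simple pinhole intrinsic (illustrative values).
K = np.array([[700.0, 0.0, 620.0], [0.0, 700.0, 190.0], [0.0, 0.0, 1.0]])
uv, valid = project_lidar_to_image(np.array([[2.0, 0.5, 8.0]]), np.eye(4), K)
```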
  • 21. LATTE: Accelerating LiDAR Point Cloud Annotation via Sensor Fusion, One-Click Annotation, and Tracking Sensor fusion is also used to help annotators confirm the category of a selected object. Once a 3D bounding box is chosen, all the points within the bounding box are projected onto the image, and the corresponding crop of the image is shown to human annotators for visual confirmation.
  • 22. LATTE: Accelerating LiDAR Point Cloud Annotation via Sensor Fusion, One-Click Annotation, and Tracking The one click annotation pipeline of LATTE. For a given Lidar point cloud, first remove the ground. After an annotator clicks on one point on a target object, use clustering algorithms to expand from the clicked point to the entire object. Finally, estimate a top-view 2D bounding box for the object.
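The click-expansion step can be approximated with an off-the-shelf clustering algorithm. The sketch below uses scikit-learn's DBSCAN as a stand-in, run on ground-removed points in the top view; LATTE's actual clustering choices and parameters may differ, and the eps/min_samples values here are illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def expand_click_to_object(points_xy, click_xy, eps=0.6, min_samples=5):
    """Grow a single clicked point into a full object cluster.

    points_xy: Nx2 ground-removed points in the top view.
    click_xy:  the annotator's click (x, y).
    Returns indices of the points in the same DBSCAN cluster as the point
    nearest to the click, or an empty array if that point is labeled noise.
    """
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points_xy)
    nearest = np.argmin(np.linalg.norm(points_xy - np.asarray(click_xy), axis=1))
    if labels[nearest] == -1:          # clicked into DBSCAN noise
        return np.empty(0, dtype=int)
    return np.flatnonzero(labels == labels[nearest])
```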
  • 23. LATTE: Accelerating LiDAR Point Cloud Annotation via Sensor Fusion, One-Click Annotation, and Tracking The ground is modeled as a set of plane segments. After finding the cluster, a search-based rectangle fitting is used to estimate bounding boxes. Other methods, such as PCA-based ones, can also be plugged into LATTE. To obtain the optimal rectangle fit for a cluster, the appropriate heading of the rectangle needs to be known.
  • 24. LATTE: Accelerating LiDAR Point Cloud Annotation via Sensor Fusion, One-Click Annotation, and Tracking Tracking pipeline of LATTE. Annotators label a bounding box in the initial frame. Next, use Kalman filtering to predict the center position of the bounding box at the next frame. Human annotators then adjust the bounding box, and use the new center position as a new observation to update the Kalman filter.
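A constant-velocity Kalman filter over the box center is enough to illustrate the tracking loop: predict the center for the next frame, let the annotator adjust the box, then feed the adjusted center back as the measurement. The state layout and noise settings below are illustrative, not LATTE's exact tuning.

```python
import numpy as np

class CenterKalman:
    """Constant-velocity Kalman filter over a 2D box center, state = (x, y, vx, vy)."""
    def __init__(self, x0, y0, dt=0.1):
        self.x = np.array([x0, y0, 0.0, 0.0])
        self.P = np.eye(4)
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = dt   # state transition
        self.H = np.eye(2, 4)                                   # we only observe (x, y)
        self.Q = 0.01 * np.eye(4)                               # process noise (illustrative)
        self.R = 0.1 * np.eye(2)                                # measurement noise (illustrative)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                                       # predicted center for the next frame

    def update(self, z):
        """z: the box center after the annotator's adjustment."""
        y = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```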
  • 25. FVNet: 3D Front-View Proposal Generation for Real-Time Object Detection from Point Cloud • A framework called FVNet for 3D front-view proposal generation and object detection from point clouds. • It consists of two stages: generation of front-view proposals and estimation of 3D bounding box parameters. • Instead of generating proposals from camera images or bird's-eye-view maps, point clouds are first projected onto a cylindrical surface to generate front-view feature maps which retain rich information. • A proposal generation network then predicts 3D region proposals from the generated maps and further extrudes objects of interest from the whole point cloud. • Another network extracts the point-wise features from the extruded object points and regresses the final 3D bounding box parameters in the canonical coordinates. • The framework achieves real-time performance at 12 ms per point cloud sample.
  • 26. FVNet: 3D Front-View Proposal Generation for Real-Time Object Detection from Point Cloud The overview of (a) FVNet. It consists of two sub-networks: (b) Proposal Generation Network (PG-Net) for generation of 3D region proposals and (c) Parameter Estimation Network (PE-Net) for estimation of 3D bounding box parameters.
  • 27. FVNet: 3D Front-View Proposal Generation for Real-Time Object Detection from Point Cloud The architecture of PG-Net. The bottom shows the details of the residual block, the convolutional block and the up-sampling block, respectively.
  • 28. FVNet: 3D Front-View Proposal Generation for Real-Time Object Detection from Point Cloud A 3D bounding box and its corresponding cylinder fragment. Left: the 3D bounding box with dimension prior (Pw, Ph), location prediction (bx, by) and truncated distances prediction (r1, r2). Right: the corresponding cylinder fragment in 3D space, which is generated by truncating the frustum with two radial distances r1 and r2. The projection functions map each 3D point onto the cylindrical front-view surface; a minimal version is sketched below.
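A minimal sketch of such a cylindrical front-view projection, assuming a spinning LiDAR. The angular resolutions are illustrative (roughly a 64-beam sensor), and the exact projection functions and map size used in FVNet may differ.

```python
import numpy as np

def cylindrical_projection(points, h_res=np.deg2rad(0.16), v_res=np.deg2rad(0.4),
                           v_min=np.deg2rad(-24.9)):
    """Map Nx3 LiDAR points (x, y, z) to integer (row, col) cells of a
    front-view map on a cylindrical surface.

    Azimuth indexes the columns and elevation indexes the rows. Columns are
    centered on the sensor's forward direction, so the left half of the scan
    gets negative column indices; offset them as needed for a full map.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x ** 2 + y ** 2)
    azimuth = np.arctan2(y, x)            # angle around the vertical axis
    elevation = np.arctan2(z, r)          # angle above the horizontal plane
    col = np.floor(azimuth / h_res).astype(int)
    row = np.floor((elevation - v_min) / v_res).astype(int)
    return row, col
```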
  • 29. RGB and LiDAR fusion based 3D Semantic Segmentation for Autonomous Driving • LiDAR perception is gradually becoming mature for algorithms including object detection and SLAM. • However, semantic segmentation algorithms remain relatively less explored. • Motivated by the fact that semantic segmentation is a mature algorithm on image data, sensor-fusion-based 3D segmentation is explored. • The RGB image is converted to the polar-grid mapping representation used for LiDAR, and early and mid-level fusion architectures are designed. • Additionally, a hybrid fusion architecture is designed that combines both fusion algorithms. • The algorithm is evaluated on the KITTI dataset, which provides segmentation annotation for cars, pedestrians and cyclists. • Two state-of-the-art architectures, SqueezeSeg and PointSeg, are evaluated, and the mIoU score improves by 10% in both cases relative to the LiDAR-only baseline.
  • 30. RGB and LiDAR fusion based 3D Semantic Segmentation for Autonomous Driving Illustration of LiDAR Polar Grid Map representation.
 
  • 31. RGB and LiDAR fusion based 3D Semantic Segmentation for Autonomous Driving Input frame and ground-truth tensor. Top to bottom: X, Y, Z, D, I, RGB and ground truth.
 
  • 32. RGB and LiDAR fusion based 3D Semantic Segmentation for Autonomous Driving (a) LiDAR baseline architecture based on SqueezeSeg
 
  • 33. RGB and LiDAR fusion based 3D Semantic Segmentation for Autonomous Driving (b) Proposed RGB+LiDAR mid-fusion architecture. Semantic segmentation network architectures: (a) shows the SqueezeSeg-based unimodal baseline architecture; the architecture remains the same for early fusion except for the change in the number of input planes. (b) shows the proposed mid-fusion architecture.
  • 34. Part-A2 Net: 3D Part-Aware and Aggregation Neural Network for Object Detection from Point Cloud • The part-aware and aggregation neural network (Part-A2 Net) for 3D object detection from point cloud. • The whole framework consists of the part-aware stage and the part-aggregation stage. • Firstly, the part-aware stage learns to simultaneously predict coarse 3D proposals and accurate intra-object part locations with the free-of-charge supervision derived from 3D ground-truth boxes. • The predicted intra-object part locations within the same proposals are grouped by the newly designed RoI-aware point cloud pooling module, which results in an effective representation to encode the features of 3D proposals. • Then the part-aggregation stage learns to re-score the box and refine the box location based on the pooled part locations. • Extensive experiments on the KITTI 3D object detection dataset demonstrate that both the predicted intra-object part locations and the proposed RoI-aware point cloud pooling scheme benefit 3D object detection, and Part-A2 Net outperforms state-of-the-art methods while utilizing only point cloud data.
  • 35. Part-A2 Net: 3D Part-Aware and Aggregation Neural Network for Object Detection from Point Cloud Intra-object part locations and segmentation masks can be robustly predicted by the proposed part-aware and aggregation network even when objects are partially occluded. Such part locations can assist accurate 3D object detection.
  • 36. Part-A2 Net: 3D Part-Aware and Aggregation Neural Network for Object Detection from Point Cloud The overall framework of the part-aware and aggregation NN for 3D object detection. It consists of two stages: (a) The first part-aware stage estimates intra-object part locations accurately and generates 3D proposals by feeding the raw point cloud to a newly designed backbone network. (b) The second part-aggregation stage conducts the proposed RoI-aware point cloud pooling operation to group the part information from each 3D proposal, and then the part-aggregation network is utilized to score boxes and refine locations based on the part features and information.
  • 37. Part-A2 Net: 3D Part-Aware and Aggregation Neural Network for Object Detection from Point Cloud Sparse up-sampling and feature refinement block. This module is adopted in the decoder of the sparse-convolution-based UNet backbone. The lateral features and bottom features are first fused and transformed by sparse convolution. The fused feature is then up-sampled by sparse inverse convolution. Illustration of RoI-aware point cloud feature pooling. Due to the ambiguity shown in the BEV figure above, the original box shape cannot be recovered by previous point cloud pooling methods. The RoI-aware point cloud pooling method encodes the box shape by keeping the empty voxels, which can be processed efficiently by the following sparse convolutions (a dense sketch of the pooling step is given below).
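To make the pooling idea concrete, here is a dense NumPy sketch that pools the per-point features of one proposal into a fixed voxel grid while keeping empty voxels as zeros, so the box shape is preserved. Part-A2 Net implements this with sparse operations and separate max/average pooling variants, so the grid size and the average pooling below are illustrative assumptions.

```python
import numpy as np

def roi_aware_pool(local_xyz, point_feats, grid=14):
    """Pool point features of one proposal into a (grid, grid, grid, C) volume.

    local_xyz:   (N, 3) points already transformed into the proposal's canonical
                 frame and normalized to [0, 1) along each axis.
    point_feats: (N, C) per-point features; average-pooled per voxel here.
    """
    C = point_feats.shape[1]
    pooled = np.zeros((grid, grid, grid, C), dtype=np.float32)
    counts = np.zeros((grid, grid, grid, 1), dtype=np.float32)
    idx = np.clip((local_xyz * grid).astype(int), 0, grid - 1)
    for (i, j, k), f in zip(idx, point_feats):
        pooled[i, j, k] += f
        counts[i, j, k] += 1
    return pooled / np.maximum(counts, 1)      # empty voxels stay all-zero
```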
  • 38. Part-A2 Net: 3D Part-Aware and Aggregation Neural Network for Object Detection from Point Cloud Qualitative results of Part-A2 Net on the KITTI test split. The predicted 3D boxes are drawn with green 3D bounding boxes, and the estimated intra-object part locations are visualized with different colors.
  • 39. Voxel-FPN: multi-scale voxel feature aggregation in 3D object detection from point clouds • Object detection in point cloud data is one of the key components in computer vision systems, especially for autonomous driving applications. • Voxel-FPN is a novel one-stage 3D object detector that utilizes raw data from LiDAR sensors only. • The core framework consists of an encoder network and a corresponding decoder followed by a region proposal network. • The encoder extracts multi-scale voxel information in a bottom-up manner, while the decoder fuses multiple feature maps from various scales in a top-down way.
  • 40. Voxel-FPN: multi-scale voxel feature aggregation in 3D object detection from point clouds Voxel-FPN framework
  • 41. Voxel-FPN: multi-scale voxel feature aggregation in 3D object detection from point clouds Structure of voxel feature extraction network
  • 42. Voxel-FPN: multi-scale voxel feature aggregation in 3D object detection from point clouds The detailed structure for RPN-FPN
  • 43. Voxel-FPN: multi-scale voxel feature aggregation in 3D object detection from point clouds Visualized car detection results from the method: cubes in green color denote ground truth 3D boxes and those in red indicate detection results.
  • 44. STD: Sparse-to-Dense 3D Object Detector for Point Cloud • A two-stage 3D object detection framework, named Sparse-to-Dense 3D Object Detector (STD). • The first stage is a bottom-up proposal generation network that uses the raw point cloud as input to generate accurate proposals by seeding each point with a new spherical anchor. • It achieves a high recall with less computation compared with prior works. • Then, PointsPool is applied to generate proposal features by transforming their interior point features from a sparse expression to a compact representation, which saves even more computation time. • In the second stage, box prediction, a parallel intersection-over-union (IoU) branch increases awareness of localization accuracy, resulting in further improved performance. • Experiments on the KITTI dataset are evaluated in terms of 3D object and Bird's Eye View (BEV) detection. • It outperforms other state-of-the-art methods by a large margin, especially on the hard set, with an inference speed of more than 10 FPS.
  • 45. STD: Sparse-to-Dense 3D Object Detector for Point Cloud Illustration of framework consisting of three different parts. The first is a proposal generation module (PGM) to generate accurate proposals from man-made point-based spherical anchors. The second part is a PointsPool layer to convert proposal features from sparse expression to compact representation. The final one is a box prediction network. It classifies and regresses proposals, and picks high-quality predictions.
  • 46. STD: Sparse-to-Dense 3D Object Detector for Point Cloud Illustration of networks in the proposal generation module. (a) 3D segmentation network (PointNet++). It takes a raw point cloud (x, y, z, r) as input, and generates semantic segmentation scores as well as global context features for each point by stacking SA layers and FP modules. (b) Proposal generation Network (PointNet). It treats normalized coordinates and semantic features of points within anchors as input, and produces classification and regression predictions.
  • 47. STD: Sparse-to-Dense 3D Object Detector for Point Cloud Visualization of results on the KITTI test set. Cars, pedestrians and cyclists are highlighted in yellow, red and green respectively. The upper row in each image is the 3D object detection result projected onto the RGB image. The other row is the result in the LiDAR point cloud.
  • 48. End-to-end sensor modeling for LiDAR Point Cloud • Laser scanner sensors (LiDAR, Light Detection And Ranging) became a fundamental choice due to their long range and robustness to low-light driving conditions. • Designing control software for self-driving cars is a complex task to formulate explicitly in rule-based systems, so recent approaches rely on machine learning that can learn those rules from data. • The major problem with such approaches is that the amount of training data required for generalizing a machine learning model is large, while LiDAR data annotation is very costly compared to other car sensors. • An accurate LiDAR sensor model can cope with this problem. • Moreover, its value goes beyond this, because existing LiDAR development, validation, and evaluation platforms and processes are very costly, and virtual testing and development environments are still immature in terms of representing physical properties. • This is a deep-learning-based LiDAR sensor model. • It models the sensor echoes, using a deep neural network to model echo pulse widths learned from real data using Polar Grid Maps (PGM). • Performance is benchmarked against comprehensive real sensor data.
  • 49. End-to-end sensor modeling for LiDAR Point Cloud
  • 50. End-to-end sensor modeling for LiDAR Point Cloud The LiDAR multiple-echo phenomenon
  • 51. End-to-end sensor modeling for LiDAR Point Cloud A comparison between real LiDAR data and synthetic data generated from the sensor model, on the left and right respectively. Each scan point's color represents its Echo Pulse Width (EPW) value. In both examples: 1) the approach clearly mimics the EPW values of the real data; 2) it reproduces the noise model of the far range in the synthetically generated data; 3) it learns how to represent lanes from the real traces.
  • 52. End-to-end sensor modeling for LiDAR Point Cloud The multidimensional lookup table that the DNN needs to learn.
  • 53. End-to-end sensor modeling for LiDAR Point Cloud DNN pipeline that encapsulates the sensor model's N-dimensional lookup table.
  • 54. End-to-end sensor modeling for LiDAR Point Cloud Annotated Polar Grid Map point cloud. The upper PGM is the depth representation; the lower PGM is the point-level annotation. The Polar Grid Map (PGM) is a representation of a full LiDAR scan as a 3D tensor.
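A minimal sketch of packing one scan into such a polar-grid tensor, here with a range channel and an EPW channel. The grid size, the elevation binning and the choice of channels are assumptions for illustration and not the paper's exact layout.

```python
import numpy as np

def build_pgm(points, epw, n_rows=64, n_cols=1024):
    """Pack one full LiDAR scan into a (n_rows, n_cols, 2) polar-grid tensor.

    points: Nx3 array of (x, y, z); epw: length-N array of echo pulse widths.
    Rows follow elevation, columns follow azimuth. If several points fall in
    the same cell, the last one written wins in this sketch.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rng = np.linalg.norm(points, axis=1)
    azimuth = np.arctan2(y, x)                                   # in [-pi, pi)
    elevation = np.arctan2(z, np.sqrt(x ** 2 + y ** 2))
    col = ((azimuth + np.pi) / (2 * np.pi) * n_cols).astype(int) % n_cols
    row = np.clip(((elevation - elevation.min()) /
                   (np.ptp(elevation) + 1e-6) * n_rows).astype(int), 0, n_rows - 1)
    pgm = np.zeros((n_rows, n_cols, 2), dtype=np.float32)
    pgm[row, col, 0] = rng                                       # depth channel
    pgm[row, col, 1] = epw                                       # EPW channel
    return pgm
```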
  • 55. End-to-end sensor modeling for LiDAR Point Cloud Dense Point level Annotated Point Cloud.
  • 56. End-to-end sensor modeling for LiDAR Point Cloud Unet architecture. Each white box corresponds to a multi-channel feature map. The number of channels is denoted on top of the box. The x-y-size is provided at the middle of the box.
  • 57. End-to-end sensor modeling for LiDAR Point Cloud Point cloud with its inferred EPWs.
  • 58. End-to-end sensor modeling for LiDAR Point Cloud Histogram Bayes classifier output: the one-out-of-many selection block.
  • 59. End-to-end sensor modeling for LiDAR Point Cloud Summary: learn from real traces (left image) in order to transfer synthetic data (middle image) into more realistic data (right image).
  • 60. Fast Point RCNN • A unified, efficient and effective framework for point-cloud based 3D object detection. • The two-stage approach utilizes both voxel representation and raw point cloud data to exploit their respective advantages. • The first stage network, with voxel representation as input, only consists of light convolutional operations, producing a small number of high-quality initial predictions. • The coordinates and indexed convolutional features of each point in the initial predictions are effectively fused with an attention mechanism, preserving both accurate localization and context information. • The second stage works on the interior points with their fused features to further refine the prediction.
  • 61. Fast Point RCNN Overview of the two-stage framework. In the first stage, the point cloud is voxelized and fed to VoxelRPN to produce a small number of initial predictions. Then a box feature is generated for each prediction by fusing the interior points' coordinates and context features from VoxelRPN. Box features are fed to RefinerNet for further refinement.
  • 62. Fast Point RCNN Network structure of VoxelRPN. The format of layers used in the figure follows (kernel size)(channels)/(stride), i.e. (kx, ky, kz)(chn)/(sx, sy, sz). The default stride is 1 unless otherwise specified. Suppose the region of interest for the point cloud is a cuboid of size (L, W, H) and each voxel is of size (vl, vw, vh); the 3D space can then be divided into a 3D voxel grid of size (L/vl, W/vw, H/vh).
  • 63. Fast Point RCNN Network Structure of RefinerNet. Canonization of a box. The number denotes the order of corner prediction in RefinerNet.
  • 65. StarNet: Targeted Computation for Object Detection in Point Clouds • Previous work on object detection from LiDAR has emphasized re-purposing convolutional approaches from traditional camera imagery. • StarNet is an object detection system designed specifically for point cloud data, blending aspects of one-stage and two-stage systems. • Objects in point clouds are quite distinct from objects in traditional camera images: they are sparse and vary widely in location, but do not exhibit the scale distortions observed in single-camera perspective. • This suggests that simple and cheap data-driven object proposals that maximize spatial coverage or match the observed densities of point cloud data may suffice. • This recognition, paired with a local, non-convolutional, point-based network, permits building an object detector for point clouds that may be trained only once but adapted to different computational settings, targeted to different predictive priorities or spatial regions. • This flexibility and the targeted detection strategies are demonstrated on both the KITTI detection dataset and the large-scale Waymo Open Dataset.
  • 66. StarNet: Targeted Computation for Object Detection in Point Clouds StarNet overview
  • 67. StarNet: Targeted Computation for Object Detection in Point Clouds StarNet point featurizer. (a) StarNet Blocks take as input a set of points, where each point has an associated feature vector. Each block first computes aggregate statistics (max) across the point cloud. Next, the global statistics are concatenated back to each point's feature. Finally, two fully-connected layers are applied, each composed of BN, linear projection, and ReLU activation. (b) The StarNet point featurizer stacks multiple StarNet Blocks and performs a readout of each block's output using mean aggregation. The readouts are concatenated together to form the featurization.
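A minimal PyTorch sketch of one such block and the mean-readout stacking, following the description above; the channel sizes and the two-block stack in the usage snippet are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class StarNetBlock(nn.Module):
    """Max-aggregate across the point set, concatenate the global statistic back
    to every point, then apply two layers of BN -> linear -> ReLU."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.bn1 = nn.BatchNorm1d(2 * in_dim)
        self.fc1 = nn.Linear(2 * in_dim, out_dim)
        self.bn2 = nn.BatchNorm1d(out_dim)
        self.fc2 = nn.Linear(out_dim, out_dim)

    def forward(self, feats):                              # feats: (num_points, in_dim)
        global_max, _ = feats.max(dim=0, keepdim=True)     # aggregate statistics
        x = torch.cat([feats, global_max.expand_as(feats)], dim=1)
        x = torch.relu(self.fc1(self.bn1(x)))
        return torch.relu(self.fc2(self.bn2(x)))

# Featurize one 64-point cell: stack two blocks and mean-readout each one.
points = torch.randn(64, 4)                                # e.g. (x, y, z, intensity)
b1, b2 = StarNetBlock(4, 64), StarNetBlock(64, 128)
f1 = b1(points); f2 = b2(f1)
cell_feature = torch.cat([f1.mean(dim=0), f2.mean(dim=0)])  # concatenated readouts
```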
  • 68. StarNet: Targeted Computation for Object Detection in Point Clouds
  • 69. Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection • This report presents the method that won the nuScenes 3D Detection Challenge held at the Workshop on Autonomous Driving (WAD, CVPR 2019). • Sparse 3D convolutions are used to extract rich semantic features, which are then fed into a class-balanced multi-head network to perform 3D object detection. • To handle the severe class imbalance inherent in autonomous driving scenarios, a class-balanced sampling and augmentation strategy is designed to generate a more balanced data distribution (a simplified sketch follows below). • A balanced grouping head boosts the performance for categories with similar shapes. • Based on the Challenge results, the method outperforms the PointPillars baseline by a large margin across all metrics, achieving state-of-the-art (SOTA) detection performance on the nuScenes dataset.
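A simplified stand-in for the class-balanced sampling idea: frames containing rare categories are given larger sampling weights. The paper's actual DS-sampling duplicates frames per category to even out the distribution, so treat this inverse-frequency weighting purely as an illustration.

```python
from collections import Counter

def class_balanced_sample_weights(frame_labels):
    """Weight each frame by the inverse frequency of the rarest class it contains.

    frame_labels: list of per-frame class-name lists. Frames with rare classes
    get larger weights, so they are drawn more often during training.
    """
    counts = Counter(c for labels in frame_labels for c in labels)
    total = sum(counts.values())
    weights = []
    for labels in frame_labels:
        if not labels:
            weights.append(0.0)               # frames with no labels get zero weight here
            continue
        rarest = min(counts[c] for c in labels)
        weights.append(total / rarest)
    s = sum(weights)
    return [w / s for w in weights]

# Example: the frame containing 'bicycle' gets a larger weight than car-only frames.
w = class_balanced_sample_weights([['car'], ['car', 'bicycle'], ['car']])
```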
  • 70. Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection Network Architecture. The 3D Feature Extractor is composed of submanifold and regular 3D sparse convolutions. Outputs of the 3D Feature Extractor have a 16× downscale ratio; they are flattened along the output axis and fed into the following Region Proposal Network to generate 8× feature maps, followed by the multi-group head network that generates the final predictions. The number of groups in the head is set according to the grouping specification.
  • 71. Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection Examples of detection results in validation split. Ground truth annotations are in green and detection results are in blue. The token on top of each point cloud bird view image is its corresponding sample data token.
  • 72. Deep Hough Voting for 3D Object Detection in Point Clouds • Code is open-sourced at https://github.com/facebookresearch/votenet • Current 3D object detection methods are heavily influenced by 2D detectors. • In order to leverage architectures in 2D detectors, they often convert 3D point clouds to regular grids (i.e., to voxel grids or to bird's-eye-view images), or rely on detection in 2D images to propose 3D boxes. • Few works have attempted to directly detect objects in point clouds. • The first principle is to construct a 3D detection pipeline for point cloud data that is as generic as possible. • However, due to the sparse nature of the data, which consists of samples from 2D manifolds in 3D space, there is a major challenge when directly predicting bounding box parameters from scene points: a 3D object centroid can be far from any surface point and is thus hard to regress accurately in one step. • To address this challenge, VoteNet is an end-to-end 3D object detection network based on a synergy of deep point set networks and Hough voting (a toy sketch of the vote-and-group idea follows below). • This model achieves state-of-the-art 3D detection on two large datasets of real 3D scans, ScanNet and SUN RGB-D, with a simple design, compact model size and high efficiency. • Remarkably, VoteNet outperforms previous methods by using purely geometric information without relying on color images.
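To make the vote-then-aggregate idea concrete, here is a toy NumPy sketch: seed points plus predicted offsets give votes near object centers, and nearby votes are greedily grouped into cluster centers. VoteNet itself samples and groups votes with learned modules (farthest point sampling and a proposal network), so everything below, including the radius, is an illustrative simplification rather than the paper's algorithm.

```python
import numpy as np

def vote_and_group(seed_xyz, offsets, radius=0.3):
    """Greedy grouping of Hough votes (no learning here).

    seed_xyz: (M, 3) seed point coordinates.
    offsets:  (M, 3) per-seed offsets, as a voting module would predict them,
              so that votes = seeds + offsets land near object centers.
    Returns one aggregated center per greedy cluster of votes within `radius`.
    """
    votes = seed_xyz + offsets
    unassigned = np.ones(len(votes), dtype=bool)
    centers = []
    while unassigned.any():
        i = np.flatnonzero(unassigned)[0]
        member = np.linalg.norm(votes - votes[i], axis=1) < radius
        member &= unassigned
        centers.append(votes[member].mean(axis=0))
        unassigned[member] = False
    return np.array(centers)
```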
  • 73. Deep Hough Voting for 3D Object Detection in Point Clouds 3D object detection in point clouds with a deep Hough voting model. Given a point cloud of a 3D scene, VoteNet votes to object centers and then groups and aggregates the votes to predict 3D bounding boxes and semantic classes of objects.
  • 74. Deep Hough Voting for 3D Object Detection in Point Clouds Illustration of the VoteNet architecture for 3D object detection in point clouds. Given an input point cloud of N points with XYZ coordinates, a backbone network (PointNet++ layers) subsamples and learns deep features on the points and outputs a subset of M points extended by C-dim features. This subset of points is considered as seed points. Each seed independently generates a vote through a voting module. Then the votes are grouped into clusters and processed by the proposal module to generate the final proposals. After classification and 3D NMS, the proposals become the final 3D bounding-box outputs.
  • 75. Deep Hough Voting for 3D Object Detection in Point Clouds Voting helps increase detection contexts. Seed points that generate good boxes (BoxNet), or good votes (VoteNet) which in turn generate good boxes, are overlaid (in blue) on top of a representative ScanNet scene. As the voting step effectively increases context, VoteNet demonstrates a much denser cover of the scene, therefore increasing the likelihood of accurate detection.
  • 76. MLOD: A multi-view 3D object detection based on robust feature fusion method • Multi-view Labelling Object Detector (MLOD). • The detector takes an RGB image and a LiDAR point cloud as input and follows the two-stage object detection framework. • A Region Proposal Network (RPN) generates 3D proposals in a Bird's Eye View (BEV) projection of the point cloud. • The second stage projects the 3D proposal bounding boxes to the image and BEV feature maps and sends the corresponding map crops to a detection header for classification and bounding-box regression. • Unlike other multi-view based methods, the cropped image features are not directly fed to the detection header, but are masked by the depth information to filter out parts outside the 3D bounding boxes. • The fusion of image and BEV features is challenging, as they are derived from different perspectives. • A detection header provides detection results not just from the fusion layer but also from each sensor channel; hence the object detector can be trained on data labelled in different views to avoid the degeneration of feature extractors. • MLOD achieves state-of-the-art performance on the KITTI 3D object detection benchmark. • Most importantly, the evaluation shows that the header architecture is effective in preventing image feature extractor degeneration.
  • 77. MLOD: A multi-view 3D object detection based on robust feature fusion method The multi-view header architecture diagram. Architectural diagram of the proposed method.
  • 78. MLOD: A multi-view 3D object detection based on robust feature fusion method The procedure of the foreground masking layer. (a) Illustration of the foreground masking layer procedure: Step 1: calculating the median of the non-zero values in each grid cell; Step 2: obtaining a mask by Equation 1 (dmin = 6.8, dmax = 9.7 in this example); Step 3: applying the mask to the feature maps. (b) A qualitative example of a foreground mask and its application to the original image. The bottom-left background and the top-left and right background are masked.
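The three steps map to a few lines of NumPy. The sketch below assumes the projected point depths have already been binned per feature-map cell, and it treats cells without any points as background; how Equation 1 handles empty cells is a detail of the paper that is not reproduced here.

```python
import numpy as np

def foreground_mask(depth_grid, d_min, d_max):
    """Build a {0,1} foreground mask for one cropped feature map.

    depth_grid: (H, W, k) projected point depths per feature-map cell
                (zeros where no point projects).
    d_min, d_max: the near/far bounds of the 3D proposal along the viewing ray.
    """
    # Step 1: median of the non-zero depths in each grid cell.
    has_points = (depth_grid > 0).any(axis=2)
    masked = np.where(depth_grid > 0, depth_grid, np.nan)
    median = np.full(depth_grid.shape[:2], -1.0)            # empty cells stay -1 (background)
    median[has_points] = np.nanmedian(masked[has_points], axis=1)
    # Step 2: keep cells whose median depth lies inside the proposal's depth range.
    mask = (median >= d_min) & (median <= d_max)
    return mask.astype(np.float32)

# Step 3 (applying the mask) is an element-wise multiply with the cropped features:
# masked_feats = image_feats * foreground_mask(depth_grid, 6.8, 9.7)[None]
```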
  • 79. MLOD: A multi-view 3D object detection based on robust feature fusion method Qualitative results of MLOD. In each image, detected cars are in green, pedestrians are in blue, and cyclists are in yellow.