LiDAR for Autonomous Vehicles II
(via Deep Learning)
Yu Huang
Yu.huang07@gmail.com
Sunnyvale, California
Outline
 Online Camera LiDAR Fusion and Object Detection on
Hybrid Data for Autonomous Driving
 RegNet: Multimodal Sensor Registration Using Deep
Neural Networks
 Vehicle Detection from 3D Lidar Using FCN
 VoxelNet: End-to-End Learning for Point Cloud Based
3D Object Detection
 Object Detection and Classification in Occupancy Grid
Maps using Deep Convolutional Networks
 RT3D: Real-Time 3-D Vehicle Detection in LiDAR Point
Cloud for Autonomous Driving
 BirdNet: a 3D Object Detection Framework from LiDAR
information
 LMNet: Real-time Multiclass Object Detection on CPU
using 3D LiDAR
 HDNET: Exploit HD Maps for 3D Object Detection
 IPOD: Intensive Point-based Object Detector for Point
Cloud
 PIXOR: Real-time 3D Object Detection from Point
Clouds
 DepthCN: Vehicle Detection Using 3D-LIDAR and
ConvNet
 SECOND: Sparsely Embedded Convolutional Detection
 YOLO3D: E2E RT 3D Oriented Object Bounding Box
Detection from LiDAR Point Cloud
 YOLO4D: A ST Approach for RT Multi-object Detection
and Classification from LiDAR Point Clouds
 Deconvolutional Networks for Point-Cloud Vehicle
Detection and Tracking in Driving Scenarios
 Fast and Furious: Real Time E2E 3D Detection,
Tracking and Motion Forecasting with a Single
Convolutional Net
…To be continued
Outline
 SqueezeSeg: Convolutional Neural Nets with
Recurrent CRF for Real-Time Road-Object
Segmentation from 3D LiDAR Point Cloud
 SEGCloud: Semantic Segmentation of 3D Point
Clouds
 Multi-View 3D Object Detection Network for
Autonomous Driving
 A General Pipeline for 3D Detection of Vehicles
 Combining LiDAR Space Clustering and Convolutional
Neural Networks for Pedestrian Detection
 Pseudo-LiDAR from Visual Depth Estimation: Bridging
the Gap in 3D Object Detection for Autonomous
Driving
 PointNet: Deep Learning on Point Sets for 3D
Classification and Segmentation
 PointNet++: Deep Hierarchical Feature Learning on
Point Sets in a Metric Space
 PointFusion: Deep Sensor Fusion for 3D Bounding
Box Estimation
 Frustum PointNets for 3D Object Detection from RGB-
D Data
 RoarNet: A Robust 3D Object Detection based on
RegiOn Approximation Refinement
 Joint 3D Proposal Generation and Object Detection
from View Aggregation
 SPLATNet: Sparse Lattice Networks for Point Cloud
Processing
 PointRCNN: 3D Object Proposal Generation and
Detection from Point Cloud
 Deep Continuous Fusion for Multi-Sensor 3D Object
Detection
 End-to-end Learning of Multi-sensor 3D Tracking by
Detection
Online Camera LiDAR Fusion and Object Detection on
Hybrid Data for Autonomous Driving
 Non-calibrated sensors result in artifacts and aberration in the environment model, which
makes tasks like free-space detection more challenging.
 To improve the LiDAR and camera fusion approach of Levinson and Thrun.
 Rely on intensity discontinuities and erosion and dilation of the edge image for increased
robustness against shadows and visual patterns, which are a recurring problem in point-cloud-
related work.
 Use a gradient free optimizer instead of an exhaustive grid search to find the extrinsic
calibration.
 The fusion pipeline is lightweight and able to run in real-time on a computer in the car.
 For the detection task, modify the Faster R-CNN architecture to accommodate hybrid LiDAR-
camera data for improved object detection and classification.
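As a rough sketch of the calibration step above (not the authors' implementation): a gradient-free optimizer such as Nelder-Mead can refine the 6-DOF extrinsics by maximizing the agreement between projected LiDAR points and the eroded/dilated camera edge image. The helper names, the intrinsics matrix K, and the optimizer settings below are assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation as R

def project_points(points_xyz, K, rvec, tvec):
    """Project LiDAR points into the image given a candidate extrinsic (rvec, tvec)."""
    pts_cam = points_xyz @ R.from_rotvec(rvec).as_matrix().T + tvec
    pts_cam = pts_cam[pts_cam[:, 2] > 0.1]          # keep points in front of the camera
    uvw = pts_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def neg_alignment_score(params, points_xyz, edge_img, K):
    """Negative sum of edge strength at the projected LiDAR points (to be minimized)."""
    uv = project_points(points_xyz, K, params[:3], params[3:])
    h, w = edge_img.shape
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    return -float(edge_img[v[ok], u[ok]].sum())

# x0 holds the current (possibly drifted) calibration as [rx, ry, rz, tx, ty, tz];
# Nelder-Mead refines it online without gradients or an exhaustive grid search.
# result = minimize(neg_alignment_score, x0, args=(points, edges, K),
#                   method="Nelder-Mead", options={"xatol": 1e-4, "maxiter": 200})
```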
Online Camera LiDAR Fusion and Object Detection on
Hybrid Data for Autonomous Driving
Sensor fusion and object detection pipeline.
Estimating the rotation and translation
between their coordinate systems. Non-optimal calibration.
RegNet: Multimodal Sensor Registration Using Deep
Neural Networks
 RegNet, the deep CNN to infer a 6 DOF extrinsic calibration between multimodal sensors,
exemplified using a scanning LiDAR and a monocular camera.
 Compared to existing approaches, RegNet casts all 3 conventional calibration steps (feature
extraction, feature matching and global regression) into a single real-time capable CNN.
 It does not require any human interaction and bridges the gap between classical offline and
target-less online calibration approaches as it provides both a stable initial estimation as well
as a continuous online correction of the extrinsic parameters.
 During training, randomly decalibrate our system in order to train RegNet to infer the
correspondence between projected depth measurements and RGB image and finally regress
the extrinsic calibration.
 Additionally, with an iterative execution of multiple CNNs, that are trained on different
magnitudes of decalibration, it compares favorably to state-of-the-art methods in terms of a
mean calibration error of 0.28◦ for the rotational and 6 cm for the translation components
even for large decalibrations up to 1.5 m and 20◦ .
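A minimal sketch of the random-decalibration step described above, assuming a 4x4 homogeneous H_init and uniform sampling up to the stated 20° / 1.5 m bounds; the helper names are illustrative, not the authors' code.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def random_decalibration(max_angle_deg=20.0, max_trans_m=1.5):
    """Sample a random 6-DOF perturbation H_decalib applied to H_init during training."""
    angles = np.random.uniform(-max_angle_deg, max_angle_deg, size=3)
    trans = np.random.uniform(-max_trans_m, max_trans_m, size=3)
    H = np.eye(4)
    H[:3, :3] = R.from_euler("xyz", angles, degrees=True).as_matrix()
    H[:3, 3] = trans
    return H

# During training the depth points are projected with H_init @ H_decalib,
# and the network regresses the corresponding correction (phi_decalib).
# H_train = H_init @ random_decalibration()
```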
RegNet: Multimodal Sensor Registration Using Deep
Neural Networks
It estimates the calibration between a depth and an RGB sensor. The depth points are projected on the RGB
image using an initial calibration Hinit. In the 1st and 2nd parts of the network, NiN blocks extract rich
features for matching. The final part regresses the decalibration by gathering global info. using two FCLs.
During training, φdecalib is randomly perturbed, resulting in different projections of the depth points.
RegNet: Multimodal Sensor Registration Using Deep
Neural Networks
Vehicle Detection from 3D Lidar Using FCN
 Point clouds from a Velodyne scan can be
roughly projected and discretized into a 2D
point map;
The projected point map is analogous to
cylindrical images;
Encode the bounding box corners of the
vehicle (8 corners as a 24-d vector);
It consists of one objectness classification
branch and one bounding box regression
branch.
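A rough numpy sketch of such a cylindrical projection; the angular coverage, map size, and 2-channel (range, height) encoding below are assumed values, not the paper's exact settings.

```python
import numpy as np

def cylindrical_projection(points, h=64, w=512, phi_min=-24.9, phi_max=2.0):
    """Discretize a point cloud (N,3) into an h x w 2-channel map (range d, height z).
    phi_min/phi_max are assumed vertical field-of-view bounds in degrees (Velodyne-like)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    d = np.sqrt(x ** 2 + y ** 2)                                # planar range
    theta = np.arctan2(y, x)                                    # azimuth in [-pi, pi)
    phi = np.degrees(np.arctan2(z, d))                          # elevation in degrees
    col = np.clip(((theta + np.pi) / (2 * np.pi) * w).astype(int), 0, w - 1)
    row = np.clip(((phi_max - phi) / (phi_max - phi_min) * h).astype(int), 0, h - 1)
    point_map = np.zeros((h, w, 2), dtype=np.float32)
    point_map[row, col, 0] = d                                  # range channel
    point_map[row, col, 1] = z                                  # height channel
    return point_map
```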
(a) The input point map, with
the d channel visualized. (b)
The output confidence map of
the objectness branch. (c)
Bounding box candidates
corresponding to all points
predicted as positive, i.e. high
confidence points in (b). (d)
Remaining bounding boxes
after non-max suppression.
Vehicle Detection from 3D Lidar Using FCN
VoxelNet: End-to-End Learning for Point Cloud Based
3D Object Detection
 Remove the need for manual feature
engineering on 3D point clouds and propose
VoxelNet, a generic 3D detection network that
unifies feature extraction and bounding box
prediction into a single-stage, end-to-end trainable
deep network.
 Specifically, VoxelNet divides a point cloud into
equally spaced 3D voxels and transforms a group
of points within each voxel into a unified feature
representation through the voxel feature encoding
(VFE) layer.
 In this way, the point cloud is encoded as a
descriptive volumetric representation, which is
then connected to an RPN to generate detections.
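A much-simplified sketch of the voxel grouping and a single VFE-style layer, assuming PyTorch; point-wise FC plus per-voxel max pooling stands in for the full VFE stack, and the voxel sizes, channel widths, and point cap are placeholders.

```python
import torch
import torch.nn as nn

class SimpleVFE(nn.Module):
    """One VFE-style layer: point-wise FC + per-voxel max pooling, concatenated back."""
    def __init__(self, in_ch=7, out_ch=32):   # 7: x,y,z,r + offsets to the voxel centroid
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(in_ch, out_ch // 2), nn.ReLU())

    def forward(self, voxel_points):           # (V, T, in_ch): V voxels, T points each
        feat = self.fc(voxel_points)            # (V, T, out_ch//2) point-wise features
        agg = feat.max(dim=1, keepdim=True).values.expand_as(feat)
        return torch.cat([feat, agg], dim=-1)   # (V, T, out_ch) point + voxel feature

def voxelize(points, voxel_size=(0.2, 0.2, 0.4), max_points=35):
    """Group points (N,3+) into equally spaced voxels; returns voxel index -> point list."""
    coords = torch.div(points[:, :3], torch.tensor(voxel_size),
                       rounding_mode="floor").long()
    voxels = {}
    for c, p in zip(coords.tolist(), points):
        voxels.setdefault(tuple(c), [])
        if len(voxels[tuple(c)]) < max_points:   # cap points per voxel
            voxels[tuple(c)].append(p)
    return voxels
```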
VoxelNet: End-to-End Learning for Point Cloud Based
3D Object Detection
VoxelNet: End-to-End Learning for Point Cloud Based
3D Object Detection
Voxel feature encoding layer.
VoxelNet: End-to-End Learning for Point Cloud Based
3D Object Detection
Region proposal network architecture
Object Detection and Classification in Occupancy
Grid Maps using Deep Convolutional Networks
 Based on a grid map environment
representation, well-suited for sensor fusion,
free-space estimation and machine learning,
detect and classify objects using deep CNNs.
 As input, use a multi-layer grid map efficiently
encoding 3D range sensor info.
 The inference output consists of a list of rotated
Bboxes with associated semantic classes.
Transform range sensor measurements to a multi-
layer grid map which serves as input for the object
detection and classification network. From these top-
view grid maps the network infers rotated 3D
bounding boxes together with semantic classes.
These boxes can be projected into the camera image
for visual validation. Cars are depicted in green, cyclists
in aquamarine and pedestrians in cyan.
Object Detection and Classification in Occupancy
Grid Maps using Deep Convolutional Networks
 Minimal preprocessing is applied to obtain occupancy grid maps.
 As there are labeled objects only in the camera image, remove all points that are not in the
camera’s field of view.
 Apply ground surface segmentation and estimate different grid cell features; the resulting
multi-layer grid maps cover 60m×60m with a cell size of either 10cm or 15cm.
 As observed, the ground is flat in most of the scenarios, so fit a ground plane to the
representing point set.
 Then, use the full point set or a non-ground subset to construct a multi-layer grid map
containing different features.
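As a hedged sketch, assuming a 60 m x 60 m area centered on the sensor and a 15 cm cell, per-cell layers such as maximum height, point density, and mean intensity could be computed as below; the paper's exact feature set and normalization differ.

```python
import numpy as np

def build_grid_maps(points, intensity, extent=60.0, cell=0.15):
    """Build a multi-layer BEV grid map: max height, point density, mean intensity."""
    n = int(extent / cell)
    ix = ((points[:, 0] + extent / 2) / cell).astype(int)
    iy = ((points[:, 1] + extent / 2) / cell).astype(int)
    ok = (ix >= 0) & (ix < n) & (iy >= 0) & (iy < n)
    ix, iy, z, r = ix[ok], iy[ok], points[ok, 2], intensity[ok]

    height = np.full((n, n), -np.inf, dtype=np.float32)
    np.maximum.at(height, (iy, ix), z)                  # max height per cell
    density = np.zeros((n, n), dtype=np.float32)
    np.add.at(density, (iy, ix), 1.0)                   # hit count per cell
    inten = np.zeros((n, n), dtype=np.float32)
    np.add.at(inten, (iy, ix), r)
    inten = np.divide(inten, density, out=np.zeros_like(inten), where=density > 0)
    height[np.isneginf(height)] = 0.0
    # density / 16.0 is an assumed normalization, not the paper's choice
    return np.stack([height, np.minimum(density / 16.0, 1.0), inten], axis=0)
```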
Object Detection and Classification in Occupancy
Grid Maps using Deep Convolutional Networks
 KITTI Bird’s Eye View Evaluation 2017 consists of 7481 images for training and 7518
images for testing as well as corresponding range sensor data represented as point sets.
 Training and test data contain 80,256 labeled objects in total which are represented as
oriented 3D Bboxes (7 parameters).
 As summarized in the table, there are 8 semantic classes labeled in the training set, although
not all classes are used to determine the benchmark result.
RT3D: Real-Time 3-D Vehicle Detection in LiDAR Point
Cloud for Autonomous Driving
 Real-time 3-dimensional (RT3D) vehicle detection
method that utilizes pure LiDAR point cloud to
predict the location, orientation, and size of vehicles.
 Apply pre-RoI-pooling convolution that moves the majority of the convolution operations
ahead of the RoI pooling, leaving just a small part behind, which significantly boosts
computation efficiency.
 A pose-sensitive feature map design is strongly
activated by the relative poses of vehicles, leading to
a high regression accuracy on the location,
orientation, and size of vehicles.
 RT3D is the 1st LiDAR 3-D vehicle detection work
that completes detection within 0.09s.
RT3D: Real-Time 3-D Vehicle Detection in LiDAR Point
Cloud for Autonomous Driving
The network architecture of RT3D
BirdNet: a 3D Object Detection Framework from LiDAR
information
 LiDAR- based 3D object detection pipeline entailing three stages:
 First, laser info. is projected into a novel cell encoding for bird’s eye view projection.
 Later, both object location on the plane and its heading are estimated through a
convolutional neural network originally designed for image processing.
 Finally, 3D oriented detections are computed in a post-processing phase.
BirdNet: a 3D Object Detection Framework from LiDAR
information
Results on KITTI Benchmark test set: detections in image, BEV projection, and 3D point cloud.
LMNet: Real-time Multiclass Object Detection on CPU
using 3D LiDAR
 An optimized single-stage deep CNN to detect objects in urban environments, using
nothing more than point cloud data.
 The network structure employs dilated convolutions to gradually increase the receptive
field as depth increases, which helps reduce the computation time by about 30%.
 The input consists of 5 perspective representations of the unorganized point cloud data.
 The network outputs an objectness map and the bounding box offset values for each point.
 Using reflection, range, and the position on each of the 3 axes helped to improve the
location and orientation of the output bounding box.
 Execution speed is 50 FPS using desktop GPUs, and up to 10 FPS on an Intel Core i5 CPU.
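A minimal PyTorch sketch of the dilated-convolution idea (growing the receptive field with depth while keeping full resolution); the channel count and dilation schedule are assumptions, not LMNet's exact configuration.

```python
import torch.nn as nn

class DilatedBlock(nn.Module):
    """Stack of 3x3 convs with increasing dilation: the receptive field grows with depth
    while the feature map keeps its full resolution (no pooling)."""
    def __init__(self, channels=64, dilations=(1, 2, 4, 8)):
        super().__init__()
        layers = []
        for d in dilations:
            layers += [nn.Conv2d(channels, channels, kernel_size=3,
                                 padding=d, dilation=d),   # padding=d keeps H x W
                       nn.BatchNorm2d(channels),
                       nn.ReLU(inplace=True)]
        self.body = nn.Sequential(*layers)

    def forward(self, x):                     # x: (B, C, H, W) encoded point-cloud maps
        return self.body(x)
```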
LMNet: Real-time Multiclass Object Detection on CPU
using 3D LiDAR
Used dilated layers
The LMNet architecture
Encoded input point cloud
HDNET: Exploit HD Maps for 3D Object
Detection
 High-Definition (HD) maps provide strong priors that can boost the performance and
robustness of modern 3D object detectors.
 Here a one-stage detector extracts geometric and semantic features from the HD maps.
 As maps might not be available everywhere, a map prediction module estimates the map
on the fly from raw LiDAR data.
 The whole framework runs at 20 frames per second.
HDNET: Exploit HD Maps for 3D Object
Detection
BEV LiDAR representation that exploits geometric and semantic HD map information.
(a) The raw LiDAR point cloud. (b) Incorporating geometric ground prior.
(c) Discretization of the LiDAR point cloud. (d) Incorporating semantic road prior.
HDNET: Exploit HD Maps for 3D Object
Detection
Network structures for object detection (left) and online map estimation (right).
IPOD: Intensive Point-based Object Detector for Point
Cloud
 A 3D object detection framework, IPOD, based on raw point cloud.
 It seeds an object proposal for each point, which is the basic element.
 An E2E trainable architecture, where features of all points within a proposal are extracted
from the backbone network and aggregated into a proposal feature for final bounding-box inference.
 These features with both context info. and precise point cloud coord.s improve the performance.
IPOD: Intensive Point-based Object Detector for Point
Cloud
Illustration of point-based proposal
generation. (a) Semantic segmentation result
on the image. (b) Projected segmentation
result on point cloud. (c) Point-based
proposals on positive points after NMS.
IPOD: Intensive Point-based Object Detector for Point
Cloud
Illustration of proposal feature generation module. It combines location info. and
context feature to generate offsets from the centroid of interior points to the center of
target instance object. The predicted residuals are added back to the location info. in order
to make feature more robust to geometric transformation.
IPOD: Intensive Point-based Object Detector for Point
Cloud
Backbone architecture. Bounding-box prediction network.
PIXOR: Real-time 3D Object Detection from Point
Clouds
 This method utilizes the 3D data more efficiently by representing the scene from the
Bird’s Eye View (BEV), and propose PIXOR (ORiented 3D object detection from
PIXel-wise NN predictions), a proposal-free, single-stage detector that outputs
oriented 3D object estimates decoded from pixel-wise neural network predictions.
 The input representation, network architecture, and model optimization are specially
designed to balance high accuracy and real-time efficiency.
3D object detector from Bird’s Eye View (BEV) of LIDAR point cloud.
PIXOR: Real-time 3D Object Detection from Point
Clouds
The network architecture of PIXOR
Use cross-entropy loss on the classification output
and a smooth L1 loss on the regression output.
Sum the classification loss over all locations on the
output map, while the regression loss is computed
over positive locations only.
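A sketch of that loss composition in PyTorch, assuming a 1-channel objectness logit map and a 6-channel box regression map; the exact target encoding and any class-balancing terms from the paper are omitted.

```python
import torch
import torch.nn.functional as F

def pixor_style_loss(cls_logits, reg_pred, cls_target, reg_target):
    """cls_logits: (B,1,H,W); reg_pred/reg_target: (B,6,H,W); cls_target in {0,1}."""
    # Classification: cross-entropy summed over all BEV output locations.
    cls_loss = F.binary_cross_entropy_with_logits(cls_logits, cls_target,
                                                  reduction="sum")
    # Regression: smooth L1 computed over positive (object) locations only.
    pos = cls_target.expand_as(reg_pred) > 0.5
    if pos.any():
        reg_loss = F.smooth_l1_loss(reg_pred[pos], reg_target[pos], reduction="sum")
    else:
        reg_loss = reg_pred.sum() * 0.0        # keep the graph when no positives exist
    return cls_loss + reg_loss
```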
DepthCN: Vehicle Detection Using 3D-LIDAR and
ConvNet
 Vehicle detection based on the Hypothesis Generation (HG) and Verification (HV)
paradigms.
 The data inputted to the system is a point cloud obtained from a 3D-LIDAR
mounted on board an instrumented vehicle, which is transformed to a Dense-
depth Map (DM).
 The solution starts by removing ground points followed by point cloud
segmentation.
 Then, segmented obstacles (object hypotheses) are projected onto the DM.
 Bboxes are fitted to the segmented objects as vehicle hypotheses (HG step).
 Bboxes are used as inputs to a ConvNet to classify/verify the hypotheses of
belonging to the category ‘vehicle’ (HV step).
DepthCN: Vehicle Detection Using 3D-LIDAR and
ConvNet
3D-LIDAR-based vehicle detection algorithm (DepthCN).
DepthCN: Vehicle Detection Using 3D-LIDAR and
ConvNet
Top: the point cloud where the detected ground points are denoted with green and LIDAR points that are
out of the field of view of the camera are shown in red. Bottom: the projected clusters and HG results in the
form of 2D BB. Right: the zoomed view, and the vertical orange arrows indicate corresponding obstacles.
DepthCN: Vehicle Detection Using 3D-LIDAR and
ConvNet
The generated Dense-depth Map (DM) with the
projected hypotheses (red).
The ConvNet architecture.
The generated hypotheses and the detection results are
shown as red and dashed-green BBs, respectively, in both
DM and images. The bottom figures show the result in PCD.
SECOND: Sparsely Embedded Convolutional
Detection
 An improved sparse convolution method for voxel-based networks, which significantly increases
the speed of both training and inference.
 Introduce a new form of angle loss regression to improve the orientation estimation
performance and a new data augmentation approach that can enhance the convergence
speed and performance.
 The proposed network produces SoA results on the KITTI 3D object detection
benchmarks while maintaining a fast inference speed.
The detector takes a raw point cloud as input, converts it to voxel features and coordinates, and applies two VFE
(voxel feature encoding) layers and a linear layer. A sparse CNN is applied and an RPN generates the detection.
SECOND: Sparsely Embedded Convolutional
Detection
The sparse convolution
algorithm is shown above, and
the GPU rule generation
algorithm is shown below. Nin
denotes the number of input
features, and Nout denotes the
number of output features. N is
the number of gathered features.
Rule is the rule matrix, where
Rule[i, :, :] is the ith rule
corresponding to the ith kernel
matrix in the convolution kernel.
The colored (non-white) boxes
indicate points with sparse
data and the white boxes
indicate empty points.
SECOND: Sparsely Embedded Convolutional
Detection
A GPU-based rule generation algorithm
(Algorithm 1) that runs faster on a GPU.
First, collect the input indexes and
associated spatial indexes instead of the
output indexes (1st loop). Duplicate
output locations are obtained in this
stage. Then execute a unique parallel
algorithm on the spatial index data to
obtain the output indexes and their
associated spatial indexes. A buffer with
the same spatial dimensions as those of
the sparse data is generated from the
previous results for table lookup in the
next step (2nd loop). Finally, we iterate
on the rules and use the stored spatial
indexes to obtain the output index for
each input index (3rd loop).
SECOND: Sparsely Embedded Convolutional
Detection
The structure of sparse middle feature extractor. The
yellow boxes represent sparse convolution, the
white boxes represent submanifold convolution, and
the red box represents the sparse-to-dense layer.
The upper part of the figure shows the spatial
dimensions of the sparse data.
L_θ = SmoothL1(sin(θ_p − θ_t))
Introducing a new angle loss regression.
This approach to angle loss has two advantages:
(1) it solves the adversarial problem between orientations of 0 and π;
(2) it naturally models the IoU against the angle offset
function.
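A one-line PyTorch sketch of that angle term; SECOND pairs it with a separate direction classifier (not shown here) to disambiguate the two opposite headings.

```python
import torch
import torch.nn.functional as F

def sine_angle_loss(theta_pred, theta_target):
    """SmoothL1 on sin(theta_p - theta_t): invariant to a pi flip of the box heading,
    so 0 and pi no longer fight each other during regression."""
    return F.smooth_l1_loss(torch.sin(theta_pred - theta_target),
                            torch.zeros_like(theta_pred))
```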
Structure of RPN
downsampling
convolutional layers
concatenation
transpose convolutional layers
SECOND: Sparsely Embedded Convolutional
Detection
Results of 3D detection on the KITTI test set. For better visualization, the 3D boxes
detected using LiDAR are projected onto images from the left camera.
YOLO3D: E2E RT 3D Oriented Object Bounding Box
Detection from LiDAR Point Cloud
 Based on the success of the one-shot regression meta-architecture in the 2D perspective
image space, extend it to generate oriented 3D object Bboxes from LiDAR point cloud.
 The idea is to extend the loss function of YOLO v2 to include the yaw angle, the 3D box
center in Cartesian coordinates and the height of the box as a direct regression problem.
 This formulation enables real-time performance, which is essential for automated driving.
 In KITTI, it achieves real-time performance (40 fps) on Titan X GPU.
YOLO3D: E2E RT 3D Oriented Object Bounding Box
Detection from LiDAR Point Cloud
The total loss
Project the point cloud to get a bird’s eye view grid map:
create two grid maps from the projection of the point cloud.
The first feature map contains the maximum height,
where each grid cell (pixel) value represents the height
of the highest point associated with that cell.
The second grid map represents the density of points.
In YOLO-v2, anchors are calculated using k-means
clustering over width and length of ground truth boxes.
The point of using anchors is to find priors for the
boxes, onto which the model can predict modifications.
The anchors must be able to cover the whole range of
boxes that can appear in the data.
Choose not to use clustering to calculate the anchors,
and instead, calculate the mean 3D box dimensions for
each object class, and use these average box
dimensions as anchors.
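A sketch of that anchor computation, assuming ground-truth boxes are available as (class, width, length, height) tuples:

```python
import numpy as np

def mean_anchors(gt_boxes):
    """gt_boxes: list of (class_name, w, l, h). Returns class -> mean (w, l, h) anchor."""
    anchors = {}
    for cls in set(b[0] for b in gt_boxes):
        dims = np.array([b[1:] for b in gt_boxes if b[0] == cls], dtype=np.float32)
        anchors[cls] = dims.mean(axis=0)       # average box dimensions per class
    return anchors

# e.g. anchors = mean_anchors([("Car", 1.6, 3.9, 1.5), ("Car", 1.7, 4.1, 1.4),
#                              ("Pedestrian", 0.6, 0.8, 1.7)])
```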
YOLO4D: A ST Approach for RT Multi-object
Detection and Classification from LiDAR Point Clouds
 YOLO4D: the 3D LiDAR point clouds are aggregated over time as a 4D tensor;
3D space dimensions in addition to the time dimension, which is fed to a one-
shot fully convolutional detector, based on YOLO v2 architecture.
 YOLO3D is extended with Convolutional LSTM for temporal feature aggregation.
 The outputs are the oriented 3D object BBox info., including its length (L),
width (W), height (H) and orientation (yaw), together with the object classes and
confidence scores.
 Two different techniques are evaluated to incorporate the temporal dimension:
recurrence and frame stacking.
YOLO4D: A ST Approach for RT Multi-object
Detection and Classification from LiDAR Point Clouds
Left: Frame stacking architecture; Right: Convolutional LSTM architecture.
The prediction model
The total loss
Deconvolutional Networks for Point-Cloud Vehicle
Detection and Tracking in Driving Scenarios
 A full vehicle detection and tracking system that
works with 3D lidar information only.
 The detection step uses a CNN that receives as
input a featured representation of the 3D
information provided by a Velodyne HDL-64
sensor and returns a per-point classification of
whether it belongs to a vehicle or not.
 The classified point cloud is then geometrically
processed to generate observations for a multi-
object tracking system implemented via a
number of Multi-Hypothesis Extended Kalman
Filters (MH-EKF) that estimate the position and
velocity of the surrounding vehicles.
The model is fed with an encoded representation
of the point cloud and computes for each 3D point
its probability of belonging to a vehicle. The
classified points are then clustered, generating
trustworthy observations that are fed to the MH-EKF-
based tracker.
Deconvolutional Networks for Point-Cloud Vehicle
Detection and Tracking in Driving Scenarios
To obtain a useful input for the detector,
project the 3D point cloud raw data to a
featured image-like representation
containing ranges and reflectivity info. by
means of transformation G(·).
Ground truth for learning the classification
task is obtained by first projecting the
image-based KITTI tracklets over the 3D
Velodyne info., and then applying transformation
G(·) again over the selected points.
Deconvolutional Networks for Point-Cloud Vehicle
Detection and Tracking in Driving Scenarios
The network encompasses only conv. and deconv. blocks followed by BN and ReLU nonlinearities. The first
3 blocks conduct the feature extraction step, controlling, according to the vehicle detection objective, the size of
the receptive fields and the feature maps generated. The next 3 deconvolutional blocks expand the info.,
enabling the point-wise classification. After each deconvolution, feature maps from the lower part of the
network are concatenated (CAT) before applying the normalization and non-linearities, providing richer info.
and better performance. During training, 3 losses are calculated at different network points.
Deconvolutional Networks for Point-Cloud Vehicle
Detection and Tracking in Driving Scenarios
They show the raw input point cloud, the
Deep detector output, the final tracked
vehicles and the RGB projected bounding
boxes submitted for evaluation.
Fast and Furious: Real Time E2E 3D Detection,
Tracking and Motion Forecasting with a Single
Convolutional Net
 A deep neural network to jointly reason about 3D detection, tracking and motion forecasting
given data captured by a 3D sensor.
 By jointly reasoning about these tasks, the holistic approach is more robust to occlusion as
well as sparse data at range.
 It performs 3D convolutions across space and time over a bird’s eye view representation of
the 3D world, which is very efficient in terms of both memory and computation.
 It can perform all tasks in as little as 30 ms.
Overlay temporal & motion forecasting data.
Green: bbox w/ 3D point. Grey: bbox w/o 3D point.
Fast and Furious: Real Time E2E 3D Detection,
Tracking and Motion Forecasting with a Single
Convolutional Net
The FaF work takes multiple frames as input and performs detection, tracking and motion forecasting.
Fast and Furious: Real Time E2E 3D Detection,
Tracking and Motion Forecasting with a Single
Convolutional Net
Modeling temporal information
Fast and Furious: Real Time E2E 3D Detection,
Tracking and Motion Forecasting with a Single
Convolutional Net
Motion forecasting
The loss function
classification loss
The regression targets
smooth L1
SqueezeSeg: Conv. Neural Nets with Recurrent CRF for RT
Road-Object Segmentation from 3D LiDAR Point Cloud
 Semantic segmentation of road-objects from 3D LiDAR point clouds.
 Detect and categorize instances of interest, such as cars, pedestrians and cyclists.
 Formulate it as a pointwise classification problem, and propose an E2E pipeline called
SqueezeSeg based on CNN: the CNN takes a transformed LiDAR point cloud as input and
directly outputs a point-wise label map, which is then refined by a CRF as a recurrent layer.
 Instance-level labels are then obtained by conventional clustering algorithms.
 The CNN model is trained on LiDAR point clouds from the KITTI dataset, and point-wise
segmentation labels are derived from 3D bounding boxes from KITTI.
 To obtain extra training data, built a LiDAR simulator into Grand Theft Auto V (GTA-V), a
popular video game, to synthesize large amounts of realistic training data.
GT segmentation Predicted segmentation
SqueezeSeg: Conv. Neural Nets with Recurrent CRF for RT
Road-Object Segmentation from 3D LiDAR Point Cloud
LiDAR Projections.
Network structure of SqueezeSeg
SqueezeSeg: Conv. Neural Nets with Recurrent CRF for RT
Road-Object Segmentation from 3D LiDAR Point Cloud
Structure of FireModule and FireDeconv
Conditional Random Field (CRF) as an RNN layer
https://github.com/BichenWuUCB/SqueezeSeg.
SEGCloud: Semantic Segmentation of 3D Point Clouds
 SEGCloud, an E2E framework to obtain 3D point-level segmentation that combines the
advantages of NNs, trilinear interpolation(TI) and fully connected CRF (FC-CRF).
 Coarse voxel predictions from a 3D Fully Convolutional NN are transferred back to the raw
3D points via trilinear interpolation.
 FC-CRF enforces global consistency and provides fine-grained semantics on the points.
 Implement the FC-CRF as a differentiable Recurrent NN to allow joint optimization.
SEGCloud: Semantic Segmentation of 3D Point Clouds
The 3D-FCNN is made of 3 residual layers sandwiched between 2 convolutional layers.
Max Pooling in the early stages of the network yields a 4X downsampling.
SEGCloud: Semantic Segmentation of 3D Point Clouds
Trilinear interpolation of class scores from voxels to points: Each point’s score is
computed as the weighted sum of the scores from its 8 spatially closest voxel centers.
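A minimal numpy sketch of that transfer: each point's scores are the weighted sum of the scores at its 8 surrounding voxel centers, with trilinear weights (assumes a dense voxel score grid, an axis-aligned origin, and uniform voxel size).

```python
import numpy as np

def trilinear_point_scores(points, voxel_scores, origin, voxel_size):
    """points: (N,3); voxel_scores: (X,Y,Z,C); returns (N,C) per-point class scores."""
    g = (points - origin) / voxel_size - 0.5          # coords relative to voxel centers
    g0 = np.floor(g).astype(int)
    frac = g - g0                                     # trilinear fractions in [0,1)
    X, Y, Z, C = voxel_scores.shape
    out = np.zeros((points.shape[0], C), dtype=voxel_scores.dtype)
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                idx = np.clip(g0 + np.array([dx, dy, dz]), 0, [X - 1, Y - 1, Z - 1])
                w = (np.where(dx, frac[:, 0], 1 - frac[:, 0]) *
                     np.where(dy, frac[:, 1], 1 - frac[:, 1]) *
                     np.where(dz, frac[:, 2], 1 - frac[:, 2]))
                out += w[:, None] * voxel_scores[idx[:, 0], idx[:, 1], idx[:, 2]]
    return out
```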
SEGCloud: Semantic Segmentation of 3D Point Clouds
A 2-stage training by first optimizing over the point-level unary potentials (no
CRF) and then over the joint framework for point-level fine-grained labeling.
 Multi-View 3D networks (MV3D), a sensory-fusion
framework that takes both LIDAR point cloud and RGB
images as input and predicts oriented 3D bounding boxes.
Composed of 2 subnetworks: one for 3D object
proposal generation, one for multi-view feature fusion.
The proposal network generates 3D candidate boxes
from the bird’s eye view representation of the 3D point cloud.
A deep fusion scheme to combine region-wise
features from multiple views and enable interactions
between intermediate layers of different paths.
Multi-View 3D Object Detection
Network for Autonomous Driving
Multi-View 3D Object Detection
Network for Autonomous Driving
Input features of the MV3D network.
Multi-View 3D Object Detection
Network for Autonomous Driving
Training strategy for the Region-
based Fusion Network: During
training, the bottom 3 paths and losses
are added to regularize the network.
The auxiliary layers share weights with
the corresponding layers in the main
network.
Multi-View 3D Object Detection
Network for Autonomous Driving
A General Pipeline for 3D Detection of
Vehicles
 A pipeline that adopts a 2D detection net and fuses it with the 3D point cloud to generate 3D info.
 To identify the 3D box, model fitting based on generalised car models and score maps is applied.
 A two-stage CNN is proposed to refine the detected 3D box.
General fusion pipeline. All of the point clouds viewed from the top (bird’s eye view). The height is encoded by color, with
red being the ground. A subset of points is selected based on the 2D detection. A model fitting algorithm based on the
generalised car models and score maps is applied to find the car points in the subset and a two-stage refinement CNN is
designed to fine-tune the detected 3D box and re-assign an objectness score to it.
A General Pipeline for 3D Detection of
Vehicles
Generalised car models Score map (scores are indicated at bottom.)
Qualitative result illustration on KITTI data (top) and Boston data (bottom). Blue boxes are the 3D detection results
Combining LiDAR Space Clustering and Convolutional
Neural Networks for Pedestrian Detection
 In purely image-based pedestrian detection approaches, the SoA results
have been achieved with CNNs, and surprisingly few detection frameworks
have been built upon multi-cue approaches.
 This is a pedestrian detector for autonomous vehicles that exploits LiDAR
data, in addition to visual info.
 LiDAR data is utilized to generate region proposals by processing the 3-d
point cloud that it provides.
 These candidate regions are then further processed by a SoA CNN
classifier that was fine-tuned for pedestrian detection.
Combining LiDAR Space Clustering and Convolutional
Neural Networks for Pedestrian Detection
(a) Cluster proposal (b) Size and ratio corrections
Pseudo-LiDAR from Visual Depth Estimation: Bridging the
Gap in 3D Object Detection for Autonomous Driving
 Taking the inner workings of CNNs into consideration, convert image-based depth maps to
pseudo-LiDAR representations.
 With this representation, apply different existing LiDAR-based detection algorithms.
 On the popular KITTI benchmark, it raises the detection accuracy of objects within 30m
range from the previous SoA of 22% to an unprecedented 74%.
Pseudo-LiDAR signal from visual depth estimation.
Pseudo-LiDAR from Visual Depth Estimation: Bridging the
Gap in 3D Object Detection for Autonomous Driving
The two-step pipeline for image-based 3D object detection. Given stereo or monocular images,
first predict the depth map, then transform it into a 3D point cloud in the LiDAR
coordinate system. Call this representation pseudo-LiDAR, and process it exactly like
LiDAR: any LiDAR-based 3D object detection algorithm can thus be applied.
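A sketch of the back-projection step, assuming a pinhole camera with intrinsics (f_u, f_v, c_u, c_v), a depth map in meters, and an assumed 4x4 camera-to-LiDAR extrinsic; dataset-specific calibration details are ignored here.

```python
import numpy as np

def depth_to_pseudo_lidar(depth, fu, fv, cu, cv, T_cam_to_lidar=np.eye(4)):
    """Back-project an (H,W) depth map into a 3D point cloud in the LiDAR frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cu) * z / fu                      # camera-frame X (right)
    y = (v - cv) * z / fv                      # camera-frame Y (down)
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=-1).reshape(-1, 4)
    pts_lidar = (pts_cam @ T_cam_to_lidar.T)[:, :3]
    return pts_lidar[depth.reshape(-1) > 0]    # drop invalid (zero-depth) pixels
```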
Pseudo-LiDAR from Visual Depth Estimation: Bridging the
Gap in 3D Object Detection for Autonomous Driving
Apply a single 2D convolution with a
uniform kernel to the frontal view depth
map (top-left). The resulting depth map
(top-right), after projected into the bird’s-
eye view (bottom-right), reveals a large
depth distortion in comparison to the
original pseudo-LiDAR view (bottom-left),
especially for far-away objects. The boxes
are super-imposed and contain all points of
the green and yellow cars respectively.
Fusing Bird’s Eye View LIDAR Point Cloud and Front View
Camera Image for Deep Object Detection
 A method for fusing LIDAR point cloud and camera-captured images in deep
CNN.
 The method constructs a sparse non-homogeneous pooling layer to
transform features between bird’s eye view and front view.
 The sparse point cloud is used to construct the mapping between the two views.
 The pooling layer allows fusion of multi-view features at any stage of the network.
 This is favorable for 3D object detection using camera-LIDAR fusion for
autonomous driving.
 A corresponding one-stage detector is designed and tested, which produces 3D
Bboxes from the bird’s eye view map.
Fusing Bird’s Eye View LIDAR Point Cloud and Front View
Camera Image for Deep Object Detection
The vanilla fusion-based one-stage object detection network
The sparse non-homogeneous pooling layer that fuses
front view image and bird’s eye view LIDAR feature.
Fusing Bird’s Eye View LIDAR Point Cloud and Front View
Camera Image for Deep Object Detection
(a)From camera to bird’s eye. (b)From bird’s eye to camera. (c)From front view conv4
layer to bird’s eye conv4 layer. (d)From bird’s eye conv4 to bird’s eye conv4.
Fusing Bird’s Eye View LIDAR Point Cloud and Front View
Camera Image for Deep Object Detection
The fusion-based one-stage object detection network with SOA single-sensor networks.
PointNet: Deep Learning on Point Sets for 3D
Classification and Segmentation
Applications of PointNet. PointNet is a deep net architecture that consumes raw point cloud (set of
points) without voxelization or rendering. It is a unified architecture that learns both global and local
point features, providing a simple, efficient and effective approach for a number of 3D recognition tasks.
PointNet: Deep Learning on Point Sets for 3D
Classification and Segmentation
PointNet Architecture. The classification network takes n points as input, applies input and feature transformations, and
then aggregates point features by max pooling. The output is classification scores for k classes. The segmentation network
is an extension to the classification net. It concatenates global and local features and outputs per point scores.
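A minimal PyTorch sketch of the classification branch (shared per-point MLP plus global max pooling); the input/feature transform T-Nets are omitted and the layer widths are simplified relative to the paper.

```python
import torch
import torch.nn as nn

class MiniPointNet(nn.Module):
    """Shared per-point MLP + symmetric max pooling + classifier MLP (T-Nets omitted)."""
    def __init__(self, k=40):
        super().__init__()
        self.point_mlp = nn.Sequential(          # 1x1 convs act as a shared per-point MLP
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU())
        self.cls_mlp = nn.Sequential(
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, k))

    def forward(self, x):                        # x: (B, 3, N) point coordinates
        feat = self.point_mlp(x)                 # (B, 1024, N) per-point features
        global_feat = feat.max(dim=2).values     # (B, 1024) order-invariant pooling
        return self.cls_mlp(global_feat)         # (B, k) classification scores
```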
PointNet++: Deep Hierarchical Feature Learning on
Point Sets in a Metric Space
 PointNet does not capture local structures induced by the metric space points live in,
limiting its ability to recognize fine-grained patterns and generalizability to complex
scenes.
 The network called PointNet++ is able to learn deep point set features efficiently and
robustly.
 This is a hierarchical NN that applies PointNet recursively on a nested partitioning of the
input point set.
 By exploiting metric space distances, the network is able to learn local features with
increasing contextual scales.
 With the further observation that point sets are usually sampled with varying densities, which
results in greatly decreased performance for networks trained on uniform densities, set
learning layers adaptively combine features from multiple scales.
PointNet++: Deep Hierarchical Feature Learning on
Point Sets in a Metric Space
PointFusion: Deep Sensor Fusion for 3D Bounding Box
Estimation
 PointFusion, a generic 3D object detection method that leverages both image and 3D point
cloud information.
 The image data and the raw point cloud data are independently processed by a CNN and a
PointNet architecture, respectively.
 The resulting outputs are then combined by a novel fusion network, which predicts multiple
3D box hypotheses and their confidences, using the input 3D points as spatial anchors.
Sample 3D object detection results of
PointFusion model on the KITTI dataset
(left) and the SUN-RGBD dataset (right).
PointFusion: Deep Sensor Fusion for 3D Bounding Box
Estimation
A PointNet variant that processes raw point cloud data (A), and a CNN that extracts visual features from an input
image (B). A vanilla global architecture that directly regresses the box corner locations (D), and a dense
architecture that predicts the spatial offset of each of the 8 corners relative to an input point (C): for each input
point, the network predicts the spatial offset (white arrows) from a corner (red dot) to the input point (blue), and
selects the prediction with the highest score as the final prediction (E).
Frustum PointNets for 3D Object Detection
from RGB-D Data
 A 3D object detection solution from RGB-D data in both indoor and outdoor
scenes.
 Previous methods focus on images or 3D voxels, often obscuring natural 3D
patterns and invariances of 3D data; this method operates on raw point clouds by popping
up RGB-D scans.
 A challenge is how to efficiently localize objects in point clouds of large-scale
scenes (region proposal).
 Instead of solely relying on 3D proposals, it leverages both mature 2D object
detectors and advanced 3D deep learning for object localization, achieving
efficiency as well as high recall.
 Benefiting from learning directly on raw point clouds, it is also able to precisely
estimate 3D Bboxes even under strong occlusion or with very sparse points.
Frustum PointNets for 3D Object Detection
from RGB-D Data
3D object detection pipeline. Given RGB-D data, first generate 2D object region proposals in
the RGB image using a CNN. Each 2D region is then extruded to a 3D viewing frustum in which
to get a point cloud from depth data. Finally, the frustum PointNet predicts an (oriented and amodal)
3D bounding box for the object from the points in frustum.
Frustum PointNets for 3D Object Detection
from RGB-D Data
Frustum PointNets for 3D object detection. First leverage a 2D CNN object detector to propose 2D regions and
classify their content. 2D regions are then lifted to 3D and thus become frustum proposals. Given a point cloud in a
frustum (n × c with n points and c channels of XYZ, intensity etc. for each point), the object instance is segmented
by binary classification of each point. Based on the segmented object point cloud (m × c), a light-weight regression
PointNet (T-Net) tries to align points by translation such that their centroid is close to amodal box center. At last the
box estimation net estimates the amodal 3D bounding box for the object.
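A sketch of the frustum-lifting step: project LiDAR points with the camera calibration and keep those inside a 2D detection box. The 3x4 projection matrix P and 4x4 LiDAR-to-camera transform are assumed inputs; the frustum-coordinate rotation shown above is omitted.

```python
import numpy as np

def extract_frustum_points(points_lidar, box2d, P, T_lidar_to_cam):
    """Keep LiDAR points whose projection falls inside box2d = (xmin, ymin, xmax, ymax)."""
    n = points_lidar.shape[0]
    pts_h = np.hstack([points_lidar[:, :3], np.ones((n, 1))])
    pts_cam = pts_h @ T_lidar_to_cam.T                 # (N,4) in the camera frame
    uvw = pts_cam @ P.T                                # project with 3x4 camera matrix
    u, v = uvw[:, 0] / uvw[:, 2], uvw[:, 1] / uvw[:, 2]
    xmin, ymin, xmax, ymax = box2d
    mask = ((pts_cam[:, 2] > 0) &                      # in front of the camera
            (u >= xmin) & (u <= xmax) & (v >= ymin) & (v <= ymax))
    return points_lidar[mask]                          # the frustum point cloud
```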
Frustum PointNets for 3D Object Detection
from RGB-D Data
Coordinate systems for point cloud. (a) default camera
coordinate; (b) frustum coordinate after rotating frustums to
center view; (c) mask coordinate with object points’ centroid
at origin; (d) object coordinate predicted by T-Net.
Basic architectures and IO for PointNets. Architecture is
illustrated for PointNet++ (v2) models with set abstraction
layers and feature propagation layers (for segmentation).
Frustum PointNets for 3D Object Detection
from RGB-D Data
Visualizations of Frustum PointNet results on KITTI val set.
RoarNet: A Robust 3D Object Detection based on
RegiOn Approximation Refinement
 RoarNet for 3D object detection from 2D image and 3D Lidar point clouds.
 Based on a two-stage object detection framework with PointNet as the backbone network, it applies several
ideas to improve 3D object detection performance.
 The first part estimates the 3D poses of objects from a monocular image, which approximates
where to examine further, and derives multiple candidates that are geometrically feasible.
 This step significantly narrows down feasible 3D regions, which otherwise requires demanding
processing of 3D point clouds in a huge search space.
 The second part takes the candidate regions and conducts in-depth inferences to conclude final
poses in a recursive manner.
 Inspired by PointNet, RoarNet processes 3D point clouds directly, leading to precise detection.
 RoarNet is implemented in Tensorflow and publicly available with pretrained models.
RoarNet: A Robust 3D Object Detection based on
RegiOn Approximation Refinement
Detection pipeline of RoarNet. The model (a) predicts region proposals in 3D space using geometric
agreement search, (b) predicts objectness in each region proposal, (c) predicts 3D bounding boxes, (d)
calculates IoU (Intersection over Union) between 2D detection and 3D detection.
RoarNet: A Robust 3D Object Detection based on
RegiOn Approximation Refinement
Architecture of RoarNet
RoarNet: A Robust 3D Object Detection based on
RegiOn Approximation Refinement
(a) Previous
Architecture
(b) RoarNet 2D
Architecture
RoarNet: A Robust 3D Object Detection based on
RegiOn Approximation Refinement
RoarNet 2D. A unified architecture detects 2D
bounding boxes and 3D poses illustrated in (a)
and (b), respectively. For each object, two
extreme cases are shown as non-filled boxes,
and final equally-spaced candidate locations as
colored dots in (b). All calculations are derived
in 3D space despite bird’s eye view (i.e., X-Z
plane) visualization.
RoarNet: A Robust 3D Object Detection based on
RegiOn Approximation Refinement
A detection pipeline of several network architectures
Joint 3D Proposal Generation and Object Detection
from View Aggregation
 AVOD, an Aggregate View Object Detection network for autonomous driving scenarios.
 The network uses LIDAR point clouds and RGB images to generate features shared by two
subnetworks: a region proposal network (RPN) and a second stage detector network.
 The RPN is capable of performing multimodal feature fusion on high resolution feature maps to
generate reliable 3D object proposals for multiple object classes in road scenes.
 Using these proposals, the second stage detection network performs accurate oriented 3D bounding
box regression and category classification to predict the extents, orientation, and classification of
objects in 3D space.
 Source code is at: https://github.com/kujason/avod.
A visual representation of the 3D detection problem
from Bird’s Eye View (BEV). The Bbox in green is used to
determine the IoU overlap in the computation of the average
precision. The importance of explicit orientation estimation
can be seen as an object’s Bbox does not change when the
orientation (purple) is shifted by ±π radians.
Joint 3D Proposal Generation and Object Detection
from View Aggregation
The method’s architectural diagram. The feature extractors are shown in blue, the region proposal
network in pink, and the second stage detection network in green.
Joint 3D Proposal Generation and Object Detection
from View Aggregation
The architecture of high resolution
feature extractor for the image branch.
Feature maps are propagated from the
encoder to the decoder section via red
arrows. Fusion is then performed at every
stage of the decoder by a learned
upsampling layer, followed by concatenation,
and then mixing via a convolutional layer,
resulting in a full resolution feature map at
the last layer of the decoder.
Joint 3D Proposal Generation and Object Detection
from View Aggregation
Qualitative results of AVOD for cars (top) and pedestrians/cyclists (bottom). Left: 3D RPN output, Middle: 3D
detection output, and Right: the projection of the detection output onto image space for all three classes.
SPLATNet: Sparse Lattice Networks for Point Cloud
Processing
 A network architecture for processing point clouds that directly operates on a collection of
points represented as a sparse set of samples in a high-dimensional lattice.
 The network uses sparse bilateral convolutional layers as building blocks; these layers
maintain efficiency by using indexing structures to apply convolutions only on occupied parts
of the lattice, and allow flexible specification of the lattice structure, enabling hierarchical and
spatially-aware feature learning as well as joint 2D-3D reasoning.
 Both point-based and image-based representations can be easily incorporated in a network
with such layers and the resulting model can be trained in an E2E manner.
From point clouds and images to semantics. SPLATNet3D
directly takes point cloud as input and predicts labels for
each point. SPLATNet2D-3D, on the other hand, jointly
processes both point cloud and the corresponding multi-
view images for better 2D and 3D predictions.
SPLATNet: Sparse Lattice Networks for Point Cloud
Processing
Bilateral Convolution Layer (BCL). Splat: BCL
first interpolates input features F onto a d_l-
dimensional permutohedral lattice defined by the
lattice features L at input points. Convolve: BCL
then does d_l-dimensional convolution over this
sparsely populated lattice. Slice: The filtered signal
is then interpolated back onto the input signal.
• The input points to BCL need not be ordered or
lie on a grid as they are projected onto a d_l-
dimensional grid defined by lattice features L_in.
• The input and output points can be different for
BCL with the specification of different input and
output lattice features L_in and L_out.
• Since BCL allows separate specifications of input
and lattice features, input signals can be projected
into a different dimensional space for filtering.
• Just like in standard spatial convolutions, BCL
allows an easy specification of filter neighborhood.
• Since a signal is usually sparse in high-
dimension, BCL uses hash tables to index the
populated vertices and does convolutions only at
those locations.
SPLATNet: Sparse Lattice Networks for Point Cloud
Processing
SPLATNet. Illustration of inputs, outputs and network architectures for SPLATNet3D and SPLATNet2D-3D.
SPLATNet: Sparse Lattice Networks for Point Cloud
Processing
2D to 3D projection using splat and slice
operations. Given input features of 2D
images, pixels are projected onto a 3D
permutohedral lattice defined by 3D positional lattice
features. The splatted signal is then sliced onto the
points of interest in a 3D point cloud.
Facade point cloud labeling. Sample visual
results of SPLATNet3D and SPLATNet2D-3D.
PointRCNN: 3D Object Proposal Generation and
Detection from Point Cloud
 PointRCNN is a deep NN method for 3D object detection from raw point cloud.
 The whole framework is composed of two stages:
 stage-1 for the bottom-up 3D proposal generation;
 stage-2 for refining proposals in the canonical coord.s to obtain the detection results.
 Instead of generating proposals from RGB image or projecting point cloud to bird’s view
or voxels, this stage-1 sub-network directly generates a small number of high-quality 3D
proposals from point cloud in a bottom-up manner via segmenting the point cloud of
whole scene into FG points and BG.
 The stage-2 sub-network transforms the pooled points of each proposal to canonical
coord.s to learn local spatial features, which are combined with global semantic features of
each point learned in stage-1 for accurate box refinement and confidence prediction.
PointRCNN: 3D Object Proposal Generation and
Detection from Point Cloud
Instead of generating proposals from fused feature
maps of bird’s view and front view, or RGB images,
this method directly generates 3D proposals from raw
point cloud in a bottom-up manner.
C: PointRCNN
PointRCNN: 3D Object Proposal Generation and
Detection from Point Cloud
The PointRCNN architecture. The whole network consists of two parts: (a) for generating 3D proposals
from raw point cloud in a bottom-up manner. (b) for refining the 3D proposals in canonical coordinate.
PointRCNN: 3D Object Proposal Generation and
Detection from Point Cloud
Bin-based localization. The surrounding area along X
and Z axes of each foreground point is split into a series
of bins to locate the object center.
Canonical transformation. The pooled points belonging to
each proposal are transformed to the corresponding canonical
coordinate system for better local spatial feature learning,
where CCS denotes Canonical Coordinate System.
PointRCNN: 3D Object Proposal Generation and
Detection from Point Cloud
The upper is the image and the lower is a representative view of the corresponding point cloud.
Deep Continuous Fusion for Multi-Sensor 3D Object
Detection
 A 3D object detector exploits both LIDAR and cameras to perform very accurate localization.
 Design an E2E learnable architecture that exploits continuous convolutions to fuse image and
LIDAR feature maps at different levels of resolution.
 The continuous fusion layer encodes both discrete-state image features and continuous
geometric info.
 Deep parametric continuous convolution is a learnable operator that operates over non-grid-
structured data.
 The motivation is to extend the standard grid-structured convolution to non-grid-structured
data, while retaining high capacity and low complexity.
data, while retaining high capacity and low complexity.
 The key idea is to exploit multi-layer perceptrons as parameterized kernel functions for continuous
convolution.
 This parametric kernel function spans the full continuous domain.
 The weighted summation over a finite number of neighboring points is used to approximate the
otherwise computationally prohibitive continuous convolution.
 Each neighbor is weighted differently according to its relative geometric offset wrt the target point.
 This enables a reliable and efficient E2E learnable 3D object detector based on multiple
sensors.
Deep Continuous Fusion for Multi-Sensor 3D Object
Detection
Continuous fusion layer: given a target pixel on BEV image, extract K nearest LIDAR points (S1); project the 3D
points onto the camera image plane (S2-3); this helps retrieve corresponding image features (S4); feed the image
feature + continuous geometry offset into a MLP to generate feature for the target pixel (S5).
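A much-simplified sketch of one continuous-fusion step, assuming a KD-tree K-nearest-neighbor query and an MLP over (retrieved image feature, 3D offset); the actual layer runs densely over BEV feature maps inside the detection network.

```python
import numpy as np
import torch
import torch.nn as nn
from scipy.spatial import cKDTree

class ContinuousFusion(nn.Module):
    """MLP over (image feature, geometric offset) of the K nearest LiDAR points,
    summed to produce a fused feature per target BEV location."""
    def __init__(self, img_ch=64, out_ch=64, k=3):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(nn.Linear(img_ch + 3, out_ch), nn.ReLU(),
                                 nn.Linear(out_ch, out_ch))

    def forward(self, target_xyz, lidar_xyz, lidar_img_feats):
        # target_xyz: (M,3) BEV pixel centers; lidar_xyz: (N,3);
        # lidar_img_feats: (N,img_ch) image features retrieved for each LiDAR point (S1-4).
        _, idx = cKDTree(lidar_xyz).query(target_xyz, k=self.k)     # (M,k) neighbors
        offsets = torch.as_tensor(lidar_xyz[idx] - target_xyz[:, None, :],
                                  dtype=torch.float32)              # (M,k,3) geometry
        feats = torch.as_tensor(lidar_img_feats[idx], dtype=torch.float32)
        fused = self.mlp(torch.cat([feats, offsets], dim=-1))       # (M,k,out_ch) (S5)
        return fused.sum(dim=1)                                     # (M,out_ch)
```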
Deep Continuous Fusion for Multi-Sensor 3D Object
Detection
Qualitative
results on KITTI
Dataset.
End-to-end Learning of Multi-sensor 3D Tracking by
Detection
 An approach of tracking by detection that can exploit both cameras as well as LIDAR data to
produce very accurate 3D trajectories.
 Towards this goal, formulate it as a linear program that can be solved exactly, and learn
convolutional networks for detection as well as matching in an end-to-end manner.
The system takes as external input a time series of RGB frames and LIDAR point clouds. From these
inputs, the system produces discrete trajectories of the targets. In particular, an architecture that is e2e
trainable while still maintaining explainability is achieved by formulating the system in a structured manner.
End-to-end Learning of Multi-sensor 3D Tracking by
Detection
Forward passes over a set of detections from
two frames for both scoring and matching.
For each detection x_j, a forward pass of a Detection
Network is computed to produce θ^det_W(x_j), the cost of
using or discarding x_j according to the assignment to y^det_j.
For each pair of detections x_j and x_i from subsequent
frames, a forward pass of the Match Network is computed
to produce θ^link_W(x_i, x_j), the cost of linking or not these two
detections according to the assignment to y^link_{i,j}. Finally,
each detection might start a new trajectory or end an
existing one; the costs for this are computed via θ^new_W(x)
and θ^end_W(x), respectively, and are associated with the
assignments to y^new and y^end.
Formulate the problem as inference in a deep structured model (DSM), where the factors are computed
using a set of feed forward neural nets that exploit both camera and LIDAR data to compute both detection
and matching scores. Inference in the model can be done exactly by a set of feed forward processes
followed by solving a linear program. Learning is done e2e via minimization of a structured hinge loss,
optimizing simultaneously the detector and tracker.
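The paper solves the joint problem exactly as a linear program; as a much-simplified illustration (not the paper's solver), frame-to-frame matching with learned link costs can be cast as linear assignment, with the new/end costs reduced to a single no-match threshold.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_detections(link_costs, no_match_cost=1.0):
    """link_costs: (N_prev, N_cur) costs from the match network (lower = better link).
    Detections whose best assignment is worse than no_match_cost end or start tracks."""
    rows, cols = linear_sum_assignment(link_costs)
    matches = []
    unmatched_prev = set(range(link_costs.shape[0]))
    unmatched_cur = set(range(link_costs.shape[1]))
    for r, c in zip(rows, cols):
        if link_costs[r, c] < no_match_cost:
            matches.append((r, c))
            unmatched_prev.discard(r)      # previous-frame detection continues its track
            unmatched_cur.discard(c)       # current-frame detection joins that track
    return matches, sorted(unmatched_prev), sorted(unmatched_cur)
```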
End-to-end Learning of Multi-sensor 3D Tracking by
Detection
End-to-end Learning of Multi-sensor 3D Tracking by
Detection
Goal location prediction based on deep learning using RGB-D camerajournalBEEI
 
BEV Semantic Segmentation
BEV Semantic SegmentationBEV Semantic Segmentation
BEV Semantic SegmentationYu Huang
 

Ähnlich wie LiDAR Object Detection Methods for Autonomous Vehicles (20)

Mmpaper draft10
Mmpaper draft10Mmpaper draft10
Mmpaper draft10
 
Mmpaper draft10
Mmpaper draft10Mmpaper draft10
Mmpaper draft10
 
3-d interpretation from single 2-d image for autonomous driving II
3-d interpretation from single 2-d image for autonomous driving II3-d interpretation from single 2-d image for autonomous driving II
3-d interpretation from single 2-d image for autonomous driving II
 
3-d interpretation from single 2-d image IV
3-d interpretation from single 2-d image IV3-d interpretation from single 2-d image IV
3-d interpretation from single 2-d image IV
 
3-d interpretation from stereo images for autonomous driving
3-d interpretation from stereo images for autonomous driving3-d interpretation from stereo images for autonomous driving
3-d interpretation from stereo images for autonomous driving
 
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
 
Udacity-Didi Challenge Finalists
Udacity-Didi Challenge FinalistsUdacity-Didi Challenge Finalists
Udacity-Didi Challenge Finalists
 
Arindam batabyal literature reviewpresentation
Arindam batabyal literature reviewpresentationArindam batabyal literature reviewpresentation
Arindam batabyal literature reviewpresentation
 
Real-time 3D Object Detection on LIDAR Point Cloud using Complex- YOLO V4
Real-time 3D Object Detection on LIDAR Point Cloud using Complex- YOLO V4Real-time 3D Object Detection on LIDAR Point Cloud using Complex- YOLO V4
Real-time 3D Object Detection on LIDAR Point Cloud using Complex- YOLO V4
 
IRJET- Automatic Traffic Sign Detection and Recognition using CNN
IRJET- Automatic Traffic Sign Detection and Recognition using CNNIRJET- Automatic Traffic Sign Detection and Recognition using CNN
IRJET- Automatic Traffic Sign Detection and Recognition using CNN
 
Major PRC-1 ppt.pptx
Major PRC-1 ppt.pptxMajor PRC-1 ppt.pptx
Major PRC-1 ppt.pptx
 
kanimozhi2019.pdf
kanimozhi2019.pdfkanimozhi2019.pdf
kanimozhi2019.pdf
 
Understanding the world in 3D with AI.pdf
Understanding the world in 3D with AI.pdfUnderstanding the world in 3D with AI.pdf
Understanding the world in 3D with AI.pdf
 
Remote Sensing Field Camp 2016
Remote Sensing Field Camp 2016 Remote Sensing Field Camp 2016
Remote Sensing Field Camp 2016
 
Traffic Light Detection and Recognition for Self Driving Cars using Deep Lear...
Traffic Light Detection and Recognition for Self Driving Cars using Deep Lear...Traffic Light Detection and Recognition for Self Driving Cars using Deep Lear...
Traffic Light Detection and Recognition for Self Driving Cars using Deep Lear...
 
final_presentation
final_presentationfinal_presentation
final_presentation
 
Lane and Object Detection for Autonomous Vehicle using Advanced Computer Vision
Lane and Object Detection for Autonomous Vehicle using Advanced Computer VisionLane and Object Detection for Autonomous Vehicle using Advanced Computer Vision
Lane and Object Detection for Autonomous Vehicle using Advanced Computer Vision
 
Object gripping algorithm for robotic assistance by means of deep leaning
Object gripping algorithm for robotic assistance by means of deep leaning Object gripping algorithm for robotic assistance by means of deep leaning
Object gripping algorithm for robotic assistance by means of deep leaning
 
Goal location prediction based on deep learning using RGB-D camera
Goal location prediction based on deep learning using RGB-D cameraGoal location prediction based on deep learning using RGB-D camera
Goal location prediction based on deep learning using RGB-D camera
 
BEV Semantic Segmentation
BEV Semantic SegmentationBEV Semantic Segmentation
BEV Semantic Segmentation
 

Mehr von Yu Huang

Application of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous DrivingApplication of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous DrivingYu Huang
 
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...Yu Huang
 
Data Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous DrivingData Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous DrivingYu Huang
 
Techniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingTechniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingYu Huang
 
BEV Object Detection and Prediction
BEV Object Detection and PredictionBEV Object Detection and Prediction
BEV Object Detection and PredictionYu Huang
 
Fisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VIFisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VIYu Huang
 
Fisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving VFisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving VYu Huang
 
Fisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IVFisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IVYu Huang
 
Prediction,Planninng & Control at Baidu
Prediction,Planninng & Control at BaiduPrediction,Planninng & Control at Baidu
Prediction,Planninng & Control at BaiduYu Huang
 
Cruise AI under the Hood
Cruise AI under the HoodCruise AI under the Hood
Cruise AI under the HoodYu Huang
 
Scenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous DrivingScenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous DrivingYu Huang
 
How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?Yu Huang
 
Annotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingAnnotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingYu Huang
 
Simulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atgSimulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atgYu Huang
 
Prediction and planning for self driving at waymo
Prediction and planning for self driving at waymoPrediction and planning for self driving at waymo
Prediction and planning for self driving at waymoYu Huang
 
Jointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planningJointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planningYu Huang
 
Data pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingData pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingYu Huang
 
Open Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planningOpen Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planningYu Huang
 
Lidar in the adverse weather: dust, fog, snow and rain
Lidar in the adverse weather: dust, fog, snow and rainLidar in the adverse weather: dust, fog, snow and rain
Lidar in the adverse weather: dust, fog, snow and rainYu Huang
 
Autonomous Driving of L3/L4 Commercial trucks
Autonomous Driving of L3/L4 Commercial trucksAutonomous Driving of L3/L4 Commercial trucks
Autonomous Driving of L3/L4 Commercial trucksYu Huang
 

Mehr von Yu Huang (20)

Application of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous DrivingApplication of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous Driving
 
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...
 
Data Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous DrivingData Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous Driving
 
Techniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingTechniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous Driving
 
BEV Object Detection and Prediction
BEV Object Detection and PredictionBEV Object Detection and Prediction
BEV Object Detection and Prediction
 
Fisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VIFisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VI
 
Fisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving VFisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving V
 
Fisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IVFisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IV
 
Prediction,Planninng & Control at Baidu
Prediction,Planninng & Control at BaiduPrediction,Planninng & Control at Baidu
Prediction,Planninng & Control at Baidu
 
Cruise AI under the Hood
Cruise AI under the HoodCruise AI under the Hood
Cruise AI under the Hood
 
Scenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous DrivingScenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous Driving
 
How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?
 
Annotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingAnnotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous Driving
 
Simulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atgSimulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atg
 
Prediction and planning for self driving at waymo
Prediction and planning for self driving at waymoPrediction and planning for self driving at waymo
Prediction and planning for self driving at waymo
 
Jointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planningJointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planning
 
Data pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingData pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous driving
 
Open Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planningOpen Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planning
 
Lidar in the adverse weather: dust, fog, snow and rain
Lidar in the adverse weather: dust, fog, snow and rainLidar in the adverse weather: dust, fog, snow and rain
Lidar in the adverse weather: dust, fog, snow and rain
 
Autonomous Driving of L3/L4 Commercial trucks
Autonomous Driving of L3/L4 Commercial trucksAutonomous Driving of L3/L4 Commercial trucks
Autonomous Driving of L3/L4 Commercial trucks
 

Kürzlich hochgeladen

TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catcherssdickerson1
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the weldingMuhammadUzairLiaqat
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptNarmatha D
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substationstephanwindworld
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
 
Vishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsVishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsSachinPawar510423
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
 
Industrial Safety Unit-I SAFETY TERMINOLOGIES
Industrial Safety Unit-I SAFETY TERMINOLOGIESIndustrial Safety Unit-I SAFETY TERMINOLOGIES
Industrial Safety Unit-I SAFETY TERMINOLOGIESNarmatha D
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncssuser2ae721
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingBootNeck1
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONjhunlian
 
Solving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptSolving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptJasonTagapanGulla
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...Amil Baba Dawood bangali
 

Kürzlich hochgeladen (20)

TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the welding
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.ppt
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substation
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 
Vishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsVishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documents
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
Industrial Safety Unit-I SAFETY TERMINOLOGIES
Industrial Safety Unit-I SAFETY TERMINOLOGIESIndustrial Safety Unit-I SAFETY TERMINOLOGIES
Industrial Safety Unit-I SAFETY TERMINOLOGIES
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event Scheduling
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
 
Solving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptSolving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.ppt
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
 

LiDAR Object Detection Methods for Autonomous Vehicles

  • 5. Online Camera LiDAR Fusion and Object Detection on Hybrid Data for Autonomous Driving (figures: the sensor fusion and object detection pipeline; estimating the rotation and translation between the two sensor coordinate systems; an example of non-optimal calibration)
  • 6. RegNet: Multimodal Sensor Registration Using Deep Neural Networks  RegNet, the deep CNN to infer a 6 DOF extrinsic calibration between multimodal sensors, exemplified using a scanning LiDAR and a monocular camera.  Compared to existing approaches, RegNet casts all 3 conventional calibration steps (feature extraction, feature matching and global regression) into a single real-time capable CNN.  It does not require any human interaction and bridges the gap between classical offline and target-less online calibration approaches as it provides both a stable initial estimation as well as a continuous online correction of the extrinsic parameters.  During training, randomly decalibrate our system in order to train RegNet to infer the correspondence between projected depth measurements and RGB image and finally regress the extrinsic calibration.  Additionally, with an iterative execution of multiple CNNs, that are trained on different magnitudes of decalibration, it compares favorably to state-of-the-art methods in terms of a mean calibration error of 0.28◦ for the rotational and 6 cm for the translation components even for large decalibrations up to 1.5 m and 20◦ .
  • 7. RegNet: Multimodal Sensor Registration Using Deep Neural Networks It estimates the calibration btw a depth and an RGB sensor. The depth points are projected on the RGB image using an initial calibration Hinit. In the 1st and 2nd part of the network, NiN blocks are used to extract rich features for matching. The final part regresses the decalibration by gathering global info. using two FCLs. During training, φdecalib is randomly perturbed, resulting in different projections of the depth points.
  • 8. RegNet: Multimodal Sensor Registration Using Deep Neural Networks
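As a minimal illustration of the decalibration idea above, the following Python sketch draws a random SE(3) perturbation (within the 1.5 m / 20° ranges quoted on slide 6) and projects LiDAR points into a sparse depth image; the pinhole intrinsics K, the frame conventions and all names are illustrative assumptions, not the paper's code.

import numpy as np

def random_decalibration(max_trans=1.5, max_rot_deg=20.0):
    """Sample a random SE(3) perturbation (the 'phi_decalib') as a 4x4 matrix."""
    angles = np.deg2rad(np.random.uniform(-max_rot_deg, max_rot_deg, 3))
    cx, cy, cz = np.cos(angles)
    sx, sy, sz = np.sin(angles)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    H = np.eye(4)
    H[:3, :3] = Rz @ Ry @ Rx
    H[:3, 3] = np.random.uniform(-max_trans, max_trans, 3)
    return H

def project_lidar_to_image(points_xyz, H, K, image_shape):
    """Project Nx3 LiDAR points into a sparse depth image with extrinsic H and intrinsics K."""
    h, w = image_shape
    pts = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])   # Nx4 homogeneous
    cam = (H @ pts.T).T[:, :3]                                     # LiDAR -> camera frame
    cam = cam[cam[:, 2] > 0.1]                                     # keep points in front of the camera
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    depth_map = np.zeros((h, w), dtype=np.float32)
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    depth_map[v[valid], u[valid]] = cam[valid, 2]                  # occlusion handling omitted
    return depth_map

# During training, the depth input is rendered with (random decalibration) @ H_init,
# and the network regresses the inverse perturbation.
K = np.array([[721.5, 0, 609.6], [0, 721.5, 172.9], [0, 0, 1.0]])  # KITTI-like example values
H_init = np.eye(4)
points = np.random.uniform(-20, 20, (5000, 3)) + np.array([0.0, 0.0, 15.0])
depth_input = project_lidar_to_image(points, random_decalibration() @ H_init, K, (375, 1242))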
  • 9. Vehicle Detection from 3D Lidar Using FCN  Point clouds from a Velodyne scan can be roughly projected and discretized into a 2D point map; the projected point map is analogous to a cylindrical image; the bounding box of a vehicle is encoded by its 8 corners (a 24-d vector); the network consists of one objectness classification branch and one bounding box regression branch.
  • 10. (a) The input point map, with the d channel visualized. (b) The output confidence map of the objectness branch. (c) Bounding box candidates corresponding to all points predicted as positive, i.e. high confidence points in (b). (d) Remaining bounding boxes after non-max suppression. Vehicle Detection from 3D Lidar Using FCN
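A small sketch of the 24-d corner encoding mentioned on slide 9: the 8 corners of a yawed 3D box are expressed as offsets from an observing LiDAR point and flattened into the regression target. All function and argument names are illustrative, not taken from the paper.

import numpy as np

def box_corners_3d(center, size, yaw):
    """Return the 8 corners (8x3) of a 3D box with the given center, size (l, w, h) and yaw."""
    l, w, h = size
    # corner template in the box frame (x forward, y left, z up)
    x = np.array([ l,  l,  l,  l, -l, -l, -l, -l]) / 2.0
    y = np.array([ w, -w,  w, -w,  w, -w,  w, -w]) / 2.0
    z = np.array([ h,  h, -h, -h,  h,  h, -h, -h]) / 2.0
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])       # rotation about the up axis
    return (R @ np.vstack([x, y, z])).T + np.asarray(center)

def corner_offsets_24d(point, center, size, yaw):
    """24-d regression target: offsets from a LiDAR point to the 8 box corners."""
    return (box_corners_3d(center, size, yaw) - np.asarray(point)).reshape(-1)

target = corner_offsets_24d(point=[10.0, 2.0, -1.0],
                            center=[12.0, 2.5, -0.8],
                            size=[4.5, 1.8, 1.6],
                            yaw=0.3)
print(target.shape)   # (24,)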
  • 11. VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection  VoxelNet removes the need for manual feature engineering on 3D point clouds: a generic 3D detection network that unifies feature extraction and bounding box prediction into a single-stage, end-to-end trainable deep network.  Specifically, VoxelNet divides a point cloud into equally spaced 3D voxels and transforms the group of points within each voxel into a unified feature representation through the voxel feature encoding (VFE) layer.  In this way, the point cloud is encoded as a descriptive volumetric representation, which is then connected to an RPN to generate detections.
  • 12. VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection
  • 13. VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection Voxel feature encoding layer.
  • 14. VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection Region proposal network architecture
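A minimal PyTorch sketch of a voxel feature encoding (VFE) layer in the spirit of slides 11-13: a point-wise fully connected layer with BN and ReLU, max-pooling over the points of each voxel, and concatenation of the aggregated feature back to every point. The layer sizes, the 7-d point features and the padding-mask handling are simplified assumptions.

import torch
import torch.nn as nn

class VFELayer(nn.Module):
    """Voxel Feature Encoding: point-wise features + locally aggregated (max-pooled) features."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.units = out_channels // 2
        self.linear = nn.Linear(in_channels, self.units)
        self.bn = nn.BatchNorm1d(self.units)

    def forward(self, x, mask):
        # x: (K voxels, T points per voxel, C features), mask: (K, T) with 1 for real points
        K, T, _ = x.shape
        pw = torch.relu(self.bn(self.linear(x).view(K * T, self.units))).view(K, T, self.units)
        pw = pw * mask.unsqueeze(-1)                            # zero out padded points
        agg = pw.max(dim=1, keepdim=True).values                # (K, 1, units) per-voxel feature
        return torch.cat([pw, agg.expand(-1, T, -1)], dim=-1)   # (K, T, out_channels)

# Example: 100 non-empty voxels, up to 35 points per voxel, 7 input features
# (x, y, z, reflectance, plus offsets to the voxel centroid).
x = torch.randn(100, 35, 7)
mask = (torch.rand(100, 35) > 0.3).float()
vfe = VFELayer(7, 32)
out = vfe(x, mask)                                              # (100, 35, 32)
voxel_feature = (out * mask.unsqueeze(-1)).max(dim=1).values    # (100, 32) after the final VFE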
  • 15. Object Detection and Classification in Occupancy Grid Maps using Deep Convolutional Networks  Based on a grid map environment representation, well-suited for sensor fusion, free-space estimation and machine learning, detect and classify objects using deep CNNs.  As input, use a multi-layer grid map efficiently encoding 3D range sensor info.  The inference output consists of a list of rotated Bboxes with associated semantic classes. Range sensor measurements are transformed to a multi-layer grid map which serves as input for the object detection and classification network. From these top-view grid maps the network infers rotated 3D bounding boxes together with semantic classes. These boxes can be projected into the camera image for visual validation. Cars are depicted green, cyclists aquamarine and pedestrians cyan.
  • 16. Object Detection and Classification in Occupancy Grid Maps using Deep Convolutional Networks  Minimal preprocessing is needed to obtain the occupancy grid maps.  As objects are labeled only in the camera image, all points outside the camera's field of view are removed.  Ground surface segmentation is applied and different grid cell features are estimated; the resulting multi-layer grid maps cover 60m×60m with a cell size of either 10cm or 15cm.  Since the ground is flat in most scenarios, a ground plane is fitted to the corresponding point set.  Then, either the full point set or the non-ground subset is used to construct a multi-layer grid map containing different features.
  • 17. Object Detection and Classification in Occupancy Grid Maps using Deep Convolutional Networks  KITTI Bird’s Eye View Evaluation 2017 consists of 7481 images for training and 7518 images for testing, as well as corresponding range sensor data represented as point sets.  Training and test data contain 80,256 labeled objects in total, which are represented as oriented 3D Bboxes (7 parameters).  As summarized in the table, there are 8 semantic classes labeled in the training set, although not all classes are used to determine the benchmark result.
  • 18. RT3D: Real-Time 3-D Vehicle Detection in LiDAR Point Cloud for Autonomous Driving  Real-time 3-dimensional (RT3D) vehicle detection method that utilizes pure LiDAR point cloud to predict the location, orientation, and size of vehicles.  Apply a pre-RoI-pooling convolution that moves a majority of the convolution operations to ahead of the RoI pooling, leaving just a small part behind, which significantly boosts the computation efficiency.  A pose-sensitive feature map design is strongly activated by the relative poses of vehicles, leading to a high regression accuracy on the location, orientation, and size of vehicles.  RT3D is the 1st LiDAR 3-D vehicle detection work that completes detection within 0.09s.
  • 19. RT3D: Real-Time 3-D Vehicle Detection in LiDAR Point Cloud for Autonomous Driving The network architecture of RT3D
  • 20. BirdNet: a 3D Object Detection Framework from LiDAR information  LiDAR-based 3D object detection pipeline entailing three stages:  First, laser info. is projected into a novel cell encoding for bird’s eye view projection.  Later, both the object location on the plane and its heading are estimated through a convolutional neural network originally designed for image processing.  Finally, 3D oriented detections are computed in a post-processing phase.
  • 21. BirdNet: a 3D Object Detection Framework from LiDAR information Results on KITTI Benchmark test set: detections in image, BEV projection, and 3D point cloud.
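As an illustration of a BEV cell encoding in the spirit of slide 20, the following numpy sketch rasterizes a point cloud into maximum-height, mean-intensity and normalized-density channels; the cell size, ranges and the log-based density normalization are assumptions, not BirdNet's exact encoding.

import numpy as np

def bev_cell_encoding(points, x_range=(0, 60), y_range=(-30, 30), cell=0.1):
    """Encode a point cloud (N x 4: x, y, z, intensity) as a 3-channel bird's eye view map:
    max height, mean intensity and normalized point density per cell."""
    W = int((x_range[1] - x_range[0]) / cell)
    H = int((y_range[1] - y_range[0]) / cell)
    keep = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[keep]
    xi = ((pts[:, 0] - x_range[0]) / cell).astype(int)
    yi = ((pts[:, 1] - y_range[0]) / cell).astype(int)
    height = np.full((H, W), -np.inf, dtype=np.float32)
    inten_sum = np.zeros((H, W), dtype=np.float32)
    count = np.zeros((H, W), dtype=np.float32)
    np.maximum.at(height, (yi, xi), pts[:, 2])
    np.add.at(inten_sum, (yi, xi), pts[:, 3])
    np.add.at(count, (yi, xi), 1.0)
    height[np.isinf(height)] = 0.0
    intensity = np.divide(inten_sum, count, out=np.zeros_like(inten_sum), where=count > 0)
    density = np.minimum(1.0, np.log1p(count) / np.log(64.0))    # normalized point density
    return np.stack([height, intensity, density], axis=0)        # (3, H, W)

cloud = np.random.rand(10000, 4) * np.array([60, 60, 3, 1]) - np.array([0, 30, 1.5, 0])
bev = bev_cell_encoding(cloud)
print(bev.shape)   # (3, 600, 600)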
  • 22. LMNet: Real-time Multiclass Object Detection on CPU using 3D LiDAR  An optimized single-stage deep CNN to detect objects in urban environments, using nothing more than point cloud data.  The network structure employs dilated convolutions to gradually increase the receptive field as depth increases; this helps to reduce the computation time by about 30%.  The input consists of 5 perspective representations of the unorganized point cloud data.  The network outputs an objectness map and the bounding box offset values for each point.  Using reflection, range, and the position on each of the 3 axes helped to improve the location and orientation of the output bounding box.  Execution speed reaches 50 FPS using desktop GPUs, and up to 10 FPS on an Intel Core i5 CPU.
  • 23. LMNet: Real-time Multiclass Object Detection on CPU using 3D LiDAR Used dilated layers The LMNet architecture Encoded input point cloud
  • 24. HDNET: Exploit HD Maps for 3D Object Detection  High-Definition (HD) maps provide strong priors that can boost the performance and robustness of modern 3D object detectors.  Here, a one-stage detector extracts geometric and semantic features from the HD maps.  As maps might not be available everywhere, a map prediction module estimates the map on the fly from raw LiDAR data.  The whole framework runs at 20 frames per second.
  • 25. HDNET: Exploit HD Maps for 3D Object Detection BEV LiDAR representation that exploits geometric and semantic HD map information. (a) The raw LiDAR point cloud. (b) Incorporating geometric ground prior. (c) Discretization of the LiDAR point cloud. (d) Incorporating semantic road prior.
  • 26. HDNET: Exploit HD Maps for 3D Object Detection Network structures for object detection (left) and online map estimation (right).
  • 27. IPOD: Intensive Point-based Object Detector for Point Cloud  A 3D object detection framework, IPOD, based on raw point cloud.  It seeds an object proposal for each point, which is the basic element.  An E2E trainable architecture, where features of all points within a proposal are extracted from the backbone network and aggregated into a proposal feature for final bounding box inference.  These features, with both context info. and precise point cloud coordinates, improve the performance.
  • 28. IPOD: Intensive Point-based Object Detector for Point Cloud Illustration of point-based proposal generation. (a) Semantic segmentation result on the image. (b) Projected segmentation result on point cloud. (c) Point-based proposals on positive points after NMS.
  • 29. IPOD: Intensive Point-based Object Detector for Point Cloud Illustration of the proposal feature generation module. It combines location info. and context features to generate offsets from the centroid of interior points to the center of the target instance object. The predicted residuals are added back to the location info. in order to make the feature more robust to geometric transformation.
  • 30. IPOD: Intensive Point-based Object Detector for Point Cloud Backbone architecture. Bounding-box prediction network.
  • 31. PIXOR: Real-time 3D Object Detection from Point Clouds  This method utilizes the 3D data more efficiently by representing the scene from the Bird’s Eye View (BEV), and propose PIXOR (ORiented 3D object detection from PIXel-wise NN predictions), a proposal-free, single-stage detector that outputs oriented 3D object estimates decoded from pixel-wise neural network predictions.  The input representation, network architecture, and model optimization are specially designed to balance high accuracy and real-time efficiency. 3D object detector from Bird’s Eye View (BEV) of LIDAR point cloud.
  • 32. PIXOR: Real-time 3D Object Detection from Point Clouds The network architecture of PIXOR. A cross-entropy loss is used on the classification output and a smooth L1 loss on the regression output; the classification loss is summed over all locations on the output map, while the regression loss is computed over positive locations only.
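A hedged PyTorch sketch of the loss composition just described: a classification term over every BEV location plus a smooth L1 regression term restricted to positive locations. The 6-d box parameterization, the averaging instead of summing, and all names are simplified assumptions.

import torch
import torch.nn.functional as F

def pixor_style_loss(cls_logits, reg_pred, cls_target, reg_target):
    """cls_logits: (B, 1, H, W); reg_pred/reg_target: (B, 6, H, W), e.g. (cos, sin, dx, dy, log w, log l);
    cls_target: (B, 1, H, W) binary map of positive locations."""
    # classification: computed over all output locations
    cls_loss = F.binary_cross_entropy_with_logits(cls_logits, cls_target)
    # regression: only where the location belongs to an object
    pos = cls_target.expand_as(reg_pred) > 0.5
    if pos.any():
        reg_loss = F.smooth_l1_loss(reg_pred[pos], reg_target[pos])
    else:
        reg_loss = reg_pred.sum() * 0.0          # keep the graph alive when no positives
    return cls_loss + reg_loss

cls_logits = torch.randn(2, 1, 200, 175, requires_grad=True)
reg_pred = torch.randn(2, 6, 200, 175, requires_grad=True)
cls_target = (torch.rand(2, 1, 200, 175) > 0.98).float()
reg_target = torch.randn(2, 6, 200, 175)
loss = pixor_style_loss(cls_logits, reg_pred, cls_target, reg_target)
loss.backward()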
  • 33. DepthCN: Vehicle Detection Using 3D-LIDAR and ConvNet  Vehicle detection based on the Hypothesis Generation (HG) and Verification (HV) paradigms.  The input to the system is a point cloud obtained from a 3D-LIDAR mounted on board an instrumented vehicle, which is transformed into a Dense-depth Map (DM).  The solution starts by removing ground points followed by point cloud segmentation.  Then, segmented obstacles (object hypotheses) are projected onto the DM.  Bboxes are fitted to the segmented objects as vehicle hypotheses (HG step).  Bboxes are used as inputs to a ConvNet to classify/verify the hypotheses of belonging to the category ‘vehicle’ (HV step).
  • 34. DepthCN: Vehicle Detection Using 3D-LIDAR and ConvNet 3D-LIDAR-based vehicle detection algorithm (DepthCN).
  • 35. DepthCN: Vehicle Detection Using 3D-LIDAR and ConvNet Top: the point cloud where the detected ground points are denoted with green and LIDAR points that are out of the field of view of the camera are shown in red. Bottom: the projected clusters and HG results in the form of 2D BB. Right: the zoomed view, and the vertical orange arrows indicate corresponding obstacles.
  • 36. DepthCN: Vehicle Detection Using 3D-LIDAR and ConvNet The generated Dense-depth Map (DM) with the projected hypotheses (red). The ConvNet architecture The generated hypotheses and the detection results are shown as red and dashed-green BBs, respectively, in both DM and images. The bottom figures show the result in PCD.
  • 37. SECOND: Sparsely Embedded Convolutional Detection  An improved sparse convolution method for voxel-based LiDAR detection networks, which significantly increases the speed of both training and inference.  Introduce a new form of angle loss regression to improve the orientation estimation performance and a new data augmentation approach that can enhance the convergence speed and performance.  The proposed network produces SoA results on the KITTI 3D object detection benchmarks while maintaining a fast inference speed. The detector takes a raw point cloud as input, converts it to voxel features and coordinates, and applies two VFE (voxel feature encoding) layers and a linear layer. A sparse CNN is applied and an RPN generates the detection.
  • 38. SECOND: Sparsely Embedded Convolutional Detection The sparse convolution algorithm is shown above, and the GPU rule generation algorithm is shown below. Nin denotes the number of input features, and Nout denotes the number of output features. N is the number of gathered features. Rule is the rule matrix, where Rule[i, :, :] is the ith rule corresponding to the ith kernel matrix in the convolution kernel. The boxes with colors except white indicate points with sparse data and the white boxes indicate empty points.
  • 39. SECOND: Sparsely Embedded Convolutional Detection A GPU-based rule generation algorithm (Algorithm 1) that runs faster on a GPU. First, collect the input indexes and associated spatial indexes instead of the output indexes (1st loop). Duplicate output locations are obtained in this stage. Then execute a unique parallel algorithm on the spatial index data to obtain the output indexes and their associated spatial indexes. A buffer with the same spatial dimensions as those of the sparse data is generated from the previous results for table lookup in the next step (2nd loop). Finally, we iterate on the rules and use the stored spatial indexes to obtain the output index for each input index (3rd loop).
  • 40. SECOND: Sparsely Embedded Convolutional Detection The structure of the sparse middle feature extractor: the yellow boxes represent sparse convolution, the white boxes submanifold convolution, and the red box the sparse-to-dense layer; the upper part of the figure shows the spatial dimensions of the sparse data. A new angle loss regression is introduced, Lθ = SmoothL1(sin(θp − θt)). This approach to angle loss has two advantages: (1) it solves the adversarial problem btw the orientations 0 and π; (2) it naturally models IoU as a function of the angle offset. The RPN structure consists of downsampling convolutional layers, transpose convolutional layers, and concatenation.
  • 41. SECOND: Sparsely Embedded Convolutional Detection Results of 3D detection on the KITTI test set. For better visualization, the 3D boxes detected using LiDAR are projected onto images from the left camera.
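A short PyTorch sketch of the sine-error angle loss quoted above, Lθ = SmoothL1(sin(θp − θt)); the extra direction classifier that SECOND uses to resolve the remaining 180° flip is only hinted at in a comment. Tensor values are illustrative.

import math
import torch
import torch.nn.functional as F

def angle_loss(theta_pred, theta_target):
    """Sine-error angle regression: a box and its pi-rotated twin give (near) zero loss,
    removing the adversarial 0-vs-pi cases of direct angle regression."""
    return F.smooth_l1_loss(torch.sin(theta_pred - theta_target),
                            torch.zeros_like(theta_pred))

theta_p = torch.tensor([0.10, 3.00, -1.50], requires_grad=True)
theta_t = torch.tensor([0.00, 3.00 - math.pi, 1.60])
loss = angle_loss(theta_p, theta_t)   # small for the pi-flipped second box
loss.backward()
# SECOND additionally trains a softmax direction classifier to recover the true heading,
# since sin() alone cannot distinguish theta from theta + pi.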
  • 42. YOLO3D: E2E RT 3D Oriented Object Bounding Box Detection from LiDAR Point Cloud  Based on the success of the one-shot regression meta-architecture in the 2D perspective image space, extend it to generate oriented 3D object Bboxes from LiDAR point cloud.  The idea is extending the loss function of YOLO v2 to include the yaw angle, the 3D box center in Cartesian coordinates and the height of the box as a direct regression problem.  This formulation enables real-time performance, which is essential for automated driving.  In KITTI, it achieves real-time performance (40 fps) on Titan X GPU.
  • 43. YOLO3D: E2E RT 3D Oriented Object Bounding Box Detection from LiDAR Point Cloud (figure: the total loss) The point cloud is projected to obtain bird’s eye view grid maps; two grid maps are created from the projection. The first contains the maximum height, where each grid cell (pixel) value represents the height of the highest point associated with that cell; the second represents the density of points. In YOLO-v2, anchors are calculated using k-means clustering over the width and length of ground-truth boxes. The point of using anchors is to provide priors for the boxes, onto which the model predicts modifications; the anchors must cover the whole range of boxes that can appear in the data. Here, clustering is not used to calculate the anchors; instead, the mean 3D box dimensions are computed for each object class and used as anchors.
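A tiny sketch of the class-mean anchor computation described above (hypothetical data layout; not the authors' code):

import numpy as np

def mean_dimension_anchors(boxes, labels):
    """boxes: (N, 3) array of (length, width, height); labels: (N,) class ids.
    Returns one anchor (mean l, w, h) per class, used instead of k-means priors."""
    anchors = {}
    for cls in np.unique(labels):
        anchors[int(cls)] = boxes[labels == cls].mean(axis=0)
    return anchors

# hypothetical ground-truth dimensions for cars (0), pedestrians (1), cyclists (2)
boxes = np.array([[3.9, 1.6, 1.5], [4.2, 1.7, 1.6], [0.8, 0.6, 1.7], [1.8, 0.6, 1.7]])
labels = np.array([0, 0, 1, 2])
print(mean_dimension_anchors(boxes, labels))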
  • 44. YOLO4D: A ST Approach for RT Multi-object Detection and Classification from LiDAR Point Clouds  YOLO4D: the 3D LiDAR point clouds are aggregated over time as a 4D tensor (the 3 spatial dimensions plus the time dimension), which is fed to a one-shot fully convolutional detector based on the YOLO v2 architecture.  YOLO3D is extended with a convolutional LSTM for temporal feature aggregation.  The outputs are the oriented 3D object BBox info., in addition to its length (L), width (W), height (H) and orientation (yaw), together with the object classes and confidence scores.  Two different techniques are evaluated to incorporate the temporal dimension: recurrence and frame stacking.
  • 45. YOLO4D: A ST Approach for RT Multi-object Detection and Classification from LiDAR Point Clouds Left: Frame stacking architecture; Right: Convolutional LSTM architecture. The prediction model The total loss
  • 46. Deconvolutional Networks for Point-Cloud Vehicle Detection and Tracking in Driving Scenarios  A full vehicle detection and tracking system that works with 3D lidar information only.  The detection step uses a CNN that receives as input a featured representation of the 3D information provided by a Velodyne HDL-64 sensor and returns a per-point classification of whether it belongs to a vehicle or not.  The classified point cloud is then geometrically processed to generate observations for a multi-object tracking system implemented via a number of Multi-Hypothesis Extended Kalman Filters (MH-EKF) that estimate the position and velocity of the surrounding vehicles. The model is fed with an encoded representation of the point cloud and computes for each 3D point its probability of belonging to a vehicle. The classified points are then clustered, generating trustworthy observations that are fed to the MH-EKF based tracker.
  • 47. Deconvolutional Networks for Point-Cloud Vehicle Detection and Tracking in Driving Scenarios To obtain a useful input for the detector, project the 3D point cloud raw data to a featured image-like representation containing ranges and reflectivity info. by means of transformation G(·). Ground truth for learning the classification task is obtained by first projecting the image-based Kitti tracklets over the 3D Velodyne info., and then applying again transformation G(·) over the selected points.
  • 48. Deconvolutional Networks for Point-Cloud Vehicle Detection and Tracking in Driving Scenarios The network encompasses only conv. and deconv. blocks followed by BN and ReLU nonlinearities. The first 3 blocks conduct the feature extraction step, controlling, according to the vehicle detection objective, the size of the receptive fields and the feature maps generated. The next 3 deconvolutional blocks expand the info., enabling the point-wise classification. After each deconvolution, feature maps from the lower part of the network are concatenated (CAT) before applying the normalization and non-linearities, providing richer info. and better performance. During training, 3 losses are calculated at different network points.
  • 49. Deconvolutional Networks for Point-Cloud Vehicle Detection and Tracking in Driving Scenarios They show the raw input point cloud, the Deep detector output, the final tracked vehicles and the RGB projected bounding boxes submitted for evaluation.
  • 50. Fast and Furious: Real Time E2E 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net  A deep neural network to jointly reason about 3D detection, tracking and motion forecasting given data captured by a 3D sensor.  By jointly reasoning about these tasks, the holistic approach is more robust to occlusion as well as sparse data at range.  It performs 3D convolutions across space and time over a bird’s eye view representation of the 3D world, which is very efficient in terms of both memory and computation.  It can perform all tasks in as little as 30 ms. Overlay temporal & motion forecasting data. Green: bbox w/ 3D point. Grey: bbox w/o 3D point.
  • 51. Fast and Furious: Real Time E2E 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net The FaF work takes multiple frames as input and performs detection, tracking and motion forecasting.
  • 52. Fast and Furious: Real Time E2E 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net Modeling temporal information
  • 53. Fast and Furious: Real Time E2E 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net Motion forecasting. The loss function combines a classification loss with a smooth L1 loss over the regression targets.
  • 54. SqueezeSeg: Conv. Neural Nets with Recurrent CRF for RT Road-Object Segmentation from 3D LiDAR Point Cloud  Semantic segmentation of road-objects from 3D LiDAR point clouds.  Detect and categorize instances of interest, such as cars, pedestrians and cyclists.  Formulate it as a pointwise classification problem, and propose an E2E pipeline called SqueezeSeg based on CNN: the CNN takes a transformed LiDAR point cloud as input and directly outputs a point-wise label map, which is then refined by a CRF as a recurrent layer.  Instance-level labels are then obtained by conventional clustering algorithms.  The CNN model is trained on LiDAR point clouds from the KITTI dataset, and point-wise segmentation labels are derived from 3D bounding boxes from KITTI.  To obtain extra training data, built a LiDAR simulator into Grand Theft Auto V (GTA-V), a popular video game, to synthesize large amounts of realistic training data. GT segmentation Predicted segmentation
  • 55. SqueezeSeg: Conv. Neural Nets with Recurrent CRF for RT Road-Object Segmentation from 3D LiDAR Point Cloud LiDAR Projections. Network structure of SqueezeSeg
  • 56. SqueezeSeg: Conv. Neural Nets with Recurrent CRF for RT Road-Object Segmentation from 3D LiDAR Point Cloud Structure of FireModule and FireDeconv Conditional Random Field (CRF) as an RNN layer https://github.com/BichenWuUCB/SqueezeSeg.
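A hedged numpy sketch of the spherical (range-image) LiDAR projection referred to above, which turns an unordered sweep into an H x W x 5 tensor of (x, y, z, intensity, range); the 64 x 512 resolution and the vertical field of view are assumptions matching a Velodyne HDL-64-style sensor rather than the repository's exact settings.

import numpy as np

def spherical_projection(points, H=64, W=512, fov_up=2.0, fov_down=-24.8):
    """points: (N, 4) = x, y, z, intensity. Returns an (H, W, 5) tensor of
    x, y, z, intensity, range, indexed by elevation (rows) and azimuth (cols)."""
    x, y, z, inten = points.T
    r = np.sqrt(x**2 + y**2 + z**2) + 1e-8
    azimuth = np.arctan2(y, x)                         # [-pi, pi]
    elevation = np.arcsin(z / r)                       # radians
    fov_up, fov_down = np.deg2rad(fov_up), np.deg2rad(fov_down)
    u = ((azimuth + np.pi) / (2 * np.pi) * W).astype(int).clip(0, W - 1)
    v = ((fov_up - elevation) / (fov_up - fov_down) * H).astype(int).clip(0, H - 1)
    grid = np.zeros((H, W, 5), dtype=np.float32)
    grid[v, u] = np.stack([x, y, z, inten, r], axis=1)  # later points overwrite earlier ones
    return grid

pts = np.random.randn(120000, 4) * np.array([20, 20, 2, 0.3])
tensor = spherical_projection(pts)
print(tensor.shape)    # (64, 512, 5)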
  • 57. SEGCloud: Semantic Segmentation of 3D Point Clouds  SEGCloud, an E2E framework to obtain 3D point-level segmentation that combines the advantages of NNs, trilinear interpolation (TI) and fully connected CRF (FC-CRF).  Coarse voxel predictions from a 3D Fully Convolutional NN are transferred back to the raw 3D points via trilinear interpolation.  FC-CRF enforces global consistency and provides fine-grained semantics on the points.  Implement the FC-CRF as a differentiable Recurrent NN to allow joint optimization.
  • 58. SEGCloud: Semantic Segmentation of 3D Point Clouds The 3D-FCNN is made of 3 residual layers sandwiched between 2 convolutional layers. Max Pooling in the early stages of the network yields a 4X downsampling.
  • 59. SEGCloud: Semantic Segmentation of 3D Point Clouds Trilinear interpolation of class scores from voxels to points: Each point’s score is computed as the weighted sum of the scores from its 8 spatially closest voxel centers.
  • 60. SEGCloud: Semantic Segmentation of 3D Point Clouds A 2-stage training by first optimizing over the point-level unary potentials (no CRF) and then over the joint framework for point-level fine-grained labeling.
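A numpy sketch of the trilinear transfer of voxel scores to points described on slide 59: each point's class scores are the weighted sum of the scores of its 8 spatially closest voxel centers, with weights given by the point's relative position inside the cell. The grid origin, voxel size and class count are assumptions for the example.

import numpy as np

def trilinear_interpolate(voxel_scores, points, origin, voxel_size):
    """voxel_scores: (X, Y, Z, C) class scores at voxel centers; points: (N, 3).
    Returns (N, C) per-point scores."""
    X, Y, Z, C = voxel_scores.shape
    # continuous voxel coordinates, with voxel centers at integer positions
    g = (points - origin) / voxel_size - 0.5
    g0 = np.floor(g).astype(int)
    frac = g - g0
    out = np.zeros((len(points), C), dtype=voxel_scores.dtype)
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                idx = np.clip(g0 + np.array([dx, dy, dz]), 0, np.array([X, Y, Z]) - 1)
                w = (np.where(dx, frac[:, 0], 1 - frac[:, 0]) *
                     np.where(dy, frac[:, 1], 1 - frac[:, 1]) *
                     np.where(dz, frac[:, 2], 1 - frac[:, 2]))
                out += w[:, None] * voxel_scores[idx[:, 0], idx[:, 1], idx[:, 2]]
    return out

scores = np.random.rand(40, 40, 20, 8)                       # e.g. 8 semantic classes
pts = np.random.rand(1000, 3) * np.array([2.0, 2.0, 1.0])
point_scores = trilinear_interpolate(scores, pts, origin=np.zeros(3), voxel_size=0.05)
print(point_scores.shape)    # (1000, 8)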
  • 61. Multi-View 3D Object Detection Network for Autonomous Driving  Multi-View 3D networks (MV3D), a sensory-fusion framework that takes both LIDAR point cloud and RGB images as input and predicts oriented 3D bounding boxes. It is composed of 2 subnetworks: one for 3D object proposal generation, one for multi-view feature fusion. The proposal network generates 3D candidate boxes from the bird’s eye view representation of the 3D point cloud. A deep fusion scheme combines region-wise features from multiple views and enables interactions btw intermediate layers of different paths.
  • 62. Multi-View 3D Object Detection Network for Autonomous Driving
  • 63. Input features of the MV3D network. Multi-View 3D Object Detection Network for Autonomous Driving
  • 64. Training strategy for the Region- based Fusion Network: During training, the bottom 3 paths and losses are added to regularize the network. The auxiliary layers share weights with the corresponding layers in the main network. Multi-View 3D Object Detection Network for Autonomous Driving
  • 65. A General Pipeline for 3D Detection of Vehicles  A pipeline that adopts a 2D detection net and fuses it with a 3D point cloud to generate 3D info.  To identify the 3D box, model fitting based on generalised car models and score maps is applied.  A two-stage CNN is proposed to refine the detected 3D box. General fusion pipeline. All of the point clouds are viewed from the top (bird’s eye view). The height is encoded by color, with red being the ground. A subset of points is selected based on the 2D detection. A model fitting algorithm based on the generalised car models and score maps is applied to find the car points in the subset, and a two-stage refinement CNN is designed to fine-tune the detected 3D box and re-assign an objectness score to it.
  • 66. A General Pipeline for 3D Detection of Vehicles Generalised car models Score map (scores are indicated at bottom.) Qualitative result illustration on KITTI data (top) and Boston data (bottom). Blue boxes are the 3D detection results
  • 67. Combining LiDAR Space Clustering and Convolutional Neural Networks for Pedestrian Detection  In purely image-based pedestrian detection approaches, the SoA results have been achieved with CNNs, and surprisingly few detection frameworks have been built upon multi-cue approaches.  This is a pedestrian detector for autonomous vehicles that exploits LiDAR data, in addition to visual info.  LiDAR data is utilized to generate region proposals by processing the 3-d point cloud that it provides.  These candidate regions are then further processed by a SoA CNN classifier that was fine-tuned for pedestrian detection.
  • 68. Combining LiDAR Space Clustering and Convolutional Neural Networks for Pedestrian Detection (a) Cluster proposal (b) Size and ratio corrections
  • 69. Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving  Taking the inner workings of CNNs into consideration, convert image-based depth maps to pseudo-LiDAR representations.  With this representation, apply different existing LiDAR-based detection algorithms.  On the popular KITTI benchmark, it raises the detection accuracy of objects within the 30m range from the previous SoA of 22% to an unprecedented 74%. Pseudo-LiDAR signal from visual depth estimation.
  • 70. Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving The two-step pipeline for image-based 3D object detection. Given stereo or monocular images, first predict the depth map, then transform it into a 3D point cloud in the LiDAR coordinate system. This representation is called pseudo-LiDAR and is processed exactly like LiDAR — any LiDAR-based 3D object detection algorithm can thus be applied.
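A minimal sketch of the depth-to-pseudo-LiDAR conversion, assuming a pinhole camera with intrinsics (fx, fy, cx, cy); the extra rotation from the camera frame into the LiDAR frame is left out for brevity.

import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project a predicted depth map (H, W), in meters, into a 3D point
    cloud in the camera frame. A real pipeline then rotates these points into
    the LiDAR coordinate system with the known extrinsics."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]          # keep only pixels with valid (positive) depth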
  • 71. Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving Apply a single 2D convolution with a uniform kernel to the frontal view depth map (top-left). The resulting depth map (top-right), after being projected into the bird’s-eye view (bottom-right), reveals a large depth distortion in comparison to the original pseudo-LiDAR view (bottom-left), especially for far-away objects. The boxes are superimposed and contain all points of the green and yellow cars respectively.
  • 72. Fusing Bird’s Eye View LIDAR Point Cloud and Front View Camera Image for Deep Object Detection  A method for fusing LIDAR point cloud and camera-captured images in deep CNN.  The method constructs a layer called sparse non-homogeneous pooling layer to transform features between bird’s eye view and front view.  The sparse point cloud is used to construct the mapping between the two views.  The pooling layer allows fusion of multi-view features at any stage of the network.  This is favorable for 3D object detection using camera-LIDAR fusion for autonomous driving.  A corresponding one-stage detector is designed and tested, which produces 3D Bboxes from the bird’s eye view map.
  • 73. Fusing Bird’s Eye View LIDAR Point Cloud and Front View Camera Image for Deep Object Detection The vanilla fusion-based one-stage object detection network. The sparse non-homogeneous pooling layer that fuses front view image and bird’s eye view LIDAR features.
  • 74. Fusing Bird’s Eye View LIDAR Point Cloud and Front View Camera Image for Deep Object Detection (a) From camera to bird’s eye. (b) From bird’s eye to camera. (c) From front view conv4 layer to bird’s eye conv4 layer. (d) From bird’s eye conv4 to bird’s eye conv4.
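A speculative sketch of how such a point-cloud-defined mapping between views could be assembled as a sparse matrix and applied to front-view features. The SciPy COO/CSR matrices and the averaging normalization are assumptions for illustration only; the paper's exact pooling layer may differ.

import numpy as np
from scipy.sparse import coo_matrix

def build_fv_to_bev_pooling(points, P, img_hw, bev_hw, bev_range, cell):
    """Each LiDAR point links the image pixel it projects to with the BEV
    cell it falls into; averaging over points sharing a BEV cell pools
    front-view features into that cell."""
    H, W = img_hw
    # Image pixel index of each point (camera projection matrix P, 3x4).
    hom = np.hstack([points[:, :3], np.ones((len(points), 1))])
    uvw = hom @ P.T
    u = np.clip((uvw[:, 0] / uvw[:, 2]).astype(int), 0, W - 1)
    v = np.clip((uvw[:, 1] / uvw[:, 2]).astype(int), 0, H - 1)
    pix = v * W + u
    # BEV cell index of each point (x forward, y left; bev_range gives minima).
    col = np.clip(((points[:, 0] - bev_range[0]) / cell).astype(int), 0, bev_hw[1] - 1)
    row = np.clip(((points[:, 1] - bev_range[1]) / cell).astype(int), 0, bev_hw[0] - 1)
    cell_idx = row * bev_hw[1] + col
    M = coo_matrix((np.ones(len(points)), (cell_idx, pix)),
                   shape=(bev_hw[0] * bev_hw[1], H * W)).tocsr()
    counts = np.maximum(M.sum(axis=1).A.ravel(), 1)      # for averaging, not summing
    return M, counts

# Usage sketch: bev_feat = (M @ img_feat.reshape(H * W, C)) / counts[:, None]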
  • 75. Fusing Bird’s Eye View LIDAR Point Cloud and Front View Camera Image for Deep Object Detection The fusion-based one-stage object detection network compared with SoA single-sensor networks.
  • 76. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation Applications of PointNet. PointNet is a deep net architecture that consumes raw point cloud (set of points) without voxelization or rendering. It is a unified architecture that learns both global and local point features, providing a simple, efficient and effective approach for a number of 3D recognition tasks.
  • 77. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation PointNet Architecture. The classification network takes n points as input, applies input and feature transformations, and then aggregates point features by max pooling. The output is classification scores for k classes. The segmentation network is an extension to the classification net. It concatenates global and local features and outputs per point scores.
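A minimal PyTorch sketch of the core PointNet classification idea: a shared per-point MLP followed by a symmetric max pooling, which makes the global feature invariant to point ordering. The input and feature transform T-Nets of the full architecture are omitted, and the layer widths follow the figure only approximately.

import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    """Simplified PointNet classifier."""
    def __init__(self, k=40):
        super().__init__()
        self.shared_mlp = nn.Sequential(          # applied to every point independently
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU())
        self.classifier = nn.Sequential(
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, k))

    def forward(self, xyz):                            # xyz: (B, N, 3)
        feat = self.shared_mlp(xyz.transpose(1, 2))    # (B, 1024, N) per-point features
        global_feat = feat.max(dim=2).values           # symmetric function: max pooling
        return self.classifier(global_feat)            # (B, k) classification scores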
  • 78. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space  PointNet does not capture local structures induced by the metric space points live in, limiting its ability to recognize fine-grained patterns and generalizability to complex scenes.  The network called PointNet++ is able to learn deep point set features efficiently and robustly.  This is a hierarchical NN that applies PointNet recursively on a nested partitioning of the input point set.  By exploiting metric space distances, the network is able to learn local features with increasing contextual scales.  Observing further that point sets are usually sampled with varying densities, which results in greatly decreased performance for networks trained on uniform densities, set learning layers are proposed to adaptively combine features from multiple scales.
  • 79. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
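The hierarchical set abstraction rests on sampling centroids and grouping their neighbors before running a small PointNet per group; below is a NumPy sketch of these two steps (hypothetical helper names, not the reference implementation).

import numpy as np

def farthest_point_sample(xyz, m):
    """Pick m well-spread centroid indices from (N, 3) points,
    starting from point 0 for simplicity."""
    N = xyz.shape[0]
    chosen = [0]
    dist = np.full(N, np.inf)
    for _ in range(m - 1):
        dist = np.minimum(dist, np.linalg.norm(xyz - xyz[chosen[-1]], axis=1))
        chosen.append(int(dist.argmax()))
    return np.array(chosen)

def ball_query(xyz, centroids, radius, k):
    """For each centroid, gather up to k neighbor indices within `radius`
    (padding with the first neighbor when fewer are found)."""
    groups = []
    for c in centroids:
        d = np.linalg.norm(xyz - xyz[c], axis=1)
        idx = np.where(d < radius)[0][:k]
        pad = np.full(k, idx[0] if len(idx) else c)
        pad[:len(idx)] = idx
        groups.append(pad)
    return np.stack(groups)   # (m, k) indices; a mini-PointNet then runs per group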
  • 80. PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation  PointFusion, a generic 3D object detection method that leverages both image and 3D point cloud information.  The image data and the raw point cloud data are independently processed by a CNN and a PointNet architecture, respectively.  The resulting outputs are then combined by a novel fusion network, which predicts multiple 3D box hypotheses and their confidences, using the input 3D points as spatial anchors. Sample 3D object detection results of PointFusion model on the KITTI dataset (left) and the SUN-RGBD dataset (right).
  • 81. PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation A PointNet variant that processes raw point cloud data (A), and a CNN that extracts visual features from an input image (B). A vanilla global architecture that directly regresses the box corner locations (D), and a dense architecture that predicts the spatial offset of each of the 8 corners relative to an input point (C): for each input point, the network predicts the spatial offset (white arrows) from a corner (red dot) to the input point (blue), and selects the prediction with the highest score as the final prediction (E).
  • 82. Frustum PointNets for 3D Object Detection from RGB-D Data  A 3D object detection solution from RGB-D data in both indoor and outdoor scenes.  While previous methods focus on images or 3D voxels, often obscuring natural 3D patterns and invariances of 3D data, this method operates on raw point clouds by popping up RGB-D scans.  A challenge is how to efficiently localize objects in point clouds of large-scale scenes (region proposal).  Instead of solely relying on 3D proposals, it leverages both mature 2D object detectors and advanced 3D deep learning for object localization, achieving efficiency as well as high recall.  Benefiting from learning directly on raw point clouds, it is also able to precisely estimate 3D Bboxes even under strong occlusion or with very sparse points.
  • 83. Frustum PointNets for 3D Object Detection from RGB-D Data 3D object detection pipeline. Given RGB-D data, first generate 2D object region proposals in the RGB image using a CNN. Each 2D region is then extruded to a 3D viewing frustum, in which a point cloud is obtained from the depth data. Finally, the frustum PointNet predicts an (oriented and amodal) 3D bounding box for the object from the points in the frustum.
  • 84. Frustum PointNets for 3D Object Detection from RGB-D Data Frustum PointNets for 3D object detection. First leverage a 2D CNN object detector to propose 2D regions and classify their content. 2D regions are then lifted to 3D and thus become frustum proposals. Given a point cloud in a frustum (n × c with n points and c channels of XYZ, intensity etc. for each point), the object instance is segmented by binary classification of each point. Based on the segmented object point cloud (m × c), a light-weight regression PointNet (T-Net) tries to align points by translation such that their centroid is close to the amodal box center. Finally, the box estimation net estimates the amodal 3D bounding box for the object.
  • 85. Frustum PointNets for 3D Object Detection from RGB-D Data Coordinate systems for point cloud. (a) default camera coordinate; (b) frustum coordinate after rotating frustums to center view; (c) mask coordinate with object points’ centroid at origin; (d) object coordinate predicted by T-Net. Basic architectures and IO for PointNets. Architecture is illustrated for PointNet++ (v2) models with set abstraction layers and feature propagation layers (for segmentation).
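A small sketch of the frustum normalization in (b): rotating the frustum points so that the ray through the 2D box center becomes the +Z axis. The usual camera convention (x right, y down, z forward) and pinhole intrinsics are assumed.

import numpy as np

def frustum_to_center_view(points, box2d_center, fx, fy, cx, cy):
    """Rotate frustum points about the camera Y axis so the ray through the
    2D box center aligns with +Z (the 'frustum coordinate' in (b))."""
    u, v = box2d_center
    ray = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    angle = np.arctan2(ray[0], ray[2])   # heading of the center ray in the X-Z plane
    c, s = np.cos(-angle), np.sin(-angle)
    R = np.array([[c, 0, s],
                  [0, 1, 0],
                  [-s, 0, c]])           # rotation by -angle about the Y axis
    return points @ R.T                  # (N, 3) points in frustum coordinates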
  • 86. Frustum PointNets for 3D Object Detection from RGB-D Data Visualizations of Frustum PointNet results on KITTI val set.
  • 87. RoarNet: A Robust 3D Object Detection based on RegiOn Approximation Refinement  RoarNet is a 3D object detector working from a 2D image and 3D Lidar point clouds.  Based on a two-stage object detection framework with PointNet as the backbone network, it introduces several ideas to improve 3D object detection performance.  The first part estimates the 3D poses of objects from a monocular image, which approximates where to examine further, and derives multiple candidates that are geometrically feasible.  This step significantly narrows down the feasible 3D regions, which otherwise would require demanding processing of 3D point clouds in a huge search space.  The second part takes the candidate regions and conducts in-depth inference to conclude final poses in a recursive manner.  Inspired by PointNet, RoarNet processes 3D point clouds directly, leading to precise detection.  RoarNet is implemented in Tensorflow and publicly available with pretrained models.
  • 88. RoarNet: A Robust 3D Object Detection based on RegiOn Approximation Refinement Detection pipeline of RoarNet. The model (a) predicts region proposals in 3D space using geometric agreement search, (b) predicts objectness in each region proposal, (c) predicts 3D bounding boxes, (d) calculates IoU (Intersection over Union) between 2D detection and 3D detection.
  • 89. RoarNet: A Robust 3D Object Detection based on RegiOn Approximation Refinement Architecture of RoarNet
  • 90. RoarNet: A Robust 3D Object Detection based on RegiOn Approximation Refinement (a) Previous Architecture (b) RoarNet 2D Architecture
  • 91. RoarNet: A Robust 3D Object Detection based on RegiOn Approximation Refinement RoarNet 2D. A unified architecture detects 2D bounding boxes and 3D poses, illustrated in (a) and (b), respectively. For each object, two extreme cases are shown as non-filled boxes, and the final equally-spaced candidate locations as colored dots in (b). All calculations are derived in 3D space despite the bird’s eye view (i.e., X-Z plane) visualization.
  • 92. RoarNet: A Robust 3D Object Detection based on RegiOn Approximation Refinement A detection pipeline of several network architectures
  • 93. Joint 3D Proposal Generation and Object Detection from View Aggregation  AVOD, an Aggregate View Object Detection network for autonomous driving scenarios.  The network uses LIDAR point clouds and RGB images to generate features shared by two subnetworks: a region proposal network (RPN) and a second stage detector network.  The RPN is capable of performing multimodal feature fusion on high resolution feature maps to generate reliable 3D object proposals for multiple object classes in road scenes.  Using these proposals, the second stage detection network performs accurate oriented 3D bounding box regression and category classification to predict the extents, orientation, and classification of objects in 3D space.  Source code is at: https://github.com/kujason/avod. A visual representation of the 3D detection problem from Bird’s Eye View (BEV). The Bbox in green is used to determine the IoU overlap in the computation of the average precision. The importance of explicit orientation estimation can be seen as an object’s Bbox does not change when the orientation (purple) is shifted by ±π radians.
  • 94. Joint 3D Proposal Generation and Object Detection from View Aggregation The method’s architectural diagram. The feature extractors are shown in blue, the region proposal network in pink, and the second stage detection network in green.
  • 95. Joint 3D Proposal Generation and Object Detection from View Aggregation The architecture of high resolution feature extractor for the image branch. Feature maps are propagated from the encoder to the decoder section via red arrows. Fusion is then performed at every stage of the decoder by a learned upsampling layer, followed by concatenation, and then mixing via a convolutional layer, resulting in a full resolution feature map at the last layer of the decoder.
  • 96. Joint 3D Proposal Generation and Object Detection from View Aggregation Qualitative results of AVOD for cars (top) and pedestrians/cyclists (bottom). Left: 3D RPN output, Middle: 3D detection output, and Right: the projection of the detection output onto image space for all three classes.
  • 97. SPLATNet: Sparse Lattice Networks for Point Cloud Processing  A network architecture for processing point clouds that directly operates on a collection of points represented as a sparse set of samples in a high-dimensional lattice.  The network uses sparse bilateral convolutional layers as building blocks; these layers maintain efficiency by using indexing structures to apply convolutions only on occupied parts of the lattice, and they allow flexible specification of the lattice structure, enabling hierarchical and spatially-aware feature learning as well as joint 2D-3D reasoning.  Both point-based and image-based representations can be easily incorporated in a network with such layers, and the resulting model can be trained in an E2E manner. From point clouds and images to semantics. SPLATNet3D directly takes a point cloud as input and predicts labels for each point. SPLATNet2D-3D, on the other hand, jointly processes both the point cloud and the corresponding multi-view images for better 2D and 3D predictions.
  • 98. SPLATNet: Sparse Lattice Networks for Point Cloud Processing Bilateral Convolution Layer (BCL). Splat: BCL first interpolates input features F onto a dl-dimensional permutohedral lattice defined by the lattice features L at input points. Convolve: BCL then does dl-dimensional convolution over this sparsely populated lattice. Slice: The filtered signal is then interpolated back onto the input signal. • The input points to BCL need not be ordered or lie on a grid, as they are projected onto a dl-dimensional grid defined by lattice features Lin. • The input and output points can be different for BCL with the specification of different input and output lattice features Lin and Lout. • Since BCL allows separate specifications of input and lattice features, input signals can be projected into a different dimensional space for filtering. • Just like in standard spatial convolutions, BCL allows an easy specification of the filter neighborhood. • Since a signal is usually sparse in high dimensions, BCL uses hash tables to index the populated vertices and does convolutions only at those locations.
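A simplified sketch of why the splat step stays cheap: point features are accumulated only onto occupied lattice cells indexed by a hash map. An integer lattice with nearest-cell rounding is assumed here purely for illustration; the actual BCL splats with barycentric weights onto a permutohedral lattice.

import numpy as np
from collections import defaultdict

def splat(features, lattice_feats, scale):
    """Accumulate point features onto occupied cells of an integer lattice.

    features:      (N, C) input features F
    lattice_feats: (N, d_l) lattice features L (e.g., scaled xyz)
    scale:         lattice cell size
    """
    keys = np.round(lattice_feats / scale).astype(int)        # (N, d_l) lattice coords
    cells = defaultdict(lambda: np.zeros(features.shape[1]))
    counts = defaultdict(int)
    for k, f in zip(map(tuple, keys), features):
        cells[k] += f
        counts[k] += 1
    # Convolution then runs only over these occupied vertices; the 'slice'
    # step interpolates the filtered values back onto the original points.
    return {k: cells[k] / counts[k] for k in cells}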
  • 99. SPLATNet: Sparse Lattice Networks for Point Cloud Processing SPLATNet. Illustration of inputs, outputs and network architectures for SPLATNet3D and SPLATNet2D-3D.
  • 100. SPLATNet: Sparse Lattice Networks for Point Cloud Processing 2D to 3D projection using splat and slice operations. Given input features of 2D images, pixels are projected onto a 3D permutohedral lattice defined by 3D positional lattice features. The splatted signal is then sliced onto the points of interest in a 3D point cloud. Facade point cloud labeling. Sample visual results of SPLATNet3D and SPLATNet2D-3D.
  • 101. PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud  PointRCNN is a deep NN method for 3D object detection from raw point cloud.  The whole framework is composed of two stages:  stage-1 for bottom-up 3D proposal generation;  stage-2 for refining proposals in canonical coordinates to obtain the detection results.  Instead of generating proposals from RGB images or projecting the point cloud to bird’s view or voxels, the stage-1 sub-network directly generates a small number of high-quality 3D proposals from the point cloud in a bottom-up manner via segmenting the point cloud of the whole scene into foreground and background points.  The stage-2 sub-network transforms the pooled points of each proposal to canonical coordinates to learn local spatial features, which are combined with the global semantic features of each point learned in stage-1 for accurate box refinement and confidence prediction.
  • 102. PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud Instead of generating proposals from fused feature maps of bird’s view and front view, or RGB images, this method directly generates 3D proposals from raw point cloud in a bottom-up manner. C: PointRCNN
  • 103. PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud The PointRCNN architecture. The whole network consists of two parts: (a) for generating 3D proposals from raw point cloud in a bottom-up manner. (b) for refining the 3D proposals in canonical coordinate.
  • 104. PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud Bin-based localization. The surrounding area along the X and Z axes of each foreground point is split into a series of bins to locate the object center. Canonical transformation. The pooled points belonging to each proposal are transformed to the corresponding canonical coordinate system for better local spatial feature learning, where CCS denotes Canonical Coordinate System.
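A hedged sketch of the canonical transformation for one proposal: translate the pooled points to the proposal center and rotate about the up axis so the proposal heading aligns with +X. The axis convention is assumed to match the X/Z bins above; this is not the reference code.

import numpy as np

def canonical_transform(points, center, heading):
    """Map pooled points of one 3D proposal into its canonical coordinate
    system (CCS): origin at the proposal center, +X along the heading."""
    shifted = points[:, :3] - center                    # translate to proposal center
    c, s = np.cos(-heading), np.sin(-heading)
    R = np.array([[c, 0, s],
                  [0, 1, 0],
                  [-s, 0, c]])                          # rotate by -heading about Y (up)
    return shifted @ R.T                                # points in the canonical frame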
  • 105. PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud The upper is the image and the lower is a representative view of the corresponding point cloud.
  • 106. Deep Continuous Fusion for Multi-Sensor 3D Object Detection  A 3D object detector that exploits both LIDAR and cameras to perform very accurate localization.  It is an E2E learnable architecture that exploits continuous convolutions to fuse image and LIDAR feature maps at different levels of resolution.  The continuous fusion layer encodes both discrete-state image features and continuous geometric information.  Deep parametric continuous convolution is a learnable operator that operates over non-grid-structured data.  The motivation is to extend the standard grid-structured convolution to non-grid-structured data, while retaining high capacity and low complexity.  The key idea is to exploit multi-layer perceptrons as parameterized kernel functions for continuous convolution.  This parametric kernel function spans the full continuous domain.  A weighted summation over a finite number of neighboring points is used to approximate the otherwise computationally prohibitive continuous convolution.  Each neighbor is weighted differently according to its relative geometric offset w.r.t. the target point.  This enables a reliable and efficient E2E learnable 3D object detector based on multiple sensors.
  • 107. Deep Continuous Fusion for Multi-Sensor 3D Object Detection Continuous fusion layer: given a target pixel on the BEV image, extract the K nearest LIDAR points (S1); project the 3D points onto the camera image plane (S2-3); this helps retrieve the corresponding image features (S4); feed the image feature + continuous geometric offset into an MLP to generate the feature for the target pixel (S5).
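A PyTorch sketch of step S5 for a single target BEV pixel, assuming the K nearest LiDAR points and their image features have already been gathered in steps S1-S4; channel sizes are illustrative.

import torch
import torch.nn as nn

class ContinuousFusionSketch(nn.Module):
    """One continuous fusion step for a single target BEV pixel: an MLP over
    (image feature, 3D geometric offset) per neighbor, summed over neighbors
    as a weighted-sum approximation of continuous convolution."""
    def __init__(self, img_channels=64, out_channels=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(img_channels + 3, out_channels), nn.ReLU(),
            nn.Linear(out_channels, out_channels))

    def forward(self, neighbor_img_feats, neighbor_offsets):
        # neighbor_img_feats: (K, C) image features of the K nearest LiDAR points
        # neighbor_offsets:   (K, 3) continuous geometric offsets to the target pixel
        x = torch.cat([neighbor_img_feats, neighbor_offsets], dim=1)
        return self.mlp(x).sum(dim=0)        # fused feature for the target BEV pixel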
  • 108. Deep Continuous Fusion for Multi-Sensor 3D Object Detection Qualitative results on KITTI Dataset.
  • 109. End-to-end Learning of Multi-sensor 3D Tracking by Detection  An approach to tracking by detection that can exploit both cameras and LIDAR data to produce very accurate 3D trajectories.  Towards this goal, the task is formulated as a linear program that can be solved exactly, and convolutional networks for detection as well as matching are learned in an end-to-end manner. The system takes as external input a time series of RGB frames and LIDAR point clouds. From these inputs, the system produces discrete trajectories of the targets. In particular, an architecture that is e2e trainable while still maintaining explainability is achieved by formulating the system in a structured manner.
  • 110. End-to-end Learning of Multi-sensor 3D Tracking by Detection Forward passes over a set of detections from two frames for both scoring and matching. For each detection xj, a forward pass of a Detection Network is computed to produce θdet W(xj), the cost of using or discarding xj according to the assignment to ydet j. For each pair of detections xj and xi from subsequent frames, a forward pass of the Match Network is computed to produce θlink W(xi,xj), the cost of linking or not these two detections according to the assignment to ylink i,j. Finally, each detection might start a new trajectory or end an existing one; the costs for this are computed via θnew W(x) and θend W(x), respectively, and are associated with the assignments to ynew and yend. Formulate the problem as inference in a deep structured model (DSM), where the factors are computed using a set of feed forward neural nets that exploit both camera and LIDAR data to compute both detection and matching scores. Inference in the model can be done exactly by a set of feed forward processes followed by solving a linear program. Learning is done e2e via minimization of a structured hinge loss, optimizing simultaneously the detector and tracker.
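A toy illustration of why the linear program can be solved exactly: for a single pair of frames, the link variables form a bipartite-matching polytope whose constraint matrix is totally unimodular, so the LP relaxation (solved here with scipy.optimize.linprog) already returns integral assignments. The detection, birth, and death variables of the full model are omitted, and the cost matrix is assumed to come from the learned matching network.

import numpy as np
from scipy.optimize import linprog

def match_two_frames(link_cost):
    """Choose links between detections of frame t and t+1 minimizing the
    learned link costs, with each detection used at most once."""
    n1, n2 = link_cost.shape
    c = link_cost.ravel()                             # one variable per possible link
    A, b = [], []
    for i in range(n1):                               # each frame-t detection used <= once
        row = np.zeros(n1 * n2); row[i * n2:(i + 1) * n2] = 1
        A.append(row); b.append(1)
    for j in range(n2):                               # same for frame t+1 detections
        col = np.zeros(n1 * n2); col[j::n2] = 1
        A.append(col); b.append(1)
    res = linprog(c, A_ub=np.array(A), b_ub=np.array(b), bounds=(0, 1))
    return res.x.reshape(n1, n2).round()              # 0/1 link indicators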
  • 111. End-to-end Learning of Multi-sensor 3D Tracking by Detection
  • 112. End-to-end Learning of Multi-sensor 3D Tracking by Detection