BEV OBJECT DETECTION AND PREDICTION
Yu Huang
Sunnyvale, California
Yu.huang07@gmail.com
OUTLINE
• DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries
• BEVDet: High-Performance Multi-Camera 3D Object Detection in BEV
• BEVDet4D: Exploit Temporal Cues in Multi-camera 3D Object Detection
• PETR: Position Embedding Transformation for Multi-View 3D Object Detection
• FIERY: Future Instance Prediction in Bird’s-Eye View from Surround
Monocular Cameras
• BEVDepth: Acquisition of Reliable Depth for Multi-view 3D Object Detection
• PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images
• ST-P3: E2E Vision-based Autonomous Driving via S-T Feature Learning
DETR3D: 3D OBJECT DETECTION FROM MULTI-VIEW IMAGES VIA 3D-TO-2D QUERIES
• This method manipulates predictions directly in 3D space: the architecture extracts 2D features from multiple camera images and then uses a sparse set of 3D object queries to index into these 2D features, linking 3D positions to multi-view images via camera transformation matrices (see the sketch after this list).
• Finally, the model makes a bounding box prediction per object query, using a set-to-set loss to
measure the discrepancy between the ground-truth and the prediction.
• This top-down approach outperforms its bottom-up counterpart in which object bounding box
prediction follows per-pixel depth estimation, since it does not suffer from the compounding error
introduced by a depth prediction model.
• Moreover, it does not require post-processing such as non-maximum suppression, dramatically
improving inference speed.
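To make the 3D-to-2D query step concrete, here is a minimal PyTorch sketch of how a set of 3D reference points could be projected into each camera and used to bilinearly sample 2D features. All names and shapes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def sample_features_for_queries(ref_points_3d, feature_maps, cam_projections):
    """Hypothetical sketch of DETR3D-style 3D-to-2D feature sampling.

    ref_points_3d:   (Q, 3) 3D reference points decoded from object queries.
    feature_maps:    (N, C, H, W) 2D features from N camera views.
    cam_projections: (N, 3, 4) camera transformation (projection) matrices.
    Returns (Q, C) features aggregated over the views where each point is visible.
    """
    Q = ref_points_3d.shape[0]
    N, C, H, W = feature_maps.shape
    homo = torch.cat([ref_points_3d, ref_points_3d.new_ones(Q, 1)], dim=-1)  # (Q, 4)

    # Project every 3D point into every camera: (N, Q, 3).
    pts_img = torch.einsum('nij,qj->nqi', cam_projections, homo)
    depth = pts_img[..., 2:3].clamp(min=1e-5)
    uv = pts_img[..., :2] / depth            # pixel coordinates
    valid = pts_img[..., 2] > 0              # point is in front of the camera

    # Normalize to [-1, 1] and gather bilinear samples per view.
    grid = torch.stack([uv[..., 0] / W * 2 - 1, uv[..., 1] / H * 2 - 1], dim=-1)
    sampled = F.grid_sample(feature_maps, grid.unsqueeze(2),
                            align_corners=False)    # (N, C, Q, 1)
    sampled = sampled.squeeze(-1).permute(0, 2, 1)  # (N, Q, C)

    # Average over the cameras that actually see each point.
    mask = valid.unsqueeze(-1).float()
    return (sampled * mask).sum(0) / mask.sum(0).clamp(min=1)
```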
BEVDET: HIGH-PERFORMANCE MULTI-CAMERA 3D
OBJECT DETECTION IN BIRD-EYE-VIEW
• BEVDet is developed by following the principle of detecting the 3D objects in Bird-Eye-View (BEV), where
route planning can be handily performed.
• In this paradigm, four modules run in succession with different roles: an image-view encoder for encoding features in image view, a view transformer for lifting features from image view to BEV, a BEV encoder for further encoding features in BEV, and a task-specific head for predicting the targets in BEV (see the sketch after this list).
• Existing modules are reused to construct BEVDet, and an exclusive data augmentation strategy makes it feasible for multi-camera 3D object detection.
• The proposed paradigm works well in multi-camera 3D object detection and offers a good trade-off between
computing budget and performance.
• BEVDet with a 704×256 image size (1/8 that of the competitors) scores 29.4% mAP and 38.4% NDS on the nuScenes val set, comparable with FCOS3D (i.e., 2008.2 GFLOPs, 1.7 FPS, 29.5% mAP, and 37.2% NDS), while requiring just 239.4 GFLOPs, about 12% of the computing budget, and running 4.3 times faster.
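A minimal sketch of the four-module pipeline, with each module passed in as a generic callable; this is an illustrative skeleton under assumed shapes, not the released BEVDet code.

```python
import torch
import torch.nn as nn

class BEVDetSketch(nn.Module):
    """Minimal sketch of the four-module BEVDet paradigm (placeholder modules)."""

    def __init__(self, image_encoder, view_transformer, bev_encoder, head):
        super().__init__()
        self.image_encoder = image_encoder        # image-view feature encoding
        self.view_transformer = view_transformer  # image view -> BEV
        self.bev_encoder = bev_encoder            # further encoding in BEV
        self.head = head                          # task-specific head in BEV

    def forward(self, images, cam_params):
        # images: (B, N_cams, 3, H, W); cam_params: per-camera in/extrinsics.
        B, N, C, H, W = images.shape
        feats = self.image_encoder(images.flatten(0, 1))  # (B*N, C', h, w)
        feats = feats.view(B, N, *feats.shape[1:])
        bev = self.view_transformer(feats, cam_params)    # (B, C_bev, X, Y)
        bev = self.bev_encoder(bev)
        return self.head(bev)                             # targets predicted in BEV
```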
BEVDET4D: EXPLOIT TEMPORAL CUES IN MULTI-CAMERA 3D OBJECT DETECTION
• To fundamentally push the performance boundary in this area, BEVDet4D is proposed to lift the scalable BEVDet paradigm from the spatial-only 3D space to the spatial-temporal 4D space.
• It upgrades the framework with just a few modifications for fusing the feature from the previous frame with the corresponding one in the current frame (sketched after this list).
• In this way, with negligible extra computing budget, the algorithm can access temporal cues by querying and comparing the two candidate features.
• Beyond this, the velocity learning task is also simplified by removing the factors of ego-motion and time, which equips BEVDet4D with robust generalization performance and reduces the velocity error by 52.8%.
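A minimal sketch of the temporal fusion step, assuming an ego-motion-compensating sampling grid is available to align the previous BEV feature with the current one; all interfaces are hypothetical.

```python
import torch
import torch.nn.functional as F

def fuse_temporal_bev(curr_bev, prev_bev, ego_motion_grid, fuse_conv):
    """Hypothetical sketch of BEVDet4D-style temporal fusion.

    curr_bev, prev_bev: (B, C, X, Y) BEV features of current / previous frame.
    ego_motion_grid:    (B, X, Y, 2) grid warping the previous BEV feature into
                        the current ego frame, compensating ego-motion.
    fuse_conv:          a conv layer mapping 2C -> C channels.
    """
    # Spatially align the previous frame's feature with the current frame.
    prev_aligned = F.grid_sample(prev_bev, ego_motion_grid, align_corners=False)
    # Concatenate and fuse; the detector can then query and compare the two
    # candidate features to extract temporal cues at negligible extra cost.
    return fuse_conv(torch.cat([curr_bev, prev_aligned], dim=1))
```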
PETR: POSITION EMBEDDING TRANSFORMATION
FOR MULTI-VIEW 3D OBJECT DETECTION
• This paper develops position embedding transformation (PETR) for multi-view 3D object detection.
• PETR encodes the position information of 3D coordinates into image features, producing 3D position-aware features (sketched after this list).
• Object queries can perceive the 3D position-aware features and perform end-to-end object detection.
• PETR achieves state-of-the-art performance (50.4% NDS and 44.1% mAP) on the standard nuScenes dataset and ranks 1st on the benchmark.
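The sketch below illustrates how 3D position-aware features could be produced: discretized camera-frustum points are lifted into 3D with inverse projection matrices and mapped to a position embedding by an MLP, which is then added to the 2D image features. Shapes and names are assumptions for illustration, not PETR's exact code.

```python
import torch

def make_3d_position_embedding(inv_projections, feat_hw, depth_bins, pe_mlp):
    """Hypothetical sketch of a PETR-style 3D position embedding.

    inv_projections: (N, 4, 4) matrices mapping frustum points to 3D space.
    feat_hw:         (H, W) of the image feature map.
    depth_bins:      1D tensor of D discretized depths.
    pe_mlp:          module mapping D*3 -> C embedding channels (last dim).
    Returns (N, C, H, W) embeddings to add to the 2D image features.
    """
    H, W = feat_hw
    D = depth_bins.numel()
    u = torch.linspace(0, W - 1, W)
    v = torch.linspace(0, H - 1, H)
    vv, uu = torch.meshgrid(v, u, indexing='ij')          # each (H, W)

    # Frustum points (u*d, v*d, d, 1) for every pixel and depth bin.
    d = depth_bins.view(D, 1, 1).expand(D, H, W)
    pts = torch.stack([uu * d, vv * d, d, torch.ones_like(d)], dim=-1)  # (D,H,W,4)

    # Lift to 3D coordinates per camera, keep xyz.
    pts3d = torch.einsum('nij,dhwj->ndhwi', inv_projections, pts)[..., :3]
    pts3d = pts3d.permute(0, 2, 3, 1, 4).reshape(-1, H, W, D * 3)  # (N,H,W,D*3)

    pe = pe_mlp(pts3d)                                    # (N, H, W, C)
    return pe.permute(0, 3, 1, 2)                         # (N, C, H, W)
```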
(a) In DETR, the object queries interact with 2D features to perform 2D detection. (b) DETR3D repeatedly projects the generated 3D reference points into the image plane and samples the 2D features to interact with object queries in the decoder. (c) PETR generates the 3D position-aware features by encoding the 3D position embedding into 2D image features. The object queries directly interact with the 3D position-aware features and output 3D detection results.
3D Position Encoder
FIERY: FUTURE INSTANCE PREDICTION IN BIRD’S-EYE
VIEW FROM SURROUND MONOCULAR CAMERAS
• Driving requires interacting with road agents and predicting their future behaviour in order to
navigate safely.
• FIERY: a probabilistic future prediction model in bird’s-eye view from monocular cameras.
• The model predicts future instance segmentation and motion of dynamic agents that can be
transformed into non-parametric future trajectories.
• The approach combines the perception, sensor fusion and prediction components of a traditional
autonomous driving stack by estimating bird’s-eye-view prediction directly from surround RGB
monocular camera inputs.
• FIERY learns to model the inherent stochastic nature of the future solely from camera driving data in an end-to-end manner, without relying on HD maps, and predicts multimodal future trajectories (see the sketch after this list).
• The code and trained models are available at https://github.com/wayveai/fiery.
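A toy sketch of the probabilistic rollout idea: a latent is sampled from a learned present distribution and unrolled with a recurrent cell, so different samples yield different (multimodal) futures. The real FIERY model is convolutional and also trains against a future distribution; everything below is simplified and hypothetical.

```python
import torch
import torch.nn as nn

class FutureRollout(nn.Module):
    """Toy sketch of FIERY-style probabilistic future prediction."""

    def __init__(self, state_dim=128, latent_dim=32, horizon=4):
        super().__init__()
        self.horizon = horizon
        # "Present" distribution over the stochastic future latent.
        self.present_net = nn.Linear(state_dim, 2 * latent_dim)
        self.cell = nn.GRUCell(latent_dim, state_dim)
        self.head = nn.Linear(state_dim, state_dim)  # stand-in decoder

    def forward(self, present_state):
        # present_state: (B, state_dim) BEV summary of the current scene.
        mu, logvar = self.present_net(present_state).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # sample one future
        h, futures = present_state, []
        for _ in range(self.horizon):
            h = self.cell(z, h)            # recurrent rollout of future states
            futures.append(self.head(h))   # decode to instance/motion predictions
        return torch.stack(futures, dim=1)  # (B, horizon, state_dim)
```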
BEVDEPTH: ACQUISITION OF RELIABLE DEPTH
FOR MULTI-VIEW 3D OBJECT DETECTION
• This work introduces BEVDepth, a new 3D object detector with trustworthy depth estimation for camera-based Bird's-Eye-View (BEV) 3D object detection.
• In prior work, depth estimation is implicitly learned without camera information, making it de-facto fake depth for creating the subsequent pseudo point cloud.
• BEVDepth instead gets explicit depth supervision, utilizing encoded intrinsic and extrinsic parameters (sketched after this list).
• A depth correction sub-network is further introduced to counteract projection-induced disturbances in the depth ground truth.
• To reduce the speed bottleneck when projecting features from image view into BEV using the estimated depth, a quick view-transform operation is also proposed.
• Besides, BEVDepth can be easily extended with multi-frame input.
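A sketch of the two key ingredients: camera-conditioned depth prediction and explicit depth supervision from point clouds projected into the image. Layer sizes and interfaces are assumptions, not the BEVDepth release.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CameraAwareDepthNet(nn.Module):
    """Sketch of BEVDepth-style depth estimation: encoded camera intrinsics and
    extrinsics modulate the features before the depth distribution is predicted."""

    def __init__(self, in_ch=256, depth_bins=64, cam_param_dim=16):
        super().__init__()
        self.cam_mlp = nn.Sequential(nn.Linear(cam_param_dim, in_ch), nn.Sigmoid())
        self.depth_head = nn.Conv2d(in_ch, depth_bins, 1)

    def forward(self, feat, cam_params):
        # feat: (B, C, H, W); cam_params: (B, cam_param_dim) flattened in/extrinsics.
        scale = self.cam_mlp(cam_params).unsqueeze(-1).unsqueeze(-1)
        feat = feat * scale               # camera-conditioned features
        return self.depth_head(feat)      # (B, D, H, W) depth-bin logits

def depth_loss(depth_logits, gt_depth_bin, valid):
    # Explicit supervision: cross-entropy against depth bins obtained by
    # projecting LiDAR points into the image (gt_depth_bin: (B, H, W) indices;
    # valid masks pixels that actually received a projected point).
    loss = F.cross_entropy(depth_logits, gt_depth_bin, reduction='none')
    return (loss * valid).sum() / valid.sum().clamp(min=1)
```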
PETRV2: A UNIFIED FRAMEWORK FOR 3D
PERCEPTION FROM MULTI-CAMERA IMAGES
• PETRv2 is a unified framework for 3D perception from multi-view images.
• Based on PETR, PETRv2 explores the effectiveness of temporal modeling, which utilizes the temporal information of previous frames to boost 3D object detection.
• More specifically, the 3D position embedding (3D PE) in PETR is extended for temporal modeling.
• The 3D PE achieves temporal alignment of object positions across different frames (sketched after this list).
• A feature-guided position encoder is further introduced to improve the data adaptability of the 3D PE.
• To support high-quality BEV segmentation, PETRv2 provides a simple yet effective solution by adding a set of segmentation queries.
• Each segmentation query is responsible for segmenting one specific patch of the BEV map.
• Code is available at https://github.com/megvii-research/PETR.
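A small sketch of the temporal-alignment idea behind the extended 3D PE: the 3D points generated for the previous frame are transformed into the current ego frame before the position embedding is computed, so the embeddings of both frames live in one coordinate system. Pose-matrix names are hypothetical.

```python
import torch

def align_prev_points_to_current(pts_prev, pose_prev_to_global, pose_global_to_curr):
    """Hypothetical sketch of PETRv2-style coordinate system transformation.

    pts_prev: (..., 3) frustum points expressed in the previous ego frame.
    pose_*:   (4, 4) homogeneous ego-pose transforms.
    """
    homo = torch.cat([pts_prev, torch.ones_like(pts_prev[..., :1])], dim=-1)
    T = pose_global_to_curr @ pose_prev_to_global   # previous ego -> current ego
    return (homo @ T.T)[..., :3]
```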
Coordinate system transformation; feature-guided position encoder.
ST-P3: END-TO-END VISION-BASED AUTONOMOUS
DRIVING VIA SPATIAL-TEMPORAL FEATURE LEARNING
• While there are some pioneering works on LiDAR-based input or implicit design, this paper formulates the problem in an interpretable vision-based setting.
• In particular, it proposes a spatial-temporal feature learning scheme, called ST-P3, towards a set of more representative features for perception, prediction and planning tasks simultaneously.
• Specifically, an egocentric-aligned accumulation technique is proposed to preserve geometry information in 3D space before the bird's-eye-view transformation for perception (sketched after this list); dual pathway modeling is devised to take past motion variations into account for future prediction; and a temporal-based refinement unit is introduced to compensate for recognition of vision-based elements for planning.
• Source code is available at https://github.com/OpenPerceptionX/ST-P3.
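A minimal sketch of egocentric aligned accumulation, assuming warp grids derived from ego-motion are available for each past frame; the aggregation here is a plain average, a simplification of the learned temporal aggregation described in the paper.

```python
import torch
import torch.nn.functional as F

def egocentric_aligned_accumulation(bev_feats, warp_grids):
    """Sketch of ST-P3-style egocentric aligned accumulation (assumed interfaces).

    bev_feats:  list of T tensors (B, C, X, Y), oldest first, last = current frame.
    warp_grids: list of T-1 grids (B, X, Y, 2) mapping each past frame into the
                current ego frame (the current frame needs no warp).
    """
    # Warp every past BEV feature into the current ego frame so that static
    # geometry overlaps across time.
    aligned = [F.grid_sample(f, g, align_corners=False)
               for f, g in zip(bev_feats[:-1], warp_grids)]
    aligned.append(bev_feats[-1])
    # Accumulate the aligned maps (simple average as a stand-in).
    return torch.stack(aligned, dim=1).mean(dim=1)   # (B, C, X, Y)
```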
Egocentric aligned accumulation for Perception.
Dual pathway modelling for Prediction.
Prior knowledge integration and refinement for Planning.