The New Perception Framework
in Autonomous Driving:
An Introduction of BEV Network
Yu Huang
Chief Scientist
Autonomous driving is one of the most challenging AI applications in the world. It is defined from L2 to L5, with Operational Design Domains such as Highway Pilot, Urban Pilot, Traffic Jam Pilot, robotaxi/bus/truck, etc.
A solution can be modular, i.e. a pipeline of perception, mapping & localization, prediction, planning and control, or end-to-end (E2E), or partially E2E;
There are roughly two research & development routes: progressing step by step (L2 -> L4), or leaping directly to L4 and then acting like dimension reduction (L4 -> L2+);
Challenging problems in AV: long-tailed corner cases, safety-critical scenarios, and mass-production requirements (a closed data loop).
BEV Network
The Bird’s-Eye-View (BEV) is a natural view that serves as a unified representation for 3-D environment understanding in the perception module of autonomous driving;
BEV contains rich semantic info, precise localization, and absolute scale, which can be directly consumed by downstream real-world applications such as behavior prediction, motion planning, etc.
BEVerse for 3D detection/map segmentation/motion prediction
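As a rough illustration of the shared-representation idea behind multi-task frameworks such as BEVerse, the sketch below attaches separate detection, map-segmentation and motion heads to one BEV feature map; the channel counts and head outputs are assumptions for the example, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class MultiTaskBEVHeads(nn.Module):
    def __init__(self, bev_channels=256, det_classes=10, map_classes=4, horizon=6):
        super().__init__()
        self.det_head = nn.Conv2d(bev_channels, det_classes + 7, 1)   # class scores + box params per cell
        self.map_head = nn.Conv2d(bev_channels, map_classes, 1)       # per-cell map semantics
        self.motion_head = nn.Conv2d(bev_channels, 2 * horizon, 1)    # future (dx, dy) per time step

    def forward(self, bev_feat):                                      # (B, 256, H, W) shared BEV features
        return self.det_head(bev_feat), self.map_head(bev_feat), self.motion_head(bev_feat)

det, seg, motion = MultiTaskBEVHeads()(torch.randn(1, 256, 200, 200))
```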
BEV provides a physics-interpretable way to fuse information from different views, modalities, time series, and agents.
• Spatial and temporal fusion
BEVFormer for multiple cameras’ spatial-temporal fusion
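A minimal sketch of the temporal-fusion idea (not BEVFormer's actual code): warp the previous frame's BEV features into the current ego frame using the relative ego motion, then fuse them with the current BEV features. The shapes, the 2D rigid-motion parameterization and the BEV range are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp_prev_bev(prev_bev, ego_motion, half_range_m=51.2):
    """prev_bev: (B, C, H, W) BEV features from frame t-1.
    ego_motion: (B, 2, 3) 2D rigid transform (rotation + translation in metres)
    mapping current-frame BEV coordinates into the previous frame."""
    B, C, H, W = prev_bev.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
    grid = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1)      # (H, W, 3) homogeneous coords
    motion = ego_motion.clone()
    motion[:, :, 2] = motion[:, :, 2] / half_range_m               # metres -> normalised units
    grid = torch.einsum("bij,hwj->bhwi", motion, grid)             # (B, H, W, 2) sampling locations
    return F.grid_sample(prev_bev, grid, align_corners=True)

# Fuse the aligned history with the current frame, e.g. concatenation + conv.
fuse = nn.Conv2d(2 * 64, 64, 3, padding=1)
curr, prev = torch.randn(2, 64, 128, 128), torch.randn(2, 64, 128, 128)
identity = torch.tensor([[1., 0., 0.], [0., 1., 0.]]).expand(2, -1, -1)   # no ego motion
fused = fuse(torch.cat([curr, warp_prev_bev(prev, identity)], dim=1))
```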
BEV provides a physics-interpretable way to fuse information from different views, modalities, time series, and agents.
• Sensor fusion
Multi-task Fusion framework in BEVFusion
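Once both modalities are expressed on the same BEV grid, sensor fusion can be as simple as channel concatenation plus a small convolutional fuser. The sketch below assumes typical camera/LiDAR channel counts and is not the official BEVFusion implementation.

```python
import torch
import torch.nn as nn

class BEVModalityFuser(nn.Module):
    def __init__(self, cam_channels=80, lidar_channels=256, out_channels=256):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(cam_channels + lidar_channels, out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, cam_bev, lidar_bev):
        # cam_bev: (B, 80, H, W) from the camera view transform;
        # lidar_bev: (B, 256, H, W) from the voxelised point cloud, same BEV grid.
        return self.fuse(torch.cat([cam_bev, lidar_bev], dim=1))

fused = BEVModalityFuser()(torch.randn(2, 80, 180, 180), torch.randn(2, 256, 180, 180))
```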
BEV provides a physics-interpretable way to fuse information from different views, modalities, time series, and agents.
• V2X collaboration
UniBEV
View transformation plays a vital role in camera-only 3D perception, from Perspective View (PV) to BEV.
Copied from the survey paper “Delving into the Devils of Bird’s-eye-view Perception”
Current BEV approaches can be divided into two main categories based on view transformation:
geometry-based and network-based;
Copied from the survey paper “Delving into the Devils of Bird’s-eye-view Perception”
geometry-based
network-based
In geometry-based methods, earlier work applies a homography based on the flat-ground constraint.
Sim2Real for BEV Segmentation
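A minimal IPM example under the flat-ground assumption, using OpenCV; the pixel/BEV point correspondences and the image path are made up for illustration.

```python
import cv2
import numpy as np

# Four points on the road surface in the image (pixels), and where they land on
# the flat ground plane expressed in BEV-map pixels. All values are illustrative.
src = np.float32([[520, 480], [760, 480], [1100, 700], [180, 700]])
dst = np.float32([[300, 100], [340, 100], [340, 500], [300, 500]])

H, _ = cv2.findHomography(src, dst)               # 3x3 homography, valid only on the z = 0 plane
image = cv2.imread("front_camera.jpg")            # hypothetical front-camera frame
bev_image = cv2.warpPerspective(image, H, (640, 600))   # top-down (bird's-eye) view
```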
The state-of-the-art solution among geometry-based approaches lifts 2D features into 3D space by explicit or implicit depth estimation, i.e. depth-based methods (point-based or voxel-based).
Lift, Splat, Shoot (LSS)
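A sketch of the LSS "lift" step with assumed shapes: each image feature predicts a categorical depth distribution, and the outer product of depth probabilities and context features yields a frustum of 3D features that is later "splatted" (pooled) onto the BEV grid.

```python
import torch
import torch.nn as nn

class LiftHead(nn.Module):
    def __init__(self, in_channels=256, context_channels=64, depth_bins=41):
        super().__init__()
        self.depth_bins = depth_bins
        self.head = nn.Conv2d(in_channels, depth_bins + context_channels, kernel_size=1)

    def forward(self, feat):                                  # feat: (B, 256, Hf, Wf)
        out = self.head(feat)
        depth = out[:, :self.depth_bins].softmax(dim=1)       # (B, D, Hf, Wf) depth distribution
        context = out[:, self.depth_bins:]                    # (B, C, Hf, Wf) context features
        # Outer product: every pixel's feature is spread along its viewing ray.
        frustum = depth.unsqueeze(1) * context.unsqueeze(2)   # (B, C, D, Hf, Wf)
        return frustum  # the "splat" step then sum-pools frustum cells into BEV pillars

frustum = LiftHead()(torch.randn(1, 256, 16, 44))             # -> (1, 64, 41, 16, 44)
```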
In network-based methods, the straightforward idea is to use an MLP in a bottom-up strategy to project the PV features to BEV;
Fishing Net for Semantic Segmentation
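A toy sketch of the bottom-up MLP view transform in the spirit of such methods (layer sizes and shapes are assumptions): a fully connected mapping learns a fixed PV-to-BEV re-arrangement, applied per feature channel.

```python
import torch
import torch.nn as nn

class MLPViewTransform(nn.Module):
    def __init__(self, pv_hw=(32, 88), bev_hw=(100, 100)):
        super().__init__()
        self.bev_hw = bev_hw
        # Learns a fixed spatial re-arrangement from PV cells to BEV cells.
        self.mlp = nn.Sequential(
            nn.Linear(pv_hw[0] * pv_hw[1], 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, bev_hw[0] * bev_hw[1]),
        )

    def forward(self, pv_feat):                    # (B, C, Hp, Wp)
        B, C, _, _ = pv_feat.shape
        bev = self.mlp(pv_feat.flatten(2))         # (B, C, Hb*Wb), per-channel mapping
        return bev.view(B, C, *self.bev_hw)

bev = MLPViewTransform()(torch.randn(2, 64, 32, 88))   # -> (2, 64, 100, 100)
```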
Another framework in network-based BEV employs a top-down strategy: it directly constructs BEV queries and searches for the corresponding features on PV images with a cross-attention mechanism, i.e. a transformer (with either sparse or dense queries).
Ego3RT: Ego 3D Representation
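A generic sketch of the top-down, query-based view transform: a grid of learnable BEV queries cross-attends to flattened PV features via a standard transformer layer. This is illustrative only, not the actual BEVFormer or Ego3RT code.

```python
import torch
import torch.nn as nn

class BEVQueryTransform(nn.Module):
    def __init__(self, bev_hw=(50, 50), dim=256, heads=8):
        super().__init__()
        self.bev_hw = bev_hw
        # One learnable query per BEV cell (dense queries).
        self.queries = nn.Parameter(torch.randn(bev_hw[0] * bev_hw[1], dim))
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(inplace=True), nn.Linear(4 * dim, dim))

    def forward(self, pv_feat):                      # pv_feat: (B, dim, Hp, Wp)
        B = pv_feat.shape[0]
        kv = pv_feat.flatten(2).transpose(1, 2)      # (B, Hp*Wp, dim) keys/values from PV features
        q = self.queries.unsqueeze(0).expand(B, -1, -1)
        bev, _ = self.cross_attn(q, kv, kv)          # each BEV cell gathers evidence from PV
        bev = bev + self.ffn(bev)
        return bev.transpose(1, 2).reshape(B, -1, *self.bev_hw)   # (B, dim, Hb, Wb)

bev = BEVQueryTransform()(torch.randn(1, 256, 32, 88))            # -> (1, 256, 50, 50)
```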
Though built on a hard flat-ground assumption, homography-based methods have good interpretability, where IPM (inverse perspective mapping) performs image projection or feature projection for downstream perception tasks;
Depth-based methods are usually built on an explicit 3D representation: quantized voxels or point clouds (like pseudo-LiDAR) scattered in continuous 3D space.
• Point-based methods suffer from model complexity and lower performance;
• Voxel-based methods are popular due to their computational efficiency and flexibility.
MLP-based view transform is hard due to the lack of depth info, occlusion, etc.;
Transformer-based view transform, with either sparse (detection) or dense (map segmentation as well) queries, gains impressive performance thanks to strong relation modeling and its data-dependent property, but efficiency is still a problem.
• Backbone (RegNet) / Bottleneck (FPN)
• Shared backbone or not?
• Auxiliary task design, multi-stage training
To apply BEV to autonomous driving, a data closed loop needs to be built:
• Data selection is performed on both the vehicle side and the server side: data is selected on the vehicles based on rough rules, like shadow mode, abnormal driving operations, or specific scenario detection, and then the collected data at the server selectively goes to annotation and training based on AI rules, such as active learning;
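One possible "AI rule" for server-side selection is uncertainty-based active learning; the scoring function below (mean entropy of detection class scores per clip) is an assumed example, not a specific production pipeline.

```python
import numpy as np

def clip_uncertainty(class_probs):
    """class_probs: (N_detections, N_classes) softmax scores for one clip."""
    if len(class_probs) == 0:
        return 0.0
    entropy = -(class_probs * np.log(class_probs + 1e-8)).sum(axis=1)
    return float(entropy.mean())

def select_for_annotation(clips, budget=100):
    """clips: iterable of (clip_id, class_probs); returns the clip ids to send to annotation."""
    ranked = sorted(clips, key=lambda c: clip_uncertainty(c[1]), reverse=True)
    return [clip_id for clip_id, _ in ranked[:budget]]

# Example: two clips, the more uncertain one is selected first.
clips = [("clip_a", np.array([[0.98, 0.01, 0.01]])), ("clip_b", np.array([[0.4, 0.35, 0.25]]))]
print(select_for_annotation(clips, budget=1))   # -> ['clip_b']
```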
To apply BEV to autonomous driving, a data closed loop needs to be built:
• A big model (offline, non-real-time) for BEV runs only at the server, where a transformer network with dense queries is used for the view transform;
Haomo.AI (毫末)
To apply BEV to autonomous driving, a data closed loop needs to be built:
• A light model (online, real-time) for BEV is deployed on the vehicle, where a voxel-based view transform with depth supervision is designed;
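A sketch of the depth supervision idea for LSS-style view transforms (popularised by BEVDepth-like methods): the predicted per-pixel depth distribution is trained against depth bins obtained by projecting LiDAR points into the image. The bin edges and shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def depth_supervision_loss(pred_depth_logits, lidar_depth, d_min=2.0, d_max=58.0, bins=56):
    """pred_depth_logits: (B, D=bins, H, W) raw depth-bin logits from the view transform.
    lidar_depth: (B, H, W) metric depth from projected LiDAR points, 0 where there is no return."""
    target = ((lidar_depth - d_min) / (d_max - d_min) * bins).long().clamp(0, bins - 1)
    valid = (lidar_depth > 0).float()                                     # supervise only LiDAR hits
    loss = F.cross_entropy(pred_depth_logits, target, reduction="none")   # (B, H, W)
    return (loss * valid).sum() / valid.sum().clamp(min=1.0)

loss = depth_supervision_loss(torch.randn(2, 56, 32, 88), torch.rand(2, 32, 88) * 60)
```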
To apply BEV to autonomous driving, a data closed loop needs to be built:
• BEV data annotation is specific due to the innate 3-D structure, captured either from a 3-D sensor (LiDAR)
NuScenes
To apply BEV to autonomous driving, a data closed loop needs to be built:
• BEV data annotation is specific due to the innate 3-D structure, captured either from a 3-D sensor (LiDAR) or from 3-D visual reconstruction of camera data;
[Figure: Tesla’s auto-labeling pipeline — images, IMU, odometry and GPS feed a big neural net model that outputs segmentation, depth, flow, elevation, the static background & ego trajectory, and moving objects & kinematics]
To apply BEV to autonomous driving, a data closed loop needs to be built:
• A simulation platform is used for photo-realistic image data synthesis, digital twin (real-to-sim), scenario generalization, and style transfer (sim-to-real);
Google Block-NeRF | Simulation with ground truth (Carla Simulator) | Nvidia Omniverse
To apply BEV to autonomous driving, a data closed loop needs to be built:
• A teacher-student training framework supports knowledge distillation in BEV model training and deployment.
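A minimal sketch of BEV-level teacher-student distillation, assuming feature-map imitation with an adapter convolution and an MSE loss; the actual choice of distillation target and loss is design-specific.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def bev_distillation_loss(student_bev, teacher_bev, adapter, weight=1.0):
    """student_bev: (B, Cs, H, W) from the light onboard model;
    teacher_bev: (B, Ct, H, W) from the big offline model;
    adapter: 1x1 conv mapping student channels Cs -> teacher channels Ct."""
    aligned = adapter(student_bev)
    return weight * F.mse_loss(aligned, teacher_bev.detach())   # teacher is frozen

adapter = nn.Conv2d(64, 256, kernel_size=1)
loss = bev_distillation_loss(torch.randn(2, 64, 128, 128), torch.randn(2, 256, 128, 128), adapter)
# total_loss = task_loss + loss
```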
The BEV network is a new paradigm for computer vision, showing strong potential in autonomous driving applications;
BEV network design depends on the computing platform, either the server side or the client side (the vehicle in the ADS);
The data closed loop is a must for autonomous driving R&D, where BEV needs to pay more attention to data selection and annotation;
A simulation platform can relieve the burden of BEV data annotation with state-of-the-art techniques like photorealistic rendering, digital twin, scenario generalization and style transfer;
To optimize BEV deployment, knowledge distillation helps trade off performance against computational complexity.
Questions?
