"Can We Have Both Safety and Performance in AI for Autonomous Vehicles?," a Presentation from Codeplay Software

© 2019 Codeplay Software Ltd
Can We Have Both Safety and Performance in AI for Autonomous Vehicles?
Andrew Richards
Codeplay
May 2019

Outline
About Codeplay
What is functional safety?
What does an automotive AI system look like in terms of architecture?
=> The wide variety of compute-intensive algorithms
Why do we need high performance for safety?
And why accelerators are the only way to get to high performance
Requirements for safe engineering
Challenges in bringing existing CPU safe engineering practices to
accelerators

Functional safety
Safety doesn’t mean the system doesn’t fail
➢ Safety means the system fails safely
How do you know if the system fails?
➢ You have to detect the failure with a high level of accuracy
➢ Both incorrect results and late results are a failure
What do you do if the system fails?
➢ You have to come up with a safe state to return to

Functional Safety
“Absence of unreasonable risk due to hazards caused by malfunctioning behavior of electrical/electronic systems”
The standard requires the development of the product to be “State of the Art”
Functional safety lifecycle: top-down approach from vehicle to IPs & SW components
Safety compliance from project initiation to project decommissioning

Safety failure types
Systematic Failures:
• Result from a failure in design or manufacturing
• Often a result of failure to follow best practices
• Rate of systematic failures can be reduced through continual and rigorous process improvement
Random Failures:
• Result from random defects inherent to process or usage condition
• Rate of random failures cannot generally be reduced; focus must be on the detection and handling of random failures in the application

SOTIF: Safety Of The Intended Function
Systems or subsystems can cause hazards through erroneous decisions about the environment, not necessarily through malfunction of electrical/electronic components (which is addressed by ISO 26262)
SOTIF answers the question “How do you intend to behave?” by applying the PAS guidance on design, verification and validation.
SOTIF intends to address sensor limitations (e.g. bad reflection, snow), decision algorithms (environment, location, highway construction etc.) and misuse by drivers
[Figure: scenario matrix with axes Known/Unknown and Unsafe/Safe, divided into areas 1 to 4]
Reduction of scenarios in areas 2 and 3 is the key, by developing them onto known scenarios
SAE levels for autonomous vehicles
[Figure labels: SOTIF ISO 21448; SOTIF PAS 21448]
Level 5: Fully autonomous
Level 4: Deep self control
Level 3: Limited overall control
Level 2: Execute automated manoeuvres
Level 1: Adaptive assist
Level 0: Warnings

Safety of Autonomous Driving needs
High Performance
- High Performance makes Safety Hard

From sensing to control
[Diagram: Deep learning (front-camera), Machine vision and SLAM (surround cameras), LIDAR and RADAR feed Sensor fusion, then Path planning, then Car control]
Redundancy is achieved by having multiple, independent sensors and perception algorithms combined via sensor fusion

Performance cannot be achieved with CPUs
[Diagram: processing pipeline, listed from output back to input]
Car control
Path planning
Object trajectory tracking / prediction
Sensor fusion: combine all the data together and check
3D mapping: 250 million cells updated per frame / sensor
Semantic segmentation: 1.5-7.5 TOPS for each deep learning algorithm
Frame capture: 625 million pixels per second
Camera
Far beyond the processing power of a multi-core CPU
This level of processing can only be achieved with a different AI accelerator designed for each class of algorithm and sensor
Passive (fanless) cooling requires no more than 8 W-15 W per processor
Adding a fan is a safety challenge, as well as adding a lot of cost

Types of AI accelerator
Deep learning inference accelerator
• Fixed-point precision (8-bit or 16-bit)
• Can execute fast convolutions and some basic CNN layers
• Very high performance, but low programmability
Programmable accelerator (vision tasks e.g. SLAM)
• Mix of programmable and fixed-function
• Mix of fixed-point and floating-point
• Highly data parallel with on-chip memory
• Throughput optimized
Sensor fusion accelerator
• Very programmable
• Floating-point
• On-chip memory and caches
• Latency optimized
• Complex algorithms
Fixed-function accelerator
• Simpler LIDAR and Radar processing
• Some machine vision tasks, e.g. scaling

Requirements for safety

Requirements for safe engineering
• Redundancy (multiple systems)
• Fault detection (both timing and accuracy)
• Fault handling
• Fault injection (to test fault detection &
handling)
• Coverage checking (to ensure test coverage)
• Coding guidelines (e.g. MISRA)
• Little or no dynamic memory management
How do we bring these capabilities to accelerators?

Redundancy: Systematic vs Random Faults
This architecture allows Processor #1 to fail
and Processor #2 to take over
But: what if the reason Processor #1 fails is
a fault that also applies to Processor #2?
➢ e.g. software failure in software that both
Processor #1 and Processor #2 run
A random fault may be solvable with two
identical redundant systems
But a systematic fault can only be solved
with two fundamentally different redundant
systems
[Diagram: Sensor feeds Processor #1 and Processor #2, whose outputs go to Fusion]
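The difference between identical and diverse redundancy can be sketched in a few lines. This is an illustration only: `mean_v1` and `mean_v2` are hypothetical stand-ins for two independently developed implementations, `mean_buggy` is an invented systematic fault, and `fuse` is an assumed comparison step rather than anything taken from the slides.

```python
def mean_v1(samples):
    """Team #1: sum, then divide."""
    return sum(samples) / len(samples)


def mean_v2(samples):
    """Team #2: independently written running-mean formulation."""
    mean = 0.0
    for count, value in enumerate(samples, start=1):
        mean += (value - mean) / count
    return mean


def mean_buggy(samples):
    """A systematic fault: an off-by-one in the divisor."""
    return sum(samples) / (len(samples) - 1)


def fuse(result_a, result_b, tolerance=1e-6):
    """Accept the result only if both channels agree within tolerance."""
    if abs(result_a - result_b) <= tolerance:
        return ("ok", result_a)
    return ("fault", None)  # caller must move to a safe state


samples = [2.0, 4.0, 6.0, 8.0]
status, value = fuse(mean_v1(samples), mean_v2(samples))

# Two identical copies of the faulty code agree with each other on a wrong answer.
identical = fuse(mean_buggy(samples), mean_buggy(samples))
# A fundamentally different second implementation exposes the disagreement.
diverse = fuse(mean_buggy(samples), mean_v2(samples))
```

Here `identical` reports "ok" even though the value is wrong, while `diverse` reports a fault, which is the point the slide makes about systematic faults.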

Redundancy
Redundancy is much easier to achieve with sensors and perception than
sensor fusion, planning and control
Redundancy from two identical systems does not solve systematic faults,
only transient faults
Standard programming models make redundancy much easier to achieve: it is much easier to mix and match components from different suppliers to avoid systematic faults
Standard programming models also make it much easier to integrate tools from multiple vendors, e.g. static checkers or memory checkers
The OpenCL SC (“Safety Critical”), Vulkan SC and SYCL SC working-groups
are working towards defining safer versions of these standards

Fault detection
Timing faults can be detected with a watchdog timer
All operations must have a maximum timeout
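A deadline check of this kind might look like the sketch below. It uses Python's standard `concurrent.futures` on a CPU thread purely to illustrate the idea; `run_with_deadline`, `fast_operation` and `slow_operation` are hypothetical names, and a real watchdog would normally be a hardware timer.

```python
import concurrent.futures
import time


def run_with_deadline(operation, timeout_s):
    """Run operation on a worker thread; report a timing fault if it is late."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(operation)
    try:
        return ("ok", future.result(timeout=timeout_s))
    except concurrent.futures.TimeoutError:
        # The worker is NOT stopped here; it keeps running in the background.
        return ("timing_fault", None)
    finally:
        pool.shutdown(wait=False)


def fast_operation():
    return 42


def slow_operation():
    time.sleep(0.5)  # stands in for a scene that needs too much processing
    return 42


on_time = run_with_deadline(fast_operation, timeout_s=1.0)
late = run_with_deadline(slow_operation, timeout_s=0.05)
```

Note that detecting the late result does not stop the late work, which is the same difficulty the later slides describe for accelerator threads.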
The quantity of processing required by a perception algorithm can vary with the scene: e.g. the more potential pedestrians are discovered, the more regions of an image pedestrian classification must run on
One solution is to periodically pass known input data into each algorithm
and check it against known correct output data
The algorithms used must be deterministic (always give the same outputs
for the same inputs) which is not true of all parallel algorithms
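Why determinism matters for a known-answer check can be shown with floating-point addition, whose result depends on the order of the operations. In the sketch below, `sum_in_order` and `self_test` are hypothetical helpers; the two orderings stand in for a parallel reduction that schedules its additions differently from run to run.

```python
def sum_in_order(values, order):
    """Add values in the given index order, as one reduction schedule would."""
    total = 0.0
    for index in order:
        total += values[index]
    return total


def self_test(algorithm, known_input, known_output):
    """Known-answer test: exact comparison against a stored golden result."""
    return algorithm(known_input) == known_output


values = [0.1, 0.2, 0.3]
left_to_right = sum_in_order(values, [0, 1, 2])  # (0.1 + 0.2) + 0.3
other_order = sum_in_order(values, [1, 2, 0])    # (0.2 + 0.3) + 0.1

golden = left_to_right  # recorded once with a fixed schedule

passes = self_test(lambda v: sum_in_order(v, [0, 1, 2]), values, golden)
false_alarm = not self_test(lambda v: sum_in_order(v, [1, 2, 0]), values, golden)
```

With a fixed order the self-test passes; with a different order it fails even though nothing is broken, so a non-deterministic algorithm would need either a fixed reduction order or a tolerance-based comparison.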

Fault handling
Handling faults in highly parallel software is a surprisingly tough challenge
Faults detected asynchronously need to be stored somewhere and then
processed. They can’t be handled immediately without consuming
resources asynchronously. This is a safety challenge
For massively parallel software, large numbers of faults could be created at
once: how to handle?
Most parallel programming models handle faults very badly. It’s a much
harder challenge than people expect
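One possible way to store asynchronously detected faults without allocating at fault time is a fixed-capacity log with an overflow counter. The sketch below is an assumption, not a design from the talk: `FaultLog` is a hypothetical class, and it is not thread-safe as written (a real implementation would need atomic operations).

```python
class FaultLog:
    """Fixed-capacity fault store, allocated once at start-up."""

    def __init__(self, capacity):
        self._slots = [None] * capacity  # preallocated; never grows
        self._count = 0
        self.dropped = 0  # faults that did not fit

    def record(self, source, code):
        """Store a fault if there is room; otherwise only count it."""
        if self._count < len(self._slots):
            self._slots[self._count] = (source, code)
            self._count += 1
            return True
        self.dropped += 1
        return False

    def drain(self):
        """Hand the stored faults to the handler and reset the log."""
        faults = self._slots[: self._count]
        for index in range(self._count):
            self._slots[index] = None
        self._count = 0
        return faults


log = FaultLog(capacity=4)
for work_item in range(1000):  # e.g. every work-item reports the same fault
    log.record(source=work_item, code="OUT_OF_RANGE")
pending = log.drain()
```

The log keeps the first four faults and counts the other 996, so a burst of faults from massively parallel code cannot exhaust memory while the handler is still told that faults were lost.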

Multi-threaded error handling
• Errors triggered on an accelerator are asynchronous
• Error handling can’t be executed on the accelerator
• When does the main CPU thread process error(s)?
[Diagram: timeline of the Main CPU Thread, the Accelerator Handler CPU Thread and the Accelerator ‘Thread’, showing Offload, Run kernel and Error. Accelerator threads are grouped.]
Diagram callouts:
• Where does this thread store the error?
• This thread waits for the accelerator to complete. Is that fast enough to process the error?

Pre-emption and independent forward progress
• Most accelerators are groups of SIMD/SIMT
units: this gives high performance per Watt
• “Single Instruction Multiple Data/Thread”
• This means each thread executes the same
instruction in “lock-step”
• Some threads may be inside the false branch of a conditional: they are “predicated” so that the effects of instructions are not applied until the conditional ends
• This means that if one “thread” in a group
goes into an infinite loop, the others will also
pause indefinitely
• The accelerator does not complete until all
groups complete
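The lock-step behaviour described above can be modelled with a toy simulation. This is a simplified sketch, not a model of any particular hardware: `run_group` is a hypothetical function, each lane runs a loop with its own trip count, and `None` stands for an infinite loop.

```python
def run_group(trip_counts, cycle_budget):
    """Simulate one SIMT group in which lane i loops trip_counts[i] times.

    All lanes share one program counter, so the group stays busy while ANY
    lane is still active. Returns (finished, cycles_used).
    """
    remaining = list(trip_counts)
    cycles = 0
    while any(r is None or r > 0 for r in remaining):
        if cycles >= cycle_budget:
            return (False, cycles)  # group did not complete within the budget
        # One lock-step instruction: active lanes advance, finished lanes are masked off.
        remaining = [r if r is None or r == 0 else r - 1 for r in remaining]
        cycles += 1
    return (True, cycles)


healthy = run_group([3, 5, 2, 4], cycle_budget=1000)
stuck = run_group([3, 5, None, 4], cycle_budget=1000)
```

The healthy group finishes when its slowest lane does; the group with one looping lane never finishes, although three of its four lanes completed their work long before the budget ran out.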

Putting an accelerator in a safe state
With a CPU thread, you stop the thread by no
longer giving it CPU cycles
Stopping a CPU thread is instant. Stopping an
accelerator thread is not
Stopping one group of accelerator threads
doesn’t necessarily stop other groups of
threads
You can’t safely free accelerator-accessed
memory until all accelerator threads have
safely stopped. You can’t easily predict how
long this will take
Simple solution: Shut down the whole chip
[Diagram labels: Kill threads; Accelerator memory buffer]

Fault injection
Fault handling can only be tested properly if faults can be injected into the system during testing
Fault injection must happen asynchronously to be sure of finding bugs
Fault injection needs to work across multiple AI accelerators
Fault injection must be included in continuous test processes
Faults to consider:
• Transient hardware faults
• Overheating causing throttling
• Threading errors
• Algorithms taking an unusually long time to complete due to complex input data
We need fault injection tools for accelerators (e.g. NVIDIA SASSIFI)
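A very small software fault-injection campaign might look like the sketch below. It does not use or imitate SASSIFI's interface; `flip_bit`, `checksum` and `process` are hypothetical helpers, a transient hardware fault is modelled as a single bit flip, and detection is a simple checksum comparison.

```python
def flip_bit(value, bit):
    """Model a transient hardware fault as a single bit flip in an integer."""
    return value ^ (1 << bit)


def checksum(data):
    return sum(data) & 0xFFFF


def process(data, inject_at=None, bit=0):
    """Copy data through a 'processing stage', optionally injecting a fault.

    Returns (output, detected), where detected is the checksum comparison.
    """
    expected = checksum(data)
    output = list(data)
    if inject_at is not None:
        output[inject_at] = flip_bit(output[inject_at], bit)
    detected = checksum(output) != expected
    return output, detected


data = [10, 20, 30, 40]
_, clean_detected = process(data)

# Campaign: flip each of the low 8 bits of every element, one fault per run.
campaign = [
    process(data, inject_at=index, bit=bit)[1]
    for index in range(len(data))
    for bit in range(8)
]
detection_rate = sum(campaign) / len(campaign)
```

Running such a campaign as part of continuous testing gives a measured detection rate rather than an assumed one; a real campaign would also inject timing faults and multi-bit faults.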

Coverage checking
A standard ISO 26262 process is to require line coverage and condition coverage for test suites.
This checks that each line (or condition combination) is exercised by the test suite
Commonly supported on CPUs, but what about AI accelerators?
The compilers for AI accelerators typically perform transformations, such as the data-parallel vectorization used with GPUs, that significantly change the control flow of the program relative to the source code
How do we define coverage-checking for AI accelerators?

Coverage checking in a heterogeneous environment
Coverage checking is a way of applying a metric to a test-suite: does the
test-suite test every line in a program?
Stricter coverage checking ensures every condition in a conditional is also
tested
In a heterogeneous environment, a single source line may be compiled for
different accelerator cores
Each accelerator core may execute the source line in a slightly different way
• How do we define coverage in an accelerator model?
• How do we test coverage in an accelerator model?
• If a SIMT compiler has transformed code, what does coverage mean?
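The last question can be made concrete with a toy example. Below, `traced_clamp` records which source-level paths a test executes, while `clamp_predicated` is a hand-written illustration of the kind of branch-free form a vectorizing compiler might produce; it is an assumption for illustration, not the output of any real compiler.

```python
executed = set()


def traced_clamp(x, limit):
    """Source-level view: a branch with two paths that a test suite must cover."""
    executed.add("compare")
    if x > limit:
        executed.add("then")
        return limit
    executed.add("else")
    return x


def clamp_predicated(x, limit):
    """Branch-free view: both sides are evaluated, then a select picks one."""
    taken = int(x > limit)                   # predicate as 0 or 1
    return taken * limit + (1 - taken) * x   # runs in full for every input


traced_clamp(1, 5)  # a test suite with a single input below the limit
source_paths_covered = set(executed)

same_results = all(
    traced_clamp(x, 5) == clamp_predicated(x, 5) for x in range(-3, 10)
)
```

With the single test input, source-level coverage reports the "then" path as untested, yet in the predicated form every instruction has already run; the two views give different answers to "is this code covered?".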

Coding guidelines: MISRA C++
Standardized coding guidelines for writing safe software. Can be checked
with source code static checker tools
• Originated by the automotive industry, for the automotive industry
• But is applicable to any industry that requires high-integrity software
• Originally, MISRA suggested (in its vision) its use in safety-related software
• But it now suggests (in its vision) its applicability to any application with high integrity or high reliability requirements
The MISRA C++ group is updating the MISRA C++ standard to support accelerator programming, in collaboration with AUTOSAR. It is being written as an update to MISRA C++ 2008. This is where the AI and SYCL accelerator support will go for autonomous driving coding guidelines

Dynamic memory management
Accelerator programming models rely extensively on dynamic memory
management
This is a real challenge for AI accelerators: how to define a standard way of
statically-allocating memory for AI acceleration
How to free memory safely in a fault situation
How to isolate different safety domains in a program without corruption
between memory allocated in different safety domains
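One common alternative to dynamic allocation is a pool whose buffers are all created at start-up. The sketch below shows the idea in Python for readability only; `StaticPool` is a hypothetical class, not part of any accelerator programming standard, and a production version would be written against the target's own memory model.

```python
class StaticPool:
    """Fixed number of fixed-size buffers, all created at initialisation."""

    def __init__(self, block_count, block_size):
        self._blocks = [bytearray(block_size) for _ in range(block_count)]
        self._free = list(range(block_count))

    def acquire(self):
        """Return (handle, buffer), or None when the pool is exhausted."""
        if not self._free:
            return None
        handle = self._free.pop()
        return handle, self._blocks[handle]

    def release(self, handle):
        """Return a buffer to the pool, rejecting a double release."""
        if handle in self._free:
            raise ValueError("double release")
        buffer = self._blocks[handle]
        buffer[:] = bytes(len(buffer))  # scrub so stale data cannot leak between users
        self._free.append(handle)


pool = StaticPool(block_count=2, block_size=16)
first = pool.acquire()
second = pool.acquire()
exhausted = pool.acquire()  # None: the worst case is known up front
first[1][0] = 0xFF
pool.release(first[0])
reused = pool.acquire()
```

Exhaustion becomes an explicit, testable condition rather than an allocation failure at an unpredictable moment, and giving each safety domain its own pool is one way to keep their memory apart.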

Accelerator memory management
• Accelerators have a much more direct view of memory than a CPU
• The simplest approach is pinned memory: at a known physical address
• Accelerators have much simpler memory protection than a CPU
[Diagram: CPU, Operating System with virtual memory management system, Physical Memory (e.g. DDR), Storage (e.g. hard disk), and Accelerator]
(There may be a memory management unit at the accelerator, but it is usually much simpler than for a CPU)

Virtualization
Virtualization is well-defined for CPUs and can contribute to safety isolation
But for accelerators, virtualization is not clearly defined
Can’t switch instantly between accelerator threads. Can’t shut down an accelerator thread instantly. Memory protection isn’t the same as on a CPU
[Diagram: CPU with Hypervisor (“Virtualization goes here”), Operating System and virtual memory management system in front of Physical Memory (e.g. DDR) and Storage (e.g. hard disk); beside it the Accelerator (“How does virtualization go here?”)]

Pragmatic solutions
Package non-safety-qualified systems via decomposition
• ISO 26262 defines “Quality Managed” (“QM”)
• These systems can adopt the latest technologies without being developed to full safety standards
• We can wrap QM systems inside ASIL systems
• We need to monitor the running of the system and be able to shut down a faulty system
• Requires ability to detect failures
[Diagram: Safety monitoring for AI: Input, QM AI System, Output, supervised by an ASIL B Monitoring system]
Build full safety-qualified systems
• Build from the ground up: Safe RTOS that supports accelerators
• Safe programming models
• Safety analysis tools
• Independent testing and validation
[Diagram: Safe heterogeneous programming tools on a Safe RTOS, running on CPU and AI Accelerator]
Multiple, independent, redundant systems
• If we independently develop systems to perform specific tasks, we can achieve fully safe redundancy
[Diagram: System #1 (Dev Team #1) and System #2 (Dev Team #2) feed Combine & check results]
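The first of these options, a QM system wrapped by a safety monitor, can be sketched as follows. Everything here is hypothetical: the output fields, the plausibility limits and `SAFE_OUTPUT` are invented for illustration, and a real monitor would also enforce deadlines.

```python
SAFE_OUTPUT = {"steering_deg": 0.0, "brake": 1.0}  # hypothetical safe state


def plausible(output):
    """Simple range checks that the monitor can verify independently."""
    return (
        isinstance(output, dict)
        and -30.0 <= output.get("steering_deg", float("nan")) <= 30.0
        and 0.0 <= output.get("brake", float("nan")) <= 1.0
    )


def monitored(qm_system, sensor_input):
    """Run the QM system; substitute the safe state on any detected failure."""
    try:
        output = qm_system(sensor_input)
    except Exception:
        return SAFE_OUTPUT, "fault: exception"
    if not plausible(output):
        return SAFE_OUTPUT, "fault: implausible output"
    return output, "ok"


def good_ai(sensor_input):
    return {"steering_deg": 2.5, "brake": 0.0}


def bad_ai(sensor_input):
    return {"steering_deg": 400.0, "brake": 0.0}


def crashing_ai(sensor_input):
    raise RuntimeError("accelerator error")


results = [monitored(ai, sensor_input=None) for ai in (good_ai, bad_ai, crashing_ai)]
```

The monitor is deliberately much simpler than the system it wraps, which is what makes it practical to develop the monitor to a higher integrity level than the AI component.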

Summary
1. We need to use a range of AI accelerators to achieve AI in automotive.
• We can’t just assume CPU safety processes can easily transfer to accelerators
• We need all the tools we have for safety on CPUs brought to accelerators
2. There are a lot of unexpected challenges
3. Standards are critical for building out these tools and ecosystem
• There are industry-wide standards being developed, but we need to get more
people involved to deliver safe solutions

About Codeplay
Accelerator silicon enablement
• OpenCL and Vulkan implementations with ComputeAorta product for customers’ processors
• Custom LLVM compiler back-ends and runtime drivers
• Accelerator processor optimizations
Open accelerator ecosystem
• Open standards and open-source ecosystem for AI acceleration
• SYCL ecosystem: the open alternative ecosystem to CUDA
• TensorFlow, Eigen
• SYCL-BLAS, SYCL-DNN, SYCL-M
• Open-source accelerator libraries: clSPV, SPIR-V tools
Automotive AI tools
• Support for Renesas R-Car and Imagination Technologies PowerVR
• Optimized SYCL-BLAS and SYCL-DNN libraries for automotive AI processors
• Profiler to analyse performance
• Working towards ISO 26262 ASIL B standards-based acceleration
70+ expert AI and graphics acceleration engineers in Edinburgh, Scotland, UK
Ready to provide all the tech & services to deliver ground-breaking AI technologies

Resource
SYCL standard & ecosystem: http://sycl.tech/
MISRA (MISRA C and C++ standards body): https://www.misra.org.uk/
Codeplay automotive tools: https://developer.codeplay.com/home/
Codeplay booth: See our tools on Renesas and Imagination Technologies ADAS accelerator processors
Khronos Workshop at EVS: Will cover OpenVX, Vulkan, OpenCL, NNEF and SYCL in much more detail. Thursday May 23rd, 9am-5pm
https://www.khronos.org/events/2019-embedded-vision-summit

Backup

Tesla FSD chip
         mm²     GOPS
GPU      40.9    600
CPU      22.1    211
NNA      15.4    72,000
SRAM     67.6
Cache    18.6
Total    260     72,811
NNA: Fast, low-precision convolutions
SRAM: Needed to keep processors supplied with data
CPU: Highly general-purpose at lower performance
GPU: Most of the programmable performance
https://www.youtube.com/watch?v=Ucp0TTmvqOE

From sensing to control
[Diagram: pipeline from Camera through Frame capture, Semantic segmentation, 3D mapping, Sensor fusion, Trajectory tracking and Path planning to Car control]
• These systems typically operate at 15-25 frames per second (depending
on maximum speed and safety requirements)
• Roughly 8 input frames are required to make a processing decision
• Includes tracking movement over several frames
• Includes pipelining for higher throughput
• At 70mph (112 km/h), braking distance is 75 m and “thinking distance”
(for a human) is 21 m, or 1.5 seconds
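To put the frame rates above into distance terms, the sketch below computes how far the vehicle travels while 8 frames are captured. It is a back-of-the-envelope calculation under stated assumptions: constant speed of 112 km/h, a decision latency of exactly 8 frame periods, and no allowance for processing time. The function names are hypothetical.

```python
SPEED_KMH = 112.0
FRAMES_PER_DECISION = 8


def decision_latency_s(frames_per_second, frames=FRAMES_PER_DECISION):
    """Time needed to capture the frames that feed one decision."""
    return frames / frames_per_second


def distance_travelled_m(seconds, speed_kmh=SPEED_KMH):
    """Distance covered at constant speed (km/h converted to m/s)."""
    return speed_kmh / 3.6 * seconds


for fps in (15, 25):
    latency = decision_latency_s(fps)
    print(f"{fps} fps: {latency:.2f} s, {distance_travelled_m(latency):.1f} m travelled")
```

Under these assumptions the vehicle covers roughly 10 m at 25 frames per second and roughly 17 m at 15 frames per second before a decision based on 8 frames is available.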

Frame capture
[Diagram: processing pipeline from Camera to Car control]
If a camera can capture a complete view of a 2 m pedestrian at 2 m distance, then a pedestrian at 100 m distance will cover no more than 1/50th of the height of the image, or 1/2,500th of the area of the image.
[Figure: a 2 m pedestrian seen at 2 m and at 100 m]
If an algorithm needs a pedestrian to be 100 pixels tall to recognize it, the camera must be 25 megapixels to recognize a pedestrian at 100 m, which is required to drive at 70 mph
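The 25-megapixel figure can be reproduced under two assumptions that the slide does not state explicitly: that "100 pixels" means 100 pixels of height, and that the image is square. The function name below is hypothetical.

```python
def required_image_height_px(pixels_on_target, range_m, full_frame_distance_m):
    """Image height needed so the target still spans pixels_on_target rows."""
    # Apparent height scales with 1/distance, so at range_m the target fills
    # full_frame_distance_m / range_m of the image height.
    return pixels_on_target * range_m / full_frame_distance_m


height_px = required_image_height_px(100, range_m=100.0, full_frame_distance_m=2.0)
megapixels = height_px * height_px / 1e6        # assumes a square sensor
pixels_per_second = height_px * height_px * 25  # at 25 frames per second
```

This gives a 5,000-pixel image height and 25 megapixels; at 25 frames per second that is 625 million pixels per second, matching the frame-capture figure quoted earlier in the talk.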

Semantic segmentation
[Diagram: processing pipeline]
60-300 GFLOP per frame
At 25 fps = 1.5 TFLOPS to 7.5 TFLOPS, but inference can often be done in fixed point, which is measured in TOPS, not TFLOPS
Recurrent Segmentation for Variable Computational Budgets: L McIntosh, N Maheswaranathan, D Sussillo, J Shlens (Stanford University & Google Brain), arXiv:1711.10151v2 [cs.CV], 15 Mar 2018

3D Mapping
[Diagram: processing pipeline]
Each sensor (cameras, LIDAR, Radar) and each perception algorithm (deep learning, SLAM, point cloud, etc.) needs to generate a 3D map of the environment it detects and a list of objects (pedestrians, cars etc.) to track
A 100 m × 100 m × 10 m occupancy grid of 10 cm × 10 cm × 10 cm cells contains 100,000,000 cells updated every frame
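As a quick check of the grid arithmetic, the sketch below counts cells for a 100 m × 100 m × 10 m volume. The helper name is hypothetical and integer centimetres are used to avoid rounding. The 100,000,000 figure corresponds to 10 cm cells; 1 m cells would give only 100,000.

```python
def cell_count(extent_cm, cell_cm):
    """Number of cubic cells of edge cell_cm in a box of size extent_cm."""
    x, y, z = (side // cell_cm for side in extent_cm)
    return x * y * z


GRID_CM = (10_000, 10_000, 1_000)  # 100 m x 100 m x 10 m

cells_10cm = cell_count(GRID_CM, 10)
cells_1m = cell_count(GRID_CM, 100)
```

The factor of 1,000 between the two results shows how strongly the memory and update cost of an occupancy grid depends on the chosen cell size.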

Sensor Fusion
[Diagram: processing pipeline]
Sensor fusion combines data from all sensors and perception algorithms. It detects inconsistencies between different sensors to detect errors
This is where the redundancy in the sensors is used to achieve safety. But how do you achieve redundancy in the sensor fusion?
Needs to process all data from all perception algorithms combined

To achieve performance, create a pipeline
[Diagram: three copies of the pipeline Camera, Frame capture, Semantic segmentation, Object trajectory tracking/prediction, feeding Path planning]
• To achieve maximum throughput, this will be pipelined
• It can also take at least 3 frames to track movement
