SlideShare ist ein Scribd-Unternehmen logo
1 von 26
Downloaden Sie, um offline zu lesen
© 2021 The Khronos Group Inc.
Khronos Standards:
Powering the Future of
Embedded Vision
Neil Trevett, Khronos President
NVIDIA VP Developer Ecosystems
© 2021 The Khronos Group Inc.
Open, royalty-free interoperability
standards to harness the power of
GPU, multiprocessor and XR hardware
3D graphics, augmented and virtual
reality, parallel programming,
inferencing and vision acceleration
Non-profit, member-driven standards
organization, open to any company
Well-defined multi-company
governance and IP Framework
Founded in 2000
>150 Members ~ 40% US, 30% Europe, 30% Asia
Khronos Connects Software to Silicon
© 2021 The Khronos Group Inc.
Khronos Active Initiatives
3D Graphics
Desktop, Mobile
and Web
3D Assets
Authoring
and Delivery
Portable XR
Augmented and
Virtual Reality
Parallel Computation
Vision, Inferencing,
Machine Learning
Safety Critical APIs
© 2021 The Khronos Group Inc.
Khronos Compute Acceleration Standards
GPU
GPU rendering +
compute
acceleration
Heterogeneous
compute
acceleration
Single source C++ programming
with compute acceleration
Graph-based vision and
inferencing acceleration
Intermediate
Representation
(IR) supporting
parallel execution
and graphics
Higher-level
Languages and APIs
Streamlined development
and performance portability
GPU
FPGA DSP
Custom Hardware
GPU
CPU
CPU
CPU
AI/Tensor HW
Increasing industry interest in
parallel compute acceleration to
combat the ‘End of Moore’s Law’
SYCL and SPIR were
originally OpenCL projects
Lower-level
Languages and APIs
Direct Hardware Control
© 2021 The Khronos Group Inc.
OpenCL – Low-level Parallel Programing
Complements GPU-only APIs
Simpler programming model
Relatively lightweight run-time
More language flexibility, e.g., pointers
Rigorously defined numeric precision
OpenCL
Kernel
Code
OpenCL
Kernel
Code
OpenCL
Kernel
Code
OpenCL C
Kernel
Code
GPU
DSP
CPU
CPU
FPGA
OpenCL
Devices
Host
CPU
NN HW
Runtime OpenCL API to
compile, load and execute
kernels across devices
Programming and Runtime Framework
for Application Acceleration
Offload compute-intensive kernels onto parallel
heterogeneous processors
CPUs, GPUs, DSPs, FPGAs, Tensor Processors
OpenCL C or C++ kernel languages
Platform Layer API
Query, select and initialize compute devices
Runtime API
Build and execute kernels programs on multiple devices
Explicit Application Control
Which programs execute on what device
Where data is stored in memories in the system
When programs are run, and what operations are
dependent on earlier operations
© 2021 The Khronos Group Inc.
OpenCL Open-Source Project Momentum
Tripling in the last four years
© 2021 The Khronos Group Inc.
OpenCL is Widely Deployed and Used
Accelerated Implementations
Modo
Desktop Creative Apps
Math and Physics
Libraries
Vision, Imaging
and Video Libraries
The industry’s most pervasive, cross-vendor, open standard for
low-level heterogeneous parallel programming
Arm Compute Library
SYCL-DNN
Machine Learning
Libraries and Frameworks
TI DL Library (TIDL)
VeriSilicon
Xiaomi
clDNN
Intel
Intel
Synopsis
MetaWare EV
NNAPI
https://en.wikipedia.org/wiki/List_of_OpenCL_applications
Vegas Pro
ForceBalance
Molecular Modelling Libraries
Machine Learning
Compilers
Parallel
Languages
CLBlast
SYCL-BLAS
Linear Algebra
and FFT Libraries
VkFFT
© 2021 The Khronos Group Inc.
ML Compiler Steps
1.Import Trained
Network Description
2. Graph-level optimizations
e.g., node fusion, node
lowering and memory tiling
3. Decompose to primitive
instructions and emit programs
for accelerated run-times
Consistent Steps
Fast progress but still area of intense research
If compiler optimizations are effective - hardware accelerator APIs can stay ‘simple’ and
won’t need complex metacommands (e.g., combined primitive commands like DirectML)
Embedded NN Compilers
CEVA Deep Neural Network (CDNN)
Cadence Xtensa Neural Network Compiler (XNNC)
© 2021 The Khronos Group Inc.
OpenCL 3.0
OpenCL C:
- kernels,
- address spaces,
- special types,
...
Most of C++17:
- inheritance,
- templates,
- type deduction,
...
C++ for OpenCL
Increased Ecosystem Flexibility
All functionality beyond OpenCL 1.2 queryable plus
macros for optional OpenCL C language features
New extensions that become widely adopted will be
integrated into new OpenCL core specifications
OpenCL C++ for OpenCL
Open-source C++ for OpenCL front end compiler
combines OpenCL C and C++17 replacing
OpenCL C++ language specification
Unified Specification
All versions of OpenCL in one specification for easier
maintenance, evolution and accessibility
Source on Khronos GitHub for community feedback,
functionality requests and bug fixes
Moving Applications to OpenCL 3.0
OpenCL 1.2 applications – no change
OpenCL 2.X applications - no code changes if all used
functionality is present
Queries recommended for future portability
C++ for OpenCL
Supported by Clang and uses the LLVM
compiler infrastructure
OpenCL C code is valid and fully compatible
Supports most C++17 features
Generates SPIR-V kernels
© 2021 The Khronos Group Inc.
OpenCL 3.0 Adoption
OpenCL 3.0
Adopters
OpenCL 3.0
Adopters Already
Shipping
Conformant
Implementations
Product Conformance Status
https://www.khronos.org/conformance/adopters/conformant-products/opencl
© 2021 The Khronos Group Inc.
Asynchronous DMA Extensions
OpenCL embraces a new class of Embedded Processors
Many DSP-like devices have Direct Memory Access hardware
Transfer data between global and local memories via DMA transactions
Transactions run asynchronously in parallel to device compute enabling wait for transactions to complete
Multiple transactions can be queued to run concurrently or in order via fences
OpenCL abstracts DMA capabilities via extended asynchronous workgroup copy built-ins
(New!) 2- and 3-dimensional async workgroup copy extensions support complex memory transfers
(New!) async workgroup fence built-in controls execution order of dependent transactions
New extensions complement the existing 1-dimensional async workgroup copy built-ins
Async Fence controls order of dependent transactions
All transactions prior to async_fence must complete
before any new transaction starts, without a
synchronous wait
async_copy1
async_copy2
async_fence
async_copy3
Async 3D-3D Copy Transaction
Copy
Transaction
Reshaping possible
Vglobal = Vlocal
Volume
global
Volume
local
The first of significant upcoming advances in OpenCL to
enhance support for embedded processors
© 2021 The Khronos Group Inc.
Layered OpenCL Implementations
DX12
Runtime
Clang+LLVM+
SPIR-V LLVM
OpenCL C or
C++ for OpenCL
Kernel Sources
OpenCL
Application
Host Code
CLOn12 Run-time
API Translator
Mesa SPIR-V
to DXIL
DXIL
OpenCLon12
OpenCL over DX12
https://github.com/microsoft/OpenCLOn12
Translates through
MESA’s NIR Intermediate
Representation
OpenCL
SPIR-V
Clang+Clspv
Compiler
OpenCL C or
C++ for OpenCL
Kernel Sources
Vulkan
Runtime
OpenCL
Application
Host Code
Clvk run-time
API Translator
Vulkan
SPIR-V
clspv + clvk
OpenCL over Vulkan
https://github.com/google/clspv
https://github.com/kpet/clvk
OpenCLOn12
Microsoft and COLLABORA
GPU-accelerated OpenCL on any DX12 PC and
Cloud instance (x86 or Arm)
Leverages Clang/LLVM AND MESA
OpenGLOn12 – OpenGL 3.3 over DX12 is
already conformant
clspv + clvk
clspv - Google’s open-source OpenCL kernel to
Vulkan SPIR-V compiler
Tracks top-of-tree LLVM and Clang - not a fork
Clvk – prototype open-source OpenCL to Vulkan
run-time API translator
Used by shipping apps and engines on Android
e.g., Adobe Premiere Rush video editor – 200K lines of
OpenCL C kernel code
DXIL
© 2021 The Khronos Group Inc.
SPIR-V Language Ecosystem
OpenCL C
C++ for OpenCL
clspv
triSYCL
Intel DPC++
Codeplay
ComputeCpp
LLVM
Clang
SYCL
SPIR-V LLVM
IR Translator
Khronos Open Source
3rd Party Open Source
Language Definitions
Closed Source
OpenCLon12
Inc. Mesa SPIR-V
to DXIL
SPIRV-Cross
GLSL
HLSL
Metal
Shading
Language
glslang
GLSL
HLSL DXC
DXIL
SPIR-V Tools
(Dis)Assembler
Validator
Optimize/Remap
Fuzzer
Reducer
OpenCL C
Online
Compilation
IREE
SPIR-V enables a rich ecosystem of languages and compilers to
target low-level APIs such as Vulkan and OpenCL, including
deployment flexibility: e.g., running OpenCL kernels on Vulkan
Support for SPIR-V
Environment Specs
OpenCL
Vulkan
C++ for OpenCL
(using
cl_ext_cxx_for_opencl)
© 2021 The Khronos Group Inc.
SYCL Single Source C++ Parallel Programming
GPU
FPGA DSP
Custom Hardware
GPU
CPU
CPU
CPU
Standard C++
Application
Code
C++
Libraries
ML
Frameworks
C++ Template
Libraries
C++ Template
Libraries
C++ Template
Libraries
SYCL
Compiler
CPU
Compiler
CPU
One-MKL
One-DNN
OneDPC
SYCL-BLAS
SYCL-Eigen
SYCL-DNN
SYCL Parallel STL
...
C++ templates and lambda
functions separate host &
accelerated device code
Accelerated code
passed into device
OpenCL compilers
Complex ML frameworks
can be directly compiled
and accelerated
SYCL is ideal for accelerating larger
C++-based engines and applications
with performance portability
C++ Kernel Fusion can
give better performance
on complex apps and libs
than hand-coding
AI/Tensor HW
GPU
FPGA DSP
Custom Hardware
GPU
CPU
CPU
CPU
AI/Tensor HW
Other
Backends
© 2021 The Khronos Group Inc.
Expressiveness and simplicity for heterogeneous programming in modern C++
Closer alignment and integration with ISO C++ to simplify porting of standard C++ applications
Improved programmability, smaller code size, faster performance
Based on C++17, backwards compatible with SYCL 1.2.1
Backend acceleration API independent
New Features
Unified Shared Memory | Parallel Reductions | Subgroup Operations | Class template Argument Deduction
Significant SYCL adoption in Embedded, Desktop and HPC Markets
SYCL 2020 Launched February 2021
© 2021 The Khronos Group Inc.
SYCL Implementations in Development
Multiple Backends in Development
SYCL beginning to be supported on multiple
low-level APIs in addition to OpenCL
e.g., ROCm and CUDA
For more information: http://sycl.tech
Source Code
DPC++
Uses LLVM/Clang
Part of oneAPI
ComputeCpp
Multiple
Backends
triSYCL
Open source
test bed
hipSYCL
CUDA and
HIP/ROCm
Any CPU
OpenCL +
SPIR-V
Any CPU
OpenCL +
SPIR(-V)
OpenCL+PTX
Intel CPUs
Intel GPUs
Intel FPGAs
Intel CPUs
Intel GPUs
Intel FPGAs
AMD GPUs
(depends on driver stack)
Arm Mali
IMG PowerVR
Renesas R-Car
NVIDIA GPUs
OpenMP
OpenCL +
SPIR/LLVM
XILINX FPGAs
POCL
(open-source OpenCL
supporting CPUs and NVIDIA
GPUs and more)
Any CPU
Experimental
OpenMP
ROCm
CUDA
AMD GPUs
NVIDIA GPUs
Any CPU
CUDA+PTX
NVIDIA GPUs
SYCL enables Khronos to
influence ISO C++ to (eventually)
support heterogeneous compute
SYCL, OpenCL and SPIR-V, as open industry
standards, enable flexible integration and
deployment of multiple acceleration technologies
VEO
Intel CPUs
NEC VEs
neoSYCL
SX-AURORA
TSUBASA
© 2021 The Khronos Group Inc.
The Origin of OpenVX
Engines and
Applications
GPU
3D Graphics API
Driver
HW CPU
DSP
GPU
Vision API
Driver
Engines and
Applications
Driver Model
An open API standard
enables multiple silicon
vendors to ship drivers
with their silicon
Silicon vendors can
aggressively optimize
drivers for their own
silicon architecture
OpenVX is the industry’s
only API standard enabling
portable access to vendor-
optimized vision drivers
High-level Abstraction
3D graphics is always
accelerated by a GPU – so a
low-level GPU-centric API
still provides cross-vendor
portability
Vision processing can be
accelerated by a wide
variety of hardware
architectures
OpenVX needs a higher-
level graph abstraction to
enable optimized cross-
vendor drivers
Vision
Node
Vision
Node
Vision
Node
Vision
Node
Vision Processing Graph
© 2021 The Khronos Group Inc.
OpenVX Cross-Vendor Vision and Inferencing
High-level graph-based abstraction for portable, efficient vision processing
Optimized OpenVX drivers created, optimized and shipped by processor vendors
Implementable on almost any hardware or processor with performance portability
Graph can contain vision processing and NN nodes for global optimization
Run-time graph execution need very little host CPU interaction
Vision
Node
Vision
Node
Vision
Node
Downstream
Application
Processing
Native
Camera
Control CNN Nodes
NNEF Import converts a trained Neural
Network into OpenVX Graph
Layers are represented as OpenVX nodes
OpenVX Graph
Open
Source
Convertors
https://github.com/KhronosGroup/NNEF-Tools
Stable Specification
Open-Source
Projects
Vendors optimize and ship drivers
for their platform
Full list of conformant OpenVX implementations here:
https://www.khronos.org/conformance/adopters/conformant-products/openvx
© 2021 The Khronos Group Inc.
OpenVX Efficiency through Graphs..
Reuse
pre-allocated
memory for
multiple
intermediate data
Memory
Management
Less allocation overhead,
more memory for
other applications
Replace a sub-
graph with a single
faster node
Kernel
Fusion
Better memory
locality, less kernel
launch overhead
Split graph
execution across
the whole system:
CPU / GPU /
dedicated HW
Graph
Scheduling
Faster execution
or lower power
consumption
Execute a sub-
graph at tile
granularity instead
of image
granularity
Data
Tiling
Better use of
data cache and
local memory
Performance comparable to hand-optimized, non-portable code
Real, complex applications on real-world hardware
Much lower development effort and higher portability than hand-optimized code
© 2021 The Khronos Group Inc.
OpenVX 1.3 and Extensibility
OpenCL Command Queue
Application
cl_mem buffers
Fully asynchronous host-device
operations during data exchange
OpenVX data objects
Runtime
Runtime Map or copy OpenVX data objects
into cl_mem buffers
Copy or export
cl_mem buffers into OpenVX data
objects
OpenVX user-kernels can access command
queue and cl_mem objects to asynchronously
schedule OpenCL kernel execution
OpenVX/OpenCL
Interop
OpenVX 1.3 core specification defines market-targeted feature sets
Baseline Graph Infrastructure (enables other Feature Sets)
Default Vision Functions
Enhanced Vision Functions
Neural Network Inferencing (including tensor objects)
NNEF Kernel import (including tensor objects)
Binary Images
Safety Critical (reduced features and graph import for easier safety certification)
OpenVX is Extensible
Fully accelerated custom nodes can be integrated into the OpenVX graph with OpenCL interop
© 2021 The Khronos Group Inc.
Open Source OpenVX & Samples
Open Source OpenVX Tutorial and Code Samples
https://github.com/rgiduthuri/openvx_tutorial
https://github.com/KhronosGroup/openvx-samples
Fully Conformant
Open Source OpenVX 1.3
for Raspberry Pi
Raspberry Pi 3 and 4 Model B with Raspbian OS
Memory access optimization via tiling/chaining
Highly optimized kernels on multimedia instruction set
Automatic parallelization for multicore CPUs and GPUs
Automatic merging of common kernel sequences
Check out the OpenVX 1.3 Session
here at Embedded Vision Summit for more details!
© 2021 The Khronos Group Inc.
Sensor Data
Training Data
Trained
Networks
Neural Network
Training
C++ Application
Code
APIs for Embedded Compute
Compilation Ingestion
FPGA
DSP
Dedicated
Hardware
GPU
Vision / Inferencing
Engine
Compiled
Code
Hardware Acceleration APIs
Diverse Embedded Hardware
Multi-core CPUs, GPUs
DSPs, FPGAs, Tensor Cores
* Vulkan only runs on GPUs
Applications link to compiled
inferencing code or call
vision/inferencing API
Networks trained on high-end
desktop and cloud systems
Open industry standards, enable
flexible integration and deployment
of multiple acceleration technologies
© 2021 The Khronos Group Inc.
Need for Embedded Camera API Standards
Increasing Sensor Diversity
Including camera arrays and
depth sensors such as Lidar
Sophisticated Sensor Processing
Including inferencing. Sensor streams need to
be efficiently generated and fed into
acceleration APIs and processors
Multiple Sensors Per System
Synchronization and coordination
become essential
Proprietary Interfaces
Vendor-specific APIs to control
cameras, sensors and
close-to-sensor ISPs
Cost and time to integrate and utilize
sensors in embedded systems is a major
constraint on innovation and efficiency in
the embedded vision market
© 2021 The Khronos Group Inc.
Benefits of Embedded Camera API Standard
Embedded System
Camera
Sensor and
ISP
Vision and
Inferencing
Acceleration
Sensor
Stream
Application controlling sensor stream
generation and processing in real time
Choice of
multiple open
standard, cross-
vendor APIs
No widely
adopted, open
standard, cross-
vendor APIs
An effective open, cross-vendor open
standard for camera, sensor and ISP
control could provide multiple benefits
Cross-vendor portability of camera/sensor code
for easier system integration of new sensors
Preservation of application code across
multiple generations of cameras and sensors
Sophisticated control over sensor stream
generation increases effectiveness of
downstream accelerated processing
Development of Camera and sensor APIs may also generate new
requirements for downstream vision and inferencing acceleration APIs
© 2021 The Khronos Group Inc.
Embedded Camera API Exploratory Group
Embedded Camera API
Exploratory Group
Hosted by EMVA and Khronos
Online discussion forum and weekly Zoom
calls, probably for a few months
Discuss industry requirements
for open, royalty-free camera API(s)
No detailed design activity
to protect participants IP
Explore if consensus can be built around an
agreed Scope of Work document
Discuss what standardization activities can
best execute actions in the Scope of Work
Any company is
welcome to join
No cost or IP
Licensing obligations
Project NDA to cover
Exploratory Group
Discussions
Scope of
Work
Document
Agreed SOW
document released
from NDA and
made public
Agreement with
standardization bodies
and/or open source
projects on
initiative(s) to
execute the SOW
under proven
processes and IP
Frameworks
Proven Khronos Process to ensuring
industry requirements are fully
understood before starting
standardization initiatives
Join and
get involved!
https://www.khronos.org/embedded-camera/#getinvolved
© 2021 The Khronos Group Inc.
Khronos for Global Industry Collaboration
Khronos membership is open
to any company
Influence the design and direction
of key open standards that will
drive your business
Accelerate time-to-market with
early access to specification drafts
Provide industry thought
leadership and gain insights into
industry trends and directions
Benefit from Adopter discounts
www.khronos.org/members/
www.khronos.org

Weitere ähnliche Inhalte

Was ist angesagt?

Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Linaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
Linaro
 
SDVIs and In-Situ Visualization on TACC's Stampede
SDVIs and In-Situ Visualization on TACC's StampedeSDVIs and In-Situ Visualization on TACC's Stampede
SDVIs and In-Situ Visualization on TACC's Stampede
Intel® Software
 

Was ist angesagt? (20)

Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
 
"APIs for Accelerating Vision and Inferencing: An Industry Overview of Option...
"APIs for Accelerating Vision and Inferencing: An Industry Overview of Option..."APIs for Accelerating Vision and Inferencing: An Industry Overview of Option...
"APIs for Accelerating Vision and Inferencing: An Industry Overview of Option...
 
"An Update on Open Standard APIs for Vision Processing," a Presentation from ...
"An Update on Open Standard APIs for Vision Processing," a Presentation from ..."An Update on Open Standard APIs for Vision Processing," a Presentation from ...
"An Update on Open Standard APIs for Vision Processing," a Presentation from ...
 
Challenges for Deploying a High-Performance Computing Application to the Cloud
Challenges for Deploying a High-Performance Computing Application to the CloudChallenges for Deploying a High-Performance Computing Application to the Cloud
Challenges for Deploying a High-Performance Computing Application to the Cloud
 
Intel Developer Program
Intel Developer ProgramIntel Developer Program
Intel Developer Program
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
Redfish & python redfish
Redfish & python redfishRedfish & python redfish
Redfish & python redfish
 
Intel the-latest-on-ofi
Intel the-latest-on-ofiIntel the-latest-on-ofi
Intel the-latest-on-ofi
 
"Current and Planned Standards for Computer Vision and Machine Learning," a P...
"Current and Planned Standards for Computer Vision and Machine Learning," a P..."Current and Planned Standards for Computer Vision and Machine Learning," a P...
"Current and Planned Standards for Computer Vision and Machine Learning," a P...
 
Overview of Intel® Omni-Path Architecture
Overview of Intel® Omni-Path ArchitectureOverview of Intel® Omni-Path Architecture
Overview of Intel® Omni-Path Architecture
 
BXI: Bull eXascale Interconnect
BXI: Bull eXascale InterconnectBXI: Bull eXascale Interconnect
BXI: Bull eXascale Interconnect
 
SDVIs and In-Situ Visualization on TACC's Stampede
SDVIs and In-Situ Visualization on TACC's StampedeSDVIs and In-Situ Visualization on TACC's Stampede
SDVIs and In-Situ Visualization on TACC's Stampede
 
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaArm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
 
Best Practices and Performance Studies for High-Performance Computing Clusters
Best Practices and Performance Studies for High-Performance Computing ClustersBest Practices and Performance Studies for High-Performance Computing Clusters
Best Practices and Performance Studies for High-Performance Computing Clusters
 
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
 
Intel python 2017
Intel python 2017Intel python 2017
Intel python 2017
 
Accelerate Big Data Processing with High-Performance Computing Technologies
Accelerate Big Data Processing with High-Performance Computing TechnologiesAccelerate Big Data Processing with High-Performance Computing Technologies
Accelerate Big Data Processing with High-Performance Computing Technologies
 
Simulation Directed Co-Design from Smartphones to Supercomputers
Simulation Directed Co-Design from Smartphones to SupercomputersSimulation Directed Co-Design from Smartphones to Supercomputers
Simulation Directed Co-Design from Smartphones to Supercomputers
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening Keynote
 
Lenovo HPC Strategy Update
Lenovo HPC Strategy UpdateLenovo HPC Strategy Update
Lenovo HPC Strategy Update
 

Ähnlich wie “Khronos Group Standards: Powering the Future of Embedded Vision,” a Presentation from the Khronos Group

"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre..."APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...
Edge AI and Vision Alliance
 
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono..."The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
Edge AI and Vision Alliance
 
Linux @ IBM © 2003 IBM Corporation
Linux @ IBM © 2003 IBM Corporation Linux @ IBM © 2003 IBM Corporation
Linux @ IBM © 2003 IBM Corporation
webhostingguy
 

Ähnlich wie “Khronos Group Standards: Powering the Future of Embedded Vision,” a Presentation from the Khronos Group (20)

OpenCL Overview Japan Virtual Open House Feb 2021
OpenCL Overview Japan Virtual Open House Feb 2021OpenCL Overview Japan Virtual Open House Feb 2021
OpenCL Overview Japan Virtual Open House Feb 2021
 
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre..."APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...
 
“Khronos Standard APIs for Accelerating Vision and Inferencing,” a Presentati...
“Khronos Standard APIs for Accelerating Vision and Inferencing,” a Presentati...“Khronos Standard APIs for Accelerating Vision and Inferencing,” a Presentati...
“Khronos Standard APIs for Accelerating Vision and Inferencing,” a Presentati...
 
“Open Standards: Powering the Future of Embedded Vision,” a Presentation from...
“Open Standards: Powering the Future of Embedded Vision,” a Presentation from...“Open Standards: Powering the Future of Embedded Vision,” a Presentation from...
“Open Standards: Powering the Future of Embedded Vision,” a Presentation from...
 
"Update on Khronos Standards for Vision and Machine Learning," a Presentation...
"Update on Khronos Standards for Vision and Machine Learning," a Presentation..."Update on Khronos Standards for Vision and Machine Learning," a Presentation...
"Update on Khronos Standards for Vision and Machine Learning," a Presentation...
 
SYCL 2020 Specification
SYCL 2020 SpecificationSYCL 2020 Specification
SYCL 2020 Specification
 
Coscup2018 itri android-in-cloud
Coscup2018 itri android-in-cloudCoscup2018 itri android-in-cloud
Coscup2018 itri android-in-cloud
 
The DevOps paradigm - the evolution of IT professionals and opensource toolkit
The DevOps paradigm - the evolution of IT professionals and opensource toolkitThe DevOps paradigm - the evolution of IT professionals and opensource toolkit
The DevOps paradigm - the evolution of IT professionals and opensource toolkit
 
The DevOps Paradigm
The DevOps ParadigmThe DevOps Paradigm
The DevOps Paradigm
 
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono..."The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
 
ERTS 2008 - Using Linux for industrial projects
ERTS 2008 - Using Linux for industrial projectsERTS 2008 - Using Linux for industrial projects
ERTS 2008 - Using Linux for industrial projects
 
"The Vision Acceleration API Landscape: Options and Trade-offs," a Presentati...
"The Vision Acceleration API Landscape: Options and Trade-offs," a Presentati..."The Vision Acceleration API Landscape: Options and Trade-offs," a Presentati...
"The Vision Acceleration API Landscape: Options and Trade-offs," a Presentati...
 
The Ultimate List of Opensource Software for #docker #decentralized #selfhost...
The Ultimate List of Opensource Software for #docker #decentralized #selfhost...The Ultimate List of Opensource Software for #docker #decentralized #selfhost...
The Ultimate List of Opensource Software for #docker #decentralized #selfhost...
 
Eclipse Che - A Revolutionary IDE for Distributed & Mainframe Development
Eclipse Che - A Revolutionary IDE for Distributed & Mainframe DevelopmentEclipse Che - A Revolutionary IDE for Distributed & Mainframe Development
Eclipse Che - A Revolutionary IDE for Distributed & Mainframe Development
 
KIRANKUMAR_MV
KIRANKUMAR_MVKIRANKUMAR_MV
KIRANKUMAR_MV
 
Delivering IaaS with Open Source Software
Delivering IaaS with Open Source SoftwareDelivering IaaS with Open Source Software
Delivering IaaS with Open Source Software
 
oneAPI: Industry Initiative & Intel Product
oneAPI: Industry Initiative & Intel ProductoneAPI: Industry Initiative & Intel Product
oneAPI: Industry Initiative & Intel Product
 
AWS Summit Singapore 2019 | Latest Trends for Cloud-Native Application Develo...
AWS Summit Singapore 2019 | Latest Trends for Cloud-Native Application Develo...AWS Summit Singapore 2019 | Latest Trends for Cloud-Native Application Develo...
AWS Summit Singapore 2019 | Latest Trends for Cloud-Native Application Develo...
 
XPDDS18: CPUFreq in Xen on ARM - Oleksandr Tyshchenko, EPAM Systems
XPDDS18: CPUFreq in Xen on ARM - Oleksandr Tyshchenko, EPAM SystemsXPDDS18: CPUFreq in Xen on ARM - Oleksandr Tyshchenko, EPAM Systems
XPDDS18: CPUFreq in Xen on ARM - Oleksandr Tyshchenko, EPAM Systems
 
Linux @ IBM © 2003 IBM Corporation
Linux @ IBM © 2003 IBM Corporation Linux @ IBM © 2003 IBM Corporation
Linux @ IBM © 2003 IBM Corporation
 

Mehr von Edge AI and Vision Alliance

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
Edge AI and Vision Alliance
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
Edge AI and Vision Alliance
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
Edge AI and Vision Alliance
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
Edge AI and Vision Alliance
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
Edge AI and Vision Alliance
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
Edge AI and Vision Alliance
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
Edge AI and Vision Alliance
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
Edge AI and Vision Alliance
 

Mehr von Edge AI and Vision Alliance (20)

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
 

Kürzlich hochgeladen

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Kürzlich hochgeladen (20)

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 

“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presentation from the Khronos Group

  • 1. © 2021 The Khronos Group Inc. Khronos Standards: Powering the Future of Embedded Vision Neil Trevett, Khronos President NVIDIA VP Developer Ecosystems
  • 2. © 2021 The Khronos Group Inc. Open, royalty-free interoperability standards to harness the power of GPU, multiprocessor and XR hardware 3D graphics, augmented and virtual reality, parallel programming, inferencing and vision acceleration Non-profit, member-driven standards organization, open to any company Well-defined multi-company governance and IP Framework Founded in 2000 >150 Members ~ 40% US, 30% Europe, 30% Asia Khronos Connects Software to Silicon
  • 3. © 2021 The Khronos Group Inc. Khronos Active Initiatives 3D Graphics Desktop, Mobile and Web 3D Assets Authoring and Delivery Portable XR Augmented and Virtual Reality Parallel Computation Vision, Inferencing, Machine Learning Safety Critical APIs
  • 4. © 2021 The Khronos Group Inc. Khronos Compute Acceleration Standards GPU GPU rendering + compute acceleration Heterogeneous compute acceleration Single source C++ programming with compute acceleration Graph-based vision and inferencing acceleration Intermediate Representation (IR) supporting parallel execution and graphics Higher-level Languages and APIs Streamlined development and performance portability GPU FPGA DSP Custom Hardware GPU CPU CPU CPU AI/Tensor HW Increasing industry interest in parallel compute acceleration to combat the ‘End of Moore’s Law’ SYCL and SPIR were originally OpenCL projects Lower-level Languages and APIs Direct Hardware Control
  • 5. © 2021 The Khronos Group Inc. OpenCL – Low-level Parallel Programing Complements GPU-only APIs Simpler programming model Relatively lightweight run-time More language flexibility, e.g., pointers Rigorously defined numeric precision OpenCL Kernel Code OpenCL Kernel Code OpenCL Kernel Code OpenCL C Kernel Code GPU DSP CPU CPU FPGA OpenCL Devices Host CPU NN HW Runtime OpenCL API to compile, load and execute kernels across devices Programming and Runtime Framework for Application Acceleration Offload compute-intensive kernels onto parallel heterogeneous processors CPUs, GPUs, DSPs, FPGAs, Tensor Processors OpenCL C or C++ kernel languages Platform Layer API Query, select and initialize compute devices Runtime API Build and execute kernels programs on multiple devices Explicit Application Control Which programs execute on what device Where data is stored in memories in the system When programs are run, and what operations are dependent on earlier operations
  • 6. © 2021 The Khronos Group Inc. OpenCL Open-Source Project Momentum Tripling in the last four years
  • 7. © 2021 The Khronos Group Inc. OpenCL is Widely Deployed and Used Accelerated Implementations Modo Desktop Creative Apps Math and Physics Libraries Vision, Imaging and Video Libraries The industry’s most pervasive, cross-vendor, open standard for low-level heterogeneous parallel programming Arm Compute Library SYCL-DNN Machine Learning Libraries and Frameworks TI DL Library (TIDL) VeriSilicon Xiaomi clDNN Intel Intel Synopsis MetaWare EV NNAPI https://en.wikipedia.org/wiki/List_of_OpenCL_applications Vegas Pro ForceBalance Molecular Modelling Libraries Machine Learning Compilers Parallel Languages CLBlast SYCL-BLAS Linear Algebra and FFT Libraries VkFFT
  • 8. © 2021 The Khronos Group Inc. ML Compiler Steps 1.Import Trained Network Description 2. Graph-level optimizations e.g., node fusion, node lowering and memory tiling 3. Decompose to primitive instructions and emit programs for accelerated run-times Consistent Steps Fast progress but still area of intense research If compiler optimizations are effective - hardware accelerator APIs can stay ‘simple’ and won’t need complex metacommands (e.g., combined primitive commands like DirectML) Embedded NN Compilers CEVA Deep Neural Network (CDNN) Cadence Xtensa Neural Network Compiler (XNNC)
  • 9. © 2021 The Khronos Group Inc. OpenCL 3.0 OpenCL C: - kernels, - address spaces, - special types, ... Most of C++17: - inheritance, - templates, - type deduction, ... C++ for OpenCL Increased Ecosystem Flexibility All functionality beyond OpenCL 1.2 queryable plus macros for optional OpenCL C language features New extensions that become widely adopted will be integrated into new OpenCL core specifications OpenCL C++ for OpenCL Open-source C++ for OpenCL front end compiler combines OpenCL C and C++17 replacing OpenCL C++ language specification Unified Specification All versions of OpenCL in one specification for easier maintenance, evolution and accessibility Source on Khronos GitHub for community feedback, functionality requests and bug fixes Moving Applications to OpenCL 3.0 OpenCL 1.2 applications – no change OpenCL 2.X applications - no code changes if all used functionality is present Queries recommended for future portability C++ for OpenCL Supported by Clang and uses the LLVM compiler infrastructure OpenCL C code is valid and fully compatible Supports most C++17 features Generates SPIR-V kernels
  • 10. © 2021 The Khronos Group Inc. OpenCL 3.0 Adoption OpenCL 3.0 Adopters OpenCL 3.0 Adopters Already Shipping Conformant Implementations Product Conformance Status https://www.khronos.org/conformance/adopters/conformant-products/opencl
  • 11. © 2021 The Khronos Group Inc. Asynchronous DMA Extensions OpenCL embraces a new class of Embedded Processors Many DSP-like devices have Direct Memory Access hardware Transfer data between global and local memories via DMA transactions Transactions run asynchronously in parallel to device compute enabling wait for transactions to complete Multiple transactions can be queued to run concurrently or in order via fences OpenCL abstracts DMA capabilities via extended asynchronous workgroup copy built-ins (New!) 2- and 3-dimensional async workgroup copy extensions support complex memory transfers (New!) async workgroup fence built-in controls execution order of dependent transactions New extensions complement the existing 1-dimensional async workgroup copy built-ins Async Fence controls order of dependent transactions All transactions prior to async_fence must complete before any new transaction starts, without a synchronous wait async_copy1 async_copy2 async_fence async_copy3 Async 3D-3D Copy Transaction Copy Transaction Reshaping possible Vglobal = Vlocal Volume global Volume local The first of significant upcoming advances in OpenCL to enhance support for embedded processors
  • 12. © 2021 The Khronos Group Inc. Layered OpenCL Implementations DX12 Runtime Clang+LLVM+ SPIR-V LLVM OpenCL C or C++ for OpenCL Kernel Sources OpenCL Application Host Code CLOn12 Run-time API Translator Mesa SPIR-V to DXIL DXIL OpenCLon12 OpenCL over DX12 https://github.com/microsoft/OpenCLOn12 Translates through MESA’s NIR Intermediate Representation OpenCL SPIR-V Clang+Clspv Compiler OpenCL C or C++ for OpenCL Kernel Sources Vulkan Runtime OpenCL Application Host Code Clvk run-time API Translator Vulkan SPIR-V clspv + clvk OpenCL over Vulkan https://github.com/google/clspv https://github.com/kpet/clvk OpenCLOn12 Microsoft and COLLABORA GPU-accelerated OpenCL on any DX12 PC and Cloud instance (x86 or Arm) Leverages Clang/LLVM AND MESA OpenGLOn12 – OpenGL 3.3 over DX12 is already conformant clspv + clvk clspv - Google’s open-source OpenCL kernel to Vulkan SPIR-V compiler Tracks top-of-tree LLVM and Clang - not a fork Clvk – prototype open-source OpenCL to Vulkan run-time API translator Used by shipping apps and engines on Android e.g., Adobe Premiere Rush video editor – 200K lines of OpenCL C kernel code DXIL
  • 13. © 2021 The Khronos Group Inc. SPIR-V Language Ecosystem OpenCL C C++ for OpenCL clspv triSYCL Intel DPC++ Codeplay ComputeCpp LLVM Clang SYCL SPIR-V LLVM IR Translator Khronos Open Source 3rd Party Open Source Language Definitions Closed Source OpenCLon12 Inc. Mesa SPIR-V to DXIL SPIRV-Cross GLSL HLSL Metal Shading Language glslang GLSL HLSL DXC DXIL SPIR-V Tools (Dis)Assembler Validator Optimize/Remap Fuzzer Reducer OpenCL C Online Compilation IREE SPIR-V enables a rich ecosystem of languages and compilers to target low-level APIs such as Vulkan and OpenCL, including deployment flexibility: e.g., running OpenCL kernels on Vulkan Support for SPIR-V Environment Specs OpenCL Vulkan C++ for OpenCL (using cl_ext_cxx_for_opencl)
  • 14. © 2021 The Khronos Group Inc. SYCL Single Source C++ Parallel Programming GPU FPGA DSP Custom Hardware GPU CPU CPU CPU Standard C++ Application Code C++ Libraries ML Frameworks C++ Template Libraries C++ Template Libraries C++ Template Libraries SYCL Compiler CPU Compiler CPU One-MKL One-DNN OneDPC SYCL-BLAS SYCL-Eigen SYCL-DNN SYCL Parallel STL ... C++ templates and lambda functions separate host & accelerated device code Accelerated code passed into device OpenCL compilers Complex ML frameworks can be directly compiled and accelerated SYCL is ideal for accelerating larger C++-based engines and applications with performance portability C++ Kernel Fusion can give better performance on complex apps and libs than hand-coding AI/Tensor HW GPU FPGA DSP Custom Hardware GPU CPU CPU CPU AI/Tensor HW Other Backends
  • 15. © 2021 The Khronos Group Inc. Expressiveness and simplicity for heterogeneous programming in modern C++ Closer alignment and integration with ISO C++ to simplify porting of standard C++ applications Improved programmability, smaller code size, faster performance Based on C++17, backwards compatible with SYCL 1.2.1 Backend acceleration API independent New Features Unified Shared Memory | Parallel Reductions | Subgroup Operations | Class template Argument Deduction Significant SYCL adoption in Embedded, Desktop and HPC Markets SYCL 2020 Launched February 2021
  • 16. © 2021 The Khronos Group Inc. SYCL Implementations in Development Multiple Backends in Development SYCL beginning to be supported on multiple low-level APIs in addition to OpenCL e.g., ROCm and CUDA For more information: http://sycl.tech Source Code DPC++ Uses LLVM/Clang Part of oneAPI ComputeCpp Multiple Backends triSYCL Open source test bed hipSYCL CUDA and HIP/ROCm Any CPU OpenCL + SPIR-V Any CPU OpenCL + SPIR(-V) OpenCL+PTX Intel CPUs Intel GPUs Intel FPGAs Intel CPUs Intel GPUs Intel FPGAs AMD GPUs (depends on driver stack) Arm Mali IMG PowerVR Renesas R-Car NVIDIA GPUs OpenMP OpenCL + SPIR/LLVM XILINX FPGAs POCL (open-source OpenCL supporting CPUs and NVIDIA GPUs and more) Any CPU Experimental OpenMP ROCm CUDA AMD GPUs NVIDIA GPUs Any CPU CUDA+PTX NVIDIA GPUs SYCL enables Khronos to influence ISO C++ to (eventually) support heterogeneous compute SYCL, OpenCL and SPIR-V, as open industry standards, enable flexible integration and deployment of multiple acceleration technologies VEO Intel CPUs NEC VEs neoSYCL SX-AURORA TSUBASA
  • 17. © 2021 The Khronos Group Inc. The Origin of OpenVX Engines and Applications GPU 3D Graphics API Driver HW CPU DSP GPU Vision API Driver Engines and Applications Driver Model An open API standard enables multiple silicon vendors to ship drivers with their silicon Silicon vendors can aggressively optimize drivers for their own silicon architecture OpenVX is the industry’s only API standard enabling portable access to vendor- optimized vision drivers High-level Abstraction 3D graphics is always accelerated by a GPU – so a low-level GPU-centric API still provides cross-vendor portability Vision processing can be accelerated by a wide variety of hardware architectures OpenVX needs a higher- level graph abstraction to enable optimized cross- vendor drivers Vision Node Vision Node Vision Node Vision Node Vision Processing Graph
  • 18. © 2021 The Khronos Group Inc. OpenVX Cross-Vendor Vision and Inferencing High-level graph-based abstraction for portable, efficient vision processing Optimized OpenVX drivers created, optimized and shipped by processor vendors Implementable on almost any hardware or processor with performance portability Graph can contain vision processing and NN nodes for global optimization Run-time graph execution need very little host CPU interaction Vision Node Vision Node Vision Node Downstream Application Processing Native Camera Control CNN Nodes NNEF Import converts a trained Neural Network into OpenVX Graph Layers are represented as OpenVX nodes OpenVX Graph Open Source Convertors https://github.com/KhronosGroup/NNEF-Tools Stable Specification Open-Source Projects Vendors optimize and ship drivers for their platform Full list of conformant OpenVX implementations here: https://www.khronos.org/conformance/adopters/conformant-products/openvx
  • 19. © 2021 The Khronos Group Inc. OpenVX Efficiency through Graphs.. Reuse pre-allocated memory for multiple intermediate data Memory Management Less allocation overhead, more memory for other applications Replace a sub- graph with a single faster node Kernel Fusion Better memory locality, less kernel launch overhead Split graph execution across the whole system: CPU / GPU / dedicated HW Graph Scheduling Faster execution or lower power consumption Execute a sub- graph at tile granularity instead of image granularity Data Tiling Better use of data cache and local memory Performance comparable to hand-optimized, non-portable code Real, complex applications on real-world hardware Much lower development effort and higher portability than hand-optimized code
  • 20. © 2021 The Khronos Group Inc. OpenVX 1.3 and Extensibility OpenCL Command Queue Application cl_mem buffers Fully asynchronous host-device operations during data exchange OpenVX data objects Runtime Runtime Map or copy OpenVX data objects into cl_mem buffers Copy or export cl_mem buffers into OpenVX data objects OpenVX user-kernels can access command queue and cl_mem objects to asynchronously schedule OpenCL kernel execution OpenVX/OpenCL Interop OpenVX 1.3 core specification defines market-targeted feature sets Baseline Graph Infrastructure (enables other Feature Sets) Default Vision Functions Enhanced Vision Functions Neural Network Inferencing (including tensor objects) NNEF Kernel import (including tensor objects) Binary Images Safety Critical (reduced features and graph import for easier safety certification) OpenVX is Extensible Fully accelerated custom nodes can be integrated into the OpenVX graph with OpenCL interop
  • 21. © 2021 The Khronos Group Inc. Open Source OpenVX & Samples Open Source OpenVX Tutorial and Code Samples https://github.com/rgiduthuri/openvx_tutorial https://github.com/KhronosGroup/openvx-samples Fully Conformant Open Source OpenVX 1.3 for Raspberry Pi Raspberry Pi 3 and 4 Model B with Raspbian OS Memory access optimization via tiling/chaining Highly optimized kernels on multimedia instruction set Automatic parallelization for multicore CPUs and GPUs Automatic merging of common kernel sequences Check out the OpenVX 1.3 Session here at Embedded Vision Summit for more details!
  • 22. © 2021 The Khronos Group Inc. Sensor Data Training Data Trained Networks Neural Network Training C++ Application Code APIs for Embedded Compute Compilation Ingestion FPGA DSP Dedicated Hardware GPU Vision / Inferencing Engine Compiled Code Hardware Acceleration APIs Diverse Embedded Hardware Multi-core CPUs, GPUs DSPs, FPGAs, Tensor Cores * Vulkan only runs on GPUs Applications link to compiled inferencing code or call vision/inferencing API Networks trained on high-end desktop and cloud systems Open industry standards, enable flexible integration and deployment of multiple acceleration technologies
  • 23. © 2021 The Khronos Group Inc. Need for Embedded Camera API Standards Increasing Sensor Diversity Including camera arrays and depth sensors such as Lidar Sophisticated Sensor Processing Including inferencing. Sensor streams need to be efficiently generated and fed into acceleration APIs and processors Multiple Sensors Per System Synchronization and coordination become essential Proprietary Interfaces Vendor-specific APIs to control cameras, sensors and close-to-sensor ISPs Cost and time to integrate and utilize sensors in embedded systems is a major constraint on innovation and efficiency in the embedded vision market
  • 24. © 2021 The Khronos Group Inc. Benefits of Embedded Camera API Standard Embedded System Camera Sensor and ISP Vision and Inferencing Acceleration Sensor Stream Application controlling sensor stream generation and processing in real time Choice of multiple open standard, cross- vendor APIs No widely adopted, open standard, cross- vendor APIs An effective open, cross-vendor open standard for camera, sensor and ISP control could provide multiple benefits Cross-vendor portability of camera/sensor code for easier system integration of new sensors Preservation of application code across multiple generations of cameras and sensors Sophisticated control over sensor stream generation increases effectiveness of downstream accelerated processing Development of Camera and sensor APIs may also generate new requirements for downstream vision and inferencing acceleration APIs
  • 25. © 2021 The Khronos Group Inc. Embedded Camera API Exploratory Group Embedded Camera API Exploratory Group Hosted by EMVA and Khronos Online discussion forum and weekly Zoom calls, probably for a few months Discuss industry requirements for open, royalty-free camera API(s) No detailed design activity to protect participants IP Explore if consensus can be built around an agreed Scope of Work document Discuss what standardization activities can best execute actions in the Scope of Work Any company is welcome to join No cost or IP Licensing obligations Project NDA to cover Exploratory Group Discussions Scope of Work Document Agreed SOW document released from NDA and made public Agreement with standardization bodies and/or open source projects on initiative(s) to execute the SOW under proven processes and IP Frameworks Proven Khronos Process to ensuring industry requirements are fully understood before starting standardization initiatives Join and get involved! https://www.khronos.org/embedded-camera/#getinvolved
  • 26. © 2021 The Khronos Group Inc. Khronos for Global Industry Collaboration Khronos membership is open to any company Influence the design and direction of key open standards that will drive your business Accelerate time-to-market with early access to specification drafts Provide industry thought leadership and gain insights into industry trends and directions Benefit from Adopter discounts www.khronos.org/members/ www.khronos.org