Neil Trevett, Khronos President
VP Developer Ecosystem, NVIDIA
May 2017
Vision Acceleration API Landscape:
Options and Trade-offs
© Copyright Khronos Group 2017
The Search for Vision Performance and Portability
Leads to a layered Ecosystem – and Graph Programming
[Figure: layered ecosystem diagram. Labels: Graph Programming; Single Source C++ Languages; Explicit Kernels; Code Intermediate Representations; Diverse Hardware (GPUs, DSPs, FPGAs, Dedicated Hardware). Named examples: TensorRT, PTX, HSAIL.]
OpenVX – Performance Portable Vision
• OpenVX developers express a graph of image operations (‘Nodes’)
- Using a C API
• Nodes can be executed on any hardware or processor, and can be coded in any language
- Implementers can optimize under the high-level graph abstraction
• Graphs are the key to run-time optimization opportunities…
[Figure: Feature Extraction Example Graph. OpenVX nodes: Color Conversion, Channel Extract, Image Pyramid, Optical Flow, Harris Track. Data objects: Camera Input, RGB Frame, YUV Frame, Gray Frame, Pyr_t, Ftr_t-1, Array of Keypoints, Array of Features, Rendering Output.]
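The idea that an application declares a graph of nodes first and lets the implementation decide how to run it can be sketched with a toy dataflow graph in plain Python. This is not the OpenVX C API: `Graph`, `add_node`, `schedule`, `process` and `to_yuv` are hypothetical names chosen for illustration, and the "image operations" are stand-ins that work on short lists of pixels.

```python
from collections import deque


class Graph:
    """Toy dataflow graph: nodes are declared first and executed later."""

    def __init__(self):
        self.nodes = []  # each entry: (name, function, input names, output name)

    def add_node(self, name, func, inputs, output):
        self.nodes.append((name, func, tuple(inputs), output))

    def schedule(self, available):
        """Return node names in an order that respects data dependencies."""
        ready = set(available)
        pending = deque(self.nodes)
        order = []
        stalled = 0
        while pending:
            node = pending.popleft()
            name, _, inputs, output = node
            if all(i in ready for i in inputs):
                order.append(name)
                ready.add(output)
                stalled = 0
            else:
                pending.append(node)
                stalled += 1
                if stalled > len(pending):
                    raise ValueError("graph has unsatisfied inputs or a cycle")
        return order

    def process(self, data):
        """Execute every node once, in dependency order."""
        data = dict(data)
        by_name = {n[0]: n for n in self.nodes}
        for name in self.schedule(data.keys()):
            _, func, inputs, output = by_name[name]
            data[output] = func(*(data[i] for i in inputs))
        return data


def to_yuv(pixel):
    """Stand-in colour conversion (an assumption, not a real YUV formula)."""
    r, g, b = pixel
    y = (r + 2 * g + b) // 4
    return (y, b - y, r - y)


graph = Graph()
# Declared out of order on purpose: the graph, not the caller, picks the order.
graph.add_node("channel_extract", lambda yuv: [p[0] for p in yuv], ["yuv"], "gray")
graph.add_node("color_convert", lambda rgb: [to_yuv(p) for p in rgb], ["rgb"], "yuv")

order = graph.schedule(["rgb"])
result = graph.process({"rgb": [(8, 8, 8), (0, 4, 0)]})
```

Because the whole graph is known before anything runs, the scheduler is free to reorder, merge or relocate nodes, which is the property the slide attributes to graph-based APIs.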
OpenVX Efficiency through Graphs
• Memory Management: Reuse pre-allocated memory for multiple intermediate data
- Less allocation overhead, more memory for other applications
• Kernel Fusion: Replace a sub-graph with a single faster node
- Better memory locality, less kernel launch overhead
• Graph Scheduling: Split the graph execution across the whole system: CPU / GPU / dedicated HW
- Faster execution or lower power consumption
• Data Tiling: Execute a sub-graph at tile granularity instead of image granularity
- Better use of data cache and local memory
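As a rough illustration of the data tiling point, the sketch below runs the same two-step sub-graph once per image and once per tile, and records how large the intermediate buffer has to be in each case. It is a toy model in plain Python, not OpenVX code; `brighten`, `threshold`, `run_per_image` and `run_per_tile` are hypothetical names, and the one-dimensional "image" is an assumption made to keep the example short.

```python
def brighten(pixel):
    """First node: add a constant, saturating at 255."""
    return min(pixel + 10, 255)


def threshold(pixel):
    """Second node: binarize at 128."""
    return 255 if pixel >= 128 else 0


def run_per_image(image):
    """Each node processes the whole image, so the intermediate is image-sized."""
    intermediate = [brighten(p) for p in image]
    output = [threshold(p) for p in intermediate]
    return output, len(intermediate)


def run_per_tile(image, tile_size):
    """The two-node sub-graph runs tile by tile, so the intermediate is tile-sized."""
    output = []
    peak_intermediate = 0
    for start in range(0, len(image), tile_size):
        tile = image[start:start + tile_size]
        intermediate = [brighten(p) for p in tile]
        peak_intermediate = max(peak_intermediate, len(intermediate))
        output.extend(threshold(p) for p in intermediate)
    return output, peak_intermediate


image = [0, 117, 118, 127, 200, 250, 90, 130]
whole, whole_peak = run_per_image(image)
tiled, tiled_peak = run_per_tile(image, tile_size=2)
```

Both runs produce the same output, but the tiled run never holds more than one tile of intermediate data, which is what lets a real implementation keep that data in cache or local memory.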
OpenVX Evolution
• OpenVX 1.0 - Spec released October 2014
- Conformant Implementations
• OpenVX 1.1 - Spec released May 2016
- Conformant Implementations
- New Functionality: Expanded Nodes Functionality, Enhanced Graph Framework
• AMD OpenVX Tools (http://gpuopen.com/compute-product/amd-openvx/)
- Open source, highly optimized for x86 CPU and OpenCL for GPU
- “Graph Optimizer” looks at entire processing pipeline and removes, replaces, merges functions to improve performance and bandwidth
- Scripting for rapid prototyping, without recompiling, at production performance levels
• OpenVX 1.2 - Spec released 1st May 2017
- New Functionality: Conditional node execution, Feature detection, Classification operators, Expanded imaging operations
- Extensions: Neural Network Acceleration, Graph Save and Restore, 16-bit image operation
• OpenVX Roadmap Discussions
- New Functionality Under Discussion: NNEF Import, Accelerated Programmable user nodes
Come to the next presentation for more details!
Khronos NNEF (Neural Net Exchange Format)
• Range of Neural Network tools and inferencing architectures is rapidly increasing
• NNEF encapsulates neural network formal semantics
- Structure, Data formats
- Commonly used operations (such as convolution, pooling, normalization, etc.)
• Cross-vendor Neural Net file format removes industry friction
- Simple exchange between tools and inferencing engines
- Unified format for network optimizations
[Figure: two diagrams, each linking NN Authoring Frameworks 1-3 to Inference Engines 1-3. Caption: Every Tool Needs an Exporter to Every Accelerator]
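The friction argument can be made concrete with a small count. The sketch below assumes a simple model that the slide does not spell out: without a shared format, every framework needs one exporter per inference engine; with a shared format, every framework needs one exporter and every engine one importer. The function names are hypothetical.

```python
def direct_converters(frameworks, engines):
    """One exporter per (framework, engine) pair."""
    return frameworks * engines


def converters_with_exchange_format(frameworks, engines):
    """One exporter per framework plus one importer per engine."""
    return frameworks + engines


# The three frameworks and three engines drawn on the slide.
direct = direct_converters(3, 3)
shared = converters_with_exchange_format(3, 3)

# The gap widens as the ecosystem grows.
direct_large = direct_converters(10, 10)
shared_large = converters_with_exchange_format(10, 10)
```

Under this model the slide's 3 x 3 case drops from 9 converters to 6, and a 10 x 10 ecosystem drops from 100 to 20.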
OpenVX 1.2 and Neural Net Extension
• Convolutional Neural Network topologies can be represented as OpenVX graphs
- Layers are represented as OpenVX nodes
- Layers connected by multi-dimensional tensor objects
- Layer types include convolution, activation, pooling, fully-connected, soft-max
- CNN nodes can be mixed with traditional vision nodes
• Import/Export Extension
- Efficient handling of network Weights/Biases or complete networks
• Will be able to import NNEF files into OpenVX Neural Nets
[Figure: An OpenVX graph mixing CNN nodes with traditional vision nodes. Labels: Native Camera Control, Vision Nodes, CNN Nodes, Downstream Application Processing]
Building Graphs in C++ Rather than an API
Programmatic approaches to Graph Programming
[Figure: layered ecosystem diagram. Labels: Graph Programming; Single Source C++ Languages; Explicit Kernels; Code Intermediate Representations; Diverse Hardware (GPUs, DSPs, FPGAs, Dedicated Hardware). Named examples: TensorRT, PTX, HSAIL.]
Halide – C++ Based Image Processing
• Halide is a programming language embedded in C++
- C++ code builds an in-memory representation of a Halide graph
- Compiled to an object file, to be run in the same process
• Separate descriptions of ‘algorithm’ per output pixel and execution ‘schedule’
- Can hand or auto-optimize schedule without affecting the algorithm correctness
• Compiler based on LLVM/Clang
- Compiler targets: x86, ARM, CUDA, OpenCL, and OpenGL
[Figure: a Halide Program (image processing algorithm) and a Halide Schedule (mapping of algorithm to hardware, supplied by a human expert or an autoscheduler) feed the Halide Code Generator (LLVM/Clang), which produces a platform-optimized binary]
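The separation of algorithm and schedule can be imitated in a few lines of plain Python. This sketch does not use the Halide API; `blur_x`, `schedule_row_major`, `schedule_tiled` and `realize` are hypothetical names. The algorithm says what one output pixel is, each schedule only decides the order in which pixels are visited, and swapping schedules leaves the result unchanged.

```python
def blur_x(image, x, y):
    """Algorithm: one output pixel of a 3-tap horizontal box blur, clamped at edges."""
    width = len(image[0])
    left = image[y][max(x - 1, 0)]
    right = image[y][min(x + 1, width - 1)]
    return (left + image[y][x] + right) // 3


def schedule_row_major(width, height):
    """Schedule 1: visit pixels row by row."""
    for y in range(height):
        for x in range(width):
            yield x, y


def schedule_tiled(width, height, tile=2):
    """Schedule 2: visit pixels tile by tile (2x2 tiles by default)."""
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            for y in range(ty, min(ty + tile, height)):
                for x in range(tx, min(tx + tile, width)):
                    yield x, y


def realize(algorithm, image, schedule):
    """Evaluate the algorithm at every pixel, in the order the schedule dictates."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for x, y in schedule(width, height):
        out[y][x] = algorithm(image, x, y)
    return out


image = [[0, 3, 6, 9],
         [9, 6, 3, 0],
         [3, 3, 3, 3]]
row_major_result = realize(blur_x, image, schedule_row_major)
tiled_result = realize(blur_x, image, schedule_tiled)
```

In this toy model a schedule can only change the visiting order; Halide schedules go further (vectorization, parallelism, storage placement), but the correctness guarantee the slide describes is the same one checked here.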
Khronos SYCL - Single Source C++
• Single-source heterogeneous programming using STANDARD C++
- Use C++ templates and lambda functions for host & device code
• Kernel Fusion in C++ is a widely used compiler technique - proven to work
- Halide, Eigen, Boost.Compute, …
- Optimization at the C++, not assembly, level
- Achieves better performance on complex software than hand-coding
• Rapid optimization of multiple libraries - more information at http://sycl.tech
- SYCLBLAS
- SYCL Eigen
- SYCL TensorFlow
- SYCL GTX
- triSYCL
- ComputeCpp
- VisionCpp
- ComputeCpp SDK
Graph Programming - Fusion Results
• C++ Kernel fusion provides optimization benefits
- Tiled operations in local memory
- Reduced bandwidth to off-chip memory
Courtesy Codeplay:
https://www.slideshare.net/AndrewRichards28/open-standards-for-adas-andrew-richards-codeplay-at-autosens-2016-66476890
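The bandwidth claim can be illustrated by counting memory accesses in a toy model. The sketch below is plain Python rather than SYCL or VisionCpp code, and it does not reproduce the Codeplay results; `CountingBuffer`, `unfused` and `fused` are hypothetical names, and each buffer access is assumed to stand for one off-chip memory transaction.

```python
class CountingBuffer:
    """A list that counts element reads and writes, standing in for off-chip memory."""

    def __init__(self, values):
        self.values = list(values)
        self.reads = 0
        self.writes = 0

    def load(self, index):
        self.reads += 1
        return self.values[index]

    def store(self, index, value):
        self.writes += 1
        self.values[index] = value


def traffic(buffers):
    """Total number of counted accesses across the given buffers."""
    return sum(b.reads + b.writes for b in buffers)


def unfused(src, ops):
    """One kernel per operation: every stage reads and writes a full buffer."""
    buffers = [src]
    current = src
    for op in ops:
        nxt = CountingBuffer([0] * len(current.values))
        for i in range(len(current.values)):
            nxt.store(i, op(current.load(i)))
        buffers.append(nxt)
        current = nxt
    return current.values, traffic(buffers)


def fused(src, ops):
    """One fused kernel: intermediates stay in a local variable."""
    dst = CountingBuffer([0] * len(src.values))
    for i in range(len(src.values)):
        value = src.load(i)       # one counted read per pixel
        for op in ops:
            value = op(value)     # no buffer traffic between operations
        dst.store(i, value)       # one counted write per pixel
    return dst.values, traffic([src, dst])


ops = [lambda v: v * 2, lambda v: v + 1, lambda v: v % 7]
pixels = list(range(8))
out_unfused, traffic_unfused = unfused(CountingBuffer(pixels), ops)
out_fused, traffic_fused = fused(CountingBuffer(pixels), ops)
```

With three operations over eight pixels the unfused pipeline performs 48 counted accesses and the fused one 16, while both return the same values.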
Convergence with Standard ISO C++
• SYCL aligns the hardware acceleration of OpenCL with direction of the C++ standard
- Use of C++14 with open source C++17 Parallel STL hosted by Khronos
• Khronos working with others on bringing proposals to ISO C++ for:
- Executors – for scheduling work
- “Managed pointers” or “channels” – for sharing data
• Hoping to target C++20
- But timescales are tight
• The eventual aim is for standard C++ to support advanced acceleration offload
Explicit Programming and Run Times
Run-time support for accelerator offload
[Figure: layered ecosystem diagram. Labels: Graph Programming; Single Source C++ Languages; Explicit Kernels; Code Intermediate Representations; Diverse Hardware (GPUs, DSPs, FPGAs, Dedicated Hardware). Named examples: TensorRT, PTX, HSAIL.]
OpenCL as Parallel Language/Library Backend
Approaching 200 languages, frameworks and projects using OpenCL as a compiler target to access vendor-optimized, heterogeneous compute runtimes. Examples shown (project logos not extracted):
- C++ based neural network framework
- MulticoreWare open source project on Bitbucket
- Compiler directives for Fortran, C and C++
- Java language extensions for parallelism
- Language for image processing and computational photography
- Single Source C++ Programming for OpenCL
- Vision processing open source project
- Open source software library for machine learning
Low Level Explicit APIs
OpenCL 2.2 - Top to Bottom C++
• OpenCL 1.0 Specification - Dec08
• OpenCL 1.1 Specification - Jun10 (18 months later)
- 3-component vectors
- Additional image formats
- Multiple hosts and devices
- Buffer region operations
- Enhanced event-driven execution
- Additional OpenCL C built-ins
- Improved OpenGL data/event interop
• OpenCL 1.2 Specification - Nov11 (18 months later)
- Device partitioning
- Separate compilation and linking
- Enhanced image support
- Built-in kernels / custom devices
- Enhanced DX and OpenGL Interop
• OpenCL 2.0 Specification - Nov13 (24 months later)
- Shared Virtual Memory
- On-device dispatch
- Generic Address Space
- Enhanced Image Support
- C11 Atomics
- Pipes
- Android ICD
• OpenCL 2.1 Specification - Nov15 (24 months later)
- SPIR-V in Core
- Subgroups into core
- Subgroup query operations
- clCloneKernel
- Low-latency device timer queries
• OpenCL 2.2 PROVISIONAL - May16 (7 months later)
- Single Source C++ Programming: Full support for features in C++14-based Kernel Language (SYCL 2.2 for single source C++)
- API and Language Specs: Brings C++14-based Kernel Language into core specification (OpenCL C++ Kernel Language)
- Portable Kernel Intermediate Language: Support for C++14-based kernel language e.g. constructors/destructors (SPIR-V 1.1 with C++ support)
OpenCL Conformant Implementations
[Figure: timeline of conformant OpenCL implementations plotted against the specification releases (OpenCL 1.0 Dec08, 1.1 Jun10, 1.2 Nov11, 2.0 Nov13, 2.1 Nov15), with vendors grouped as Desktop, Mobile, Embedded and FPGA. Vendor timelines are first implementation of each spec generation. Vendor logos were not extracted; the version | date entries, in extraction order, are: 1.0|Jul13, 1.0|Aug09, 1.0|May09, 1.0|May10, 1.0|Feb11, 1.0|May09, 1.0|Jan10, 1.1|Aug10, 1.1|Jul11, 1.2|May12, 1.2|Jun12, 1.1|Feb11, 1.1|Mar11, 1.1|Jun10, 1.1|Aug12, 1.1|Nov12, 1.1|May13, 1.1|Apr12, 1.2|Apr14, 1.2|Sep13, 1.2|Dec12, 2.0|Jul14, 1.2|May15, 2.0|Dec14, 1.0|Dec14, 1.2|Dec14, 1.2|Sep14, 1.2|May15, 1.2|Aug15, 1.2|Mar16, 2.0|Nov15, 2.0|Apr17]
Strong Vulkan Momentum
Game studios publicly confirming that work is ongoing on Vulkan titles
All Major GPU Companies shipping Vulkan Drivers – for Desktop and Mobile Platforms
Mobile, Embedded and Console Platforms Supporting Vulkan: Embedded Linux, Android TV, Android 7.0, Nintendo Switch
In first 12 months: #Vulkan Games on PC = 11
In first 18 months: #DX12 Games on PC = 19
https://en.wikipedia.org/wiki/List_of_games_with_Vulkan_support
Vulkan Explicit GPU Control
Layered GPU Control
• Application: Single thread per context
• High-level Driver Abstraction: Context management, Memory allocation, Full GLSL compiler, Error detection
• GPU
Explicit GPU Control
• Application: Memory allocation, Thread management, Synchronization, Multi-threaded generation of command buffers
• Language Front-end Compilers (Initially GLSL), producing SPIR-V pre-compiled shaders
• Loadable debug and validation layers
• Thin Driver
• GPU
Vulkan 1.0 provides access to OpenGL ES 3.1 / OpenGL 4.X-class GPU functionality but with increased performance and flexibility
Vulkan Benefits
• Loadable Layers: No error handling overhead in production code
• SPIR-V Pre-compiled Shaders: No front-end compiler in driver; Future shading language flexibility
• Simpler drivers: Improved efficiency/performance; Reduced CPU bottlenecks; Lower latency; Increased portability
• Graphics, compute and DMA queues: Work dispatch flexibility
• Command Buffers: Command creation can be multi-threaded; Multiple CPU cores increase performance
• Resource management in app code: Fewer hitches and surprises
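The structure behind multi-threaded command buffer generation can be sketched in plain Python. This is not the Vulkan API: `record_command_buffer` and `build_frame` are hypothetical names, and the commands are simple tuples. Each worker records its own command list without touching shared state, and the application then submits the lists in one explicit step. In CPython, threads will not speed up a pure-Python loop like this one, so the sketch shows only the shape of the work, not the performance benefit.

```python
from concurrent.futures import ThreadPoolExecutor


def record_command_buffer(object_ids):
    """Record draw commands for one slice of the scene; uses no shared state."""
    return [("draw", object_id) for object_id in object_ids]


def build_frame(object_ids, workers):
    """Record one command buffer per worker, then submit them in a fixed order."""
    slices = [object_ids[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the order of the input slices.
        command_buffers = list(pool.map(record_command_buffer, slices))
    # Submission stays a single, explicit step performed by the application.
    return [command for buffer in command_buffers for command in buffer]


frame = build_frame(list(range(6)), workers=3)
```

Because recording is independent per buffer, the only ordering decision left is the submission order, which the application controls.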
Vulkan Influence on OpenCL Roadmap
• OpenCL converging with Vulkan – expanding beyond graphics + more processor types
- Thin, powerful, explicit run-time for control and predictability
- Feature sets and dial-able precision for target market agility
- Installable tools and three layer ecosystem for flexibility
- Vulkan renderpasses are already a way to enable tiled processing
• OpenCL also considering optional reduced precision for vision and inferencing
- Will enable significant number of DSP implementations to become conformant
[Figure: layered OpenCL ecosystem diagram. Labels:
- Applications
- Vendor-supplied and open source middleware: Math Libraries, Language Front-ends, Tool Layers
- Installable tool & validation layers
- API Definitions
- Features that can be enabled for particular target markets: Real-time Pre-emption and QoS scheduling, Explicit Asynch DMA, Self-synchronized self-scheduled graphs, Stream Processing, …
- Dial-able types and precision
- Thin, explicit run-time with rigorous memory/execution model. Low-latency, fine-grain pre-emption and synchronization]
SPIR-V Ecosystem
SPIR-V
• Khronos defined and controlled cross-API intermediate language
• Native support for graphics and parallel constructs
• 32-bit Word Stream
• Extensible and easily parsed
• Retains data object and control flow information for effective code generation and translation
[Figure: SPIR-V ecosystem diagram. Source languages: OpenCL C, OpenCL C++, GLSL, HLSL, third party kernel and shader languages. Front-end compilers leverage clang. Tools: ‘glslang’ GLSL to SPIR-V compiler, SPIR-V Validator, SPIR-V (Dis)Assembler, LLVM to SPIR-V Bi-directional Translator. Consumers: IHV Driver Runtimes, Other Intermediate Forms. Notes: “Khronos has open sourced these tools and translators”; “Khronos plans to open source these tools soon”.]
https://github.com/KhronosGroup/SPIRV-Tools
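The "32-bit word stream" and "easily parsed" points can be illustrated by reading a module header with Python's standard `struct` module. A SPIR-V module starts with five 32-bit words: the magic number 0x07230203, a version word, a generator word, the ID bound, and a reserved schema word. `parse_spirv_header` is a hypothetical helper written for this sketch, not part of SPIRV-Tools, and the header below is hand-built for illustration rather than taken from a real shader.

```python
import struct

SPIRV_MAGIC = 0x07230203


def parse_spirv_header(blob):
    """Decode the five-word SPIR-V header, detecting byte order from the magic number."""
    if len(blob) < 20 or len(blob) % 4 != 0:
        raise ValueError("expected a stream of 32-bit words with a 5-word header")
    for byte_order in ("<", ">"):
        magic, version, generator, bound, schema = struct.unpack_from(
            byte_order + "5I", blob
        )
        if magic == SPIRV_MAGIC:
            return {
                "byte_order": "little" if byte_order == "<" else "big",
                # Version word layout: 0 | major | minor | 0, one byte each.
                "version": ((version >> 16) & 0xFF, (version >> 8) & 0xFF),
                "generator": generator,
                "bound": bound,
                "schema": schema,
            }
    raise ValueError("not a SPIR-V module: bad magic number")


# Hand-built header only (no instructions follow): version 1.1, ID bound 1.
header = struct.pack("<5I", SPIRV_MAGIC, 0x00010100, 0, 1, 0)
info = parse_spirv_header(header)
```

A real consumer would continue past the header and walk the instruction stream, where each instruction's first word carries its word count and opcode.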
Possible Convergence of Graph Technologies
• API-created Graphs such as OpenVX benefit from flexibility of user-programmed nodes
- OpenVX Tiling extension lets them participate in tiled/fused optimizations
- But currently user nodes can run only on the CPU
• Perhaps use a C++ language, such as a Halide algorithm, to program user nodes
- That can be offloaded and scheduled within the OpenVX graph
- Perhaps use SPIR-V to define node capabilities and store portable Node programs
[Figure: OpenVX graph connecting Native Camera Control, Vision Nodes, CNN Nodes and a Programmed User Node to Downstream Application Processing]
Discussion – should an OpenVX user node programmed in a C++ domain-specific language have its executable stored as SPIR-V?
Safety Critical APIs
New Generation APIs for safety certifiable vision, graphics and compute, e.g. ISO 26262 and DO-178B/C
• OpenGL ES 1.0 - 2003: Fixed function graphics
• OpenGL ES 2.0 - 2007: Shader programmable pipeline
• OpenGL SC 1.0 - 2005: Fixed function graphics subset
• OpenGL SC 2.0 - April 2016: Shader programmable pipeline subset
• Experience and Guidelines
• Vulkan SC being discussed: Small driver size, Advanced functionality, Graphics and compute
• OpenVX SC 1.1 Released Today: Restricted “deployment” implementation executes on the target hardware by reading the binary format and executing the pre-compiled graphs
• Khronos SCAP ‘Safety Critical Advisory Panel’: Guidelines for designing APIs that ease system certification. Open to Khronos members AND industry experts. If interested in joining, contact ntrevett@nvidia.com
Key Takeaways and What’s Next?
• Vision Tools and APIs are becoming increasingly sophisticated
- Ecosystem is layering libraries, language and run-times
• Compiler technologies becoming increasingly central – especially C++
- To enable graph-based language-based solutions with kernel fusion
- LLVM and Clang open source projects being used by many
• Safety-critical APIs becoming increasingly essential
- Many vision applications need system certification
• Still no cross-vendor camera APIs?
- Is the time yet right for this to be a target for standardization?
• Please join if your company is interested in Khronos open standards
- Neil Trevett | ntrevett@nvidia.com | @neilt3d

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...
PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...
PT-4058, Measuring and Optimizing Performance of Cluster and Private Cloud Ap...
 
"The OpenVX Hardware Acceleration API for Embedded Vision Applications and Li...
"The OpenVX Hardware Acceleration API for Embedded Vision Applications and Li..."The OpenVX Hardware Acceleration API for Embedded Vision Applications and Li...
"The OpenVX Hardware Acceleration API for Embedded Vision Applications and Li...
 
WT-4069, WebCL: Enabling OpenCL Acceleration of Web Applications, by Mikael ...
WT-4069, WebCL: Enabling OpenCL Acceleration of Web Applications, by  Mikael ...WT-4069, WebCL: Enabling OpenCL Acceleration of Web Applications, by  Mikael ...
WT-4069, WebCL: Enabling OpenCL Acceleration of Web Applications, by Mikael ...
 
Introduction to GPUs in HPC
Introduction to GPUs in HPCIntroduction to GPUs in HPC
Introduction to GPUs in HPC
 
"Deep Learning on Arm Cortex-M Microcontrollers," a Presentation from Arm
"Deep Learning on Arm Cortex-M Microcontrollers," a Presentation from Arm"Deep Learning on Arm Cortex-M Microcontrollers," a Presentation from Arm
"Deep Learning on Arm Cortex-M Microcontrollers," a Presentation from Arm
 
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
 
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael Mantor
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael MantorGS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael Mantor
GS-4152, AMD’s Radeon R9-290X, One Big dGPU, by Michael Mantor
 
Utilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmapUtilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmap
 
Evaluating GPU programming Models for the LUMI Supercomputer
Evaluating GPU programming Models for the LUMI SupercomputerEvaluating GPU programming Models for the LUMI Supercomputer
Evaluating GPU programming Models for the LUMI Supercomputer
 
NVIDIA OpenGL 4.6 in 2017
NVIDIA OpenGL 4.6 in 2017NVIDIA OpenGL 4.6 in 2017
NVIDIA OpenGL 4.6 in 2017
 
PT-4055, Optimizing Raytracing on GCN with AMD Development Tools, by Tzachi C...
PT-4055, Optimizing Raytracing on GCN with AMD Development Tools, by Tzachi C...PT-4055, Optimizing Raytracing on GCN with AMD Development Tools, by Tzachi C...
PT-4055, Optimizing Raytracing on GCN with AMD Development Tools, by Tzachi C...
 
NVIDIA OpenGL and Vulkan Support for 2017
NVIDIA OpenGL and Vulkan Support for 2017NVIDIA OpenGL and Vulkan Support for 2017
NVIDIA OpenGL and Vulkan Support for 2017
 
MM-4105, Realtime 4K HDR Decoding with GPU ACES, by Gary Demos
MM-4105, Realtime 4K HDR Decoding with GPU ACES, by Gary DemosMM-4105, Realtime 4K HDR Decoding with GPU ACES, by Gary Demos
MM-4105, Realtime 4K HDR Decoding with GPU ACES, by Gary Demos
 
Final lisa opening_keynote_draft_-_v12.1tb
Final lisa opening_keynote_draft_-_v12.1tbFinal lisa opening_keynote_draft_-_v12.1tb
Final lisa opening_keynote_draft_-_v12.1tb
 
IS-4081, Rabbit: Reinventing Video Chat, by Philippe Clavel
IS-4081, Rabbit: Reinventing Video Chat, by Philippe ClavelIS-4081, Rabbit: Reinventing Video Chat, by Philippe Clavel
IS-4081, Rabbit: Reinventing Video Chat, by Philippe Clavel
 
HSA-4123, HSA Memory Model, by Ben Gaster
HSA-4123, HSA Memory Model, by Ben GasterHSA-4123, HSA Memory Model, by Ben Gaster
HSA-4123, HSA Memory Model, by Ben Gaster
 
OpenCL - The Open Standard for Heterogeneous Parallel Programming
OpenCL - The Open Standard for Heterogeneous Parallel ProgrammingOpenCL - The Open Standard for Heterogeneous Parallel Programming
OpenCL - The Open Standard for Heterogeneous Parallel Programming
 
Lenovo HPC Strategy Update
Lenovo HPC Strategy UpdateLenovo HPC Strategy Update
Lenovo HPC Strategy Update
 
00 opencapi acceleration framework yonglu_ver2
00 opencapi acceleration framework yonglu_ver200 opencapi acceleration framework yonglu_ver2
00 opencapi acceleration framework yonglu_ver2
 
WT-4073, ANGLE and cross-platform WebGL support, by Shannon Woods
WT-4073, ANGLE and cross-platform WebGL support, by Shannon WoodsWT-4073, ANGLE and cross-platform WebGL support, by Shannon Woods
WT-4073, ANGLE and cross-platform WebGL support, by Shannon Woods
 

Ähnlich wie "The Vision Acceleration API Landscape: Options and Trade-offs," a Presentation from the Khronos Group

"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono..."The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
Edge AI and Vision Alliance
 
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
Edge AI and Vision Alliance
 
"Portable Performance via the OpenVX Computer Vision Library: Case Studies," ...
"Portable Performance via the OpenVX Computer Vision Library: Case Studies," ..."Portable Performance via the OpenVX Computer Vision Library: Case Studies," ...
"Portable Performance via the OpenVX Computer Vision Library: Case Studies," ...
Edge AI and Vision Alliance
 
Массовый параллелизм для гетерогенных вычислений на C++ для беспилотных автом...
Массовый параллелизм для гетерогенных вычислений на C++ для беспилотных автом...Массовый параллелизм для гетерогенных вычислений на C++ для беспилотных автом...
Массовый параллелизм для гетерогенных вычислений на C++ для беспилотных автом...
CEE-SEC(R)
 
"Combining Flexibility and Low-Power in Embedded Vision Subsystems: An Applic...
"Combining Flexibility and Low-Power in Embedded Vision Subsystems: An Applic..."Combining Flexibility and Low-Power in Embedded Vision Subsystems: An Applic...
"Combining Flexibility and Low-Power in Embedded Vision Subsystems: An Applic...
Edge AI and Vision Alliance
 

Ähnlich wie "The Vision Acceleration API Landscape: Options and Trade-offs," a Presentation from the Khronos Group (20)

"New Standards for Embedded Vision and Neural Networks," a Presentation from ...
"New Standards for Embedded Vision and Neural Networks," a Presentation from ..."New Standards for Embedded Vision and Neural Networks," a Presentation from ...
"New Standards for Embedded Vision and Neural Networks," a Presentation from ...
 
"Update on Khronos Standards for Vision and Machine Learning," a Presentation...
"Update on Khronos Standards for Vision and Machine Learning," a Presentation..."Update on Khronos Standards for Vision and Machine Learning," a Presentation...
"Update on Khronos Standards for Vision and Machine Learning," a Presentation...
 
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono..."The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
 
"An Update on Open Standard APIs for Vision Processing," a Presentation from ...
"An Update on Open Standard APIs for Vision Processing," a Presentation from ..."An Update on Open Standard APIs for Vision Processing," a Presentation from ...
"An Update on Open Standard APIs for Vision Processing," a Presentation from ...
 
“Open Standards: Powering the Future of Embedded Vision,” a Presentation from...
“Open Standards: Powering the Future of Embedded Vision,” a Presentation from...“Open Standards: Powering the Future of Embedded Vision,” a Presentation from...
“Open Standards: Powering the Future of Embedded Vision,” a Presentation from...
 
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
 
"Portable Performance via the OpenVX Computer Vision Library: Case Studies," ...
"Portable Performance via the OpenVX Computer Vision Library: Case Studies," ..."Portable Performance via the OpenVX Computer Vision Library: Case Studies," ...
"Portable Performance via the OpenVX Computer Vision Library: Case Studies," ...
 
"Current and Planned Standards for Computer Vision and Machine Learning," a P...
"Current and Planned Standards for Computer Vision and Machine Learning," a P..."Current and Planned Standards for Computer Vision and Machine Learning," a P...
"Current and Planned Standards for Computer Vision and Machine Learning," a P...
 
"OpenCV on Zynq: Accelerating 4k60 Dense Optical Flow and Stereo Vision," a P...
"OpenCV on Zynq: Accelerating 4k60 Dense Optical Flow and Stereo Vision," a P..."OpenCV on Zynq: Accelerating 4k60 Dense Optical Flow and Stereo Vision," a P...
"OpenCV on Zynq: Accelerating 4k60 Dense Optical Flow and Stereo Vision," a P...
 
OpenCL Overview Japan Virtual Open House Feb 2021
OpenCL Overview Japan Virtual Open House Feb 2021OpenCL Overview Japan Virtual Open House Feb 2021
OpenCL Overview Japan Virtual Open House Feb 2021
 
“Khronos Standard APIs for Accelerating Vision and Inferencing,” a Presentati...
“Khronos Standard APIs for Accelerating Vision and Inferencing,” a Presentati...“Khronos Standard APIs for Accelerating Vision and Inferencing,” a Presentati...
“Khronos Standard APIs for Accelerating Vision and Inferencing,” a Presentati...
 
XPDDS18: CPUFreq in Xen on ARM - Oleksandr Tyshchenko, EPAM Systems
XPDDS18: CPUFreq in Xen on ARM - Oleksandr Tyshchenko, EPAM SystemsXPDDS18: CPUFreq in Xen on ARM - Oleksandr Tyshchenko, EPAM Systems
XPDDS18: CPUFreq in Xen on ARM - Oleksandr Tyshchenko, EPAM Systems
 
System Design on Zynq using SDSoC
System Design on Zynq using SDSoCSystem Design on Zynq using SDSoC
System Design on Zynq using SDSoC
 
Массовый параллелизм для гетерогенных вычислений на C++ для беспилотных автом...
Массовый параллелизм для гетерогенных вычислений на C++ для беспилотных автом...Массовый параллелизм для гетерогенных вычислений на C++ для беспилотных автом...
Массовый параллелизм для гетерогенных вычислений на C++ для беспилотных автом...
 
"Update on Khronos Standards for Vision and Machine Learning," a Presentation...
"Update on Khronos Standards for Vision and Machine Learning," a Presentation..."Update on Khronos Standards for Vision and Machine Learning," a Presentation...
"Update on Khronos Standards for Vision and Machine Learning," a Presentation...
 
SYCL 2020 Specification
SYCL 2020 SpecificationSYCL 2020 Specification
SYCL 2020 Specification
 
Redfish and python-redfish for Software Defined Infrastructure
Redfish and python-redfish for Software Defined InfrastructureRedfish and python-redfish for Software Defined Infrastructure
Redfish and python-redfish for Software Defined Infrastructure
 
How to Enterprise Node
How to Enterprise NodeHow to Enterprise Node
How to Enterprise Node
 
Keynote (Phil Rogers) - The Programmers Guide to Reaching for the Cloud - by ...
Keynote (Phil Rogers) - The Programmers Guide to Reaching for the Cloud - by ...Keynote (Phil Rogers) - The Programmers Guide to Reaching for the Cloud - by ...
Keynote (Phil Rogers) - The Programmers Guide to Reaching for the Cloud - by ...
 
"Combining Flexibility and Low-Power in Embedded Vision Subsystems: An Applic...
"Combining Flexibility and Low-Power in Embedded Vision Subsystems: An Applic..."Combining Flexibility and Low-Power in Embedded Vision Subsystems: An Applic...
"Combining Flexibility and Low-Power in Embedded Vision Subsystems: An Applic...
 

Mehr von Edge AI and Vision Alliance

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
"The Vision Acceleration API Landscape: Options and Trade-offs," a Presentation from the Khronos Group

  • 1. Vision Acceleration API Landscape: Options and Trade-offs
      Neil Trevett, Khronos President; VP Developer Ecosystem, NVIDIA
      May 2017
  • 2. The Search for Vision Performance and Portability Leads to a Layered Ecosystem, and Graph Programming
      [Diagram labels: Graph Programming; Single Source C++ Languages; Explicit Kernels; Code Intermediate Representations; TensorRT; PTX; HSAIL; Diverse Hardware (GPUs, DSPs, FPGAs, Dedicated Hardware)]
  • 3. OpenVX: Performance Portable Vision
      • OpenVX developers express a graph of image operations (‘Nodes’)
          - Using a C API
      • Nodes can be executed on any hardware or processor, coded in any language
          - Implementers can optimize under the high-level graph abstraction
      • Graphs are the key to run-time optimization opportunities
      [Figure: Feature Extraction Example Graph. An OpenVX graph of OpenVX nodes (Color Conversion, Channel Extract, Image Pyramid, Harris Track, Optical Flow) between Camera Input and Rendering Output; data objects: RGB Frame, YUV Frame, Gray Frame, Pyr_t, Array of Keypoints, Array of Features, Ftr_t-1]
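The build-first, run-later idea on slide 3 can be illustrated with a toy dataflow graph. This is only a sketch in plain Python, not the OpenVX C API: the `Graph` class, `add_node`, `process`, and the two pixel functions are hypothetical names invented for the illustration. The point it shows is that the application only declares nodes and their data dependencies, and the runtime decides the execution order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Graph:
    """Toy dataflow graph (hypothetical, not OpenVX): declare nodes, run later."""

    def __init__(self):
        self.nodes = {}  # output name -> (function, input names)

    def add_node(self, output, func, *inputs):
        self.nodes[output] = (func, inputs)

    def process(self, **sources):
        # Order the nodes so each one runs after the nodes that feed it.
        deps = {out: set(ins) for out, (_, ins) in self.nodes.items()}
        values = dict(sources)
        for name in TopologicalSorter(deps).static_order():
            if name in self.nodes:  # skip external inputs such as "rgb"
                func, ins = self.nodes[name]
                values[name] = func(*(values[i] for i in ins))
        return values


def to_gray(rgb):
    # Integer luma approximation for each (r, g, b) pixel.
    return [[(r * 299 + g * 587 + b * 114) // 1000 for (r, g, b) in row]
            for row in rgb]


def threshold(gray):
    return [[1 if p > 127 else 0 for p in row] for row in gray]


graph = Graph()
graph.add_node("mask", threshold, "gray")  # declared before its producer
graph.add_node("gray", to_gray, "rgb")
result = graph.process(rgb=[[(255, 255, 255), (0, 0, 0)]])
print(result["mask"])  # prints [[1, 0]]
```

Because the whole graph is known before anything runs, an implementation is free to reorder, fuse, or offload nodes, which is the opportunity the slide refers to.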
  • 4. OpenVX Efficiency through Graphs
      • Memory Management: reuse pre-allocated memory for multiple intermediate data. Benefit: less allocation overhead, more memory for other applications
      • Kernel Fusion: replace a sub-graph with a single faster node. Benefit: better memory locality, less kernel launch overhead
      • Graph Scheduling: split the graph execution across the whole system (CPU / GPU / dedicated HW). Benefit: faster execution or lower power consumption
      • Data Tiling: execute a sub-graph at tile granularity instead of image granularity. Benefit: better use of data cache and local memory
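Two of the optimizations on slide 4, kernel fusion and data tiling, can be sketched on a one-dimensional "image". This is a minimal illustration in plain Python under assumed operations (a scale followed by an offset); the function names are hypothetical and the code says nothing about how any OpenVX implementation actually performs these transformations.

```python
def scale(pixels, k):
    return [p * k for p in pixels]


def offset(pixels, b):
    return [p + b for p in pixels]


def unfused(pixels):
    # Two separate kernels: the first writes a full-size intermediate buffer.
    tmp = scale(pixels, 2)
    return offset(tmp, 1)


def fused(pixels):
    # Kernel fusion: one pass over the data, no intermediate buffer.
    return [p * 2 + 1 for p in pixels]


def tiled(pixels, tile=4):
    # Data tiling: run the fused kernel one tile at a time, so the working
    # set stays small enough to fit in a cache or local memory.
    out = []
    for start in range(0, len(pixels), tile):
        out.extend(fused(pixels[start:start + tile]))
    return out


pixels = list(range(10))
print(unfused(pixels) == fused(pixels) == tiled(pixels))  # prints True
```

All three variants compute the same result; they differ only in how much intermediate storage they touch and in what order, which is why a graph runtime can choose between them without changing the application.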
  • 5. OpenVX Evolution
      • OpenVX 1.0: spec released October 2014; conformant implementations
      • OpenVX 1.1: spec released May 2016; conformant implementations
          - New functionality: expanded nodes functionality, enhanced graph framework
      • OpenVX 1.2: spec released 1st May 2017
          - New functionality: conditional node execution, feature detection, classification operators, expanded imaging operations
          - Extensions: neural network acceleration, graph save and restore, 16-bit image operation
      • OpenVX roadmap discussions
          - New functionality under discussion: NNEF import, accelerated programmable user nodes
      • AMD OpenVX Tools (http://gpuopen.com/compute-product/amd-openvx/)
          - Open source, highly optimized for x86 CPU and OpenCL for GPU
          - “Graph Optimizer” looks at the entire processing pipeline and removes, replaces, merges functions to improve performance and bandwidth
          - Scripting for rapid prototyping, without recompiling, at production performance levels
      • Come to the next presentation for more details!
  • 6. Khronos NNEF (Neural Net Exchange Format)
      • Range of Neural Network tools and inferencing architectures is rapidly increasing
      • NNEF encapsulates neural network formal semantics
          - Structure, Data formats
          - Commonly used operations (such as convolution, pooling, normalization, etc.)
      • Cross-vendor Neural Net file format removes industry friction
          - Simple exchange between tools and inferencing engines
          - Unified format for network optimizations
      [Figure: NN Authoring Frameworks 1-3 and Inference Engines 1-3, shown twice. Surviving caption: Every Tool Needs an Exporter to Every Accelerator]
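The friction that slide 6 describes can be put as simple arithmetic. The sketch below assumes the usual reading of the figure: without a shared format each authoring framework needs its own exporter for each inference engine, while with a shared format each framework needs one exporter and each engine one importer. The function name is hypothetical.

```python
def converters_needed(frameworks, engines, shared_format):
    """Count exporters/importers needed to connect every tool to every engine."""
    if shared_format:
        # One exporter per framework plus one importer per engine.
        return frameworks + engines
    # One dedicated exporter for every (framework, engine) pair.
    return frameworks * engines


# The slide's figure shows three authoring frameworks and three engines.
print(converters_needed(3, 3, shared_format=False))  # prints 9
print(converters_needed(3, 3, shared_format=True))   # prints 6
```

The gap widens as the ecosystem grows: with 10 frameworks and 10 engines the counts are 100 and 20.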
  • 7. OpenVX 1.2 and Neural Net Extension
      • Convolutional Neural Network topologies can be represented as OpenVX graphs
          - Layers are represented as OpenVX nodes
          - Layers are connected by multi-dimensional tensor objects
          - Layer types include convolution, activation, pooling, fully-connected, soft-max
          - CNN nodes can be mixed with traditional vision nodes
      • Import/Export Extension
          - Efficient handling of network Weights/Biases or complete networks
      • Will be able to import NNEF files into OpenVX Neural Nets
      [Figure: An OpenVX graph mixing CNN nodes with traditional vision nodes. Labels: Native Camera Control; Vision Node (x3); CNN Nodes; Downstream Application Processing]
  • 8. Building Graphs in C++ Rather than an API: Programmatic approaches to Graph Programming
      [Diagram: the ecosystem layers diagram from slide 2, repeated]
  • 9. Halide: C++ Based Image Processing
      • Halide is a programming language embedded in C++
          - C++ code builds an in-memory representation of a Halide graph
          - Compiled to an object file, to be run in the same process
      • Separate descriptions of ‘algorithm’ per output pixel and execution ‘schedule’
          - Can hand-optimize or auto-optimize the schedule without affecting the algorithm's correctness
      • Compiler based on LLVM/Clang
          - Compiler targets: X86, ARM, CUDA, OpenCL, and OpenGL
      [Figure: a Halide Program (image processing algorithm) and a Halide Schedule (mapping of algorithm to hardware, from a human expert or an autoscheduler) feed the Halide Code Generator (LLVM/Clang), which emits a platform-optimized binary]
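The separation of algorithm and schedule on slide 9 can be mimicked in plain Python. This is a conceptual sketch only and does not use Halide or its syntax: `blur_x`, `schedule_rows`, `schedule_tiles`, and `realize` are hypothetical names. The algorithm defines the value of one output pixel; a schedule only decides the order in which pixels are visited.

```python
def blur_x(image, x, y):
    """Algorithm: one output pixel of a 3-tap horizontal blur, clamped at edges."""
    width = len(image[0])
    left, right = max(x - 1, 0), min(x + 1, width - 1)
    return (image[y][left] + image[y][x] + image[y][right]) // 3


def schedule_rows(width, height):
    """Schedule A: visit pixels row by row."""
    for y in range(height):
        for x in range(width):
            yield x, y


def schedule_tiles(width, height, tile=2):
    """Schedule B: visit pixels tile by tile."""
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            for y in range(ty, min(ty + tile, height)):
                for x in range(tx, min(tx + tile, width)):
                    yield x, y


def realize(algorithm, image, schedule):
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for x, y in schedule(width, height):
        out[y][x] = algorithm(image, x, y)
    return out


image = [[0, 3, 6, 9], [9, 6, 3, 0], [1, 2, 3, 4]]
by_rows = realize(blur_x, image, schedule_rows)
by_tiles = realize(blur_x, image, schedule_tiles)
print(by_rows == by_tiles)  # prints True
```

Swapping the schedule changes the traversal order, and on real hardware the memory behaviour, but never the result, which is why the schedule can be tuned by hand or automatically without risk to correctness.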
  • 10. Khronos SYCL: Single Source C++
      • Single-source heterogeneous programming using STANDARD C++
          - Use C++ templates and lambda functions for host & device code
      • Kernel Fusion in C++ is a widely used compiler technique, proven to work
          - Halide, Eigen, Boost.Compute, …
          - Optimization at the C++, not assembly, level
          - Achieves better performance on complex software than hand-coding
      • Rapid optimization of multiple libraries (more information at http://sycl.tech)
          - SYCLBLAS
          - SYCL Eigen
          - SYCL TensorFlow
          - SYCL GTX
          - triSYCL
          - ComputeCpp
          - VisionCpp
          - ComputeCpp SDK
  • 11. Graph Programming: Fusion Results
      • C++ Kernel fusion provides optimization benefits
          - Tiled operations in local memory
          - Reduced bandwidth to off-chip memory
      Courtesy Codeplay: https://www.slideshare.net/AndrewRichards28/open-standards-for-adas-andrew-richards-codeplay-at-autosens-2016-66476890
  • 12. Convergence with Standard ISO C++
      • SYCL aligns the hardware acceleration of OpenCL with the direction of the C++ standard
          - Use of C++14 with open source C++17 Parallel STL hosted by Khronos
      • Khronos is working with others on bringing proposals to ISO C++ for:
          - Executors, for scheduling work
          - “Managed pointers” or “channels”, for sharing data
      • Hoping to target C++20
          - But timescales are tight
      • The eventual aim is for standard C++ to support advanced acceleration offload
  • 13. Explicit Programming and Run Times: Run-time support for accelerator offload
      [Diagram: the ecosystem layers diagram from slide 2, repeated]
  • 14. OpenCL as Parallel Language/Library Backend
      • Approaching 200 languages, frameworks and projects using OpenCL as a compiler target to access vendor-optimized, heterogeneous compute runtimes
      [Figure: project logos (not recoverable). Surviving labels: Low Level Explicit APIs; C++ based Neural network framework; MulticoreWare open source project on Bitbucket; Compiler directives for Fortran, C and C++; Java language extensions for parallelism; Language for image processing and computational photography; Single Source C++ Programming for OpenCL; Vision processing open source project; Open source software library for machine learning]
  • 15. OpenCL 2.2: Top to Bottom C++
      • OpenCL 1.0 Specification (Dec08)
      • OpenCL 1.1 Specification (Jun10, 18 months later)
          - 3-component vectors; additional image formats; multiple hosts and devices; buffer region operations; enhanced event-driven execution; additional OpenCL C built-ins; improved OpenGL data/event interop
      • OpenCL 1.2 Specification (Nov11, 18 months later)
          - Device partitioning; separate compilation and linking; enhanced image support; built-in kernels / custom devices; enhanced DX and OpenGL interop
      • OpenCL 2.0 Specification (Nov13, 24 months later)
          - Shared Virtual Memory; on-device dispatch; generic address space; enhanced image support; C11 atomics; pipes; Android ICD
      • OpenCL 2.1 Specification (Nov15, 24 months later)
          - SPIR-V in core; subgroups into core; subgroup query operations; clCloneKernel; low-latency device timer queries
      • OpenCL 2.2 PROVISIONAL (May16, 7 months later)
          - SYCL 2.2 for single source C++. Single Source C++ Programming: full support for features in C++14-based Kernel Language
          - OpenCL C++ Kernel Language. API and Language Specs: brings C++14-based Kernel Language into core specification
          - SPIR-V 1.1 with C++ support. Portable Kernel Intermediate Language: support for C++14-based kernel language, e.g. constructors/destructors
  • 16. OpenCL Conformant Implementations
      • Vendor timelines are first implementation of each spec generation
      [Figure: timeline of first conformant implementations per vendor, grouped as Desktop, Mobile, Embedded and FPGA, plotted against the specification releases (OpenCL 1.0 Dec08, 1.1 Jun10, 1.2 Nov11, 2.0 Nov13, 2.1 Nov15). Vendor names are not recoverable; entries run from 1.0 in May09 to 2.0 in Apr17]
  • 17. Strong Vulkan Momentum
      • All major GPU companies shipping Vulkan drivers, for desktop and mobile platforms
      • Mobile, embedded and console platforms supporting Vulkan: Embedded Linux, Android TV, Android 7.0, Nintendo Switch
      • Games studios publicly confirming that work is ongoing on Vulkan titles
      • In first 12 months: number of Vulkan games on PC = 11. In first 18 months: number of DX12 games on PC = 19
          - https://en.wikipedia.org/wiki/List_of_games_with_Vulkan_support
  • 18. Vulkan Explicit GPU Control
      • Layered GPU Control
          - Application: single thread per context
          - High-level driver abstraction: context management, memory allocation, full GLSL compiler, error detection
          - GPU
      • Explicit GPU Control
          - Application: memory allocation, thread management, synchronization, multi-threaded generation of command buffers
          - Language front-end compilers (initially GLSL); loadable debug and validation layers
          - Thin driver; SPIR-V pre-compiled shaders
          - GPU
      • Vulkan 1.0 provides access to OpenGL ES 3.1 / OpenGL 4.X-class GPU functionality but with increased performance and flexibility
      • Vulkan Benefits
          - Loadable layers: no error handling overhead in production code
          - SPIR-V pre-compiled shaders: no front-end compiler in driver; future shading language flexibility
          - Simpler drivers: improved efficiency/performance; reduced CPU bottlenecks; lower latency; increased portability
          - Graphics, compute and DMA queues: work dispatch flexibility
          - Command buffers: command creation can be multi-threaded; multiple CPU cores increase performance
          - Resource management in app code: fewer hitches and surprises
  • 19. Vulkan Influence on OpenCL Roadmap
      • OpenCL converging with Vulkan, expanding beyond graphics + more processor types
          - Thin, powerful, explicit run-time for control and predictability
          - Feature sets and dial-able precision for target market agility
          - Installable tools and three-layer ecosystem for flexibility
          - Vulkan renderpasses are already a way to enable tiled processing
      • OpenCL also considering optional reduced precision for vision and inferencing
          - Will enable a significant number of DSP implementations to become conformant
      [Diagram labels: Applications; Math Libraries; Vendor-supplied and open source middleware; Language Front-ends; Tool Layers (installable tool & validation layers); API Definitions; Thin, explicit run-time with rigorous memory/execution model; Low-latency, fine-grain pre-emption and synchronization; Dial-able types and precision; Features that can be enabled for particular target markets; Real-time pre-emption and QoS scheduling; Explicit async DMA; Self-synchronized, self-scheduled graphs; Stream processing; …]
  • 20. SPIR-V Ecosystem
      • SPIR-V
          - Khronos defined and controlled cross-API intermediate language
          - Native support for graphics and parallel constructs
          - 32-bit word stream
          - Extensible and easily parsed
          - Retains data object and control flow information for effective code generation and translation
      • https://github.com/KhronosGroup/SPIRV-Tools
      [Diagram labels: OpenCL C; OpenCL C++; GLSL; HLSL; Third party kernel and shader languages; LLVM; LLVM to SPIR-V bi-directional translator; ‘glslang’ GLSL to SPIR-V compiler; Front-end compilers leverage clang; SPIR-V Validator; SPIR-V (Dis)Assembler; IHV Driver Runtimes; Other Intermediate Forms. Notes on the diagram: “Khronos has open sourced these tools and translators”; “Khronos plans to open source these tools soon”]
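Slide 20 describes SPIR-V as a 32-bit word stream that is easily parsed. The sketch below walks such a stream in Python, relying on the layout given in the SPIR-V specification: a five-word header (magic number 0x07230203, version, generator, ID bound, reserved schema word) followed by instructions whose first word carries the word count in the high 16 bits and the opcode in the low 16 bits. The function name `parse_module` and the tiny hand-built module are assumptions made for the illustration; the parser handles only little-endian input and performs no validation beyond what is shown.

```python
import struct

SPIRV_MAGIC = 0x07230203  # first word of every SPIR-V module


def parse_module(data):
    """Split a little-endian SPIR-V binary into its header and instructions."""
    if len(data) % 4 != 0 or len(data) < 20:
        raise ValueError("a SPIR-V module is a stream of 32-bit words")
    words = struct.unpack(f"<{len(data) // 4}I", data)
    magic, version, generator, bound, _schema = words[:5]
    if magic != SPIRV_MAGIC:
        raise ValueError("bad magic number (or big-endian module)")
    header = {
        "version": ((version >> 16) & 0xFF, (version >> 8) & 0xFF),
        "generator": generator,
        "bound": bound,
    }
    instructions = []
    i = 5
    while i < len(words):
        word_count, opcode = words[i] >> 16, words[i] & 0xFFFF
        if word_count == 0 or i + word_count > len(words):
            raise ValueError("malformed instruction")
        instructions.append((opcode, words[i + 1:i + word_count]))
        i += word_count
    return header, instructions


# Hand-built module: header, OpCapability Shader, OpMemoryModel Logical GLSL450.
module = struct.pack(
    "<10I",
    SPIRV_MAGIC, 0x00010100, 0, 1, 0,  # header: version 1.1, bound 1
    (2 << 16) | 17, 1,                 # OpCapability (opcode 17), Shader = 1
    (3 << 16) | 14, 0, 1,              # OpMemoryModel (opcode 14), Logical, GLSL450
)
header, instructions = parse_module(module)
print(header["version"])  # prints (1, 1)
print(instructions)       # prints [(17, (1,)), (14, (0, 1))]
```

Because every instruction states its own length, a tool can skip opcodes it does not understand, which is one reason the format is described as extensible.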
  • 21. Possible Convergence of Graph Technologies
      • API-created graphs such as OpenVX benefit from the flexibility of user-programmed nodes
          - OpenVX Tiling extension lets them participate in tiled/fused optimizations
          - But currently user nodes can run only on the CPU
      • Perhaps use a C++ language, such as a Halide algorithm, to program user nodes
          - That can be offloaded and scheduled within the OpenVX graph
          - Perhaps use SPIR-V to define node capabilities and store portable node programs
      • Discussion: should an OpenVX user-programmed node written in a C++ domain-specific language have its executable stored as SPIR-V?
      [Figure labels: Native Camera Control; Vision Node (x3); CNN Nodes; Programmed User Node; Downstream Application Processing]
  • 22. Safety Critical APIs
      • New generation APIs for safety certifiable vision, graphics and compute, e.g. ISO 26262 and DO-178B/C
      • OpenGL ES 1.0 (2003): fixed function graphics
      • OpenGL SC 1.0 (2005): fixed function graphics subset
      • OpenGL ES 2.0 (2007): shader programmable pipeline
      • OpenGL SC 2.0 (April 2016): shader programmable pipeline subset
      • Experience and guidelines
      • Vulkan SC being discussed: small driver size, advanced functionality, graphics and compute
      • OpenVX SC 1.1 released today: restricted “deployment” implementation executes on the target hardware by reading the binary format and executing the pre-compiled graphs
      • Khronos SCAP (‘Safety Critical Advisory Panel’): guidelines for designing APIs that ease system certification. Open to Khronos members AND industry experts. If interested in joining, contact ntrevett@nvidia.com
  • 23. Key Takeaways and What’s Next?
      • Vision tools and APIs are becoming increasingly sophisticated
          - Ecosystem is layering libraries, languages and run-times
      • Compiler technologies are becoming increasingly central, especially C++
          - To enable graph-based and language-based solutions with kernel fusion
          - LLVM and Clang open source projects being used by many
      • Safety-critical APIs are becoming increasingly essential
          - Many vision applications need system certification
      • Still no cross-vendor camera APIs?
          - Is the time right yet for this to be a target for standardization?
      • Please join if your company is interested in Khronos open standards
          - Neil Trevett | ntrevett@nvidia.com | @neilt3d