© Copyright 2019 Xilinx
Xilinx Inference Solution for Deep Learning
Ashish Sirasao
Fellow, Accelerated Computing
ashish.sirasao@xilinx.com
Deep Learning Models – A broad spectrum
Convolutional Neural Network
• Feature Extraction
• Object Detection
• Image Segmentation
Recurrent Neural Network
• Sequence and Temporal Data
• Speech to Text
• Language Translation
Multi-Layer Perceptron
• Classification
• Universal Function Approximator
• Autoencoder
[Figure: example outputs for classification ("Dog"), object detection, and segmentation]
Xilinx – Focus on Inference
Deep learning resurgence - Till 2015
• LeNet-5: 1998
• AlexNet: 2012
• VGG-Net: 2014
• GoogLeNet: 2014
• ResNet: 2015
Rapid Algorithmic Changes – 2015 - 2018
Deep Learning on Xilinx Adaptable Devices
Data Parallel
• 2D Array of MACs
• Flexible on-chip memory access
• High Bandwidth, Multiple Access Ports
Custom Memory Hierarchy
• Near Memory Compute
• Programmable routing for data & filter reuse
Compression & Sparsity
• Flexible Data Types
• FP32/16, INT16/8/4/2, Binary/Ternary
• Sparsity friendly compute
Broad Device Range
• Scalable device family for different applications
• Built in System functions – Networking, Video, ARM
ALVEO Data Center Workloads
*GoogleNet v1
https://www.xilinx.com/products/boards-and-kits/alveo.html
Variable Precision Compute Density – TOPs on VU9P
Weight/Activation             VU9P MAX TOPs
MAX FLOAT (SP)/FLOAT (SP)     2.18538
MAX FP16/FP16                 5.64
MAX 8b/8b                     17.48304
XFP8 (1,3,4)                  23.60306
MAX 4b/4b - DSP               25.71035
XFP7 (1,3,3)                  34.96608
MAX 4b/8b                     41.72917
MAX 4b/4b - LUT               81.65384
MAX 2b/8b                     92.10951
MAX T/8b                      92.10951
MAX B/8b                      92.10951
MAX B/4b                      160.7017
MAX 2b/2b                     314.7075
MAX B/B                       686.6345
[Chart: MAX TOPs on VU9P per weight/activation precision, log scale from 1 to 1000]
MAX TOPs Estimates at 700 MHz FMAX
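The estimates above are easier to compare as ratios against a chosen baseline. The following is a minimal sketch that only re-expresses the slide's numbers; the dictionary name `VU9P_MAX_TOPS` and the helper `gain_over` are illustrative names, not part of any Xilinx tool.

```python
# Peak TOPs estimates for the VU9P at 700 MHz, copied from the slide's table.
VU9P_MAX_TOPS = {
    "FLOAT (SP)/FLOAT (SP)": 2.18538,
    "FP16/FP16": 5.64,
    "8b/8b": 17.48304,
    "XFP8 (1,3,4)": 23.60306,
    "4b/4b - DSP": 25.71035,
    "XFP7 (1,3,3)": 34.96608,
    "4b/8b": 41.72917,
    "4b/4b - LUT": 81.65384,
    "2b/8b": 92.10951,
    "T/8b": 92.10951,
    "B/8b": 92.10951,
    "B/4b": 160.7017,
    "2b/2b": 314.7075,
    "B/B": 686.6345,
}


def gain_over(baseline):
    """Return each precision's peak TOPs divided by the baseline's peak TOPs."""
    base = VU9P_MAX_TOPS[baseline]
    return {name: tops / base for name, tops in VU9P_MAX_TOPS.items()}


if __name__ == "__main__":
    # Print every precision next to its gain over single-precision float.
    for name, gain in gain_over("FLOAT (SP)/FLOAT (SP)").items():
        print(f"{name:<24} {VU9P_MAX_TOPS[name]:>9.3f} TOPs  {gain:>7.2f}x")
```

Passing a different key, for example `gain_over("8b/8b")`, gives the same comparison relative to 8-bit integer arithmetic. These are peak estimates, so the ratios bound the achievable gain rather than predict it.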
Overlay Architecture
Custom Processors Exploiting Xilinx FPGA Flexibility
• Customized overlays with ISA architecture for optimized implementation
• Easy plug and play with Software Stack
• MLP Engine – Scalable sparse and dense implementation*
• xDNN – CNN Engine for Large 16 nm Xilinx Devices**
• Deephi DPU – Flexible CNN Engine with Embedded Focus
• CHaiDNN – HLS based open source offering***
• Deephi ESE – LSTM Speech to Text engine
• Random Forest – Configurable RF classification
* https://github.com/Xilinx/gemx
** https://github.com/Xilinx/ml-suite
*** https://github.com/Xilinx/CHaiDNN
Inference Optimization Techniques
Hotchips 2018 Tutorial – Michaela Blott, Xilinx Inc
Model Pruning and Integer Arithmetic - Mainstream
Model Compression provides compute and memory gains
• RNN Models – 5x to 20x
• CNN Models – 30% to 10x
Increasing Accuracy of Reduced Precision CNNs & BNNs
• 8-bit solution loses no significant accuracy
• BNNs are improving rapidly
• Near consensus that inference can be very low precision
• Image / CNN: 2-bit (binary)
• Speech / RNN: 3-bit (ternary)
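To make the integer-arithmetic point concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is a generic textbook scheme, assumed for illustration only; it is not the quantizer used by the Xilinx or Deephi tools, and the sample weights are made up.

```python
def quantize_int8(values):
    """Map floats to int8 codes in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from int8 codes."""
    return [c * scale for c in codes]


# Illustrative weights, not taken from any real model.
weights = [0.82, -0.41, 0.05, -1.27, 0.33]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Rounding to the nearest code bounds the per-value error by half a step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

The error bound of half a quantization step is what lets 8-bit inference stay close to the floating-point result when the value range is well covered by the scale.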
Whole Application Acceleration
Example - Smart City/Surveillance
Efficient AI Deployment Requires Full Application Optimization
Xilinx AI Development Stack – Edge to Cloud
Platforms
• Edge/Embedded: Z7020 Board, Z7020 SOM, ZU2/3 SOM, ZU2/3 Card, ZU9 Card, ZCU102, ZCU104, Ultra96
• Cloud/DC: Xilinx U200, U250
FPGA IP
• Edge/Embedded: Deephi DPU
• Cloud/DC: xDNN v2 and v3
Software Stack
• Edge/Embedded: Deephi Runtime, Deephi Compiler
• Cloud/DC: xfDNN Runtime, xfDNN Compiler
• Pruning and Quantization (Caffe and Tensorflow EA)
Models
• 20+ pruned / customized / basic models
• Deephi LSTM
SDSoC, SDAccel
Xilinx Tensor Processor: An Inference Engine,
Network Compiler + Runtime for Xilinx FPGAs
xDNN Performance Comparison – Batch of 1
https://www.xilinx.com/support/documentation/white_papers/wp504-accel-dnns.pdf
https://github.com/Xilinx/ml-suite
• Server Platforms: Intel x86, AMD Epyc, Power9, ARM
• FaaS: AWS F1, Nimbix, Ali Cloud, Huawei
• Xilinx SDx Boards: ALVEO U200, ALVEO U250, ALVEO 280
Python interface to simplify xDNN usage
Blocking
Non-Blocking 8-FPGA
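The slide's code screenshots did not survive extraction, so the following is only a sketch of the two calling styles it names. `run_on_fpga` is a hypothetical stand-in that touches no hardware and is NOT the xDNN API; the thread pool merely models work being queued across eight devices.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_FPGAS = 8  # matches the 8-FPGA non-blocking setup named on the slide


def run_on_fpga(device_id, image):
    """Hypothetical stand-in for an accelerator call (not the xDNN API).

    It only tags the input so the control flow can be followed.
    """
    return {"device": device_id, "image": image, "label": f"class-for-{image}"}


def classify_blocking(images):
    """Blocking style: submit one image and wait for it before the next."""
    return [run_on_fpga(0, image) for image in images]


def classify_non_blocking(images, num_devices=NUM_FPGAS):
    """Non-blocking style: enqueue every image first, collect results later."""
    with ThreadPoolExecutor(max_workers=num_devices) as pool:
        futures = [
            pool.submit(run_on_fpga, index % num_devices, image)
            for index, image in enumerate(images)
        ]
        # Results are gathered in submission order, whichever device finishes first.
        return [future.result() for future in futures]


images = [f"img{i}.jpg" for i in range(16)]
blocking_results = classify_blocking(images)
non_blocking_results = classify_non_blocking(images)
```

In the non-blocking form the host keeps submitting while earlier requests are still in flight, which is what allows several devices to stay busy at once.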
Demo successfully shown at Xilinx Developer Forum
• Most photographed demo
GoogLeNet Performance
• 30,000 images/sec, Int8, Batch 1, XDNN v3
• Final softmax and FC layers running on AMD CPU overlapped with FPGA using optimized OpenBLAS
Massive Scaleout
• EPYC BOXX + 8 ALVEO U250
[Chart: GoogLeNet Performance (img/sec), Int8, Batch 1; Single FPGA vs 8 FPGAs; XDNN v2 vs XDNN v3]
Single U250 performance with XDNN v3 for GoogLeNet
PL Kernel                  Peak TOP/s (Int8)   Latency (ms)   Images/sec
4 Kernels – Throughput     19.088              1.82           4127
4 Kernels – Low Latency    19.088              1.18           3389
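Two quick checks can be read off these figures. The sketch below assumes steady-state operation, so that Little's law (images in flight = throughput × latency) applies, and assumes the single-U250 throughput-mode row is the per-card baseline for the 8-card result, with 30,000 images/sec taken as a rounded figure. The function and variable names are illustrative.

```python
# Single-U250 figures copied from the slide's table.
SINGLE_U250 = {
    "throughput": {"latency_ms": 1.82, "images_per_sec": 4127},
    "low_latency": {"latency_ms": 1.18, "images_per_sec": 3389},
}
NUM_FPGAS = 8
REPORTED_8_FPGA_IMAGES_PER_SEC = 30_000  # rounded figure quoted on the slide


def images_in_flight(latency_ms, images_per_sec):
    """Little's law: average concurrency = throughput x latency."""
    return images_per_sec * latency_ms / 1000.0


def scaling_efficiency(single_card_rate, card_count, measured_rate):
    """Measured multi-card rate as a fraction of perfect linear scaling."""
    return measured_rate / (single_card_rate * card_count)


for mode, row in SINGLE_U250.items():
    in_flight = images_in_flight(row["latency_ms"], row["images_per_sec"])
    print(f"{mode}: about {in_flight:.2f} images in flight")

efficiency = scaling_efficiency(
    SINGLE_U250["throughput"]["images_per_sec"],
    NUM_FPGAS,
    REPORTED_8_FPGA_IMAGES_PER_SEC,
)
print(f"8-card scaling efficiency: about {efficiency:.1%}")
```

Under these assumptions the low-latency mode keeps roughly one image in flight per kernel, while the throughput mode keeps closer to two, which is consistent with trading latency for throughput.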
Ready to use Algorithms – Evaluate Baseline Models
Face
• Object Detection, Landmarks, Recognition and Anti-spoofing
People
• Object Detection, Pose estimation, Re-identification
Video Analytics
• Object Detection, Multi-object tracking
• Attribute – Person, Car
• Text – Plate number
Segmentation
• Scene parsing, lane detection
Medical Imaging
• Cervical cancer classification, guide-wire detection, cell segmentation
Satellite Imaging
• Object detection, Accelerated pre and post processing
Model Compression – Enabling Next Level of Performance
Classification Networks
                       Baseline   Pruning Result 1            Pruning Result 2
Network                Top-5      Top-5    ΔTop5    ratio     Top-5    ΔTop5    ratio
Resnet50 [7.7G]        91.65%     91.23%   -0.42%   40%       90.79%   -0.86%   32%
Inception_v2 [4.0G]    91.07%     90.37%   -0.70%   60%       90.07%   -1.00%   55%
SqueezeNet [778M]      83.19%     82.46%   -0.73%   89%       81.57%   -1.62%   75%

Detection Networks
                       Baseline   Pruning Result 1            Pruning Result 2
Network                mAP        mAP      ΔmAP     ratio     mAP      ΔmAP     ratio
DetectNet [17.5G]      44.46      45.7     +1.24    63%       45.12    +0.66    50%
SSD+VGG [117G]         61.5       62.0     +0.5     16%       60.4     -1.1     10%
[A] SSD+VGG [173G]     57.1       58.7     +1.6     40%       56.6     -0.5     12%
[B] Yolov2 [198G]      80.4       81.9     +1.5     28%       79.2     -1.2     7%

Segmentation Networks
                       Baseline   Pruning Result 1            Pruning Result 2
Network                mIoU       mIoU     ΔmIoU    ratio     mIoU     ΔmIoU    ratio
FPN [163G]             65.69%     65.21%   -0.48%   80%       64.07%   -1.62%   60%
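The slide does not define the bracketed figures or the "ratio" column. The sketch below assumes the bracket is the baseline operation count per inference (in G, with 778M written as 0.778) and that "ratio" is the fraction of those operations remaining after pruning. That reading fits the data, since the smaller ratio in Result 2 always comes with the larger accuracy change, but it remains an assumption. The names are illustrative.

```python
# Assumed meaning: (baseline ops in G, ratio for Result 1, ratio for Result 2).
CLASSIFICATION = {
    "Resnet50": (7.7, 0.40, 0.32),
    "Inception_v2": (4.0, 0.60, 0.55),
    "SqueezeNet": (0.778, 0.89, 0.75),
}


def remaining_ops(baseline_gops, ratio):
    """Operations left after pruning, if 'ratio' is the fraction kept."""
    return baseline_gops * ratio


def implied_speedup(ratio):
    """Idealized compute reduction; real speedup depends on the hardware."""
    return 1.0 / ratio


for name, (gops, ratio_1, ratio_2) in CLASSIFICATION.items():
    for label, ratio in (("Result 1", ratio_1), ("Result 2", ratio_2)):
        print(
            f"{name} {label}: about {remaining_ops(gops, ratio):.2f}G ops, "
            f"up to {implied_speedup(ratio):.2f}x less compute"
        )
```

The same two helpers apply to the detection and segmentation rows; the speedup is an upper bound because memory traffic and layer shapes also affect runtime.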
Xilinx VERSAL – Breakthrough AI Inference Performance
https://www.xilinx.com/products/silicon-devices/acap/versal.html
Adaptable.
Intelligent.
