FPGA Hardware Accelerator for Machine Learning
-Swami
• The number of ML publications is growing exponentially, at a faster rate than Moore's law!
Machine Learning arXiv Papers per Year
• Moore's law: the number of transistors in a dense integrated circuit doubles about every two years.
“AI is the new electricity” – Andrew Ng
Artificial intelligence is everywhere!
Just as electricity revolutionized lives 100 years ago,
AI is changing our lives completely today.
Google, Netflix, face detection, predictive searches,
recommendations, maps, autonomous cars, to name
a few, all use some form of AI to make our lives
better.
https://data-mining.philippe-fournier-viger.com/too-many-machine-learning-papers/
Source: javatpoint
Accelerator
• An AI accelerator is a class of specialized hardware accelerator designed to accelerate artificial
intelligence and machine learning applications, including artificial neural networks and machine vision.
• Hardware acceleration is the use of computer hardware designed to perform specific functions
more efficiently when compared to software running on a general-purpose central processing
unit (CPU). Any transformation of data that can be calculated in software running on a generic CPU can
also be calculated in custom-made hardware, or in some mix of both.
When to Use Hardware Acceleration
• Computer graphics via Graphics Processing Unit (GPU)
• Digital signal processing via Digital Signal Processor
• Analog signal processing via Field-Programmable Analog Array
• Sound processing via sound card
• Computer networking via network processor and network interface controller
• Cryptography via cryptographic accelerator and secure cryptoprocessor
• Artificial Intelligence via AI accelerator
• In-memory processing via network on a chip and systolic array
• Any given computing task via Field-Programmable Gate Arrays (FPGA), Application-Specific Integrated Circuits
(ASICs), Complex Programmable Logic Devices (CPLD), and Systems-on-Chip (SoC)
Most common hardware accelerators:
• Graphics Processing Units (GPUs): originally designed for rendering images and video, GPUs are
now used for calculations involving massive amounts of data, accelerating portions of an
application while the rest continues to run on the CPU. The massive parallelism of modern GPUs
lets users process very large datasets (billions of records) far faster than on a CPU alone.
• Field Programmable Gate Arrays (FPGAs): a hardware description language (HDL)-specified
semiconductor integrated circuit designed to allow the user to configure a large majority of the
electrical functionality. FPGAs can be used to accelerate parts of an algorithm, sharing part of the
computation between the FPGA and a general-purpose processor.
• Application-Specific Integrated Circuits (ASICs): an integrated circuit customized specifically for a
particular purpose or application, improving overall speed as it focuses solely on performing its
one function. Maximum complexity in modern ASICs has grown to over 100 million logic gates.
Hardware Accelerator Objectives
• Reducing the number of times values are moved from sources that have a high energy cost, such
as DRAM or large on-chip buffers; and
• Reducing the cost of moving each value.
• Allocating work to as many processing elements (PEs) as possible so that they can operate in parallel; and
• Minimizing the number of idle cycles per PE by ensuring that there is sufficient memory bandwidth
to deliver the data that needs to be processed, that the data is delivered before it is needed, and
that workload imbalance among the parallel PEs is minimized.
Reuse: input feature map reuse, filter reuse, convolutional reuse
Accuracy, throughput, latency, power consumption, hardware cost, flexibility, and
scalability all need to be considered when choosing suitable hardware and models.
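The three kinds of reuse named above can be made concrete by counting how often each value is used in one convolutional layer. The following is a minimal sketch, assuming a stride-1 convolution without padding and a square kernel; the names `ConvShape`, `total_macs`, `num_weights`, and `reuse_counts` are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvShape:
    """Shape of one convolutional layer (assumed: stride 1, no padding, square kernel)."""
    batch: int          # number of input feature maps processed together
    in_channels: int    # channels per input feature map
    out_channels: int   # number of filters
    in_height: int
    in_width: int
    kernel: int         # kernel height == kernel width

    @property
    def out_height(self) -> int:
        return self.in_height - self.kernel + 1

    @property
    def out_width(self) -> int:
        return self.in_width - self.kernel + 1


def num_weights(s: ConvShape) -> int:
    """Total number of filter weights in the layer."""
    return s.out_channels * s.in_channels * s.kernel * s.kernel


def total_macs(s: ConvShape) -> int:
    """Multiply-accumulate operations needed for the whole batch."""
    return (s.batch * s.out_channels * s.in_channels
            * s.out_height * s.out_width * s.kernel * s.kernel)


def reuse_counts(s: ConvShape) -> dict:
    """How many times a value can be reused once it has been fetched."""
    return {
        # each weight slides over every output position of one feature map
        "convolutional_reuse_per_weight": s.out_height * s.out_width,
        # each input activation is needed by every filter
        "input_fmap_reuse_per_activation": s.out_channels,
        # each weight is applied to every image in the batch
        "filter_reuse_per_weight": s.batch,
    }
```

Multiplying the number of weights by the convolutional and filter reuse gives back the total MAC count, which is why keeping weights in local storage can avoid so many expensive memory accesses.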
CPU, GPU, FPGA and ASIC
adlinktech.com
The RegNet architecture for non-rigid registration of pulmonary CT follow-up scans
Deep Learning in Pulmonology
Computer-Assisted Decision Support System in Pulmonary Cancer detection and stage classification on CT images
Overview of decision support system.
Predicting pregnancy test results after embryo transfer by image feature extraction and analysis using Machine Learning
Automation of early-stage human embryo development detection – Deep Learning
Embryo image classification based on AlexNet and VGG16 architectures
Convolution Max/Average pooling
Zero Padding
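The figures for these slides are not part of the text, so the three operations named in the titles (convolution, max/average pooling, zero padding) are sketched here in plain Python. This is an illustrative version for small 2-D lists, assuming stride-1 convolution and non-overlapping pooling windows; the function names are hypothetical.

```python
def zero_pad(image, pad):
    """Surround a 2-D list with `pad` rows and columns of zeros."""
    width = len(image[0])
    blank = [0] * (width + 2 * pad)
    padded = [blank[:] for _ in range(pad)]
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)
    padded.extend(blank[:] for _ in range(pad))
    return padded


def conv2d(image, kernel, pad=0):
    """Stride-1 convolution as used in CNNs (no kernel flip), with optional zero padding."""
    x = zero_pad(image, pad)
    k = len(kernel)
    out_h = len(x) - k + 1
    out_w = len(x[0]) - k + 1
    return [
        [sum(x[i + a][j + b] * kernel[a][b] for a in range(k) for b in range(k))
         for j in range(out_w)]
        for i in range(out_h)
    ]


def pool2d(fmap, size=2, mode="max"):
    """Non-overlapping pooling: keep the max or the average of each size x size window."""
    def reduce(window):
        return max(window) if mode == "max" else sum(window) / len(window)

    return [
        [reduce([fmap[i + a][j + b] for a in range(size) for b in range(size)])
         for j in range(0, len(fmap[0]) - size + 1, size)]
        for i in range(0, len(fmap) - size + 1, size)
    ]
```

Without padding, a 3×3 kernel shrinks a 4×4 input to 2×2; with `pad=1` the output keeps the 4×4 input size, which is the usual reason for zero padding.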
Hardware blocks for Neuron
https://github.com/vipinkmenon/neuralNetwork
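As a rough software model of the hardware blocks a neuron needs (multiplier, accumulator, bias adder, activation unit), the sketch below computes one neuron in fixed-point arithmetic. It is an illustration only and does not reproduce the design in the linked repository; the Q-format with 8 fractional bits, the ReLU activation, and all names are assumptions.

```python
FRAC_BITS = 8  # assumed fixed-point format: 8 fractional bits


def to_fixed(x: float) -> int:
    """Convert a real number to its fixed-point integer representation."""
    return int(round(x * (1 << FRAC_BITS)))


def to_float(q: int) -> float:
    """Convert a fixed-point integer back to a real number."""
    return q / (1 << FRAC_BITS)


def neuron(inputs_q, weights_q, bias_q) -> int:
    """One neuron: multiply-accumulate, add bias, rescale, apply ReLU."""
    # products carry 2 * FRAC_BITS fractional bits, so align the bias first
    acc = bias_q << FRAC_BITS
    for x, w in zip(inputs_q, weights_q):
        acc += x * w                      # multiplier feeding the accumulator
    acc >>= FRAC_BITS                     # back to FRAC_BITS fractional bits
    return max(acc, 0)                    # ReLU activation
```

In hardware, the loop would typically map to one multiplier and one accumulator register reused over several clock cycles, or to several multipliers working in parallel.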
DNN Development Resources
• Frameworks
• Models
• Popular Datasets for Classification
Frameworks: open-source software libraries for DNNs
• Caffe (UC Berkeley, 2014): supports C, C++, Python, and MATLAB
• TensorFlow (Google, 2015): supports C++ and Python, runs on multiple CPUs and GPUs, and
has more flexibility than Caffe
• Torch (Facebook and NYU): supports C, C++, and Lua; PyTorch is its successor
and is built in Python
Models
Pre-trained Models
• LeNet-5 (1998)
• AlexNet (2012)
• VGG-16 (2014)
• Inception-v1 (2014)
• Inception-v3 (2015)
• ResNet-50 (2015)
• Xception (2016)
• Inception-v4 (2016)
Popular Datasets For Classification
• The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples.
http://yann.lecun.com/exdb/mnist/
• The most widely used subset of ImageNet is the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012-2017 image classification and
localization dataset. This dataset spans 1000 object classes and contains 1,281,167 training images, 50,000 validation images and 100,000 test
images. https://image-net.org/download.php
Hardware across the Machine Learning landscape
Edge Computing
• Edge computing processes data close to where it is generated, over a low-latency network, so that
results return to the request sender faster. In edge computing, users have direct access to and
control over the data processing.
• In cloud computing, by contrast, users send requests and let the cloud servers do the rest of the
work. The difference may be only milliseconds, but that matters for latency-sensitive applications.
https://www.winsystems.com
A typical setup
Development history of the neural network accelerator based on FPGA.
Field Programmable Gate Arrays (FPGAs) are semiconductor devices that are based around a matrix
of configurable logic blocks (CLBs) connected via programmable interconnects. FPGAs can be
reprogrammed after manufacturing to meet the desired application or functionality requirements.
Zynq 7000 Architecture
• Complete ARM®-based processing system: Application Processor Unit (APU) with dual ARM Cortex™-A9;
caches and support blocks; fully integrated memory controllers; I/O peripherals
• Tightly integrated programmable logic: used to extend the processing system; scalable density and
performance
• Flexible array of I/O: wide range of external multi-standard I/O; high-performance integrated serial
transceivers; analog-to-digital converter inputs
Parallelism on FPGA
• Data comes in at the camera's output rate (62 MHz)
• This is then split into 8 parallel processes, so the data rate is now 496 MOPS (million operations per second)
• This then passes through the different stages of the processing, which together perform 2480 MOPS
• The parallelism is then removed and the result is output to memory at a steady rate of 62 MHz
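The figures above follow from simple multiplication, which the short sketch below reproduces. The slide does not state how many processing stages there are; the value of five is inferred here from the ratio 2480 / 496, under the assumption that every stage performs one operation per lane per clock. Function and constant names are hypothetical.

```python
PIXEL_CLOCK_MHZ = 62      # camera output rate, from the slide
PARALLEL_LANES = 8        # number of parallel processes, from the slide
TOTAL_MOPS = 2480         # total processing rate, from the slide


def parallel_rate_mops(clock_mhz: int, lanes: int) -> int:
    """Operations per second (in millions) when every lane works each clock."""
    return clock_mhz * lanes


def implied_stages(total_mops: int, per_stage_mops: int) -> int:
    """Number of pipeline stages implied by the total rate (assumes equal stages)."""
    stages, remainder = divmod(total_mops, per_stage_mops)
    if remainder:
        raise ValueError("total rate is not a whole multiple of the per-stage rate")
    return stages


per_stage = parallel_rate_mops(PIXEL_CLOCK_MHZ, PARALLEL_LANES)   # 496 MOPS
stages = implied_stages(TOTAL_MOPS, per_stage)                    # 5 stages (inferred)
```

The point of the example is that the FPGA sustains the full 2480 MOPS while its input and output still run at the 62 MHz camera rate.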
A general architecture for the convolution layer (kernel size 3×3) with three different levels of parallelism
CNN Architecture
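The effect of unrolling a convolution layer along several dimensions can be estimated with a simple cycle count. The slide text does not say which three levels of parallelism the figure uses; the sketch assumes a common choice (across output channels, across input channels, and within the kernel window) and one multiply-accumulate per processing unit per cycle. All names and layer sizes are hypothetical.

```python
from math import ceil


def conv_cycles(out_channels, in_channels, out_height, out_width, kernel,
                p_out=1, p_in=1, p_kernel=1):
    """Estimated cycles for one convolution layer with three unroll factors.

    p_out    : output channels computed in parallel
    p_in     : input channels accumulated in parallel
    p_kernel : kernel positions multiplied in parallel (kernel * kernel = fully unrolled)
    """
    return (ceil(out_channels / p_out)
            * ceil(in_channels / p_in)
            * out_height * out_width
            * ceil(kernel * kernel / p_kernel))


# Hypothetical layer: 64 filters, 32 input channels, 56x56 outputs, 3x3 kernel
sequential = conv_cycles(64, 32, 56, 56, 3)
parallel = conv_cycles(64, 32, 56, 56, 3, p_out=8, p_in=4, p_kernel=9)
speedup = sequential / parallel
```

The estimate ignores memory bandwidth and pipeline fill time, so it is an upper bound on the benefit rather than a prediction for a real design.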
Xilinx® Vitis™ AI
• Xilinx® Vitis™ AI is a development stack for AI inference on
Xilinx hardware platforms, including both edge devices and
Alveo cards.
• It consists of optimized IP, tools, libraries, models, and
example designs. It is designed with high efficiency and ease
of use in mind, unleashing the full potential of AI acceleration
on Xilinx FPGA and ACAP.
Vitis AI is composed of the following key components:
• AI Model Zoo - A comprehensive set of pre-optimized models that are ready to deploy on Xilinx devices.
• AI Optimizer - An optional model optimizer that can prune a model by up to 90%. It is separately available with
commercial licenses.
• AI Quantizer - A powerful quantizer that supports model quantization, calibration, and fine tuning.
• AI Compiler - Compiles the quantized model to a highly efficient instruction set and data flow.
• AI Profiler - Performs an in-depth analysis of the efficiency and utilization of the AI inference implementation.
• AI Library - Offers high-level yet optimized C++ APIs for AI applications from edge to cloud.
• DPU - Efficient and scalable IP cores that can be customized to meet the needs of many different applications.
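To give an idea of what model quantization means in the component list above, here is a generic sketch of converting floating-point values to 8-bit integers with a power-of-two scale. It is not the Vitis AI quantizer's actual algorithm, which also involves calibration data and optional fine tuning; the function names and the choice of a symmetric power-of-two scheme are assumptions made for illustration.

```python
import math


def choose_frac_bits(values, bits=8):
    """Largest number of fractional bits that keeps every value in range."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return bits - 1
    qmax = 2 ** (bits - 1) - 1
    return math.floor(math.log2(qmax / max_abs))


def quantize(values, frac_bits, bits=8):
    """Scale by 2**frac_bits, round, and clip to the signed integer range."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    scale = 2.0 ** frac_bits
    return [min(hi, max(lo, round(v * scale))) for v in values]


def dequantize(q_values, frac_bits):
    """Map the integers back to approximate real values."""
    return [q / 2.0 ** frac_bits for q in q_values]


weights = [0.5, -0.25, 0.75, 1.0]
frac_bits = choose_frac_bits(weights)       # 6 fractional bits fit the range [-1, 1]
q_weights = quantize(weights, frac_bits)    # small integers that fit in 8 bits
```

A power-of-two scale is convenient in hardware because rescaling becomes a bit shift instead of a multiplication.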
Vitis AI Workflow
• Model development - train models or get models from the Vitis AI Model Zoo, then use the Vitis AI optimizer (optional), quantizer and
compiler to convert float models into DPU instruction files
• HW development - use the Vitis tools to integrate the DPU IP and other kernels with the platform and generate board boot files
• SW development - implement model deployment code using VART or the Vitis AI Library, finish application-level SW
development, and generate an executable that runs on the board.
System Requirements
Component: Requirement
FPGA: Alveo U50, U50LV, U200, U250, U280 cards; Zynq UltraScale+ MPSoC ZCU102 and ZCU104 boards; Versal ACAP VCK190 and VCK5000 boards; KV260
Motherboard: PCI Express 3.0-compliant x16 with one or dual slot
System Power Supply: 225W
Operating System: Ubuntu 16.04, 18.04, 20.04; CentOS 7.6, 7.7, 7.8, 8.1; RHEL 7.6, 7.7, 7.8, 8.1
CPU: Intel i3/i5/i7/i9/Xeon 64-bit CPU; AMD EPYC 7F52 64-bit CPU
GPU (optional, to accelerate quantization): NVIDIA GPU supporting CUDA 9.0 or higher, such as NVIDIA P100, V100
CUDA Driver (optional, to accelerate quantization): driver compatible with the CUDA version; NVIDIA-384 or higher for CUDA 9.0, NVIDIA-410 or higher for CUDA 10.0
Docker Version: 19.03 or higher
Kria KV260 Vision AI Starter Kit
Deep Learning Processor Unit (DPU)
Zynq UltraScale+ MPSoC ZCU104 Evaluation Kit
Like poor before rich they yearn: For knowledge: the low never learn
Learning
உடையார்முன் இல்லார்போல் ஏக்கற்றுங் கற்றார்
கடையரே கல்லா தவர்.
Meaning: Those who stand humble and longing before the learned, as the poor stand before the rich,
and so acquire learning are the noble; the unlearned are the lowly.
References
• Vivienne Sze; Yu-Hsin Chen; Tien-Ju Yang; Joel S. Emer, Efficient Processing of Deep Neural
Networks , Morgan & Claypool, 2020.
• Xilinx.com
• https://github.com/Xilinx/Vitis-Tutorials
• https://github.com/datascienceid/deep-learning-resources
• https://github.com/josephmisiti/awesome-machine-learning
• https://github.com/ujjwalkarn/Machine-Learning-Tutorials
• https://github.com/datascienceid/machine-learning-resources
• Random pictures from Google.com
Thank U

 
Indian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.pptIndian Dairy Industry Present Status and.ppt
Indian Dairy Industry Present Status and.ppt
 

FPGA Hardware Accelerator for Machine Learning

  • 1. FPGA Hardware Accelerator for Machine Learning - Swami
  • 2. Machine Learning arXiv Papers per Year
      • The number of ML publications is growing exponentially, at a faster rate than Moore's law.
      • Moore's law: the number of transistors in a dense integrated circuit doubles about every two years.
      • "AI is the new electricity" – Andrew Ng
      • Artificial intelligence is everywhere. Just as electricity revolutionized lives 100 years ago, AI is changing our lives completely today. Google, Netflix, face detection, predictive searches, recommendations, maps, and autonomous cars, to name a few, all use some form of AI to make our lives better.
      • https://data-mining.philippe-fournier-viger.com/too-many-machine-learning-papers/
  • 4. Accelerator
      • An AI accelerator is a class of specialized hardware accelerator designed to accelerate artificial intelligence and machine learning applications, including artificial neural networks and machine vision.
      • Hardware acceleration is the use of computer hardware designed to perform specific functions more efficiently than software running on a general-purpose central processing unit (CPU). Any transformation of data that can be calculated in software running on a generic CPU can also be calculated in custom-made hardware, or in some mix of both.
  • 5. When to Use Hardware Acceleration
      • Computer graphics via Graphics Processing Unit (GPU)
      • Digital signal processing via Digital Signal Processor
      • Analog signal processing via Field-Programmable Analog Array
      • Sound processing via sound card
      • Computer networking via network processor and network interface controller
      • Cryptography via cryptographic accelerator and secure cryptoprocessor
      • Artificial Intelligence via AI accelerator
      • In-memory processing via network on a chip and systolic array
      • Any given computing task via Field-Programmable Gate Arrays (FPGA), Application-Specific Integrated Circuits (ASICs), Complex Programmable Logic Devices (CPLD), and Systems-on-Chip (SoC)
  • 6. Most common hardware accelerators
      • Graphics Processing Units (GPUs): originally designed for handling moving images, GPUs are now used for calculations involving massive amounts of data, accelerating portions of an application while the rest continues to run on the CPU. The massive parallelism of modern GPUs lets users process very large numbers of records quickly.
      • Field Programmable Gate Arrays (FPGAs): a semiconductor integrated circuit, specified in a hardware description language (HDL), that is designed to let the user configure a large majority of its electrical functionality. FPGAs can be used to accelerate parts of an algorithm, sharing the computation between the FPGA and a general-purpose processor.
      • Application-Specific Integrated Circuits (ASICs): an integrated circuit customized for a particular purpose or application, improving overall speed because it focuses solely on performing its one function. Maximum complexity in modern ASICs has grown to over 100 million logic gates.
  • 7. Hardware accelerator objectives
      • Reduce the number of times values are moved from sources that have a high energy cost, such as DRAM or large on-chip buffers.
      • Reduce the cost of moving each value.
      • Allocate work to as many processing elements (PEs) as possible so that they can operate in parallel.
      • Minimize the number of idle cycles per PE by ensuring that there is sufficient memory bandwidth to deliver the data that needs to be processed, that the data is delivered before it is needed, and that workload imbalance among the parallel PEs is minimized.
      • Reuse: input feature map reuse, filter reuse, convolutional reuse.
      • Consider accuracy, throughput, latency, power consumption, hardware cost, flexibility, and scalability when choosing suitable hardware and models.
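
To make the data-movement objective concrete, here is a minimal counting sketch for a 1-D convolution. It is an illustration only: the function name `conv1d_access_counts` and the simple two-policy model (every operand fetched from DRAM versus every value fetched once and then reused from a local buffer) are assumptions, not something taken from the slides.

```python
def conv1d_access_counts(input_len: int, kernel_len: int) -> dict:
    """Count DRAM reads for a 1-D convolution under two reuse policies.

    Hypothetical model: one multiply-accumulate (MAC) needs one weight
    and one input value.
    """
    out_len = input_len - kernel_len + 1
    macs = out_len * kernel_len

    # No reuse: every MAC fetches its weight and its input from DRAM.
    dram_reads_no_reuse = 2 * macs

    # Full reuse: each weight and each input value is fetched from DRAM
    # once, then served from a small local buffer for every later MAC.
    dram_reads_full_reuse = kernel_len + input_len

    return {
        "macs": macs,
        "dram_reads_no_reuse": dram_reads_no_reuse,
        "dram_reads_full_reuse": dram_reads_full_reuse,
    }


counts = conv1d_access_counts(input_len=1024, kernel_len=3)
print(counts)
```

Under this model the number of MACs is unchanged; only the number of expensive memory reads drops, which is the point of filter and input reuse.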
  • 8. CPU, GPU, FPGA and ASIC
  • 13. Deep Learning in Pulmonology: the RegNet architecture for non-rigid registration of pulmonary CT follow-up scans
  • 14. Computer-Assisted Decision Support System in Pulmonary Cancer detection and stage classification on CT images (overview of the decision support system)
  • 15. Predicting pregnancy test results after embryo transfer by image feature extraction and analysis using Machine Learning
  • 16. Automation of early-stage human embryo development detection with Deep Learning: embryo image classification based on AlexNet and VGG16 architectures
  • 19. Hardware blocks for a neuron: https://github.com/vipinkmenon/neuralNetwork
  • 20. DNN Development Resources
      • Frameworks
      • Models
      • Popular Datasets for Classification
  • 21. Frameworks: open-source software libraries for DNNs
      • Caffe (UC Berkeley, 2014): supports C, C++, Python, and MATLAB
      • TensorFlow (Google, 2015): supports C++ and Python, runs on multiple CPUs and GPUs, and has more flexibility than Caffe
      • Torch (Facebook and NYU): supports C, C++, and Lua; PyTorch is its successor and is built in Python
  • 23. Pre-Trained Models
      • LeNet-5 (1998)
      • AlexNet (2012)
      • VGG-16 (2014)
      • Inception-v1 (2014)
      • Inception-v3 (2015)
      • ResNet-50 (2015)
      • Xception (2016)
      • Inception-v4 (2016)
  • 24. Popular Datasets for Classification
      • MNIST: a database of handwritten digits with a training set of 60,000 examples and a test set of 10,000 examples. http://yann.lecun.com/exdb/mnist/
      • ImageNet: the most widely used subset is the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012-2017 image classification and localization dataset. It spans 1000 object classes and contains 1,281,167 training images, 50,000 validation images, and 100,000 test images. https://image-net.org/download.php
  • 25. Hardware across the Machine Learning landscape
  • 26. Edge Computing
      • Edge computing uses a low-latency network to process data and return results faster to the request sender. In edge computing, users have direct access to and control over the data processing.
      • In cloud computing, by contrast, users send requests and let the cloud servers do the rest of the work. The difference may be only milliseconds, but that matters for latency-sensitive applications.
  • 29. Development history of the neural network accelerator based on FPGA.
  • 30. Field Programmable Gate Arrays (FPGAs) are semiconductor devices based around a matrix of configurable logic blocks (CLBs) connected via programmable interconnects. FPGAs can be reprogrammed after manufacturing to meet the desired application or functionality requirements.
  • 31. Zynq 7000 Architecture
      • Tightly integrated programmable logic: used to extend the processing system; scalable density and performance
      • Complete ARM®-based processing system: Application Processor Unit (APU) with dual ARM Cortex™-A9, caches and support blocks; fully integrated memory controllers; I/O peripherals
      • Flexible array of I/O: wide range of external multi-standard I/O; high-performance integrated serial transceivers; analog-to-digital converter inputs
  • 32. Parallelism on FPGA
      • Data comes in at the camera's output rate (62 MHz).
      • It is then split into 8 parallel processes, so the data rate becomes 496 MOPS (million operations per second).
      • The data then passes through the different processing stages, for a total of 2480 MOPS.
      • The parallelism is then removed and the result is written to memory at a steady rate of 62 MHz.
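
The throughput figures on this slide can be reproduced with a few lines of arithmetic. The 62 MHz input rate and the 8 parallel processes come from the slide; the number of processing stages is not stated, so the value 5 below is an assumption inferred from 2480 / 496.

```python
CAMERA_RATE_MHZ = 62   # input pixel rate, from the slide
PARALLEL_LANES = 8     # number of parallel processes, from the slide
PIPELINE_STAGES = 5    # ASSUMPTION: inferred from 2480 / 496, not stated on the slide

# Each lane handles one operation per clock, so 8 lanes give 8x the rate.
per_stage_mops = CAMERA_RATE_MHZ * PARALLEL_LANES

# Every pipeline stage works on different data at the same time,
# so the stages' operation counts add up.
total_mops = per_stage_mops * PIPELINE_STAGES

print(per_stage_mops, total_mops)
```

The output rate returns to 62 MHz because the parallel lanes are merged back into a single stream before the data is written to memory.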
  • 34. CNN Architecture: a general architecture for the convolution layer (kernel size 3×3) with three different levels of parallelism
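
The slide does not name the three levels of parallelism, so the sketch below marks one common decomposition as an assumption: across output feature maps, across input channels, and across the multiply-accumulates inside one k×k kernel window. It is a plain software reference for a direct convolution, with a hypothetical function name `conv_layer`, not the architecture from the figure.

```python
def conv_layer(inputs, weights):
    """Direct convolution (no padding, stride 1).

    inputs[c][y][x]       : input feature maps
    weights[m][c][ky][kx] : one k x k kernel per (output map, input channel)
    returns out[m][y][x]
    """
    n_out = len(weights)
    n_in = len(inputs)
    k = len(weights[0][0])
    height = len(inputs[0]) - k + 1
    width = len(inputs[0][0]) - k + 1
    out = [[[0.0] * width for _ in range(height)] for _ in range(n_out)]

    for m in range(n_out):          # level 1 (assumed): output feature maps
        for c in range(n_in):       # level 2 (assumed): input channels
            for y in range(height):
                for x in range(width):
                    acc = 0.0
                    # level 3 (assumed): the k*k MACs of one kernel window,
                    # which hardware can compute in the same cycle
                    for ky in range(k):
                        for kx in range(k):
                            acc += inputs[c][y + ky][x + kx] * weights[m][c][ky][kx]
                    out[m][y][x] += acc
    return out


# Two 4x4 input channels of ones, two filters of 3x3 ones per channel.
ones_in = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]
ones_w = [[[[1.0] * 3 for _ in range(3)] for _ in range(2)] for _ in range(2)]
result = conv_layer(ones_in, ones_w)
print(result)
```

On an FPGA, each of the marked loops could be unrolled into parallel hardware, trading logic resources for throughput.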
  • 35. Xilinx® Vitis™ AI
      • Xilinx® Vitis™ AI is a development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.
      • It consists of optimized IP, tools, libraries, models, and example designs. It is designed with high efficiency and ease of use in mind, unleashing the full potential of AI acceleration on Xilinx FPGA and ACAP.
  • 36. Vitis AI is composed of the following key components:
      • AI Model Zoo - A comprehensive set of pre-optimized models that are ready to deploy on Xilinx devices.
      • AI Optimizer - An optional model optimizer that can prune a model by up to 90%. It is available separately under a commercial license.
      • AI Quantizer - A powerful quantizer that supports model quantization, calibration, and fine-tuning.
      • AI Compiler - Compiles the quantized model to a highly efficient instruction set and data flow.
      • AI Profiler - Performs an in-depth analysis of the efficiency and utilization of an AI inference implementation.
      • AI Library - Offers high-level yet optimized C++ APIs for AI applications from edge to cloud.
      • DPU - Efficient and scalable IP cores that can be customized to meet the needs of many different applications.
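
As a rough illustration of what a quantization step does, the sketch below maps floating-point values to 8-bit integers with a single symmetric scale factor. This is a generic textbook scheme under stated assumptions; it is not the Vitis AI quantizer's algorithm or API, and the function names are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization to the range [-127, 127].

    Generic illustration only; NOT the Vitis AI quantizer.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integers back to approximate floating-point values."""
    return [q * scale for q in quantized]


weights = [-1.0, 0.0, 0.25, 1.0]
q_weights, q_scale = quantize_int8(weights)
restored = dequantize(q_weights, q_scale)
print(q_weights, q_scale, restored)
```

The restored values differ from the originals by at most half a quantization step, which is the accuracy cost that calibration and fine-tuning try to keep small.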
  • 37. Vitis AI Workflow
      • Model development - train models or get models from the Vitis AI Model Zoo, then use the Vitis AI optimizer (optional), quantizer, and compiler to convert float models into DPU instruction files.
      • HW development - use the Vitis tool to integrate the DPU IP and other kernels with the platform and generate board boot files.
      • SW development - implement the model deployment code using VART or the Vitis AI Library, finish application-level SW development, and generate an executable that runs on the board.
  • 38. System Requirements (Component: Requirement)
      • FPGA: Alveo U50, U50LV, U200, U250, U280 cards; Zynq UltraScale+ MPSoC ZCU102 and ZCU104 boards; Versal ACAP VCK190 and VCK5000 boards; KV260
      • Motherboard: PCI Express 3.0-compliant x16 with one or dual slot
      • System Power Supply: 225W
      • Operating System: Ubuntu 16.04, 18.04, 20.04; CentOS 7.6, 7.7, 7.8, 8.1; RHEL 7.6, 7.7, 7.8, 8.1
      • CPU: Intel i3/i5/i7/i9/Xeon 64-bit CPU; AMD EPYC 7F52 64-bit CPU
      • GPU (optional, to accelerate quantization): NVIDIA GPU that supports CUDA 9.0 or higher, such as NVIDIA P100, V100
      • CUDA Driver (optional, to accelerate quantization): driver compatible with the CUDA version; NVIDIA-384 or higher for CUDA 9.0, NVIDIA-410 or higher for CUDA 10.0
      • Docker Version: 19.03 or higher
  • 40. Kria KV260 Vision AI Starter Kit
  • 41. Deep Learning Processor Unit (DPU)
  • 42. Zynq UltraScale+ MPSoC ZCU104 Evaluation Kit
  • 43. Learning: "Like poor before rich they yearn: For knowledge: the low never learn." Meaning: those who learn, even while standing humble and yearning (before the learned) as the poor stand before the rich, are the superior ones; the unlearned are the inferior.
  • 44. References
      • Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, Joel S. Emer, Efficient Processing of Deep Neural Networks, Morgan & Claypool, 2020.
      • Xilinx.com
      • https://github.com/Xilinx/Vitis-Tutorials
      • https://github.com/datascienceid/deep-learning-resources
      • https://github.com/josephmisiti/awesome-machine-learning
      • https://github.com/ujjwalkarn/Machine-Learning-Tutorials
      • https://github.com/datascienceid/machine-learning-resources
      • Random pictures from Google.com