© 2017, Amazon Web Services, Inc. or its affiliates. All rights reserved.
FPGAs in the cloud?
Julien Simon, Principal Evangelist, AI/ML
@julsimon
Velocity Conference, NYC, 04/10/2017
Agenda
• The case for non-CPU architectures
• What is an FPGA?
• Using FPGAs on AWS
• Demo: running an FPGA image on AWS
• FPGAs and Deep Learning
• Resources
The case for non-CPU architectures
Source: Intel
Powering AWS instances: Intel Xeon E7 v4
• 7.1 billion transistors
– 456 mm2 (0.7 square inch)
• General-purpose architecture
– SISD with SIMD extension (AVX instruction set)
• Best single-core performance
• Low parallelism
– 24 cores, 48 hyperthreads
– Multi-threaded applications are hard to build
– OS and libraries need to be thread-friendly
• Thermal envelope: 168W
https://ark.intel.com/products/96900/Intel-Xeon-Processor-E7-8894-v4-60M-Cache-2_40-GHz
Case study: Clemson University
1.1 million vCPUs for Natural Language Processing
Optimized cost thanks to Spot Instances
https://aws.amazon.com/blogs/aws/natural-language-processing-at-clemson-university-1-1-million-vcpus-ec2-spot-instances/
Moore’s winter is (probably) coming
• « I guess I see Moore’s Law dying here in the next decade or so, but
that’s not surprising », Gordon Moore, 2015
• Technology limits: a Skylake transistor is around 100 atoms across
• New workloads require higher parallelism to achieve good
performance
– Genomics
– Financial computing
– Image and video processing
– Deep Learning
• The age of the GPU has come
http://www.economist.com/technology-quarterly/2016-03-12/after-moores-law
https://spectrum.ieee.org/computing/hardware/gordon-moore-the-man-whose-name-means-progress
State of the art GPU: Nvidia V100
• 21.1 billion transistors
– 815 mm2 (1.26 square inch)
• Architecture optimized for floating point
– SIMT (Single Instruction, Multiple Threads)
• Massive parallelism
– 5120 CUDA cores, 640 Tensor cores
– CUDA programming model
– Large, high-bandwidth off-chip memory (DRAM)
• Thermal envelope: 250W
https://www.nvidia.com/en-us/data-center/tesla-v100/
https://devblogs.nvidia.com/parallelforall/inside-volta/
GPUs are not optimal for some applications
• Power consumption and efficiency (TOPS/Watt)
• Strict latency requirements
• Other requirements
– Custom data types, irregular parallelism, divergence
• Building your own ASIC may solve this, but:
– It’s a huge, costly and risky effort
– ASICs can’t be reconfigured
• Time for an FPGA renaissance?
What’s an FPGA?
The FPGA
• First commercial product by Xilinx in 1985
• Field Programmable Gate Array
• Not a CPU (although you could build one with it)
• « Lego » hardware: logic cells, lookup tables, DSP, I/O
• Small amount of very fast on-chip memory
• Build custom logic to accelerate your SW application
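To make the lookup-table idea concrete, here is a minimal software sketch of a k-input LUT: a truth table addressed by the input bits. The `LookupTable` class is a hypothetical helper written for this illustration and is not part of any FPGA toolchain. It suggests why the same cell can act as different gates: only the table contents change.

```python
from itertools import product


class LookupTable:
    """Software model of a k-input LUT (hypothetical helper, illustration only)."""

    def __init__(self, num_inputs, func):
        self.num_inputs = num_inputs
        # Store one output bit per input combination, as a hardware LUT would.
        self.table = [
            int(bool(func(*bits)))
            for bits in product((0, 1), repeat=num_inputs)
        ]

    def __call__(self, *bits):
        # The input bits form the address into the table (first input is the MSB).
        address = 0
        for bit in bits:
            address = (address << 1) | (bit & 1)
        return self.table[address]


# The same kind of cell configured as two different gates.
xor2 = LookupTable(2, lambda a, b: a ^ b)
and2 = LookupTable(2, lambda a, b: a & b)

# A 1-bit full adder built from two 3-input LUTs (sum and carry-out).
sum_lut = LookupTable(3, lambda a, b, cin: a ^ b ^ cin)
carry_lut = LookupTable(3, lambda a, b, cin: (a & b) | (cin & (a ^ b)))

print("XOR table:", xor2.table)
print("1 + 1 + 0 ->", "sum", sum_lut(1, 1, 0), "carry", carry_lut(1, 1, 0))
```

Real devices wire many such cells together through programmable routing; this sketch models only the cell itself.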
FPGA architecture
Sources:
https://www.embedded-vision.com/industry-analysis/technical-articles/fpgas-deep-learning-based-vision-processing
http://www.bober-optosensorik.de/fpga-entwicklung.html
Developing FPGA applications
• Languages
– VHDL, Verilog
– OpenCL (C++)
• Software tools
– Design
– Simulation
– Synthesis
– Routing
• Hardware tools
– Evaluation boards
– Prototypes
Expensive and hard to scale
Using FPGAs on AWS
Amazon EC2 F1 Instances
• Up to 8 Xilinx UltraScale Plus VU9P FPGAs
• Each FPGA includes
– Local 64 GB DDR4 ECC-protected memory
– Dedicated PCIe x16 connections
– Up to 400 Gbps bidirectional ring connection for high-speed streaming
– Approximately 2.5 million logic elements and approximately 6,800 DSP engines
The FPGA Developer Amazon Machine Image (AMI)
• Xilinx SDx 2017.1
– Free license for F1 FPGA development
– Supports VHDL, Verilog, OpenCL
• AWS FPGA SDK
– Amazon FPGA Image (AFI) Management Tools
– Linux drivers
– Command line
• AWS FPGA HDK
– Design files and scripts required to build an AFI
– Shell: platform logic to handle external peripherals, PCIe, DRAM, and interrupts
• Run simulation, design, etc. on a C4 to save money!
https://aws.amazon.com/marketplace/pp/B06VVYBLZZ
https://github.com/aws/aws-fpga
FPGA Acceleration Using F1 instances
[Diagram labels: Amazon Machine Image (AMI), Amazon FPGA Image (AFI), AWS Marketplace, F1 instance, CPU, PCIe, FPGA Link, DDR controllers, DDR4 attached memory]
Case study: Edico Genome
Highly Efficient
• Algorithms Implemented in Hardware
• Gate-Level Circuit Design
• No Instruction Set Overhead
Massively Parallel
• Massively Parallel Circuits
• Multiple Compute Engines
• Rapid FPGA Reconfigurability
Speeds Analysis of Whole Human Genomes from Hours to Minutes
Unprecedented Low Cost for Compute and Compressed Storage
http://www.edicogenome.com/
https://aws.amazon.com/marketplace/pp/B075JR57J1
Case study: NGCodec
• Provider of UHD video compression technology
• Up to 50x faster vs. software H.265
• Higher quality video than x265 ‘veryslow’ preset
– Same bit rate
– 60+ frames per second
• Lower latency between live stream and end viewing
• Optimized cost
https://ngcodec.com/markets-cloud-transcoding/
https://aws.amazon.com/marketplace/pp/B074W1FPKR
Demo: OpenCL on F1 instance
Building the OpenCL application
git clone https://github.com/aws/aws-fpga.git
cd aws-fpga
source sdk_setup.sh
source hdk_setup.sh
source sdaccel_setup.sh
source $XILINX_SDX/settings64.sh
cd $SDACCEL_DIR/examples/xilinx/getting_started/host/helloworld_ocl/
make clean
make check TARGETS=sw_emu DEVICES=$AWS_PLATFORM all
make check TARGETS=hw_emu DEVICES=$AWS_PLATFORM all
make check TARGETS=hw DEVICES=$AWS_PLATFORM all
Creating Vivado project and starting FPGA synthesis
…
INFO: [XOCC 60-586] Created xclbin/vector_addition.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0.xclbin
Total elapsed time: 2h 31m 7s
$SDACCEL_DIR/tools/create_sdaccel_afi.sh \
  -xclbin=xclbin/vector_addition.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0.xclbin \
  -o=vector_addition.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0 \
  -s3_bucket=jsimon-fpga -s3_logs_key=logs -s3_dcp_key=dcp
…
Generated manifest file '17_10_02-163912_manifest.txt'
upload: ./17_10_02-163912_Developer_SDAccel_Kernel.tar to s3://jsimon-fpga/dcp/17_10_02-163912_Developer_SDAccel_Kernel.tar
17_10_02-163912_agfi_id.txt
Building the AFI
aws ec2 describe-fpga-images --fpga-image-id afi-056fb17ddb8cedf37
{ "FpgaImages": [{
"UpdateTime": "2017-10-02T16:39:17.000Z",
"Name": "xclbin/vector_addition.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0.xclbin",
"FpgaImageGlobalId": "agfi-03a8031774fc4773f",
"Public": false,
"State": { "Code": "pending"},
"OwnerId": "6XXXXXXXXXXX",
"FpgaImageId": "afi-056fb17ddb8cedf37",
"CreateTime": "2017-10-02T16:39:17.000Z",
"Description": "xclbin/vector_addition.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0.xclbin" }]
}
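The AFI cannot be loaded until its state leaves `pending`, so the describe call is usually repeated until it does. The following is a minimal polling sketch, not the workflow used in the talk. The function names `afi_state` and `wait_for_afi` and the retry parameters are assumptions made for illustration. The slides show `ready` as the final state, while the EC2 API reference lists `available`, so the sketch accepts either.

```python
import json
import time


def afi_state(describe_output):
    """Return the state code from `aws ec2 describe-fpga-images` JSON output."""
    images = json.loads(describe_output)["FpgaImages"]
    return images[0]["State"]["Code"]


def wait_for_afi(fetch_output, done_states=("available", "ready"),
                 delay_seconds=60, max_attempts=60, sleep=time.sleep):
    """Poll until the AFI leaves 'pending' and return the final state.

    fetch_output is any callable returning the JSON text, for example a
    wrapper around the AWS CLI (an assumption of this sketch).
    """
    for _ in range(max_attempts):
        state = afi_state(fetch_output())
        if state in done_states:
            return state
        if state != "pending":
            raise RuntimeError(f"AFI build ended in state {state!r}")
        sleep(delay_seconds)
    raise TimeoutError("AFI still pending after the maximum number of attempts")


# Offline demonstration with canned responses instead of real CLI calls.
canned = iter([
    '{"FpgaImages": [{"State": {"Code": "pending"}}]}',
    '{"FpgaImages": [{"State": {"Code": "ready"}}]}',
])
print(wait_for_afi(lambda: next(canned), sleep=lambda seconds: None))
```

Against a real account, `fetch_output` could wrap `subprocess.run(["aws", "ec2", "describe-fpga-images", "--fpga-image-id", afi_id], capture_output=True, text=True).stdout`, where `afi_id` is the identifier returned when the AFI was created.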
Loading the AFI and running the OpenCL application
aws ec2 describe-fpga-images --fpga-image-id afi-056fb17ddb8cedf37
{ "FpgaImages": [{
"UpdateTime": "2017-10-02T16:39:17.000Z",
"Name": "xclbin/vector_addition.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0.xclbin",
"FpgaImageGlobalId": "agfi-03a8031774fc4773f",
"Public": false,
"State": { "Code": "ready"},
"OwnerId": "6XXXXXXXXXXX",
"FpgaImageId": "afi-056fb17ddb8cedf37",
"CreateTime": "2017-10-02T16:39:17.000Z",
"Description": "xclbin/vector_addition.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0.xclbin" }]
}
sudo fpga-load-local-image -S 0 -I agfi-03a8031774fc4773f
sudo fpga-describe-local-image -S 0
sudo sh
source /opt/Xilinx/SDx/2017.1.rte/setup.sh
./helloworld
sudo fpga-clear-local-image -S 0
FPGAs and Deep Learning
A chink in the GPU armor?
• GPUs are great for training, but what about inference?
• Throughput and latency: pick one?
– Using batches increases latency
– Using single samples degrades throughput
• Power and memory requirements
– Floating-point operations are power-hungry
– Floating-point weights need more DRAM, which is power-hungry too
• Neural networks can be implemented on FPGA
© HBO
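The throughput-versus-latency tension can be sketched with a toy cost model. Everything here is an assumption chosen for illustration: the function name `batch_stats`, the linear cost model (a fixed per-batch overhead plus a per-sample cost), evenly spaced arrivals, and the numbers themselves. None of it is measured on real hardware.

```python
def batch_stats(batch_size, arrival_rate, overhead_s, per_sample_s):
    """Return (worst-case latency in s, peak throughput in samples/s).

    Assumed cost model (hypothetical): processing a batch takes
    overhead_s + per_sample_s * batch_size seconds, and requests arrive
    evenly at arrival_rate per second.
    """
    compute_s = overhead_s + per_sample_s * batch_size
    # The first request in a batch waits for the remaining ones to arrive.
    fill_wait_s = (batch_size - 1) / arrival_rate
    latency_s = fill_wait_s + compute_s
    throughput = batch_size / compute_s
    return latency_s, throughput


# Made-up numbers: 1000 requests/s, 5 ms fixed overhead, 0.1 ms per sample.
for batch_size in (1, 8, 64):
    latency_s, throughput = batch_stats(batch_size, 1000.0, 0.005, 0.0001)
    print(f"batch={batch_size:3d}  latency={latency_s * 1000:6.1f} ms  "
          f"throughput={throughput:8.0f} samples/s")
```

Under this model, larger batches amortize the fixed overhead and raise throughput, but the first sample in each batch waits longer for its result.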
Using custom logic to Multiply and Accumulate
Source: « FPGA Implementations of Neural Networks », Springer, 2006
Smaller weights → fewer gates, less data to load into the FPGA
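A multiply-accumulate (MAC) unit computes a running sum of weight-input products, which is the core operation of a neuron. Below is a minimal integer sketch. The helper names `mac`, `bits_needed` and `partial_product_bits` are hypothetical, and counting partial-product bits is only a rough proxy for multiplier size, not a synthesis result.

```python
def mac(weights, inputs, acc=0):
    """Multiply-accumulate: add w * x to the accumulator for each pair."""
    for w, x in zip(weights, inputs):
        acc += w * x
    return acc


def fits(value, width):
    """True if value is representable as a signed two's-complement number."""
    return -(1 << (width - 1)) <= value <= (1 << (width - 1)) - 1


def bits_needed(values):
    """Smallest signed width that can hold every value."""
    width = 1
    while not all(fits(v, width) for v in values):
        width += 1
    return width


def partial_product_bits(weight_bits, input_bits):
    """Rough size proxy: an array multiplier forms one partial-product bit
    for every (weight bit, input bit) pair."""
    return weight_bits * input_bits


weights = [3, -2, 1, 0]
inputs = [10, 20, 30, 40]

print("MAC result:", mac(weights, inputs))
print("weight width:", bits_needed(weights), "bits")
print("8-bit inputs, 32-bit weights:", partial_product_bits(32, 8))
print("8-bit inputs,  4-bit weights:", partial_product_bits(4, 8))
```

Narrower weights shrink each multiplier, so more MAC units fit in the same FPGA fabric.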
Optimizing Deep Learning models for FPGAs
• Quantization: using integer weights
– 8/4/2-bit integers instead of 32-bit floats
– Reduces power consumption
– Simplifies the logic needed to implement the model
– Reduces memory usage
• Pruning: removing useless connections
– Increases computation speed
– Reduces memory usage
• Compression: encoding weights
– Reduces model size
On-chip SRAM becomes a viable option
→ More power-efficient than DRAM
→ Faster than off-chip DRAM
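Pruning and quantization can be sketched in a few lines. This is a simplified illustration, not the trained quantization of [Han, 2016]: it assumes symmetric linear quantization and a fixed magnitude threshold, and the function names and sample weights are made up for the example.

```python
def prune(weights, threshold):
    """Magnitude pruning: zero out weights whose absolute value is below threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def quantize(weights, bits=8):
    """Symmetric linear quantization of floats to signed integers.

    Returns (integer weights, scale) such that weight ~= integer * scale.
    """
    qmax = (1 << (bits - 1)) - 1  # 127 for 8 bits
    largest = max(abs(w) for w in weights)
    scale = largest / qmax if largest else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(q_weights, scale):
    """Map integer weights back to approximate float values."""
    return [q * scale for q in q_weights]


weights = [0.82, -0.41, 0.03, -0.007, 0.27, -1.0]  # made-up sample values

pruned = prune(weights, threshold=0.05)
q_weights, scale = quantize(pruned, bits=8)
restored = dequantize(q_weights, scale)

print("quantized:", q_weights)
print("max error:", max(abs(a - b) for a, b in zip(pruned, restored)))
print(f"storage: {len(weights) * 32} bits as float32, "
      f"{len(weights) * 8} bits as int8")
```

The pruned zeros only save memory when the weights are stored in a sparse or compressed form, which is where the encoding step comes in.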
Published results
[Han, 2016] Optimizing CNNs on CPU and GPU
• AlexNet 35x smaller, VGG-16 49x smaller
• 3x to 4x speedup, 3x to 7x more energy-efficient
• No loss of accuracy
[Han, 2017] Optimizing LSTM on Xilinx FPGA
• FPGA vs CPU: 43x faster, 40x more energy-efficient
• FPGA vs GPU: 3x faster, 11.5x more energy-efficient
[Nurvitadhi, 2017] Optimizing CNNs on Intel FPGA
• FPGA vs GPU: 60% faster, 2.3x more energy-efficient
• <1% loss of accuracy
Nvidia Hardware for Deep Learning
• Open architecture for DL inference accelerators on IoT devices
– Convolution Core – optimized high-performance convolution engine
– Single Data Processor – single-point lookup engine for activation functions
– Planar Data Processor – planar averaging engine for pooling
– Channel Data Processor – multi-channel averaging engine for normalization functions
– Dedicated Memory and Data Reshape Engines – memory-to-memory transformation acceleration for tensor reshape and copy operations
• Verilog model + test suite
• F1 instances are supported
http://nvdla.org/
https://github.com/nvdla/
Conclusion
• CPU, GPU, FPGA: the battle rages on
• As always, pick the right tool for the job
– Application requirements: performance, power, cost, etc.
– Time to market
– Skills
– The AWS marketplace: the solution may be just a few clicks away!
• AWS offers you many options,
please explore them and give us feedback
Resources
https://aws.amazon.com/ec2/instance-types/f1
https://aws.amazon.com/ec2/instance-types/f1/partners/
https://github.com/aws/aws-fpga
[Han, 2016] « Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding » https://arxiv.org/abs/1510.00149
[Han, 2017] « ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA », Best Paper at FPGA’17 https://arxiv.org/abs/1612.00694
« Deep Learning Tutorial and Recent Trends », FPGA’17 http://isfpga.org/slides/D1_S1_Tutorial.pdf
[Nurvitadhi, 2017] « Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks? », FPGA’17 http://jaewoong.org/pubs/fpga17-next-generation-dnns.pdf
Thank you!
http://aws.amazon.com/evangelists/julien-simon
@julsimon

Weitere ähnliche Inhalte

Was ist angesagt?

(CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC
(CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC(CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC
(CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPCAmazon Web Services
 
10 tips to improve the performance of your AWS application
10 tips to improve the performance of your AWS application10 tips to improve the performance of your AWS application
10 tips to improve the performance of your AWS applicationAmazon Web Services
 
Build, train, and deploy Machine Learning models at scale (May 2018)
Build, train, and deploy Machine Learning models at scale (May 2018)Build, train, and deploy Machine Learning models at scale (May 2018)
Build, train, and deploy Machine Learning models at scale (May 2018)Julien SIMON
 
Machine Learning inference at the Edge
Machine Learning inference at the EdgeMachine Learning inference at the Edge
Machine Learning inference at the EdgeJulien SIMON
 
High Performance Computing with AWS
High Performance Computing with AWSHigh Performance Computing with AWS
High Performance Computing with AWSAmazon Web Services
 
Deep Learning with AWS (November 2016)
Deep Learning with AWS (November 2016)Deep Learning with AWS (November 2016)
Deep Learning with AWS (November 2016)Julien SIMON
 
High Performance Computing (HPC) in cloud
High Performance Computing (HPC) in cloudHigh Performance Computing (HPC) in cloud
High Performance Computing (HPC) in cloudAccubits Technologies
 
Best Practices for Genomic and Bioinformatics Analysis Pipelines on AWS
Best Practices for Genomic and Bioinformatics Analysis Pipelines on AWS Best Practices for Genomic and Bioinformatics Analysis Pipelines on AWS
Best Practices for Genomic and Bioinformatics Analysis Pipelines on AWS Amazon Web Services
 
HPC on Azure for Reserach
HPC on Azure for ReserachHPC on Azure for Reserach
HPC on Azure for ReserachJĂźrgen Ambrosi
 
Deep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceDeep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceAmazon Web Services
 
Deep Dive on Amazon EC2 Instances (March 2017)
Deep Dive on Amazon EC2 Instances (March 2017)Deep Dive on Amazon EC2 Instances (March 2017)
Deep Dive on Amazon EC2 Instances (March 2017)Julien SIMON
 
(PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014
(PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014(PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014
(PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014Amazon Web Services
 
Machine Learning Inference at the Edge
Machine Learning Inference at the EdgeMachine Learning Inference at the Edge
Machine Learning Inference at the EdgeJulien SIMON
 
Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017
Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017
Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017Amazon Web Services
 
Deep Learning for Developers
Deep Learning for DevelopersDeep Learning for Developers
Deep Learning for DevelopersAmazon Web Services
 
AWS meetup「Apache Spark on EMR」
AWS meetup「Apache Spark on EMR」AWS meetup「Apache Spark on EMR」
AWS meetup「Apache Spark on EMR」SmartNews, Inc.
 
deep learning in production cff 2017
deep learning in production cff 2017deep learning in production cff 2017
deep learning in production cff 2017Ari Kamlani
 
Deep Learning for Developers (October 2017)
Deep Learning for Developers (October 2017)Deep Learning for Developers (October 2017)
Deep Learning for Developers (October 2017)Julien SIMON
 

Was ist angesagt? (20)

(CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC
(CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC(CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC
(CMP303) ResearchCloud: CfnCluster and Internet2 for Enterprise HPC
 
10 tips to improve the performance of your AWS application
10 tips to improve the performance of your AWS application10 tips to improve the performance of your AWS application
10 tips to improve the performance of your AWS application
 
Build, train, and deploy Machine Learning models at scale (May 2018)
Build, train, and deploy Machine Learning models at scale (May 2018)Build, train, and deploy Machine Learning models at scale (May 2018)
Build, train, and deploy Machine Learning models at scale (May 2018)
 
Machine Learning inference at the Edge
Machine Learning inference at the EdgeMachine Learning inference at the Edge
Machine Learning inference at the Edge
 
High Performance Computing with AWS
High Performance Computing with AWSHigh Performance Computing with AWS
High Performance Computing with AWS
 
Deep Learning with AWS (November 2016)
Deep Learning with AWS (November 2016)Deep Learning with AWS (November 2016)
Deep Learning with AWS (November 2016)
 
Deep Dive on Amazon EC2
Deep Dive on Amazon EC2Deep Dive on Amazon EC2
Deep Dive on Amazon EC2
 
High Performance Computing (HPC) in cloud
High Performance Computing (HPC) in cloudHigh Performance Computing (HPC) in cloud
High Performance Computing (HPC) in cloud
 
Best Practices for Genomic and Bioinformatics Analysis Pipelines on AWS
Best Practices for Genomic and Bioinformatics Analysis Pipelines on AWS Best Practices for Genomic and Bioinformatics Analysis Pipelines on AWS
Best Practices for Genomic and Bioinformatics Analysis Pipelines on AWS
 
HPC on Azure for Reserach
HPC on Azure for ReserachHPC on Azure for Reserach
HPC on Azure for Reserach
 
Deep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceDeep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance Performance
 
Deep Dive on Amazon EC2 Instances (March 2017)
Deep Dive on Amazon EC2 Instances (March 2017)Deep Dive on Amazon EC2 Instances (March 2017)
Deep Dive on Amazon EC2 Instances (March 2017)
 
(PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014
(PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014(PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014
(PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014
 
Machine Learning Inference at the Edge
Machine Learning Inference at the EdgeMachine Learning Inference at the Edge
Machine Learning Inference at the Edge
 
Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017
Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017
Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017
 
Deep Learning for Developers
Deep Learning for DevelopersDeep Learning for Developers
Deep Learning for Developers
 
AWS meetup「Apache Spark on EMR」
AWS meetup「Apache Spark on EMR」AWS meetup「Apache Spark on EMR」
AWS meetup「Apache Spark on EMR」
 
Deep Dive on Amazon EC2
Deep Dive on Amazon EC2Deep Dive on Amazon EC2
Deep Dive on Amazon EC2
 
deep learning in production cff 2017
deep learning in production cff 2017deep learning in production cff 2017
deep learning in production cff 2017
 
Deep Learning for Developers (October 2017)
Deep Learning for Developers (October 2017)Deep Learning for Developers (October 2017)
Deep Learning for Developers (October 2017)
 

Ähnlich wie FPGAs in the cloud? (October 2017)

Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...Cesar Maciel
 
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-PremiseTackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-PremiseDatabricks
 
AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...
AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...
AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...Amazon Web Services
 
Making the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGAMaking the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGAFacultad de InformĂĄtica UCM
 
Deep learning with FPGA
Deep learning with FPGADeep learning with FPGA
Deep learning with FPGAAyush Singh, MS
 
QCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitectureQCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitectureCeph Community
 
QCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitectureQCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitecturePatrick McGarry
 
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」PC Cluster Consortium
 
High Performance Computing in AWS, Immersion Day Huntsville 2019
High Performance Computing in AWS, Immersion Day Huntsville 2019High Performance Computing in AWS, Immersion Day Huntsville 2019
High Performance Computing in AWS, Immersion Day Huntsville 2019Amazon Web Services
 
Choose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUsChoose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUsDatabricks
 
SCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
SCFE 2020 OpenCAPI presentation as part of OpenPWOER TutorialSCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
SCFE 2020 OpenCAPI presentation as part of OpenPWOER TutorialGanesan Narayanasamy
 
Introduction to HPC & Supercomputing in AI
Introduction to HPC & Supercomputing in AIIntroduction to HPC & Supercomputing in AI
Introduction to HPC & Supercomputing in AITyrone Systems
 
FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning Dr. Swaminathan Kathirvel
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDKKernel TLV
 
ODSA Sub-Project Launch
 ODSA Sub-Project Launch ODSA Sub-Project Launch
ODSA Sub-Project LaunchNetronome
 
ODSA Sub-Project Launch
ODSA Sub-Project LaunchODSA Sub-Project Launch
ODSA Sub-Project LaunchODSA Workgroup
 
OpenCAPI next generation accelerator
OpenCAPI next generation accelerator OpenCAPI next generation accelerator
OpenCAPI next generation accelerator Ganesan Narayanasamy
 
Ceph Day Beijing - Ceph all-flash array design based on NUMA architecture
Ceph Day Beijing - Ceph all-flash array design based on NUMA architectureCeph Day Beijing - Ceph all-flash array design based on NUMA architecture
Ceph Day Beijing - Ceph all-flash array design based on NUMA architectureCeph Community
 
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureCeph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureDanielle Womboldt
 

Ähnlich wie FPGAs in the cloud? (October 2017) (20)

Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
 
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-PremiseTackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
 
AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...
AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...
AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...
 
Making the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGAMaking the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGA
 
SDAccel Design Contest: SDAccel and F1 Instances
SDAccel Design Contest: SDAccel and F1 InstancesSDAccel Design Contest: SDAccel and F1 Instances
SDAccel Design Contest: SDAccel and F1 Instances
 
Deep learning with FPGA
Deep learning with FPGADeep learning with FPGA
Deep learning with FPGA
 
QCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitectureQCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference Architecture
 
QCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitectureQCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference Architecture
 
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」
 
High Performance Computing in AWS, Immersion Day Huntsville 2019
High Performance Computing in AWS, Immersion Day Huntsville 2019High Performance Computing in AWS, Immersion Day Huntsville 2019
High Performance Computing in AWS, Immersion Day Huntsville 2019
 
Choose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUsChoose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUs
 
SCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
SCFE 2020 OpenCAPI presentation as part of OpenPWOER TutorialSCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
SCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
 
Introduction to HPC & Supercomputing in AI
Introduction to HPC & Supercomputing in AIIntroduction to HPC & Supercomputing in AI
Introduction to HPC & Supercomputing in AI
 
FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDK
 
ODSA Sub-Project Launch
 ODSA Sub-Project Launch ODSA Sub-Project Launch
ODSA Sub-Project Launch
 
ODSA Sub-Project Launch
ODSA Sub-Project LaunchODSA Sub-Project Launch
ODSA Sub-Project Launch
 
OpenCAPI next generation accelerator
OpenCAPI next generation accelerator OpenCAPI next generation accelerator
OpenCAPI next generation accelerator
 
Ceph Day Beijing - Ceph all-flash array design based on NUMA architecture
Ceph Day Beijing - Ceph all-flash array design based on NUMA architectureCeph Day Beijing - Ceph all-flash array design based on NUMA architecture
Ceph Day Beijing - Ceph all-flash array design based on NUMA architecture
 
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureCeph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
 

Mehr von Julien SIMON

An introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceAn introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceJulien SIMON
 
Reinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face TransformersReinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face TransformersJulien SIMON
 
Building NLP applications with Transformers
Building NLP applications with TransformersBuilding NLP applications with Transformers
Building NLP applications with TransformersJulien SIMON
 
Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)Julien SIMON
 
Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)Starting your AI/ML project right (May 2020)
Starting your AI/ML project right (May 2020)Julien SIMON
 
Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)Scale Machine Learning from zero to millions of users (April 2020)
Scale Machine Learning from zero to millions of users (April 2020)Julien SIMON
 
An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)An Introduction to Generative Adversarial Networks (April 2020)
An Introduction to Generative Adversarial Networks (April 2020)Julien SIMON
 
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...
AIM410R1 Deep learning applications with TensorFlow, featuring Fannie Mae (De...Julien SIMON
 
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)
AIM361 Optimizing machine learning models with Amazon SageMaker (December 2019)Julien SIMON
 
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...
AIM410R Deep Learning Applications with TensorFlow, featuring Mobileye (Decem...Julien SIMON
 
A pragmatic introduction to natural language processing models (October 2019)

FPGAs in the cloud? (October 2017)

  • 1. ©2017, Amazon Web Services, Inc. or its affiliates. All rights reserved FPGAs in the cloud? Julien Simon, Principal Evangelist, AI/ML @julsimon Velocity Conference, NYC, 04/10/2017
  • 2. Agenda • The case for non-CPU architectures • What is an FPGA? • Using FPGAs on AWS • Demo: running an FPGA image on AWS • FPGAs and Deep Learning • Resources
  • 3. The case for non-CPU architectures
  • 5. Powering AWS instances: Intel Xeon E7 v4 • 7.1 billion transistors – 456 mm² (0.7 square inch) • General-purpose architecture – SISD with SIMD extension (AVX instruction set) • Best single-core performance • Low parallelism – 24 cores, 48 hyperthreads – Multi-threaded applications are hard to build – OS and libraries need to be thread-friendly • Thermal envelope: 168W • https://ark.intel.com/products/96900/Intel-Xeon-Processor-E7-8894-v4-60M-Cache-2_40-GHz
  • 6. Case study: Clemson University • 1.1 million vCPUs for Natural Language Processing • Optimized cost thanks to Spot Instances https://aws.amazon.com/blogs/aws/natural-language-processing-at-clemson-university-1-1-million-vcpus-ec2-spot-instances/
  • 7. Moore’s winter is (probably) coming • « I guess I see Moore’s Law dying here in the next decade or so, but that’s not surprising », Gordon Moore, 2015 • Technology limits: a Skylake transistor is around 100 atoms across • New workloads require higher parallelism to achieve good performance – Genomics – Financial computing – Image and video processing – Deep Learning • The age of the GPU has come http://www.economist.com/technology-quarterly/2016-03-12/after-moores-law https://spectrum.ieee.org/computing/hardware/gordon-moore-the-man-whose-name-means-progress
  • 8. State-of-the-art GPU: Nvidia V100 • 21.1 billion transistors – 815 mm² (1.26 square inches) • Architecture optimized for floating point – SIMT (Single Instruction, Multiple Threads) • Massive parallelism – 5120 CUDA cores, 640 Tensor cores – CUDA programming model – Large, high-bandwidth off-chip memory (DRAM) • Thermal envelope: 250W https://www.nvidia.com/en-us/data-center/tesla-v100/ https://devblogs.nvidia.com/parallelforall/inside-volta/
  • 9. GPUs are not optimal for some applications • Power consumption and efficiency (TOPS/Watt) • Strict latency requirements • Other requirements – Custom data types, irregular parallelism, divergence • Building your own ASIC may solve this, but: – It’s a huge, costly and risky effort – ASICs can’t be reconfigured • Time for an FPGA renaissance?
  • 11. The FPGA • First commercial product by Xilinx in 1985 • Field Programmable Gate Array • Not a CPU (although you could build one with it) • « Lego » hardware: logic cells, lookup tables, DSP, I/O • Small amount of very fast on-chip memory • Build custom logic to accelerate your SW application
  • 13. Developing FPGA applications • Languages – VHDL, Verilog – OpenCL (C++) • Software tools – Design – Simulation – Synthesis – Routing • Hardware tools – Evaluation boards – Prototypes Expensive and hard to scale
  • 15. Amazon EC2 F1 Instances • Up to 8 Xilinx UltraScale Plus VU9P FPGAs • Each FPGA includes • Local 64 GB DDR4 ECC protected memory • Dedicated PCIe x16 connections • Up to 400Gbps bidirectional ring connection for high-speed streaming • Approximately 2.5 million logic elements, and approximately 6,800 DSP engines
  • 16. The FPGA Developer Amazon Machine Image (AMI) • Xilinx SDx 2017.1 – Free license for F1 FPGA development – Supports VHDL, Verilog, OpenCL • AWS FPGA SDK – Amazon FPGA Image (AFI) Management Tools – Linux drivers – Command line • AWS FPGA HDK – Design files and scripts required to build an AFI – Shell: platform logic to handle external peripherals, PCIe, DRAM, and interrupts • Run simulation, design, etc. on a C4 to save money! https://aws.amazon.com/marketplace/pp/B06VVYBLZZ https://github.com/aws/aws-fpga
  • 17. FPGA Acceleration Using F1 instances [Diagram labels: AWS Marketplace, Amazon Machine Image (AMI), Amazon FPGA Image (AFI), F1 Instance, CPU, PCIe, FPGA Link, DDR Controllers, DDR-4 Attached Memory]
  • 18. Case study: Edico Genome Highly Efficient • Algorithms Implemented in Hardware • Gate-Level Circuit Design • No Instruction Set Overhead Massively Parallel • Massively Parallel Circuits • Multiple Compute Engines • Rapid FPGA Reconfigurability Speeds Analysis of Whole Human Genomes from Hours to Minutes Unprecedented Low Cost for Compute and Compressed Storage http://www.edicogenome.com/ https://aws.amazon.com/marketplace/pp/B075JR57J1
  • 19. Case study: NGCodec • Provider of UHD video compression technology • Up to 50x faster vs. software H.265 • Higher quality video than x265 ‘veryslow’ preset – Same bit rate – 60+ frames per second • Lower latency between live stream and end viewing • Optimized cost https://ngcodec.com/markets-cloud-transcoding/ https://aws.amazon.com/marketplace/pp/B074W1FPKR
  • 20. Demo: OpenCL on F1 instance
  • 21. Building the OpenCL application
      git clone https://github.com/aws/aws-fpga.git
      cd aws-fpga
      source sdk_setup.sh
      source hdk_setup.sh
      source sdaccel_setup.sh
      source $XILINX_SDX/settings64.sh
      cd $SDACCEL_DIR/examples/xilinx/getting_started/host/helloworld_ocl/
      make clean
      make check TARGETS=sw_emu DEVICES=$AWS_PLATFORM all
      make check TARGETS=hw_emu DEVICES=$AWS_PLATFORM all
      make check TARGETS=hw DEVICES=$AWS_PLATFORM all
      Creating Vivado project and starting FPGA synthesis
      …
      INFO: [XOCC 60-586] Created xclbin/vector_addition.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0.xclbin
      Total elapsed time: 2h 31m 7s
      $SDACCEL_DIR/tools/create_sdaccel_afi.sh -xclbin=xclbin/vector_addition.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0.xclbin -o=vector_addition.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0 -s3_bucket=jsimon-fpga -s3_logs_key=logs -s3_dcp_key=dcp
      …
      Generated manifest file '17_10_02-163912_manifest.txt'
      upload: ./17_10_02-163912_Developer_SDAccel_Kernel.tar to s3://jsimon-fpga/dcp/17_10_02-163912_Developer_SDAccel_Kernel.tar
      17_10_02-163912_agfi_id.txt
  • 22. Building the AFI
      aws ec2 describe-fpga-images --fpga-image-id afi-056fb17ddb8cedf37
      {
        "FpgaImages": [{
          "UpdateTime": "2017-10-02T16:39:17.000Z",
          "Name": "xclbin/vector_addition.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0.xclbin",
          "FpgaImageGlobalId": "agfi-03a8031774fc4773f",
          "Public": false,
          "State": { "Code": "pending" },
          "OwnerId": "6XXXXXXXXXXX",
          "FpgaImageId": "afi-056fb17ddb8cedf37",
          "CreateTime": "2017-10-02T16:39:17.000Z",
          "Description": "xclbin/vector_addition.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0.xclbin"
        }]
      }
  • 23. Loading the AFI and running the OpenCL application
      aws ec2 describe-fpga-images --fpga-image-id afi-056fb17ddb8cedf37
      {
        "FpgaImages": [{
          "UpdateTime": "2017-10-02T16:39:17.000Z",
          "Name": "xclbin/vector_addition.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0.xclbin",
          "FpgaImageGlobalId": "agfi-03a8031774fc4773f",
          "Public": false,
          "State": { "Code": "ready" },
          "OwnerId": "6XXXXXXXXXXX",
          "FpgaImageId": "afi-056fb17ddb8cedf37",
          "CreateTime": "2017-10-02T16:39:17.000Z",
          "Description": "xclbin/vector_addition.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0.xclbin"
        }]
      }
      sudo fpga-load-local-image -S 0 -I agfi-03a8031774fc4773f
      sudo fpga-describe-local-image -S 0
      sudo sh
      source /opt/Xilinx/SDx/2017.1.rte/setup.sh
      ./helloworld
      sudo fpga-clear-local-image -S 0
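Slides 22 and 23 show the same `describe-fpga-images` call returning first `pending`, then `ready`; the AFI can only be loaded once it reaches `ready`. As a rough sketch of how a script might check this, the Python below parses the JSON printed by the CLI. The helper names `afi_states` and `is_ready` are hypothetical, and the sample document is trimmed to the three fields the check needs; how the JSON is obtained (for example by capturing the CLI output) is left out.

```python
import json

# Trimmed sample in the shape shown on slide 22 (IDs copied from the slide).
SAMPLE = """
{"FpgaImages": [{"FpgaImageId": "afi-056fb17ddb8cedf37",
                 "FpgaImageGlobalId": "agfi-03a8031774fc4773f",
                 "State": {"Code": "pending"}}]}
"""


def afi_states(describe_output):
    """Map each AFI ID to a (global AGFI ID, state code) pair."""
    doc = json.loads(describe_output)
    return {
        img["FpgaImageId"]: (img["FpgaImageGlobalId"], img["State"]["Code"])
        for img in doc.get("FpgaImages", [])
    }


def is_ready(describe_output, afi_id):
    """True only if the given AFI is listed and its state code is 'ready'."""
    states = afi_states(describe_output)
    return afi_id in states and states[afi_id][1] == "ready"


if __name__ == "__main__":
    print(afi_states(SAMPLE))
    print(is_ready(SAMPLE, "afi-056fb17ddb8cedf37"))
```

Note that `fpga-load-local-image` takes the global `agfi-…` identifier, not the `afi-…` one, which is why the sketch keeps both.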
  • 24. FPGAs and Deep Learning
  • 25. A chink in the GPU armor? • GPUs are great for training, but what about inference? • Throughput and latency: pick one? – Using batches increases latency – Using single samples degrades throughput • Power and memory requirements – Floating-point operations are power-hungry – Floating-point weights need more DRAM, which is power-hungry too • Neural networks can be implemented on FPGA © HBO
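The throughput/latency trade-off on slide 25 can be illustrated with a toy cost model: each inference call pays a fixed overhead plus a per-sample cost. The function `batch_metrics` and its default timings are invented for illustration only; they are not measurements of any GPU or FPGA.

```python
def batch_metrics(batch_size, overhead_ms=2.0, per_sample_ms=0.5):
    """Return (latency in ms, throughput in samples/s) for one batched call.

    Assumed model: latency = fixed overhead + per-sample cost * batch size.
    Every sample in the batch waits for the whole batch to finish.
    """
    latency_ms = overhead_ms + per_sample_ms * batch_size
    throughput = batch_size / latency_ms * 1000.0
    return latency_ms, throughput


if __name__ == "__main__":
    for size in (1, 8, 64):
        latency, throughput = batch_metrics(size)
        print(f"batch={size:3d}  latency={latency:6.1f} ms  "
              f"throughput={throughput:8.1f} samples/s")
```

Under this model, larger batches amortize the fixed overhead, so throughput rises, but every sample also waits longer, so latency rises with it.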
  • 26. Using custom logic to Multiply and Accumulate Source: « FPGA Implementations of Neural Networks », Springer, 2006 Smaller weights → fewer gates, less data to load into the FPGA
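A small software sketch of the multiply-accumulate operation from slide 26, with a rough look at why smaller weights need less hardware. The helper `accumulator_bits` is a hypothetical illustration: it computes the width of a signed accumulator that can never overflow, used here only as a crude proxy for logic cost. It is not taken from the cited book.

```python
def mac(inputs, weights):
    """Multiply-accumulate: the sum of input * weight over all pairs."""
    acc = 0
    for x, w in zip(inputs, weights):
        acc += x * w
    return acc


def accumulator_bits(n_terms, input_bits, weight_bits):
    """Signed accumulator width that cannot overflow in the worst case.

    The largest product of two signed values is
    (-2**(input_bits-1)) * (-2**(weight_bits-1)) = 2**(input_bits+weight_bits-2),
    and n_terms of them are summed. One extra bit holds the sign.
    """
    worst_case = n_terms * 2 ** (input_bits + weight_bits - 2)
    return worst_case.bit_length() + 1


if __name__ == "__main__":
    print(mac([1, 2, 3], [4, -5, 6]))   # 4 - 10 + 18 = 12
    for weight_bits in (8, 4, 2):
        bits = accumulator_bits(256, 8, weight_bits)
        print(f"{weight_bits}-bit weights: {bits}-bit accumulator")
```

For 256 terms with 8-bit inputs, going from 8-bit to 2-bit weights shrinks the worst-case accumulator from 24 to 18 bits, and the multipliers shrink as well.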
  • 27. Optimizing Deep Learning models for FPGAs • Quantization: using integer weights – 8/4/2-bit integers instead of 32-bit floats – Reduces power consumption – Simplifies the logic needed to implement the model – Reduces memory usage • Pruning: removing useless connections – Increases computation speed – Reduces memory usage • Compression: encoding weights – Reduces model size • On-chip SRAM becomes a viable option → More power-efficient than DRAM → Faster than off-chip DRAM
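To make the quantization and pruning bullets on slide 27 concrete, here is a minimal sketch in plain Python. It assumes symmetric linear quantization and a simple magnitude threshold for pruning; the published methods cited in this deck (for example trained quantization with retraining after pruning) are more elaborate, so treat this as an illustration of the idea rather than their algorithm. All function names are hypothetical.

```python
def quantize(weights, bits=8):
    """Symmetric linear quantization: floats -> signed integers plus a scale."""
    qmax = 2 ** (bits - 1) - 1                  # e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integers back to approximate float weights."""
    return [q * scale for q in quantized]


def prune(weights, threshold):
    """Magnitude pruning: zero out weights whose absolute value is small."""
    return [0.0 if abs(w) < threshold else w for w in weights]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.02, 0.75]
    pruned = prune(weights, threshold=0.1)
    for bits in (8, 4, 2):
        quantized, scale = quantize(pruned, bits)
        print(bits, quantized, dequantize(quantized, scale))
```

Each weight now takes `bits` bits instead of 32, and the reconstruction error per weight is at most half the scale, which grows as the bit width shrinks.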
  • 28. Published results [Han, 2016] Optimizing CNNs on CPU and GPU • AlexNet 35x smaller, VGG-16 49x smaller • 3x to 4x speedup, 3x to 7x more energy-efficient • No loss of accuracy [Han, 2017] Optimizing LSTM on Xilinx FPGA • FPGA vs CPU: 43x faster, 40x more energy-efficient • FPGA vs GPU: 3x faster, 11.5x more energy-efficient [Nurvitadhi, 2017] Optimizing CNNs on Intel FPGA • FPGA vs GPU: 60% faster, 2.3x more energy-efficient • <1% loss of accuracy
  • 29. Nvidia Hardware for Deep Learning • Open architecture for DL inference accelerators on IoT devices – Convolution Core – optimized high-performance convolution engine – Single Data Processor – single-point lookup engine for activation functions – Planar Data Processor – planar averaging engine for pooling – Channel Data Processor – multi-channel averaging engine for normalization functions – Dedicated Memory and Data Reshape Engines – memory-to-memory transformation acceleration for tensor reshape and copy operations. • Verilog model + test suite • F1 instances are supported http://nvdla.org/ https://github.com/nvdla/
  • 30. Conclusion • CPU, GPU, FPGA: the battle rages on • As always, pick the right tool for the job – Application requirements: performance, power, cost, etc. – Time to market – Skills – The AWS marketplace: the solution may be just a few clicks away! • AWS offers you many options, please explore them and give us feedback
  • 31. Resources https://aws.amazon.com/ec2/instance-types/f1 https://aws.amazon.com/ec2/instance-types/f1/partners/ https://github.com/aws/aws-fpga [Han, 2016] « Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding » https://arxiv.org/abs/1510.00149 [Han, 2017] « ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA », Best Paper at FPGA’17 https://arxiv.org/abs/1612.00694 « Deep Learning Tutorial and Recent Trends », FPGA’17 http://isfpga.org/slides/D1_S1_Tutorial.pdf [Nurvitadhi, 2017] « Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks? », FPGA’17 http://jaewoong.org/pubs/fpga17-next-generation-dnns.pdf

Editor's notes

  1. OK
  2. OK
  3. OK
  4. Intel 4004: November 15, 1971, 4-bit architecture, 0.74 MHz, 2,300 transistors, 10 µm
  5. XXX is this Skylake ?
  6. OK
  7. OK
  8. Amazon EC2 instances: F1 family; FPGA Developer AMI; AWS SDK and HDK
  9. Available in North Virginia, Oregon and Ireland regions
  10. An F1 instance can have any number of AFIs. An AFI can be loaded into the FPGA in less than 1 second.
  11. Edico Genome ported its genomics platform (DRAGEN) to F1, enabling real-time genomic analysis while saving cost and dramatically scaling its availability. This offering has the potential to be transformative for hospitals, academic institutions, drug developers and sequencing centers, as it enables them to analyze whole-genome data in under an hour, up to a 10x improvement over comparable state-of-the-art algorithms, both on-premises and in the cloud.
  12. Off-the-shelf images for customers; revenue stream for FPGA developers