Yuval Degani, LinkedIn
Dr. Jithin Jose, Microsoft Azure
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
#UnifiedAnalytics #SparkAISummit
Intro
• An endless cycle of removing performance roadblocks
• With faster storage devices (DRAM, NVMe, SSD) and stronger-than-ever processing power (CPU, GPU, ASIC), a traditional network just can’t keep up with the I/O flow
• Upgrading to higher wire speeds will rarely do the trick
• This is where co-designed hardware acceleration can be used to truly utilize the power of a compute cluster
Previous talks
Spark Summit Europe 2017
First open-source stand-alone RDMA-accelerated shuffle plugin for Spark (SparkRDMA)
Spark+AI Summit North America 2018
First preview of SparkRDMA on Azure HPC nodes, demonstrating a 2.6x job speed-up on cloud VMs
Network Bottlenecks in the Wild
• Not always caused by lack of bandwidth
• Network I/O imposes overhead in many system components:
– Memory management
– Memory copy
– Garbage Collection
– Serialization/Compression/Encryption
• Overhead = CPU cycles that are not available for the actual job at hand
• Hardware acceleration can reduce overhead and allow better utilization of compute and network resources
Network Bottlenecks: Shuffle
• Most expensive non-storage network I/O in compute clusters
• Blocking, massive movement of transient data
• Acceleration opportunities:
– Efficient serving with reduced server-side logic
– Serialization/Compression/Encryption
– Reduce I/O overhead and latency by employing modern transport protocols
[Figure: HiBench TeraSort on Spark breakdown: Shuffle Read 57%, Output 28%, Input 11%, Partitioning 4%]
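A quick Amdahl's-law check on the chart above. It assumes (our reading) that the slices are shares of job runtime and that the Shuffle Read slice is entirely network-bound. In practice part of shuffle read is CPU work such as deserialization, so the numbers this prints are optimistic upper bounds, not measured results.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of the runtime is accelerated by `factor`.

    Amdahl's law: S = 1 / ((1 - f) + f / k). Passing factor=float("inf")
    gives the upper bound where the accelerated part takes zero time.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Assumption: the 57% Shuffle Read slice is a share of job runtime and is
# entirely network-bound, so only that slice benefits from acceleration.
SHUFFLE_READ_SHARE = 0.57

for k in (2.0, 4.0, 10.0, float("inf")):
    speedup = amdahl_speedup(SHUFFLE_READ_SHARE, k)
    print(f"shuffle read {k:g}x faster -> job {speedup:.2f}x faster")
```

Even an infinitely fast shuffle read caps this job at roughly 1 / 0.43, or about 2.3x. That is why the acceleration opportunities listed above also target CPU-side overheads, not only the wire.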
Network Bottlenecks: Distributed Training
• Model updates create massive network traffic
• Model update frequency rises as GPUs get faster
• Acceleration opportunities:
– Inter-GPU RDMA communication
– Lower-latency network transport
– Collectives offloads
[Figure: ResNet 269* training on K80, M60 and V100 GPUs: total time vs. GPU active time]
* “Parameter Hub: High Performance Parameter Servers for Efficient Distributed Deep Neural Network Training” by Luo et al.
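A back-of-envelope sketch of why faster GPUs shift the bottleneck to the network. All inputs are hypothetical (about 100 MB of gradients per update over a 25 Gbps link). The model also ignores compute/communication overlap and the collective algorithm. It is only meant to show the trend: the communication share grows as per-iteration compute time shrinks.

```python
def comm_share(compute_s: float, grad_bytes: float, link_gbps: float) -> float:
    """Fraction of one training iteration spent exchanging gradients.

    Simplifications: the exchange is a single transfer of `grad_bytes` over
    one link at full line rate, and it does not overlap with compute.
    """
    comm_s = grad_bytes * 8 / (link_gbps * 1e9)
    return comm_s / (compute_s + comm_s)


GRAD_BYTES = 100e6  # hypothetical ~100 MB of gradients per update
LINK_GBPS = 25.0    # hypothetical 25 Gbps link

# Each halving of compute time stands in for a faster GPU generation.
for compute_s in (0.8, 0.4, 0.2, 0.1):
    share = comm_share(compute_s, GRAD_BYTES, LINK_GBPS)
    print(f"compute {compute_s:.1f} s/iter -> network {share:.0%} of the iteration")
```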
Network Bottlenecks: Storage
• Massive data movement
• Premium devices (DRAM, Flash) provide storage access speeds that were never seen before
• Acceleration opportunities:
– Higher bandwidth
– Reduced transport overhead
– OS/CPU bypass: direct storage access from network devices
Major Hardware Acceleration Technologies
Speeds
• 1, 10, 25, 40, 100, 200 Gbps
• A faster network doesn’t necessarily mean a faster runtime
• Many workloads consist of relatively short bursts rather than sustained throughput, so higher bandwidth may have no effect at all
[Figure: Effect of network speed on workload runtime* for Flink TeraSort, Flink PageRank, PowerGraph PageRank and Timely PageRank at 1GbE, 10GbE and 40GbE]
* “On The [Ir]relevance of Network Performance for Data Processing” by Trivedi et al.
InfiniBand
• De facto standard in the HPC world
• FDR: 56Gbps, EDR: 100Gbps, HDR: 200Gbps
• Sub-microsecond latency
• Native support for RDMA
• HW-accelerated transport layer
• True SDN: standard fabric components are developed as open source and are cross-platform
• Native support for switch collectives offload
[Figure: TOP500 Supercomputers Interconnect Performance Share*: InfiniBand 38%, Custom 28%, Ethernet 23%, Omnipath 10%, Proprietary 1%]
* www.top500.org
RDMA
• Remote Direct Memory Access
– Read/write from/to remote memory locations
• Zero-copy
• Direct hardware interface: bypasses the kernel and TCP/IP in the I/O path
• Flow control and reliability are offloaded to hardware
• Supported on almost all mid-range/high-end network adapters, both InfiniBand and Ethernet
[Figure: I/O path from a Java app buffer: socket path through OS sockets, TCP/IP and driver (with context switches) vs. RDMA path directly to the network adapter]
NVIDIA GPUDirect
• Direct DMA over PCIe
• RDMA devices can write/read directly to/from GPU memory over the network
• No CPU overhead
• Zero-copy
[Figure: GPUDirect vs. non-GPUDirect data path between NIC, CPU and GPU]
“Smart NIC” – FPGA/ASIC Offloads
• FPGA – tailor-made accelerations
• ASIC – less flexibility, better performance
• Common use cases:
– I/O: Serialization, compression, encryption offloads
– Data: Aggregation, sorting, group-by, reduce
• Deployment options:
– Pipeline
– Look-aside
– Bump-on-the-wire
“Smart Switch”
• In-network processing
– Data reduction during movement
– Wire-speed
• Generic: MPI switch collectives offloads (e.g. Mellanox SHArP)
• Per-workload: programmable switches (e.g. Barefoot Tofino)
– Example: Network-Accelerated Query Processing
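To make "data reduction during movement" concrete, here is an idealized byte count for one allreduce, comparing a host-based ring allreduce with aggregation performed inside the switch. This is a textbook simplification for illustration only. It does not model SHArP or Tofino internals, and the 100 MB message size is hypothetical.

```python
def ring_allreduce(n_nodes: int, msg_bytes: float) -> tuple[float, int]:
    """(bytes sent per node, sequential steps) for a ring allreduce.

    Reduce-scatter plus allgather: 2 * (N - 1) steps, each sending a 1/N chunk.
    """
    steps = 2 * (n_nodes - 1)
    return steps * msg_bytes / n_nodes, steps


def in_network_allreduce(n_nodes: int, msg_bytes: float) -> tuple[float, int]:
    """Idealized switch aggregation: each node uploads its vector once and
    downloads the reduced result once; the switch does the summation.
    `n_nodes` is kept only for a signature symmetric with ring_allreduce."""
    return float(msg_bytes), 2  # one upload + one download, switch hops ignored


MSG = 100e6  # hypothetical 100 MB gradient vector
for n in (2, 8, 64):
    ring_bytes, ring_steps = ring_allreduce(n, MSG)
    net_bytes, net_steps = in_network_allreduce(n, MSG)
    print(f"N={n:3d}: ring sends {ring_bytes / 1e6:.0f} MB/node in {ring_steps} steps, "
          f"in-network sends {net_bytes / 1e6:.0f} MB/node in {net_steps} steps")
```

For large N, the ring approaches 2x the message size sent per node, while idealized in-network aggregation stays at 1x. In-network aggregation also keeps the number of sequential steps from growing with N.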
NVMeOF
• Network protocol for NVM Express (PCIe) disks
• Uses RDMA to provide direct NIC<->disk access
• Completely bypasses the host
• Minimal latency difference between local and remote access
[Figure: NVMeOF vs. traditional remote-storage data path through NIC and CPU]
Azure Network Acceleration Offering
Offer ‘Bare Metal’ Experience – Azure HPC Solution
Eliminate Jitter
• Host holdback is a start, but the guest must be completely isolated from the host
• Minroot & CPU Groups: separated host and guest VM sandboxes
Full Network Experience
• Enable customers to use Mellanox or OFED drivers
• Supports all MPI types and versions
• Leverage hardware offload to the Mellanox InfiniBand ASIC
Transparent Exposure of Hardware
• Core N in the guest VM should map to core N in silicon
• 1:1 mapping between the physical NUMA (pNUMA) topology and the virtual NUMA (vNUMA) topology
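This is a minimal sketch for checking, from inside a Linux guest, which NUMA layout the VM exposes and which NUMA node an InfiniBand HCA sits on. The second check is useful when pinning processes near the adapter. The sketch assumes the standard Linux sysfs paths `/sys/devices/system/node/node*/cpulist` and `/sys/class/infiniband/<dev>/device/numa_node`. The device name `mlx5_0` is a hypothetical example. The script only reports what the guest sees; on its own, it cannot prove that the vNUMA layout matches the physical one.

```python
from pathlib import Path
from typing import Optional


def parse_cpulist(text: str) -> list[int]:
    """Parse a Linux cpulist string such as '0-3,8-11' into CPU ids."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a memory-only NUMA node
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_topology(root: str = "/sys/devices/system/node") -> dict[int, list[int]]:
    """Map NUMA node id -> CPU ids, as exposed to this (guest) OS via sysfs."""
    topology: dict[int, list[int]] = {}
    base = Path(root)
    if not base.is_dir():
        return topology  # not Linux, or sysfs not available
    for node_dir in base.glob("node[0-9]*"):
        cpulist = node_dir / "cpulist"
        if cpulist.is_file():
            topology[int(node_dir.name[len("node"):])] = parse_cpulist(cpulist.read_text())
    return dict(sorted(topology.items()))


def hca_numa_node(dev: str, root: str = "/sys/class/infiniband") -> Optional[int]:
    """NUMA node of an RDMA device, or None if unknown (sysfs reports -1)."""
    try:
        node = int((Path(root) / dev / "device" / "numa_node").read_text())
    except (OSError, ValueError):
        return None
    return node if node >= 0 else None


if __name__ == "__main__":
    topo = numa_topology()
    for node_id, cpus in topo.items():
        print(f"vNUMA node {node_id}: {len(cpus)} CPUs")
    hca_node = hca_numa_node("mlx5_0")  # hypothetical device name
    if hca_node is not None:
        print(f"mlx5_0 is on NUMA node {hca_node}; nearby CPUs: {topo.get(hca_node, [])}")
```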
Latest Azure HPC Offerings – HB/HC
HB Series (AMD EPYC) vs. HC Series (Intel Xeon Platinum):
• Workload targets: HB bandwidth intensive, HC compute intensive
• Core count: HB 60, HC 44
• System memory: HB 240 GB, HC 352 GB
• Network (both): 100 Gbps EDR InfiniBand, 40 Gbps Ethernet
• Storage support (both): Standard / Premium Azure Storage, and 700GB local SSD
• OS support for RDMA: CentOS/RHEL, Ubuntu, SLES 12, Windows
• MPI support: OpenMPI, HPC-X, MVAPICH2, MPICH, Intel MPI, PlatformMPI, Microsoft MPI
• Hardware collectives: enabled
• Access model: Azure CLI, ARM template, Azure CycleCloud, Azure Batch, partner platform
Other Azure HPC Highlights
• SR-IOV going broad
– All HPC SKUs will support SR-IOV
– Driver/SKU Performance Optimizations
• GPUs
– Latest NDv2 Series
• 8 NVIDIA Tesla V100 GPUs interconnected with NVLink
• Intel Skylake, 672 GB Memory
• Excellent platform for HPC and AI workloads
• Azure FPGA
– Based on Project Brainwave
– Deploy model to Azure FPGA, Reconfigure for different models
– Supports ResNet 50, ResNet 152, DenseNet-121, and VGG-16
Accelerate Your Framework
MPI Microbenchmarks
• Experiments on HC cluster
• OSU Micro-Benchmarks 5.6.1
• OpenMPI (4.0.0) + UCX (1.5.0)
• MPI ranks pinned to cores nearest the HCA
• MPI latency (4 B): 1.77 us
– Getting even better later this year
• MPI bandwidth (4 MB): 12.06 GB/s
[Figure: MPI Bandwidth, bandwidth (MB/s) vs. message size (1 B to 4 MB) for Ethernet (40 Gbps), IPoIB (100 Gbps) and RDMA (100 Gbps)]
[Figure: MPI Latency, time (us) vs. message size (0 B to 2 KB) for Ethernet (40 Gbps), IPoIB (100 Gbps) and RDMA (100 Gbps)]
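The two headline numbers can be plugged into the simple Hockney (alpha-beta) model, t(n) = α + n/β, with α = 1.77 us and β = 12.06 GB/s taken from the measurements above. This is an approximation for intuition only. Real MPI curves have protocol switch points (e.g. eager vs. rendezvous) that the model ignores. Treating 100 Gbps as 12.5 GB/s also ignores link encoding overhead.

```python
ALPHA_S = 1.77e-6          # measured small-message MPI latency on HC (4 B), seconds
BETA_BPS = 12.06e9         # measured large-message MPI bandwidth on HC (4 MB), bytes/s
LINE_RATE_BPS = 100e9 / 8  # 100 Gbps taken as 12.5 GB/s (encoding overhead ignored)


def transfer_time(n_bytes: float, alpha: float = ALPHA_S, beta: float = BETA_BPS) -> float:
    """Hockney (alpha-beta) model: t(n) = alpha + n / beta."""
    return alpha + n_bytes / beta


def effective_bandwidth(n_bytes: float, alpha: float = ALPHA_S, beta: float = BETA_BPS) -> float:
    """Achieved bytes/s for a single message of n_bytes under the model."""
    return n_bytes / transfer_time(n_bytes, alpha, beta)


# Message size at which the model reaches half of the peak bandwidth.
n_half = ALPHA_S * BETA_BPS

print(f"peak / line rate: {BETA_BPS / LINE_RATE_BPS:.1%}")
print(f"half-bandwidth message size n_1/2: {n_half / 1024:.1f} KiB")
for size in (4, 4 * 1024, 64 * 1024, 4 * 1024 * 1024):
    print(f"{size:>8} B -> {effective_bandwidth(size) / 1e9:.2f} GB/s")
```

Under this model, messages well below roughly 20 KB are latency-bound. For those messages, cutting per-message overhead matters more than raising the wire speed.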
SparkRDMA
• RDMA-powered ShuffleManager plugin for Apache Spark
• Similarly specced 8-node clusters:
– On-prem: 100GbE RoCE
– Cloud: Azure "h16mr" instances with 56Gbps InfiniBand
• https://github.com/Mellanox/SparkRDMA
[Figure: TeraSort (320GB) and PageRank (19GB) runtime for on-prem non-RDMA 100GbE, on-prem RDMA 100GbE, Azure IPoIB 56Gbps and Azure RDMA 56Gbps]
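SparkRDMA is enabled as a drop-in shuffle manager through ordinary Spark configuration, which this sketch shows. `spark.driver.extraClassPath`, `spark.executor.extraClassPath` and `spark.shuffle.manager` are standard Spark properties. The class name `org.apache.spark.shuffle.rdma.RdmaShuffleManager` follows the SparkRDMA README as we recall it. The jar path and `my_job.py` are hypothetical placeholders. Check the repository for the jar that matches your Spark version and for any further prerequisites.

```python
import shlex

# Hypothetical path: point this at the SparkRDMA jar built for your Spark version.
SPARK_RDMA_JAR = "/opt/SparkRDMA/spark-rdma-3.1-for-spark-2.4.0-jar-with-dependencies.jar"


def spark_rdma_conf(jar_path: str) -> dict[str, str]:
    """Spark properties that put the SparkRDMA jar on the classpath and select
    its ShuffleManager (class name per the SparkRDMA README; verify for your version)."""
    return {
        "spark.driver.extraClassPath": jar_path,
        "spark.executor.extraClassPath": jar_path,
        "spark.shuffle.manager": "org.apache.spark.shuffle.rdma.RdmaShuffleManager",
    }


def spark_submit_args(conf: dict[str, str], app: str) -> list[str]:
    """Build a spark-submit argument list with one --conf per property."""
    args = ["spark-submit"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


print(shlex.join(spark_submit_args(spark_rdma_conf(SPARK_RDMA_JAR), "my_job.py")))
```

Because only the shuffle manager changes, the same job can be run with and without these three properties to compare RDMA against the default shuffle.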
SparkRDMA on Azure
• Azure HC cluster:
– 100 Gbps InfiniBand
– 16 Spark Workers/HDFS DataNodes
– Separate NameNode
– Data folder hosted on SSD
– HiBench benchmarks ("gigantic" data scale profile)
• Spark 2.4.0, Hadoop 2.7.7, SparkRDMA 3.1
[Figure: Execution time (s) of TeraSort (320 GB) and PageRank (19GB), RDMA (100 Gbps) vs. IPoIB (100 Gbps)]
HDFS-RDMA on Azure
• OSU HDFS RDMA 0.9.1
• Based on Hadoop 3.0.0
• http://hibd.cse.ohio-state.edu/#hadoop3
• HDFS on HC cluster
• 1 NameNode
• 16 DataNodes
• Data folder hosted on SSD
• Packet Size: 128KB
• Containers per Node: 32
[Figure: TestDFSIO (Write) execution time (s) vs. data size (512GB to 1TB) for Ethernet (40 Gbps), IPoIB (100 Gbps) and RDMA (100 Gbps)]
Memcached-RDMA on Azure
• OSU Memcached RDMA 0.9.6
• Based on Memcached 1.5.3 and
libmemcached 1.0.18
• http://hibd.cse.ohio-state.edu/#memcached
• Experiment run on HC Nodes
• Memcached GET (8 B) Latency – 5.5us
• Memcached SET (8 B) Latency – 6.45us
[Figure: Memcached GET latency (us) vs. message size (1 B to 4 KB) for Ethernet (40 Gbps), IPoIB (100 Gbps) and RDMA (100 Gbps)]
[Figure: Memcached SET latency (us) vs. message size (1 B to 4 KB) for Ethernet (40 Gbps), IPoIB (100 Gbps) and RDMA (100 Gbps)]
Kafka-RDMA on Azure
• OSU Kafka RDMA 0.9.1
• Based on Apache Kafka 1.0.0
• http://hibd.cse.ohio-state.edu/#kafka
• HC cluster
• Broker with 100 GB Ramdisk
• Record Size – 100 bytes
• Number of Records – 500000
[Figure: Kafka Producer Latency, time (s), IPoIB (100 Gbps) vs. RDMA (100 Gbps)]
[Figure: Kafka Producer Bandwidth (MB/s), IPoIB (100 Gbps) vs. RDMA (100 Gbps)]
Horovod on Azure
• TensorFlow 1.13
– ResNet-50 Training
– Partial ImageNet Data
– Batch Size = 64 per worker
– 2 workers per node
– Total batches 100
– CPU-only version
• HC Cluster
– OpenMPI 4.0 + UCX 1.5
– Singularity container
• ~97% scaling efficiency
[Figure: Images/second and % efficiency vs. # nodes (2, 4, 8, 16) for IPoIB (100 Gbps) and RDMA (100 Gbps); plotted efficiency values 100.00, 96.78, 95.58, 94.93 and 100.00, 98.86, 98.37, 96.94]
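A scaling-efficiency figure such as "~97%" is usually computed as per-node throughput at N nodes divided by per-node throughput at the baseline. This sketch shows that calculation. We assume the 2-node run is the baseline, since both plotted efficiency series start at 100.00 there. The images/second values below are hypothetical, not read from the chart.

```python
def scaling_efficiency(images_per_sec: dict[int, float]) -> dict[int, float]:
    """Weak-scaling efficiency (%) per node count, relative to the smallest run."""
    base_nodes = min(images_per_sec)
    base_per_node = images_per_sec[base_nodes] / base_nodes
    return {
        nodes: 100.0 * (rate / nodes) / base_per_node
        for nodes, rate in sorted(images_per_sec.items())
    }


# Hypothetical throughput numbers (images/second), not taken from the slide.
measured = {2: 200.0, 4: 396.0, 8: 780.0, 16: 1520.0}
for nodes, eff in scaling_efficiency(measured).items():
    print(f"{nodes:>2} nodes: {eff:.2f}% efficiency")
```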
Wrapping up
What’s available on major clouds?
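• Network speeds: Azure 100Gbps, AWS 100Gbps, GCP 20Gbps?
• InfiniBand: Azure ✔, AWS !, GCP !
• RDMA: Azure ✔, AWS (limited), GCP !
• GPUDirect: Azure !, AWS (single host), GCP !
• Smart NIC: Azure !, AWS !, GCP !
• Smart Switch: Azure !, AWS !, GCP !
• NVMeOF: Azure !, AWS !, GCP !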
Take-aways
• Accelerated Frameworks:
– SparkRDMA on GitHub
– High-Performance Big Data (HiBD) from OSU
– Horovod
• Azure instances
– Azure HPC HB/HC
– Azure NDv2 GPUs
– Azure FPGA
Questions?
DON’T FORGET TO RATE
AND REVIEW THE SESSIONS
SEARCH SPARK + AI SUMMIT

Weitere ähnliche Inhalte

Was ist angesagt?

Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationDataWorks Summit
 
Improving Hadoop Performance via Linux
Improving Hadoop Performance via LinuxImproving Hadoop Performance via Linux
Improving Hadoop Performance via LinuxAlex Moundalexis
 
Backup management with Ceph Storage - Camilo Echevarne, Félix Barbeira
Backup management with Ceph Storage - Camilo Echevarne, Félix BarbeiraBackup management with Ceph Storage - Camilo Echevarne, Félix Barbeira
Backup management with Ceph Storage - Camilo Echevarne, Félix BarbeiraCeph Community
 
Ceph Deployment at Target: Customer Spotlight
Ceph Deployment at Target: Customer SpotlightCeph Deployment at Target: Customer Spotlight
Ceph Deployment at Target: Customer SpotlightColleen Corrice
 
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureCeph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureDanielle Womboldt
 
Hello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopHello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopDataWorks Summit
 
Infrastructure optimization for seismic processing (eng)
Infrastructure optimization for seismic processing (eng)Infrastructure optimization for seismic processing (eng)
Infrastructure optimization for seismic processing (eng)Vsevolod Shabad
 
2016-JAN-28 -- High Performance Production Databases on Ceph
2016-JAN-28 -- High Performance Production Databases on Ceph2016-JAN-28 -- High Performance Production Databases on Ceph
2016-JAN-28 -- High Performance Production Databases on CephCeph Community
 
A fun cup of joe with open liberty
A fun cup of joe with open libertyA fun cup of joe with open liberty
A fun cup of joe with open libertyAndy Mauer
 
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화OpenStack Korea Community
 
UberCloud HPC Experiment Introduction for Beginners
UberCloud HPC Experiment Introduction for BeginnersUberCloud HPC Experiment Introduction for Beginners
UberCloud HPC Experiment Introduction for Beginnershpcexperiment
 
Multi Master PostgreSQL Cluster on Kubernetes
Multi Master PostgreSQL Cluster on KubernetesMulti Master PostgreSQL Cluster on Kubernetes
Multi Master PostgreSQL Cluster on KubernetesOhyama Masanori
 
Build an High-Performance and High-Durable Block Storage Service Based on Ceph
Build an High-Performance and High-Durable Block Storage Service Based on CephBuild an High-Performance and High-Durable Block Storage Service Based on Ceph
Build an High-Performance and High-Durable Block Storage Service Based on CephRongze Zhu
 
Inside Microsoft's FPGA-Based Configurable Cloud
Inside Microsoft's FPGA-Based Configurable CloudInside Microsoft's FPGA-Based Configurable Cloud
Inside Microsoft's FPGA-Based Configurable Cloudinside-BigData.com
 
Ceph Day Melabourne - Community Update
Ceph Day Melabourne - Community UpdateCeph Day Melabourne - Community Update
Ceph Day Melabourne - Community UpdateCeph Community
 
Red Hat Storage Day New York - What's New in Red Hat Ceph Storage
Red Hat Storage Day New York - What's New in Red Hat Ceph StorageRed Hat Storage Day New York - What's New in Red Hat Ceph Storage
Red Hat Storage Day New York - What's New in Red Hat Ceph StorageRed_Hat_Storage
 
Ceph Day Beijing - SPDK for Ceph
Ceph Day Beijing - SPDK for CephCeph Day Beijing - SPDK for Ceph
Ceph Day Beijing - SPDK for CephDanielle Womboldt
 
Which Hypervisor is Best?
Which Hypervisor is Best?Which Hypervisor is Best?
Which Hypervisor is Best?Kyle Bader
 
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedInHadoop Performance at LinkedIn
Hadoop Performance at LinkedInAllen Wittenauer
 

Was ist angesagt? (20)

Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux Configuration
 
Improving Hadoop Performance via Linux
Improving Hadoop Performance via LinuxImproving Hadoop Performance via Linux
Improving Hadoop Performance via Linux
 
Backup management with Ceph Storage - Camilo Echevarne, Félix Barbeira
Backup management with Ceph Storage - Camilo Echevarne, Félix BarbeiraBackup management with Ceph Storage - Camilo Echevarne, Félix Barbeira
Backup management with Ceph Storage - Camilo Echevarne, Félix Barbeira
 
Ceph Deployment at Target: Customer Spotlight
Ceph Deployment at Target: Customer SpotlightCeph Deployment at Target: Customer Spotlight
Ceph Deployment at Target: Customer Spotlight
 
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureCeph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
 
Hello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopHello OpenStack, Meet Hadoop
Hello OpenStack, Meet Hadoop
 
Infrastructure optimization for seismic processing (eng)
Infrastructure optimization for seismic processing (eng)Infrastructure optimization for seismic processing (eng)
Infrastructure optimization for seismic processing (eng)
 
2016-JAN-28 -- High Performance Production Databases on Ceph
2016-JAN-28 -- High Performance Production Databases on Ceph2016-JAN-28 -- High Performance Production Databases on Ceph
2016-JAN-28 -- High Performance Production Databases on Ceph
 
A fun cup of joe with open liberty
A fun cup of joe with open libertyA fun cup of joe with open liberty
A fun cup of joe with open liberty
 
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
 
UberCloud HPC Experiment Introduction for Beginners
UberCloud HPC Experiment Introduction for BeginnersUberCloud HPC Experiment Introduction for Beginners
UberCloud HPC Experiment Introduction for Beginners
 
Multi Master PostgreSQL Cluster on Kubernetes
Multi Master PostgreSQL Cluster on KubernetesMulti Master PostgreSQL Cluster on Kubernetes
Multi Master PostgreSQL Cluster on Kubernetes
 
5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance
 
Build an High-Performance and High-Durable Block Storage Service Based on Ceph
Build an High-Performance and High-Durable Block Storage Service Based on CephBuild an High-Performance and High-Durable Block Storage Service Based on Ceph
Build an High-Performance and High-Durable Block Storage Service Based on Ceph
 
Inside Microsoft's FPGA-Based Configurable Cloud
Inside Microsoft's FPGA-Based Configurable CloudInside Microsoft's FPGA-Based Configurable Cloud
Inside Microsoft's FPGA-Based Configurable Cloud
 
Ceph Day Melabourne - Community Update
Ceph Day Melabourne - Community UpdateCeph Day Melabourne - Community Update
Ceph Day Melabourne - Community Update
 
Red Hat Storage Day New York - What's New in Red Hat Ceph Storage
Red Hat Storage Day New York - What's New in Red Hat Ceph StorageRed Hat Storage Day New York - What's New in Red Hat Ceph Storage
Red Hat Storage Day New York - What's New in Red Hat Ceph Storage
 
Ceph Day Beijing - SPDK for Ceph
Ceph Day Beijing - SPDK for CephCeph Day Beijing - SPDK for Ceph
Ceph Day Beijing - SPDK for Ceph
 
Which Hypervisor is Best?
Which Hypervisor is Best?Which Hypervisor is Best?
Which Hypervisor is Best?
 
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedInHadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
 

Ähnlich wie Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise

Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...
Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...
Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...Databricks
 
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)Ontico
 
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...Databricks
 
Accelerating Shuffle: A Tailor-Made RDMA Solution for Apache Spark with Yuval...
Accelerating Shuffle: A Tailor-Made RDMA Solution for Apache Spark with Yuval...Accelerating Shuffle: A Tailor-Made RDMA Solution for Apache Spark with Yuval...
Accelerating Shuffle: A Tailor-Made RDMA Solution for Apache Spark with Yuval...Spark Summit
 
FPGAs in the cloud? (October 2017)
FPGAs in the cloud? (October 2017)FPGAs in the cloud? (October 2017)
FPGAs in the cloud? (October 2017)Julien SIMON
 
Choose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUsChoose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUsDatabricks
 
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAdvancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAlluxio, Inc.
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDKKernel TLV
 
Optimized HPC/AI cloud with OpenStack acceleration service and composable har...
Optimized HPC/AI cloud with OpenStack acceleration service and composable har...Optimized HPC/AI cloud with OpenStack acceleration service and composable har...
Optimized HPC/AI cloud with OpenStack acceleration service and composable har...Shuquan Huang
 
Elastify Cloud-Native Spark Application with Persistent Memory
Elastify Cloud-Native Spark Application with Persistent MemoryElastify Cloud-Native Spark Application with Persistent Memory
Elastify Cloud-Native Spark Application with Persistent MemoryDatabricks
 
Design installation-commissioning-red raider-cluster-ttu
Design installation-commissioning-red raider-cluster-ttuDesign installation-commissioning-red raider-cluster-ttu
Design installation-commissioning-red raider-cluster-ttuAlan Sill
 
[OpenStack Days Korea 2016] Track3 - OpenStack on 64-bit ARM with X-Gene
[OpenStack Days Korea 2016] Track3 - OpenStack on 64-bit ARM with X-Gene[OpenStack Days Korea 2016] Track3 - OpenStack on 64-bit ARM with X-Gene
[OpenStack Days Korea 2016] Track3 - OpenStack on 64-bit ARM with X-GeneOpenStack Korea Community
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyPeter Clapham
 
Building a High Performance Analytics Platform
Building a High Performance Analytics PlatformBuilding a High Performance Analytics Platform
Building a High Performance Analytics PlatformSantanu Dey
 
Exploration of Radars and Software Defined Radios using VisualSim
Exploration of  Radars and Software Defined Radios using VisualSimExploration of  Radars and Software Defined Radios using VisualSim
Exploration of Radars and Software Defined Radios using VisualSimDeepak Shankar
 
High performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User GroupHigh performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User GroupHungWei Chiu
 
AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...
AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...
AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...Amazon Web Services
 
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...PROIDEA
 
6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_final6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_finalYutaka Kawai
 

Ähnlich wie Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise (20)

Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...
Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...
Optimizing Performance and Computing Resource Efficiency of In-Memory Big Dat...
 
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
 
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
 
Accelerating Shuffle: A Tailor-Made RDMA Solution for Apache Spark with Yuval...
Accelerating Shuffle: A Tailor-Made RDMA Solution for Apache Spark with Yuval...Accelerating Shuffle: A Tailor-Made RDMA Solution for Apache Spark with Yuval...
Accelerating Shuffle: A Tailor-Made RDMA Solution for Apache Spark with Yuval...
 
FPGAs in the cloud? (October 2017)
FPGAs in the cloud? (October 2017)FPGAs in the cloud? (October 2017)
FPGAs in the cloud? (October 2017)
 
Choose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUsChoose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUs
 
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAdvancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDK
 
uCluster
uClusteruCluster
uCluster
 
Optimized HPC/AI cloud with OpenStack acceleration service and composable har...
Optimized HPC/AI cloud with OpenStack acceleration service and composable har...Optimized HPC/AI cloud with OpenStack acceleration service and composable har...
Optimized HPC/AI cloud with OpenStack acceleration service and composable har...
 
Elastify Cloud-Native Spark Application with Persistent Memory
Elastify Cloud-Native Spark Application with Persistent MemoryElastify Cloud-Native Spark Application with Persistent Memory
Elastify Cloud-Native Spark Application with Persistent Memory
 
Design installation-commissioning-red raider-cluster-ttu
Design installation-commissioning-red raider-cluster-ttuDesign installation-commissioning-red raider-cluster-ttu
Design installation-commissioning-red raider-cluster-ttu
 
[OpenStack Days Korea 2016] Track3 - OpenStack on 64-bit ARM with X-Gene
[OpenStack Days Korea 2016] Track3 - OpenStack on 64-bit ARM with X-Gene[OpenStack Days Korea 2016] Track3 - OpenStack on 64-bit ARM with X-Gene
[OpenStack Days Korea 2016] Track3 - OpenStack on 64-bit ARM with X-Gene
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
 
Building a High Performance Analytics Platform
Building a High Performance Analytics PlatformBuilding a High Performance Analytics Platform
Building a High Performance Analytics Platform
 
Exploration of Radars and Software Defined Radios using VisualSim
Exploration of  Radars and Software Defined Radios using VisualSimExploration of  Radars and Software Defined Radios using VisualSim
Exploration of Radars and Software Defined Radios using VisualSim
 
High performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User GroupHigh performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User Group
 
AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...
AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...
AWS re:Invent 2016: Deep Learning, 3D Content Rendering, and Massively Parall...
 
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...
 
6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_final6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_final
 

Mehr von Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise

  • 1. Yuval Degani, LinkedIn; Dr. Jithin Jose, Microsoft Azure. Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise #UnifiedAnalytics #SparkAISummit
  • 2. Intro
    • An infinite loop of removing performance roadblocks
    • With faster storage devices (DRAM, NVMe, SSD) and stronger-than-ever processing power (CPU, GPU, ASIC), a traditional network just can't keep up with the I/O flow
    • Upgrading to higher wire speeds will rarely do the trick
    • This is where co-designed hardware acceleration can be used to truly utilize the power of a compute cluster
  • 3. Previous talks
    • Spark Summit Europe 2017: first open-source, stand-alone RDMA-accelerated shuffle plugin for Spark (SparkRDMA)
    • Spark+AI Summit North America 2018: first preview of SparkRDMA on Azure HPC nodes, demonstrating a 2.6x job speed-up on cloud VMs
  • 4. Network Bottlenecks in the Wild
  • 5. Network Bottlenecks in the Wild
    • Not always caused by a lack of bandwidth
    • Network I/O imposes overhead in many system components:
      – Memory management
      – Memory copy
      – Garbage collection
      – Serialization/compression/encryption
    • Overhead = CPU cycles, cycles that are not available for the actual job at hand
    • Hardware acceleration can reduce this overhead and allow better utilization of compute and network resources
  • 6. Network Bottlenecks: Shuffle
    • The most expensive non-storage network I/O in compute clusters
    • Blocking, massive movement of transient data
    • Acceleration opportunities:
      – Efficient serving with reduced server-side logic
      – Serialization/compression/encryption
      – Reduced I/O overhead and latency by employing modern transport protocols
    • [Chart: HiBench TeraSort on Spark: Partitioning 4%, Input 11%, Shuffle Read 57%, Output 28%]
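The chart shows why shuffle is worth accelerating. Assuming its percentages are shares of job runtime (the slide does not state this explicitly), Amdahl's law bounds the end-to-end gain from speeding up the shuffle read phase alone. A minimal Python sketch under that assumption:

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speed-up when only `accelerated_fraction` of the runtime
    is made `factor` times faster (Amdahl's law)."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


# Shuffle Read share from the HiBench TeraSort chart, assumed to be a share of runtime
SHUFFLE_READ = 0.57

for factor in (1.5, 2.0, 4.0, float("inf")):
    label = "unbounded" if factor == float("inf") else f"{factor:g}x"
    print(f"shuffle read {label}: job {amdahl_speedup(SHUFFLE_READ, factor):.2f}x faster")
```

Under this assumption even an infinitely fast shuffle read caps the job at 1/0.43, about 2.33x, because the remaining 43% of runtime is untouched.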
  • 7. Network Bottlenecks: Distributed Training
    • Model updates create massive network traffic
    • Model update frequency rises as GPUs get faster
    • Acceleration opportunities:
      – Inter-GPU RDMA communication
      – Lower-latency network transport
      – Collectives offloads
    • [Chart: ResNet 269* total time vs. GPU active time on K80, M60 and V100]
    * "Parameter Hub: High Performance Parameter Servers for Efficient Distributed Deep Neural Network Training" by Luo et al.
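To make concrete what a collectives offload takes off the hosts, here is a pure-Python simulation of ring all-reduce, one common host-based way to sum model updates across workers. It is an illustrative sketch only; the slide does not say which algorithm any particular framework or offload uses.

```python
from typing import List


def ring_allreduce(vectors: List[List[float]]) -> List[List[float]]:
    """Simulate a ring all-reduce (element-wise sum) across n workers.

    Each vector is split into n chunks. Reduce-scatter: after n-1 steps,
    worker i holds the fully reduced chunk (i + 1) % n. All-gather: the
    reduced chunks then circulate around the ring for n-1 more steps.
    """
    n = len(vectors)
    length = len(vectors[0])
    if any(len(v) != length for v in vectors):
        raise ValueError("all vectors must have the same length")
    data = [list(v) for v in vectors]  # each worker's local buffer
    bounds = [(c * length // n, (c + 1) * length // n) for c in range(n)]

    # Reduce-scatter: at step s, worker i sends chunk (i - s) % n to its right neighbour.
    for step in range(n - 1):
        sends = [(i, (i - step) % n) for i in range(n)]
        payloads = [data[i][bounds[c][0]:bounds[c][1]] for i, c in sends]  # snapshot first
        for (i, c), payload in zip(sends, payloads):
            lo = bounds[c][0]
            for k, val in enumerate(payload):
                data[(i + 1) % n][lo + k] += val

    # All-gather: at step s, worker i forwards the reduced chunk (i + 1 - s) % n.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n) for i in range(n)]
        payloads = [data[i][bounds[c][0]:bounds[c][1]] for i, c in sends]
        for (i, c), payload in zip(sends, payloads):
            lo, hi = bounds[c]
            data[(i + 1) % n][lo:hi] = payload
    return data


workers = [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50], [100, 200, 300, 400, 500]]
print(ring_allreduce(workers))
```

Each worker sends 2(n-1) chunks of about 1/n of the vector, roughly 2(n-1)/n of the vector in total, over 2(n-1) sequential steps. Both the data volume and the number of dependent steps are what lower-latency transports and collectives offloads target.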
  • 8. Network Bottlenecks: Storage
    • Massive data movement
    • Premium devices (DRAM, Flash) provide storage access speeds never seen before
    • Acceleration opportunities:
      – Higher bandwidth
      – Reduced transport overhead
      – OS/CPU bypass: direct storage access from network devices
  • 10. Speeds
    • 1, 10, 25, 40, 100, 200Gbps
    • A faster network doesn't necessarily mean a faster runtime
    • Many workloads consist of relatively short bursts rather than sustained throughput: higher bandwidth may have no effect
    • [Chart: Effect of network speed on workload runtime* at 1GbE, 10GbE and 40GbE for Flink TeraSort, Flink PageRank, PowerGraph PageRank and Timely PageRank]
    * "On The [Ir]relevance of Network Performance for Data Processing" by Trivedi et al.
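A simple latency-bandwidth model illustrates why short bursts barely benefit from more bandwidth. This is a sketch, and the 20 us per-message overhead is a hypothetical value chosen for illustration, not a measurement from the talk or the cited paper.

```python
def transfer_time_us(message_bytes: int, link_gbps: float, overhead_us: float) -> float:
    """Fixed per-message overhead plus time on the wire.

    1 Gbps = 1e9 bits/s = 1e3 bits per microsecond.
    """
    wire_us = message_bytes * 8 / (link_gbps * 1e3)
    return overhead_us + wire_us


OVERHEAD_US = 20.0  # hypothetical per-message software/stack overhead

for size in (4 * 1024, 64 * 1024 * 1024):  # a short burst vs. a bulk transfer
    for gbps in (1, 10, 40):
        t = transfer_time_us(size, gbps, OVERHEAD_US)
        print(f"{size:>10} B @ {gbps:>2} Gbps: {t:12.1f} us")
```

In this model a 4 KiB message drops from about 23.3 us at 10 Gbps to about 20.8 us at 40 Gbps, because the fixed overhead dominates, while a 64 MiB transfer speeds up by close to 4x. Reducing the overhead itself is where kernel bypass and offloads help.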
  • 11. InfiniBand
    • De facto standard in the HPC world
    • FDR: 56Gbps, EDR: 100Gbps, HDR: 200Gbps
    • Sub-microsecond latency
    • Native support for RDMA
    • Hardware-accelerated transport layer
    • True SDN: standard fabric components are developed as open source and are cross-platform
    • Native support for switch collectives offload
    • [Chart: TOP500 Supercomputers Interconnect Performance Share*: InfiniBand 38%, Custom 28%, Ethernet 23%, Omnipath 10%, Proprietary 1%]
    * www.top500.org
  • 12. RDMA
    • Remote Direct Memory Access
      – Read/write from/to remote memory locations
    • Zero-copy
    • Direct hardware interface: bypasses the kernel and TCP/IP in the I/O path
    • Flow control and reliability are offloaded to hardware
    • Supported on almost all mid-range/high-end network adapters, both InfiniBand and Ethernet
    • [Diagram: RDMA vs. socket data paths from a Java app buffer to the network adapter; the socket path passes through the OS (sockets, TCP/IP, driver) with a context switch]
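The semantics of a one-sided RDMA write can be illustrated with a toy model: the target registers a buffer and hands out a remote key, and the initiator then places data into that buffer without any application code running on the target. This is a conceptual Python sketch, not real RDMA; `ToyHost`, `nic_write` and `nic_read` are hypothetical names. Real code would use the verbs API (for example `ibv_reg_mr`, and `ibv_post_send` with `IBV_WR_RDMA_WRITE`) against an actual adapter.

```python
import itertools
from dataclasses import dataclass
from typing import Dict


@dataclass
class MemoryRegion:
    """A buffer the owner has registered and exposed under a remote key."""
    buf: bytearray
    rkey: int


class ToyHost:
    """Hypothetical host whose 'NIC' serves one-sided, RDMA-style operations.

    Remote accesses go through nic_write/nic_read and never invoke an
    application handler on this host, mirroring how RDMA bypasses the
    target CPU. app_handler_calls is never incremented, to make that visible.
    """

    _next_key = itertools.count(1)  # unique remote keys across all hosts

    def __init__(self) -> None:
        self.regions: Dict[int, MemoryRegion] = {}
        self.app_handler_calls = 0  # a socket server would bump this per request

    def register(self, size: int) -> MemoryRegion:
        region = MemoryRegion(bytearray(size), next(self._next_key))
        self.regions[region.rkey] = region
        return region

    def _lookup(self, rkey: int, offset: int, length: int) -> MemoryRegion:
        region = self.regions.get(rkey)
        if region is None:
            raise PermissionError("unknown remote key")
        if offset < 0 or offset + length > len(region.buf):
            raise IndexError("access outside the registered region")
        return region

    def nic_write(self, rkey: int, offset: int, src: memoryview) -> None:
        region = self._lookup(rkey, offset, len(src))
        # The only copy: data lands directly in the registered target buffer.
        region.buf[offset:offset + len(src)] = src

    def nic_read(self, rkey: int, offset: int, length: int) -> bytes:
        region = self._lookup(rkey, offset, length)
        return bytes(region.buf[offset:offset + length])


server = ToyHost()
shared = server.register(16)  # server side: register a buffer, share (rkey, size)
payload = bytearray(b"shuffle-block")
server.nic_write(shared.rkey, 0, memoryview(payload))  # client side: one-sided write
print(server.nic_read(shared.rkey, 0, len(payload)), server.app_handler_calls)
```

The bounds and key checks stand in for the protection that memory registration provides on a real adapter.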
  • 13. NVIDIA GPUDirect
    • Direct DMA over PCIe
    • RDMA devices can write/read directly to/from GPU memory over the network
    • No CPU overhead
    • Zero-copy
    • [Diagram: GPUDirect vs. non-GPUDirect data paths between NIC, GPU and CPU]
  • 14. "Smart NIC": FPGA/ASIC Offloads
    • FPGA: tailor-made accelerations
    • ASIC: less flexibility, better performance
    • Common use cases:
      – I/O: serialization, compression, encryption offloads
      – Data: aggregation, sorting, group-by, reduce
    • Deployment options:
      – Pipeline
      – Look-aside
      – Bump-on-the-wire
  • 15. "Smart Switch"
    • In-network processing
      – Data reduction during movement
      – Wire speed
    • Generic: MPI switch collectives offloads (e.g. Mellanox SHArP)
    • Per-workload: programmable switches (e.g. Barefoot Tofino)
      – Example: network-accelerated query processing
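The benefit of in-network reduction can be estimated by comparing how much each host injects into the network. A minimal sketch, assuming a single aggregating switch that sums vectors at wire speed and multicasts the result; `switch_aggregate` is a hypothetical stand-in, not the SHArP API, and the host-based figure is the standard 2(n-1)/n per-host cost of ring all-reduce.

```python
from typing import List


def switch_aggregate(host_vectors: List[List[int]]) -> List[int]:
    """Hypothetical switch: sums vectors element-wise as they arrive
    (one upstream message per host); the result is multicast back."""
    result = [0] * len(host_vectors[0])
    for vec in host_vectors:
        for k, val in enumerate(vec):
            result[k] += val
    return result


def bytes_sent_per_host(vector_bytes: int, hosts: int, in_network: bool) -> float:
    """Bytes each host injects for one all-reduce of a vector_bytes vector."""
    if in_network:
        return float(vector_bytes)  # send once to the switch
    return 2 * (hosts - 1) / hosts * vector_bytes  # ring all-reduce


def sequential_steps(hosts: int, in_network: bool) -> int:
    """Dependent communication rounds: up and down through one switch vs. a ring."""
    return 2 if in_network else 2 * (hosts - 1)


print(switch_aggregate([[1, 2], [3, 4], [5, 6]]))
for n in (4, 16, 64):
    ring_mib = bytes_sent_per_host(100 * 2**20, n, in_network=False) / 2**20
    print(n, f"{ring_mib:.1f} MiB/host via ring vs 100.0 via switch",
          sequential_steps(n, False), "vs", sequential_steps(n, True), "steps")
```

With a single switch, per-host traffic roughly halves and the number of dependent steps no longer grows with the cluster size; multi-level switch trees add hops that this sketch ignores.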
  • 16. NVMeOF
    • Network protocol for NVM Express (PCIe) disks
    • Uses RDMA to provide direct NIC<->disk access
    • Completely bypasses the host
    • Minimal latency difference between local and remote access
    • [Diagram: NVMeOF vs. traditional data paths through the NIC and CPU]
  • 18. Offer a "Bare Metal" Experience: Azure HPC Solution
    • Eliminate jitter
      – Host holdback is a start, but the guest must be completely isolated from the host
      – Minroot & CPU Groups: separated host and guest VM sandboxes
    • Full network experience
      – Enable customers to use Mellanox or OFED drivers
      – Supports all MPI types and versions
      – Leverage hardware offload to the Mellanox InfiniBand ASIC
    • Transparent exposure of hardware
      – Core N in the guest VM should map to core N in silicon
      – 1:1 mapping between the physical pNUMA topology and the vNUMA topology
  • 19. Latest Azure HPC Offerings: HB/HC
    • Series: HB (AMD EPYC) | HC (Intel Xeon Platinum)
    • Workload targets: bandwidth intensive | compute intensive
    • Core count: 60 | 44
    • System memory: 240 GB | 352 GB
    • Network (both): 100 Gbps EDR InfiniBand, 40 Gbps Ethernet
    • Storage support (both): Standard/Premium Azure Storage, and 700 GB local SSD
    • OS support for RDMA (both): CentOS/RHEL, Ubuntu, SLES 12, Windows
    • MPI support (both): OpenMPI, HPC-X, MVAPICH2, MPICH, Intel MPI, PlatformMPI, Microsoft MPI
    • Hardware collectives (both): enabled
    • Access model (both): Azure CLI, ARM template, Azure CycleCloud, Azure Batch, partner platform
  • 20. Other Azure HPC Highlights
    • SR-IOV going broad
      – All HPC SKUs will support SR-IOV
      – Driver/SKU performance optimizations
    • GPUs: latest NDv2 series
      – 8 NVIDIA Tesla V100 NVLink-interconnected GPUs
      – Intel Skylake, 672 GB memory
      – Excellent platform for HPC and AI workloads
    • Azure FPGA
      – Based on Project Brainwave
      – Deploy a model to Azure FPGA; reconfigure for different models
      – Supports ResNet 50, ResNet 152, DenseNet-121 and VGG-16
  • 22. MPI Microbenchmarks
    • Experiments on HC cluster
    • OSU Benchmarks 5.6.1
    • OpenMPI (4.0.0) + UCX (1.5.0)
    • MPI ranks pinned to the cores nearest the HCA
    • MPI latency (4 B): 1.77 us
      – Getting even better later this year
    • MPI bandwidth (4 MB): 12.06 GB/s
    • [Charts: MPI Bandwidth (MB/s) and MPI Latency (us) vs. message size (bytes), comparing Ethernet (40 Gbps), IPoIB (100 Gbps) and RDMA (100 Gbps)]
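The bandwidth result can be put in context by converting link rates from bits to bytes. A small sketch, assuming decimal units (1 GB/s = 10^9 bytes/s) and ignoring link encoding and protocol headers, so the true payload ceiling is somewhat below the nominal rate computed here:

```python
def gbps_to_gbytes_per_s(gbps: float) -> float:
    """Nominal link rate in decimal GB/s (8 bits per byte, no encoding overhead)."""
    return gbps / 8


def link_utilization(measured_gbytes_per_s: float, link_gbps: float) -> float:
    """Fraction of the nominal link rate achieved by a measurement."""
    return measured_gbytes_per_s / gbps_to_gbytes_per_s(link_gbps)


# InfiniBand generations listed earlier in the deck
for name, gbps in (("FDR", 56), ("EDR", 100), ("HDR", 200)):
    print(f"{name}: {gbps} Gbps = {gbps_to_gbytes_per_s(gbps):.1f} GB/s")

# Measured on the HC cluster: 12.06 GB/s with 4 MB messages over 100 Gbps EDR
print(f"utilization: {link_utilization(12.06, 100):.1%}")
```

Under these assumptions, 12.06 GB/s is about 96.5% of the nominal 12.5 GB/s EDR rate.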
  • 23. SparkRDMA
    • RDMA-powered ShuffleManager plugin for Apache Spark
    • Similarly specced 8-node clusters:
      – On-prem: 100GbE RoCE
      – Cloud: Azure "h16mr" instances with 56Gbps InfiniBand
    • https://github.com/Mellanox/SparkRDMA
    • [Chart: TeraSort 320GB and PageRank 19GB runtimes for on-prem non-RDMA 100GbE, on-prem RDMA 100GbE, Azure IPoIB 56Gbps and Azure RDMA 56Gbps]
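Enabling SparkRDMA amounts to replacing Spark's shuffle manager and putting the plugin jar on the driver and executor classpaths. The sketch below builds those properties and renders them as `spark-submit --conf` flags. The property names follow our reading of the project's README, the jar path is a hypothetical placeholder, and the README for your release remains the authority (it also covers native library setup, which this sketch omits).

```python
from typing import Dict, List

# Hypothetical path: point this at the SparkRDMA jar built for your Spark version.
SPARKRDMA_JAR = "/opt/sparkrdma/spark-rdma-jar-with-dependencies.jar"


def sparkrdma_conf(jar_path: str) -> Dict[str, str]:
    """Spark properties that swap the default shuffle for SparkRDMA's."""
    return {
        "spark.shuffle.manager": "org.apache.spark.shuffle.rdma.RdmaShuffleManager",
        "spark.driver.extraClassPath": jar_path,
        "spark.executor.extraClassPath": jar_path,
    }


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render properties as spark-submit --conf arguments (sorted for stability)."""
    args: List[str] = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


print(" ".join(["spark-submit", *to_submit_args(sparkrdma_conf(SPARKRDMA_JAR))]))
```

The same three properties can instead go into `spark-defaults.conf`, so existing jobs pick up the plugin without code changes.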
  • 24. SparkRDMA on Azure
    • Azure HC cluster:
      – 100 Gbps InfiniBand
      – 16 Spark workers/HDFS DataNodes
      – Separate NameNode
      – Data folder hosted on SSD
      – HiBench benchmarks ("gigantic" data scale)
    • Spark 2.4.0, Hadoop 2.7.7, SparkRDMA 3.1
    • [Chart: Execution time (s) for TeraSort 320 GB and PageRank 19GB, RDMA (100 Gbps) vs. IPoIB (100 Gbps)]
  • 25. HDFS-RDMA on Azure
    • OSU HDFS RDMA 0.9.1
    • Based on Hadoop 3.0.0
    • http://hibd.cse.ohio-state.edu/#hadoop3
    • HDFS on HC cluster
      – 1 NameNode
      – 16 DataNodes
      – Data folder hosted on SSD
      – Packet size: 128KB
      – Containers per node: 32
    • [Chart: TestDFSIO (write) execution time (sec) for data sizes of 512GB, 640GB, 768GB, 896GB and 1TB, comparing Ethernet (40 Gbps), IPoIB (100 Gbps) and RDMA (100 Gbps)]
  • 26. Memcached-RDMA on Azure
    • OSU Memcached RDMA 0.9.6
    • Based on Memcached 1.5.3 and libmemcached 1.0.18
    • http://hibd.cse.ohio-state.edu/#memcached
    • Experiment run on HC nodes
    • Memcached GET (8 B) latency: 5.5 us
    • Memcached SET (8 B) latency: 6.45 us
    • [Charts: Memcached GET and Memcached SET latency (us) vs. message size (bytes), comparing Ethernet (40 Gbps), IPoIB (100 Gbps) and RDMA (100 Gbps)]
  • 27. Kafka-RDMA on Azure
    • OSU Kafka RDMA 0.9.1
    • Based on Apache Kafka 1.0.0
    • http://hibd.cse.ohio-state.edu/#kafka
    • HC cluster
    • Broker with 100 GB RAM disk
    • Record size: 100 bytes
    • Number of records: 500000
    • [Charts: Kafka producer latency (producer time, s) and Kafka producer bandwidth (MB/s), IPoIB (100 Gbps) vs. RDMA (100 Gbps)]
  • 28. Horovod on Azure
    • TensorFlow 1.13
      – ResNet-50 training
      – Partial ImageNet data
      – Batch size = 64 per worker
      – 2 workers per node
      – Total batches: 100
      – CPU-only version
    • HC cluster
      – OpenMPI 4.0 + UCX 1.5
      – Singularity container
    • ~97% scaling efficiency
    • [Chart: Images/second and scaling efficiency (%) vs. number of nodes (2, 4, 8, 16) for IPoIB (100 Gbps) and RDMA (100 Gbps); efficiency labels: 100.00, 96.78, 95.58, 94.93 and 100.00, 98.86, 98.37, 96.94]
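With a fixed batch size per worker, scaling efficiency is the throughput at N nodes divided by the baseline throughput scaled linearly from the smallest (2-node) run. A minimal sketch of that calculation; the throughput values in the usage lines are made up for illustration and are not the chart's measurements.

```python
from typing import Dict


def scaling_efficiency(throughput: Dict[int, float]) -> Dict[int, float]:
    """Scaling efficiency (%) per node count, relative to the smallest run.

    efficiency(N) = 100 * T(N) / (T(N0) * N / N0), where T is throughput
    (e.g. images/second) and N0 is the baseline node count.
    """
    base_nodes = min(throughput)
    base = throughput[base_nodes]
    return {
        nodes: 100.0 * tput / (base * nodes / base_nodes)
        for nodes, tput in sorted(throughput.items())
    }


# Hypothetical images/second values, NOT the measurements from the chart
example = {2: 100.0, 4: 196.0, 8: 380.0, 16: 740.0}
for nodes, eff in scaling_efficiency(example).items():
    print(f"{nodes:>2} nodes: {eff:.2f}%")
```

The baseline always scores 100%, which matches the first label of each efficiency series in the chart.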
  • 30. What's available on major clouds? (Azure | AWS | GCP)
    • Network speeds: 100Gbps | 100Gbps | 20Gbps?
    • InfiniBand: ✔ | ! | !
    • RDMA: ✔ | (limited) | !
    • GPUDirect: ! | (single host) | !
    • Smart NIC: ! | ! | !
    • Smart Switch: ! | ! | !
    • NVMeOF: ! | ! | !
  • 31. Take-aways
    • Accelerated frameworks:
      – SparkRDMA on GitHub
      – High-Performance Big Data (from OSU)
      – Horovod
    • Azure instances:
      – Azure HPC HB/HC
      – Azure NDv2 GPUs
      – Azure FPGA