Dror Goldenberg, March 2016, HPCAC Swiss
Co-Design Architecture
Emergence of New Co-Processors
© 2016 Mellanox Technologies 2
Co-Design Architecture to Enable Exascale Performance
CPU-centric co-design: limited to main CPU usage, results in performance limitations
Co-design that creates synergies across software, in-CPU computing, in-network computing and in-storage computing enables higher performance and scale
© 2016 Mellanox Technologies 3
The Intelligence is Moving to the Interconnect
[Diagram: intelligence migrating from the CPU (past) to the interconnect (future)]
© 2016 Mellanox Technologies 4
Intelligent Interconnect Delivers Higher Datacenter ROI
[Diagram: with network functions running on the CPU, computing cycles are spent on the network; a smart network offloads those functions, freeing computing for applications and increasing datacenter value]
© 2016 Mellanox Technologies 5
Breaking the Application Latency Wall
§ Today: Network device latencies are on the order of 100 nanoseconds
§ Challenge: Enabling the next order of magnitude improvement in application performance
§ Solution: Creating synergies between software and hardware – intelligent interconnect
Intelligent Interconnect Paves the Road to Exascale Performance
10 years ago: Communication framework ~100 microseconds, Network ~10 microseconds
Today: Communication framework ~10 microseconds, Network ~0.1 microseconds
Future: Communication framework ~1 microsecond, Co-Design Network ~0.05 microseconds
© 2016 Mellanox Technologies 6
Introducing Switch-IB 2 – World’s First Smart Switch
© 2016 Mellanox Technologies 7
Introducing Switch-IB 2 – World’s First Smart Switch
§ The world’s fastest switch, with <90 nanosecond latency
§ 36 ports, 100Gb/s per port, 7.2Tb/s throughput, 7.02 billion messages/sec
§ Adaptive routing, congestion control, support for multiple topologies
World’s First Smart Switch
Built for Scalable Compute and Storage Infrastructures
10X Higher Performance with the New Switch SHArP Technology
© 2016 Mellanox Technologies 8
SHArP (Scalable Hierarchical Aggregation Protocol) Technology
Delivering 10X Performance Improvement
for MPI and SHMEM/PGAS Communications
Switch-IB 2 Enables the Switch Network to
Operate as a Co-Processor
SHArP Enables Switch-IB 2 to Manage and
Execute MPI Operations in the Network
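SHArP is transparent at the application level: a standard MPI collective call is what gets offloaded into the switch network once the library enables SHArP. Below is a minimal sketch in C of the kind of allreduce that benefits (generic MPI code, not a SHArP-specific API):

```c
#include <mpi.h>
#include <stdio.h>

/* Minimal sketch: a standard MPI_Allreduce. With SHArP enabled in the
 * MPI library, the reduction is aggregated inside the switch network
 * instead of on the host CPUs; the application code is unchanged. */
int main(int argc, char **argv)
{
    int rank, nranks;
    double local_sum, global_sum;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    local_sum = (double)rank;  /* stand-in for a per-rank partial result */

    /* The collective a SHArP-capable fabric can execute in-network */
    MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f across %d ranks\n", global_sum, nranks);

    MPI_Finalize();
    return 0;
}
```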
© 2016 Mellanox Technologies 9
SHArP Performance Advantage
§  MiniFE is a finite-element mini-application
•  Implements kernels that represent implicit finite-element applications
10X to 25X Performance Improvement
MPI AllReduce Collective
© 2016 Mellanox Technologies 10
The Intelligence is Moving to the Interconnect
Communication Frameworks (MPI, SHMEM/PGAS)
The Only Approach to Deliver 10X Performance Improvements
[Diagram: applications and communication frameworks layered over interconnect offload engines – Transport, RDMA, SR-IOV, Collectives, Peer-Direct, GPUDirect and more; MPI / SHMEM offloads arriving Q1’16 and Q3’16]
© 2016 Mellanox Technologies 11
Multi-Host Socket Direct™ – Low Latency Socket Communication
§ Each CPU has direct network access
§ QPI avoidance for I/O – improves performance
§ Enables GPU / peer direct on both sockets
§ Solution is transparent to software
[Diagram: dual-socket server with each CPU attached directly to the adapter, avoiding the QPI hop]
Multi-Host Socket Direct performance (Multi-Host Evaluation Kit): 50% lower CPU utilization, 20% lower latency
Lower Application Latency, Free Up the CPU
© 2016 Mellanox Technologies 12
Introducing ConnectX-4 Lx Programmable Adapter
Scalable, Efficient, High-Performance and Flexible Solution
Security
Cloud/Virtualization
Storage
High Performance Computing
Precision Time Synchronization
Networking + FPGA
Mellanox Acceleration Engines
and FPGA Programmability
On One Adapter
© 2016 Mellanox Technologies 13
Mellanox InfiniBand Proven and Most Scalable HPC Interconnect
“Summit” System “Sierra” System
Paving the Road to Exascale
© 2016 Mellanox Technologies 14
NCAR-Wyoming Supercomputing Center (NWSC) – “Cheyenne”
§ Cheyenne supercomputer system
§ 5.34-petaflop SGI ICE XA Cluster
§ Intel “Broadwell” processors
§ More than 4K compute nodes
§ Mellanox EDR InfiniBand interconnect
§ Mellanox Unified Fabric Manager
§ Partial 9D Enhanced Hypercube interconnect topology
§ DDN SFA14KX systems
§ 20 petabytes of usable file system space
§ IBM GPFS (General Parallel File System)
© 2016 Mellanox Technologies 15
High-Performance 100Gb/s Interconnect Solutions
§ Adapters: 100Gb/s adapter, 0.7us latency, 150 million messages per second (10 / 25 / 40 / 50 / 56 / 100Gb/s)
§ InfiniBand switch: 36 EDR (100Gb/s) ports, <90ns latency, 7.2Tb/s throughput, 7.02 billion msg/sec (195M msg/sec per port)
§ Ethernet switch: 32 100GbE ports or 64 25/50GbE ports (10 / 25 / 40 / 50 / 100GbE), 6.4Tb/s throughput
§ Cables and transceivers: active optical and copper cables (10 / 25 / 40 / 50 / 56 / 100Gb/s) – VCSELs, silicon photonics and copper
© 2016 Mellanox Technologies 16
Leading Supplier of End-to-End Interconnect Solutions
Store | Analyze – Enabling the Use of Data
Comprehensive end-to-end InfiniBand and Ethernet (VPI) portfolio: ICs, adapter cards, switches/gateways, cables/modules and software – spanning Metro / WAN and NPU & multicore (NPS, TILE)
© 2016 Mellanox Technologies 17
The Performance Advantage of EDR 100G InfiniBand (28-80%)
[Chart: per-application performance advantage of EDR, ranging from 28% to 80%]
© 2016 Mellanox Technologies 18
End-to-End Interconnect Solutions for All Platforms
Highest Performance and Scalability for
X86, Power, GPU, ARM and FPGA-based Compute and Storage Platforms
10, 20, 25, 40, 50, 56 and 100Gb/s Speeds
X86
Open
POWER
GPU ARM FPGA
Smart Interconnect to Unleash The Power of All Compute Architectures
© 2016 Mellanox Technologies 19
Technology Roadmap – One-Generation Lead over the Competition
[Timeline: 2000 – 2005 – 2010 – 2020, from Terascale through Petascale to Exascale; link speeds 20G, 40G, 56G, 100G (2015), 200G, and Mellanox 400G ahead]
Mellanox Connected milestones: Virginia Tech (Apple), 3rd on the TOP500 in 2003; “Roadrunner”, 1st Petascale system
© 2016 Mellanox Technologies 20
§ Transparent InfiniBand integration into OpenStack
•  Since Havana
§ RDMA directly from the VM – SR-IOV
§ MAC to GUID mapping
§ VLAN to PKey mapping
§ InfiniBand SDN network
§ Ideal fit for High Performance Computing Clouds
OpenStack Over InfiniBand – Extreme Performance in the Cloud
InfiniBand Enables The Highest Performance and Efficiency
© 2016 Mellanox Technologies 21
§ Mellanox End to End
•  Mellanox ConnectX-4 NIC family, Switch-IB/Spectrum switches and 25/100Gb/s cables
§ Bringing astonishing 100Gb/s speeds to the cloud with minimal CPU utilization
•  Both VMs and Hypervisors
•  Accelerations are critical to reach line rate
-  SR-IOV, RDMA, etc.
25, 50 And 100Gb/s Clouds Are Here!
Measured: 92.412 Gb/s throughput at 0.71% CPU utilization
© 2016 Mellanox Technologies 22
The Next Generation HPC Software Framework
To Meet the Needs of Future Systems / Applications
Unified Communication – X Framework (UCX)
© 2016 Mellanox Technologies 23
Exascale Co-Design Collaboration
Collaborative Effort
Industry, National Laboratories and Academia
The Next Generation
HPC Software Framework
© 2016 Mellanox Technologies 24
A Collaboration Effort
§ Mellanox co-designs network interface and contributes MXM technology
•  Infrastructure, transport, shared memory, protocols, integration with OpenMPI/SHMEM, MPICH
§ ORNL co-designs network interface and contributes UCCS project
•  InfiniBand optimizations, Cray devices, shared memory
§ NVIDIA co-designs high-quality support for GPU devices
•  GPUDirect, GDR copy, etc.
§ IBM co-designs network interface and contributes ideas and concepts from PAMI
§ UH/UTK focus on integration with their research platforms
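At the API level these contributions surface through UCX’s UCP layer. The following is a minimal initialization sketch in C, assuming a UCP API along the lines of later UCX releases; field and constant names vary across versions, so treat it as illustrative rather than authoritative:

```c
#include <ucp/api/ucp.h>
#include <stdio.h>

/* Illustrative sketch of bringing up a UCP context and worker.
 * Assumes a UCP API similar to later UCX releases; exact field
 * names may differ between versions. */
int main(void)
{
    ucp_config_t *config;
    ucp_params_t params = {0};
    ucp_worker_params_t wparams = {0};
    ucp_context_h context;
    ucp_worker_h worker;

    /* Read transport/configuration settings (UCX_* environment) */
    if (ucp_config_read(NULL, NULL, &config) != UCS_OK)
        return 1;

    /* Request tag matching, the feature MPI-style messaging uses */
    params.field_mask = UCP_PARAM_FIELD_FEATURES;
    params.features   = UCP_FEATURE_TAG;

    if (ucp_init(&params, config, &context) != UCS_OK)
        return 1;
    ucp_config_release(config);

    /* A worker is the progress engine bound to this thread */
    wparams.field_mask  = UCP_WORKER_PARAM_FIELD_THREAD_MODE;
    wparams.thread_mode = UCS_THREAD_MODE_SINGLE;
    if (ucp_worker_create(context, &wparams, &worker) != UCS_OK) {
        ucp_cleanup(context);
        return 1;
    }

    printf("UCX context and worker initialized\n");

    ucp_worker_destroy(worker);
    ucp_cleanup(context);
    return 0;
}
```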
© 2016 Mellanox Technologies 25
Mellanox HPC-X™ Scalable HPC Software Toolkit
§ Complete MPI and PGAS (OpenSHMEM, UPC) package
§ Maximize application performance
§ For commercial and open source applications
§ Based on UCX (Unified Communication – X Framework)
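As a taste of the PGAS side of the package, here is a minimal OpenSHMEM one-sided put in C, assuming an OpenSHMEM 1.2-style API such as the one HPC-X ships:

```c
#include <shmem.h>
#include <stdio.h>

/* Minimal OpenSHMEM sketch: PE 0 writes a value directly into the
 * symmetric memory of PE 1 (one-sided put over an RDMA-capable fabric). */
int main(void)
{
    shmem_init();
    int me   = shmem_my_pe();
    int npes = shmem_n_pes();

    /* Symmetric allocation: same layout on every PE */
    int *dest = (int *)shmem_malloc(sizeof(int));
    *dest = -1;
    shmem_barrier_all();

    if (me == 0 && npes > 1) {
        int value = 42;
        shmem_int_put(dest, &value, 1, 1);  /* write into PE 1 */
    }
    shmem_barrier_all();

    if (me == 1)
        printf("PE 1 received %d\n", *dest);

    shmem_free(dest);
    shmem_finalize();
    return 0;
}
```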
© 2016 Mellanox Technologies 26
Mellanox Delivers Highest MPI (HPC-X) Performance
Enabling Highest Applications Scalability and Performance
Mellanox ConnectX-4 Collectives Offload
© 2016 Mellanox Technologies 27
Mellanox Delivers Highest Applications Performance (HPC-X)
§ Quantum Espresso application
Quantum Espresso runtime, Intel MPI vs. Bull MPI (HPC-X):
Test Case | # nodes | Intel MPI time (s) | Bull MPI (HPC-X) time (s) | Gain
A         | 43      | 584                | 368                       | 37%
B         | 196     | 2592               | 998                       | 61%
Enabling Highest Applications Scalability and Performance
© 2016 Mellanox Technologies 28
Maximize Performance via Accelerator and GPU Offloads
GPUDirect RDMA Technology
© 2016 Mellanox Technologies 29
GPUs are Everywhere!
[Diagram: GPUDirect RDMA / Sync data path – the adapter reads and writes GPU memory directly, bypassing the CPU, chipset and system memory]
© 2016 Mellanox Technologies 30
§ Eliminates CPU bandwidth and latency bottlenecks
§ Uses remote direct memory access (RDMA) transfers between GPUs
§ Resulting in significantly improved MPI efficiency between GPUs in remote nodes
§ Based on PCIe PeerDirect technology
GPUDirect™ RDMA (GPUDirect 3.0)
[Diagram: data path with GPUDirect™ RDMA, using PeerDirect™]
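In practice GPUDirect RDMA is typically consumed through a CUDA-aware MPI library that accepts GPU device pointers directly. A minimal sketch in C, assuming an MPI build with CUDA support (buffer size and tag are illustrative):

```c
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

#define N 1024

/* Sketch of a GPU-to-GPU transfer through a CUDA-aware MPI.
 * With GPUDirect RDMA, the HCA reads/writes the device buffers
 * directly, with no staging through host memory. */
int main(int argc, char **argv)
{
    int rank;
    double *d_buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    cudaMalloc((void **)&d_buf, N * sizeof(double));

    if (rank == 0) {
        cudaMemset(d_buf, 0, N * sizeof(double));
        /* Device pointer passed straight to MPI */
        MPI_Send(d_buf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(d_buf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1: received %d doubles into GPU memory\n", N);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}
```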
© 2016 Mellanox Technologies 31
Mellanox GPUDirect RDMA Performance Advantage
§ HOOMD-blue is a general-purpose Molecular Dynamics simulation code accelerated on GPUs
§ GPUDirect RDMA allows direct peer to peer GPU communications over InfiniBand
•  Unlocks performance between GPU and InfiniBand
•  This provides a significant decrease in GPU-GPU communication latency
•  Provides complete CPU offload from all GPU communications across the network
102% improvement – 2X application performance!
© 2016 Mellanox Technologies 32
GPUDirect Sync (GPUDirect 4.0)
§ GPUDirect RDMA (3.0) – direct data path between the GPU and Mellanox interconnect
•  Control path still uses the CPU
-  CPU prepares and queues communication tasks on GPU
-  GPU triggers communication on HCA
-  Mellanox HCA directly accesses GPU memory
§ GPUDirect Sync (GPUDirect 4.0)
•  Both data path and control path go directly
between the GPU and the Mellanox interconnect
[Chart: 2D stencil benchmark, average time per iteration (us) vs. number of nodes/GPUs – RDMA+PeerSync is 27% faster at 2 nodes/GPUs and 23% faster at 4 than RDMA only]
Maximum Performance
For GPU Clusters
© 2016 Mellanox Technologies 33
Remote GPU Access through rCUDA
GPU servers become GPU as a Service
[Diagram: client side – Application → rCUDA library → network interface; server side – network interface → rCUDA daemon → CUDA driver + runtime → physical GPUs; each client CPU sees a virtual GPU]
rCUDA provides remote access from every node to any GPU in the system
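Because rCUDA interposes on the standard CUDA runtime, an unmodified program such as the C sketch below can run against remote GPUs; nothing in the code is rCUDA-specific, which is the point:

```c
#include <cuda_runtime.h>
#include <stdio.h>

/* Plain CUDA runtime calls; under rCUDA each of these is forwarded
 * transparently over the network to a GPU on a remote server. */
int main(void)
{
    int ngpus = 0;
    char host[16] = "hello, GPU";
    char back[16] = {0};
    char *d_buf;

    /* Under rCUDA this reports the remote/virtual GPUs assigned to us */
    cudaGetDeviceCount(&ngpus);
    printf("visible GPUs: %d\n", ngpus);

    cudaMalloc((void **)&d_buf, sizeof(host));
    cudaMemcpy(d_buf, host, sizeof(host), cudaMemcpyHostToDevice);
    cudaMemcpy(back, d_buf, sizeof(back), cudaMemcpyDeviceToHost);
    printf("round trip through GPU memory: %s\n", back);

    cudaFree(d_buf);
    return 0;
}
```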
© 2016 Mellanox Technologies 34
Interconnect Architecture Comparison
Offload versus Onload (Non-Offload)
© 2016 Mellanox Technologies 35
Offload versus Onload (Non-Offload)
§ Two interconnect architectures exist – Offload-based and Onload-based
§ Offload architecture
•  The interconnect manages and executes all network operations
•  The interconnect can include application acceleration engines
•  Offloads the CPU, freeing CPU cycles for the applications (see the sketch after this list)
•  Development requires a large R&D investment
•  Higher data center ROI
§ Onload architecture
•  A CPU-centric approach – everything must be executed on and by the CPU
•  The CPU is responsible for all network functions; the interconnect only pushes the data onto the wire
•  Cannot support acceleration engines, no support for RDMA, and network transport is done by the CPU
•  Loads the CPU and reduces the CPU cycles available for the applications
•  Does not require R&D investments or interconnect expertise
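The practical payoff of offload is communication/computation overlap: once the adapter owns the transport, a nonblocking MPI exchange progresses in hardware while the CPU computes. A minimal sketch in C (buffer size and the compute step are illustrative; run with an even number of ranks):

```c
#include <mpi.h>
#include <stddef.h>

#define N (1 << 20)

/* Stand-in for application computation that the CPU can perform
 * while the offloaded transfer progresses in the adapter. */
static void do_local_work(double *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = a[i] * 0.5 + 1.0;
}

int main(int argc, char **argv)
{
    static double sendbuf[N], recvbuf[N], work[N];
    int rank, peer;
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = rank ^ 1;  /* pair ranks 0-1, 2-3, ...; needs an even rank count */

    /* Post a nonblocking exchange; an offload NIC carries it from here */
    MPI_Irecv(recvbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[1]);

    /* CPU cycles spent on the application, not on the transport */
    do_local_work(work, N);

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    MPI_Finalize();
    return 0;
}
```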
© 2016 Mellanox Technologies 36
Sandia National Laboratory Paper – Offloading versus Onloading
© 2016 Mellanox Technologies 37
Interconnect Throughput – Offload versus Onload
Network performance depends dramatically on CPU frequency with an onload architecture – the offloading advantage!
Data throughput: 20% higher at the common Xeon frequency (2.6GHz); 250% higher at the common Xeon Phi frequency (~1GHz)
© 2016 Mellanox Technologies 38
Only Offload Architecture Can Enable Co-Processors
Offloading (Highest Performance for all Frequencies)
Onloading (performance loss with lower CPU frequency)
Common Xeon Frequency
Common Xeon Phi Frequency
Onloading Technology Not Suitable for Co-Processors!
© 2016 Mellanox Technologies 39
Mellanox InfiniBand Leadership Over Omni-Path
§ Switch latency: 20% lower
§ Message rate: 44% higher
§ Power consumption per switch port: 25% lower
§ Scalability / CPU efficiency: 2X higher
§ Link speed: 100Gb/s since 2014, 200Gb/s in 2017
Gain Competitive Advantage Today – Protect Your Future
Smart Network For Smart Systems
RDMA, Acceleration Engines, Programmability
Higher Performance
Unlimited Scalability
Higher Resiliency
Proven!
Thank You
