IBM Power System AC922: The Brain Behind the Supercomputer
Pidad D’Souza (pidsouza@in.ibm.com)
Aditya Nitsure (anitsure@in.ibm.com)
Power System Performance, ISDL, IBM, Bengaluru
Agenda
● AC922 System Components
● AC922 Characteristics
● System Features
IBM Systems at Supercomputing 2019 / © 2019 IBM Corporation
The most powerful supercomputer on the planet
4 of the top 11 TOP500 systems are IBM POWER9 systems
▪ 4,608 IBM AC922 nodes
▪ 200 petaFLOPS
▪ 27,648 NVIDIA Tesla GPUs
▪ 25 gigabytes per second between nodes
▪ 13 MW power consumption
Exascale energy budget: 20-40 megawatts (MW)
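A quick arithmetic check on these figures, taking 200 petaFLOPS and 13 MW at face value (200 petaFLOPS is a peak rating, so the efficiency below is nominal rather than measured):

$$
\frac{27{,}648\ \text{GPUs}}{4{,}608\ \text{nodes}} = 6\ \text{GPUs per node}, \qquad
\frac{200 \times 10^{15}\ \text{FLOP/s}}{13 \times 10^{6}\ \text{W}} \approx 15.4\ \text{GFLOPS/W}
$$

$$
\frac{10^{18}\ \text{FLOP/s}}{(20\ \text{to}\ 40) \times 10^{6}\ \text{W}} = 50\ \text{to}\ 25\ \text{GFLOPS/W}
$$

By this rough measure, an exaflop system inside the 20-40 MW budget needs about 1.6x to 3.3x Summit's nominal efficiency.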
Innovations in Hardware and Software
▪ Processor/Accelerators
▪ Memory
▪ Interconnect
▪ Spectrum MPI, Math Libraries
4 of the top 10 Green500 systems are IBM POWER9 systems
AiMOS: No. 3 on the Green500 with 15.72 GFLOPS/W
Heterogeneous Systems
Application code is split into two parts:
– Compute-intensive code, accelerated on the GPU
– The rest of the sequential code, run on the CPU
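The slide splits application code into compute-intensive code, accelerated on the GPU, and the rest of the sequential code, which stays on the CPU. Amdahl's law (a standard model, not stated on the slide) shows why that CPU part bounds the overall gain. A minimal C sketch; `p` (fraction of the original runtime that is offloaded) and `s` (speedup of that fraction on the GPU) are illustrative inputs, and the function names are hypothetical:

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a fraction p of the original
 * runtime (the compute-intensive code) is accelerated by a factor s,
 * while the remaining (1 - p) keeps running sequentially on the CPU. */
double amdahl_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}

/* Upper bound as s grows without limit: only the CPU part remains. */
double amdahl_limit(double p)
{
    assert(p >= 0.0 && p < 1.0);
    return 1.0 / (1.0 - p);
}
```

For instance, offloading 90% of the runtime (p = 0.9) to a GPU that runs it 10x faster gives `amdahl_speedup(0.9, 10.0)` ≈ 5.26, and no GPU can push this past `amdahl_limit(0.9)` = 10. This is one reason a fast CPU and a fast CPU-GPU link still matter in a heterogeneous node.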
GPU Acceleration
CPU
– Large and broad instruction set to perform complex operations
GPU
– High throughput: massive parallelism through a large number of cores
– Specialized for SIMD/SIMT execution
Heterogeneous Computing
– Maximize performance and energy efficiency
– NVLink 2.0: high-bandwidth interconnect
o 150 GB/s bidirectional bandwidth (100 GB/s in the 6-GPU configuration) between CPU and GPU and between GPUs
– Coherent access to CPU memory
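The two bandwidth figures follow from the per-link rate. This assumes (from NVIDIA and IBM public specifications, not from this slide) 25 GB/s per direction per NVLink 2.0 link and six NVLink links per POWER9 socket, shared among 2 GPUs per socket in the 4-GPU configuration and 3 GPUs per socket in the 6-GPU configuration:

$$
\begin{aligned}
B_{\text{link}} &= 2 \times 25\ \text{GB/s} = 50\ \text{GB/s (bidirectional)} \\
B_{\text{4-GPU}} &= \tfrac{6}{2} \times B_{\text{link}} = 3 \times 50\ \text{GB/s} = 150\ \text{GB/s} \\
B_{\text{6-GPU}} &= \tfrac{6}{3} \times B_{\text{link}} = 2 \times 50\ \text{GB/s} = 100\ \text{GB/s}
\end{aligned}
$$

The corresponding one-direction peaks are half of these: 75 GB/s and 50 GB/s.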
Summit and Sierra Supercomputer Configurations
Diagram: one half node (one POWER9 socket) of each system.
– Sierra (4-GPU half node): POWER9 with DDR4 (170 GB/s) and NVIDIA V100 GPUs; NVLink at 150 GB/s between CPU and GPU and between GPUs
– Summit (6-GPU half node): POWER9 with DDR4 (170 GB/s) and NVIDIA V100 GPUs; NVLink at 100 GB/s between CPU and GPU and between GPUs
– Both: PCIe 4.0, CAPI 2.0, InfiniBand (IB), and coherent access to system memory
• CPU and GPU cooperate in executing work
• GPUs coherently access CPU memory
IBM POWER9 AC922 Server
– Delivers unprecedented performance for modern HPC, analytics, and artificial intelligence (AI)
– Designed to fully exploit the capabilities of CPUs and GPU accelerators
– Eliminates I/O bottlenecks and allows memory sharing across GPUs and CPUs
– POWER9 CPUs
– 2 to 6 NVIDIA® Tesla® V100 GPUs with NVLink
– Co-optimized hardware and software for deep learning and AI
– Supports up to 5.6x more I/O bandwidth than competitive servers
– Combines the cutting-edge AI innovation data scientists desire with the dependability IT requires
– Next-generation PCIe: PCIe Gen4, 2x faster than PCIe Gen3
NVIDIA Tesla V100 GPU
– Designed for AI computing and HPC
– Second-generation NVLink™
– HBM2 memory: faster, higher efficiency
– Enhanced Unified Memory and Address Translation Services
– Maximum Performance and Maximum Efficiency modes
– SMs / CUDA cores: 80 / 5,120
– Double-precision performance: 7.5 TFLOPS
– Single-precision performance: 15 TFLOPS
– Tensor performance: 125 TFLOPS
– GPU memory: 16 or 32 GB
– Memory bandwidth: 900 GB/s
https://devblogs.nvidia.com/inside-volta
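Peak throughput is the number of arithmetic units times 2 FLOPs per fused multiply-add times the clock. Assuming each of the 80 SMs has 64 FP32 and 32 FP64 units and a boost clock of about 1.455 GHz (values from NVIDIA's Volta launch specification, not from this slide), the quoted single- and double-precision figures follow:

$$
\text{FP32: } 5120 \times 2 \times 1.455\ \text{GHz} \approx 14.9\ \text{TFLOPS} \approx 15, \qquad
\text{FP64: } 2560 \times 2 \times 1.455\ \text{GHz} \approx 7.45\ \text{TFLOPS} \approx 7.5
$$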
AC922 SYSTEM CHARACTERISTICS
CPU STREAM Bandwidth
Boost application performance with a sustained memory bandwidth of ~280 GB/s.
Measured with the STREAM benchmark (https://www.cs.virginia.edu/stream/); *result not submitted to the official STREAM list.
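STREAM reports sustainable memory bandwidth from simple vector kernels. The sketch below shows a Triad-style kernel and the byte-counting convention STREAM uses for it (two arrays read and one written per element); it is a simplified illustration, not the official benchmark code, and the function names are hypothetical. Meaningful runs need arrays much larger than the caches and an OpenMP build so that all cores participate.

```c
#include <assert.h>
#include <stddef.h>

/* STREAM-style Triad kernel: a[i] = b[i] + scalar * c[i].
 * Build with OpenMP (e.g. -fopenmp) so the loop is spread across all
 * cores; without OpenMP the pragma is ignored and the loop runs serially. */
void stream_triad(double *a, const double *b, const double *c,
                  double scalar, size_t n)
{
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + scalar * c[i];
}

/* STREAM's byte-counting convention for Triad: read b and c, write a,
 * i.e. 3 * sizeof(double) bytes per element. */
size_t stream_triad_bytes(size_t n)
{
    return 3 * sizeof(double) * n;
}

/* Bandwidth in GB/s (1 GB = 1e9 bytes) for one timed Triad pass. */
double stream_bandwidth_gbs(size_t n, double seconds)
{
    assert(seconds > 0.0);
    return (double)stream_triad_bytes(n) / seconds / 1e9;
}
```

At the ~280 GB/s quoted above, a Triad pass over n = 10^9 elements moves 24 GB and would take roughly 24 / 280 ≈ 0.086 s.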
NVIDIA Tesla V100 Compute: Single and Double Precision
– More compute power for applications
– Shorter time to completion
– More simulations/experiments in the same time
– 1.5x higher compute than NVIDIA P100 GPUs
Chart: NVIDIA V100 SGEMM and DGEMM (compute, TFLOPS)
– S822LC + P100: DGEMM 4.8, SGEMM 9.8
– AC922 + V100: DGEMM 7.45, SGEMM 15.3 (1.5x higher)
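GEMM throughput is conventionally reported as 2mnk floating-point operations (one multiply and one add per inner-product term, for an m×k by k×n product) divided by the run time t; the generation-over-generation gain then follows directly from the chart values:

$$
\text{TFLOPS} = \frac{2mnk}{t \times 10^{12}}, \qquad
\frac{7.45}{4.8} \approx 1.55\ (\text{DGEMM}), \qquad
\frac{15.3}{9.8} \approx 1.56\ (\text{SGEMM})
$$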
NVIDIA V100 GPU Memory Bandwidth (GPU STREAM)
Chart: GPU STREAM bandwidth (GB/s)
– S822LC + P100: 512 GB/s
– AC922 + V100: 840 GB/s (1.6x higher)
– 1.6x higher bandwidth than NVIDIA P100
– Speeds up memory-intensive applications
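Relative to the P100 result and to the 900 GB/s theoretical memory bandwidth listed for the V100, the measured value works out to:

$$
\frac{840\ \text{GB/s}}{512\ \text{GB/s}} \approx 1.64, \qquad
\frac{840\ \text{GB/s}}{900\ \text{GB/s}} \approx 93\%\ \text{of theoretical}
$$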
CPU to GPU NVLink vs PCIe3 Bandwidth
Chart: CPU-to-GPU bandwidth (GB/s)
– Xeon E5-2640 v4 + P100 (PCIe3): 12
– S822LC + P100 (NVLink 1.0): 34.16 (2.8x better than PCIe3)
– AC922 + 6 V100 (NVLink 2.0): 45.9 (3.8x better than PCIe3; 1.34x better than S822LC)
– AC922 + 4 V100 (NVLink 2.0): 68 (5.6x better than PCIe3; 2x better than S822LC)
– NVLink 2.0 is 5.6x better than PCIe3
– Removes CPU-GPU data-transfer bottlenecks
Note: Measured with the NVIDIA bandwidth test.
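The measured host-to-GPU values can be compared with nominal one-direction link peaks. This assumes 25 GB/s per direction per NVLink 2.0 link (3 links per GPU in the 4-GPU configuration, 2 in the 6-GPU one) and a PCIe 3.0 x16 slot (8 GT/s per lane, 128b/130b encoding) for the Xeon system; the link widths are assumptions, not stated on the slide:

$$
\begin{aligned}
\text{PCIe 3.0 x16:}\quad & 16 \times 8\ \text{Gb/s} \times \tfrac{128}{130} \div 8 \approx 15.75\ \text{GB/s}, & 12 / 15.75 &\approx 76\% \\
\text{NVLink 2.0, 3 links:}\quad & 3 \times 25\ \text{GB/s} = 75\ \text{GB/s}, & 68 / 75 &\approx 91\% \\
\text{NVLink 2.0, 2 links:}\quad & 2 \times 25\ \text{GB/s} = 50\ \text{GB/s}, & 45.9 / 50 &\approx 92\%
\end{aligned}
$$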
NVLink Bandwidth with Varied Data Sizes
– Minimizes communication latencies
– Removes PCIe bottlenecks
– Transfers larger data at high speed
– Ideal for data sizes larger than GPU memory
Chart: NVLink 2.0 vs PCIe3 host-to-device bandwidth (MB/s, 0 to 80,000) for data sizes from 1 KB to 1,048,576 KB (1 GB); series: 2 NVLinks per GPU, 3 NVLinks per GPU, PCIe3.
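The shape of such a curve (low bandwidth for small transfers, rising to a plateau for large ones) is commonly explained with a latency-plus-bandwidth model, t(n) = t0 + n / B, where t0 is a fixed per-transfer overhead and B the peak rate. The C sketch below implements that model; the model and function names are illustrative assumptions, not how the slide's measurements were produced.

```c
#include <assert.h>

/* Latency-bandwidth model of a single transfer:
 *   t(n) = latency_s + bytes / peak_bytes_per_s
 * Effective bandwidth is bytes / t(n): small transfers are dominated by
 * the fixed latency, large transfers approach the peak rate. */
double effective_bandwidth(double bytes, double latency_s,
                           double peak_bytes_per_s)
{
    assert(bytes > 0.0 && latency_s >= 0.0 && peak_bytes_per_s > 0.0);
    return bytes / (latency_s + bytes / peak_bytes_per_s);
}

/* Transfer size that reaches half of the peak ("n-half"):
 * bytes / (t0 + bytes / B) = B / 2  =>  bytes = t0 * B. */
double half_peak_size(double latency_s, double peak_bytes_per_s)
{
    assert(latency_s >= 0.0 && peak_bytes_per_s > 0.0);
    return latency_s * peak_bytes_per_s;
}
```

For example, with an assumed overhead t0 = 10 µs and a peak of B = 68 GB/s, this model reaches half of the peak only at about t0 × B = 680 KB, while a 1 KB copy achieves roughly 0.1 GB/s; under the model, large transfers are needed before the link's full bandwidth becomes visible.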
Workload Optimized Frequency (WOF)
– Boosts performance of lightly loaded workloads through higher frequency
– Lowers the frequency to save power or to boost other cores
– Maximizes performance by dynamically adjusting processor frequency
– Governing factors
• Processor utilization, number of active cores, and environmental conditions
– Power management modes
• Dynamic Performance Mode (DPM)
• Maximum Performance Mode (MPM)
HPC Interconnect
– Multi-host adapter (Mellanox ConnectX-5 EDR)
• Latency: sub-600 nanoseconds
• Bandwidth: 2 ports of 100 Gb/s
• Message rate: 200M messages/second
– Adapter features
• Switch-based collectives (SHARP)
• Hardware tag matching
• User-mode memory registration (UMR)
• GPUDirect RDMA
• Tunneled atomics
Diagram: two POWER9 (P9) sockets connected by the X-Bus, each attached through a PCIe x8 link to the shared InfiniBand EDR adapter.
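As a consistency check (nominal data rates, ignoring protocol overhead), the two 100 Gb/s EDR ports account for the 25 gigabytes per second between nodes quoted for Summit:

$$
2 \times 100\ \text{Gb/s} = 200\ \text{Gb/s} = \frac{200}{8}\ \text{GB/s} = 25\ \text{GB/s}
$$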
Source: The Design, Deployment, and Evaluation of the CORAL Pre-Exascale Systems, SC18.
Bisection Bandwidth & Allreduce Scaling on Summit
– Good scaling at large node counts, because ~74% of the bisection bandwidth is achieved with adaptive routing enabled
– IBM Spectrum MPI (SMPI) supports HCOLL (FCA) and SHARP, enabling applications to run with the best collective performance
Burst Buffer Performance on Summit
Source: The Design, Deployment, and Evaluation of the CORAL Pre-Exascale Systems, SC18.
– Improved application performance through the burst buffer
– Applications are not bottlenecked on I/O operations to parallel file systems
HPC APPLICATION PERFORMANCE
Parameters impacting HPC application performance: compute, memory capacity, memory bandwidth, interconnect, and I/O.
HPC APPLICATION
ACCELERATION
METHODOLOGIES
o Unified Memory
o Coherency
o ATS
o OpenMP
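CUDA Programming
char *h_data, *d_data;
cudaMallocHost(&h_data, size); // Allocate pinned memory on the host
cudaMalloc(&d_data, size); // Allocate memory on the GPU
init_dataCPU(h_data);
cudaMemcpy(d_data, h_data, size, cudaMemcpyHostToDevice); // Move data to GPU (dst, src, bytes, kind)
gpu_kernel<<<…>>>(d_data); // GPU compute
cudaMemcpy(h_data, d_data, size, cudaMemcpyDeviceToHost); // Move results back to CPU (waits for the kernel)
cpu_processing(h_data);
cudaFree(d_data); // Release GPU memory
cudaFreeHost(h_data); // Release pinned host memory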
Unified Memory Programming
– Single memory address space accessible to both CPU and GPU
– Enables oversubscribing GPU memory
• Computation on data sets larger than GPU memory
– System-wide atomic memory operations
– Transparent memory migration between CPU and GPU, depending on which processor accesses the data
• Explicit migration through cudaMemPrefetchAsync()
– Allocating unified memory
• Replace “malloc” & “new” with “cudaMallocManaged”
Diagram: GPU and CPU share a single Unified Memory space.
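Unified Memory Advice
– ReadMostly
• Data is mostly read and only occasionally written
• Pages are duplicated; writes are possible but expensive
– PreferredLocation
• Specifies the preferred location for the data
• “Resists” migration away from the preferred location
– AccessedBy
• Establishes mappings so the processor accesses the data directly instead of migrating it
char *data;
cudaMallocManaged(&data, size); // One allocation visible to CPU and GPU
init_dataCPU(data, size); // Pages are first touched on the CPU
cudaMemPrefetchAsync(data, size, gpuID); // Explicitly migrate the pages to the GPU
cudaMemAdvise(data, size, …ReadMostly, gpuID); // Hint: data is mostly read
gpuKernel<<<…>>>(data, size); // Pages not yet on the GPU migrate transparently on access
cudaDeviceSynchronize(); // Wait for the kernel before the CPU touches the data
use_dataCPU(data, size); // Data migrates back to the CPU on access
cudaFree(data);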
Hardware Coherency & ATS
– Hardware coherency
• CPU can directly access and cache GPU memory
• Native atomics support
– Address Translation Services (ATS)
• Allows the GPU to access the CPU’s page tables directly
• System allocator support: malloc, stack, global, file system
Simplified programming with ATS: a kernel can use memory from any system allocator (three independent examples):
float *data = (float *)malloc(size); // Heap memory from malloc
gpu_kernel<<<…>>>(data);
float data[1024]; // Array on the CPU stack
gpu_kernel<<<…>>>(data);
extern float *data; // Global variable
gpu_kernel<<<…>>>(data);
CUDA-Aware MPI
– Avoids staging GPU buffers in host memory
– Runs applications more efficiently
– IBM Spectrum MPI is CUDA-aware
Code without CUDA-aware MPI (using GPU buffers)
//MPI Rank 0
cudaMemcpy(…, cudaMemcpyDeviceToHost); // Stage the GPU buffer in host memory
MPI_Send(…); // Send the host buffer
//MPI Rank 1
MPI_Recv(…); // Receive into a host buffer
cudaMemcpy(…, cudaMemcpyHostToDevice); // Copy the data to the GPU
Code with CUDA-aware MPI (using GPU buffers)
//MPI Rank 0
MPI_Send(d_buf, …); // d_buf (illustrative name) is a GPU pointer passed directly to MPI
//MPI Rank 1
MPI_Recv(d_buf, …); // Data lands directly in GPU memory
https://devblogs.nvidia.com/introduction-cuda-aware-mpi/
GPUDirect RDMA
– Direct data exchange between the GPU and other peer devices using standard PCIe features
– Network adapters access GPU memory directly, bypassing host memory
Monitoring and Profiling Tools
Monitoring
➢ mpstat, vmstat: CPU and memory utilization
➢ numastat: NUMA memory statistics
➢ top/htop: real-time view of system usage
Profiling
➢ perf record / perf report: CPU profiling
➢ nvprof: GPU profiling
Screenshots: CPU memory (numastat) and GPU memory (nvidia-smi)
Also see “nvidia-smi --query-gpu” for more monitoring options.
nvprof
• nvprof is a command-line profiling tool that collects and displays profiling data
• With nvprof, one can collect:
  • kernel execution time
  • memory transfers
  • memsets and CUDA API calls
  • events or metrics for CUDA kernels
NVVP (NVIDIA Visual Profiler)
• The Visual Profiler displays a timeline of the application's activity on both the CPU and GPU, helping identify opportunities for performance improvement.
• Visualizes profile data collected with nvprof
• More documentation: https://docs.nvidia.com/cuda/profiler-users-guide/index.html
Screenshot: NVIDIA Visual Profiler timeline with the data-transfer and compute phases highlighted.
Conclusion
➢ AC922 is designed for supercomputers
➢ Better performance for HPC applications
➢ High-speed NVLink interconnect between CPU and GPU
➢ Simplified programming using Unified Memory, ATS, and OpenMP
Notices and disclaimers
© 2018 International Business Machines Corporation. No part of this
document may be reproduced or transmitted in any form without
written permission from IBM.
U.S. Government Users Restricted Rights: use, duplication or
disclosure restricted by GSA ADP Schedule Contract with IBM.
Information in these presentations (including information relating to
products that have not yet been announced by IBM) has been reviewed
for accuracy as of the date of initial publication and could include
unintentional technical or typographical errors. IBM shall have no
responsibility to update this information. This document is distributed
“as is” without any warranty, either express or implied. In no event
shall IBM be liable for any damage arising from the use of this
information, including but not limited to, loss of data, business
interruption, loss of profit or loss of opportunity. IBM products and
services are warranted per the terms and conditions of the agreements
under which they are provided.
IBM products are manufactured from new parts or new and used parts.
In some cases, a product may not be new and may have been previously
installed. Regardless, our warranty terms apply.
Any statements regarding IBM's future direction, intent or product
plans are subject to change or withdrawal without notice.
Performance data contained herein was generally obtained in
controlled, isolated environments. Customer examples are presented as
illustrations of how those customers have used IBM products and the
results they may have achieved. Actual performance, cost, savings or
other results in other operating environments may vary.
References in this document to IBM products, programs, or services
do not imply that IBM intends to make such products, programs or
services available in all countries in which IBM operates or does
business.
Workshops, sessions and associated materials may have been prepared
by independent session speakers, and do not necessarily reflect the
views of IBM. All materials and discussions are provided for
informational purposes only, and are neither intended to, nor shall
constitute legal or other guidance or advice to any individual participant
or their specific situation.
It is the customer’s responsibility to insure its own compliance
with legal requirements and to obtain advice of competent legal counsel
as to the identification and interpretation of any relevant laws and
regulatory requirements that may affect the customer’s business and
any actions the customer may need to take to comply with such
laws. IBM does not provide legal advice or represent or warrant that its
services or products will ensure that the customer follows any law.
Notices and disclaimers
continued
Information concerning non-IBM products was obtained from the
suppliers of those products, their published announcements or other
publicly available sources. IBM has not tested those products in connection with this
publication and cannot confirm the accuracy of performance,
compatibility or any other claims related to non-IBM products.
Questions on the capabilities of non-IBM products should be addressed
to the suppliers of those products. IBM does not warrant the quality of
any third-party products, or the ability of any such third-party products
to interoperate with IBM’s products. IBM expressly disclaims all
warranties, expressed or implied, including but not limited to, the
implied warranties of merchantability and fitness for a purpose.
The provision of the information contained herein is not intended to, and
does not, grant any right or license under any IBM patents, copyrights,
trademarks or other intellectual property right.
IBM, the IBM logo, ibm.com and [names of other referenced IBM
products and services used in the presentation] are trademarks of
International Business Machines Corporation, registered in many
jurisdictions worldwide. Other product and service names might
be trademarks of IBM or other companies. A current list of IBM
trademarks is available on the Web at “Copyright and trademark
information” at: www.ibm.com/legal/copytrade.shtml.
Thank you
Pidad D’Souza
Power System Performance Architect
—
pidsouza@in.ibm.com
+91-80-4177 6526
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 

Kürzlich hochgeladen (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

IBM Power System AC922: The Brain Behind Blazing Fast Supercomputers

  • 1. IBM Power System AC922: The Brain Behind the Supercomputer. Pidad D’Souza (pidsouza@in.ibm.com), Aditya Nitsure (anitsure@in.ibm.com), Power System Performance, ISDL, IBM, Bengaluru
  • 2. Agenda ● AC922 System Components ● AC922 Characteristics ● System Features
  • 3. IBM Systems at Supercomputing 2019 / © 2019 IBM Corporation. The most powerful supercomputer on the planet (Summit): ▪ 4,608 IBM AC922 nodes ▪ 200 petaFLOPS ▪ 27,648 NVIDIA Tesla GPUs ▪ 25 gigabytes per second between nodes ▪ 13 MW energy. 4 of the top 11 Top500 systems are IBM POWER9 systems. Exascale energy budget: 20-40 megawatts (MW). Innovations in hardware and software: ▪ Processors/accelerators ▪ Memory ▪ Interconnect ▪ Spectrum MPI, math libraries. 4 of the top 10 Green500 systems are IBM POWER9 systems; AiMOS is Green500 No. 3 with 15.72 GFLOPS/watt.
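The headline Summit figures on this slide can be cross-checked with simple arithmetic. The C++ sketch below uses hypothetical helper names (not from any IBM or NVIDIA tool) to derive GPUs per node, a rough peak-FLOPS-per-watt ratio from the slide's 200 petaFLOPS and 13 MW, the efficiency an exaflop machine would need under the 20-40 MW budget, and the 25 GB/s node bandwidth implied by the two 100 Gb/s EDR ports listed on the interconnect slide.

```cpp
#include <cassert>

// Hypothetical helpers for sanity-checking the slide's headline numbers.

// GPUs per node; asserts that the total divides evenly across nodes.
int gpus_per_node(int total_gpus, int nodes) {
    assert(nodes > 0 && total_gpus % nodes == 0);
    return total_gpus / nodes;
}

// Energy efficiency in GFLOPS per watt: (FLOP/s) / W / 1e9.
double gflops_per_watt(double flops_per_second, double watts) {
    assert(watts > 0.0);
    return flops_per_second / watts / 1e9;
}

// Efficiency an exascale (1e18 FLOP/s) system needs to stay within a power budget.
double exascale_required_gflops_per_watt(double budget_watts) {
    return gflops_per_watt(1e18, budget_watts);
}

// Gigabits per second to gigabytes per second (8 bits per byte).
double gbit_to_gbyte(double gbit_per_second) { return gbit_per_second / 8.0; }
```

With these helpers, gpus_per_node(27648, 4608) gives 6, gflops_per_watt(200e15, 13e6) is about 15.4, the exascale requirement ranges from 25 GFLOPS/W (at 40 MW) to 50 GFLOPS/W (at 20 MW), and gbit_to_gbyte(2 * 100.0) gives 25. The efficiency ratio mixes peak FLOPS with the slide's power figure, so it is not comparable to a measured Green500 result.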
  • 5. GPU acceleration: application code is split so that compute-intensive code runs on the GPU and the rest of the sequential code runs on the CPU. CPU – Large and broad instruction set to perform complex operations. GPU – High throughput – Massive parallelization through a large number of cores – Specialized for SIMD/SIMT. Heterogeneous computing: maximize performance and energy efficiency.
  • 6. – NVLink 2.0: high-bandwidth interconnect with 150 GB/s bidirectional bandwidth (100 GB/s in the 6-GPU configuration) between CPU and GPU and between GPUs – Coherent access to CPU memory. Summit and Sierra supercomputer configurations: Sierra (4 GPU Half Node): a POWER9 CPU with DDR4 memory at 170 GB/s and NVIDIA V100 GPUs linked to the CPU and to each other by NVLink at 150 GB/s, with PCIe 4.0, CAPI 2.0 and InfiniBand (IB) attached. Summit (6 GPU Half Node): the same layout, with NVLink at 100 GB/s. In both, the CPU and GPU cooperate in executing work, and the GPU has coherent access to system memory.
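The 150 GB/s and 100 GB/s figures follow from splitting a fixed number of NVLink links among the GPUs attached to each POWER9 socket. A minimal sketch, assuming values taken from NVIDIA and IBM specifications rather than this deck: 50 GB/s bidirectional (25 GB/s each way) per NVLink 2.0 link, 3 CPU-GPU links per GPU in the 4-GPU Sierra node and 2 in the 6-GPU Summit node, and 8 DDR4-2666 channels of 8 bytes per POWER9 socket for the 170 GB/s memory figure. Function names are illustrative.

```cpp
#include <cassert>

// Assumption: one NVLink 2.0 link carries 25 GB/s per direction, 50 GB/s bidirectional.
constexpr double kNvlink2BidirGBs = 50.0;

// Bidirectional CPU-GPU bandwidth for a GPU attached with `links` NVLink 2.0 links.
// Assumed link counts: 3 per GPU (4-GPU Sierra node), 2 per GPU (6-GPU Summit node).
double nvlink_bidir_gbs(int links) {
    assert(links > 0);
    return links * kNvlink2BidirGBs;
}

// Peak DRAM bandwidth in GB/s: channels * transfers per second * bytes per transfer.
double ddr_peak_gbs(int channels, double mega_transfers_per_s, int bytes_per_transfer) {
    assert(channels > 0 && bytes_per_transfer > 0);
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9;
}
```

Under these assumptions nvlink_bidir_gbs(3) and nvlink_bidir_gbs(2) return 150 and 100, and ddr_peak_gbs(8, 2666.0, 8) comes out at about 170.6 GB/s per socket.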
  • 7. IBM POWER9 AC922 server: – Delivers unprecedented performance for modern HPC, analytics, and artificial intelligence (AI) – Designed to fully exploit the capabilities of CPU and GPU accelerators – Eliminates I/O bottlenecks and allows sharing memory across GPUs and CPUs – Extraordinary POWER9 CPUs – 2-6 NVIDIA® Tesla® V100 GPUs with NVLink – Co-optimized hardware and software for deep learning and AI – Supports up to 5.6x more I/O bandwidth than competitive servers – Combines the cutting-edge AI innovation data scientists desire with the dependability IT requires – Next-generation PCIe: PCIe Gen4, 2x faster than PCIe Gen3
  • 8. NVIDIA Tesla V100 GPU (https://devblogs.nvidia.com/inside-volta): – Designed for AI computing and HPC – Second-generation NVLink™ – HBM2 memory: faster, higher efficiency – Enhanced Unified Memory and Address Translation Services – Maximum Performance and Maximum Efficiency modes – SMs/CUDA cores: 80/5120 – Double-precision performance: 7.5 TFLOPS – Single-precision performance: 15 TFLOPS – Tensor performance: 125 TFLOPS – GPU memory: 16 or 32 GB – Memory bandwidth: 900 GB/s
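Each peak figure on this slide is the product of execution units, FLOPs per unit per clock (2 for a fused multiply-add) and clock rate. A sketch, assuming Volta's per-SM layout (64 FP32 cores, 32 FP64 cores and 8 Tensor Cores per SM, each Tensor Core doing 64 FMAs per clock) and boost clocks that do not appear on the slide: about 1.455 GHz reproduces the 15 and 7.5 TFLOPS figures, while 125 Tensor TFLOPS corresponds to about 1.53 GHz.

```cpp
#include <cassert>

// Peak throughput in TFLOPS: units * FLOPs per unit per clock * clock (GHz) / 1000.
// A fused multiply-add (FMA) counts as 2 FLOPs.
double peak_tflops(int units, int flops_per_unit_per_clock, double clock_ghz) {
    assert(units > 0 && clock_ghz > 0.0);
    return units * static_cast<double>(flops_per_unit_per_clock) * clock_ghz / 1000.0;
}

// Assumed V100 layout: 80 SMs, each with 64 FP32 cores, 32 FP64 cores, 8 Tensor Cores.
constexpr int kSMs = 80;
constexpr int kFp32Cores = kSMs * 64;   // 5120, matching the slide
constexpr int kFp64Cores = kSMs * 32;   // 2560
constexpr int kTensorCores = kSMs * 8;  // 640; each does 64 FMAs = 128 FLOPs per clock
```

For example, peak_tflops(kFp32Cores, 2, 1.455) is about 14.9 and peak_tflops(kTensorCores, 128, 1.53) about 125.3. Achieved GEMM throughput, as on the later SGEMM/DGEMM chart, stays somewhat below these peaks.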
  • 10. CPU STREAM bandwidth: boost application performance with a sustained peak memory bandwidth of ~280 GB/s*, measured with the STREAM benchmark (https://www.cs.virginia.edu/stream/). *Not submitted.
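STREAM reports sustained bandwidth as bytes moved divided by the best time over several passes of simple vector kernels. The triad kernel and its byte accounting can be sketched as follows; this is a simplified illustration, not the official STREAM source, and the function names are hypothetical. Reaching numbers like ~280 GB/s assumes an OpenMP build using all cores on both sockets and arrays several times larger than the last-level cache.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// STREAM-style triad: a[i] = b[i] + scalar * c[i].
// The pragma is ignored unless compiled with OpenMP (e.g. -fopenmp); a single
// thread cannot come close to saturating all memory channels.
void triad(std::vector<double>& a, const std::vector<double>& b,
           const std::vector<double>& c, double scalar) {
    assert(a.size() == b.size() && b.size() == c.size());
    const long n = static_cast<long>(a.size());
#pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        a[i] = b[i] + scalar * c[i];
    }
}

// STREAM counts 3 arrays * 8 bytes per triad iteration (2 reads + 1 write).
double triad_bytes(std::size_t n) { return 3.0 * sizeof(double) * n; }

// Sustained bandwidth in GB/s for one timed pass.
double bandwidth_gbs(double bytes, double seconds) {
    assert(seconds > 0.0);
    return bytes / seconds / 1e9;
}
```

A timing harness would call triad repeatedly, keep the fastest pass, and report bandwidth_gbs(triad_bytes(n), best_seconds).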
  • 11. NVIDIA V100 compute, single and double precision: – More compute power for applications – Shorter time to completion – More simulations/experiments – 1.5x higher compute than NVIDIA P100 GPUs. Chart (NVIDIA V100 SGEMM and DGEMM, TFLOPS): DGEMM 4.8 on S822LC + P100 vs 7.45 on AC922 + V100; SGEMM 9.8 on S822LC + P100 vs 15.3 on AC922 + V100 (1.5x higher).
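GEMM benchmarks convert wall-clock time into TFLOPS using the conventional 2*m*n*k operation count (one multiply and one add per inner-product term). A sketch of that bookkeeping plus the speedup calculation behind the "1.5x higher" label; the matrix size and timing in the usage note are illustrative assumptions, not the configuration measured on the slide.

```cpp
#include <cassert>

// A GEMM with A (m x k) and B (k x n) performs m*n*k multiply-adds,
// conventionally counted as 2*m*n*k floating-point operations.
double gemm_flops(double m, double n, double k) { return 2.0 * m * n * k; }

// Achieved TFLOPS for one timed GEMM call.
double gemm_tflops(double m, double n, double k, double seconds) {
    assert(seconds > 0.0);
    return gemm_flops(m, n, k) / seconds / 1e12;
}

// Speedup of an improved result over a baseline (higher is better).
double speedup(double baseline, double improved) {
    assert(baseline > 0.0);
    return improved / baseline;
}
```

speedup(4.8, 7.45) is about 1.55 for DGEMM and speedup(9.8, 15.3) about 1.56 for SGEMM. A hypothetical 8192 x 8192 DGEMM finishing in 0.15 s would give gemm_tflops(8192, 8192, 8192, 0.15), roughly 7.3 TFLOPS.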
  • 12. NVIDIA V100 GPU memory bandwidth (GPU STREAM): 840 GB/s on AC922 + V100 vs 512 GB/s on S822LC + P100; the chart also marks the theoretical peak. – 1.6x higher bandwidth than NVIDIA P100 – Speeds up memory-intensive applications
  • 13. CPU-to-GPU bandwidth, NVLink vs PCIe3 (GB/s, measured with the NVIDIA bandwidth test): Xeon E5-2640 v4 + P100 (PCIe3): 12; S822LC + P100: 34.16 (2.8x PCIe3); AC922 + 6 V100: 45.9 (3.8x PCIe3, 1.34x S822LC); AC922 + 4 V100: 68 (5.6x PCIe3, 2x S822LC). – NVLink 2.0 is 5.6x better than PCIe3 – Removes CPU-GPU data transfer bottlenecks
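A host-to-device bandwidth test measures a single direction, so the fair comparison is against half of the bidirectional figures on the configuration slide: 75 GB/s per GPU in the 4-GPU node and 50 GB/s in the 6-GPU node. A sketch, assuming the Xeon baseline uses a PCIe Gen3 x16 slot (8 GT/s per lane with 128b/130b encoding); helper names are illustrative.

```cpp
#include <cassert>

// Assumed PCIe Gen3 x16 peak per direction, in GB/s:
// 8 GT/s per lane * 128/130 encoding efficiency * 16 lanes / 8 bits per byte.
double pcie3_x16_gbs() { return 8.0 * (128.0 / 130.0) * 16.0 / 8.0; }

// Fraction of the per-direction link peak reached by a measured copy.
double link_efficiency(double measured_gbs, double peak_gbs) {
    assert(peak_gbs > 0.0);
    return measured_gbs / peak_gbs;
}
```

This puts the AC922 results at roughly 91-92% of the per-direction NVLink peak (68 of 75, 45.9 of 50) and the PCIe3 result at about 76% of 15.75 GB/s.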
  • 14. NVLink bandwidth with varied data sizes: – Minimize communication latencies – Unlock PCIe bottlenecks – Transfer larger data at high speed – Ideal for data sizes larger than GPU memory. Chart: NVLink 2.0 vs PCIe3 host-to-device bandwidth (MB/s, axis up to 80,000) against data size (1 KB to 1,048,576 KB), for 2 NVLinks per GPU, 3 NVLinks per GPU and PCIe3.
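The shape of this curve, rising steeply for small transfers and flattening near the link peak, matches a simple latency-plus-bandwidth (Hockney) model, where a fixed per-transfer overhead dominates small copies. A sketch; the 10 µs latency and 68 GB/s peak in the usage note are hypothetical values for illustration, not measurements read off the chart.

```cpp
#include <cassert>

// Latency + bandwidth model of one transfer:
//   time(n) = latency + n / peak_bandwidth
// Effective bandwidth n / time(n) approaches the peak only for large n.
double effective_bandwidth(double bytes, double latency_s, double peak_bytes_per_s) {
    assert(bytes > 0.0 && peak_bytes_per_s > 0.0);
    return bytes / (latency_s + bytes / peak_bytes_per_s);
}

// Transfer size at which half the peak is reached: n_1/2 = latency * peak.
double half_peak_size(double latency_s, double peak_bytes_per_s) {
    return latency_s * peak_bytes_per_s;
}
```

With latency 10e-6 s and peak 68e9 B/s, half_peak_size gives 680,000 bytes (about 664 KB); at that size effective_bandwidth is half the peak, while a 1 KB copy reaches well under 1% of it.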
  • 15. Workload Optimized Frequency (WOF): – Boosts performance of lightly loaded workloads through higher frequency – Lowers the frequency to save power or to boost other cores – Maximizes performance by dynamically adjusting processor frequency – Governing factors: processor utilization, number of active cores and environmental conditions – Power modes: Dynamic Performance Mode (DPM), Maximum Performance Mode (MPM)
  • 16. HPC interconnect: – Multi-host adapter (Mellanox ConnectX-5 EDR) • Latency: sub-600 nanoseconds • Bandwidth: 2 ports of 100 Gb/s • Message rate: 200M messages/second – Adapter features • Switch-based collectives (SHARP) • Hardware tag matching • User-mode memory registration (UMR) • GPUDirect RDMA • Tunneled atomics. Diagram: the EDR InfiniBand adapter attaches to both POWER9 sockets through x8 + x8 links; the sockets are connected by the X-Bus.
  • 17. Bisection bandwidth and AllReduce scaling on Summit (from The Design, Deployment, and Evaluation of the CORAL Pre-Exascale Systems, SC18): – Good scaling at large scale, helped by achieving ~74% of bisection bandwidth with adaptive routing enabled – Spectrum MPI (SMPI) supports HCOLL (FCA) and SHARP, enabling applications to run with the best collective performance
  • 18. Burst buffer performance on Summit (from The Design, Deployment, and Evaluation of the CORAL Pre-Exascale Systems, SC18): – Improved application performance through the burst buffer – Applications are not bottlenecked on I/O operations to the parallel file system
  • 20. HPC application acceleration methodologies: o Unified Memory o Coherency o ATS o OpenMP
  • 21. CUDA programming with explicit data movement:
      float *h_data, *d_data;                                     // element type assumed for illustration
      cudaMallocHost(&h_data, size);                              // Allocate pinned memory on the host
      cudaMalloc(&d_data, size);                                  // Allocate memory on the GPU
      init_dataCPU(h_data);
      cudaMemcpy(d_data, h_data, size, cudaMemcpyHostToDevice);   // Move data to GPU (dst, src, ...)
      gpu_kernel<<<…>>>(d_data);                                  // GPU compute
      cudaMemcpy(h_data, d_data, size, cudaMemcpyDeviceToHost);   // Move results back to CPU
      cpu_processing(h_data);
  • 22. Unified Memory programming: – Single memory address space accessible to both CPU and GPU – Enables oversubscribing GPU memory • Computation on data sets larger than GPU memory – System-wide atomic memory operations – Transparent memory migration between CPU and GPU, depending on which one accesses the data • Explicit migration through cudaMemPrefetchAsync() – Allocating Unified Memory • Replace “malloc” and “new” with “cudaMallocManaged”
  • 23. Unified Memory advice (cudaMemAdvise hints): – ReadMostly • Data is mostly read, occasionally written • Pages are duplicated; writes are possible but expensive – PreferredLocation • Specifies the preferred location for data • “Resists” migrations away from the preferred location – AccessedBy • Establishes mappings so the device accesses data directly instead of migrating it
      char *data;
      cudaMallocManaged(&data, size);
      init_dataCPU(data, size);
      cudaMemPrefetchAsync(data, size, gpuID);
      cudaMemAdvise(data, size, …ReadMostly, gpuID);
      gpuKernel<<<… >>>(data, size);   // Data is made available on the GPU transparently
      cudaDeviceSynchronize();
      use_dataCPU(data, size);         // Data becomes available on the CPU again on access
  • 24. Hardware coherency & ATS: – Hardware coherency • CPU can directly access and cache GPU memory • Native atomics support – Address Translation Services (ATS) • Allows the GPU to access the CPU’s page tables directly • System allocator support: malloc, stack, global, file system. Simplified programming with ATS, passing memory from any system allocator straight to a kernel:
      *data = malloc(size); gpu_kernel<<<…>>>(data);   // heap (malloc)
      data[1024]; gpu_kernel<<<…>>>(data);             // stack array
      extern float *data; gpu_kernel<<<…>>>(data);     // global variable
  • 25. CUDA-aware MPI: – Avoids staging GPU buffers in host memory – Runs applications efficiently – IBM Spectrum MPI is CUDA-aware (https://devblogs.nvidia.com/introduction-cuda-aware-mpi/)
      Code without CUDA-aware MPI (using GPU buffers):
        // MPI rank 0
        cudaMemcpy(…, cudaMemcpyDeviceToHost);   // stage GPU data in host memory
        MPI_Send(…);
        // MPI rank 1
        MPI_Recv(…);
        cudaMemcpy(…, cudaMemcpyHostToDevice);   // copy received data back to the GPU
      Code with CUDA-aware MPI (using GPU buffers):
        // MPI rank 0
        MPI_Send(d_buf, …);   // d_buf: illustrative name for a device pointer, passed directly
        // MPI rank 1
        MPI_Recv(d_buf, …);
  • 26. GPUDirect RDMA: – Data exchange between the GPU and other peer devices using standard PCIe features – Network devices access GPU memory directly, bypassing host memory
  • 28. Monitoring and profiling tools. Monitoring: ➢ mpstat, vmstat – CPU and memory utilization ➢ numastat – NUMA memory statistics ➢ top/htop – real-time view of system usage. Profiling: ➢ perf record/report – CPU profiling ➢ nvprof – GPU profiling
  • 29. Monitoring and profiling tools: nvidia-smi. Also check “nvidia-smi --query-gpu” for more monitoring options.
  • 30. Monitoring and profiling tools. nvprof: • nvprof is a command-line profiling tool that lets you collect and view profiling data • With nvprof you can collect: kernel execution time; memory transfers; memory sets and CUDA API calls; events or metrics for CUDA kernels. NVVP (NVIDIA Visual Profiler): • Displays a timeline of the application's activity on both the CPU and GPU so that you can identify opportunities for performance improvement • Visualizes profile data collected with nvprof • More documentation: https://docs.nvidia.com/cuda/profiler-users-guide/index.html
  • 32. Conclusion: ➢ AC922 designed for supercomputers ➢ Better performance for HPC applications ➢ High-speed NVLink interconnect between CPU and GPU ➢ Simplified programming using Unified Memory, ATS, and OpenMP
  • 33. References: – IBM Power System AC922 Introduction and Technical Overview – NVIDIA Volta GPU – IBM Power Systems Proof Points – Unified Memory on P9+V100 – Summit Supercomputer – Sierra Supercomputer
  • 34. Notices and disclaimers. © 2018 International Business Machines Corporation. No part of this document may be reproduced or transmitted in any form without written permission from IBM. U.S. Government Users Restricted Rights: use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM. Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. This document is distributed “as is” without any warranty, either express or implied. In no event shall IBM be liable for any damage arising from the use of this information, including but not limited to, loss of data, business interruption, loss of profit or loss of opportunity. IBM products and services are warranted per the terms and conditions of the agreements under which they are provided. IBM products are manufactured from new parts or new and used parts. In some cases, a product may not be new and may have been previously installed. Regardless, our warranty terms apply. Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice. Performance data contained herein was generally obtained in controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary. References in this document to IBM products, programs, or services do not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business.
Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation. It is the customer’s responsibility to insure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer follows any law.
  • 35. Notices and disclaimers (continued). Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM expressly disclaims all warranties, expressed or implied, including but not limited to, the implied warranties of merchantability and fitness for a purpose. The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right. IBM, the IBM logo, ibm.com and [names of other referenced IBM products and services used in the presentation] are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at: www.ibm.com/legal/copytrade.shtml.
  • 36. Thank you. Pidad D’Souza, Power System Performance Architect, pidsouza@in.ibm.com, +91-80-4177 6526, ibm.com