In this slidecast, Sumit Gupta from NVIDIA discusses the latest product news on GPU computing for HPC.
* IBM and NVIDIA Partner to Build Next-Generation Supercomputers
* NVIDIA Launches the Tesla K40 GPU Accelerator, its fastest accelerator ever
Learn more: http://nvidianews.nvidia.com/Releases/NVIDIA-Launches-World-s-Fastest-Accelerator-for-Supercomputing-and-Big-Data-Analytics-a66.aspx
Watch the video presentation: http://wp.me/p3RLHQ-aRY
2. SC13 News
  1) IBM Taps GPU Accelerators
  2) New Product Announcements
  3) New Supercomputer Announcements
3. Accelerated Computing Growing Fast
2x Growth in One Year: percent of HPC systems with accelerators (chart, 2011 to 2013): 22%, 24%, 44%
Hundreds of GPU-Accelerated Apps (chart, 2010 to 2012): 113, 182, 242
NVIDIA GPU is Accelerator of Choice: NVIDIA GPUs 85%, Intel Phi 4%, Others 11%
Source: Intersect360 Research, HPC User Site Census: Systems, July 2013
4. IBM Using GPUs to Accelerate Enterprise & Data Analytics Applications
Application
Infrastructure
Business Intelligence
Predictive Analytics
Risk Analytics
5. IBM Partners with NVIDIA to Build Next-Generation Supercomputers
Tesla GPU + POWER8 CPU
GPU-Accelerated POWER-Based Systems Available in 2014
6. GPU Computing in Data Centers
[Timeline figure, 2007 to 2014; platform labels: x86, ARM64, Power]
7. Linux GCC Compiler to Support GPU Accelerators
Open Source: OpenACC in GCC by Mentor Graphics & Samsung
Pervasive Impact: Free to all Linux users
Mainstream: Most Widely Used HPC Compiler
“Incorporating OpenACC into GCC is an excellent example of open source and open standards working together to make accelerated computing broadly accessible to all Linux developers.”
Oscar Hernandez, Oak Ridge National Laboratory
8. SC13 News
  1) IBM Taps GPU Accelerators
  2) New Product Announcements
  3) New Supercomputer Announcements
9. Tesla K40
World’s Fastest Accelerator for Supercomputing and Big Data Analytics
CUDA 6
Dramatically Simplifies Parallel Programming with Unified Memory
10. Tesla K40: World’s Fastest Accelerator
FASTER: 1.4 TF | 2880 Cores | 288 GB/s
[Chart: AMBER benchmark, ns/day, for CPU, K20X, and K40]
LARGER: 2x Memory Enables More Apps (6GB to 12GB): Fluid Dynamics, Seismic Analysis, Rendering
SMARTER: Unlock Extra Performance Using Power Headroom (GPU Boost)
AMBER Benchmark: SPFP-Nucleosome
CPU: Dual E5-2687W @ 3.10GHz, 64GB System Memory, CentOS 6.2. GPU systems: Single Tesla K20X or Single Tesla K40
11. GPU Boost
Up to 25% Extra Performance on Applications
Use Power Headroom to Run at Higher Clocks
[Chart: relative performance of Tesla K40 (base) vs. Tesla K40 with GPU Boost on AMBER SPFP-TRPCage, LAMMPS-EAM, and NAMD 2.9-APOA1; bar labels range from 11% to 25% faster]
14. Super Simplified Memory Management Code
CPU Code
    void sortfile(FILE *fp, int N) {
      char *data;
      data = (char *)malloc(N);
      fread(data, 1, N, fp);
      qsort(data, N, 1, compare);
      use_data(data);
      free(data);
    }
CUDA 6 Code with Unified Memory
    void sortfile(FILE *fp, int N) {
      char *data;
      cudaMallocManaged(&data, N);
      fread(data, 1, N, fp);
      qsort<<<...>>>(data,N,1,compare);
      cudaDeviceSynchronize();
      use_data(data);
      cudaFree(data);
    }
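The CPU column on this slide leaves `compare` and `use_data` undefined. The sketch below fills them in so the baseline can be compiled and run as plain C. Both helpers are hypothetical stand-ins (a byte-wise comparator and a sortedness check), and `sortfile` here returns a status instead of `void` so the result can be observed; none of that comes from the slide.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical comparator (not defined on the slide): ascending byte order. */
static int compare(const void *a, const void *b) {
    unsigned char x = *(const unsigned char *)a;
    unsigned char y = *(const unsigned char *)b;
    return (x > y) - (x < y);
}

/* Hypothetical consumer (not defined on the slide): returns 1 if the
   buffer is in ascending byte order, 0 otherwise. */
static int use_data(const char *data, size_t n) {
    for (size_t i = 1; i < n; i++) {
        if ((unsigned char)data[i - 1] > (unsigned char)data[i]) {
            return 0;
        }
    }
    return 1;
}

/* CPU-only version of the slide's sortfile. Assumes n > 0.
   Returns -1 if allocation fails, otherwise the result of use_data. */
int sortfile(FILE *fp, size_t n) {
    char *data = malloc(n);
    if (data == NULL) {
        return -1;
    }
    size_t got = fread(data, 1, n, fp);  /* may read fewer than n bytes */
    qsort(data, got, 1, compare);
    int ok = use_data(data, got);
    free(data);
    return ok;
}
```

Per the slide, the CUDA 6 version keeps this same shape: `malloc` and `free` become `cudaMallocManaged` and `cudaFree`, the sort becomes a kernel launch, and a `cudaDeviceSynchronize()` call is added before the host touches the data again.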
15. SC13 News
  1) IBM Taps GPU Accelerators
  2) New Product Announcements
  3) New Supercomputer Announcements
16. Fastest Supercomputer in Europe
Piz Daint: 6.27 PetaFLOPS (80% Linpack Efficiency)
Greenest Petascale System: 3110 MFLOPS/W (#2: JUQUEEN, 2176 MFLOPS/W)
Production-Grade Weather Forecasts: COSMO
7 National Weather Agencies: Germany | Greece | Italy | Poland | Russia | Romania | Switzerland
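Linpack efficiency is conventionally the ratio of measured performance (Rmax) to theoretical peak (Rpeak), so the two figures on this slide imply a peak. The sketch below does that arithmetic. The function names are illustrative, and because the slide's 80% is presumably rounded, the result is only an estimate.

```c
#include <stdio.h>

/* Linpack efficiency = Rmax / Rpeak, so Rpeak = Rmax / efficiency.
   `efficiency` is a fraction in (0, 1]; returns -1.0 for invalid input. */
double implied_peak(double rmax, double efficiency) {
    if (efficiency <= 0.0 || efficiency > 1.0) {
        return -1.0;
    }
    return rmax / efficiency;
}

/* Illustrative helper: prints the implied peak for the slide's figures
   (6.27 PetaFLOPS measured at 80% efficiency). */
void print_piz_daint_peak(void) {
    printf("Implied peak: %.2f PetaFLOPS\n", implied_peak(6.27, 0.80));
}
```

With the slide's numbers this comes to roughly 7.8 PetaFLOPS of theoretical peak.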
17. Greenest Supercomputer in the World
Tokyo Tech KFC System
4000+ MFLOPS per Watt
25% Higher than #1 Green500 System
160 Tesla K20X GPUs
Oil Immersion Technology
Current Green500 #1: CINECA Eurora System, Italy, 3208 MFLOPS/W
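The "25% higher" claim can be checked against the two efficiency figures quoted on this slide. The sketch below is a minimal version of that calculation. `percent_higher` and `print_green_comparisons` are illustrative names, and since the slide gives the KFC figure as "4000+", the result is a lower bound.

```c
#include <stdio.h>

/* Returns how much higher `value` is than `baseline`, in percent.
   Assumes baseline > 0. */
double percent_higher(double value, double baseline) {
    return (value - baseline) / baseline * 100.0;
}

/* Illustrative helper: applies the formula to the MFLOPS/W figures
   quoted on the slides. */
void print_green_comparisons(void) {
    /* Tokyo Tech KFC (4000+) vs. CINECA Eurora (3208) */
    printf("KFC vs. Eurora: %.1f%%\n", percent_higher(4000.0, 3208.0));
    /* Piz Daint (3110) vs. JUQUEEN (2176) */
    printf("Piz Daint vs. JUQUEEN: %.1f%%\n", percent_higher(3110.0, 2176.0));
}
```

At exactly 4000 MFLOPS/W the KFC system is a little under 25% ahead of Eurora, which is consistent with the slide's rounded figure. The same formula puts Piz Daint about 43% ahead of JUQUEEN.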
18. ANSYS Fluent Doubles Performance with GPUs
Automobile Drag Simulation Throughput
[Chart: number of jobs per day, CPU vs. K40; labels: 90% Faster, 2x]
Better Insight for Low Drag Design: 2% Less Drag; 1.5B Gal. of Fuel Saved/Year
2 x E5-2680 CPUs, 8 cores used; 2 Tesla K40s
Sedan Geometry, 3.6M mixed cells
Steady, turbulent, external aerodynamics; Coupled PBNS, DP Solver