2. Benefits of GPU Accelerated Computing
Faster than CPU-only systems in all tests
Large performance boost with marginal price increase
Energy usage cut by more than half
GPUs scale well within a node and across multiple nodes
The K20 is our fastest, lowest-power high-performance GPU yet
Try GPU-accelerated GROMACS for free – www.nvidia.com/GPUTestDrive
3. Great Scaling in Small Systems
Running GROMACS 4.6 pre-beta with CUDA 4.1
Each blue node contains 1x Intel X5550 CPU (95W TDP, 4 Cores per CPU)
Each green node contains 1x Intel X5550 CPU (95W TDP, 4 Cores per CPU) and 1x NVIDIA M2090 (225W TDP per GPU)
Benchmark system: RNase in water with 16,816 atoms in truncated dodecahedron box
[Chart: Nanoseconds / Day vs. Number of Nodes (1, 2, 3), CPU Only vs. With GPU. With GPU, in order of increasing node count: 8.36, 13.01, 21.68 ns/day; speedup over CPU only: 3.7x, 3.6x, 3.2x]
Get up to 3.7x performance compared to CPU-only nodes
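The speedup labels on this slide can be reproduced from the ns/day figures recorded in the deck's Editor's Notes. The following is a minimal sketch, assuming the three CPU-only/GPU pairs in the notes correspond to the three chart points in order of increasing node count. The names `results` and `speedup` are illustrative and do not come from the deck.

```python
# (CPU-only ns/day, with-GPU ns/day) pairs transcribed from the Editor's Notes,
# listed in order of increasing node count.
results = [
    (2.26, 8.36),
    (3.58, 13.01),
    (6.70, 21.68),
]


def speedup(cpu_ns_per_day, gpu_ns_per_day):
    """Ratio of GPU-accelerated throughput to CPU-only throughput."""
    return gpu_ns_per_day / cpu_ns_per_day


speedups = [round(speedup(cpu, gpu), 1) for cpu, gpu in results]
print(speedups)  # [3.7, 3.6, 3.2]
```

The rounded ratios match the 3.7x, 3.6x, and 3.2x labels on the chart.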
4. Additional Strong Scaling on Larger System
128K Water Molecules
Running GROMACS 4.6 pre-beta with CUDA 4.1
Each blue node contains 1x Intel X5670 (95W TDP, 6 Cores per CPU)
Each green node contains 1x Intel X5670 (95W TDP, 6 Cores per CPU) and 1x NVIDIA M2070 (225W TDP per GPU)
[Chart: Nanoseconds / Day (0 to 160) vs. Number of Nodes (8, 16, 32, 64, 128), CPU Only vs. With GPU. Speedup labels shown: 3.1x, 2.8x, 2x]
Up to 128 nodes, NVIDIA GPU-accelerated nodes deliver 2-3x performance
when compared to CPU-only nodes
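To see how the 2-3x range arises and how each configuration scales, the ns/day figures from the Editor's Notes can be turned into per-node-count speedups and a scaling efficiency. This is a sketch only: the efficiency measure (throughput gain relative to the 8-node run, divided by the growth in node count) is a common convention and is not taken from the slides, and the function names are illustrative.

```python
# ns/day per node count, transcribed from the Editor's Notes: (CPU only, with GPU).
results = {
    8: (6.613, 20.335),
    16: (11.282, 37.016),
    32: (23.067, 63.876),
    64: (42.284, 96.628),
    128: (72.694, 144.424),
}


def gpu_speedup(nodes):
    """Throughput with GPUs divided by CPU-only throughput at the same node count."""
    cpu, gpu = results[nodes]
    return gpu / cpu


def scaling_efficiency(nodes, column, base_nodes=8):
    """Throughput gain over the baseline run divided by the growth in node count.

    column 0 selects the CPU-only series, column 1 the GPU series.
    """
    gain = results[nodes][column] / results[base_nodes][column]
    return gain / (nodes / base_nodes)


for n in sorted(results):
    print(
        f"{n:>3} nodes: {gpu_speedup(n):.1f}x over CPU only, "
        f"efficiency CPU {scaling_efficiency(n, 0):.2f}, "
        f"GPU {scaling_efficiency(n, 1):.2f}"
    )
```

On these figures the GPU advantage narrows from about 3.1x at 8 nodes to about 2x at 128 nodes, which is consistent with the labels on the chart.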
5. Replace 3 Nodes with 2 GPUs
Running GROMACS 4.6 pre-beta with CUDA 4.1
ADH in Water (134K Atoms)
Each blue node contains 2x Intel X5550 CPUs (95W TDP, 4 Cores, $1000 per CPU)
The green node contains 2x Intel X5550 CPUs (95W TDP, 4 Cores, $1000 per CPU) and 2x NVIDIA M2090 GPUs (225W TDP, $2000 per GPU)
[Chart: Nanoseconds/Day and Cost. 4 CPU nodes: 6.7 ns/day, $8,000. GPU node: 8.36 ns/day, $6,500]
Save thousands of dollars and perform 25% faster
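The arithmetic behind this claim is short enough to check directly. The sketch below uses the ns/day and cost figures from the chart labels; the dictionary layout and the `ns_per_day_per_kusd` helper are illustrative and not part of the deck.

```python
# Figures read from the slide's chart labels.
# Note: the Editor's Notes sum component list prices to $6,000 for the GPU node;
# the chart label reads $6,500, which is the value used here.
cpu_cluster = {"ns_per_day": 6.7, "cost_usd": 8000}  # 4 CPU nodes
gpu_node = {"ns_per_day": 8.36, "cost_usd": 6500}    # 1 node + 2x M2090

gain_pct = (gpu_node["ns_per_day"] / cpu_cluster["ns_per_day"] - 1) * 100
savings_usd = cpu_cluster["cost_usd"] - gpu_node["cost_usd"]


def ns_per_day_per_kusd(system):
    """Simulation throughput bought per $1000 of hardware."""
    return system["ns_per_day"] / (system["cost_usd"] / 1000)


print(f"Performance gain: {gain_pct:.1f}%")
print(f"Savings: ${savings_usd}")
print(f"CPU cluster: {ns_per_day_per_kusd(cpu_cluster):.2f} ns/day per $1000")
print(f"GPU node:    {ns_per_day_per_kusd(gpu_node):.2f} ns/day per $1000")
```

The gain works out to just under 25%, in line with the slide's headline.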
6. Greener Science
ADH in Water (134K Atoms)
Running GROMACS 4.6 with CUDA 4.1
The blue nodes contain 2x Intel X5550 CPUs (95W TDP, 4 Cores per CPU)
The green node contains 2x Intel X5550 CPUs (95W TDP, 4 Cores per CPU) and 2x NVIDIA M2090 GPUs (225W TDP per GPU)
Energy Expended = Power x Time
[Chart: Energy Expended (KiloJoules Consumed), lower is better. 4 Nodes (760 Watts) vs. 1 Node + 2x M2090 (640 Watts)]
In simulating each nanosecond, the GPU-accelerated system uses 33% less energy
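The energy figure follows from Energy Expended = Power x Time applied to the time needed to simulate one nanosecond. The sketch below assumes the throughput values are the 6.7 and 8.36 ns/day reported for the same ADH benchmark on the previous slide; that pairing is an assumption, and the function name is illustrative.

```python
SECONDS_PER_DAY = 86_400


def energy_per_ns_kj(power_watts, ns_per_day):
    """Energy in kilojoules to simulate one nanosecond: power x time."""
    seconds_per_ns = SECONDS_PER_DAY / ns_per_day
    return power_watts * seconds_per_ns / 1000.0


# Power from this slide; ns/day assumed from the ADH results on the previous slide.
cpu_kj = energy_per_ns_kj(760, 6.7)   # 4 CPU nodes
gpu_kj = energy_per_ns_kj(640, 8.36)  # 1 node + 2x M2090

saving_pct = (1 - gpu_kj / cpu_kj) * 100
print(f"CPU only: {cpu_kj:.0f} kJ per ns")
print(f"With GPU: {gpu_kj:.0f} kJ per ns")
print(f"Energy saved: {saving_pct:.1f}%")
```

Under these assumptions the GPU-accelerated system needs roughly 6,600 kJ per nanosecond against roughly 9,800 kJ for the four CPU nodes, a saving of about one third.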
7. The Power of Kepler
RNase Solvated Protein 24k Atoms
Running GROMACS version 4.6 beta
The grey nodes contain 1 or 2 E5-2687W CPUs (150W each, 8 Cores per CPU) and 1 or 2 NVIDIA M2090s.
The green nodes contain 1 or 2 E5-2687W CPUs (8 Cores per CPU) and 1 or 2 NVIDIA K20X GPUs (235W each).
[Chart: M2090 vs. K20X for 1 CPU + 1 GPU, 1 CPU + 2 GPU, 2 CPU + 1 GPU, 2 CPU + 2 GPU; vertical axis 0 to 140]
Upgrading an M2090 to a K20X increases performance 10-45%
Ribonuclease
8. K20X – Fast
RNase Solvated Protein 24k Atoms
Running GROMACS version 4.6 beta
The blue nodes contain 1 or 2 E5-2687W CPUs (150W each, 8 Cores per CPU).
The green nodes contain 1 or 2 E5-2687W CPUs (8 Cores per CPU) and 1 or 2 NVIDIA K20X GPUs (235W each).
[Chart: Nanoseconds / Day (0 to 120), CPU Only vs. With 1 K20X, for 1 CPU and 2 CPUs]
Adding a K20X increases performance by up to 3x
Ribonuclease
9. K20X, the Fastest Yet
192K Water Molecules
Running GROMACS version 4.6-beta2 and CUDA 5.0.35
The blue node contains 2 E5-2687W CPUs (150W each, 8 Cores per CPU).
The green nodes contain 2 E5-2687W CPUs (8 Cores per CPU) and 1 or 2 NVIDIA K20X GPUs (235W each).
[Chart: Nanoseconds / Day (0 to 16) for CPU, CPU + K20X, CPU + 2x K20X]
Using K20X nodes increases performance by 2.5x
Water
10. Recommended GPU Node Configuration for GROMACS Computational Chemistry
Workstation or Single Node Configuration
# of CPU sockets: 2
Cores per CPU socket: 6+
CPU speed (GHz): 2.66+
System memory per socket (GB): 32
GPUs: Kepler K10, K20, K20X; Fermi M2090, M2075, C2075
# of GPUs per CPU socket: 1x. Kepler-based GPUs (K20X, K20 or K10) need fast Sandy Bridge or perhaps the very fastest Westmeres, or high-end AMD Opterons
GPU memory preference (GB): 6
GPU to CPU connection: PCIe 2.0 or higher
Server storage: 500 GB or higher
Network configuration: Gemini, InfiniBand
Scale to multiple nodes with the same single-node configuration
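One way to apply these recommendations is to encode them and compare a candidate node against them. The sketch below is hypothetical: the key names and the `shortfalls` helper are invented for illustration, and treating the 32 GB system memory and 6 GB GPU memory figures as minimums is an assumption (the slide marks only the core count and CPU speed with "+").

```python
# Values from the slide, treated as minimums (an assumption for the memory figures).
RECOMMENDED_MINIMUMS = {
    "cpu_sockets": 2,
    "cores_per_socket": 6,
    "cpu_ghz": 2.66,
    "memory_gb_per_socket": 32,
    "gpu_memory_gb": 6,
    "pcie_generation": 2.0,
}
GPUS_PER_SOCKET = 1  # the slide recommends 1x GPU per CPU socket


def shortfalls(node):
    """Return a list of ways a node description falls short of the recommendations."""
    issues = [
        f"{key}: {node[key]} < {minimum}"
        for key, minimum in RECOMMENDED_MINIMUMS.items()
        if node[key] < minimum
    ]
    if node["gpus_per_socket"] != GPUS_PER_SOCKET:
        issues.append(f"gpus_per_socket: {node['gpus_per_socket']} != {GPUS_PER_SOCKET}")
    return issues


# Hypothetical node description used only to exercise the check.
candidate = {
    "cpu_sockets": 2,
    "cores_per_socket": 4,
    "cpu_ghz": 2.66,
    "memory_gb_per_socket": 32,
    "gpu_memory_gb": 6,
    "pcie_generation": 2.0,
    "gpus_per_socket": 1,
}
print(shortfalls(candidate))  # ['cores_per_socket: 4 < 6']
```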
11. GPU Test Drive
Experience GPU Acceleration
For Computational Chemistry Researchers, Biophysicists
Preconfigured with Molecular Dynamics Apps
Remotely Hosted GPU Servers
Free & Easy – Sign up, Log in and See Results
www.nvidia.com/gputestdrive
Editor's Notes
Nanoseconds/day:
Nodes   CPU only   GPU
1       2.26       8.36
2       3.58       13.01
4       6.7        21.68

Nanoseconds/day:
Nodes   CPU      GPU
8       6.613    20.335
16      11.282   37.016
32      23.067   63.876
64      42.284   96.628
128     72.694   144.424

Nanoseconds/day:
8 X5550: 6.7
2 M2090 + 2 X5550: 8.36
CPU Node: 4 x 2 x $1000 = $8000
CPU + GPU Node: 1 x 2 x $1000 + 2 x $2000 = $6000
Before we end this session, I would like to tell you about GPU Test Drive. It is an excellent resource for computational chemistry researchers such as yourself to evaluate the benefits of GPU computing in speeding up your simulations. Most importantly, it is free. NVIDIA, along with its partners, is offering access to a remotely hosted GPU cluster. You can run applications such as AMBER and NAMD to find out how your models speed up. You can also try code that you have developed to run on GPUs and see how it scales on an 8-GPU cluster. All you need to do is sign up and log in; it is really that easy! We have several partners who are demonstrating the GPU Test Drive on the GTC show floor. Please plan on visiting them. Sign-up forms have been given out. If you are interested, please fill them out and return them to me.