GROMACS 4.6 Pre-Beta and 4.6 Beta
Benefits of GPU Accelerated Computing
     Faster than CPU-only systems in all tests

     Large performance boost with marginal price increase

     Energy usage cut by more than half

     GPUs scale well within a node and over multiple nodes

     K20 GPU is our fastest and lowest power high performance GPU yet

       Try GPU accelerated GROMACS for free – www.nvidia.com/GPUTestDrive
Great Scaling in Small Systems
     [Bar chart: Nanoseconds / Day vs. Number of Nodes (axis ticks 1, 2, 3); series: CPU Only, With GPU]

     With GPU: 8.36, 13.01 and 21.68 ns/day, labeled 3.7x, 3.6x and 3.2x relative to CPU only

     Running GROMACS 4.6 pre-beta with CUDA 4.1

     Each blue node contains 1x Intel X5550 CPU (95W TDP, 4 Cores per CPU)

     Each green node contains 1x Intel X5550 CPU (95W TDP, 4 Cores per CPU) and 1x NVIDIA M2090 (225W TDP per GPU)

     Benchmark system: RNase in water with 16,816 atoms in a truncated dodecahedron box

     Get up to 3.7x performance compared to CPU-only nodes
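The speedup labels on this chart can be reproduced from the throughput figures in the editor's note for the slide. The following is a minimal sketch, not part of the original deck; the variable and function names are illustrative, and the CPU-only values are taken from that note on the assumption that its columns line up with the bars shown here.

```python
# Throughput in ns/day, as listed in the editor's note for this chart.
cpu_only_ns_day = [2.26, 3.58, 6.7]
with_gpu_ns_day = [8.36, 13.01, 21.68]


def speedup(gpu, cpu):
    """Ratio of GPU-accelerated to CPU-only throughput."""
    return gpu / cpu


# One speedup per node count, rounded the way the chart labels are.
speedups = [round(speedup(g, c), 1)
            for g, c in zip(with_gpu_ns_day, cpu_only_ns_day)]
print(speedups)
```

The rounded ratios match the 3.7x, 3.6x and 3.2x labels printed on the chart.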
Additional Strong Scaling on Larger System
     128K Water Molecules

     [Bar chart: Nanoseconds / Day vs. Number of Nodes (8, 16, 32, 64, 128); series: CPU Only, With GPU; speedup labels 3.1x, 2.8x, 2x]

     Running GROMACS 4.6 pre-beta with CUDA 4.1

     Each blue node contains 1x Intel X5670 (95W TDP, 6 Cores per CPU)

     Each green node contains 1x Intel X5670 (95W TDP, 6 Cores per CPU) and 1x NVIDIA M2070 (225W TDP per GPU)

     Up to 128 nodes, NVIDIA GPU-accelerated nodes deliver 2-3x performance when compared to CPU-only nodes
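To put the strong-scaling claim in numbers, the per-node-count throughput from the editor's note for this slide can be turned into GPU/CPU ratios and a parallel efficiency. This is a sketch under stated assumptions: the slide itself does not report efficiency, the 8-node run is assumed as the baseline, and all names are illustrative.

```python
# Throughput in ns/day per node count, from the editor's note for this chart.
cpu_ns_day = {8: 6.613, 16: 11.282, 32: 23.067, 64: 42.284, 128: 72.694}
gpu_ns_day = {8: 20.335, 16: 37.016, 32: 63.876, 64: 96.628, 128: 144.424}


def parallel_efficiency(throughput, base_nodes=8):
    """Strong-scaling efficiency relative to the smallest run (assumed baseline)."""
    base = throughput[base_nodes]
    return {n: (t / base) / (n / base_nodes) for n, t in throughput.items()}


# GPU-accelerated throughput divided by CPU-only throughput at each size.
gpu_vs_cpu = {n: gpu_ns_day[n] / cpu_ns_day[n] for n in cpu_ns_day}
cpu_eff = parallel_efficiency(cpu_ns_day)
gpu_eff = parallel_efficiency(gpu_ns_day)

for n in sorted(cpu_ns_day):
    print(f"{n:4d} nodes  GPU/CPU {gpu_vs_cpu[n]:.1f}x  "
          f"CPU eff {cpu_eff[n]:.0%}  GPU eff {gpu_eff[n]:.0%}")
```

The ratios at 8, 32 and 128 nodes round to the 3.1x, 2.8x and 2x labels on the chart. The efficiency columns are a derived view only; they show that the GPU advantage narrows as the fixed-size system is spread over more nodes.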
Replace 3 Nodes with 2 GPUs
     ADH in Water (134K Atoms)

     [Bar chart: Nanoseconds/Day and Cost]

     4 CPU nodes: 6.7 ns/day, $8,000

     GPU-accelerated node: 8.36 ns/day, $6,500

     Running GROMACS 4.6 pre-beta with CUDA 4.1

     The blue node contains 2x Intel X5550 CPUs (95W TDP, 4 Cores, $1000 per CPU)

     The green node contains 2x Intel X5550 CPUs (95W TDP, 4 Cores, $1000 per CPU) and 2x NVIDIA M2090s as the GPU (225W TDP, $2000 per GPU)

     Save thousands of dollars and perform 25% faster
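The cost comparison follows from the per-component prices listed on the slide. The sketch below adds up only those list prices, which is an assumption: chassis, memory and networking are ignored, and the function name is illustrative. The chart labels the GPU node at $6,500, whereas the listed component prices sum to $6,000; the source does not explain the difference, so the code reports the component total only.

```python
CPU_PRICE = 1000  # USD per X5550, as listed on the slide
GPU_PRICE = 2000  # USD per M2090, as listed on the slide


def hardware_cost(nodes, cpus_per_node, gpus_per_node=0):
    """Component cost only; chassis, memory and networking are ignored (assumption)."""
    return nodes * (cpus_per_node * CPU_PRICE + gpus_per_node * GPU_PRICE)


cpu_cluster = hardware_cost(nodes=4, cpus_per_node=2)
gpu_node = hardware_cost(nodes=1, cpus_per_node=2, gpus_per_node=2)

# Throughput figures (ns/day) are the two bar labels on the chart.
throughput_gain = 8.36 / 6.7 - 1

print(cpu_cluster, gpu_node, f"{throughput_gain:.0%}")
```

The throughput gain of about 25% matches the slide's headline figure.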
Greener Science
     ADH in Water (134K Atoms)

     [Bar chart: Energy Expended (KiloJoules Consumed), lower is better; bars: 4 Nodes (760 Watts), 1 Node + 2x M2090 (640 Watts)]

     Energy Expended = Power x Time

     Running GROMACS 4.6 with CUDA 4.1

     The blue nodes contain 2x Intel X5550 CPUs (95W TDP, 4 Cores per CPU)

     The green node contains 2x Intel X5550 CPUs (95W TDP, 4 Cores per CPU) and 2x NVIDIA M2090 GPUs (225W TDP per GPU)

     In simulating each nanosecond, the GPU-accelerated system uses 33% less energy
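The energy figure is power multiplied by the wall-clock time needed to simulate one nanosecond. The sketch below works that out; it assumes the throughput values of 6.7 and 8.36 ns/day from the preceding ADH slide and the wattages printed under the bars, and the function name is illustrative.

```python
SECONDS_PER_DAY = 86_400


def energy_per_ns_megajoules(power_watts, ns_per_day):
    """Energy = power x time, with time taken as the seconds needed for 1 ns."""
    seconds_per_ns = SECONDS_PER_DAY / ns_per_day
    return power_watts * seconds_per_ns / 1e6


# Wattages are the values shown under the bars; throughput is assumed
# to be the ADH result from the previous slide.
cpu_mj = energy_per_ns_megajoules(760, 6.7)
gpu_mj = energy_per_ns_megajoules(640, 8.36)
saving = 1 - gpu_mj / cpu_mj

print(f"CPU {cpu_mj:.1f} MJ, GPU {gpu_mj:.1f} MJ, saving {saving:.1%}")
```

The results agree with the editor's note for this slide (about 9.8 MJ and 6.6 MJ per nanosecond), and the saving comes out at roughly a third, in line with the slide's 33% figure. Note that both wattages are sums of component TDPs (4 x 2 x 95W, and 2 x 95W + 2 x 225W), not measured draw.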
The Power of Kepler
     RNase Solvated Protein 24k Atoms

     [Bar chart: M2090 vs. K20X for 1 CPU + 1 GPU, 1 CPU + 2 GPU, 2 CPU + 1 GPU, 2 CPU + 2 GPU; image: Ribonuclease]

     Running GROMACS version 4.6 beta

     The grey nodes contain 1 or 2 E5-2687W CPUs (150W each, 8 Cores per CPU) and 1 or 2 NVIDIA M2090s.

     The green nodes contain 1 or 2 E5-2687W CPUs (8 Cores per CPU) and 1 or 2 NVIDIA K20X GPUs (235W each).

     Upgrading an M2090 to a K20X increases performance 10-45%
K20X – Fast
     RNase Solvated Protein 24k Atoms

     [Bar chart: Nanoseconds / Day for 1 CPU and 2 CPUs; series: CPU Only, With 1 K20X; image: Ribonuclease]

     Running GROMACS version 4.6 beta

     The blue nodes contain 1 or 2 E5-2687W CPUs (150W each, 8 Cores per CPU).

     The green nodes contain 1 or 2 E5-2687W CPUs (8 Cores per CPU) and 1 or 2 NVIDIA K20X GPUs (235W each).

     Adding a K20X increases performance by up to 3x
K20X, the Fastest Yet
     192K Water Molecules

     [Bar chart: Nanoseconds / Day for CPU, CPU + K20X, CPU + 2x K20X; image: Water]

     Running GROMACS version 4.6-beta2 and CUDA 5.0.35

     The blue node contains 2 E5-2687W CPUs (150W each, 8 Cores per CPU).

     The green nodes contain 2 E5-2687W CPUs (8 Cores per CPU) and 1 or 2 NVIDIA K20X GPUs (235W each).

     Using K20X nodes increases performance by 2.5x
Recommended GPU Node Configuration for GROMACS Computational Chemistry
     Workstation or Single Node Configuration

     # of CPU sockets                  2
     Cores per CPU socket              6+
     CPU speed (GHz)                   2.66+
     System memory per socket (GB)     32
     GPUs                              Kepler K10, K20, K20X
                                       Fermi M2090, M2075, C2075
     # of GPUs per CPU socket          1x
                                       Kepler-based GPUs (K20X, K20 or K10): need fast Sandy Bridge or perhaps the very fastest Westmeres, or high-end AMD Opterons
     GPU memory preference (GB)        6
     GPU to CPU connection             PCIe 2.0 or higher
     Server storage                    500 GB or higher
     Network configuration             Gemini, InfiniBand

     Scale to multiple nodes with same single node configuration
GPU Test Drive
     Experience GPU Acceleration for Computational Chemistry Researchers, Biophysicists

     Preconfigured with Molecular Dynamics Apps

     Remotely Hosted GPU Servers

     Free & Easy – Sign up, Log in and See Results

     www.nvidia.com/gputestdrive

GROMACS Molecular Dynamics on GPU


Editor's Notes

  1. Nodes   CPU only   GPU
     1       2.26       8.36
     2       3.58       13.01
     4       6.7        21.68
  2. Nodes   CPU      GPU
     8       6.613    20.335
     16      11.282   37.016
     32      23.067   63.876
     64      42.284   96.628
     128     72.694   144.424
  3. Nanoseconds/day:
     8 X5550: 6.7
     2 M2090 + 2 X5550: 8.36
     CPU Node: 4 x 2 x $1000 = $8000
     CPU + GPU Node: 1 x 2 x $1000 + 2 x $2000 = $6000
  4. GPU: 640 (watts) * 10,334 (seconds/nanosecond) = 6.6 MegaJoules
     CPU: 760 (watts) * 12,895 (seconds/nanosecond) = 9.8 MegaJoules
  5. Before we end this session I would like to tell you about GPU Test Drive. It is an excellent resource for computational chemistry researchers such as yourself to evaluate the benefits of GPU computing in speeding up your simulations. Most importantly, it is free. NVIDIA, along with its partners, is offering access to a remotely hosted GPU cluster. You can run applications such as AMBER and NAMD to find out how your models speed up. You can also try code that you have developed to run on GPUs and see how it scales on an 8-GPU cluster. All you need to do is sign up and log in; it is really that easy! We have several partners who are demonstrating the GPU Test Drive on the GTC show floor. Please plan on visiting them. Sign-up forms have been given out. If you are interested, please fill them out and return them to me.