NAMD 2.9
Summary/Conclusions
Benefits of GPU Accelerated Computing
Faster than CPU only systems in all tests

Large performance boost with small marginal price increase

Energy usage cut in half

GPUs scale very well within a node and over multiple nodes

Tesla K20 GPU is our fastest and lowest power high performance GPU to date

      Try GPU accelerated NAMD for free – www.nvidia.com/GPUTestDrive
Kepler - Our Fastest Family of GPUs Yet
[Bar chart: ApoA1 (Apolipoprotein A1), Nanoseconds/Day, running NAMD version 2.9]
     Configuration              ns/day    Speedup vs. CPU only
     1 CPU Node                  1.37     baseline
     1 CPU Node + M2090          2.63     1.9x
     1 CPU Node + K10            3.45     2.5x
     1 CPU Node + K20            3.57     2.6x
     1 CPU Node + K20X           4.00     2.9x
The blue node contains Dual E5-2687W CPUs (8 Cores per CPU).
The green nodes contain Dual E5-2687W CPUs (8 Cores per CPU) and 1x NVIDIA M2090, K10, K20, or K20X GPU.

GPU speedup/throughput increased from 1.9x (with M2090) to 2.9x (with K20X) when compared to a CPU-only node
NAMD Benchmark Report, Revision 2.0, dated Nov. 5, 2012
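The speedup labels on this slide follow directly from the ns/day figures. A minimal Python sketch of that arithmetic, with dictionary and variable names chosen here purely for illustration:

```python
# ApoA1 throughput in ns/day, as printed on the slide (NAMD 2.9).
ns_per_day = {
    "1 CPU Node": 1.37,
    "1 CPU Node + M2090": 2.63,
    "1 CPU Node + K10": 3.45,
    "1 CPU Node + K20": 3.57,
    "1 CPU Node + K20X": 4.00,
}

baseline = ns_per_day["1 CPU Node"]

# Speedup is throughput relative to the CPU-only node.
speedup = {name: rate / baseline for name, rate in ns_per_day.items()}

for name, factor in speedup.items():
    print(f"{name}: {factor:.1f}x")
```

Rounded to one decimal place, the ratios match the 1.9x, 2.5x, 2.6x, and 2.9x labels on the chart.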
Run NAMD 2.5x Faster with GPUs
[Bar chart: Speedup Compared to CPU Only, running NAMD 2.9 with CUDA 4.0, ECC off]
     Molecule                       Speedup
     CPU, all molecules             baseline
     ApoA1 (Apolipoprotein A1)      2.6
     F1-ATPase                      2.4
     STMV                           2.7
The blue node contains 2x Intel E5-2687W CPUs (8 Cores per CPU).
Each green node contains 2x Intel E5-2687W CPUs (8 Cores per CPU) plus 1x NVIDIA K20 GPU.

Gain 2.5x throughput/performance by adding just 1 GPU when compared to dual CPU performance
Kepler - Universally Faster
[Bar chart: Speedup Compared to CPU Only for F1-ATPase, ApoA1, and STMV, running NAMD version 2.9; average acceleration printed in bars]
     Configuration    Average speedup
     CPU Only         baseline
     1x K10           2.4x
     1x K20           2.6x
     1x K20X          2.9x
     2x K10           4.3x
     2x K20           4.7x
     2x K20X          5.1x
The CPU Only node contains Dual E5-2687W CPUs (8 Cores per CPU).
The Kepler nodes contain Dual E5-2687W CPUs (8 Cores per CPU) and 1 or 2 NVIDIA K10, K20, or K20X GPUs.

The Kepler GPUs accelerate all simulations, up to 5x
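The averages printed in the bars can be reproduced from the days/ns figures in the notes at the end of the deck. The sketch below assumes the "average acceleration" is the arithmetic mean of the three per-molecule speedups. The deck does not say how it averaged, but this choice reproduces every label. The function and variable names are illustrative.

```python
# Days per simulated nanosecond (lower is better), from the deck's notes.
# Tuple order: ApoA1, F1-ATPase, STMV.
days_per_ns = {
    "CPU Only": (0.73, 2.17, 8.64),
    "1x K10": (0.29, 1.04, 3.50),
    "1x K20": (0.28, 0.89, 3.18),
    "1x K20X": (0.25, 0.80, 2.87),
    "2x K10": (0.16, 0.56, 1.93),
    "2x K20": (0.15, 0.49, 1.77),
    "2x K20X": (0.14, 0.45, 1.63),
}

cpu_times = days_per_ns["CPU Only"]


def average_speedup(config):
    """Mean over the three molecules of CPU time divided by GPU-node time.

    Assumption: an unweighted arithmetic mean of the per-molecule ratios.
    """
    ratios = [cpu / gpu for cpu, gpu in zip(cpu_times, days_per_ns[config])]
    return sum(ratios) / len(ratios)


for config in days_per_ns:
    print(f"{config}: {average_speedup(config):.1f}x")
```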
Outstanding Strong Scaling with Multi-STMV
[Line chart: 100 STMV on Hundreds of Nodes (concatenation of 100 Satellite Tobacco Mosaic Virus systems); Nanoseconds/Day vs. # of Nodes (32, 64, 128, 256, 512, 640, 768); series Fermi XK6 and CPU XK6; running NAMD version 2.9]
Speedup labels on the chart, in order of increasing node count: 3.8x, 3.6x, 2.9x, 2.7x.
Each blue XE6 CPU node contains 1x AMD 1600 Opteron (16 Cores per CPU).
Each green XK6 CPU+GPU node contains 1x AMD 1600 Opteron (16 Cores per CPU) and an additional 1x NVIDIA X2090 GPU.

Accelerate your science by 2.7-3.8x when compared to CPU-based supercomputers
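The notes at the end of the deck give this scaling run as seconds per step and as ns/day. The sketch below converts one into the other, assuming a 1 fs timestep. The slide does not state the timestep, but 1 fs reproduces the ns/day values in the notes. All names are illustrative.

```python
SECONDS_PER_DAY = 86_400
TIMESTEP_FS = 1.0  # assumption: 1 fs per step (not stated on the slide)


def ns_per_day(seconds_per_step, timestep_fs=TIMESTEP_FS):
    """Convert wall-clock seconds per MD step into simulated ns per day."""
    steps_per_day = SECONDS_PER_DAY / seconds_per_step
    return steps_per_day * timestep_fs * 1e-6  # 1 fs = 1e-6 ns


# Seconds per step for the 100-STMV system, from the deck's notes.
nodes = [32, 64, 128, 256, 512, 640, 768]
gpu_s_per_step = [1.2414, 0.660887, 0.342743, 0.199465, 0.10837, 0.089752, 0.0774948]
cpu_s_per_step = [4.62633, 2.36707, 1.19722, 0.609124, 0.314745, 0.255016, 0.209511]

# GPU-node speedup over CPU-only nodes at each node count.
speedups = [cpu / gpu for cpu, gpu in zip(cpu_s_per_step, gpu_s_per_step)]

for n, gpu, cpu, s in zip(nodes, gpu_s_per_step, cpu_s_per_step, speedups):
    print(f"{n:4d} nodes: GPU {ns_per_day(gpu):.3f} ns/day, "
          f"CPU {ns_per_day(cpu):.3f} ns/day, speedup {s:.1f}x")
```

Computed this way, the ratio falls from roughly 3.7x at 32 nodes to 2.7x at 768 nodes, so the GPU advantage narrows as the per-node workload shrinks.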
Replace 3 Nodes with 1 2090 GPU
[Bar chart: F1-ATPase, Nanoseconds/Day and Cost, running NAMD version 2.9]
     Configuration                  ns/day    Cost
     4 CPU Nodes                     0.63     $8,000
     1 CPU Node + 1x M2090 GPU       0.74     $4,000
Each blue node contains 2x Intel Xeon X5550 CPUs (4 Cores, $1000 per CPU).
The green node contains 2x Intel Xeon X5550 CPUs (4 Cores, $1000 per CPU) and 1x NVIDIA M2090 GPU ($2000 each).

Speedup of 1.2x for 50% of the cost
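The cost comparison is simple arithmetic on the list prices quoted on the slide. A short Python sketch follows, counting only CPU and GPU prices as the slide does. The helper name `node_cost` is hypothetical.

```python
CPU_PRICE = 1000  # USD per Xeon X5550, as quoted on the slide
GPU_PRICE = 2000  # USD per M2090, as quoted on the slide


def node_cost(cpus, gpus=0):
    """Price of one node, counting only its CPUs and GPUs."""
    return cpus * CPU_PRICE + gpus * GPU_PRICE


cpu_cluster_cost = 4 * node_cost(cpus=2)    # four dual-CPU nodes
gpu_node_cost = node_cost(cpus=2, gpus=1)   # one dual-CPU node plus one GPU

cpu_cluster_rate = 0.63  # ns/day on F1-ATPase, 4 CPU nodes
gpu_node_rate = 0.74     # ns/day on F1-ATPase, 1 CPU node + 1x M2090

speedup = gpu_node_rate / cpu_cluster_rate
cost_ratio = gpu_node_cost / cpu_cluster_cost

print(f"speedup {speedup:.1f}x at {cost_ratio:.0%} of the cost")
```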
K20 - Greener: Twice The Science Per Watt
[Bar chart: Energy Used in Simulating 1 Nanosecond of ApoA1; Energy Expended (kJ), lower is better; bars for 1 Node and 1 Node + 2x K20; running NAMD version 2.9]
Each blue node contains Dual E5-2687W CPUs (95W, 4 Cores per CPU).
Each green node contains 2x Intel Xeon X5550 CPUs (95W, 4 Cores per CPU) and 2x NVIDIA K20 GPUs (225W per GPU).
Energy Expended = Power x Time

Cut down energy usage by ½ with GPUs
Kepler - Greener: Twice The Science/Joule
[Bar chart: Energy used in simulating 1 ns of STMV (Satellite Tobacco Mosaic Virus); Energy Expended (kJ), lower is better; bars for CPU Only, CPU + 2 K10s, CPU + 2 K20s, CPU + 2 K20Xs; running NAMD version 2.9]
The blue node contains Dual E5-2687W CPUs (150W each, 8 Cores per CPU).
The green nodes contain Dual E5-2687W CPUs (8 Cores per CPU) and 2x NVIDIA K10, K20, or K20X GPUs (235W each).
Energy Expended = Power x Time

Cut down energy usage by ½ with GPUs
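The relation Energy Expended = Power x Time can be applied to the STMV figures in the notes at the end of the deck. The sketch below assumes, as the deck appears to, that each node draws exactly its summed TDP for the whole run: 2 x 150 W for the CPU-only node, plus 2 x 235 W for the GPU nodes. The function name and dictionary layout are illustrative.

```python
SECONDS_PER_DAY = 86_400


def energy_kj_per_ns(power_watts, ns_per_day):
    """Energy in kJ to simulate 1 ns: power (W) x time (s), divided by 1000."""
    seconds_per_ns = SECONDS_PER_DAY / ns_per_day
    return power_watts * seconds_per_ns / 1000


CPU_NODE_W = 2 * 150               # two E5-2687W CPUs at 150 W TDP each
GPU_NODE_W = CPU_NODE_W + 2 * 235  # plus two GPUs at 235 W each

# (node power in W, STMV throughput in ns/day), from the deck's notes.
configs = {
    "CPU Only": (CPU_NODE_W, 0.115),
    "CPU + 2 K10s": (GPU_NODE_W, 0.518),
    "CPU + 2 K20s": (GPU_NODE_W, 0.565),
    "CPU + 2 K20Xs": (GPU_NODE_W, 0.613),
}

energy = {name: energy_kj_per_ns(w, rate) for name, (w, rate) in configs.items()}

for name, kj in energy.items():
    print(f"{name}: {kj:,.0f} kJ per ns")
```

Under these assumptions the GPU nodes draw about 2.6 times the power but finish 4.5 to 5.3 times sooner, which is why the energy per simulated nanosecond drops to roughly half.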
Recommended GPU Node Configuration for NAMD Computational Chemistry
Workstation or Single Node Configuration
     # of CPU sockets                   2
     Cores per CPU socket               6+
     CPU speed (GHz)                    2.66+
     System memory per socket (GB)      32
     GPUs                               Kepler K10, K20, K20X; Fermi M2090, M2075, C2075
     # of GPUs per CPU socket           1-2
     GPU memory preference (GB)         6
     GPU to CPU connection              PCIe 2.0 or higher
     Server storage                     500 GB or higher
     Network configuration              Gemini, InfiniBand

Scale to multiple nodes with same single node configuration
Summary/Conclusions
     Benefits of GPU Accelerated Computing
     Faster than CPU only systems in all tests

     Large performance boost with small marginal price increase

     Energy usage cut in half

     GPUs scale very well within a node and over multiple nodes

     Tesla K20 GPU is our fastest and lowest power high performance GPU to date

           Try GPU accelerated NAMD for free – www.nvidia.com/GPUTestDrive

NAMD Molecular Dynamics on GPU

Editor's notes

  1. Configuration                    ns/day
     Dual E5-2687W CPUs               1.370
     Dual E5-2687W CPUs + M2090       2.632
     Dual E5-2687W CPUs + K10         3.448
     Dual E5-2687W CPUs + K20         3.571
     Dual E5-2687W CPUs + K20X        4.000

  2. Molecule       CPU ns/day    GPU ns/day
     ApoA1          1.370         3.571
     F1-ATPase      0.461         1.124
     STMV           0.116         0.314
     ECC off

  3. All numbers are days/ns
     Configuration    ApoA1    F1-ATPase    STMV
     CPU Only         0.73     2.17         8.64
     1x K10           0.29     1.04         3.5
     1x K20           0.28     0.89         3.18
     1x K20X          0.25     0.8          2.87
     2x K10           0.16     0.56         1.93
     2x K20           0.15     0.49         1.77
     2x K20X          0.14     0.45         1.63

  4. Nodes    s/step GPU XK6    s/step CPU XK6    ns/day Fermi XK6    ns/day CPU XK6
     32       1.2414            4.62633           0.069599            0.018676
     64       0.660887          2.36707           0.13073339          0.03650082
     128      0.342743          1.19722           0.252084            0.072167
     256      0.199465          0.609124          0.433159            0.141843
     512      0.10837           0.314745          0.797269            0.274508
     640      0.089752          0.255016          0.962655            0.338802
     768      0.0774948         0.209511          1.114913517         0.412388848

  5. Config                  TDP    sec/ns      energy
     2x E5-2687W             150    63,072.0    9,460,800.0
     2x E5-2687W + 2x K20    600    24,192.0    14,515,200
     TDP = Thermal Design Power

  6. Configuration    ns/day    TDP    energy
     CPU              .115      300    223k
     K10s             .518      770    128k
     K20s             .565      770    117k
     K20Xs            .613      770    108k